THE EVOLUTION OF THE PROTEIN SYNTHESIS SYSTEM

I. A Model of a Primitive Protein Synthesis System *

HIROSHI MIZUTANI and CYRIL PONNAMPERUMA
Laboratory of Chemical Evolution, Dept. of Chemistry, University of Maryland, College Park, MD 20742, U.S.A.

(Received 30 June, 1977)

Abstract. An evolutionary picture of early protein synthesis was presented in relation to the problem of the origin and the evolution of life. A model of an autocatalytic system was studied in this connection. The system in this model included a template nucleotide and two activated amino acid polymerases, with or without a nucleotide polymerase. Variables defining the system were: (1) Catalytic activity of the polymerases, (2) Number of amino acid residues at the activity site of the polymerases, (3) Number of amino acid residues at the selectivity site of the polymerases, (4) Number of the polymerases, (5) Accuracy of polymerization and activity of the polymerases, (6) Number of evolutionary improvements, and (7) The probability of an occurrence of beneficial mutations. The population changes of the systems were obtained by computer calculations. The simulation results indicated that even a very small enzymic activity and specificity of the polymerases could eventually lead the system to the most accurate protein synthesis, provided that transitions to systems with higher accuracy were allowed. The model study would encourage further quantitative investigations on catalytic activities of synthetic peptides, on interactions between nucleotides and amino acids, and on constructions of autocatalytic systems from a chemical evolutionary point of view.

1. Introduction

The origin and the early evolution of the biological protein synthesis system must have been the most important and critical step in the history of terrestrial life. In attempts to explain the origin of the genetic code, several models have been proposed based on the observed regularities of the code (Crick, 1968; Woese, 1965; Hartman, 1975). However, none of them provides a convincing explanation of the universality of the code and the cause of the original assignments. Attempts to explain these by means of stereochemical pairing between codons and amino acids have been discussed (Lacey and Pruitt, 1969), but the experimental results did not fully support the idea (Saxinger and Ponnamperuma, 1974; Weber and Fox, 1973). An alternative explanation of the universality is that the assignments were essentially determined by chance. Eigen (1971) obtained a finite, though small, probability of having a unique set of assignments from a random start. Another alternative is that an ancestral organism arrived from outer space and its coding table was spread throughout the Earth (Crick and Orgel, 1973). Since it seems almost impossible to directly prove or disprove these alternatives, we feel that more effort should be directed at elucidating, both experimentally and theoretically, the possibility of

* This is a part of a dissertation of HM to be presented to the Graduate School of the University of Maryland in partial fulfilment of the requirements for the Ph.D.

Origins of Life 8 (1977) 183-219. All Rights Reserved. Copyright © 1977 by D. Reidel Publishing Company, Dordrecht-Holland.


constructing a reasonably primitive genetic apparatus under prebiological conditions, and at inquiring whether the reason behind the universality may be a result of the evolution of such an apparatus. Ishigami and Nagano (1975) recently proposed that the code might have been selected from more than one code during the course of evolution; however, their model relies heavily on the nature of primitive tRNA's whose origin is not quite explained, and the basis of their arguments has yet to be substantiated. Hoffman (1975) attempted to obtain the probability of having an autocatalytic system with a unique genetic code. His model (Hoffman, 1974), however, seems to require a considerable number of tRNA precursors, proteins (including 'adaptors'), and consequently many genes at the beginning, if it should work at all. The origin of these many components, and their being brought into close proximity, were explained as the result of mere chance. The discussion appears rather futile, because it is quite obvious from the context that the probability of such an occurrence must be far less than that of having a unique code. The chances of the latter were, according to him, already very small.

Here we would like to introduce a much simpler model of the origin and early evolution of the biological protein synthesis system. The essence of the model construction is that a given protein synthesis system has a certain genetic code and a replication table associated with it, giving the probabilities that a given codon will be translated into a given group of amino acids and that a given base will be replicated into a given group of bases. In addition, the rate of the translation and the replication depends on the given system. With these, the probability of the production of any given system is formulated. For instance, a system S(j) will produce a total of R(j) systems, with probability F(j) that a product is the system S(z). Any system is then related to the previous generation as follows:

$$ S(z)_i = \sum_{\text{all } j} \{ F(j) \times S(j)_{i-1} \times R(j) \}, \tag{1} $$

where the subscript i stands for the generation. Because of the lack of sufficient experimental data, our model study mainly concentrates on computer simulations to see whether the origin and evolution of the genetic code could be compatible with the concept of continuous evolution from chemical to biological. Though the model is simple, and the accuracy of the reactions may be very low initially, it will demonstrate that the system can evolve through mistranslations and misreplications, and becomes a complex system similar to the modern genetic apparatus with a comparable accuracy. The model also shows that the genetic code could have been selected from possible codes during the course of the evolution of the system. After a rough pattern of the assignment is established, fine-tuning of the assignment would follow. The origin of the genetic code might not be a mere accident, but a result of certain interactions among components of an autocatalytic system in a primitive milieu.
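The generation update of Equation (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `next_generation`, the list-based layout, and all numerical values are hypothetical and are not taken from the paper.

```python
def next_generation(populations, reproductivity, transition):
    """One generation step in the form of Equation (1).

    populations[j]    : number of systems S(j) in generation i-1
    reproductivity[j] : R(j), total number of systems produced by one S(j)
    transition[j][z]  : F(j), probability that a product of S(j) is S(z)
    """
    n = len(populations)
    new = [0.0] * n
    for z in range(n):
        for j in range(n):
            new[z] += transition[j][z] * populations[j] * reproductivity[j]
    return new


# Hypothetical two-system example; the numbers are illustrative only.
populations = [100.0, 0.0]
reproductivity = [2.0, 3.0]
transition = [[0.9, 0.1],   # products of system 0
              [0.0, 1.0]]   # products of system 1
print(next_generation(populations, reproductivity, transition))
```

Each row of `transition` need not sum to one, since some products may fall outside the systems being tracked (for example, inactive molecules).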


2. The Model

2.1 COMPONENTS OF THE MODEL AND BASIC IDEAS OF TERMS

The model presented here can be separated into two independent autocatalytic systems. One will be called Model I, and the other Model II. Model I is composed of one gene (A-gene) and two activated amino acid polymerases (O-polymerase and A-polymerase). Model II includes another gene (N-gene) and a nucleotide polymerase (N-polymerase) in addition to all components in Model I. Model I is supposed to be more primitive, and Model II may be considered to be the system evolved from Model I. A probable transition from Model I to Model II, which complements our model as a whole, is described in Part 3, Section 2.

2.1.1. Polymerase

There are three different types of polymerases in the model. Among them, two have the same catalytic function of making a peptide bond between two amino acids, utilizing activated amino acids* as substrates. One type of the activated amino acid polymerases is not necessarily coded in the template polynucleotides. That is, in principle, this polymerase could be one which was made independently and entered into the system in order to trigger the evolution of an autocatalytic system. We call it O-polymerase. The other type of the activated amino acid polymerases is a product of the system, and so the sequences of the polymerases are in certain ways coded in the template polynucleotides. We refer to them as A-polymerases. When the translation complex is formed between A-polymerase and the template nucleotide, it is supposed that such a complex incorporates activated amino acids discriminately depending on the sequence of the template. That is, the polymerase is a template-dependent polymerase, or has its own genetic code, as long as it does not lose its selectivity. The selectivity is defined below. The third type of polymerase is a template-dependent nucleotide polymerase, which makes a phosphodiester bond between two nucleotides, utilizing nucleoside phosphates. We shall call them N-polymerases throughout this paper.

We further assume that each polymerase has an activity site and a selectivity site. A replacement of an amino acid at such sites causes complete loss either of catalytic activity or of selectivity, depending on the site at which it occurs. It may be conceptually possible that one amino acid belongs to both sites simultaneously. In this case, this amino acid will be grouped as one at the activity site, for the change of such an amino acid practically causes complete loss of the polymerase activity. The replacement of an amino acid at the activity site is regarded as lethal for the polymerase, since it no longer has a catalytic activity of this kind. On the other hand, errors at the selectivity site may cause either loss or gain of the selectivity, depending upon the nature of the errors.

* In this context, 'activated amino acids' means certain forms of amino acids and does not necessarily refer to the present form of activated amino acids used for the biological protein synthesis.


(a) A-polymerase. The number of amino acid residues which belong to the activity site of A-polymerase is m. The letter n signifies the number of the residues at the selectivity site of the polymerase. The recognition between certain codons and corresponding activated amino acids in the translation complex is assumed to be accomplished mainly by virtue of interactions of activated amino acids with each codon. In this context, activated amino acids could be regarded as the primitive form of amino-acyl tRNA's.

(b) O-polymerase. At the very beginning of the evolution of the system, a foreign polymerase, O-polymerase, is necessary so that the first generation of the system can be produced. In nature, O-polymerase resembles A-polymerase. Once given a template, its function is to make polymerases (A-polymerases in Model I, and A-polymerases and N-polymerases in Model II) in a certain way so that an initial autocatalytic system is established. As the role of O-polymerase is to prepare the first set of translation apparatus, characterization of its catalytic ability in terms of activity site and selectivity site is not necessary. Rather, it is more clearly characterized by asking how probable it is that the initial autocatalytic system is formed by an O-polymerase. Further characterization of O-polymerase is more properly discussed in each Model description.

(c) N-polymerase. N-polymerase is a template-dependent nucleotide polymerase. The recognition of the specific substrate at each chain elongation step is performed mainly by virtue of the corresponding base in the template; however, N-polymerases may keep this recognition process in order thanks to their selectivity site. The number of the amino acid residues at the selectivity site of N-polymerase, l, might be relatively small compared with that of A-polymerase, n, since the interaction between the template and the nucleotides may be more specific than the one between the template and amino acids. The polymerase has k amino acid residues at its activity site.

2.1.2. Accuracy of Polymerization and Grading of Molecules

(a) Accuracy. We assume that each polymerization complex (the translation and the replication) has its own template dependency, or coding table, so that it can produce its descendant in a certain distinctive way. The meaning of the coding table is that, given each codon, there is a certain substrate(s) which is (are) more rapidly incorporated into the products than the others. This particular correspondence between the codon and the substrate(s) constitutes the table. In the case of N-polymerase, the coding table may be presumed to be Watson-Crick base pairing. In the case of A-polymerase, it would be regarded as the genetic code of the autocatalytic system to which each A-polymerase belongs. The accuracy of a polymerase, C, is defined as

$$ C = \frac{\text{number of proper incorporations in terms of its coding table}}{\text{total incorporations of substrates into polymer products}} \tag{2} $$

Hereafter words 'proper' and 'improper' will be used in relation to the coding table involved. The accuracy of each polymerase is correlated with its selectivity. The


selectivity means that, if the selectivity of grade j A-polymerase is S_A(j), the rate of the incorporation of a proper substrate into product molecules is S_A(j) times as fast as that of the incorporation of an improper substrate under the same concentration of proper and improper substrate groups. Consequently, if the number of different activated amino acid groups discriminated from others by A-polymerase is A, and if we assume that the concentrations of these are the same, the selectivity of a grade j A-polymerase will be

$$ S_A(j) = \frac{(\text{rate of proper amino acid incorporation}) \times (A - 1)}{\text{total rate of improper amino acid incorporation}} \tag{3} $$

Hence the over-all ratio of proper incorporations to improper ones will be S_A(j)/(A - 1). Likewise, if N is the number of different nucleotide groups distinguished from one another by N-polymerase, the selectivity of grade j N-polymerase, S_N(j), is

$$ S_N(j) = \frac{(\text{rate of proper nucleotide incorporation}) \times (N - 1)}{\text{total rate of improper nucleotide incorporation}} \tag{4} $$

and the ratio will be S_N(j)/(N - 1). Using these selectivities, Equation (2) can be transformed as follows. For grade j A-polymerase,

$$
\begin{aligned}
C_A(j) &= \frac{\text{number of proper incorporations in terms of its coding table}}{\text{total incorporations of activated amino acids}} \\
&= \frac{(\text{rate of proper incorporation}) \times \text{time}}{(\text{rate of total incorporation}) \times \text{time}} \\
&= \frac{S_A(j) \times (\text{rate of improper incorporation})}{S_A(j) \times (\text{rate of improper incorporation}) + (A - 1) \times (\text{rate of improper incorporation})} \\
&= S_A(j)/(S_A(j) + A - 1).
\end{aligned}
\tag{5}
$$

For grade j N-polymerase,

$$ C_N(j) = S_N(j)/(S_N(j) + N - 1). \tag{6} $$
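Equations (5) and (6) share one form, which the following Python sketch evaluates. The function names and the choice of four distinguishable substrate groups are illustrative assumptions, not values from the paper; the inverse relation is obtained simply by solving Equation (5) for the selectivity.

```python
def accuracy(selectivity, n_groups):
    """Accuracy C = S / (S + A - 1), the common form of Equations (5) and (6).

    selectivity : S_A(j) or S_N(j)
    n_groups    : A or N, the number of distinguishable substrate groups
    """
    return selectivity / (selectivity + n_groups - 1)


def selectivity_from_accuracy(acc, n_groups):
    """Inverse relation S = C (A - 1) / (1 - C), valid for C < 1."""
    return acc * (n_groups - 1) / (1 - acc)


A = 4  # hypothetical number of distinguishable groups
for s in (1, 2, 10, 100):
    print(s, accuracy(s, A))
```

With a selectivity of one, proper and improper substrates are incorporated at the same rate and the accuracy reduces to 1/A, that is, to random incorporation.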

At this point, it should be mentioned that the groups of activated amino acids distinguishable during the translation are assumed to be distinguishable also in terms of their contribution to the enzymic activities, although this may not always be the case.

(b) Grading. Polymerases are ordered in increasing selectivities and numbered, starting from an inactive polymerase* as grade 1, a defective (or template-independent) polymerase as 2, the lowest selective as 3, the second lowest as 4, and so on. Genes are defined and ordered according to the quality of their information. That is, genes which produce grade j polymerase under the given coding table will be considered as grade j genes. Table I summarizes the grading system. Note that the grading system shown in Table I concerns only the activity and the selectivity. Therefore, it is quite possible that two sequence-wisely different molecules may be assigned the same grade. The accuracies are connected with the selectivities by Equations (5) and (6).

* Although an inactive polymerase is not a real polymerase, it is regarded in this paper as a polymerase with no activity.

TABLE I
Grading of the components

Grade   A-polymerase and N-polymerase   A-gene and N-gene
1       No activity                     Produce grade 1 polymerase (a)
2       Random polymerization           Produce grade 2 polymerase (a)
3       The lowest selectivity          Produce grade 3 polymerase (a)
4       The 2nd lowest selectivity      Produce grade 4 polymerase (a)
and so forth

(a) Assuming no errors during the translation.

2.2 MODEL I

Model I is schematically shown in Figure 1. In Model I, we assume that either the longevity of the polynucleotide is long enough or its replication is rapid so that the template can remain in the system during the course of its evolution. Considering the stability of the helical form of polynucleotides, this assumption may not be so unreasonable as a first order approximation.

The model consists of activated amino acids, A-polymerase, and a template polynucleotide. A-polymerase of (i - 1)th generation catalyzes the polymerization reaction of activated amino acids, which eventually produces polypeptides of ith generation. A-polymerase serves as a polymerase of certain forms of activated amino acids, and the selectivity of A-polymerases includes their selective preference for different

[Fig. 1. Schematic representation of Model I: the A-polymerase of the (i-1)th generation translates the template polynucleotide (A-gene), using activated amino acids, into the A-polymerase of the ith generation.]

forms of activations of amino acids and/or formations of different complexes with the template nucleotides in such a way that the complex makes activated amino acids interact differently with nucleotides. In other words, the codon recognition is achieved by the translation complex, and A-polymerase is not solely responsible for it. In order to avoid a possible complication in our description, however, the quality (or grade) of the translation complex will be referred to as the grade of A-polymerase in later discussions on Model I. Therefore our description of Model I is not necessarily complete. The reader should be reminded of this, for the situation is different in Model II, where the main skeleton of the genetic code is considered to be more or less established.

Given polynucleotides, it might be possible that more than one autocatalytic system exists. The differences between these systems would be the forms of utilizable activated amino acids and/or the coding tables resulting from different interactions among different constituents. These different autocatalytic systems are also characterized by their reproductivity and accuracy of reproduction. A transition from one system to another may be caused by any one of several alterations in the system. For instance, when a misreplication of the template happens, a new polynucleotide whose sequence is different from the previous one would be produced. If it can form a translation complex, it would complete a transition. Another type of transition occurs when one or more errors of the translation take place at the selectivity site of A-polymerase. The product of such an erroneous polymerization might be an A-polymerase for a different autocatalytic system. In any case, the translation complex would begin to incorporate a new form of activated amino acids and to interact with the template in a different way, sometimes eventually establishing a new system. A very small portion of improper replications and/or translations could cause such a transition, and result in the modification of the feature of the interaction. Through these transitions, additional amino acids might be substituted, different forms of activated amino acids which were not used before might become utilizable, and one coding table might have been selected among many possible tables until the code became more or less frozen in the present form. The general relation between the (i - 1)th and ith generations in Model I is as follows:

$$ P_A(z)_i = \sum_{j=2}^{y} \{ F_A(j) \times P_A(j)_{i-1} \times R_A(j) \}, \tag{7} $$

where P_A(j)_{i-1} is the number of grade j A-polymerases at (i - 1)th generation, and R_A(j) is the reproductivity of grade j A-polymerase. The summation includes not only selective polymerases (j ≥ 3) but also defective ones (j = 2). F_A(j) is the transition probability of A-polymerase from grade j to grade z, and is a function of the accuracies and the gains. The letter y is the total number of grades of A-polymerase. A transition from any system to any other system may be possible in principle; however, we consider as possible only transitions to the same grade, to the next higher grade, to inactive systems (loss of the activity), and to defective systems (loss of the selectivity, or random polymerization). Figure 2 shows the evolutionary pathway of Model I. The figure assumes that the maximum grade for

[Fig. 2. Diagrammatic representation of the evolutionary pathway of Model I (transitions during the translation). For the explanation, see the text.]

A-polymerase is y. In reality some skips over a few steps of the evolutionary ladder are quite likely, and a decrease in the selectivity may be possible as well as complete loss of the selectivity or the activity. The main reason for the assumption of step-wise evolution is that we do not exactly know the evolutionary pathway of the autocatalytic system (or the translation complex). Allowing no skips and no partial loss of the selectivity makes the evolution of the system in our model more difficult. In Figure 2, the upward transitions are illustrated as a clockwise movement along the semicircle. Downward arrows to grade 2 show the transition to defective systems. For a system of the highest accuracy, or grade y in the case shown in Figure 2, the only possible change in the selectivity is the loss of the selectivity, and all the resulting products fall into the grade 2 category. Another kind of downward arrow in Figure 2 shows the transition to inactive systems. This results, for instance, from any changes at the activity site of A-polymerase. Because of this transition rule, there are two possible pathways for the formation of a grade j (j ≥ 3) system. One is self-reproduction, and the other is up-grading. The probabilities of the transitions to the next higher system will be generally called gains. The gain in Model I, G_A(j), is defined as the probability of the formation of a grade j system when a grade j - 1 system produces systems other than grade j - 1 ones. In other words, G_A(j) is the fraction of the changes occurring during the reproduction by a grade j - 1 system which result in an increase in the selectivity of its descendant. Though this value may be a function of many variables, the largest factor would be the initial and final points of the transition. Hence we consider it as a function of j, and ignore other factors. At first glance, the transition to the next higher system in Model I might appear fruitless, because such a gain of the selectivity does not seem inheritable at all. 
Even though the accuracy has been improved once, the very increase in the accuracy may no longer permit such an inaccurate incorporation of amino acids, and results in the loss of the selectivity in the next generation. However, as was mentioned earlier, the transition to another system in this context does not merely mean an increase in the rate of the accurate incorporation but a literal transition to another independent autocatalytic system. That is, the translation complex with the new component could be different in terms of its


characteristics such as its genetic code, the utilizable forms of the activation of amino acids, etc. from the one with the old component.

The selectivity of a grade 2 complex will be one; accordingly its accuracy, C_A(2), is equal to 1/A. A comeback to a selective system is possible for the defective ones (grade 2 complexes). With a small probability of G_A(3), the defective systems can make systems of the lowest selectivity. No other comeback is considered possible. Here again, the actual probability of making selective systems other than the one with the lowest selectivity might be as high as G_A(3), since random polymerization is assumed for the grade 2 complex. However, we took the pessimistic position that such a transition is not allowed. As will be shown later, the system will evolve to the most accurate system in spite of this assumption.

At the beginning, we assume that there is (are) O-polymerase(s), whose translation accuracy is C_O. Using the polynucleotide as a template, O-polymerase produces the first generation of A-polymerases. Since A-polymerase has m amino acids at its activity site, the total number of active A-polymerases in the first generation will be C_O^m × P_O × R_O, where P_O stands for the number of O-polymerases and R_O for the total number of molecules produced by one O-polymerase, the reproductivity of O-polymerase. Active A-polymerases can be further classified according to their accuracies. For instance, the number of molecules whose accuracy is C_A(3) is C_O^{m+n} × P_O × R_O, since A-polymerase has n amino acids at its selectivity site. In the same way, the population of each autocatalytic system at the first generation is given as follows:

$$ P_A(3)_1 = C_O^{m+n} \times P_O \times R_O, \tag{8} $$

$$ P_A(4)_1 = C_O^{m} \times (1 - C_O^{n}) \times G_A(4) \times P_O \times R_O, \tag{9} $$

$$ P_A(z)_1 = 0 \quad (z \geq 5), \tag{10} $$

$$ P_A(2)_1 = C_O^{m} \times (1 - C_O^{n}) \times (1 - G_A(4)) \times P_O \times R_O, \tag{11} $$
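A minimal Python sketch of Equations (8)-(11) follows. The function name and every numerical input are hypothetical and chosen only for illustration; the check at the end uses the statement in the text that the active A-polymerases of the first generation number C_O^m × P_O × R_O.

```python
def first_generation_model1(c_o, m, n, gain_4, p_o, r_o):
    """First-generation populations of Model I, Equations (8)-(11).

    c_o    : translation accuracy of O-polymerase (C_O)
    m, n   : residues at the activity and selectivity sites of A-polymerase
    gain_4 : G_A(4)
    p_o    : number of O-polymerases (P_O)
    r_o    : molecules produced by one O-polymerase (R_O)
    """
    produced = p_o * r_o
    active = c_o ** m   # no error at the activity site
    exact = c_o ** n    # no error at the selectivity site
    return {
        2: active * (1 - exact) * (1 - gain_4) * produced,  # Eq. (11)
        3: active * exact * produced,                       # Eq. (8)
        4: active * (1 - exact) * gain_4 * produced,        # Eq. (9)
    }                                                       # grades >= 5: zero, Eq. (10)


# Illustrative inputs only; none of these values come from the paper.
pops = first_generation_model1(c_o=0.3, m=3, n=5, gain_4=0.01, p_o=1, r_o=1000)
print(pops)
# Grades 2, 3 and 4 together account for all active molecules.
print(sum(pops.values()), 0.3 ** 3 * 1000)
```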

Here P_A(j)_i denotes the number of grade j A-polymerases at ith generation. Then the evolution of the system begins. It is implicitly supposed that the accuracy of O-polymerase is in terms of the genetic code of grade 3 A-polymerase. If C_O is in terms of any other higher grade A-polymerase, we may regard it as the polymerase of the lowest accuracy. In effect, it does not cause significant modifications of our results and discussions.

To derive actual formulae for Equation (7) after the first generation, let us consider the formation of grade 3 A-polymerases. The molecules are produced either via self-reproduction or via the comeback from defective systems. The total production of molecules by grade 3 A-polymerase is P_A(3)_{i-1} × R_A(3), where R_A(3) denotes the reproductivity of the polymerase. Since the probability that one amino acid is accurately translated is C_A(3), the probability that all amino acids at the activity site are accurately translated is C_A(3)^m. The fraction C_A(3)^m of the produced molecules has no error at the activity site. Among these C_A(3)^m × P_A(3)_{i-1} × R_A(3) molecules, the fraction C_A(3)^n has no errors at the selectivity site and consequently stays at the same selectivity. So the


contribution from the self-reproduction is C_A(3)^{m+n} × P_A(3)_{i-1} × R_A(3). Meanwhile the total number of molecules reproduced by the defective molecules is P_A(2)_{i-1} × R_A(2), among which the C_A(2)^m fraction still remains active. Taking account of the comeback probability, the contribution to P_A(3)_i is C_A(2)^m × G_A(3) × P_A(2)_{i-1} × R_A(2). Therefore, P_A(3)_i is given by:

$$ P_A(3)_i = C_A(3)^{m+n} \times P_A(3)_{i-1} \times R_A(3) + C_A(2)^{m} \times G_A(3) \times P_A(2)_{i-1} \times R_A(2). \tag{12} $$
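The bookkeeping behind this derivation can be checked numerically. The sketch below splits the products of one selective (grade j ≥ 3) polymerase according to where translation errors fall, assuming, as in the definition of the gain and in Equations (9) and (11), that an erroneous selectivity site gives the next grade with probability G and a defective polymerase otherwise. The function name and the numbers are illustrative assumptions.

```python
def product_fractions(accuracy, m, n, gain_next):
    """Fate of the products of one selective polymerase (grade j >= 3).

    accuracy  : C_A(j)
    m, n      : residues at the activity and selectivity sites
    gain_next : G_A(j + 1), taken as zero for the highest grade
    """
    active = accuracy ** m      # no error at the activity site
    unchanged = accuracy ** n   # no error at the selectivity site
    return {
        "inactive": 1 - active,
        "same_grade": active * unchanged,
        "next_grade": active * (1 - unchanged) * gain_next,
        "defective": active * (1 - unchanged) * (1 - gain_next),
    }


# Illustrative numbers only.
fractions = product_fractions(accuracy=0.5, m=1, n=1, gain_next=0.1)
print(fractions)
print(sum(fractions.values()))  # the four fates exhaust all products
```

The `same_grade` entry is the factor C_A(3)^{m+n} that multiplies the self-reproduction term of Equation (12).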

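In a similar way, P_A(z)_i (z ≥ 4) and P_A(2)_i are:

$$
\begin{aligned}
P_A(z)_i &= \sum_{j=2}^{y} \{ F_A(j) \times P_A(j)_{i-1} \times R_A(j) \} \\
&= C_A(z)^{m+n} \times P_A(z)_{i-1} \times R_A(z) \\
&\quad + C_A(z-1)^{m} \times (1 - C_A(z-1)^{n}) \times G_A(z) \times P_A(z-1)_{i-1} \times R_A(z-1),
\end{aligned}
\tag{13}
$$

$$
\begin{aligned}
P_A(2)_i &= C_A(2)^{m} \times (1 - G_A(3)) \times P_A(2)_{i-1} \times R_A(2) \\
&\quad + \sum_{j=3}^{y} \{ C_A(j)^{m} \times (1 - C_A(j)^{n}) \times (1 - G_A(j+1)) \times P_A(j)_{i-1} \times R_A(j) \},
\end{aligned}
\tag{14}
$$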
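Equations (12)-(14) together define one generation step of Model I, which can be sketched as below. This is a minimal illustration, not the authors' program: the dictionary layout, the function name `model1_step`, and all parameter values (selectivities, reproductivities, gains, A = 4) are assumptions made for the example, and at least one selective grade (y ≥ 3) is assumed.

```python
def model1_step(pop, acc, repro, gain, m, n):
    """Advance Model I by one generation using Equations (12)-(14).

    pop[j], acc[j], repro[j] : P_A(j), C_A(j), R_A(j) for grades j = 2..y
    gain[j]                  : G_A(j) for j = 3..y; G_A(y + 1) is zero
    m, n                     : residues at the activity / selectivity sites
    """
    y = max(pop)
    new = {j: 0.0 for j in pop}
    for j in range(2, y + 1):
        produced = pop[j] * repro[j]
        active = acc[j] ** m          # no error at the activity site
        keep = acc[j] ** n            # no error at the selectivity site
        g_up = gain.get(j + 1, 0.0)   # G_A(y + 1) is defined as zero
        if j == 2:
            # Defective systems: active products either come back to
            # grade 3 (Eq. 12) or stay defective (first term of Eq. 14).
            new[3] += active * g_up * produced
            new[2] += active * (1 - g_up) * produced
        else:
            changed = active * (1 - keep) * produced
            new[j] += active * keep * produced      # self-reproduction
            if j < y:
                new[j + 1] += changed * g_up        # up-grading (Eq. 13)
            new[2] += changed * (1 - g_up)          # loss of selectivity (Eq. 14)
    return new


# Illustrative run; every number below is hypothetical.
A = 4
sel = {2: 1.0, 3: 2.0, 4: 5.0, 5: 20.0}
acc = {j: s / (s + A - 1) for j, s in sel.items()}   # Equation (5)
repro = {j: 2.0 for j in sel}
gain = {3: 0.01, 4: 0.01, 5: 0.01}
pop = {2: 0.0, 3: 100.0, 4: 0.0, 5: 0.0}
for generation in range(20):
    pop = model1_step(pop, acc, repro, gain, m=3, n=3)
print(pop)
```

Inactive (grade 1) products are not tracked, since they contribute nothing to later generations.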

Here F_A(j) is the transition probability from grade j A-polymerase to grade z A-polymerase, and G_A(y + 1) is defined as zero. Table II summarizes the contribution by grade j A-polymerase of (i - 1)th generation to ith generation.

2.3 MODEL II

Model II is illustrated in Figure 3. An autocatalytic system in Model II includes two genes and two polymerases. It presumes the establishment of a certain degree of correspondence between activated amino acids and nucleotide bases so that the replacement of bases can affect the amino acid sequence consistently. The template consists of two genes. One is for an A-polymerase (A-gene), and the other for an N-polymerase (N-gene). The templates of ith generation are translated by A-polymerase of (i - 1)th generation, and produce N- and A-polymerases of ith generation. At this point, a complete set of an autocatalytic system of ith generation is established. Then N-polymerase of ith generation will catalyze the formation of the templates of (i + 1)th generation, using the template of ith generation. An autocatalytic system in Model II is fully defined by four grades, i.e., two for the genes and two for the polymerases. A matrix, S(j_1, j_2, j_3, j_4)_i, fully describes an autocatalytic system at ith generation, where j_1 shows the grade for N-gene, j_2 for A-gene, j_3 for N-polymerase, and j_4 for A-polymerase. A set of j_1 and j_2 defines the genotype of the

[Fig. 3. Schematic representation of Model II: the A-polymerase of the (i-1)th generation translates the N-gene and the A-gene, using activated amino acids, into the N-polymerase and A-polymerase of the ith generation (an autocatalytic system of the ith generation); the N-polymerase then forms, from nucleotides, the genes of the (i+1)th generation.]

system, and that of j_3 and j_4 the phenotype of the system. In later discussions, we may often classify the systems in Model II on the basis of their genotypes. Corresponding to the numbers of amino acids at the activity and the selectivity sites of N-polymerase, N-gene has k and l codons which are crucial for the activity and the selectivity of N-polymerase respectively. Let us generally call them activity codons and selectivity codons. Similarly A-gene has m and n codons which are decisive for A-polymerase activity and selectivity respectively. The number of the autocatalytic systems at ith generation, S(u, v, w, z)_i, will be generally given as follows:

$$ S(u, v, w, z)_i = \sum_{j_1=2}^{x} \sum_{j_2=2}^{y} \sum_{j_3=2}^{x} \sum_{j_4=2}^{y} \{ F_S \times S(j_1, j_2, j_3, j_4)_{i-1} \times R_S \}, \tag{15} $$

where S(j_1, j_2, j_3, j_4)_{i-1} is the number of thus-specified systems at (i - 1)th generation. R_S is the reproductivity of the system S(j_1, j_2, j_3, j_4)_{i-1}, i.e., the number of systems, including non-viable ones, that the system S(j_1, j_2, j_3, j_4)_{i-1} produces. The summation includes the defective genes (j_1 and/or j_2 = 2) as well as selective ones (j_1 and j_2 ≥ 3). The letter x is the total number of grades of N-polymerases, and y is that of A-polymerases. F_S is the transition probability from S(j_1, j_2, j_3, j_4)_{i-1} to S(u, v, w, z)_i, and is a function of the accuracies and the gains. Furthermore F_S can be written as a product of four transition probabilities, corresponding to the four reactions which form all four components in the system. That is,

$$ F_S = f_1 \times f_2 \times f_3 \times f_4. \tag{16} $$

Function f_1 is the probability of the production of grade u N-gene employing grade j_3 N-polymerase and grade j_1 N-gene as a


template. Function f_2 is the probability of grade v A-gene formation using grade j_3 N-polymerase and grade j_2 A-gene as a template. Function f_3 is the probability of the formation of grade w N-polymerase from grade u N-gene and grade j_4 A-polymerase. Function f_4 is the probability of the formation of grade z A-polymerase from grade v A-gene and grade j_4 A-polymerase.

In Model II, we assume a step-wise evolution of the genes and the polymerases as follows. The transition probabilities to higher grades, the gains, are non-zero only for transitions to the next higher grade. A small fraction of errors at the selectivity site during the translation, either in N-polymerase or in A-polymerase, may cause such transitions, and the rest of the changes will cause complete loss of the selectivity. Likewise, with small probability, genes can be upgraded one step during the replication through one or more base changes at the selectivity codons. Any changes either at the activity site or at the activity codons will result in a complete loss of the catalytic activity of the product polymerase.

The assumptions above place certain restrictions on the constituents of the system. That is, a grade j gene (except the highest and the lowest grade genes) can produce only inactive polymerases, defective polymerases, grade j polymerases, and grade j + 1 polymerases, whatever the accuracy of A-polymerase would be. Consequently only grade j - 1 and j genes can give birth to grade j genes in the next generation. In other words, if u (or v) ≥ 3, grade u - 1 (v - 1) and u (v) genes are the only producers of grade u (v) gene in the next generation. In the case of u = 2 and/or v = 2, the system is a defective one, and other sources of its formation, i.e., contributions from errors of previously higher systems, should be included. Gains will be defined as follows. 
When one or more misassignments occur at the selectivity site of N-polymerase during the translation using a grade $w-1$ N-gene, the resulting polymerase either possesses no selectivity or is a grade $w$ N-polymerase. $G_N(w)$ signifies the probability of the latter. $G_A(z)$ denotes the corresponding probability for erroneous translations from a grade $z-1$ A-gene to a grade $z$ A-polymerase. Corresponding errors may occur at the gene level during replication, and $G_x(u)$ and $G_y(v)$ signify the gains in the N-gene and the A-gene, respectively. The actual formulae for $S(u, v, w, z)_i$ and their derivations from Equation (15) under the assumption of step-wise evolution are given in Appendix III.

To trigger the evolution of Model II, we assume that, at the beginning, there are a grade 3 N-gene, a grade 3 A-gene, and an O-polymerase. O-polymerase produces the first generation of N- and A-polymerases using the genes as templates. Since N-polymerase has $k$ amino acid residues at its activity site, and the number of amino acid residues in the activity site of A-polymerase is $m$, the total number of active systems is equal to $C_O^{k+m} \times T_O$, where $T_O$ is the number of sets of genes or that of polymerases, whichever is smaller. $C_O$ stands for the translation accuracy of O-polymerase. These active systems can be further classified according to the accuracies of their polymerases. The population of each system at the first generation is as follows:

$$S(3,3,2,2)_1 = C_O^{k+m} \times (1 - C_O^{l}) \times (1 - G_{NO}) \times (1 - C_O^{n}) \times (1 - G_{AO}) \times T_O, \tag{16}$$

$$S(3,3,2,3)_1 = C_O^{k+m+n} \times (1 - C_O^{l}) \times (1 - G_{NO}) \times T_O, \tag{17}$$

$$S(3,3,2,4)_1 = C_O^{k+m} \times (1 - C_O^{l}) \times (1 - G_{NO}) \times (1 - C_O^{n}) \times G_{AO} \times T_O, \tag{18}$$

$$S(3,3,3,2)_1 = C_O^{k+l+m} \times (1 - C_O^{n}) \times (1 - G_{AO}) \times T_O, \tag{19}$$

$$S(3,3,3,3)_1 = C_O^{k+l+m+n} \times T_O, \tag{20}$$

$$S(3,3,3,4)_1 = C_O^{k+l+m} \times (1 - C_O^{n}) \times G_{AO} \times T_O, \tag{21}$$

$$S(3,3,4,2)_1 = C_O^{k+m} \times (1 - C_O^{l}) \times G_{NO} \times (1 - C_O^{n}) \times (1 - G_{AO}) \times T_O, \tag{22}$$

$$S(3,3,4,3)_1 = C_O^{k+m+n} \times (1 - C_O^{l}) \times G_{NO} \times T_O, \tag{23}$$

$$S(3,3,4,4)_1 = C_O^{k+m} \times (1 - C_O^{l}) \times G_{NO} \times (1 - C_O^{n}) \times G_{AO} \times T_O, \tag{24}$$

$$S(u, v, w, z)_1 = 0 \quad (u \neq 3 \text{ and/or } v \neq 3), \tag{25}$$

where $G_{AO}$ denotes the gain to grade 4 A-polymerase from the grade 3 A-gene catalyzed by O-polymerase, and $G_{NO}$ stands for the gain to grade 4 N-polymerase from the grade 3 N-gene catalyzed by O-polymerase. Even if the grades of the first set of genes happen to be higher than 3, we can regard them as equal to 3, as we did in Model I; our results and discussions remain essentially unchanged. In Model II, O-polymerase is needed, as in Model I. O-polymerase in Model II, however, might not be totally foreign to the genes. Rather, it may be coded in the A-gene from the beginning. This possibility and the transition from Model I to Model II are discussed in Part 3, Section 2. The definitions of special terms used in this paper are listed in Appendix I. Appendix II is a glossary of characters used in the paper.

3. Simulation Results and Discussions

3.1. RESULTS
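As a cross-check on the initial conditions, Equations (16)-(25) can be tabulated numerically. The sketch below is not the authors' program; it assumes that each polymerase outcome (grade 2, 3, or 4) is an independent product of per-residue accuracies, which is what the printed equations imply. The function names are hypothetical, and the parameter values are those quoted in the caption of Figure 8a ($C_O = 0.1$, $k = l = m = n = 5$, $G_{NO} = G_{AO} = 0.1$), with $T_O = 1$ chosen for illustration.

```python
import math
from itertools import product


def outcome_probs(c, n_activity, n_selectivity, gain):
    """Probability that one translation of a grade 3 gene yields a grade 2, 3
    or 4 polymerase, given per-residue accuracy c (assumed independent)."""
    active = c ** n_activity          # activity site translated correctly
    sel_ok = c ** n_selectivity       # selectivity site translated correctly
    return {
        2: active * (1 - sel_ok) * (1 - gain),  # selectivity lost
        3: active * sel_ok,                      # faithful product
        4: active * (1 - sel_ok) * gain,         # upgraded by a fortunate error
    }


def first_generation(c_o, k, l, m, n, g_no, g_ao, t_o=1.0):
    """Populations S(3, 3, w, z) at the first generation, Equations (16)-(24)."""
    n_pol = outcome_probs(c_o, k, l, g_no)   # N-polymerase outcomes
    a_pol = outcome_probs(c_o, m, n, g_ao)   # A-polymerase outcomes
    return {(3, 3, w, z): n_pol[w] * a_pol[z] * t_o
            for w, z in product((2, 3, 4), repeat=2)}


pops = first_generation(c_o=0.1, k=5, l=5, m=5, n=5, g_no=0.1, g_ao=0.1)
total_active = sum(pops.values())   # should equal C_O**(k + m) * T_O
```

Summing the nine populations recovers the total number of active systems, $C_O^{k+m} \times T_O$, stated in the text, which is a useful consistency test of the nine expressions.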

There are a number of parameters in our model. Among them, the comeback probabilities, $G_A(3)$, $G_N(3)$, $G_x(3)$, and $G_y(3)$, are merely supplementary to the model; as is easily expected, they do not change the simulation results much unless unreasonably high values are given to them. Throughout the simulations, all reproductivities were given the same value ($= R$). Though it is quite likely that a highly evolved autocatalytic system has a larger reproductivity, no plausible estimate of these values seems available, and different values chosen arbitrarily would only complicate the picture needlessly. For this reason, we took a rather pessimistic position. The results obtained will only be strengthened if the reproductivity increases with evolution. Giving the same value to all reproductivities makes them relative; that is, a different figure for $R$ does not change the population ratio of any one group of autocatalytic systems to the others.


It may be reasonable to assume that the recognition of amino acids by the primitive genetic apparatus was somewhat ambiguous, and that chemically and structurally similar amino acids might have been recognized as the same substrate. Therefore the number of activated amino acid groups, $A$, may be less than twenty, although the actual number of amino acids available in a primitive milieu would have been considerably larger than twenty. In this simulation, $A = 10$ was used. The number of differently recognized nucleotide groups, $N$, was four throughout this simulation. It is likely that the interaction involved in nucleotide-nucleotide recognition in a primitive autocatalytic system was mainly Watson-Crick type base-pairing. If the number were larger than four, a very drastic change in the system would have been required to adopt today's way of translation; hence this may be unlikely. Several investigators have suggested that the number of nucleotide bases in primitive replication was two rather than four (Lipmann, 1965; Crick, 1968). However, this suggestion may need further debate before it gains universal acceptance. Therefore, $N = 4$ seems to be the most appropriate choice so far. After setting values for the selectivities, $A$, and $N$, the accuracies are obtained from Equations (5) and (6). All the numbers of amino acid residues at the activity sites and the selectivity sites, i.e., $k$, $l$, $m$, and $n$, are set equal to five throughout the simulations. This figure might be somewhat large if we consider the plausibly small size of primitive polymerases. However, since the enzymic property in our model is based on an all-or-none assumption, a somewhat large number might compensate for our total neglect of all amino acids other than those at these sites. The parameters which define the character of O-polymerase are $P_O$, $R_O$, and $C_O$. The number of the polymerases, $P_O$, must have been small in Model I. It could be only one or two.
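The step from selectivities to accuracies can be illustrated with a short sketch. Equations (5) and (6) are not reproduced in this part of the paper, so the form used below, $C = S/(S + A - 1)$, is an assumption: it treats the selectivity $S$ as the preference for the correct monomer over each of the other $A - 1$ groups. It is at least consistent with the statement in the caption of Figure 4 that an accuracy of 0.1 with $A = 10$ corresponds to random polymerization. The function names are hypothetical.

```python
def accuracy(selectivity, n_groups):
    """Assumed per-residue accuracy for a polymerase that prefers the correct
    monomer `selectivity`-fold over each of the other n_groups - 1 groups."""
    return selectivity / (selectivity + n_groups - 1)


def site_fidelity(selectivity, n_groups, n_residues):
    """Probability that all residues of one site are incorporated correctly."""
    return accuracy(selectivity, n_groups) ** n_residues


random_amino_acid = accuracy(1, 10)     # no selectivity, A = 10
random_nucleotide = accuracy(1, 4)      # no selectivity, N = 4
grade3_site = site_fidelity(100, 10, 5) # S_A(3) = 10**2, five residues
```

With no selectivity ($S = 1$) the accuracy reduces to $1/A$ or $1/N$, i.e., to random incorporation, which is the limiting case used for the template-independent O-polymerase.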
Probably we cannot expect a large value for $C_O$ in Model I, i.e., a high accuracy of the translation, either. However, it is interesting that even an O-polymerase which polymerizes amino acids with little distinction is enough to begin the evolution of the autocatalytic system. As shown in Figure 4, the actual value of $C_O$ is not very significant. Figure 4a shows the simulation result of Model I with a highly accurate O-polymerase, i.e., $C_O = 99\%$, and Figure 4b shows the one with a template-independent O-polymerase (random polymerization or no genetic code). The figure demonstrates how the population ratio of each grade of complex changes with the number of generations. The numbers in Figure 4 indicate the grade of the translation complex. The highest grade in these simulations is five. Though the initial behavior of Model I differs between Figures 4a and 4b, this variation appears only up to the 20th generation, and the outcomes after that are strikingly similar. The final takeover by the most accurate system occurs just before the $10^3$th generation in both cases. Although this seemingly strange result may be due partly to our complete neglect of the amino acid residues of A-polymerase other than those at the activity or the selectivity site, the essential cause is that the overall effect of the accuracy of O-polymerase on the autocatalytic system is indirect: once the systems of the first generation are produced, the entire role of O-polymerase is over. From Equations (8)-(14), it is apparent that $C_O$ directly affects only the population at the first generation. The indirect effect of a low $C_O$ on the $P_A(z)_i$'s gradually vanishes through the generations, as clearly shown in Figure 4. In this connection,

there appears an important implication: the desirable characteristics for O-polymerase might have been quite different from those of its product, A-polymerase, which eventually developed into the ancestor of today's enzymes. The only desirable specification for O-polymerase is that the product $C_O^{m+n} \times P_O \times R_O$ be larger than or comparable to one, so that one or more sets of products which have the selectivity are produced. A sloppy, yet reproductive polymerase may be enough to carry out the task. The implication that little accuracy is sufficient to get the system started is quite important. Although they are low, enzymic activities such as esterase activity are found in polypeptides (Noguchi et al., 1971). Nakashima and Fox (1972) found a discriminatory incorporation of aminoacyl adenylates into proteinoids containing one of four homopolynucleotides. It appears quite important for the gain not to be equal to zero, though a small number is sufficient to cause the evolution of the system. For instance, if $G_A(j) = 0$ ($j \geq 3$) in Model I, the $P_A(j)_i$'s ($j \geq 4$) are always equal to zero, that is, there is no evolution (cf. Equations (9) and (13)), and the behavior of the model is very simple. $P_A(3)_i$ is equal to $C_O^{m+n} \times P_O \times R_O \times (C_A(3)^{m+n} \times R_A(3))^{i-1}$, and this system will eventually disappear unless the product $C_A(3)^{m+n} \times R_A(3)$ is larger than one.

[Figure 4, panels (a) and (b). Horizontal axis: generation (logarithmic scale); vertical axis: percentage scale, 0-100%; curves labeled by grade.]

Fig. 4a-b. Simulation results of Model I, demonstrating the insignificance of the translation accuracy of O-polymerase. The accuracy of O-polymerase is (a) 0.99, (b) 0.1 (as $A = 10$, this means that the polymerization is random). Other variables are the same in both (a) and (b): $y = 5$, $P_O = 1$, $S_A(3) = 10^2$, $S_A(4) = 10^4$, $S_A(5) = 10^6$, $A = 10$, $m = 5$, $n = 5$, $G_A(j) = 10^{-3}$ ($j \geq 4$), $G_A(3) = 10^{-6}$. For definitions of variables, see the text or Appendix II. Grade 3 complex is not shown in Figure 4b. The maximum share for grade 3 complex is 0.42% at the second generation.
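The qualitative behavior of Model I described above can be explored with a small simulation. The sketch below is not the authors' program and does not reproduce Equations (8)-(14), which are not given in this section. It rests on three labeled assumptions: the accuracy has the form $C = S/(S + A - 1)$; an active product keeps its parent's grade when the selectivity site is copied correctly, and otherwise is upgraded with probability equal to the gain or becomes defective (grade 2); and the population starts with grade 3 only. Because all reproductivities are equal, the populations are simply renormalized to shares at every generation. All names are hypothetical.

```python
def accuracy(selectivity, n_groups=10):
    """Assumed per-residue accuracy: S / (S + A - 1)."""
    return selectivity / (selectivity + n_groups - 1)


def step(pop, sel, gain, m=5, n=5):
    """Advance the sketch of Model I by one generation.

    pop  : grade -> number of active A-polymerases
    sel  : grade -> selectivity (grade 2 is defective, so sel[2] = 1)
    gain : grade -> probability that a selectivity-site error yields that grade
    """
    top = max(sel)
    new = {grade: 0.0 for grade in sel}
    for grade, count in pop.items():
        c = accuracy(sel[grade])
        active = c ** m                    # activity site copied correctly
        faithful = active * c ** n         # selectivity site copied correctly too
        slipped = active * (1.0 - c ** n)  # active, but selectivity site altered
        nxt = grade + 1
        p_up = gain.get(nxt, 0.0) if nxt <= top else 0.0
        new[grade] += count * faithful
        if p_up > 0.0:
            new[nxt] += count * slipped * p_up
        new[2] += count * slipped * (1.0 - p_up)
    return new


def normalise(pop):
    total = sum(pop.values())
    return {grade: value / total for grade, value in pop.items()}


def simulate(sel, gain, generations):
    """Return the shares of each grade, one dictionary per generation."""
    pop = {grade: 0.0 for grade in sel}
    pop[3] = 1.0  # assumed start: only grade 3 present
    history = []
    for _ in range(generations):
        pop = normalise(step(pop, sel, gain))
        history.append(pop)
    return history


SEL = {2: 1.0, 3: 1e2, 4: 1e4, 5: 1e6}  # selectivities from the Figure 4 caption

# Gains above grade 3 switched off: only grades 2 and 3 can exist.
no_gain = simulate(SEL, {3: 1e-6}, 200)[-1]

# Gains as in the Figure 4 caption: find the generation of the final takeover.
with_gain = simulate(SEL, {3: 1e-6, 4: 1e-3, 5: 1e-3}, 2000)
takeover = next(i + 1 for i, s in enumerate(with_gain) if s[5] > 0.5)

# Smallest R_A(3) for which C_A(3)**(m + n) * R_A(3) exceeds one.
min_reproductivity = 1.0 / accuracy(SEL[3]) ** 10
```

Under these assumptions the zero-gain run settles at roughly 65% grade 3 and 35% grade 2, and with gains of $10^{-3}$ grade 5 takes the majority a little before the $10^3$th generation, in line with the values quoted for Figures 4 and 5. The last line shows that, for $S_A(3) = 10^2$, a grade 3 system persists only if its reproductivity exceeds about 2.4.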
Figure 5 and Figure 6 show typical examples of how the gain affects the population distribution. In the case of $G_A(j) = 0$ ($j \geq 4$), as shown in Figure 5, only two grades of molecules exist, 2 and 3, with a considerably high proportion of defective systems. However, once small possibilities of transitions to systems with a higher accuracy are given ($G_A(j) = 10^{-6}$ ($j \geq 4$) in Figure 6), the behavior of the model changes drastically. Systems of grades 2 and 3, which are the only existing ones in the case of $G_A(j) = 0$ ($j \geq 4$), become virtually extinct after the tenth generation, and the system with the highest accuracy eventually prevails through a transitory phase of predominance by the less accurate system. A larger value for the gain does not change the principal appearance of the behavior of the model. When higher gains, $G_A(j) = 10^{-2}$ ($j \geq 4$), replaced the lower ones, i.e., $10^{-6}$, the population ratios at early generations were altered and the time necessary for the predominance of the system with the highest accuracy was shortened. As a rapid evolution might have had some advantages in terms of competition with other existing autocatalytic systems, a large probability of transitions to higher systems may not be negligible. However, it is also noteworthy that even small gains are adequate to make the system evolve.

[Figure 5. Horizontal axis: generation (logarithmic scale); vertical axis: percentage scale, 0-100%; curves labeled by grade.]

Fig. 5. Simulation result of Model I in the case of $G_A(j) = 0$ ($j \geq 4$). Other variables are the same as in Figure 4b. After reaching the equilibrium, grade 3 occupies 65% and grade 2, 35%. The share for grade 4 and 5 complexes is 0% throughout.

Equally or more important are the values of the selectivities. Figure 7 shows the results for two different sets of selectivities in Model I. Compared with Figure 4b, lower selectivities were used in Figure 7a, and higher ones in Figure 7b. All other conditions were the same as in Figure 4b. In Figure 7a, lower selectivities caused a quick domination by the most accurate system (grade 5). The takeover by the highest grade system occurs just after the 10th generation, though its occupation is never complete (96%). Defective systems persist indefinitely owing to the relatively low value of $S_A(5)$. On the other hand, higher selectivities increase the number of generations necessary for the highest grade to

occupy the entire biosphere, and result in a lengthy and seemingly complete domination by the less accurate system (Figure 7b). Only after around the $3 \times 10^4$th generation does the grade 5 complex begin to hold more than 1% of the total population. Yet, once established, its occupation is far more complete (99.999995% at the $\infty$-th generation).

[Figure 6. Horizontal axis: generation (logarithmic scale); curves labeled by grade.]

Fig. 6. Simulation result of Model I in the case of $G_A(j) = 10^{-6}$ ($j \geq 4$). Other variables are the same as those in Figure 4b.

[Figure 7, panels (a) and (b). Horizontal axis: generation (logarithmic scale); vertical axis: percentage scale, 0-100%; curves labeled by grade.]

Fig. 7a-b. Simulation results of Model I in the case of $S_A(3) = 10$, $S_A(4) = 10^2$, and $S_A(5) = 10^3$ for (a), and $S_A(3) = 10^3$, $S_A(4) = 10^5$, and $S_A(5) = 10^9$ for (b). Other variables are the same as in Figure 4b. In (a), grade 5 occupies 96% and grade 2, 4.4% after reaching the equilibrium. The maximum share for grade 3 complex in (a) is 0.0025% at the second generation. In (b), the maximum share for grade 3 is 0.90% at the second generation.

Figure 8 shows the results of low and high selectivities in Model II. In this simulation, the number of different types of genes is five for each gene, i.e., grades 1-5. Therefore, the most evolved system in the simulation has the genotype 55 and the phenotype 55. In Figure 8 and all following figures, systems in Model II are grouped according to their genotypes. For instance, the group 44 shown in Figure 8 actually consists of nine systems: $S(4,4,2,2)_i$, $S(4,4,2,4)_i$, $S(4,4,2,5)_i$, $S(4,4,4,2)_i$, $S(4,4,4,4)_i$, $S(4,4,4,5)_i$, $S(4,4,5,2)_i$, $S(4,4,5,4)_i$, and

$S(4,4,5,5)_i$. Any systems which have at least one inactive polymerase, e.g., $S(4,4,5,1)_i$, are excluded from the group, as they are incapable of producing descendants. Generally, low selectivities result in an incomplete occupation by the system of the highest accuracy, as well as somewhat complicated features in the initial phase of the evolution. In contrast, higher selectivities result in a more complete occupation by the highest system, but need more generations to accomplish it. The former is an advantage for the system; the latter, however, would be a disadvantage.

[Figure 8, panels (a) and (b). Horizontal axis: generation (logarithmic scale); vertical axis: percentage scale, 0-100%; curves labeled by genotype.]

Fig. 8a-b. Simulation results of Model II. (a) $x = y = 5$, $P_O = 1$, $C_O = 0.1$, $S_A(3) = S_N(3) = 10$, $S_A(4) = S_N(4) = 10^2$, $S_A(5) = S_N(5) = 10^3$, $A = 10$, $N = 4$, $k = l = m = n = 5$, $G_{AO} = G_{NO} = 0.1$, $G_A(j) = G_N(j) = G_x(j) = G_y(j) = 10^{-3}$ ($j \geq 4$), $G_A(3) = G_N(3) = G_x(3) = G_y(3) = 10^{-6}$. (b) All variables are the same as used in (a) except that $S_A(3) = S_N(3) = 10^2$, $S_A(4) = S_N(4) = 10^4$, and $S_A(5) = S_N(5) = 10^6$. In (a), the genotypes whose maximum share is less than 10% are not shown, except genotype 52. After reaching the equilibrium, genotype 55 occupies 97%, and 52 and 25, 1.5% each. In (b), the genotypes whose maximum share is less than 15% are not shown.

[Figure 9. Horizontal axis: generation (logarithmic scale); vertical axis: percentage scale, 0-100%; curves labeled by genotype.]

Fig. 9. Simulation result of Model II. Variables are the same as in Figure 8a except that $x = y = 8$, $S_A(4) = S_N(4) = 20$, $S_A(5) = S_N(5) = 30$, $S_A(6) = S_N(6) = 40$, $S_A(7) = S_N(7) = 50$, and $S_A(8) = S_N(8) = 60$. The genotypes whose maximum share is less than 20% are not shown, except genotype 82. Genotype 88 occupies 61%, 28 and 82, 17% each, and 22, 4.7% at the equilibrium.

Figures 9 and 10 illustrate the importance of the nature of the interaction between activated amino acids and nucleotides, which may characterize the selectivities. In this simulation of Model II, the maximum number of grades for genes is eight. Figure 9 shows the result for low selectivities of an additive nature. Figure 10 shows the result for selectivities that are lower at the start but multiplicative in nature. Because of the higher accuracy at grade 8, systems with a multiplicative interaction show an eventual near-complete occupation by the highest genotype in spite of the lower selectivity at the beginning. In contrast, the low and additive selectivities (Figure 9) result in an incomplete occupation by the highest genotype, and in coexistence with defective systems such as genotypes 28, 82, and 22. It is interesting that the second highest systems such as 78 and 87 do not stay large for many generations, although their selectivities are very close to those of the system with the highest accuracy ($S_A(7) = S_N(7) = 50$, $S_A(8) = S_N(8) = 60$). As a matter of fact, the maximum share for genotype 87

throughout the simulation is 0.025% at the 25th generation. The maximum share for genotype 77 is 1.0% at the 22nd generation. If additive yet higher values are given to the selectivities, the occupation by the highest genotype is less incomplete. For instance, in Model II with additive selectivities starting from 100 for $S_A(3)$ and $S_N(3)$, more than 95% of the systems have the highest genotype after 715 generations. On the other hand, a smaller number of grades further lowers the percentage of occupancy by the systems with the highest accuracy. If we limit the highest genotype to 55 in the simulation shown in Figure 9, the occupation by the highest genotype is far more incomplete. By about the 20th generation, the highest genotype occupies about 39%, and it stays constant at this level. Defective systems such as 25, 52, and 22 hold a considerable share after the 20th generation. This result is mainly due to the relatively low selectivity of the highest genotype (the highest selectivity is 30, while it is 60 in Figure 9).

[Figure 10. Horizontal axis: generation (logarithmic scale); vertical axis: percentage scale, 0-100%; curves labeled by genotype.]

Fig. 10. Simulation of Model II. Variables are the same as in Figure 8a except that $x = y = 8$, $S_A(3) = S_N(3) = 5$, $S_A(4) = S_N(4) = 5^2$, $S_A(5) = S_N(5) = 5^3$, $S_A(6) = S_N(6) = 5^4$, $S_A(7) = S_N(7) = 5^5$, and $S_A(8) = S_N(8) = 5^6$. The genotypes whose maximum share is less than 20% are not shown.

3.2. DISCUSSIONS
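As a starting point for the discussion, the equilibrium shares quoted for Model II can be approximated with a few lines of arithmetic. The sketch below is an illustration under stated assumptions, not the authors' calculation: it assumes that at equilibrium practically all replication is carried out by the top-grade N-polymerase, that defective genotypes leave a negligible number of descendants, that inactive genes are excluded from the shares, and that the accuracy has the form $C_N = S_N/(S_N + N - 1)$. The function names and the dictionary labels are hypothetical.

```python
def accuracy(selectivity, n_groups=4):
    """Assumed per-residue accuracy: S / (S + N - 1), with N = 4 bases."""
    return selectivity / (selectivity + n_groups - 1)


def equilibrium_shares(s_top, l=5, n=5):
    """Approximate genotype shares among active offspring of the top genotype.

    keep_n / keep_a are the probabilities that the selectivity codons of the
    N-gene / A-gene are replicated without error by a top-grade N-polymerase.
    """
    c = accuracy(s_top)
    keep_n = c ** l
    keep_a = c ** n
    return {
        "top N-gene, top A-gene": keep_n * keep_a,
        "top N-gene, defective A-gene": keep_n * (1 - keep_a),
        "defective N-gene, top A-gene": (1 - keep_n) * keep_a,
        "both defective": (1 - keep_n) * (1 - keep_a),
    }


additive_8 = equilibrium_shares(60)     # highest selectivity in Figure 9
additive_5 = equilibrium_shares(30)     # Figure 9 limited to genotype 55
figure_8a = equilibrium_shares(1e3)     # highest selectivity in Figure 8a
```

Under these assumptions the top genotype holds about 61% for a highest selectivity of 60, about 39% for 30, and about 97% for $10^3$, close to the shares reported for Figures 9 and 8a. The comparison suggests that the extent of the final occupation is governed almost entirely by the accuracy of the highest grade.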

Simulation results enable us to state that, given appropriate values for the parameters, the systems with the highest accuracy eventually surpass all others. The number of generations necessary for the domination by the most accurate system depends on the values of the gains: the higher the values, the smaller the number. The extent of the occupation depends on the actual values of the selectivities, and on the number of possible transitions necessary to reach the highest grade. Higher values for the selectivities make the evolution slower. In terms of competition with other systems, the systems whose evolutionary pathway had large values for the gains and the selectivities might have been most likely to survive in early chemical and biological evolution.

The selective force in Model I is focused on the overall interactions among the components of the autocatalytic system. The system which can draw the most from the given base sequences will eventually surpass all others. Given base sequences, the number of possible autocatalytic systems may vary greatly. No such system might be constructed from most sequences. But our results obtained from Model I show that, if more than one is constructed, the system which has the highest accuracy is eventually selected among the possible systems, no matter what the accuracy of the first system is, as long as transitions to higher systems are allowed. Through this selection process, the best form of the activation of amino acids and the best genetic code might have been chosen. The best genetic code by no means implies today's genetic code. As is apparent from our assumption that $A = 10$, the refinement of the code would have taken place even after the evolution through Model II. A process of refinement of the code has been discussed by Woese (1973).

Concerning the evolution of Model I, through which the genetic code would be more or less established, we feel that the following note should be added. In the course of evolution, it is possible that the same genetic code is not used in the next generation. That is, the first generation, whose code is $G_1$, might produce a second generation whose code, $G_2$, is different. This sequence will reach either (a) an end where the next generation has no genetic code, or (b) a generation whose genetic code, $G_m$, is the same as that of one of the previous generations, $G_n$. In case (a), the evolution of such a line will vanish at the moment it reaches the end.
The evolution of (b) does not seem very hopeful either. However, if the difference $m - n$ ($\geq 1$) is not very large, the line could continue to change the code, with each generation using one of the $m - n$ codes. Apparently such cases are quite unlikely to happen and to survive in reality. But, if $m - n = 1$, such an occurrence might be less unlikely to survive. In this case, only one code would finally be conserved. This type of evolution is partially included in Model I. For a more comprehensive inclusion, however, the expansion of Model I to something like the one to be discussed later in Part 5 may be necessary. Though these lines of evolution are conceptually possible, we have not pursued them so far for the following reasons: (a) they are still unlikely, (b) the selectivities are difficult to evaluate, and (c) only a minute improvement is anticipated at the cost of a tremendous complication of the model.

A conceptual scheme of the transition from Model I to Model II is shown in Figure 11. Throughout the evolution of the system in Model I, it is likely that polynucleotides whose sequences are more or less different from that of the original template are synthesized, as 100% accurate non-enzymic replication of the template is quite unlikely. As far as the replications are concerned, there is no advantage or disadvantage for those erroneous templates in Model I. Hence it is probable that they continue to replicate themselves and to be subject to further changes in their base sequences. Through this process some of them might come to code for a nucleotide polymerase rather than an activated amino acid polymerase. Polymerization of


nucleotides is, in a sense, quite similar to polymerization of amino acids, since dehydration reactions are involved in both processes. A modest modification of the amino acid sequence of A-polymerase might yield an N-polymerase. Once such an N-gene is formed, the advantage of having an N-gene in the system is quite obvious. The system which succeeded in including an N-gene could replicate its genes much more quickly, eventually replacing all other existing systems. The evolution of Model II would then have begun. There seem to be two ways to include an N-gene in the system. One is a localized coexistence of the A-gene and the N-gene. The other is to connect the two genes chemically, as the result of a gene duplication. Both ways appear quite common in biological systems. The former is analogous to having two chromosomes in a cell. And it has recently been proposed that gene duplication might be a quite basic phenomenon throughout biological evolution (Sparrow and Nauman, 1976). Furthermore, the emergence of new genes in a similar manner has been observed in biological systems (Campbell et al., 1973; Rigby et al., 1974). It seems that this type of divergence of protein functions actually occurred in several cases. For instance, chymotrypsin and trypsin, both of which have the same catalytic function of protein hydrolysis, appear from their sequences to share a common ancestor, though their substrate specificities are different. If the transition from Model I to Model II happened as described here, O-polymerase in Model II would be a descendant of A-polymerase in Model I. The number of O-polymerases in Model II could be moderately large. A reasonably high accuracy of translation might be expected as the result of the antecedent evolution through Model I. The number for $T_O$ may be high as well. The distinct difference between the O-polymerases in Model II and in Model I is that the former is likely to be coded in the polynucleotide, while the latter may not be.
From the simulation results in Model II, it seems that, given equal selectivities for N- and A-polymerases, the evolutionary rate of the N-gene is slower than that of the A-gene. For instance, we see in Figure 10 that the predominant genotypes after the first eight generations are 55, 56, 66, 67, 77, 78, 88. The upgrades of N-genes come after the corresponding changes in A-genes. Furthermore, it is likely that the interactions between activated amino acids and nucleic acids (a genetic code) are much less selective in nature than those between nucleic acids (presumably Watson-Crick base pairing). Smaller selectivities may therefore be expected for A-polymerases, rather than even-handed selectivities for both N- and A-polymerases. When a smaller value is used for the initial selectivity of A-polymerase, the observed tendency becomes more perceptible. For example, the predominant genotype in Figure 10 would change in the order 57, 58, 68, 78, 88 after the first nine generations, if half of the initial selectivity of N-polymerase is used for A-polymerase. The fast upgrading, or evolution, of A-polymerase is more conspicuous. The results seem to suggest that the evolutionary rate of N-polymerase is slower than that of A-polymerase, at least during the early evolution of the polymerases. The direct descendants of N-polymerase may be nucleotide polymerases such as DNA polymerases and RNA polymerases. Unfortunately, however, it is difficult to speculate on the direct descendants of A-polymerases. They could be proteins involved in biological protein synthesis, such as ribosomal proteins. Another difficulty is that the selectivities of


A-polymerases at the beginning of the evolution of Model II could be higher than those of N-polymerases, thanks to the preceding evolution of A-polymerases through Model I. In spite of these ambiguities, it may be interesting to compare the evolutionary rates of these proteins when they become available. Though still marginal, some related studies have recently begun to appear in the literature (Delaunay et al., 1973; Hori, 1975; Chang, 1976). In connection with the transition from Model I to Model II, comparisons of the amino acid residues and conformations involved in the polymerization reactions would be of interest too. In the present genetic apparatus, the accuracy of the translation is said to be as high as $(3 \times 10^3 - 1)/(3 \times 10^3)$ or more (Loftfield, 1963). To have this order of accuracy, the selectivity of A-polymerase should be around $6 \times 10^4$ (assuming $A = 20$). This value does not seem unreasonably high, though we do not know how many improvements in N- and A-polymerases might have occurred before the autocatalytic system evolved further and changed the way of replication and translation. Assuming a multiplicative nature of the selectivities, small initial selectivities over random incorporation and a small number of improvements will easily give a selectivity of this order of magnitude (e.g., see the conditions used for Figures 8 and 10). The selectivity increase could be a result of favorable ways of complex formation (the translation complexes and the replication complexes). If this is the case, the selectivity can be expressed in terms of the difference of the free energies of two complexes. To obtain a selectivity of $6 \times 10^4$, the free energy difference between a randomly polymerizing system and one with today's translation accuracy is about $-6.4$ kcal/mole at around room temperature. A few hydrogen bonds may be enough to account for the difference.
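The two numbers in the preceding paragraph can be retraced with a short calculation. The sketch assumes, as a reading of the text rather than a quotation of Equations (5) and (6), that accuracy and selectivity are related by $C = S/(S + A - 1)$, and that the selectivity is a Boltzmann factor, so that $\Delta G = -RT \ln S$. The temperature of 293.15 K is an assumed value for "around room temperature", and the function names are hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol K)


def selectivity_for_accuracy(acc, n_groups):
    """Invert the assumed relation C = S / (S + A - 1) for S."""
    return acc * (n_groups - 1) / (1.0 - acc)


def free_energy_difference(selectivity, temperature_k=293.15):
    """Free energy difference (kcal/mol) for a Boltzmann-type selectivity."""
    return -R_KCAL * temperature_k * math.log(selectivity)


present_accuracy = (3e3 - 1) / 3e3   # one error in 3000 incorporations
s_required = selectivity_for_accuracy(present_accuracy, n_groups=20)
dg = free_energy_difference(6e4)     # selectivity rounded as in the text
```

With $A = 20$ the required selectivity comes out near $5.7 \times 10^4$, i.e., about $6 \times 10^4$, and the corresponding free energy difference is about $-6.4$ kcal/mole at 20 °C, which is the order of magnitude of a few hydrogen bonds.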
It is interesting to note that some sort of localization of the system is necessary in Model II in order to continue the evolution of the system. In Model I, there is no positional requirement for A-polymerases. They can depart from the template nucleotide and can associate with other polynucleotides. If the polynucleotide happens to be another template, such an association can continue the evolution of A-polymerase. In other words, there is no problem with complete mixing between templates and product molecules in Model I. Rather, it could be a necessary cause for the emergence of N-genes, as mentioned earlier. This is not the case in Model II. For instance, if complete mixing between the genes and the polymerases is allowed, the relation between the genes of the $(i-1)$th and $i$th generations is as follows. For $u \geq 3$, $v \geq 3$:

$$
\begin{aligned}
T(u,v)_i ={} & T(u,v)_{i-1} \times \sum_{j=2}^{x} \left\{ P_N(j)_{i-1} \times R_N(j) \times C_N(j)^{k+l+m+n} \right\} \\
& + T(u-1,v)_{i-1} \times \sum_{j=2}^{x} \left\{ P_N(j)_{i-1} \times R_N(j) \times C_N(j)^{k+m+n} \times \left(1 - C_N(j)^{l}\right) \times G_x(u) \right\} \\
& + T(u,v-1)_{i-1} \times \sum_{j=2}^{x} \left\{ P_N(j)_{i-1} \times R_N(j) \times C_N(j)^{k+l+m} \times \left(1 - C_N(j)^{n}\right) \times G_y(v) \right\} \\
& + T(u-1,v-1)_{i-1} \times \sum_{j=2}^{x} \left\{ P_N(j)_{i-1} \times R_N(j) \times C_N(j)^{k+m} \times \left(1 - C_N(j)^{l}\right) \times G_x(u) \times \left(1 - C_N(j)^{n}\right) \times G_y(v) \right\}
\end{aligned}
\tag{26}
$$

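where $T(u, v)_i$ is the number of genotype $uv$ genes at the $i$th generation, $P_N(j)_i$ is the number of grade $j$ N-polymerases at the $i$th generation, and $R_N(j)$ is the reproductivity of a grade $j$ N-polymerase. The replication of the inactive template also happens, and the equation for the inactive template ($u = 1$, $v = 1$) is as follows:

$$
\begin{aligned}
T(1,1)_i ={} & T(1,1)_{i-1} \times \sum_{j=2}^{x} \left\{ P_N(j)_{i-1} \times R_N(j) \right\} \\
& + \sum_{v=2}^{y} \left[ T(1,v)_{i-1} \times \sum_{j=2}^{x} \left\{ P_N(j)_{i-1} \times R_N(j) \times \left(1 - C_N(j)^{m}\right) \right\} \right] \\
& + \sum_{u=2}^{x} \left[ T(u,1)_{i-1} \times \sum_{j=2}^{x} \left\{ P_N(j)_{i-1} \times R_N(j) \times \left(1 - C_N(j)^{k}\right) \right\} \right] \\
& + \sum_{u=2}^{x} \sum_{v=2}^{y} \left[ T(u,v)_{i-1} \times \sum_{j=2}^{x} \left\{ P_N(j)_{i-1} \times R_N(j) \times \left(1 - C_N(j)^{m}\right) \times \left(1 - C_N(j)^{k}\right) \right\} \right].
\end{aligned}
\tag{27}
$$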
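The per-polymerase weights that multiply $P_N(j)_{i-1} \times R_N(j)$ in Equations (26) and (27) can be compared numerically. The sketch below is only an illustration: the second and third weights of Equation (27) are taken here to be the probabilities of losing the activity codons of the A-gene and the N-gene, respectively, which is an assumption about a poorly legible part of the printed equation. The function name and the sample values of $C_N(j)$ and of the gains are hypothetical.

```python
def coefficients(c, g_x, g_y, k=5, l=5, m=5, n=5):
    """Weights multiplying P_N(j) * R_N(j) in each of the four terms.

    c        : accuracy C_N(j) of the replicating N-polymerase
    g_x, g_y : gains G_x(u) and G_y(v) at the gene level
    """
    eq26 = [
        c ** (k + l + m + n),                                    # faithful copy
        c ** (k + m + n) * (1 - c ** l) * g_x,                   # N-gene upgraded
        c ** (k + l + m) * (1 - c ** n) * g_y,                   # A-gene upgraded
        c ** (k + m) * (1 - c ** l) * g_x * (1 - c ** n) * g_y,  # both upgraded
    ]
    eq27 = [
        1.0,                          # inactive template copied in any way
        1 - c ** m,                   # A-gene activity codons lost (assumed)
        1 - c ** k,                   # N-gene activity codons lost (assumed)
        (1 - c ** m) * (1 - c ** k),  # both activity regions lost
    ]
    return eq26, eq27


comparison = {}
for c in (0.25, 0.9, 0.997):          # from no selectivity to S_N = 10**3
    eq26, eq27 = coefficients(c, g_x=1e-3, g_y=1e-3)
    comparison[c] = all(b > a for a, b in zip(eq26, eq27))
```

For each sampled accuracy, every weight of Equation (27) exceeds the corresponding weight of Equation (26), which is the term-by-term relation used in the argument that follows.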

If we compare Equation (26) with Equation (27) term by term, it turns out that all four terms in Equation (27) are consistently larger than the corresponding terms in Equation (26). That is, the lowest grade templates are ever increasing their share in the biosphere. Consequently they will produce more and more inactive polymerases. Eventually only very few active molecules, if any, may exist among many inactive ones in the biosphere. This type of system fails to evolve. The higher selectivity of N-polymerases gained through evolution needs to be directed in a positive way, i.e., to the replication of higher grade genes, rather than in a random way. In Model II, this problem is solved by considering a combination of genes and polymerases as one system. To achieve this in a physical or a chemical sense, some sort of localization of molecules appears to be necessary. There may be many probable ways of localization, ranging from a mere lack of complete stirring, through a simple retardation of the diffusion of N-polymerases by association with a less mobile phase such as clays, to an enclosure of the components inside a membrane. So far we do not have a clear-cut idea about which ways are appropriate and plausible for this purpose under prebiotic conditions. However, the necessity of localization might suggest that a large pond in which organic molecules were metabolized could hardly be considered our ancestor in the direct line. Through localization, genetically acquired advantages can perpetuate and proliferate. Before or at the emergence of enzymic replication, a certain compartmentalization of the system might have been needed.

In our model, the process of the activation of amino acids is considered a later event, as is the emergence of tRNAs, whereas Hoffmann's model (1974; 1975) centers around the activation process. Large differences in the size and the mechanism of the aminoacyl tRNA synthetases are known (cf. Kisselev and Favorova, 1974). Some synthetases work with a sequential order of substrate attachment, others according to a random attachment mechanism. Some require magnesium for their activity, others do not. They do not really represent a uniform class of enzymes. This seems to suggest that the entry of the synthetases into the protein synthesis system may have been relatively late. The emergence of tRNA is discussed separately (Mizutani and Ponnamperuma, 1977).

4. Possible Experimental Approaches

We would like to list several important experimental approaches to this model. The model primarily depends on three assumptions, i.e., the polymerizing activity of O-polymerase, the existence of activated forms of the monomers and a template polynucleotide, and the selective interactions among the components of the system. Each of them can be approached experimentally.

Since O-polymerase is very likely to be a polypeptide, we need to know how feasible it would have been for polypeptides to form under plausible primitive Earth conditions, and what fraction of randomly polymerized polypeptides is capable of polymerizing activated amino acids. Although some polymerization studies have already yielded interesting results (e.g., Paecht-Horowitz, 1974; Chung et al., 1971), very little seems to be known about the frequency of occurrence of specific catalytic activities in peptides. For instance, many studies on the hydrolytic activity of synthetic peptides have shown appreciable yet low catalytic activity, despite the fact that those peptides were synthesized with the deliberate intention of imitating the active centers of hydrolytic enzymes (Photaki and Sakarellou-Daitsiotou, 1976, and additional references therein). Noguchi and his coworkers found that several polymers of L-amino acids do catalyze the hydrolysis of p-nitrophenyl acetate, and the kinetics of this reaction have been investigated (Noguchi et al., 1971). Although several kinds of catalytic activities of proteinoids, such as hydrolysis (Oshima, 1968), decarboxylation (Hardebeck et al., 1968), and oxidation (Fox and Krampitz, 1964), are known, the study of the catalytic activity of peptides appears to have been concerned mainly with hydrolytic reactions. The polymerase activity of peptides does not seem to have been investigated so far. According to the model, a high activity is not necessarily required. Therefore, it is of interest to look for polymerization activities of peptides.

On the other hand, the search for catalytic activity need not be restricted to peptides. Mixed micelles of N-acylamino acids and a detergent, cetyltrimethylammonium bromide, enhance the rate of hydrolysis of p-nitrophenyl acetate by more than one order of magnitude (Inoue et al., 1976; Gitler and Ochoa-Solano, 1968). A relatively simple molecule, N-(4-imidazolylmethyl)benzohydroxamic acid, is also known to catalyze the same reaction (Kunitake et al., 1976). Furthermore, inorganic ions and crystals such as clays are other possible catalysts.

Plausible forms of 'activated amino acids' should be investigated simultaneously. That is, what forms of amino acids, if any, can be more easily polymerized by randomly synthesized peptides, and what forms of amino acids are more likely to have existed under primitive Earth conditions? Since the final form of activated amino acids in life today is aminoacyl-tRNA, and a great deal of data on the structure of tRNAs is becoming available, speculation concerning the primitive form of the activation is also within reach and may be fruitful. The ancestral form of tRNA and its evolution have been discussed by several authors (e.g., Dayhoff and McLaughlin, 1972; Jukes, 1974). As discussed earlier, O-polymerase need not be very specific to substrates (cf. Figure 4). It might even be possible that O-polymerases are not necessary at all for the evolution of Model I, if a (template-dependent) non-enzymic formation of peptides from activated amino acids is very efficient. Therefore the study of plausible and effective ways of (template-dependent) non-enzymic polymerization of activated amino acids is very important.

The importance of nucleic acids in the model is paramount, as is usual in any genetic apparatus of terrestrial organisms. Yet we know little of polynucleotides on the primitive Earth.
Several experiments on the oligomerization of nucleotides or nucleosides under more or less primitive conditions have been reported (Schwartz and Fox, 1967; Schneider-Bernloehr et al., 1968; Oro et al., 1969; Ibanez et al., 1971); however, the degrees of polymerization of those oligomers are not large, and the products generally seem to have a considerable number of incorrect, i.e., non-3'-5' mono-phosphodiester, bonds. This sort of study should be continued, and good yet geologically plausible conditions for the polymerization should be pursued.

Also of paramount interest is the interaction among the components: nucleic acids, amino acids, and proteins. The interaction between nucleic acids and amino acids or peptides has already been one of the main subjects of investigation from the chemical evolutionary point of view (for instance, see Saxinger and Ponnamperuma, 1974). This kind of study is especially interesting when we consider the origin of the genetic code, since the simulation results show that the genetic code could have appeared and been selected through the interaction which had taken place in the primitive autocatalytic system. Today's genetic apparatus is a highly evolved one, and extrapolation to the original one is difficult. Today's apparatus uses a triplet code, and yet we do not even know whether or not the triplet code was used from the beginning (Crick et al., 1976). Needless to say, the assignment of amino acids to each codon in the original code might have been quite different from the present one (Jukes, 1974). In these connections, the study of the probable forms of 'activated amino acids' again appears important. The studies on the interaction so far reported seem to have involved limited kinds of amino acid forms, mainly free amino acids. The stereochemical problem in the interaction must also be taken into account, as it was in molecular biology when the idea of the adaptor molecule, tRNA, arose.

It should be noticed that, in contrast with base pairing, the specific correspondence between amino acids and triplets, the genetic code, appears only when the synthesis of proteins is involved. This might suggest that experimental study of the origin of the code should be performed in systems where actual formation of peptide bonds is taking place. Therefore, the investigation of the effect of (poly)nucleotides on the polymerization of (activated) amino acids is of particular interest. If any effects were observed, they could provide some clues to the problem of the origin and the history of the genetic code.

5. Conclusion

We presented a picture of continuous evolution of protein synthesis, starting from an amino acid polymerase and a template polynucleotide, and ending at the translation system which has the basic features of the biological protein synthesis system (for a full description of the evolution, cf. Mizutani and Ponnamperuma, 1977). It was shown that the origin and the evolution of the genetic code might have occurred during the early evolution of the translation system. The selection principle for the genetic code from possibly many codes could lie in the nature of the interactions among components of the autocatalytic system. Contrary to what some investigators claim, the model demonstrated that, even if the original polymerase was template-independent, i.e., there was no genetic code, the emergence and subsequent selection of genetic codes could still have taken place. Later evolution of the system caused by errors of translation and replication would make this system more precise and viable.
The beginning of our model, that is, (1) having an O-polymerase, a proper template polynucleotide, nucleotides, and activated amino acids synthesized, and (2) getting these components into close proximity, might well be a matter of fortuitous chance, if not a matter of probability. This point is yet to be clarified by future studies. However, the origin and evolution of the codon assignments may not be a matter of mere accident, but might be a result of the interactions among components of the primitive protein synthesis system.

The experimental study of the interaction between nucleic acids and proteins or amino acids is therefore of great importance. Though such studies so far reported do not seem very promising (Nakashima and Fox, 1972; Saxinger and Ponnamperuma, 1974), and some investigators are disinclined to study the physicochemical basis of the genetic code (Hoffmann, 1975), the simulation results indicated that even a small amino acid preference could be enough to eventually make a very accurate translation system, and to exclude all other less accurate systems from the biosphere. Furthermore, solid evidence that there must be physicochemical bases for specific interactions between nucleic acids and proteins is accumulating. Such examples are plentiful in biological systems (e.g., Mizushima and Nomura, 1970; Nathans and Smith, 1975). Physicochemical bases for these specific interactions are being investigated (e.g., Helene, 1975), and will yield fruitful results before long. A similar physicochemical basis might have served for establishing the genetic code in a primitive genetic apparatus. Several probable forms of activated amino acids as well as free amino acids should be investigated in the interaction studies.

Finally we would like to mention some variations of the model. The evolution of the autocatalytic system in the model is stepwise and unidirectional. That is, for instance, grade $n$ molecules ($x$ or $y > n \geq 3$) can produce only grade 1, 2, $n$, and $n + 1$ molecules. However, it may be more likely that there are more ways to change their grades. There might be a finite probability of a transition from grade $n$ to, say, grade $n - 2$, $n + 2$, etc. Also, more than one relatively independent pathway of evolution seems plausible. Therefore, interconnected evolutionary pathways may be more probable and more relevant to the evolution. Examining the competition among systems in such expanded models is of interest, though it obviously adds considerable complication to the actual calculations. It will also be interesting if the increases of the selectivities and of the reproductivities are different from one another. Judging from the results we obtained, it is probable that a route whose accuracies at the initial steps are not so high, but whose accuracies and reproductivities at the final steps are both higher than those of the others, can eventually occupy the entire biosphere, conquering all other systems. Though such modifications seem interesting, it might be wise to postpone the calculations for these versions of the model until experimental study of the interactions accumulates solid data and gives us enough insight to select more probable values for the variables used in the present study.

Nor did we try, in this paper, to characterize the transitions and the gains in more detailed terms. For instance, we gave the value $10^{-6}$ to each comeback probability ($G_A(3)$, $G_N(3)$, $G_x(3)$, and $G_y(3)$) in all simulations. This automatically assumes that there are about 10 configurations of amino acids at the selectivity site of the polymerase which are selective, since there are $5^{10}$ ($\approx 10^7$) ways to place amino acid groups in the selectivity site of the polymerase ($A = 10$, $n = l = 5$). Similar arguments apply to the other gains too. These detailed arguments may become significant when the exact role of each amino acid in proteins comes to be known. An important development of an experimental technique for this type of study appears to be in progress (Kolata, 1976).
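The arithmetic behind this estimate can be checked in a few lines. The Python sketch below is only an illustration: it takes the number of placements to be $5^{10}$ and the number of selective configurations to be 10, as quoted in the text, and the variable names are hypothetical rather than taken from the original calculations.

```python
# Illustrative check of the comeback-probability estimate.
# Assumptions: 5**10 placements and about 10 selective configurations,
# as quoted in the text; the variable names are hypothetical.
total_placements = 5 ** 10        # ways to fill the selectivity site (~1e7)
selective_configurations = 10     # configurations assumed to be selective

comeback_probability = selective_configurations / total_placements
print(f"{total_placements} placements, probability ~ {comeback_probability:.2e}")
```

The ratio comes out close to $10^{-6}$, the value assigned to each comeback probability in the simulations.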

Acknowledgements

This work has been supported in part by NASA Grant No. NGR 21-002-317. We thank Dr R. Linebarger, NASA Ames Research Center, for the use of the NASA Ames 7600 computer. Dr K. Trabert, NIH, and Dr M. Hasegawa, the Institute of Statistical Mathematics, have provided interest and discussion for this study. We are indebted to them for their useful comments on the early drafts of this paper. We are also grateful to Professor H. Pattee and Professor M. Ycas for their constructive criticism.

References

Campbell, J. H., Lengyel, J. A., and Langridge, J.: 1973, Proc. Natl. Acad. Sci., U.S.A. 70, 1841.
Chang, L. M. S.: 1976, Science 191, 1183.
Chung, N. M., Lohrmann, R., Orgel, L. E., and Rabinowitz, J.: 1971, Tetrahedron 27, 1205.
Crick, F. H. C.: 1968, J. Mol. Biol. 38, 367.
Crick, F. H. C. and Orgel, L. E.: 1973, Icarus 19, 341.
Crick, F. H. C., Brenner, S., Klug, A., and Pieczenik, G.: 1976, Origins of Life 7, 389.
Dayhoff, M. O. and McLaughlin, P. J.: 1972, Atlas of Protein Sequence and Structure, Volume 5, The National Biomedical Research Foundation, p. 111.
Delaunay, J., Creusot, F., and Schapira, G.: 1973, Eur. J. Biochem. 39, 305.
Eigen, M.: 1971, Naturwissenschaften 58, 465.
Fox, S. W. and Krampitz, G.: 1964, Nature 203, 1362.
Gitler, C. and Ochoa-Solano, A.: 1968, J. Amer. Chem. Soc. 90, 5004.
Hardebeck, H., Krampitz, G., and Wulf, L.: 1968, Arch. Biochem. Biophys. 132, 72.
Hartman, H.: 1975, Origins of Life 6, 423.
Helene, C.: 1975, Nucl. Acids Res. 2, 961.
Hoffmann, G. W.: 1974, J. Mol. Biol. 86, 349.
Hoffmann, G. W.: 1975, Annual Rev. Phys. Chem. 26, 123.
Hori, H.: 1975, J. Mol. Evol. 7, 75.
Ibanez, J. D., Kimball, A. P., and Oro, J.: 1971, Science 173, 444.
Inoue, T., Nomura, K., and Kimizuka, H.: 1976, Bull. Chem. Soc. Japan 49, 719.
Ishigami, M. and Nagano, K.: 1975, Origins of Life 6, 551.
Jukes, T. H.: 1974, Origins of Life 5, 30.
Kisselev, L. L. and Favorova, O. O.: 1974, Adv. in Enzymology 40, 141.
Kolata, G. B.: 1976, Science 191, 373.
Kunitake, T., Okahata, Y., and Tahara, T.: 1976, Bioorg. Chem. 5, 155.
Lacey, J. C. and Pruitt, K. M.: 1969, Nature 233, 799.
Lipmann, F.: 1965, The Origins of Prebiological Systems and of their Molecular Matrices, S. W. Fox (ed.), Academic Press, Inc., New York, p. 259.
Loftfield, R. B.: 1963, Biochem. J. 89, 82.
Mizushima, S. and Nomura, M.: 1970, Nature 226, 1214.
Mizutani, H. and Ponnamperuma, C.: 1977, in preparation.
Nakashima, T. and Fox, S. W.: 1972, Proc. Natl. Acad. Sci., U.S.A. 69, 106.
Nathans, D. and Smith, H. O.: 1975, Ann. Rev. Biochem. 44, 273.
Noguchi, J., Tokura, S., Komai, T., Kokazi, K., and Azuma, T.: 1971, J. Biochem. 69, 1033.
Oro, J., Kimball, A. P., and McReynolds, J.: 1969, Absts. Fed. Eur. Biochem. Soc. 6th Meeting, Madrid, p. 64.
Oshima, T.: 1968, Arch. Biochem. Biophys. 126, 478.
Paecht-Horowitz, M.: 1974, Origins of Life 5, 173.
Photaki, I. and Sakarellou-Daitsiotou, M.: 1976, J. Chem. Soc., Perkin Transactions I, 589.
Rigby, P. W. J., Burleigh, Jr., B. D., and Hartley, B. S.: 1974, Nature 251, 200.
Saxinger, C. and Ponnamperuma, C.: 1974, Origins of Life 5, 189.
Schneider-Bernloehr, H., Lohrmann, R., Sulston, J., Weiman, B. J., and Orgel, L. E.: 1969, J. Mol. Biol. 37, 151.
Schwartz, A. W. and Fox, S. W.: 1967, Biochem. Biophys. Acta. 134, 9.
Sparrow, A. H. and Nauman, A. F.: 1976, Science 192, 524.
Weber, A. L. and Fox, S. W.: 1973, Biochem. Biophys. Acta. 319, 174.
Woese, C.: 1965, Proc. Natl. Acad. Sci., U.S.A. 54, 1546.
Woese, C.: 1973, Naturwissenschaften 60, 447.

Appendix I

Definition of Terms

Accuracy is generally defined as

$$\text{accuracy} = \frac{\text{proper substrate incorporations in terms of its coding table}}{\text{total incorporations of substrates into polymer products}}$$

Activity Codon is the codon which codes an amino acid at the activity site of the polymerase.

Gain is the probability of the selectivity change to the next higher value, and is defined as

$$\text{gain} = \frac{\text{number of the products which increased their selectivity}}{\text{total number of the products produced which have improper monomer(s) at their selectivity site}}$$

(or, in case of genes, improper monomer(s) at one or more corresponding codons).

Reproductivity is generally defined as the average number of autocatalytic systems or molecules produced by one complex of a certain grade.

Selectivity is generally defined as

$$\text{selectivity} = \frac{\text{rate of proper monomer incorporation}}{\text{rate of improper monomer incorporation}} \times (H - 1),$$

where $H$ stands for the number of monomer groups differently recognized by the polymerases, i.e., in case of the translation, $H = A$, and in case of the replication, $H = N$.

Selectivity Codon is the codon which codes an amino acid at the selectivity site of the polymerase.
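As a rough illustration of how these definitions turn into numbers, the following Python sketch evaluates accuracy, selectivity, and gain for made-up counts. The function names and the sample values are hypothetical, and the sketch assumes that incorporation counts over a fixed time can stand in for the incorporation rates in the selectivity formula.

```python
def accuracy(proper: float, total: float) -> float:
    """Proper incorporations divided by all incorporations."""
    return proper / total


def selectivity(rate_proper: float, rate_improper: float, h: int) -> float:
    """(proper rate / improper rate) x (H - 1), H = number of monomer groups."""
    return rate_proper / rate_improper * (h - 1)


def gain(improved: float, improper_at_selectivity_site: float) -> float:
    """Share of selectivity-site variants that reached the next selectivity value."""
    return improved / improper_at_selectivity_site


# Hypothetical example: 90 proper incorporations out of 100, H = A = 10,
# and one improved product per million selectivity-site variants.
acc = accuracy(90, 100)
sel = selectivity(90, 10, 10)
g = gain(1, 1_000_000)
print(acc, sel, g)
```

With these illustrative inputs the accuracy is 0.9 and the selectivity is 81, which shows how a modest accuracy already corresponds to a sizeable selectivity when ten monomer groups are distinguished.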

Appendix II

Definition of Characters

$A$: number of activated amino acid groups recognized differently by A-polymerase and by O-polymerase.
$C_O$: accuracy of the translation of O-polymerase.
$C_A(j)$: accuracy of the translation of grade $j$ A-polymerase.
$C_N(j)$: accuracy of the replication of grade $j$ N-polymerase.
$F_A(j)$: transition probability from grade $j$ A-polymerase to grade $z$ A-polymerase.
$F_S$: transition probability from $S(j_1, j_2, j_3, j_4)_{i-1}$ to $S(u, v, w, z)_i$.
$f_{ij}$: transition probability as defined in Table III.
$f_1$: the probability of the formation of grade $u$ N-gene employing grade $j_3$ N-polymerase and grade $j_1$ N-gene as a template (cf. Equation (16) in the text).
$f_2$: the probability of the formation of grade $v$ A-gene using grade $j_3$ N-polymerase and grade $j_2$ A-gene as a template (cf. Equation (16) in the text).
$f_3$: the probability of the formation of grade $w$ N-polymerase from grade $u$ N-gene and grade $j_4$ A-polymerase (cf. Equation (16) in the text).
$f_4$: the probability of the formation of grade $z$ A-polymerase from grade $v$ A-gene and grade $j_4$ A-polymerase (cf. Equation (16) in the text).
$G_{AO}$: gain of A-polymerase to grade 4 from grade 3 A-gene catalyzed by O-polymerase.
$G_A(j)$: gain of A-polymerase to grade $j$ from grade $j - 1$ A-gene catalyzed by A-polymerase.
$G_{NO}$: gain of N-polymerase to grade 4 from grade 3 N-gene catalyzed by O-polymerase.
$G_N(j)$: gain of N-polymerase to grade $j$ from grade $j - 1$ N-gene catalyzed by A-polymerase.
$G_x(j)$: gain of N-gene from grade $j - 1$ to grade $j$ catalyzed by N-polymerase.
$G_y(j)$: gain of A-gene from grade $j - 1$ to grade $j$ catalyzed by N-polymerase.
$k$: number of amino acid residues at the activity site of N-polymerase.
$l$: number of amino acid residues at the selectivity site of N-polymerase.
$m$: number of amino acid residues at the activity site of A-polymerase.
$n$: number of amino acid residues at the selectivity site of A-polymerase.
$N$: number of nucleotide groups recognized differently by N-polymerase.
$P_O$: number of O-polymerases.
$P_A(j)_i$: number of grade $j$ A-polymerases at $i$th generation.
$P_N(j)_i$: number of grade $j$ N-polymerases at $i$th generation.
$R_O$: reproductivity of O-polymerase.
$R_A(j)$: reproductivity of grade $j$ A-polymerase.
$R_N(j)$: reproductivity of grade $j$ N-polymerase.
$R_S$: reproductivity of system $S$.
$S$: number of the systems at $i$th generation, which may be written more appropriately as $S(j_1, j_2, j_3, j_4)_i$, that consists of grade $j_1$ N-gene, grade $j_2$ A-gene, grade $j_3$ N-polymerase, and grade $j_4$ A-polymerase. This can also be used as a symbol to specify an autocatalytic system with the above specifications.
$S(j_1, j_2, j_3, j_4)_i$: see above.
$S_A(j)$: selectivity of grade $j$ A-polymerase.
$S_N(j)$: selectivity of grade $j$ N-polymerase.
$T_O$: number of sets of genes at the beginning of Model II evolution, or number of sets of polymerases produced by O-polymerases, whichever is smaller.
$T(j_1, j_2)_i$: number of genotype $j_1 j_2$ genes at $i$th generation.
$x$: number of the grades of N-polymerase.
$y$: number of the grades of A-polymerase.

Appendix III

Derivations and Formulae of Equations which Relate (i - 1)th Generation with ith Generation in Model II *

* We cannot, strictly speaking, derive the probabilities in Model II unless the structure of the coding tables is known. We believe, however, that the method is adequate for the present purpose.

As mentioned in the text, four different reactions are involved in Model II. In each of these polymerization reactions, there are generally two different patterns. One is to form active product molecules, which may be classified into three categories in reference to the template molecule: (1) increase in grade; (2) maintain the same grade; (3) produce a grade 2 molecule. The production of grade 1 molecules is not included, as it forms inactive systems and does not affect the future population. Many reactions belong to this pattern. The transition pattern from the highest grade genes belongs to this pattern too; the upward transition probabilities from the highest grade are simply defined as being equal to zero. However, in the case of grade 2 genes ($j_1$ or $j_2 = 2$), the product molecules can only (1) increase their grade or (2) maintain their grade, since a decrease of the grade from 2 means the formation of inactive molecules. Therefore, in general, the formulae of the transition probabilities can be grouped into four different types: (1) upward (increase in the grade), (2) stay (hold the same grade, $j_1$ or $j_2 \geq 3$), (3) non-upward (hold the same grade, $j_1$ or $j_2 = 2$), (4) downward (produce a grade 2 molecule using higher grade genes).

Now, for instance, consider the formation of N-genes through the catalytic activity of grade $j_3$ N-polymerase. Since the N-gene has $k$ codons crucial to N-polymerase activity, a fraction $C_N(j_3)^k$ of the total product genes can form active systems. To maintain the same grade, the $l$ codons decisive for N-polymerase selectivity must be replicated exactly as well. Hence the transition probability of stay, $f_{12}$, is

$$f_{12} = C_N(j_3)^{k+l}.$$

Upward transitions occur with probability $G_x(j_1 + 1)$ when one or more bases are misplaced in the $l$ selectivity codons. Therefore the upward transition probability, $f_{11}$, is

$$f_{11} = C_N(j_3)^{k} \times \left(1 - C_N(j_3)^{l}\right) \times G_x(j_1 + 1).$$

Since the sum of the upward, stay, and downward transition probabilities is equal to the probability of the formation of active N-genes, the downward transition probability, $f_{14}$, is

$$f_{14} = C_N(j_3)^{k} \times \left(1 - C_N(j_3)^{l}\right) \times \left(1 - G_x(j_1 + 1)\right).$$

Also, the sum of the stay and downward transition probabilities is equal to the non-upward transition probability, $f_{13}$. Therefore

$$f_{13} = C_N(j_3)^{k} \times \left(1 - \left(1 - C_N(j_3)^{l}\right) \times G_x(j_1 + 1)\right).$$

The same holds true for the three other reactions. Table III shows all transition probabilities from $S(j_1, j_2, j_3, j_4)_{i-1}$ to $S(u, v, w, z)_i$. Depending on the actual values of $u$, $v$, $w$, $z$, $j_1$, $j_2$, $j_3$, and $j_4$, the appropriate probability should be used.

TABLE IV
Figures and the transition probabilities to be used in Equation (A1)

J3    j1, j1 + 1 (j1 = 2)     2, j1, j1 + 1 (j1 ≥ 3)
J4    j2, j2 + 1 (j2 = 2)     2, j2, j2 + 1 (j2 ≥ 3)
f1    f11 (j1 = u - 1)        f12 (j1 = u)
f2    f21 (j2 = v - 1)        f22 (j2 = v)
f3    f34 (w = 2)             f32 (w = u)             f31 (w = u + 1)
f4    f44 (z = 2)             f42 (z = v)             f41 (z = v + 1)

TABLE V
Figures and the transition probabilities to be used in Equation (A2)

                 1st term                    2nd term    3rd term
J4               2, 3 (j2 = 2)               -           2, 3 (v - 1 = 2)
                 2, j2, j2 + 1 (j2 ≥ 3)                  2, v - 1, v (v - 1 ≥ 3)
f1               f13                         f14         f14
f2               f21 (j2 = v - 1)            f22         f21
                 f22 (j2 = v)
f3  w = 2        f33                         f33         f33
    w = 3        f31                         f31         f31
f4  z = 2        f44                         f44         f44
    z = v        f42                         f42         f42
    z = v + 1    f41                         f41         f41

If $u \geq 3$ and $v \geq 3$, the formation of $S(u, v, w, z)_i$ from $S(j_1, j_2, j_3, j_4)_{i-1}$ is possible only when $j_1 = u - 1$ or $u$, and $j_2 = v - 1$ or $v$. If $j_1 = u - 1$, the transition is of the upward type, and the corresponding transition probability is $f_{11}$. The transition probability is of the stay type, $f_{12}$, when $j_1 = u$. Furthermore, if $j_1 = 2$, $j_3$ can be either 2 or 3. For all other values of $j_1$, $j_3$ can be one of the three values 2, $j_1$, or $j_1 + 1$. The same consideration applies to the values of $j_4$. In effect, Equation (15) will be as follows: $u \geq 3$, $v \geq 3$;

$$S(u, v, w, z)_i = \sum_{j_1 = u-1}^{u} \; \sum_{j_2 = v-1}^{v} \; \sum_{j_3 = J_3} \; \sum_{j_4 = J_4} \left\{ F_S \times S(j_1, j_2, j_3, j_4)_{i-1} \times R_S \right\} \tag*{(A1)*}$$

where $J_3$, $J_4$, $f_1$, $f_2$, $f_3$, and $f_4$ are given in Table IV.

TABLE VI
Figures and the transition probabilities to be used in Equation (A3)

                 1st term                    2nd term    3rd term
J3               2, 3 (j1 = 2)               -           2, 3 (u - 1 = 2)
                 2, j1, j1 + 1 (j1 ≥ 3)                  2, u - 1, u (u - 1 ≥ 3)
f1               f11 (j1 = u - 1)            f12         f11
                 f12 (j1 = u)
f2               f23                         f24         f24
f3  w = 2        f34                         f34         f34
    w = u        f32                         f32         f32
    w = u + 1    f31                         f31         f31
f4  z = 2        f43                         f43         f43
    z = 3        f41                         f41         f41

In the case of $u = 2$, $v \geq 3$, the system $S(2, v, w, z)_i$ is formed by the downward transition from $S(j_1, j_2, j_3, j_4)_{i-1}$ (downward in regard to N-gene formation, $j_1 \geq 3$ and $j_2 = v - 1$ or $v$) in addition to the non-upward transitions from $S(2, j_2, j_3, j_4)_{i-1}$ (non-upward in regard to N-gene formation, $j_2 = v - 1$ or $v$). Equation (15) in this case will be as follows: $u = 2$, $v \geq 3$;

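$$\begin{aligned}
S(2, v, w, z)_i ={}& \sum_{j_2 = v-1}^{v} \; \sum_{j_3 = 2, 3} \; \sum_{j_4 = J_4} \left\{ F_S \times S(2, j_2, j_3, j_4)_{i-1} \times R_S \right\} \\
&+ \sum_{j_1 = 3}^{x} \; \sum_{j_3 = 2, j_1, j_1+1} \; \sum_{j_4 = 2, v, v+1} \left\{ F_S \times S(j_1, v, j_3, j_4)_{i-1} \times R_S \right\} \\
&+ \sum_{j_1 = 3}^{x} \; \sum_{j_3 = 2, j_1, j_1+1} \; \sum_{j_4 = J_4} \left\{ F_S \times S(j_1, v-1, j_3, j_4)_{i-1} \times R_S \right\}
\end{aligned} \tag*{(A2)*}$$

where $J_4$, $f_1$, $f_2$, $f_3$, and $f_4$ are given in Table V. Similarly, Equation (15) will be as follows in the case of $u \geq 3$, $v = 2$:

$$\begin{aligned}
S(u, 2, w, z)_i ={}& \sum_{j_1 = u-1}^{u} \; \sum_{j_3 = J_3} \; \sum_{j_4 = 2, 3} \left\{ F_S \times S(j_1, 2, j_3, j_4)_{i-1} \times R_S \right\} \\
&+ \sum_{j_2 = 3}^{y} \; \sum_{j_3 = 2, u, u+1} \; \sum_{j_4 = 2, j_2, j_2+1} \left\{ F_S \times S(u, j_2, j_3, j_4)_{i-1} \times R_S \right\} \\
&+ \sum_{j_2 = 3}^{y} \; \sum_{j_3 = J_3} \; \sum_{j_4 = 2, j_2, j_2+1} \left\{ F_S \times S(u-1, j_2, j_3, j_4)_{i-1} \times R_S \right\}
\end{aligned} \tag*{(A3)*}$$

where $J_3$, $f_1$, $f_2$, $f_3$, and $f_4$ are given in Table VI.

Systems both of whose genes are grade 2 can be formed through doubly downward transitions from $S(j_1, j_2, j_3, j_4)_{i-1}$ (downward in regard to both N- and A-gene formation, $j_1 \geq 3$ and $j_2 \geq 3$) in addition to all the other pathways considered before. Therefore Equation (15) will be as follows in the case of $u = 2$, $v = 2$:

$$\begin{aligned}
S(2, 2, w, z)_i ={}& \sum_{j_3 = 2, 3} \; \sum_{j_4 = 2, 3} \left\{ F_S \times S(2, 2, j_3, j_4)_{i-1} \times R_S \right\} \\
&+ \sum_{j_1 = 3}^{x} \; \sum_{j_3 = 2, j_1, j_1+1} \; \sum_{j_4 = 2, 3} \left\{ F_S \times S(j_1, 2, j_3, j_4)_{i-1} \times R_S \right\} \\
&+ \sum_{j_2 = 3}^{y} \; \sum_{j_3 = 2, 3} \; \sum_{j_4 = 2, j_2, j_2+1} \left\{ F_S \times S(2, j_2, j_3, j_4)_{i-1} \times R_S \right\} \\
&+ \sum_{j_1 = 3}^{x} \; \sum_{j_2 = 3}^{y} \; \sum_{j_3 = 2, j_1, j_1+1} \; \sum_{j_4 = 2, j_2, j_2+1} \left\{ F_S \times S(j_1, j_2, j_3, j_4)_{i-1} \times R_S \right\}
\end{aligned} \tag*{(A4)*}$$

where $f_1$, $f_2$, $f_3$, and $f_4$ are given in Table VII.

TABLE VII
Transition probabilities to be used in Equation (A4)

             1st term    2nd term    3rd term    4th term
f1           f13         f14         f13         f14
f2           f23         f23         f24         f24
f3  w = 2    f33         f33         f33         f33
    w = 3    f31         f31         f31         f31
f4  z = 2    f43         f43         f43         f43
    z = 3    f41         f41         f41         f41

* By definition, $S(x, j_2, x + 1, j_4)_i = S(j_1, y, j_3, y + 1)_i = 0$.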
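The bookkeeping among the four N-gene transition probabilities derived in this appendix can be checked numerically. The Python sketch below is an illustration only, not the program used for the simulations: the function name is hypothetical, and the sample values ($C_N = 0.9$, $k = l = 5$, $G_x = 10^{-6}$) are assumptions chosen to show that $f_{11} + f_{12} + f_{14}$ equals the active fraction $C_N^k$ and that $f_{13} = f_{12} + f_{14}$.

```python
import math


def n_gene_transition_probabilities(c_n: float, k: int, l: int, g_x: float) -> dict:
    """Upward, stay, non-upward, and downward probabilities for N-gene formation.

    c_n : replication accuracy C_N(j3) of the N-polymerase
    k   : number of activity codons; l : number of selectivity codons
    g_x : gain G_x(j1 + 1) of the N-gene to the next grade
    """
    active = c_n ** k                 # all k activity codons copied correctly
    altered = 1.0 - c_n ** l          # at least one selectivity codon miscopied
    return {
        "f11": active * altered * g_x,            # upward
        "f12": c_n ** (k + l),                    # stay
        "f13": active * (1.0 - altered * g_x),    # non-upward
        "f14": active * altered * (1.0 - g_x),    # downward
    }


# Illustrative values only (assumed, not taken from the simulations).
c_n, k, l, g_x = 0.9, 5, 5, 1e-6
f = n_gene_transition_probabilities(c_n, k, l, g_x)

# Upward + stay + downward should recover the fraction of active genes.
print(math.isclose(f["f11"] + f["f12"] + f["f14"], c_n ** k))
# Non-upward should equal stay + downward.
print(math.isclose(f["f13"], f["f12"] + f["f14"]))
```

Analogous functions for the A-gene and for the two polymerases would follow the same pattern with the corresponding accuracies and gains.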
