The Use of Information Theory to Analyze Genomic Changes in Neoplasia

WAYNE S. KENDAL
Department of Medicine, The Princess Margaret Hospital, Toronto, Ontario, Canada M4X 1K9

Received 7 June 1989; revised 18 February 1990

ABSTRACT How the cell maintains and uses its heritable information may be a critical factor in neoplasia. For example, neoplastic development is thought to depend upon the interplay between random genomic instability and nonrandom selective forces. Information theory provides a means to analyze these processes. One may quantitate not only the amount and lability of information contained within a segment of genetic code, but also the genotypic heterogeneity and the degree of selection affecting a population of cells. In addition, it is theoretically possible to monitor genetic information as it is processed by cells during replication, transcription, and translation. These parameters could permit a detailed analysis of the evolutionary changes hypothesized to underlie neoplastic development.

INTRODUCTION

What allows a neoplastic cell to abandon the regulated behavior of normal tissue, to proliferate, invade, and ultimately metastasize? The answer remains elusive, although in recent years genetic change has become increasingly implicated in this process [2]. The genetic changes associated with neoplasia are complex. Alterations may occur with oncogenes [2, 18] or with tumor suppressor genes [10, 25]. Multiple loci, multiple steps, and diverse mechanisms seem to be involved [2, 18, 30]. However, neither the source of these changes nor the mechanisms of their concerted action are clear. One contemporary view, the clonal selection hypothesis [21], postulates that neoplastic cells have an acquired propensity for heritable change. New mutants arise randomly and are selected for increased proliferative advantage. A neoplasm with increasingly abnormal genetic and biological properties results [11].

MATHEMATICAL BIOSCIENCES 100:143-159 (1990)
© Elsevier Science Publishing Co., Inc., 1990
655 Avenue of the Americas, New York, NY 10010
0025-5564/90/$03.50

Although the clonal selection hypothesis could account for the diverse genetic changes seen in neoplasia, these findings might be attributable to other phenomena. Further substantiation of the hypothesis would require the definition of both the source of the increased mutation and the relationship between mutation and selection. The strongest support for the clonal selection hypothesis comes from qualitative cytogenetic observations [21, 22]. However, attempts at a more quantitative analysis, based on mutation rates, have not been as fruitful [4, 8, 13]. A more stringent analysis is indicated, but this would require methods not widely available in genetics.

The issue of how heritable information is propagated in the presence of random mutation (noise) has an analogy in telecommunications, where a mathematical theory of information has been developed [28]. Methods from information theory are readily adaptable to molecular and cytogenetic observations, enabling the quantitative analysis of heritable information as it is processed and maintained. These methods could allow a more stringent analysis of neoplastic development. (For that matter, these methods could find considerable utility in genetic problems unrelated to neoplasia.) In this article, information theory will be introduced and adapted to the analysis of the genetic changes associated with neoplasia.

GENETIC INFORMATION

In colloquial usage, information denotes knowledge; in information theory it denotes a measurable quantity. For example, if there are $n$ equally probable arrangements for a coded message, then the amount of information, $I$, encoded within the message is

$$I = k \ln n.$$

When the constant $k$ is set at $1/\ln 2$, the unit of information becomes the bit (binary digit). Information, as defined here, is related to the physical quantity entropy. However, increased entropy indicates increased disorder, whereas information indicates order. This definition for information has two additional properties: First, information is an additive quantity. Second, in the absence of noise, the amount of information is conserved. In genetics, noise could result from random mutation or epigenetic variation; this may be referred to as genomic instability. Its consequence is a decrease in the amount of information conveyed by the message.

To measure the information content within a heritable message, such as a sequence of DNA, we require the number of nucleotide base pairs (bp), $\nu$, and the average information per nucleotide, $H$. Then the content is

$$I = \nu H.$$

$H$ is often referred to as the entropy of the signal [28]. It is estimated from the probability of occurrence, $P_j$, for the $j$th symbol within a given sequence. In the case of DNA there are $N = 4$ symbols corresponding to the four nucleotides, and

$$H = -k \sum_{j=1}^{N} P_j \ln P_j.$$

The entropy is maximal when all the symbols occur with the same frequency; that is, $P_j = 1/N$ for all values of $j$. The maximum value is

$$H_{\max} = k \ln N.$$
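As an illustration of the definitions above, the following is a minimal Python sketch that estimates $H$ from symbol frequencies with $k = 1/\ln 2$ (so that results are in bits) and forms the content $I = \nu H$. The function names and the two short sequences are hypothetical and serve only to show the calculation; they are not data from this article.

```python
import math
from collections import Counter


def entropy_bits(sequence):
    """Average information per symbol, H, in bits (k = 1/ln 2)."""
    counts = Counter(sequence)
    total = sum(counts.values())
    # H = -sum_j P_j log2 P_j, with P_j estimated as the observed frequency.
    return sum(-(c / total) * math.log2(c / total) for c in counts.values())


def max_entropy_bits(n_symbols=4):
    """H_max = log2 N; N = 4 for the four nucleotides."""
    return math.log2(n_symbols)


def information_content_bits(sequence):
    """Total content I = nu * H, where nu is the sequence length."""
    return len(sequence) * entropy_bits(sequence)


# Hypothetical sequences, chosen only to contrast equal and unequal usage.
uniform_sequence = "ACGTACGTACGT"   # each nucleotide used equally often
skewed_sequence = "AAAAAAAAACGT"    # one nucleotide used disproportionately

h_uniform = entropy_bits(uniform_sequence)
h_skewed = entropy_bits(skewed_sequence)
```

With equal usage the estimate reaches $H_{\max} = 2$ bits per nucleotide; with skewed usage it falls below that value.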

At this extremum, the information conveyed by the message is also maximal. On the other hand, if one particular symbol is repeated disproportionately often throughout the message, then the message becomes redundant. The redundancy, $R$, is given by

$$R = 1 - H/H_{\max}.$$

The more redundant a message is, the less information it can carry. Redundancy is a consequence of syntactic structure within the code. Such structure may allow a more efficient and accurate information transfer.

The genetic code reveals its intrinsic structure in more than one way. For example, as its syntax changes from codons to amino acids (aa), the maximum information carried by a nucleotide triplet, $H_{\max}$, decreases from 6 bits to 4.32 bits (Table 1). This corresponds to a 28% redundancy, a direct consequence of codon degeneracy. Additional structure may result from the disproportionate use of nucleotides, codons, or amino acids. Table 1 reveals how such disproportionate use may vary between different groups of genes: ssRNA viral genes [9], mammalian genes [9], and poorly or highly expressed E. coli genes [29]. In all cases the nucleotide syntax showed essentially no redundancy. However, there was significant redundancy as a result of unequal codon usage, a well-recognized phenomenon attributed to both selection and the availability of specific transfer RNAs [12]. Notably the highly expressed E. coli genes showed the greatest codon redundancy, and the RNA viruses the least. These differences may reflect a mechanism to influence the relative expression of genes.
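The 28% figure can be checked directly from $R = 1 - H/H_{\max}$. The sketch below is illustrative only: it assumes 64 equally probable codons and 20 equally probable amino acids (stop codons ignored), which matches the 6 and 4.32 bit maxima quoted for Table 1. The function name is hypothetical.

```python
import math


def redundancy(h, h_max):
    """R = 1 - H / H_max."""
    return 1.0 - h / h_max


# Assumption: 64 equiprobable codons and 20 equiprobable amino acids.
h_max_codon = math.log2(64)        # 6 bits per nucleotide triplet
h_max_amino_acid = math.log2(20)   # about 4.32 bits per triplet

# Redundancy introduced when the syntax changes from codons to amino acids.
degeneracy_redundancy = redundancy(h_max_amino_acid, h_max_codon)

# Codon-usage redundancy from the entropies reported in Table 1.
r_highly_expressed = redundancy(5.08, 6.0)   # highly expressed E. coli genes
r_rna_viral = redundancy(5.81, 6.0)          # ssRNA viral genes
```

Rounded, these give 0.28 for codon degeneracy and 0.15 and 0.032 for the two codon-usage cases, in agreement with the values in the text and in Table 1.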

TABLE 1
Syntax Structure in Various Genes

Parameter | RNA viral genes | E. coli genes, highly expressed | E. coli genes, poorly expressed | Mammalian genes
Nucleotides | | | |
H_max (bits/bp) | 2 | 2 | 2 | 2
H (bits/bp) | 2.00 | 2.00 | 2.00 | 1.99
R | 0 | 0 | 0 | 0.005
Codons | | | |
H_max (bits/codon) | 6 | 6 | 6 | 6
H (bits/codon) | 5.81 | 5.08 | 5.61 | 5.64
R | 0.032 | 0.15 | 0.065 | 0.060
Amino acids | | | |
H_max (bits/aa) | 4.32 | 4.32 | 4.32 | 4.32
H (bits/aa) | 4.13 | 4.07 | 4.13 | 4.17
R | 0.044 | 0.058 | 0.044 | 0.035

The data are taken from 11 ssRNA viral genes [12], 27 highly expressed E. coli genes [13], 58 poorly expressed E. coli genes [13], and 26 mammalian genes [12].

In addition, the four groups of genes showed disproportionate use of amino acids (Table 1). However, the redundancies were approximately similar between the groups. This likely indicated certain requirements for protein structure common to all the genes.

The hereditary message not only encodes amino acid sequences, but also contains epigenetic information that controls gene expression. This additional information may be encoded through DNA methylation [27] or chromatin binding [7]. An information-based analysis of gene expression would require measurement of the relative activity of individual genes within the genome. Unfortunately, this is difficult to assess at the level of the gene. However, an approximate assessment of gene expression may be garnered from the relative abundances of gene products, such as individual proteins measured by two-dimensional polyacrylamide gel electrophoresis. For example, 891 proteins from E. coli [23] were used to calculate the entropy $H$ and redundancy $R$ for the pattern of gene expression (Table 2). The redundancy was 0.33, indicating the disproportionate production of certain proteins. The redundancy in this context provides a quantitative assessment of cellular specialization.

TABLE 2
Pattern of Protein Expression in E. coli

No. of protein species | Abundance (%) | Information contributed (bits)
3 | 7 | 0.806
5 | 3 | 0.759
10 | 2 | 1.129
15 | 0.8 | 0.836
19 | 0.5 | 0.726
39 | 0.2 | 0.699
53 | 0.1 | 0.528
57 | 0.05 | 0.313
100 | 0.03 | 0.351
119 | 0.01 | 0.158
133 | 0.007 | 0.128
111 | 0.003 | 0.050
117 | 0.002 | 0.037
110 | 0.0005 | 0.010
Total: 891 | 99.2 | 6.53

The relative abundances of 891 proteins were measured from a two-dimensional electrophoretic pattern [17], and the information contributed by each of the abundance groups was totaled. $H_{\max}$ for the pattern was 9.80 bits, and the redundancy was 0.33.

In all these examples, redundancy indicated organization within the hereditary message. Could this organization be disrupted by mutation? Spontaneous mutations occur at a rate of about $10^{-9}$ errors per bp replicated [17], which corresponds to a loss of $10^{-7}$ bits per nucleotide triplet replicated. (This loss was calculated using methods detailed in the next section.) Certain genomic regions may be more unstable, as within the V-D-J regions of immunoglobulin genes, where the mutation rate may be $10^{-3}$ errors/bp replicated [5], with a consequent information loss of 0.03 bits/triplet replicated. These higher noise levels could potentially disrupt the structure of the genetic code. However, it is presently unknown whether any significant changes in redundancy occur within either immunoglobulin genes or genes of neoplastic cells. Mutation may not significantly affect the code's redundancy, since the selection to preserve the syntax may be too strong. Nevertheless, in other circumstances, mutation can overcome selection and cause an initially isogenic cellular population to become genetically heterogeneous. This effect is the subject of the next section.

GENOMIC HETEROGENEITY

We have examined the information content within a heritable message, but we have yet to consider the character of this information. Some information may be highly specific, bound presumably by the functional requirements of the cells. There is also free information, specifying abstract arrangements without any physical or biological requirements [3]. Bound information is sometimes considered a component of free information [3]. In the presence of noise, bound information will be unavoidably converted to the free state. As a cellular population loses bound information, its genotypic and phenotypic heterogeneity increases.

The interplay between variation and selection determines the degree of heterogeneity within a cellular population. To measure this heterogeneity, the information-based parameters from the previous section may be used. However, these parameters will have a different use and interpretation, which will be indicated by lowercase symbols. In this situation the entropy function indicates the amount of bound information converted to free information. To estimate the entropy, we measure the probability of occurrence of each code symbol between different copies of the same message at a set position within the message. Let $p_{ij}$ denote the probability of occurrence of the $j$th code symbol at the $i$th position in the copies. The entropy, $h_i$, is assessed for each position within the message,

$$h_i = -k \sum_{j=1}^{N} p_{ij} \ln p_{ij}.$$

The total entropy for a message composed of $M$ symbols is

$$h_t = \sum_{i=1}^{M} h_i,$$

and the mean entropy per code symbol is

$$\langle h \rangle = h_t / M.$$

The maximal entropy per symbol, $h_{\max}$, is calculated as before, and similarly the mean redundancy per symbol is

$$\langle r \rangle = 1 - \langle h \rangle / h_{\max}.$$

However, $h_{\max}$ now indicates the maximal amount of information that potentially could be lost to noise (and thus the maximal degree of heterogeneity possible). In addition, $\langle r \rangle$ gives the proportion of bound information not disrupted by noise.

These equations are simple to apply. For example, considerable nucleotide variability occurs within the complementarity-determining region, CDR2, of the immunoglobulin heavy chain locus ($V_H$). Four molecular clones of the CDR2 region were obtained from a human B cell lymphoma [5], and their nucleotide sequences are given in Table 3. Multiple nucleotide substitutions have occurred. The mean entropy $\langle h \rangle$ = 0.114 bit/nucleotide. In this context, the greater the value for $\langle h \rangle$, the greater the heterogeneity within the clones. The mean redundancy $\langle r \rangle$ = 0.94.

It becomes a simple calculation to measure the rate at which genetic information is lost to noise (i.e., the rate of development of heterogeneity). One measures the entropy for a given genomic region at different times in the history of a cell population, and the rate of change then is

$$h' = (h_{\text{final}} - h_{\text{initial}}) / n,$$

where $n$ is the number of population doublings between the two measurements. This rate provides an additional means to assess genetic instability.

Probably the measurement of variation (mutation) rate provides a more fundamental index of genetic instability than the entropy rate $h'$. The latter rate is influenced not only by the de novo generation of new variants but also by the selective propagation of existing variants. However, the variation rate is generally obtained by fluctuation analysis [20], a technique fraught with many difficulties [14]. In fluctuation analysis the rate is estimated only indirectly, from the statistical distribution of the number of variants per population. One must assume the absence of any selection, phenotypic lag, or differences in growth rate between wild-type and variant cells [14]. In contrast, the entropy rate is measured directly; there are no assumptions regarding the propagation, expression, or growth rate of the variant population. Entropy measurements are readily applicable to nucleotide sequences, karyotypes, and two-dimensional protein electrophoresis, a broad range of genetic analysis. The entropy rate is directly related to genetic heterogeneity; it complements the more standard variation rate.

Although the entropy function is influenced by both variation and selection, in certain circumstances the constraining effect of selection can be clearly revealed and measured. For instance, if an initially uniform population were to sustain continued variation without any selection, then the entropy would progressively increase ($h' > 0$) until the maximal level of disorder, $h_{\max}$, was reached. In contrast, if there was significant selection, then the entropy, once through its maximum, would progressively decrease as the favored variant predominated and as the bound information increased. This method allows for a more stringent quantitation of selective forces than currently available techniques permit.
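The heterogeneity measures defined in this section can be sketched in a few lines of Python. The example assumes that the copies of the message are already aligned and of equal length, uses $k = 1/\ln 2$ (bits), and takes $N = 4$ for nucleotides. The function names and the four short sequences are hypothetical; they are not the CDR2 clones of Table 3.

```python
import math
from collections import Counter


def positional_entropies(copies):
    """h_i for each position i across aligned copies of one message (bits)."""
    length = len(copies[0])
    if any(len(copy) != length for copy in copies):
        raise ValueError("copies must be aligned and of equal length")
    entropies = []
    for i in range(length):
        counts = Counter(copy[i] for copy in copies)
        total = len(copies)
        # h_i = -sum_j p_ij log2 p_ij over the symbols seen at position i.
        entropies.append(
            sum(-(c / total) * math.log2(c / total) for c in counts.values())
        )
    return entropies


def heterogeneity(copies, n_symbols=4):
    """Return (h_t, <h>, <r>) for a set of aligned copies."""
    h = positional_entropies(copies)
    h_total = sum(h)
    h_mean = h_total / len(h)
    h_max = math.log2(n_symbols)
    return h_total, h_mean, 1.0 - h_mean / h_max


def entropy_rate(h_initial, h_final, doublings):
    """h' = (h_final - h_initial) / n, per population doubling."""
    return (h_final - h_initial) / doublings


# Hypothetical clones: one substitution at the last position of one copy.
clones = ["ACGT", "ACGT", "ACGA", "ACGT"]
h_total, h_mean, r_mean = heterogeneity(clones)
```

Only the last position varies here, so the total entropy equals the entropy of a 3:1 split at a single site; the other positions contribute nothing.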
The influence of selection is illustrated by karyotypic data, taken from a cloned rat mammary adenocarcinoma [24] (Table 4). The frequencies for specific marker chromosomes in clone MTF7 were measured at early and late in vitro passage and were used to calculate the corresponding entropies. The total entropy for all 14 markers did not change, indicating a constant level of overall karyotypic disorder. However, certain markers (M10, M11, M29, M33) became more predominant, while others (M21, M23) declined. In four of these cases (M21, M23, M29, M33) the corresponding entropy decreased, an effect attributable to selection. The rate at which the entropy decreased would, in this context, give a measure for the strength of the selective force.

TABLE 3
Nucleotide Sequences from Four Clones of the CDR2 Region

[Alignment of the four clone sequences against the consensus sequence.]

Four clones taken from the CDR2 region from a single human B-cell lymphoma were sequenced [19]. Dashes indicate agreement with the consensus sequence. Different nucleotide bases are indicated by their respective letters. The mean entropy $\langle h \rangle$ = 0.114 bit/bp; when compared to the maximum possible entropy ($h_{\max}$ = 2.0 bit/bp), the variability between the clones was 5.7%.

TABLE 4
Marker Chromosomal Changes in Clone MTF7

Marker | Early passage: frequency (%) | Early passage: entropy (bits/marker) | Late passage: frequency (%) | Late passage: entropy (bits/marker)
M4 | 2 | 0.14 | 4 | 0.24
M7 | 2 | 0.14 | 0 | 0
M10 | 6 | 0.33 | 56 | 0.99
M11 | 8 | 0.40 | 52 | 1.0
M12 | 2 | 0.14 | 0 | 0
M21 | 8 | 0.40 | 2 | 0.14
M23 | 8 | 0.40 | 2 | 0.14
M26 | 84 | 0.63 | 93 | 0.37
M29 | 28 | 0.86 | 74 | 0.83
 | 6 | 0.33 | 6 | 0.33
M31 | 2 | 0.14 | 8 | 0.40
M32 | 2 | 0.14 | 0 | 0
M33 | 52 | 1.00 | 80 | 0.72
M34 | 10 | 0.47 | 10 | 0.47

Summary | Early passage | Late passage
h_max (bits/marker) | 1.0 | 1.0
⟨h⟩ (bits/marker) | 0.39 | 0.40
h_t (bits) | 5.5 | 5.6

In this and the previous section we have seen how heritable information can be measured and characterized and its stability assessed. In the final sections we will consider how dynamic changes to information might influence neoplasia and how they might be measured.

NEOPLASTIC DEVELOPMENT

The proper maintenance and utilization of heritable information is essential to life; the derangement of these processes could be critical to neoplasia. This concept is no more evident than in the clonal selection hypothesis, where random mutation is putatively increased. However, there must be some limit to the extent of the mutation, since unlimited mutation would lead to totally disorganized cellular functions and thus would be incompatible with life. Eigen and Schuster [6] considered this limit in the context of the evolution of the species; their conclusions can also be applied to the clonal selection model. They determined that the number of code symbols maintained within a heritable message is limited by the fidelity of the hereditary mechanisms. If $\nu_{\max}$ is the maximal number of symbols successfully propagated and $\alpha$ is the error rate per symbol replicated, then

$$\nu_{\max} = \ln(\sigma) / \alpha,$$

where $\sigma$ denotes the selective advantage of the predominant species compared to the average for all competitors. Since $\sigma$ is logarithmically related to the other terms, only large changes in the selective advantage could influence the remaining terms. The number of symbols in a genome effectively carrying information must remain less than $\nu_{\max}$; otherwise the genome would eventually be dissipated by noise (bound information would become free). Eigen and Schuster refer to this possibility as the "error catastrophe" [6]. Its consequence would be the extinction of the population. An error catastrophe might explain the extinction postulated to occur during spontaneous tumor regression or within unknown primary tumors.

The intrinsic error rate, $\alpha$, for the hereditary mechanism determines the size of the genome. A primitive bacterium might have an error rate of $10^{-6}$ error/bp replicated and a genome of $3 \times 10^{6}$ bp, whereas mammalian cells have a rate of about $10^{-9}$ error/bp replicated and a genome of $3 \times 10^{9}$ bp (assuming $\sigma = 20$) [6]. If neoplastic cells have a greater intrinsic error rate than their normal counterparts, then they might lose certain heritable functions, since their $\nu_{\max}$ would be lower.

If a tissue were to acquire increased genomic instability, then its cells would sustain errors in both essential housekeeping genes and those genes more related to the function of the tissue and organism as a whole. Those cells with housekeeping errors would die; those with derangements in the other genes might survive as neoplastic parasites of the host. Thus housekeeping genes would be selectively preserved, whereas the deranged genes might contribute to the neoplastic phenotype.

With these concepts in mind, let us take a closer look at the clonal selection hypothesis. As previously mentioned, a neoplasm is thought to develop as a result of the interplay between the random generation of variant cells and the nonrandom selection of those variants with a growth advantage.
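The error-threshold relation given earlier in this section can be checked numerically against the genome sizes quoted from Eigen and Schuster. The following Python sketch is illustrative; the function name is hypothetical, and $\sigma = 20$ is the value assumed in the text.

```python
import math


def max_symbols(sigma, alpha):
    """nu_max = ln(sigma) / alpha: symbols that can be stably propagated."""
    return math.log(sigma) / alpha


SIGMA = 20  # selective advantage assumed in the text

nu_max_bacterium = max_symbols(SIGMA, 1e-6)   # error rate 10^-6 per bp
nu_max_mammal = max_symbols(SIGMA, 1e-9)      # error rate 10^-9 per bp

# A tenfold rise in the error rate lowers the limit tenfold.
nu_max_unstable = max_symbols(SIGMA, 1e-8)
```

Since $\ln 20 \approx 3.0$, the limits come out near $3 \times 10^{6}$ and $3 \times 10^{9}$ symbols, matching the genome sizes cited. The last line shows why a higher intrinsic error rate would reduce the number of symbols a cell lineage could maintain.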
Nowell, who did much to refine this model, proposed that acquired genetic instability was responsible for the generation of variant cells [21]. He further proposed that variant cells, selected for increased malignant behavior, would simultaneously be selected for increased genetic lability [21]. These concepts have formed the basis for the current view of neoplastic development.

The largest body of data in support of the clonal selection hypothesis is cytogenetic. As tumors become more malignant, both the prevalence and the heterogeneity of karyotypic abnormalities also increase [22]. These observations, although well recognized, have not been subject to much quantitative analysis.

There are several reasons why more stringent analyses of the clonal selection hypothesis are uncommon. One difficulty stems from the lack of a clear definition of genetic instability. Cytogenetic abnormalities, although frequently observed, likely reflect only gross changes, and the more biologically relevant changes may instead occur at the molecular level. At this latter level the relevant changes seem diverse: point mutation, gene amplification, DNA methylation, allele loss, and subtle changes in gene regulation may all be important [2, 10, 18]. Thus when more quantitative experiments are contemplated it becomes difficult to decide which parameters should be measured.

Moreover, the early quantitative analyses were limited by the parameters that could be examined. Most dealt with the prevalence of genetic changes, and occasionally the rate of change was observed. Thus it is conceivable that critical abnormalities might not be detected by these early methods. However, the use of information theory allows the genetic mechanisms to be examined from additional perspectives. Several new parameters can be monitored: the accumulated error (indicated by $h_t$ or $h_i$), the rate of increase in heterogeneity ($h'$), and the structure of the genetic code (indicated by $R$). Whether alterations in these latter parameters can detect biologically significant changes remains to be determined.
In addition, one would have to consider whether the number of successfully propagated code symbols could be affected by genomic instability. A decrease in $\nu$ might not necessarily occur, since some symbols may not be eliminated as a result of instability but rather might be converted from carrying bound information to carrying free information. In this situation, a decrease in the proportion of message not disrupted by noise, $\langle r \rangle$, would be detectable and thus would provide tentative support for Eigen and Schuster's concept of evolutionary limits. Further support for this concept would require the demonstration of an inverse relationship between the error rate and the number of bound code symbols.

As an example of how information theoretic methods can be used to examine the clonal selection hypothesis, consider an issue that has been debated considerably over the last several years. Nowell proposed that more highly malignant cells should concurrently possess a greater degree of genetic instability [21]. Support for this notion came from experiments that correlated an increased spontaneous mutation rate with increased metastatic potential [4]. However, other experiments failed to confirm the earlier observations [8, 13, 15].

Let us now reanalyze some published data regarding the rate of generation of major karyotypic abnormalities in relation to the metastatic potential of the murine B16 F10 melanoma [15]. Two clones of the F10 melanoma were raised and allowed to expand through a measured number of population doublings. The clones were then examined for their ability to induce pulmonary metastases in mice after intravenous injection and for the presence of abnormal marker chromosomes within their metaphase plates. When equal numbers of cells were injected into groups of syngeneic mice, clone F10-2 and clone F10-5 produced similar numbers of metastases. However, the pattern of Giemsa-stained marker chromosomes was more variable in clone F10-2. These data are compiled in Tables 5 and 6, where the karyotypic heterogeneity (total entropy) and its rate of increase have been calculated. Although the two clones had similar metastatic abilities, clone F10-2 was more karyotypically unstable than clone F10-5. These data are consistent with the view that the selection of tumor cells with a growth advantage is more dependent upon the accumulation of certain genetic changes than on the rate of their production.

We have considered how the rate of occurrence of genetic abnormalities might influence neoplastic development. Another feature, not yet considered, is the degree of selection for those cells with growth advantage.

TABLE 5
Development of Marker Chromosomal Changes in Clone F10-2

Marker | a | b | c | d | e | f | g | h | i
No. of cells with 0 copies | 8 | 15 | 0 | 25 | 29 | 29 | 29 | 29 | 27
No. of cells with 1 copy | 14 | 13 | 0 | 5 | 1 | 1 | 1 | 1 | 3
No. of cells with 2 copies | 8 | 2 | 30 | 0 | 0 | 0 | 0 | 0 | 0
Entropy/marker | 1.53 | 1.28 | 0 | 0.65 | 0.21 | 0.21 | 0.21 | 0.21 | 0.47

Total entropy: 4.78 bits
No. of population doublings: 28.0
Rate of entropy increase: 0.17 bit/doubling
No. of metastases/mouse (mean and range): 18 (10-38)

Thirty metaphase spreads of clone B16 F10-2 were examined on Giemsa-stained slides [15], and the number of copies of each metacentric marker chromosome was recorded for each metaphase. Nine distinguishable markers (a-i) were observed, and each metaphase contained from zero to two copies of the individual markers. The entropy for the marker chromosomal pattern was calculated. Since each clone was expanded from one cell through a measured number of cell divisions, the rate of accumulation of entropy was easily computed.
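The karyotypic entropies for clone F10-2 can be recomputed from the copy-number counts with a short Python sketch. Each marker is treated as a symbol with three states (0, 1, or 2 copies) scored over 30 metaphases. The function name is hypothetical. The counts are those read from Table 5; two of them are assumptions made so that every marker sums to 30 cells: marker i is taken as 27/3/0, and marker c is taken as invariant across all cells.

```python
import math


def marker_entropy(cell_counts):
    """Entropy (bits) of one marker from the numbers of cells per copy number."""
    total = sum(cell_counts)
    return sum(
        -(c / total) * math.log2(c / total) for c in cell_counts if c > 0
    )


# (cells with 0 copies, 1 copy, 2 copies) for markers a-i of clone F10-2.
f10_2_counts = {
    "a": (8, 14, 8),
    "b": (15, 13, 2),
    "c": (0, 0, 30),    # assumption: invariant marker, so entropy is 0
    "d": (25, 5, 0),
    "e": (29, 1, 0),
    "f": (29, 1, 0),
    "g": (29, 1, 0),
    "h": (29, 1, 0),
    "i": (27, 3, 0),    # assumption: 27 rather than 21, so counts sum to 30
}

per_marker = {m: marker_entropy(c) for m, c in f10_2_counts.items()}
total_entropy = sum(per_marker.values())

# Each clone started from one cell, so the initial entropy is zero.
population_doublings = 28.0
rate_of_increase = total_entropy / population_doublings
```

Under these assumptions the per-marker values round to 1.53, 1.28, 0, 0.65, 0.21 (four times), and 0.47 bits, the total is about 4.78 bits, and the rate is about 0.17 bit per doubling, in line with the figures reported for this clone.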

TABLE 6
Development of Marker Chromosomal Changes in Clone F10-5

Marker | a | b | c
No. of cells with 0 copies | 0 | 1 | 30
No. of cells with 1 copy | 1 | 29 | 0
No. of cells with 2 copies | 29 | 0 | 0
Entropy/marker | 0.21 | 0.21 | 0

Total entropy: 0.42 bits
No. of population doublings: 29.1
Rate of entropy increase: 0.015 bit/doubling
No. of metastases/mouse (mean and range): 13 (6-29)

Thirty metaphases from clone B16 F10-5 were examined on Giemsa-stained slides [15] as was done with clone F10-2. Only three distinguishable markers (a-c) were observed. The total entropy and the rate of entropy increase were less for clone F10-5 than for clone F10-2, indicating that the karyotypic heterogeneity and karyotypic instability were correspondingly less. Despite the differences in genetic instability as assessed by this means, both clones possessed similar metastatic abilities.

Conceivably the increased prevalence of genetic changes observed in neoplasms might be due to altered selection. Selection has, in the past, not been easily measured. However, information theory provides a means to this end. The presence of selection would be quantitatively indicated by a decreasing entropy ($h' < 0$, calculated from the various subpopulations). The rate at which this decrease occurs would indicate the degree of selective advantage possessed by the subpopulation. Thus it is theoretically possible to measure the influence of selection during neoplastic development.

The mechanisms proposed in the clonal selection hypothesis are most rigorously analyzed by quantitative methods. It is in this pursuit that the adaptability and precision of information theoretic methods can find considerable use. The biological concepts of tumor heterogeneity, random mutation, and nonrandom selection are melded easily with the information theoretic concepts of entropy, noise, and bound information. Given these tools, further experiments could be devised to examine the many aspects of neoplastic development.

CONCLUSION

If cancer is indeed a "malady of genes" [2], then our first task is to diagnose and locate the disorder. The mechanisms that process heritable information are complex; thus to restrict the investigation to only one aspect would risk misdiagnosis.

FIG. 1. The processing of heritable information. Heritable information flows from DNA to RNA and then to protein. There are three levels of processing: replication, transcription, and translation. Two parallel sets of information are simultaneously processed through each level: genetic information, determined by nucleotide sequences, and epigenetic information, which may be determined by mechanisms like DNA methylation or chromatin binding. Epigenetic information specifies the activity of genes and may be monitored through the relative prevalence of the specific gene products. Since information is both additive and (in the absence of noise) conserved, the processing of information at each level and between levels can potentially be monitored using information theory.

[Diagram labels: DNA, RNA, protein; replication, transcription, translation; DNA sequences, RNA species, protein species, amino acid sequences; DNA methylation, chromatin binding, etc.]

Figure 1 diagrams the flow of heritable information during replication, transcription, and translation. Two parallel informational systems are processed: genetic and epigenetic. Although at present it is not technically possible to assess both systems at each level by information-based methods, in theory one could apply the principles of information additivity and conservation to monitor the modulation or loss of information as it is processed. In the clonal selection hypothesis, the primary disorder is located in the replication loop of Figure 1. Only by replication can mutants propagate and be selected. Additional disruptions with either transcription or translation could occur, but these would be of secondary importance, compounding the replicative disorder. The clonal selection hypothesis possesses a seeming incongruity. The definitive features of malignancy-invasion and metastasis-require a complex orchestration of phenotypic changes, inappropriate to normal cellular function [19, 261. It seems remarkable that this complex chain of events could arise by a sequence of random mutations and be orchestrated by environmental selection with any frequency near the observed incidence for malignancy. In fact, certain cancers, for example, retinoblastoma, seem to require only a few mutations for their inception [l, 161. If additional changes were required to express all the complex features of malignancy, then these would have to occur with much greater frequency than is usually expected for somatic mutation; otherwise progression would become improbable. How this could occur without a concomitant increase in the loss of cells due to error catastrophe remains unclear. Alternatively, cancer might result from only a few somatic mutations, with subsequent progression the consequence of imbalanced gene expression. The imbalance might occur within the epigenetic portion of the replicative loop, or perhaps through altered transcription and translation. 
(Purely transcriptional or translational imbalances would not produce heritable changes except by non-Mendelian means, and thus they would probably not have a role in clonal selection.) Epigenetic variations may occur with greater frequency and less selective disadvantage than somatic mutations. Their possible role in neoplasia might be clarified using the methods outlined in the previous sections.

The utility of information theory, perhaps, becomes most apparent when the flow of heritable information through its multiple levels (Figure 1) is considered. We know that in the absence of noise, information should be conserved as it is passed from level to level. A disruption of this transfer could have a considerable effect on a cell population and could conceivably have a role in neoplasia. Granted, there exist considerable technical limitations with this sort of analysis; however, an analysis of certain segments of the informational system might be practical within the available technology. Given the tremendous pace of technology, these limitations could eventually be surmounted, allowing a more detailed analysis of hereditary mechanisms.
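
The statement that information is conserved in the absence of noise, and lost when transfer is disrupted, can be illustrated with a small numerical sketch. The model below is an assumption made only for illustration and is not taken from the analysis above: each of the four bases is copied correctly with probability 1 - error_rate, and otherwise replaced by one of the other three bases with equal probability. The function names `entropy` and `transmitted_information` and the error rates shown are hypothetical.

```python
import math


def entropy(probs):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def transmitted_information(source, error_rate):
    """Information (bits per symbol) that survives one noisy transfer step.

    Assumed model (illustrative only): a symmetric channel in which a symbol
    is copied correctly with probability 1 - error_rate and is otherwise
    replaced by any one of the remaining symbols with equal probability.
    """
    n = len(source)
    wrong = error_rate / (n - 1)  # probability of each specific miscopy
    # Distribution of symbols at the output of the transfer step.
    output = [
        sum(p * ((1 - error_rate) if i == j else wrong)
            for i, p in enumerate(source))
        for j in range(n)
    ]
    # For a symmetric channel the entropy added by noise is the same
    # whichever symbol was sent.
    noise = entropy([1 - error_rate] + [wrong] * (n - 1))
    # Transmitted information = output entropy minus noise entropy.
    return entropy(output) - noise


if __name__ == "__main__":
    bases = [0.25, 0.25, 0.25, 0.25]  # assumed equiprobable A, C, G, T
    for rate in (0.0, 0.01, 0.1, 0.75):
        bits = transmitted_information(bases, rate)
        print(f"error rate {rate}: {bits:.4f} bits per base")
```

Under these assumptions a noiseless step passes on the full source entropy, whereas any nonzero error rate reduces the information that reaches the next level. Real replication, transcription, and translation errors are unlikely to be symmetric, so the sketch indicates only the direction of the effect.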

REFERENCES

1 D. J. B. Ashley, The two "hit" and multiple "hit" theories of carcinogenesis, Br. J. Cancer 23:313 (1969).
2 J. M. Bishop, The molecular genetics of cancer, Science 235:305 (1987).
3 L. Brillouin, Science and Information Theory, Academic, New York, 1962.
4 M. A. Cifone and I. J. Fidler, Increasing metastatic potential is associated with increasing genetic instability of clones from murine neoplasms, Proc. Natl. Acad. Sci. USA 78:6949 (1981).
5 M. L. Cleary, T. C. Meeker, S. Levy, E. Lee, M. Trela, J. Sklar, and R. Levy, Clustering of extensive somatic mutations in the variable region of an immunoglobulin heavy chain gene from a human B cell lymphoma, Cell 44:97 (1986).
6 M. Eigen and P. Schuster, The hypercycle: a principle of natural self-organization, Naturwissenschaften 64:541 (1977).
7 S. C. R. Elgin, The formation and function of DNase I hypersensitive sites in the process of gene activation, J. Biol. Chem. 236:19259 (1988).
8 E. Elmore, T. Kakunaga, and J. C. Barrett, Comparison of spontaneous mutation rates of normal and chemically transformed human skin fibroblasts, Cancer Res. 43:1650 (1983).
9 R. Grantham, M. Gautier Gouy, M. Jacobzene, and R. Mercier, Codon usage is a genome strategy modulated for gene expressivity, Nucl. Acids Res. 9:r42 (1981).
10 M. F. Hansen and W. K. Cavenee, Genetics of cancer predisposition, Cancer Res. 47:5518 (1987).
11 G. H. Heppner, Tumor heterogeneity, Cancer Res. 44:2259 (1984).
12 T. Ikemura, Correlation between the abundance of yeast transfer RNAs and the occurrence of the respective codons in protein genes, J. Mol. Biol. 158:573 (1982).
13 W. S. Kendal and P. Frost, Metastatic potential and spontaneous mutation rates: studies with two murine cell lines and their recently induced metastatic variants, Cancer Res. 46:6131 (1986).
14 W. S. Kendal and P. Frost, Pitfalls and practice of Luria-Delbruck fluctuation analysis, Cancer Res. 48:1060 (1988).
15 W. S. Kendal, R.-Y. Wang, T. C. Hsu, and P. Frost, Rate of generation of major karyotypic abnormalities in relationship to the metastatic potential of B16 murine melanoma, Cancer Res. 47:3835 (1987).
16 A. J. Knudsen, Mutation and cancer: a statistical study of retinoblastoma, Proc. Natl. Acad. Sci. USA 68:820 (1971).
17 T. A. Kunkel and L. A. Loeb, Fidelity of mammalian DNA polymerases, Science 213:765 (1981).
18 H. Land, L. F. Parada, and R. A. Weinberg, Cellular oncogenes and multistep carcinogenesis, Science 222:771 (1983).
19 L. A. Liotta, Tumor invasion and metastasis: the role of the extracellular matrix, Cancer Res. 46:1 (1986).
20 S. E. Luria and M. Delbruck, Mutations of bacteria from virus sensitivity to virus resistance, Genetics 28:491 (1943).
21 P. C. Nowell, The clonal evolution of tumor cell populations, Science 194:23 (1976).
22 P. C. Nowell, Mechanisms of tumor progression, Cancer Res. 46:2203 (1986).
23 P. H. O'Farrell, High resolution two-dimensional electrophoresis of protein, J. Biol. Chem. 250:4007 (1975).
24 V. Pearce, S. Pathak, D. Mellard, D. R. Welch, and G. L. Nicholson, Chromosome and DNA analysis of rat 13762NF mammary adenocarcinoma cell lines and clones of different metastatic potentials, Clin. Exp. Metastasis 2:271 (1984).
25 B. Ponder, Gene lesions in human tumors, Nature 335:400 (1988).
26 G. Poste and I. J. Fidler, The pathogenesis of cancer metastasis, Nature 283:139 (1980).
27 A. Razin and A. D. Riggs, DNA methylation and gene function, Science 210:604 (1980).
28 C. E. Shannon, in The Mathematical Theory of Communication, C. E. Shannon and W. Weaver, Eds., University of Illinois, Urbana, 1949, Chap. 2.
29 P. M. Sharp and W.-H. Li, Codon usage in regulatory genes in Escherichia coli does not reflect selection for "rare" codons, Nucl. Acids Res. 19:7737 (1986).
30 I. B. Weinstein, The origins of human cancer: molecular mechanisms of carcinogenesis and their implications for cancer prevention and treatment, Cancer Res. 48:4135 (1988).
