Neural Network Approach to Detection of Metastatic Melanoma from Chromatographic Analysis of Urine

M. E. Cohen*, D. L. Hudson#, P. W. Banda&,#, M. S. Blois#

California State University, Fresno, University of California, San Francisco, San Jose State University

Abstract

Chromatographic analysis of sera or urine is important in medicine for the evaluation of patients whose clinical status is associated with the presence of specific biochemical markers. Malignant melanoma has been a model for such studies due to the elaboration of melanin precursors and pigment as the tumor metastasizes. Computer-assisted methods for categorizing chromatographic data and clinical status are imperative due to the large number of detectable compounds and possible correlations. In addition, computer-based analysis of the data can readily extract patterns that are not obvious by visual inspection. In this paper, we present a neural network analysis of melanoma chromatographic and clinical data that categorizes subjects into normals, NED patients (No Evidence of Disease), and metastatic patients. The set of marker compounds for metastatic disease represents a significant advance over the correlations derived by visual inspection.

Introduction

Chromatographic analysis of physiological fluids is a useful tool in medical applications to detect chemical constituents that may serve as markers for the presence and level of disease, or its absence. In these types of analyses, the objective is to classify the chromatographic results into categories. In the simplest case, this is a two-category problem, indicating either presence or absence of a particular disease, although it may be multi-category, such as determination of one disease out of two or more possibilities. A number of pattern classification techniques have been applied in an attempt to automate this process. Albano et al. [1] and Wold and Johansson [2] utilized a technique called the SIMCA method, which involves fitting a hyperplane to each class and has since been used in a number of applications [3,4]. Surveys of other pattern recognition techniques show a wide variety of applications [5,6], including other applications in chemistry [7], as well as numerous applications in medicine [8,9]. In previous work of the authors, a new pattern recognition technique was applied to chromatographic analysis [10,11]. This approach differed from the SIMCA method in that it was not restricted to linear hyperplanes in n-dimensional space, but rather used non-linear hypersurfaces for category division using non-statistical methods [12,13]. In the work described here, these methods have been extended to permit the development of a neural network model for classification of chromatographic data.

The neural network approach was originally introduced more than forty years ago [14]. Neural networks are based loosely on the structure of biological nervous systems, and have been shown to be useful in constructing decision models [15]. This approach emphasizes parallel structures and distributed information processing [16]. At the heart of the neural network model is the learning algorithm by which information is extracted from accumulated data. A number of types of learning algorithms have been tried [17-19]. In this paper, a learning algorithm developed by the authors is utilized [20]. It is a non-statistical method based on a potential function approach to supervised learning [21,22]. Through this learning algorithm, the structure of a decision surface is ascertained, which can then be used as a prospective classifier. This method is illustrated in the analysis of chromatographic data for melanoma patient urines that categorizes subjects into two categories: no evidence of disease (NED) and metastatic disease. This application is discussed in detail in the following section.

0195-4210/91/$5.00 © 1992 AMIA, Inc.

Application

A general problem in the management of cancer patients is the detection of occult metastatic disease, ideally at an earlier rather than a later stage of development. Although no general solution appears at hand, the detection of marker compounds associated with certain tumors has provided one approach to the problem. Malignant melanoma, the cancer of the pigment cell, has been an archetype for this approach through the detection of melanin precursors, collectively called melanogens. It has long been recognized clinically that some patients with advanced melanoma void a urine that is dark or that darkens upon standing. This melanogenuria occurs in only a small proportion of melanoma patients, usually terminal, who survive long enough to develop the requisite tumor burden. The occurrence in the blood or urine of compounds that are involved in melanin synthesis (or possibly in pigment hydrolysis) has long provided a theoretical model for an approach to detecting occult metastases, estimating the tumor burden in patients, and providing early evidence of recurrence.

The melanogens are a range of catecholic and indolic compounds, and their substituted derivatives, that appear prominently in the urine of patients with metastatic melanoma. The melanogens are derived from 3,4-dihydroxyphenylalanine (DOPA), the parent compound for the formation of both eumelanin (black) and phaeomelanin (red-brown) pigments. The melanogens include DOPA and its derivatives, such as 3-methoxy-DOPA; the DOPAquinone derivatives, such as 2-S-cysteinyl-DOPA and 5-S-cysteinyl-DOPA; the DOPA decarboxylation products, such as DOPAmine, and the DOPAmine metabolites, such as dihydroxyphenylacetic acid (DOPAC) and homovanillic acid (HVA); and the DOPAquinone cyclization products, the 5,6-dihydroxyindoles and their derivatives.

A variety of analytical procedures have been used to detect these compounds, and a range of correlations between individual melanogens and clinical status have been reported [23-27]. Visual inspection of DPPH chromatograms has provided early evidence for the development of metastatic disease [28]. A previous analysis by visual inspection of the logged data noted that HVA correlated poorly with clinical status [29]. The constituents upon which clinical correlations were based included vanillactic acid, 2-S-cysteinyl-DOPA, 3-methoxy-DOPA, 5-cysteinyl-DOPA, and a 5-hydroxyindole derivative.

Materials and Methods

The procedure that generated the data for the current communication utilized ion-exchange chromatography coupled with post-column detection of suitable reducing compounds via reaction with the stable free radical 1,1-diphenyl-2-picrylhydrazyl (DPPH), a system that is called the DPPH analyzer [30,31]. DPPH reacts with a wide range of electron donors that include the melanogens as well as a number of common urinary constituents. Elution positions for the melanin precursors and the common urinary reactants have been determined, and these compounds form the pool from which the neural network model selects the appropriate melanoma markers. Other chromatographic peaks, including some that may be melanoma-related, are known only by retention time, and are not included in the present analysis. The DPPH analyzer that generated the data for the present communication has been previously described [29]. The system represents a version of our instrumentation that was developed for data logging.

Melanoma urines were obtained from patients being cared for or seen in consultation at the University of California, San Francisco Melanoma Clinic. Urine samples were also received from other institutions (Massachusetts General, New York University, Temple University, UCSF) that were participating at the time in the Malignant Melanoma Clinical Cooperative Group. The patients were classified either as having metastatic disease or as NED by physical examination or conventional laboratory tests and scans. The NED group included both patients at high risk for developing metastases and those who have been surgically "cured" and who are normal for all practical purposes. Urine samples were also obtained from normal volunteers maintained on a constant formula diet, through the cooperation of the Department of Nutritional Sciences, University of California, Berkeley.


Neural Network Model

The neural network approach utilizes a learning algorithm that attaches weights to parameters that may contribute to the decision-making process. An n-dimensional vector $x = (x_1, \ldots, x_n)$ is defined, where each $x_i$ represents a parameter that may be useful in the classification decision. In the application described here, each $x_i$ represents a chemical constituent of the urine. The task is then to use data of known classification to attach a weighting factor to each of these parameters. The learning algorithm utilizes supervised learning techniques [21]. The technique is based on generalized vector spaces, which permit the development of multidimensional non-linear decision surfaces [22].

This learning algorithm has a number of advantages over more traditional back propagation networks: dependent features are easily handled; missing information can be accommodated; convergence of the system is assured. In many applications, a number of the input nodes will represent dependent information. In this model, no assumptions are made regarding independence of parameters. The model described here is also quite robust in handling missing information. The last point, concerning convergence, is especially important. Work in the last decade with recursive systems has shown not only that such systems can propagate error, but that under some circumstances they will produce chaotic behavior [32]. It can be shown theoretically that the method used here will not result in divergence or chaos.

The basic learning algorithm is:

Read in values for input nodes;
Compute value $P_1$.
Until no changes
    Compute $P_i$
    IF $P_i > 0$ and class 1, no change
    IF $P_i < 0$ and class 2, no change
    IF $P_i > 0$ and class 2, or $P_i < 0$ and class 1, then
        Adjust $P_i$
Output decision hypersurface equation with weighting factors, $D(x) = P_i(x)$.

The method used is a modification of the potential function approach to pattern recognition. The potential function is defined by

$$P(x, x_k) = \sum_{i=1}^{\infty} \lambda_i \, \phi_i(x) \, \phi_i(x_k) \qquad (1)$$

for $k = 1, 2, 3, \ldots$, where $\phi_i(x)$, $i = 1, 2, \ldots$, are orthonormal functions and $\lambda_i$ are non-zero real numbers. $P_1$ is computed by substituting the values from the first feature vector for case 1, $x_1$. Subsequent values for $P_k$ are then computed by

$$P_k = P_{k-1} + r_k \, P(x, x_k) \qquad (2)$$

where

$$r_k = \begin{cases} 1 & \text{if } P_i < 0 \text{ and class 1} \\ -1 & \text{if } P_i > 0 \text{ and class 2} \\ 0 & \text{if correct} \end{cases}$$

The orthonormal functions can in fact be replaced by orthogonal functions, since a normalizing factor does not affect the final relative outcome. These functions are chosen from the set of multidimensional orthogonal functions developed by Cohen, represented by:

[Equation (3), the definition of $C_n(x_1, \ldots, x_m)$, is illegible in the source. The surviving fragments involve the factorial terms $m!$, $(m-k)!$, $(n-k)!$ and $n!(m-n)!$, an alternating sign $(-1)^k$, coefficients $a(n, i_p)$, and the values $v_j$.]

where m is the dimensionality of the data, $a_i$, $i = 1, \ldots, k$, are parameters which may be arbitrarily selected, A is the normalization constant, and $v_i$, $i = 1, \ldots, m$, are assigned values corresponding to the components of the first feature vector. The general form of the decision function is

$$D_i(x) = \sum_{i=1}^{n} w_i x_i + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ i \neq j}}^{n} w_{ij} \, x_i x_j \qquad (4)$$

where n is the number of input nodes. This is the simplest non-linear case. It should be noted that higher order equations can also be generated.

Chromatographic Analysis

In all, 144 urine samples were analyzed. These included a set of sixty-six normal controls. The melanoma patients were divided into two groups: NED patients and those with clear evidence of metastasis. The three categories of subjects were labelled as follows:

Class 1: normal controls
Class 2: NED patients
Class 3: metastatic patients

The chromatograms were internally aligned using the common catabolite, uric acid, as a reference marker. A specific peak was determined to be present if it eluted within a timed interval on either side of the elution time for the known compound. For the purpose of the analysis described here, fifteen peaks were identified, as follows:

Variable   Chemical Compound
x1         3-Indoxylsulfate
x2         Ascorbic Acid
x3         Uric Acid
x4         Dihydroxymandelic Acid
x5         Vanilmandelic Acid
x6         Dihydroxyphenylacetic Acid
x7         Vanillactic Acid
x8         Homovanillic Acid
x9         Dihydroxyphenylalanine
x10        3-Methoxytyrosine & 5-S-Cysteinyl-DOPA
x11        5-Hydroxyindole-3-acetic Acid
x12        2-S-Cysteinyl-DOPA
x13        5-Hydroxyindole-2-carboxylic Acid
x14        5-Hydroxytryptophan
x15        Creatinine

The peak area was determined for each of the fifteen known constituents in all of the logged samples. Each peak was then divided by the corresponding maximum, leaving a data set ranging between 0 and 1, inclusive, with 0 indicating no peak present for that constituent, and 1 indicating the maximum value. Thus the values between 0 and 1 represent a degree of presence for that peak. The neural network model was then run to determine the relative importance of each peak in separating samples with the following comparisons:

1: Group 1 versus Group 3
2: Group 2 versus Group 3
3: Group 1 versus Group 2

The goal of this analysis of DPPH-detectable urinary constituents is to develop a reliable procedure to determine whether a patient has metastatic disease earlier than can be assessed by conventional clinical methods, or that may be undetected by such methods.

Results and Discussion

The first comparison considered was Group 1 (normals) versus Group 3 (melanoma patients with metastatic disease). For this comparison, the neural network model selected four variables: x7, x8, x9, x10. Classification results are shown in Table I. The use of these four variables produces a clear separation of the two groups. Note that one of the discriminators selected is HVA, a compound that was previously cited as correlating poorly with metastatic disease when considered by itself [29]. However, when combined with other compounds by the neural network model, the unique set of markers, including HVA, becomes an important discriminator of metastatic disease. Such a result could not have been anticipated by visual inspection of the data. These findings regarding the value of HVA as a melanogen illustrate the power of an integrated neural network analysis as a decision tool for clinical evaluations.

The second comparison was Group 2 (NED patients) versus Group 3 (metastatic patients). For this comparison, the neural network model selected five variables as the best discriminators: the four variables from above, as well as x6. These results are shown in Table II. As might be expected, while the separation remains high, it is not as strong as the Group 1 versus Group 3 comparison. There is the possibility that some metastatic disease was in fact present in Group 2 but was not clinically detectable at the time. The Group 2 versus Group 3 comparison is particularly important for the potential detection of occult metastatic disease. The selection of a distinct set of marker compounds for this comparison is also an outcome that is not anticipated by visual inspection of the chromatograms.

The comparison of Groups 1 and 2 showed little separation, as would be expected if Group 2 were clinically normal at this point in time, although some differences are evident between these groups.

A number of continuing studies are planned using this approach to analyze the melanoma urine chromatograms. First, the same data analysis procedures that were described here will be utilized, but instead of the fifteen known peaks listed above, all of the chromatographic data will be included in the analysis. There is preliminary evidence that some of the additional peaks are important discriminators. Their identification is under investigation. Secondly, a thorough follow-up will be undertaken to determine the clinical status of the Group 2 (NED) patients. Finally, we are planning a study based on additional urine samples in order to refine the models utilized in this work, with the goal of developing a reliable model for clinical evaluations.

TABLE I: GROUP 1 VERSUS GROUP 3

           Correct   Incorrect   Percentage
Class 1       61         5          92.4
Class 2       14         2          87.5
Total         75         7          91.5

TABLE II: GROUP 2 VERSUS GROUP 3

           Correct   Incorrect   Percentage
Class 1       20         4          83.3
Class 2       12         4          75.0
Total         32         8          80.0
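The percentages reported in Tables I and II follow directly from the correct and incorrect counts. The short Python sketch below recomputes them as a consistency check. The function name `accuracy_table` and the dictionary layout are illustrative assumptions, not part of the original analysis, and the row labels for Table II are assumed to follow the same order as in Table I.

```python
def accuracy_table(rows):
    """Return percent correct per row, plus a pooled 'Total' entry.

    rows maps a row label to a (correct, incorrect) pair of counts.
    """
    result = {}
    total_correct = 0
    total_incorrect = 0
    for label, (correct, incorrect) in rows.items():
        result[label] = round(100.0 * correct / (correct + incorrect), 1)
        total_correct += correct
        total_incorrect += incorrect
    pooled = 100.0 * total_correct / (total_correct + total_incorrect)
    result["Total"] = round(pooled, 1)
    return result


# Counts copied from Table I (Group 1 versus Group 3).
table_1 = {"Class 1": (61, 5), "Class 2": (14, 2)}
# Counts copied from Table II (Group 2 versus Group 3); label order assumed.
table_2 = {"Class 1": (20, 4), "Class 2": (12, 4)}

print(accuracy_table(table_1))  # {'Class 1': 92.4, 'Class 2': 87.5, 'Total': 91.5}
print(accuracy_table(table_2))  # {'Class 1': 83.3, 'Class 2': 75.0, 'Total': 80.0}
```

The recomputed values agree with the tabulated percentages, and the totals are pooled over all samples in each comparison rather than averaged over the two classes.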

Evaluation

The neural network approach to medical decision making appears to be promising. This method differs inherently from the knowledge-based approach in that information is extracted directly from data, rather than from expert input. This has both advantages and disadvantages. The major benefit is the ease with which a new application can be established, since the long, difficult process of knowledge elicitation is bypassed. The approach also facilitates the establishment of models that depend on information inherent in the data which may not previously have been explicitly known. On the other hand, the method does not take advantage of expert knowledge which may not be present in the data, and the resulting system does not have the explanation capabilities of a knowledge-based system. The most promising approach appears to be the development of hybrid systems which combine both the neural network approach and the knowledge-based approach to take full advantage of all sources of information [33].


Certain criteria, however, point to the applicability of the neural network model alone. These are applications such as the one described above in which little expert knowledge exists which is relevant to the decision at hand and for which a large data base of cases of known classification is present or the ability exists to acquire such a data base. The model is useful for any decision process in which reliable data are available. The data can consist of binary, categoric, or continuous input values, and either two-category or multi-category problems can be handled. The use of this neural network approach for analysis of urine chromatograms from patients with melanoma has resulted in a model which identifies previously unknown markers for the presence of metastatic disease. Extension of the model offers the possibility of developing a prospective decision aid which can be used by the clinician as a good indicator of the current state of metastatic disease in each patient at a much earlier stage than previous methods, which in turn can result in specific treatment before the disease advances to a stage at which treatment is ineffective.
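As a rough illustration of the kind of learning rule described under Neural Network Model, where the accumulated potential is updated as P_k = P_{k-1} + r_k P(x, x_k) until a pass over the data produces no changes, the following Python sketch may be helpful. It is not the authors' implementation: the potential function here is a simple inner product of second-order feature terms (each x_i and each product x_i x_j), swapped in for the Cohen orthogonal functions, and the names `features`, `potential`, `cumulative`, `train`, and `classify`, the `max_passes` safeguard, and the sample values are all hypothetical.

```python
from itertools import combinations


def features(x):
    """Terms of a second-order surface: each x_i, then each x_i * x_j (i < j)."""
    cross = [x[i] * x[j] for i, j in combinations(range(len(x)), 2)]
    return list(x) + cross


def potential(x, xk):
    """Stand-in for P(x, x_k): inner product of the two feature maps.

    Assumption: the paper builds P from Cohen's orthogonal functions;
    this simpler form is used only for illustration.
    """
    return sum(a * b for a, b in zip(features(x), features(xk)))


def cumulative(corrections, x):
    """Current value of the accumulated potential at x."""
    return sum(r * potential(x, xk) for r, xk in corrections)


def train(samples, labels, max_passes=100):
    """Repeat passes over the data until one pass makes no correction.

    Labels use 1 for class 1 (target P > 0) and 2 for class 2 (target P < 0).
    Returns the list of (r_k, x_k) pairs that defines the decision surface.
    """
    corrections = []
    for _ in range(max_passes):
        changed = False
        for x, label in zip(samples, labels):
            p = cumulative(corrections, x)
            # Assumption: P == 0 counts as an error so that training can start.
            if label == 1 and p <= 0:
                corrections.append((1, x))
                changed = True
            elif label == 2 and p >= 0:
                corrections.append((-1, x))
                changed = True
        if not changed:
            break
    return corrections


def classify(corrections, x):
    """Class 1 if the accumulated potential is positive, otherwise class 2."""
    return 1 if cumulative(corrections, x) > 0 else 2


# Hypothetical normalized peak values (two constituents per sample).
samples = [(0.9, 0.1), (0.8, 0.2), (1.0, 0.0), (0.1, 0.9), (0.2, 0.7), (0.0, 1.0)]
labels = [1, 1, 1, 2, 2, 2]
model = train(samples, labels)
print([classify(model, s) for s in samples])
```

Because the stand-in potential is an inner product of feature terms, the accumulated potential is a weighted sum of the x_i and x_i x_j terms, which has the same general shape as the second-order decision function in the text. The `max_passes` limit is only a guard for this sketch and says nothing about the convergence properties claimed for the original method.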

Acknowledgments

We wish to recognize the assistance of the following in the data collection for the chromatographic samples: Arletta E. Sherry, Mark S. Tuttle, and Larry Selmer. We gratefully acknowledge the contribution of the late Dr. Marsden S. Blois, who devoted many years to the establishment of the Melanoma Clinic and the Melanoma Data Base at UCSF.

References

1. C. Albano, et al., Anal. Chim. Acta, 103 (1978) 429.
2. S. Wold, et al., Anal. Chim. Acta, 133 (1981) 251.
3. G. Blomquist, et al., J. Chromatog., 173 (1979) 19.
4. E. Jellum, et al., J. Chromatog., 217 (1981) 231.
5. E.A. Patrick, Systems, Man, Cybern. Rev., 6 (1977) 4.
6. E.H. Shortliffe, B.G. Buchanan, E.A. Feigenbaum, Proc. IEEE, 67 (1979) 1207.
7. D. Coomans, et al., Anal. Chim. Acta CTO, 133 (1981) 215.
8. M. Ben-Bassat, D.B. Campbell, et al., IEEE Trans. on Pattern Analysis and Machine Intelligence, PAMI-5 (1983) 225.
9. E.A. Patrick, Systems, Man, and Cybern. Rev., 6 (1977) 4.
10. M.E. Cohen, D.L. Hudson, et al., J. Chromatog., 382 (1987) 145.
11. M.E. Cohen, D.L. Hudson, Chromatographia, 24 (1987) 891.
12. M.E. Cohen, D.L. Hudson, Lecture Notes in Computer Science, Springer-Verlag, 286 (1987) 245-254.
13. M.E. Cohen, D.L. Hudson, J.J. Touya, P.C. Deedwania, MEDINFO (1986) 614-618.
14. F. Rosenblatt, Principles of Neurodynamics, Perceptrons, and the Theory of Brain Mechanisms, Spartan, Washington, 1961.
15. B. Kosko, International Journal of Approximate Reasoning, 2 (1988) 377-393.
16. B. Widrow, R. Winter, Computer, 21, 3 (1988) 152-169.
17. D.E. Rumelhart, J.L. McClelland, and the PDP Research Group, Parallel Distributed Processing, vols. 1 and 2, MIT Press, Cambridge, 1986.
18. J.W. Smith, et al., Computer Applications in Medical Care, 12 (1988) 261-265.
19. D.B. Parker, Proc. American Institute of Physics, Neural Networks for Computing, 1986.
20. D.L. Hudson, M.E. Cohen, Computer Applications in Medical Care, 12 (1988) 251-255.
21. M.E. Cohen, D.L. Hudson, M.F. Anderson, Computer Applications in Medical Care, 13 (1989) 307-311.
22. M.E. Cohen, D.L. Hudson, M.F. Anderson, IEEE Engineering in Biology and Medicine, 1989, 1991-1992.
23. J. Duchon and R. Matous, in Pigment Cell, vol. 1 (S. Basel, A.G. Harger, Eds.) (1973) 317-322.
24. M.L. Voorhess, Cancer, 26 (1970) 146-149.
25. H. Hinterberger, A. Freedman and R.J. Bartholomew, Clin. Chim. Acta, 39 (1972) 395-400.
26. G. Agrup, et al., Acta Dermatol., 55 (1975) 337-341.
27. S. Ito and K. Wakamatsu, J. for Investigative Dermatology, 92 (5) (1989) 261S-265S.
28. M.S. Blois and P.W. Banda, Cancer Research, 36 (1976) 3317-3323.
29. P.W. Banda, M.S. Tuttle, L.E. Selmer, Y.T. Thatachari, A.E. Sherry and M.S. Blois, Computers and Biomedical Research, 13 (1980) 549-566.
30. P.W. Banda, A.E. Sherry and M.S. Blois, Anal. Chem., 46 (1974) 1772-1777.
31. P.W. Banda, A.E. Sherry and M.S. Blois, in Pigment Cell, vol. 2 (V. Riley, Ed.) (1976) 254-263.
32. F.C. Hoppensteadt, Intermittent chaos, self-organization, and learning from synchronous synaptic activity in model neuron networks, Proc. Natl. Acad. Sci. USA, 86 (1989) 2991-2995.
33. D.L. Hudson, M.E. Cohen, Combination of rule-based and connectionist expert systems, International Journal of Microcomputer Applications, 10 (2) (1991) 36-41.
