
Exploring Early Glaucoma and the Visual Field Test: Classification and Clustering Using Bayesian Networks

Stefano Ceccon, David F. Garway-Heath, David P. Crabb, and Allan Tucker

Abstract—Bayesian networks (BNs) are probabilistic models used for classification and clustering in several fields. Their ability to deal with unobserved variables and to integrate data and expert knowledge makes them an appropriate technique for modeling eye functionality measurements in glaucoma. In this study, a set of BNs is used to simultaneously classify early glaucoma and cluster data into different stages of the disease. A novel learning algorithm that combines clustering and quasi-greedy search is also proposed. The classification performance of the models is evaluated on an independent dataset, while the clusters are compared with K-means, previous publications, and direct expert knowledge. The use of clustering and structure learning enabled the exploration of the visual field patterns of the disease while obtaining good results on both pre- (50% sensitivity at 90% specificity) and post- (85% sensitivity at 90% specificity) diagnosis data. The clusters obtained were insightful and in conformity with consolidated knowledge in the field.

Index Terms—Bayesian networks (BNs), clustering, glaucoma, simulated annealing, visual field (VF).

I. INTRODUCTION

Bayesian networks (BNs) are probabilistic graphical models [1]. They have recently shown excellent modeling properties [2] and are widely used in many different fields with good results, both in terms of classification performance and for gaining insight into data structure and the relations between variables. The flexibility of BNs and the intuitive nature of their results make them suitable models for many applications. Moreover, the ability of BNs to deal with missing data and unobserved variables makes them useful for investigating poorly understood medical or bioengineering problems.

Glaucoma is a major eye disease and the second cause of blindness worldwide [3]. Its mechanisms are not completely known, and early medication is still considered the best management available [4], [5].

Manuscript received November 7, 2012; revised April 13, 2013; accepted October 24, 2013. Date of publication November 21, 2013; date of current version May 1, 2014. S. Ceccon and A. Tucker are with the Department of Information Systems and Computing (DISC), Brunel University, London, UB8 3PH, U.K. (e-mail: [email protected]; [email protected]). D. F. Garway-Heath is with the National Institute for Health Research (NIHR) Biomedical Research Centre for Ophthalmology, London, EC1V 2PD, U.K. (e-mail: [email protected]). D. P. Crabb is with the Department of Optometry and Visual Science, City University London, London, EC1V 0HB, U.K. (e-mail: david.crabb.1@city.ac.uk). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/JBHI.2013.2289367

Fig. 1. Glaucoma-impaired vision. Left: simulated view of a street scene as seen by glaucomatous people with a superior hemifield defect. Right: visual field sensitivity map for a patient with a deep superior hemifield defect (Source: Department of Optometry and Visual Science, City University London, EC1V 0HB, London, U.K.).

In this context, several tests are used to estimate functional or anatomical measurements of the eye; in turn, these produce large amounts of data that are not fully exploited. The most common test for glaucoma is the visual field (VF) test, which measures the functional ability of an eye by presenting stimuli of differing intensity at various locations of the patient's VF. Using the patient's responses to the stimuli, it is possible to build a sensitivity map of the patient's field of view, which can then be assessed for diagnosis (see Fig. 1, right). Diagnostic performance for early glaucoma is still insufficient, owing to inter- and intrasubject variability and the high impact of noise in the data. Moreover, there is no gold-standard definition of glaucoma, so different metrics introduce biases into the diagnosis. Thus, in clinical practice the decision is often down to the clinician's interpretation of the test results, i.e., the available data are not fully exploited. Machine learning (ML) techniques may be of great help to the clinician by analyzing the data and providing a reasonable diagnostic outcome, as well as by discovering VF patterns. Several ML techniques have been applied to this task with encouraging results [6]–[9]. The performance of BNs on glaucoma data has already been investigated in [10] and [11]. In [11], simple BNs outperformed widely used classification models on the same dataset used in this study, although using aggregated data.

In this paper, we propose a set of BN models to investigate clustering in glaucoma. The configuration of these models allows classification to be performed while simultaneously clustering the data into different stages of disease. A novel learning algorithm that combines clustering and quasi-greedy search is also proposed. The classification performance of the models is tested on an independent dataset, while the clusters are compared with K-means results, previous publications, and direct expert knowledge.

2168-2194 © 2013 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.


Fig. 2. Left: Naïve Bayes network. Right: clustering Naïve Bayes network. Squares represent class (C) and hidden (H) nodes, while ellipses represent VF sensitivity nodes (V). Arcs encode relations between variables, e.g., the AGIS-based classification (node C) is dependent upon VF sensitivities.

II. MATERIALS AND METHODS

A. Data

In this study, two independent datasets were used. In both, the data consist of the 52-point VF raw sensitivity values (i.e., not age-corrected) obtained with the Humphrey Field Analyzer II (Carl Zeiss Meditec, Inc., Dublin, CA). A VF map is shown in Fig. 1, together with the corresponding simulated vision impairment. A cross-sectional dataset [12] was used as the training set. Patients included as early glaucomatous (85 subjects/VFs) had intraocular pressure (IOP) > 21 mmHg and clinically evident early VF defects repeatable over at least three visits. Healthy subjects (78 subjects/VFs) had IOP < 21 mmHg and a normal VF, defined as one with no point below the threshold given in the Advanced Glaucoma Intervention Study (AGIS) classification [13]. Early VF defects were defined as those with a score of 5 or less in the AGIS classification. The dataset used for testing the models is a longitudinal dataset of 43 patients (474 VFs) from an ocular hypertension treatment trial [14] who developed glaucoma in the time span observed ("converters"), together with 19 healthy subjects (163 VFs). Initial eligibility criteria for the ocular hypertensive subjects were IOP > 21 mmHg and a normal VF test, and conversion was defined as a positive AGIS score on three consecutive tests. Healthy subjects had an AGIS score of 0 in two baseline tests and IOP < 21 mmHg; there was no stipulation for subsequent VFs.

B. Bayesian Networks

BNs are probabilistic graphical models that encode variables and their relations in a set of nodes and arcs [1]. The structure of a BN includes a set of arcs, which represent dependencies between variables in a "parent–child" manner (see Fig. 2). Directed cycles are not permitted in the structure of a BN. Each node is associated with a variable and its parameters. Each set of parameters describes the conditional probability distribution (CPD) of a variable, i.e., the probability of the values of the variable given the values of its parent nodes. The set of all the CPDs of the BN provides an efficient factorization of the joint probability

p(\mathbf{x}) = \prod_{i=1}^{n} p(x_i \mid \mathrm{pa}_i)

where pa_i are the parents of node x_i (which denotes both the node and its variable). Therefore, when both the structure and the parameters are known, it is possible to infer the value of each variable given the values of the others. For example, when performing classification, the values of the class node are inferred from the observed data. To set the parameters for a given structure, they can be estimated from the available data. The problem is solved by maximizing the posterior probability of the parameters given the data, i.e.,

\bar{\theta} = \arg\max_{\theta} p(\theta \mid D, S) = \arg\max_{\theta} p(D \mid S, \theta)\, p(\theta \mid S)

where S is the structure of the network, D is the data, and θ is the parameter vector. With a fixed structure S and a uniform prior distribution over the parameters, the latter equation simplifies to maximizing the likelihood function p(D | S, θ).

The structure of the network can be imposed when external knowledge is available or certain assumptions are made. For example, the Naïve Bayes classifier is a BN model in which all variables are connected to a common parent node but not to each other (see Fig. 2, left). Even though the assumption of independence between the leaf nodes is very strong, this model performs better than more complex models in many applications because of the lower variance of its probability estimates [15]. In [11], a Naïve BN outperformed linear regression, K-nearest neighbor (KNN), and multilayer perceptron classifiers on aggregated glaucoma data.

The structure of a BN can also be learned from data, often leading to interesting insights and better performance. To learn the structure from the data, a search-and-score approach is typically employed, i.e., the space of possible structures is searched iteratively and the best-scoring solution is selected. A widely used search technique is the simulated annealing (SA) algorithm [16], which performs a quasi-greedy search by allowing nonoptimal solutions to be explored in order to escape local minima. Again, a likelihood-based score is often used, although it is necessary to take the complexity of the model into account: since the likelihood function increases monotonically with the number of parameters, overfitting and estimation problems arise if a purely likelihood-based score is used. A common solution is the Bayesian Information Criterion (BIC) [17], a likelihood-based score with a correction factor that penalizes model complexity.

Clustering corresponds to dividing the data into groups so that items in the same group are as similar as possible and items in different groups are as dissimilar as possible. One of the most widely used partitional clustering algorithms is K-means [8], which uses the distance between data points (here, sensitivity values) to identify the centroids of a given number of clusters.
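As a concrete illustration of this baseline, the following is a minimal sketch assuming NumPy and scikit-learn are available; the 52-point VF matrix is a synthetic stand-in for the real sensitivity data, and the choice of four clusters simply mirrors the number of clusters examined later in this study.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Synthetic stand-in for the 52-point VF sensitivity vectors (in dB), one row per test.
vf = rng.normal(loc=28.0, scale=4.0, size=(163, 52))

# Partition the VFs into four clusters by Euclidean distance to the learned centroids.
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(vf)
print("cluster sizes:", np.bincount(km.labels_))
print("centroid of cluster 0, first 5 locations (dB):", km.cluster_centers_[0, :5].round(1))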


C. Clustering BNs

BNs offer a framework in which both classification and clustering can be performed. To use a BN for clustering, the classification variable (C) is typically treated as an unobserved variable (see Fig. 2, left). The data matrix then treats the unobserved variable as missing data, which can be inferred using the model. To perform this task, the expectation–maximization (EM) algorithm is typically used [18]. In glaucoma, the classical approaches are either to perform clustering only on glaucomatous subjects or to discard the class information and perform clustering on the whole dataset [7]. However, clustering only on glaucomatous patients is not useful for classification because it does not take into account the condition of healthy subjects. On the other hand, clustering on the whole dataset often results in rediscovering the apparent differences between healthy and glaucomatous subjects, independently of the class label.

We propose here a new BN configuration, which aims to gain performance and insight by performing classification while simultaneously maximizing the informative power of clustering. We refer to these models as clustering BNs. The right graph in Fig. 2 shows the general configuration of clustering BNs. A class node (C) is linked to all variables, which in turn also depend on the clustering node (the hidden variable H). This node is treated as unobserved. Considering the dependencies between the variables in the model, the clustering node and the class node are marginally independent. However, it can be shown that they become conditionally dependent given the observed VF variables [19]. This "explaining away" effect is well documented and arises when two or more variables (i.e., parents) converge on an observed one (i.e., a child). Given the observed child, observing the value of one parent changes the belief in the other parents; intuitively, the parents compete to explain the observed value. In other words, knowing that a data point belongs to a cluster acts on the probabilistic output for the classification node and, in the same way, knowing the class value of the data changes the clustering as well. Since the class node represents the AGIS-dependent "glaucoma" label, the variance associated with the AGIS label is "explained away" by the class node. The clustering uses the remaining variance to separate the data without regard to the AGIS score, which in turn has an impact when new data are classified.

The rationale is that there may be other factors, shared or not by healthy and glaucomatous people, competing to explain the observed values. The hidden variable describes these factors and uses them to perform a better classification. For example, there may be particular VF patterns shared by certain people that are particularly helpful for indicating glaucoma in that group. Capturing this difference and using it to calculate the probabilistic classification output could lead to better classification performance. To summarize, clustering BNs are able to characterize the different clusters of people who share patterns and relations not directly dependent on the scoring metric in use, while at the same time exploiting this information in the classification process. Moreover, it is possible to extract the added utility by building a "performance" index for each cluster, which can be used directly in clinical practice.

A Naïve BN, K-means, and three clustering BNs were investigated in this study. The three clustering BNs are described as follows.

1) Clustering Naïve Bayes: This clustering BN corresponds to the model in Fig. 2 (right). No arcs between leaf nodes are allowed in clustering Naïve Bayes (CNB).

2) Structural Expectation–Maximization: A clustering BN that includes arcs between leaf nodes can be obtained using the SA and EM algorithms.
While SA and EM are classical approaches to structure learning and to clustering with BNs, respectively, little has been done to combine these

two techniques. Friedman [20] tackled the matter by combining a model search with the EM algorithm, although in relation to missing data rather than clustering. For this purpose, he proposed a two-step structural EM (SEM) algorithm: in the first step, the model that maximizes the score under the current instantiation of the parameters is chosen, while in the second step the parameters that maximize the score under the current model are selected. In analogy with the EM algorithm, SEM uses the expected values of the unobserved variables while performing the model search. In our implementation of SEM, the parameters were learned using the EM algorithm, while the structural search was implemented as a gradient search. The EM algorithm was limited to a maximum of 15 iterations, the structural operations were set at 150, and about 3000 iterations were carried out overall. The structure was allowed to change only between the leaf nodes, and only certain structural operations were allowed; in particular, adding a new arc or removing an existing arc was executed at random at each iteration.

3) Clustering Simulated Annealing: SEM is one of the most commonly used techniques to deal with missing data and structure learning for BNs, but it is not the only way to obtain clustering BNs with arcs between leaf nodes. In this study, we propose a novel technique that performs clustering and structure learning within an SA framework (clustering simulated annealing, CSA). The difference with respect to SEM lies in treating the unobserved data as observed and using SA to search for both the best clustering and the best structure. The algorithm starts with a random grouping of the data, i.e., data points are assigned to random clusters. At each subsequent iteration, the cluster of one randomly picked patient is changed, until convergence is reached. Note that the EM step is skipped in favor of scoring the current completed data. This leads to a faster model evaluation, although more iterations are needed for the groups of patients to stabilize. While the application of such a technique to BN-based clustering is, to our knowledge, novel, the rationale has been investigated in [21], where clustering was shown to benefit from an annealing approach with respect to other methods. The pseudocode of the CSA algorithm follows:

  Input: Temperature, Cooling Factor, Choice Factor, Model, Data, Max Iterations
  Initialise Score, Best Score
  Loop on iterations until Max Iterations reached
    if (Random number is less than Choice Factor)
      Add, Delete OR Move Random Arc
    else
      Assign Random Data Point to Random Cluster
    end if
    Update Score and store in New Score
    Score Diff is New Score - Score
    if (New Score is higher than Score)
      Update Score, Model and Data
      if (New Score is higher than Best Score)
        Store Model and Data for Output
        Best Score is New Score
      end if
    elseif Random number is less than exp(Score Diff / Temperature)
      Update Score, Model and Data
    end if
    Temperature is Temperature * Cooling Factor
  end Loop
  Output: Data, Model

Here, the input parameters Temperature, Cooling Factor, and Max Iterations regulate the convergence of the algorithm, Model is the starting model, and Data is the training dataset in use. Best Score stores the BIC score of the best model found so far, while Score and New Score hold the BIC scores of the current model and of the model explored at the present iteration, respectively. The latter is evaluated after either an operation on the model structure or a random reassignment of a data point to a cluster; the probability of carrying out one or the other is regulated by the Choice Factor. Note that only arcs between leaf nodes are permitted. If the new score is higher than the current score, the current model, data assignment, and score are updated. If the new solution is the best among those explored, it is also stored in Best Score and kept for the final output. If the new solution is worse than the current one, the temperature factor Temperature regulates whether to accept it as the new current solution; this allows the algorithm to escape local minima, which are a common issue in structure learning of BNs. The temperature is lowered at each iteration through the cooling factor Cooling Factor. After Max Iterations iterations, the algorithm returns the best explored network and cluster configuration.
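For illustration only, the following is a minimal, simplified sketch of the CSA idea in Python (NumPy only), not the authors' implementation: it keeps the clustering Naïve Bayes structure fixed (no arc operations), scores completed data with a BIC-style score, and applies the SA acceptance rule to random cluster reassignments. The data, cardinalities, and parameter values are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy discretised data: N visual fields x V locations, each in {0, 1, 2}
# (e.g., normal / borderline / depressed), plus an observed binary class label.
N, V, K, STATES = 200, 8, 4, 3          # K = number of clusters (hidden node H)
X = rng.integers(0, STATES, size=(N, V))
y = rng.integers(0, 2, size=N)          # AGIS-style class label (node C)

def bic_score(X, y, h, k=K, states=STATES):
    """BIC of a clustering Naive Bayes model scored on completed data:
    each VF location depends on the class node C and the cluster node H."""
    n = len(X)
    loglik, n_params = 0.0, 0
    # Marginal terms for C and H
    for var, card in ((y, 2), (h, k)):
        counts = np.bincount(var, minlength=card)
        probs = (counts + 1.0) / (n + card)             # Laplace smoothing
        loglik += np.sum(counts * np.log(probs))
        n_params += card - 1
    # CPD terms p(V_j | C, H)
    for j in range(X.shape[1]):
        for c in range(2):
            for q in range(k):
                idx = (y == c) & (h == q)
                counts = np.bincount(X[idx, j], minlength=states)
                probs = (counts + 1.0) / (counts.sum() + states)
                loglik += np.sum(counts * np.log(probs))
        n_params += 2 * k * (states - 1)
    return loglik - 0.5 * n_params * np.log(n)

def csa(X, y, iters=5000, temp=50.0, cooling=0.999):
    """Simplified clustering simulated annealing: cluster-reassignment moves only."""
    h = rng.integers(0, K, size=len(X))                 # random initial clustering
    score = best_score = bic_score(X, y, h)
    best_h = h.copy()
    for _ in range(iters):
        new_h = h.copy()
        new_h[rng.integers(len(X))] = rng.integers(K)   # reassign one random data point
        new_score = bic_score(X, y, new_h)
        diff = new_score - score
        if diff > 0 or rng.random() < np.exp(diff / temp):
            h, score = new_h, new_score                 # accept (possibly worse) move
            if score > best_score:
                best_score, best_h = score, h.copy()
        temp *= cooling                                 # lower the temperature
    return best_h, best_score

clusters, best = csa(X, y)
print("best BIC:", round(best, 1), "cluster sizes:", np.bincount(clusters, minlength=K))

In the full algorithm described above, a second move type adds, deletes, or moves an arc between leaf nodes with probability governed by the Choice Factor, and the score is then recomputed for the modified structure.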


D. Clusters Evaluation

To quantitatively measure the quality of the clusters, the mean silhouette index [22] was used in this study. This measure compares cluster tightness and separation; the difference between sensitivity values was used to measure the dissimilarity between elements. The mean silhouette value is close to 1 when elements are appropriately clustered and close to -1 when they are assigned to the least appropriate clusters. The similarity between different clustering techniques was assessed using Cohen's kappa index [23], which measures the agreement between two clusterings from their observed agreement (i.e., elements assigned to the same cluster) and the probability of chance agreement. The kappa value is 1 when there is complete agreement and 0 when the agreement is no better than chance. Kappa does not take into account the qualitative similarity between the clusters obtained.
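To make the two evaluation measures concrete, the following is a minimal sketch assuming scikit-learn is available; the VF matrix and the two sets of cluster labels are synthetic stand-ins for the outputs of the clustering techniques above, and the kappa computation assumes the cluster indices of the two techniques have been matched beforehand.

import numpy as np
from sklearn.metrics import silhouette_score, cohen_kappa_score

rng = np.random.default_rng(0)
vf = rng.normal(28.0, 4.0, size=(163, 52))   # stand-in VF sensitivity vectors
labels_a = rng.integers(0, 4, size=163)      # cluster labels from one technique
labels_b = rng.integers(0, 4, size=163)      # cluster labels from another technique

# Mean silhouette over all points: close to +1 for tight, well-separated clusters,
# close to -1 when points sit in the least appropriate cluster.
print("mean silhouette:", round(silhouette_score(vf, labels_a, metric="euclidean"), 3))

# Cohen's kappa: chance-corrected agreement between the two labelings.
print("kappa:", round(cohen_kappa_score(labels_a, labels_b), 3))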

III. RESULTS

In [11], it was shown that BNs improve glaucoma classification performance with respect to nearest neighbor (KNN), multilayer perceptron, and logistic regression classifiers. The same datasets were used for the present experiments, although here the data were not aggregated into six sectorial variables before the analysis. In this study, pointwise BN models for classification and clustering were assessed.

TABLE I PARTIAL AUROC (SPECIFICITY > 80%) FOR BAYESIAN CLASSIFIERS (SEE FIG. 3)

The results confirmed the high performances of BN classifiers already found on aggregated data, with a sensitivity of 80% at 90% specificity in discriminating between post-diagnosis glaucomatous patients and healthy subjects. Although no significant difference was found over the entire ROC curve, the results seem to confirm the higher performance at high specificities of CSA, SEM, and CNB over NB and KNN. This is shown in Table I, where the partial area under the receiver operating characteristic (ROC) curve (AUROC) values are calculated over the more clinically meaningful part of the ROC curve [24]. As suggested in [25], where ROC curves cross, global AUROC comparisons may not be indicative; moreover, false positives lead to overtreatment and unnecessary cost in glaucoma management, so only high specificities are of interest. The table shows a consistent improvement in performance with the inclusion of the clustering process and structure learning when compared with the Naïve Bayes classifier. The best classifiers are SEM and CSA, which perform similarly to each other. On the prediagnosis data, the BN models still performed well, obtaining about 50% sensitivity at 90% specificity. That is, the BN models were able to classify as glaucomatous 50% of the VFs of patients who were subsequently diagnosed with glaucoma, with a false classification rate of 10% of the VFs from healthy subjects.

Regarding the clusters obtained with the different techniques, K-means clustering obtained a higher mean silhouette value (0.19) than the other models (0.09 ± 0.01). Regarding the similarity among the clusters of the BN models measured with Cohen's kappa, CSA, CNB, and SEM obtained fairly similar clusters (0.31 ± 0.02). A qualitative representation of the clusters obtained with CSA is shown in Fig. 3. The upper section shows the VF maps learned for each cluster. These were obtained by sampling from the conditional distribution p(X | C_i), where C_i is the observed cluster label (i = 1, ..., 4) and X is a vector of variables representing each point in the VF map. In the lower section, the distribution of glaucomatous and healthy subjects over the clusters in the training dataset is shown for each technique. While relatively uniform distributions were obtained with the BN clustering techniques, K-means obtained sparse clusters that effectively separated glaucomatous and healthy subjects. In this context, Fig. 4 shows the qualitative results and Table II reports the performance obtained on each cluster; the pre- and post-diagnosis figures indicate the classification performance of the methods in relation to the AGIS-based diagnosis.
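As an illustration of how such operating-point figures can be derived from an ROC curve, the following is a minimal sketch assuming scikit-learn and NumPy; the labels and classifier scores are synthetic stand-ins, and the 80%/90% specificity cut-offs mirror those used in the text.

import numpy as np
from sklearn.metrics import roc_curve

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=300)               # 1 = glaucomatous, 0 = healthy
y_score = y_true + rng.normal(0.0, 0.8, size=300)   # stand-in classifier outputs

fpr, tpr, _ = roc_curve(y_true, y_score)

# Partial AUROC over the clinically relevant region (specificity > 80%, i.e., FPR < 0.2),
# approximated on the available ROC points, and sensitivity at 90% specificity (FPR = 0.1).
mask = fpr <= 0.2
partial_auroc = np.trapz(tpr[mask], fpr[mask])
sens_at_90_spec = np.interp(0.1, fpr, tpr)
print(f"partial AUROC (FPR <= 0.2): {partial_auroc:.3f}")
print(f"sensitivity at 90% specificity: {sens_at_90_spec:.2f}")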


Fig. 3. Above: Clusters obtained with CSA. Below: Distribution of data points for each cluster obtained with CNB, SEM, K-means (K-M), and CSA. On each histogram, the left bar represents the number of healthy VFs and the right bar the number of glaucomatous VFs in a cluster.

Cluster 1, on which the highest discrimination power was found, shows a strong hemifield difference (i.e., a difference between the upper and lower halves of the VF). This is a characteristic pattern of glaucoma, and it is the rationale behind the Glaucoma Hemifield Test, currently implemented in several screening machines [26]. Cluster 4, which also performed well in classification, particularly in early cases (i.e., before AGIS diagnosis), presents a nasal defect pattern, often described in the literature as an early indicator of glaucoma [27], [28]. Cluster 3 shows a general peripheral functional impairment, another early sign of glaucoma development [29]. Cluster 2 performed rather poorly as a discriminator between converters and healthy subjects, and included healthy subjects with "bad" VF tests and glaucomatous patients with a nonuniform pattern of depressed points.

Regarding the structures learned using SEM and CSA, the respective results are shown in Fig. 5. The networks present several similarities, such as dense first-order (adjacent-location) spatial relations and fewer long-range relations. However, CSA obtained a sparser network structure with arcuate, or partially arcuate, patterns in the central and peripheral VF, with a distribution similar to the known anatomical distribution of the retinal nerve fiber layer.

IV. DISCUSSION

The silhouette values were moderate for the BN models and higher for K-means clustering. However, since the silhouette score measures the distance between clusters, a technique that only separates healthy subjects from glaucomatous subjects would be expected to obtain a higher value. Indeed, as shown in the histograms in Fig. 3, K-means clustering performed similarly to a simple discrimination task, being unable to separate the data into groups based on variance other than that

carried by the inclusion criteria used in the data collection, i.e., the AGIS metric. The clusters obtained with the different BN clustering techniques were rather similar, showing good robustness in the data. This is reflected in the fairly high kappa values and in the relatively smooth distribution of the data across clusters in Fig. 3. Qualitative results for each technique are not presented, although the patterns shown were observed with all the BN techniques. K-means clustering, instead, obtained a different set of clusters, so its kappa is not comparable.

Regarding the single clusters, Fig. 4 shows the performance for each cluster in relation to the mean age of the subjects in the glaucomatous and healthy groups (left) and the pre- and post-diagnosis condition (bar plot on the right). The best discriminating cluster (cluster 1) shows the greatest age difference between healthy and glaucomatous subjects, as well as the largest proportion of cases in the post-diagnosis group. This indicates that a strong hemifield difference is typical of later stages of glaucoma and, when occurring at a younger age, is more likely to be glaucomatous. The higher proportion of post-diagnosis patients in a cluster means that the AGIS score is able to capture these "easy" cases too; this is also reflected in a right-skewed histogram above the visual field map in Fig. 4. The second best performing cluster (cluster 4) is instead characterized by an early nasal defect, and it is placed on the left side of Fig. 4, meaning that this pattern in younger subjects is more likely to be glaucomatous. In this case, AGIS seems to perform less well than the proposed model, with a high proportion of pre-diagnosis patients falling in this group. This may be due to the inclusion criteria (conversion defined as three successive positive AGIS scores) but also to the nature of the AGIS scoring. The AGIS score reflects the pointwise sensitivity deviation relative to the average (age-corrected) healthy state and labels as positive those points showing a depression below a certain threshold; in the nasal area, for example, the threshold is an 8-dB loss. The threshold criterion and the reference model may contribute to the overlooked cases, i.e., the false negatives obtained by the AGIS classification of cases in cluster 4. Our technique, instead, uses absolute pointwise values with a different parameterization for each cluster and was able to capture the defect earlier. Cluster 3 obtained acceptable classification performance, modeling mainly post-diagnosis glaucomatous patients who develop peripheral VF damage at an older age. Cluster 2, instead, is difficult to characterize because of the small number of cases allocated to it and because of its inhomogeneous appearance, which is probably due to noisy VF tests.

Fig. 5 shows the BN structures learned by the SEM and CSA algorithms. In comparison to SEM, CSA obtained a sparser and cleaner result, probably because of the broader clustering and structure space that the algorithm explores. The CSA network structure learned on the training data shows strong agreement with typical progression paths in the VF and a stronger similarity to the distribution of the retinal nerve fiber layer (RNFL). Similar dependencies were found in [30], in which a VF filter was found to be in accordance with the accepted physiological shape of the RNFL.
In [31], higher correlations between VF and optic nerve head (ONH) measures were found in peripheral VF areas, which is confirmed by the higher density of the CSA structure in the corresponding sectors.


Fig. 4. Results of the CSA algorithm on the testing dataset. The four clusters are presented in four rows, ordered from top to bottom by classification performance (AUROC and sensitivity at 90% and 80% specificity). The left section of the figure shows the VF maps for each cluster for glaucomatous (G) and healthy (H) subjects. VF maps are shown in squares, with the VF maps obtained from G subjects slightly displaced upward with respect to the maps obtained from H subjects. The x-axis in the graph is the mean AGE for each group, so that clusters with older G or H visits lie on the right side of the plot. The deviation from the mean AGE is also reported with whiskers for each VF map. Above each G VF map, a histogram shows the distribution of the data points relative to the AGIS diagnosis; the diagnosis is represented by a vertical red line in the centre of the histogram. The right section of the figure shows the number of data points assigned to each cluster, separately for healthy subjects (Controls), prediagnosis (Pre), and postdiagnosis (Post) visits.

TABLE II AUROC PER CLUSTER FOR CLUSTERING SIMULATED ANNEALING

Fig. 5. Arcs between variables learned using CSA and SEM. The arcs superimposed on the VF map indicate dependencies between pairs of locations, i.e., between the VF sensitivity values at those locations. Dashed lines indicate relations with spatial order lower than 5.

V. CONCLUSION

In this study, a set of BN classifiers was tested on an independent glaucoma dataset, obtaining good results both on pre- (50% sensitivity at 90% specificity) and post- (85% sensitivity at 90% specificity) diagnosis data. The extension of the models to include structure learning and metric-independent clustering provided interesting insight into the patterns of visual field defects. In particular, the clusters obtained on the glaucomatous patients were found to be meaningful and well supported in the literature. Future work will look at extending the models to explicit time-series modeling, such as dynamic BNs, the temporal extension of BNs; the current models do not take time-dependent relations between variables into consideration. Other data types will also be investigated, such as age-related deviation values for VF analysis, intraocular pressure, and anatomical measurements used in glaucoma detection (e.g., retinal nerve fiber layer thickness).

REFERENCES

[1] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. San Francisco, CA, USA: Morgan Kaufman, 1988. [2] N. Friedman, D. Geiger, and M. Goldszmidt, “Bayesian network classifiers,” Mach. Learn., vol. 2, no. 29, pp. 131–163, 1993. [3] S. Resnikoff, D. Pascolini, D. Ety´aale, I. Kocur, R. Pararajasegaram, G. P. Pokharel, and S. P. Mariott, “Global data on visual impairment in the year 2002,” Bull. World, vol. 012831, no. 4, pp. 844–851, 2004. [4] M. A. Kass, D. K. Heuer, E. J. Higginbotham, C. A. Johnson, J. L. Keltner, J. P. Miller, I. I. Parrish, K. Richard, M. R. Wilson, and M. O. Gordon, “The Ocular Hypertension Treatment Study: A randomized trial determines that topical ocular hypotensive medication delays or prevents the onset of primary open-angle glaucoma,” Arch. Ophthalmol., vol. 120, no. 6, p. 701, 2002. [5] A. Heijl, M. C. Leske, B. Bengtsson, L. Hyman, and M. Hussein, “Reduction of intraocular pressure and glaucoma progression: Results from the Early Manifest Glaucoma Trial,” Arch. Ophthalmol., vol. 120, no. 10, p. 1268, 2002. [6] K. Chan, T. W. Lee, P. Sample, M. H. Goldbaum, R. N. Weinreb, and T. J. Sejnowski, “Comparison of machine learning and traditional classifiers in glaucoma diagnosis,” IEEE Trans. Biomed. Eng., vol. 49, no. 9, pp. 963– 974, Sep. 2002.


[7] M. H. Goldbaum, P. A. Sample, K. Chan, J. Williams, T. W. Lee, E. Blumenthal, C. A. Girkin, L. M. Zangwill, C. Bowd, and T. Sejnowski, “Comparing machine learning classifiers for diagnosing glaucoma from standard automated perimetry,” Invest. Ophthalmol. Vis. Sci., vol. 43, no. 1, pp. 162–169, 2002. [8] J. A. Hartigan and M. A. Wong, “Algorithm AS 136: A K-means Clustering Algorithm,” J. Royal Statist. Soc., Series C, vol. 28, no. 1, pp. 100–108, 1975. [9] D. Bizios, A. Heijl, and B. Bengtsson, “Trained artificial neural network for glaucoma diagnosis using visual field data: A comparison with conventional algorithms,” J. Glaucoma, vol. 16, no. 1, pp. 20–28, 2007. [10] A. Tucker, V. Vinciotti, X. Liu, and D. Garway-Heath, “A spatio-temporal Bayesian network classifier for understanding visual field deterioration,” Artif. Intell. Med., vol. 34, no. 2, pp. 163–177, Jun. 2005. [11] S. Ceccon, D. F. Garway-Heath, D. P. Crabb, and A. Tucker, “Investigations of clinical metrics and anatomical expertise with Bayesian network models for classification in early glaucoma,” in Proc. Intell. Data Anal. Biomed. Pharmacol. (IDAMAP 2010). [12] G. Wollstein, D. F. Garway-Heath, and R. a Hitchings, “Identification of early glaucoma cases with the scanning laser ophthalmoscope,” Ophthalmology, vol. 105, no. 8, pp. 1557–1563, Aug. 1998. [13] D. E. Gaasterland, F. Ederer, E. K. Sullivan, J. Caprioli, and M. N. Cyrlin, “Advanced glaucoma intervention study. 2. Visual field test scoring and reliability,” Ophthalmology, vol. 101, no. 8, pp. 1445–1455, 1994. [14] D. S. Kamal, A. C. Viswanathan, D. F. Garway-Heath, R. A. Hitchings, D. Poinoosawmy, and C. Bunce, “Detection of optic disc change with the Heidelberg retina tomograph before confirmed visual field change in ocular hypertensives converting to early glaucoma,” Br. J. Ophthalmol., vol. 83, no. 3, pp. 290–294, Mar. 1999. [15] D. J. Hand and K. Yu, “Idiot’s Bayes— Not so stupid after all?,” Int. Statist. Rev., vol. 2, no. 69, pp. 385–398. [16] R. A. Jacobs, M. I. Jordan, S. J. Nowlan, and G. E. Hinton, “Adaptive mixtures of local experts,” Neural Comput., vol. 3, no. 1, pp. 79–87, 1991. [17] G. Schwarz, “Estimating the dimension of a model,” Ann. Stat., vol. 6, no. 2, pp. 461–464, 1978. [18] A. P. Dempster, N. M. Laird, and D. B. Rubin, “Maximum likelihood from incomplete data via the EM algorithm,” J. R. Stat. Soc., vol. 39, no. 1, pp. 1–38, 1977. [19] M. P. Wellman and M. Henrion, “Explaining ‘explaining away,’,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 15, no. 3, pp. 187–192, 1993. [20] N. Friedman, “The Bayesian structural EM algorithm,” in Proc. 14th Conf. Uncertainty Artif. Intell., 1998, pp. 129–138. [21] S. Z. Selim and K. Alsultan, “A simulated annealing algorithm for the clustering problem,” Pattern Recognit., vol. 10, no. 24, 1991. [22] P. J. Rosseeuw, “Silhouettes: A Graphical aid to the interpretation and validation of cluster analysis,” Comput. Appl. Math., no. 20, pp. 53–65, 1987. [23] J. Carletta, “Assessing agreement on classification tasks: The kappa statistic,” Comput. Linguist., vol. 22, no. 2, pp. 249–254, 1996. [24] D. K. Mc Clish, “Analyzing a portion of the ROC curve,” Med. Decis. Making, no. 9, pp. 190–195, 1989. [25] B. J. Mc Neil and J. A. Hanley, “Statistical approaches to the analysis of the receiver operating characteristic curves,” Med. Decis. Making, vol. 4, no. 2, pp. 137–150, 1984. [26] P. Asman and A. Heijl, “Glaucoma hemifield test: Automated visual field evaluation,” Arch. Ophthalmol., vol. 110, no. 6, pp. 
812–819, 1992. [27] P. H. Artes and B. C. Chauhan, “Longitudinal changes in the visual field and optic disc in glaucoma,” Prog. Retin. Eye Res., vol. 24, no. 3, pp. 333– 354, May 2005. [28] E. B. Werner and S. M. Drance, “Early visual field disturbances in glaucoma,” Arch. Ophthalmol., vol. 95, no. 7, pp. 1173–1175, Jul. 1977. [29] M. Yanoff and J. S. Duker, Ophthalmology, 2nd ed. St. Louis, MO, USA: Mosby, 2003. [30] S. K. Gardiner, D. P. Crabb, F. W. Fitzke, and R. A. Hitchings, “Reducing noise in suspected glaucomatous visual fields by using a new spatial filter,” Vis. Res., no. 44, pp. 839–848, 2004. [31] N. G. Strouthidis, V. Vinciotti, A. Tucker, S. K. Gardiner, D. P. Crabb, and D. F. Garway-Heath, “Structure and function in glaucoma: The relationship between a functional visual field map and an anatomical retinal map,” Invest. Opthalmol. Vis. Sci., vol. 47, no. 12, pp. 5356–5362, 2006.

Stefano Ceccon received the B.Sc. degree in biomedical engineering and the M.Sc. degree in bioengineering from the University of Padova, Padova, Italy, in 2006 and 2009, respectively. He is currently working toward the Ph.D. degree at the School of Information Systems, Computing and Mathematics, Brunel University, London, U.K. He is also an Honorary Research Fellow and a member of the Glaucoma Research Unit at Moorfields Eye Hospital, London. His current research interests include machine learning, data mining, glaucoma, and Bayesian networks.

David F. Garway-Heath received the M.D. degree on structure/function correlations in glaucoma from the University of London, London, UK, in 2001. He studied medicine at St. Thomas’ Hospital, University of London, London, U.K., and undertook his Residency in ophthalmology at Moorfields Eye Hospital, London. He is currently the International Glaucoma Association Professor of Ophthalmology for Glaucoma and Allied Studies, Consultant Ophthalmologist at Moorfields Eye Hospital, and Visual Assessment and Imaging Theme Leader at the National Institute for Health Research (NIHR) Biomedical Research Centre for Ophthalmology, Moorfields Eye Hospital NHS Foundation Trust, and UCL Institute of Ophthalmology. His current research interests include optic nerve and retinal imaging, psychophysics, tonometry, ocular biomechanics, and risk factors for glaucoma.

David P. Crabb received the degrees in mathematics and statistics at Oxford and Sheffield before completing the Ph.D. degree in visual science in 1996. He is currently a Professor of Statistics and Vision Research at the City University London, London, U.K. Following a Post-Doctoral position at the University College London and a lectureship in Nottingham, he took up his position at City in 2005. He is a Fellow of the Royal Statistical Society and an Honorary Consultant in Visual Science at Moorfields Eye Hospital. His research laboratory focuses on measurement in vision: visual fields, imaging, visual function, eye movements, quality of life measures, and medical statistics.

Allan Tucker received the B.Sc. degree in cognitive science from the University of Sheffield, Sheffield, U.K., in 1996, and the Ph.D. degree in computer science from Birkbeck College, University of London, London, UK, in 2001. For four years, he was a Postdoctoral Fellow at the Centre of Intelligent Data Analysis, Brunel University, Uxbridge, UK. He is currently a Senior Lecturer with the School of Information Systems, Computing and Mathematics, Brunel University, London. His current research interests include machine learning, data mining, Bayesian networks, bioinformatics, and medical informatics.
