Computers in Biology and Medicine 54 (2014) 180–187


Gene expression microarray classification using PCA–BEL

Ehsan Lotfi (Department of Computer Engineering, Torbat-e-Jam Branch, Islamic Azad University, Torbat-e-Jam, Iran)
Azita Keshavarz (Department of Psychology, Torbat-e-Jam Branch, Islamic Azad University, Torbat-e-Jam, Iran)

Article history: Received 21 February 2014; Accepted 16 September 2014

Abstract

In this paper, a novel hybrid method based on Principal Component Analysis (PCA) and a Brain Emotional Learning (BEL) network is proposed for the classification of gene-expression microarray data. The BEL network is a computational neural model of the emotional brain that simulates its neuropsychological features. The distinctive feature of BEL is its low computational complexity, which makes it suitable for classifying high-dimensional feature vectors. Thus BEL can be adopted in pattern recognition to overcome the curse of dimensionality. In the experimental studies, the proposed model is applied to the classification of the small round blue cell tumors (SRBCTs), high grade gliomas (HGG), lung, colon and breast cancer datasets. According to results based on 5-fold cross validation, PCA–BEL provides average accuracies of 100%, 96%, 98.32%, 87.40% and 88% on these datasets respectively. Therefore, it can be effectively used in gene-expression microarray classification tasks. © 2014 Elsevier Ltd. All rights reserved.

Keywords: Amygdala; BEL; Emotional neural network; Cancer; BELBIC; Diagnosis; Diagnostic method

Corresponding author. E-mail address: [email protected] (E. Lotfi).
http://dx.doi.org/10.1016/j.compbiomed.2014.09.008
© 2014 Elsevier Ltd. All rights reserved.

1. Introduction

Every cell in our body contains a number of genes that specify the unique features of different types of cells. The gene expression of cells can be obtained by DNA microarray technology, which is capable of showing the simultaneous expression of tens of thousands of genes. This technology is widely used to distinguish between normal and cancerous tissue samples and to support clinical cancer diagnosis [27]. Certain challenges face the classification of gene expression in cancer diagnosis. The main challenge is the huge number of genes compared to the small number of available training samples [47]. Microarray training samples are typically gathered from fewer than one hundred patients, while the number of genes in each sample is usually in the thousands. Furthermore, microarray data contain an abundance of redundancy, missing values [7] and noise due to biological and technical factors [25,75]. In the literature, there are two general approaches to these issues: feature selection and feature extraction. A feature selection method selects a feature subset from the original feature space and provides the marker and causal genes [9,4,1], which are able to identify cancers quickly and easily. Feature extraction methods, in contrast, normally transform the original data to other spaces to generate a new set of features with high information-packing properties. In each of these two approaches, the reduced features are passed to a proper classifier for diagnosis. A proper classifier increases the accuracy of detection and can influence the feature reduction step. This paper aims to review these approaches, investigate recently developed methodology and propose a proper feature reduction and classification method for cancer detection. The organization of the paper is as follows: feature selection methods are reviewed in Section 1.1. Section 1.2 explains the feature extraction methods and Section 2 offers the proposed method. Experimental results on cancer classification are evaluated in Section 3. Finally, conclusions are drawn in Section 4.

1.1. Feature selection methods

Researchers have developed various feature selection methods for classification. Feature selection methods are categorized into three techniques: the filter model [62], the wrapper model and the embedded model [19]. The filter model considers feature selection and classifier learning as two separate steps and utilizes the general characteristics of the training data to select features. The filter model includes both traditional methods, which often evaluate genes separately, and newer methods, which consider gene-to-gene correlation. These methods rank the genes and select the top-ranked genes as input features for the learning step. The gene ranking methods need a threshold for the number of genes to be selected. For example, Golub et al. [20] proposed selecting the top 50 genes. Additionally, the filter model needs a criterion to rank the genes. Liu et al. [35] and Golub et al. [20] have investigated filter methods based on statistical tests and information gain. Examples of filter criteria include the Pearson correlation coefficient method [84], the t-statistics method [2] and


the signal-to-noise ratio method [20]. The time complexity of these methods is O(N), where N is the dimensionality. They are efficient but cannot remove redundant genes, an issue studied in recent literature [83,78,14,26,37]. In the wrapper model, a subset is selected and then the accuracy of a predetermined learning algorithm is estimated to determine the properness of the selected subset. In the wrapper model of Xiong et al. [83], the selected subsets are learned through three learning algorithms: linear discriminant analysis, logistic regression and support vector machines. These classifiers must be run for every subset of genes selected from the search space, a procedure with high computational complexity. Like the wrapper methods, in the embedded models the genes are selected as part of the specific learning method, but with lower computational complexity [19]. The subset selection methods of the wrapper model can be categorized into population-based methods [71,34,53] and backward selection methods. Recently, Lee and Leu [34] and Tong and Schierz [69] shed light on the effectiveness of hybrid models in feature selection. The elements of a hybrid method include Neural Networks (NN), Fuzzy Systems, Genetic Algorithms (GA; [76,23]) and Ant Colony optimization [79]. Lee and Leu [34] examined the GA's ability in feature selection. Furthermore, fuzzy theories have been successfully applied by many researchers [12,72,10]. Tong and Schierz [69] used a genetic algorithm-neural network approach (GANN) as a wrapper model: feature subsets are extracted by the GA and each extracted subset is then used to train the NN. These steps are repeated until the best subset is determined. Because of the high dimensionality of the data, the GA appears to be a proper strategy for feature selection.
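As an illustration of the filter model, the signal-to-noise ranking criterion of Golub et al. [20] can be sketched in a few lines of NumPy. This is a minimal sketch, not code from the paper; `snr_rank_genes` and the toy data are our own illustrative names.

```python
import numpy as np

def snr_rank_genes(X, y, top_n=50):
    """Rank genes by the Golub signal-to-noise criterion
    (mu_pos - mu_neg) / (sigma_pos + sigma_neg) and return the
    indices of the top_n genes. X: samples x genes, y: binary labels."""
    pos, neg = X[y == 1], X[y == 0]
    snr = (pos.mean(axis=0) - neg.mean(axis=0)) / (pos.std(axis=0) + neg.std(axis=0))
    # genes with the largest absolute criterion come first
    return np.argsort(-np.abs(snr))[:top_n]

# toy data: gene 0 separates the two classes, gene 1 is noise
X = np.array([[5.0, 0.1], [4.8, 0.3], [1.0, 0.2], [1.2, 0.1]])
y = np.array([1, 1, 0, 0])
print(snr_rank_genes(X, y, top_n=1))  # gene 0 ranked first
```

The top-ranked genes would then be fed to the learning step, as described above.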

1.2. Feature extraction methods

In the literature, there are two well-known methods for feature extraction: principal component analysis (PCA; [78]) and linear discriminant analysis (LDA; [48]). They transform the original feature space to a lower-dimensional one. PCA transforms the original data to a set of reduced features that best approximate the original data. In the first step, PCA calculates the data covariance matrix and then finds its eigenvalues and eigenvectors. Finally, it goes through a dimensionality reduction step in which only the terms corresponding to the K largest eigenvalues are kept. In contrast, LDA first calculates the scatter matrices: a within-class scatter matrix for each class, and the between-class scatter matrix. The within-class scatter matrix measures the scatter around the respective class mean, and the between-class scatter matrix measures the scatter of the class means around the mixture mean. LDA then transforms the data in a way that maximizes the between-class scatter and minimizes the within-class scatter, so the dimension is reduced while class separability is maximized. The feature extraction/selection method is the first step in gene expression microarray classification and cancer detection. The second step consists of a classifier learning the reduced features. In the literature, various classifiers have been investigated in order to find the best one.
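The three PCA steps described above (covariance matrix, eigendecomposition, keeping the K largest-eigenvalue components) can be sketched as follows. This is a minimal NumPy illustration; `pca_reduce` is our own name, not the authors' code.

```python
import numpy as np

def pca_reduce(X, k):
    """PCA feature extraction: covariance matrix, eigendecomposition,
    projection onto the k components with the largest eigenvalues."""
    Xc = X - X.mean(axis=0)                  # center the data
    cov = np.cov(Xc, rowvar=False)           # step 1: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)   # step 2: eigenvalues/eigenvectors
    order = np.argsort(eigvals)[::-1][:k]    # step 3: keep the k largest
    return Xc @ eigvecs[:, order]            # project onto those components

X = np.random.RandomState(0).randn(20, 100)  # 20 samples, 100 "genes"
Z = pca_reduce(X, k=5)
print(Z.shape)  # (20, 5)
```

Note that `np.linalg.eigh` returns eigenvalues in ascending order, hence the reversed sort before truncation.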
The NN and various types of NN [29,36,57,6,56,68,74,81,69,16], k-nearest neighbors [61,13], k-means algorithms [32], the fuzzy c-means algorithm [11], Bayesian networks [4], vector quantization based classifiers [59], manifold methods [18,80], fuzzy approaches [54,58,30,60], complementary learning fuzzy neural networks [64–67], ensemble learning [55,8,27,50], logistic regression, support vector machines [22,5,82,73,63,46,70], LSVM [44], wavelet transforms [28] and radial basis-support vector machines [51] have all been investigated successfully in classification and cancer detection. But recently developed classifiers


such as brain emotional learning (BEL) networks [42] have not been examined in this field. BEL networks are recently developed models that use simulated emotions to aid the learning process. BEL is motivated by neurophysiological knowledge of the human emotional brain. In contrast to previously published models, the distinctive features of BEL are low computational complexity and fast training, which make it suitable for high-dimensional feature vector classification. In this paper, BEL is developed and examined for gene expression microarray classification tasks. It is expected that a model with low computational complexity can be more successful in addressing the challenges of high-dimensional microarray classification.

2. Proposed PCA–BEL for microarray data classification

Fig. 1 shows the general view of the proposed method and the final proposed algorithm is presented in Fig. 2. What distinguishes the proposed framework from published diagnostic methods is the application of the BEL model to cancer classification. There are various versions of BEL, including the basic BEL [3], BELBIC (BEL based intelligent controller; [45]), BELPR (BEL based pattern recognizer; [39]), BELPIC (BEL based picture recognizer; [43]) and supervised BEL [38,40–42]. They are learning algorithms of emotional neural networks [42], inspired by the emotional brain. The description of the relationship between the main components of the emotional brain is common to all these models; what differs from one model to another is how the reward signal is formulated in the learning process. For example, in the model presented by Balkenius and Morén [3], it is not clarified how the reward is assigned. In BELBIC, the reward signal is defined explicitly and the other equations are formulated accordingly. The supervised BEL, however, employs the target value of the input pattern instead of the reward signal in the learning phase. Supervised BEL is therefore model-free and can be utilized in different applications, and here this version is developed for the gene expression microarray classification task. Generally, the computational complexity of BEL is very low [39–42]: it is O(n), which makes it suitable for high-dimensional feature vector classification.

Fig. 1. General view of the proposed method.


Fig. 2. The flowchart of the proposed method in the learning step.

BEL [42] is inspired by the interactions of the thalamus, amygdala (AMYG) [15,17,21,24,31,33,77], orbitofrontal cortex (OFC) and sensory cortex in the emotional brain [42]. The first step is associated with PCA dimension reduction (Fig. 1). The first k principal components p1, p2, ..., pk are the outputs of the first step and the inputs of the second step. In the second step, this pattern is normalized to [0, 1]. The normalized k principal components p1, p2, ..., pk are the outputs of the second step and the inputs of the third step. Fig. 2 illustrates the details of the proposed method. The input pattern of BEL is the vector p1, p2, ..., pk and E is the final output. The model consists of two main subsystems, the AMYG and the OFC. The AMYG receives the input pattern p1, p2, ..., pk from the sensory cortex, and p_{k+1} from the thalamus. The OFC receives the input pattern p1, p2, ..., pk from the sensory cortex only. The thalamic input p_{k+1} is calculated by the following formula:

p_{k+1} = max_{j=1...k} (p_j)    (1)

The weight v_{k+1} is the associated AMYG weight and w_{k+1} is the associated OFC weight. Ea is the internal output of the AMYG, which is used to adjust the plastic connection weights v1, v2, ..., v_{k+1} (Eq. (6)). Eo is the output of the OFC, which is used to inhibit the AMYG output. This inhibitory task is implemented by subtracting Eo from Ea (Eq. (5)). As the corrected AMYG response, E is the final output


node. It is evaluated by the monotonically increasing activation function tansig and used to adjust the OFC connection weights w1, w2, ..., w_{k+1} (Eq. (7)). The activation function is as follows:

tansig(x) = 2 / (1 + e^{-2x}) - 1    (2)

The AMYG output, the OFC output and the final output are calculated by the following formulas respectively:

Ea = Σ_{j=1}^{k+1} (v_j · p_j) + b_a    (3)

Eo = Σ_{j=1}^{k} (w_j · p_j) + b_o    (4)

E = tansig(Ea - Eo)    (5)

Let t be the target value associated with the nth pattern p. The target t should be binary encoded. The supervised learning rules are as follows:

v_j = v_j + lr · max(t - Ea, 0) · p_j,  for j = 1...k+1    (6)

w_j = w_j + lr · (Ea - Eo - t) · p_j,  for j = 1...k+1    (7)

b_a = b_a + lr · max(t - Ea, 0)    (8)

b_o = b_o + lr · (Ea - Eo - t)    (9)
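Assuming the equations above, a minimal single-output sketch of the BEL forward pass and learning rules (Eqs. (1)-(9)) might look as follows. The class name `BEL`, the weight-initialization range and the toy pattern are illustrative choices, not the authors' implementation.

```python
import numpy as np

def tansig(x):
    return 2.0 / (1.0 + np.exp(-2.0 * x)) - 1.0  # Eq. (2), identical to tanh

class BEL:
    """Minimal sketch of the supervised BEL unit; names follow the text."""
    def __init__(self, k, lr=0.001, seed=0):
        rng = np.random.RandomState(seed)
        self.v = rng.uniform(-0.1, 0.1, k + 1)   # AMYG weights v_1..v_{k+1}
        self.w = rng.uniform(-0.1, 0.1, k + 1)   # OFC weights w_1..w_{k+1}
        self.ba, self.bo, self.lr = 0.0, 0.0, lr

    def forward(self, p):
        p = np.append(p, p.max())                # Eq. (1): thalamic input p_{k+1}
        Ea = self.v @ p + self.ba                # Eq. (3): AMYG output
        Eo = self.w[:-1] @ p[:-1] + self.bo      # Eq. (4): OFC sums only j=1..k
        return p, Ea, Eo, tansig(Ea - Eo)        # Eq. (5): final output

    def train_step(self, p, t):
        p, Ea, Eo, E = self.forward(p)
        self.v += self.lr * max(t - Ea, 0.0) * p   # Eq. (6)
        self.w += self.lr * (Ea - Eo - t) * p      # Eq. (7)
        self.ba += self.lr * max(t - Ea, 0.0)      # Eq. (8)
        self.bo += self.lr * (Ea - Eo - t)         # Eq. (9)
        return E

bel = BEL(k=3, lr=0.05)
p, t = np.array([0.2, 0.8, 0.5]), 1.0
errs = [abs(t - bel.train_step(p, t)) for _ in range(200)]
print(errs[0] > errs[-1])  # the output error shrinks over training
```

Note that the AMYG weights only grow (the `max(..., 0)` term never forgets), while the OFC weights move in both directions to correct the AMYG response, matching the inhibitory role described in the text.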

where lr is the learning rate, t is the binary target, t - Ea is the calculated error, b_a is the bias of the AMYG neuron and b_o is the bias of the OFC neuron. The weights v1, v2, ..., v_{k+1} are the AMYG learning weights and w1, w2, ..., w_{k+1} are the OFC learning weights. Eqs. (3)–(9) describe the multiple-input single-output model. In Figs. 2 and 3, the equations are extended to multiple-input multiple-output usage. The input training microarray data in Fig. 2 comprise the two matrices P and T. The size of matrix P is m × s, where m is the number of patterns and s is the number of features in each pattern (s ≫ k). The size of matrix T is m × c, where c is the number of classes. The targets are binary encoded, so each row of matrix T includes only one "1" and the other columns are "0". In the flowcharts, pi denotes the ith pattern and ti is the related target. The learning rate lr can be adaptively adjusted to increase performance. The final flowchart, Fig. 2, shows this adaptation and the related parameters: the ratio used to increase the learning rate (lr_inc), initialized to 1.05; the ratio used to decrease it (lr_dec), with initial value 0.7; the maximum performance increase (minc), with initial value 1.04; the first performance (perf_f; in step 4 of the flowchart) and the last performance (perf_l), which can be calculated as the MSE. The initial lr = 0.001 and the learning weights are initialized randomly (step 3 in the flowchart). According to the algorithm, if (perf_l/perf_f) > minc then lr = lr × lr_dec, else if perf_l < perf_f then lr = lr × lr_inc. In Fig. 2, the stop criterion is reaching a predetermined number of learning epochs, i.e. the maximum epoch (for example 10,000 epochs). Fig. 2 presents the learning step and Fig. 3 shows the flowchart of the testing step. The inputs of the algorithm presented in Fig. 3 are a testing pattern, the number of classes and the weights adjusted in the learning step.
The last step of the algorithm is associated with the diagnosis: the index of the maximum E gives the class number of the pattern.
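The learning-rate adaptation rule just described can be sketched as a small helper. This is an illustration under the parameter values stated in the text; `adapt_lr` is our own name.

```python
def adapt_lr(lr, perf_f, perf_l, lr_inc=1.05, lr_dec=0.7, minc=1.04):
    """If the last MSE grew by more than minc relative to the first,
    shrink lr; if it improved, grow lr; otherwise leave it unchanged."""
    if perf_l / perf_f > minc:
        return lr * lr_dec   # performance got worse: decrease the step
    if perf_l < perf_f:
        return lr * lr_inc   # performance improved: increase the step
    return lr

print(adapt_lr(0.001, perf_f=0.5, perf_l=0.6))  # worse MSE: lr shrinks
print(adapt_lr(0.001, perf_f=0.5, perf_l=0.4))  # better MSE: lr grows
```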

3. Experimental studies

The source code of the proposed method is accessible from http://www.bitools.ir/tprojects.html and it is evaluated by classifying the gene expression microarray data of the small round blue cell tumors (SRBCTs), high grade gliomas (HGG), lung, colon and breast cancer datasets. The


SRBCTs dataset is a 4-class complementary DNA (cDNA) microarray dataset containing 2308 genes and 83 samples: 29 samples of Ewing's sarcoma (EWS), 25 of rhabdomyosarcoma (RMS), 18 of neuroblastoma (NB) and 11 of Burkitt lymphoma (BL). This dataset can be obtained from http://research.nghri.nih.gov/microarray/Supplement/. In the proposed algorithm, the maximum learning epoch = 10,000, k = 100 and the initial lr is set to 0.001, 0.000001 and 0.001 for the SRBCT, HGG and lung cancer datasets respectively. These parameters are picked empirically; k = 100 with lr = 0.001 or 0.000001 shows better results for these datasets. However, in other applications these parameters should be optimized. The HGG dataset applied here consists of 50 samples with 12,625 genes: 14 classic glioblastomas, 14 non-classic glioblastomas, 7 classic anaplastic oligodendrogliomas and 15 non-classic anaplastic oligodendrogliomas. The HGG dataset is accessible from http://www.broadinstitute.org. In this dataset, the number of patterns is much smaller than the number of features in each sample, which may make the data difficult to classify. In the lung cancer dataset, there are 181 tissue samples in two classes: 31 samples are malignant pleural mesothelioma and 150 are adenocarcinoma. Each sample is described by 12,533 genes. This dataset is also accessible from http://datam.i2r.a-star.edu.sg/datasets/. The other datasets applied here are the colon and breast cancer datasets, accessible from http://genomics-pubs.princeton.edu/oncology/affydata/index.html and http://datam.i2r.a-star.edu.sg/datasets/krbd/BreastCancer/BreastCancer.html, respectively. The colon dataset includes 62 tissue samples with 2000 genes and the breast cancer dataset consists of 97 samples and 24,481 genes. Before entering the comparative numerical studies, let us analyze the computational complexity of the proposed BEL.
In the learning step, the algorithm adjusts O(2n) weights for each pattern-target sample, where n is the number of input attributes (for example, n = 12,625 for the HGG database). Let us compare this computational complexity with that of traditional neural networks and of a supervised orthogonal discriminant projection classifier (SODP; [80]) applied in cancer detection. As mentioned above, the computational complexity of the proposed classifier is O(n). In contrast, the computational time is O(cn) for a neural network and O(n²) for SODP. In the NN architecture, c is the number of hidden neurons (generally c = 10), and SODP uses a Lagrangian multiplier that imposes a complexity of O(n²). So the proposed method has lower computational complexity. This improved computing efficiency can be important for high-dimensional feature vector classification and cancer detection. The key to the proposed method is the fast processing resulting from its low computational complexity, which makes it suitable for cancer detection. Another important point observed in the experimental implementations is that the results of the proposed model can change with the initial lr and k values, where lr is the learning rate and k is the number of principal components retained by PCA. In other words, lr and k should be optimized for each problem. Here, the optimum values 0.001, 0.000001, 0.001, 0.00001 and 0.000001 are assigned to lr, and 100 to k, for the SRBCT, HGG, lung, colon and breast cancer datasets respectively. The values assigned to lr are obtained from the candidates 0.1, 0.001, 0.0001, 0.00001, …, 0.0000000001, and k from the candidate values 10, 50 and 100, through implementation and observation. The proposed method is compared with the results of the methods reported by Zhang and Zhang [80], who reported results based on the 5-fold cross validation method.
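The 5-fold cross-validation protocol used in the comparisons can be sketched as follows. This is a generic illustration; `five_fold_accuracy` and the majority-class baseline are stand-ins, not the compared classifiers.

```python
import numpy as np

def five_fold_accuracy(X, y, train_and_predict, seed=0):
    """Average 5-fold cross-validation accuracy (%).
    train_and_predict(Xtr, ytr, Xte) must return predicted labels for Xte."""
    idx = np.random.RandomState(seed).permutation(len(y))
    folds = np.array_split(idx, 5)
    accs = []
    for i in range(5):
        te = folds[i]                                   # held-out fold
        tr = np.concatenate([folds[j] for j in range(5) if j != i])
        pred = train_and_predict(X[tr], y[tr], X[te])
        accs.append(np.mean(pred == y[te]))
    return 100.0 * np.mean(accs)

# trivial majority-class baseline as a stand-in for a real classifier
baseline = lambda Xtr, ytr, Xte: np.full(len(Xte), np.bincount(ytr).argmax())
X = np.random.RandomState(1).randn(40, 10)
y = np.array([0] * 30 + [1] * 10)
print(five_fold_accuracy(X, y, baseline))
```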
This implementation allows the assessment of accuracy and repeatability and can be used to validate the proposed method [46]. The compared methods include supervised locally linear embedding (SLLE), probability-based locally linear embedding (PLLE), locally linear discriminant embedding (LLDE), constrained maximum variance mapping (CMVU), orthogonal


Fig. 3. The testing step of the proposed method for class diagnosis of an input tissue.


Fig. 4. The accuracy comparison between various methods and the proposed PCA–BEL in the SRBCTs classification problem.

Fig. 5. The accuracy comparison between various methods and the proposed PCA–BEL in the HGG classification problem.

Fig. 6. The accuracy comparison between various methods and the proposed PCA–BEL in the lung cancer classification problem.

Fig. 7. The accuracy comparison between various methods reported by Zhang and Zhang [80] and the proposed PCA–BEL in the colon classification problem.

discriminant projection (ODP) and supervised orthogonal discriminant projection (SODP). These methods are extended manifold approaches that have been successfully used in tumor classification. SLLE, PLLE and LLDE are extended versions of locally linear embedding (LLE), a classical manifold method. SODP is an extended version of ODP and CMVU is a linear approximation of a multi-manifold learning method. Figs. 4–8 show the comparative results based on the average accuracy of 5-fold cross validation. As illustrated in the figures, the proposed model shows consistent results and provides higher performance on SRBCT, HGG and lung cancer (Figs. 4–6). Table 1 presents the percentage improvement of PCA–BEL with respect to the best compared method reported by [80]. The best method in SRBCT and HGG detection is the supervised orthogonal discriminant projection (SODP) algorithm, with 96.56% and 73.74% average accuracy, while in lung cancer classification the best method is locally linear discriminant embedding (LLDE), with average accuracy 93.18%. The proposed method improves these results.


It seems that SRBCT and lung cancer are rather simple challenges for the classifiers in terms of complexity, since the best compared classifiers, i.e. SODP and LLDE (Table 1 and Figs. 4 and 6), have been able to exhibit detection accuracies of 96.56% and 93.18%. The proposed model improves these numbers by 3.56%

and 5.52%, raising the accuracy to 100% and 98.32% for SRBCT and lung cancer, respectively (Table 1). The detection accuracy of the proposed model is especially significant for HGG. It seems that this dataset is too complex for the other classifiers, because the best detection accuracy achieved for HGG is 73.74%, using the SODP method (refer to Table 1 and Fig. 5). The proposed PCA–BEL achieves a 30.18% improvement, which results in a 96% accuracy rate. However, the results for the colon and breast cancer datasets obtained from PCA–BEL, 87.40% and 88% accuracy, do not show any significant improvement compared to the existing methods (Figs. 7 and 8). The percentage improvement of the proposed PCA–BEL is summarized in Table 1 and calculated by the following formula:

Fig. 8. The accuracy comparison between various methods reported by Zhang and Zhang [80] and the proposed PCA–BEL in the breast cancer classification problem.

Percentage improvement = 100 × (proposed method result − compared result) / (compared result)    (10)
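As a quick check, the percentage improvement formula above reproduces the Table 1 figures. This is a small illustrative computation; `pct_improvement` is our own name.

```python
def pct_improvement(proposed, compared):
    """Relative accuracy gain (%) of the proposed method over a baseline."""
    return 100.0 * (proposed - compared) / compared

print(round(pct_improvement(100.0, 96.56), 2))  # SRBCT vs SODP: 3.56
print(round(pct_improvement(96.0, 73.74), 2))   # HGG vs SODP: ~30.19 (Table 1 reports 30.18)
print(round(pct_improvement(98.32, 93.18), 2))  # lung vs LLDE: 5.52
```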

Table 1
Percentage improvement in the classification of the small round blue cell tumors (SRBCT), high grade gliomas (HGG) and lung cancer, obtained from the proposed method. The compared methods are the supervised orthogonal discriminant projection classifier (SODP) and locally linear discriminant embedding (LLDE), which are the best compared methods (Figs. 4–6).

Problem                                     SRBCT     HGG       Lung cancer
Compared method                             SODP      SODP      LLDE
Detection accuracy of compared method       96.56%    73.74%    93.18%
Detection accuracy of our PCA–BEL method    100%      96%       98.32%
Percentage improvement                      3.56%     30.18%    5.52%

Table 2
Statistical results of the proposed PCA–BEL on the three improved problems: the small round blue cell tumors (SRBCT), high grade gliomas (HGG) and lung cancer datasets. The first five rows show the detection accuracy of each fold; the remaining rows present statistical information including the maximum, mean, standard deviation (STD) and the confidence level (ConfiLevel) based on the Student's t-test with 95% confidence.

Fold number    SRBCT (%)    HGG (%)    Lung cancer (%)
F#1            100.00       100.00     100.00
F#2            100.00       80.00      94.40
F#3            100.00       100.00     97.22
F#4            100.00       100.00     100.00
F#5            100.00       100.00     100.00
Max            100.00       100.00     100.00
Average        100.00       96.00      98.32
STD            0.00         8.94       2.50
ConfiLevel     0.00         11.10      3.10

As illustrated in Table 1, the average accuracies of SRBCT, HGG and lung cancer classification obtained from the proposed PCA–BEL are 100%, 96% and 98.32% respectively. Table 2 shows the statistical details of the improved results. The confidence level (ConfiLevel) in Table 2 is based on the Student's t-test with 95% confidence. Finally, Fig. 9 shows the averaged confusion matrices, including accuracy, precision and recall, of the improved results obtained from the proposed PCA–BEL over 5 folds. In Fig. 9a, the class numbers 1, 2, 3 and 4 belong to EWS, RMS, BL and NB respectively. In the experimental results, 10,000 cycles is considered the maximum number of learning cycles in every run; however, this parameter can change for different problems. The model needs at most 220 cycles to reach convergence and 100% accuracy on SRBCT, while more than 8000 or even the full 10,000 cycles are needed to reach convergence in some folds of the HGG and lung cancer datasets. This parameter should preferably be given a large value; considering the low computational complexity of the method, increasing the number of learning cycles even to 100,000 still results in an acceptable computation time on modern computers.

4. Conclusions

In this paper, a novel gene-expression microarray classification method is proposed based on PCA and a BEL network. In contrast to many other classifiers, the proposed method has lower computational complexity. Thus BEL can be considered an alternative approach to overcoming the curse of dimensionality

Fig. 9. The averaged confusion matrices of the improved problems: (a) SRBCT, (b) HGG and (c) lung cancer datasets.


problem. The proposed model is accessible from http://www.bitools.ir/projects.html and is utilized for the classification tasks of the SRBCT, HGG, lung, colon and breast cancer datasets. According to the experimental results, the proposed method is more accurate than traditional methods on the SRBCT, HGG and lung datasets. PCA–BEL improves the detection accuracy by about 3.56%, 30.18% and 5.52% for SRBCT, HGG and lung cancer respectively. The results indicate the superiority of the approach in terms of higher accuracy and lower computational complexity. Hence, it is expected that the proposed approach can be generally applicable to high-dimensional feature vector classification problems. However, the proposed approach has a drawback: like many other methods that use PCA, it does not extract the informative genes. As mentioned in Section 1, PCA is a feature extraction method and cannot select features. For future improvements, the informative genes should be determined; to do so, the proposed method should incorporate a feature selection step. This can be considered the next step of this research effort, i.e. a proper feature selection method should be found and substituted for the PCA step of the proposed method. Furthermore, in order for the proposed method to provide a proper response in other cancer classification problems, the lr and k parameters should be optimized specifically for each problem. This issue can also be considered in future work, on other datasets such as prostate cancer.

Conflict of interest

There is no conflict of interest.

References

[1] M. Alshalalfa, G. Naji, A. Qabaja, R. Alhajj, Combining multiple perspective as intelligent agents into robust approach for biomarker detection in gene expression data, Int. J. Data Min. Bioinform. 5 (3) (2011) 332–350.
[2] P. Baldi, A.D. Long, A Bayesian framework for the analysis of microarray expression data: regularized t-test and statistical inferences of gene changes, Bioinformatics 17 (6) (2001) 509–519.
[3] C. Balkenius, J. Morén, Emotional learning: a computational model of amygdala, Cybern. Syst. 32 (6) (2001) 611–636.
[4] R. Cai, Z. Zhang, Z. Hao, Causal gene identification using combinatorial V-structure search, Neural Netw. 43 (2013) 63–71.
[5] A.H. Chen, C.H. Lin, A novel support vector sampling technique to improve classification accuracy and to identify key genes of leukaemia and prostate cancers, Expert Syst. Appl. 38 (4) (2011) 3209–3219.
[6] J.H. Chiang, S.H. Ho, A combination of rough-based feature selection and RBF neural network for classification using gene expression data, IEEE Trans. NanoBiosci. 7 (1) (2008) 91–99.
[7] W.K. Ching, L. Li, N.K. Tsing, C.W. Tai, T.W. Ng, A. Wong, K.W. Cheng, A weighted local least squares imputation method for missing value estimation in microarray gene expression data, Int. J. Data Min. Bioinform. 4 (3) (2010) 331–347.
[8] D. Chung, H. Kim, Robust classification ensemble method for microarray data, Int. J. Data Min. Bioinform. 5 (5) (2011) 504–518.
[9] Y.R. Cho, A. Zhang, X. Xu, Semantic similarity based feature extraction from microarray expression data, Int. J. Data Min. Bioinform. 3 (3) (2009) 333–345.
[10] J. Dai, Q. Xu, Attribute selection based on information gain ratio in fuzzy rough set theory with application to tumor classification, Appl. Soft Comput. 13 (1) (2013) 211–221.
[11] D. Dembele, P. Kastner, Fuzzy C-means method for clustering microarray data, Bioinformatics 19 (8) (2003) 973–980.
[12] Z. Deng, K.S. Choi, F.L. Chung, S. Wang, EEW-SC: Enhanced Entropy-Weighting Subspace Clustering for high dimensional gene expression data clustering analysis, Appl. Soft Comput. 11 (8) (2011) 4798–4806.
[13] M. Dhawan, S. Selvaraja, Z.H. Duan, Application of committee kNN classifiers for gene expression profile classification, Int. J. Bioinform. Res. Appl. 6 (4) (2010) 344–352.
[14] C. Ding, H. Peng, Minimum redundancy feature selection from microarray gene expression data, J. Bioinform. Comput. Biol. 3 (2) (2005) 185–205.
[15] J.P. Fadok, M. Darvas, T.M. Dickerson, R.D. Palmiter, Long-term memory for pavlovian fear conditioning requires dopamine in the nucleus accumbens and basolateral amygdala, PLoS One 5 (9) (2010) e12751.
[16] F. Fernández-Navarro, C. Hervás-Martínez, R. Ruiz, J.C. Riquelme, Evolutionary generalized radial basis function neural networks for improving prediction accuracy in gene classification using feature selection, Appl. Soft Comput. 12 (6) (2012) 1787–1800.
[17] R. Gallassi, L. Sambati, R. Poda, M.S. Maserati, F. Oppi, M. Giulioni, P. Tinuper, Accelerated long-term forgetting in temporal lobe epilepsy: evidence of improvement after left temporal pole lobectomy, Epilepsy Behav. 22 (4) (2011) 793–795.
[18] J.M. García-Gómez, J. Gómez-Sanchs, P. Escandell-Montero, E. Fuster-Garcia, E. Soria-Olivas, Sparse Manifold Clustering and Embedding to discriminate gene expression profiles of glioblastoma and meningioma tumors, Comput. Biol. Med. 43 (11) (2013) 1863–1869.
[19] S. Ghorai, A. Mukherjee, P.K. Dutta, Gene expression data classification by VVRKFA, Procedia Technol. 4 (2012) 330–335.
[20] T.R. Golub, D.K. Slonim, P. Tamayo, C. Huard, M. Gaasenbeek, J.P. Mesirov, E.S. Lander, Molecular classification of cancer: class discovery and class prediction by gene expression monitoring, Science 286 (5439) (1999) 531–537.
[21] E.M. Griggs, E.J. Young, G. Rumbaugh, C.A. Miller, MicroRNA-182 regulates amygdala-dependent memory formation, J. Neurosci. 33 (4) (2013) 1734–1740.
[22] I. Guyon, J. Weston, S. Barnhill, V. Vapnik, Gene selection for cancer classification using support vector machines, Mach. Learn. 46 (1) (2002) 389–422.
[23] C. Gillies, N. Patel, J. Akervall, G. Wilson, Gene expression classification using binary rule majority voting genetic programming classifier, Int. J. Adv. Intell. Paradig. 4 (3) (2012) 241–255.
[24] O. Hardt, K. Nader, L. Nadel, Decay happens: the role of active forgetting in memory, Trends Cogn. Sci. 17 (3) (2013) 111–120.
[25] H. Hong, Q. Hong, J. Liu, W. Tong, L. Shi, Estimating relative noise to signal in DNA microarray data, Int. J. Bioinform. Res. Appl. 9 (5) (2013) 433–448.
[26] D.S. Huang, C.H. Zheng, Independent component analysis-based penalized discriminant method for tumor classification using gene expression data, Bioinformatics 22 (15) (2006) 1855–1862.
[27] N. Iam-On, T. Boongoen, S. Garrett, C. Price, New cluster ensemble approach to integrative biological data analysis, Int. J. Data Min. Bioinform. 8 (2) (2013) 150–168.
[28] A. Jose, D. Mugler, Z.H. Duan, A gene selection method for classifying cancer samples using 1D discrete wavelet transform, Int. J. Comput. Biol. Drug Des. 2 (4) (2009) 398–411.
[29] J. Khan, J.S. Wei, M. Ringner, L.H. Saal, M. Ladanyi, F. Westermann, P.S. Meltzer, Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks, Nat. Med. 7 (6) (2001) 673–679.
[30] M. Khashei, A. Zeinal Hamadani, M. Bijari, A fuzzy intelligent approach to the classification problem in gene expression data analysis, Knowl.-Based Syst. 27 (2012) 465–474.
[31] J.H. Kim, S. Li, A.S. Hamlin, G.P. McNally, R. Richardson, Phosphorylation of mitogen-activated protein kinase in the medial prefrontal cortex and the amygdala following memory retrieval or forgetting in developing rats, Neurobiol. Learn. Mem. 97 (1) (2011) 59–68.
[32] Y.K. Lam, P.W. Tsang, eXploratory K-Means: a new simple and efficient algorithm for gene clustering, Appl. Soft Comput. 12 (3) (2012) 1149–1157.
[33] R. Lamprecht, S. Hazvi, Y. Dudai, cAMP response element-binding protein in the amygdala is required for long- but not short-term conditioned taste aversion memory, J. Neurosci. 17 (21) (1997) 8443–8450.
[34] C.P. Lee, Y. Leu, A novel hybrid feature selection method for microarray data analysis, Appl. Soft Comput. 11 (1) (2011) 208–213.
[35] H. Liu, J. Li, L. Wong, A comparative study on feature selection and classification methods using gene expression profiles and proteomic patterns, Genome Inform. Ser. 13 (2002) 51–60.
[36] B. Liu, Q. Cui, T. Jiang, S. Ma, A combinational feature selection and ensemble neural network method for classification of gene expression data, BMC Bioinform. 5 (1) (2004) 136.
[37] Y. Liu, Wavelet feature extraction for high-dimensional microarray data, Neurocomputing 72 (4) (2009) 985–990.
[38] E. Lotfi, M.R. Akbarzadeh-T, Supervised brain emotional learning, in: IEEE International Joint Conference on Neural Networks (IJCNN), 2012, pp. 1–6, http://dx.doi.org/10.1109/IJCNN.2012.6252391.
[39] E. Lotfi, M.R. Akbarzadeh-T, Brain emotional learning-based pattern recognizer, Cybern. Syst. 44 (5) (2013) 402–421.
[40] E. Lotfi, M.R. Akbarzadeh-T, Emotional brain-inspired adaptive fuzzy decayed learning for online prediction problems, in: 2013 IEEE International Conference on Fuzzy Systems (FUZZ), IEEE, July 2013, pp. 1–7.
[41] E. Lotfi, M.R. Akbarzadeh-T, Adaptive brain emotional decayed learning for online prediction of geomagnetic activity indices, Neurocomputing 126 (2014) 188–196.
[42] E. Lotfi, M.R. Akbarzadeh-T, Practical emotional neural networks, Neural Netw. 59 (2014) 61–72, http://dx.doi.org/10.1016/j.neunet.2014.06.012.
[43] E. Lotfi, S. Setayeshi, S. Taimory, A neural basis computational model of emotional brain for online visual object recognition, Appl. Artif. Intell. 28 (2014) 1–21, http://dx.doi.org/10.1080/08839514.2014.952924.
[44] Z. Liu, D. Chen, Y. Xu, J. Liu, Logistic support vector machines and their application to gene expression data, Int. J. Bioinform. Res. Appl. 1 (2) (2005) 169–182.
[45] C. Lucas, D. Shahmirzadi, N. Sheikholeslami, Introducing BELBIC: brain emotional learning based intelligent controller, Int. J. Intell. Autom. Soft Comput. 10 (2004) 11–21.
[46] M. Meselhy Eltoukhy, I. Faye, B. Belhaouari Samir, A statistical based feature extraction method for breast cancer diagnosis in digital mammogram using multiresolution representation, Comput. Biol. Med. 42 (1) (2012) 123–128.
IEEE International Joint Conference on Neural Networks (IJCNN), 2012, pp. 1–6, http:// dx.doi.org/10.1109/IJCNN.2012.6252391. E. Lotfi, M.R. Akbarzadeh-T, Brain Emotional Learning-Based Pattern Recognizer, Cybern. Syst. 44 (5) (2013) 402–421. E. Lotfi, M.R. Akbarzadeh-T, Emotional brain-inspired adaptive fuzzy decayed learning for online prediction problems, in: 2013 IEEE International Conference on Fuzzy Systems (FUZZ), pp. 1–7, IEEE, 2013, July). E. Lotfi, M.R. Akbarzadeh-T, Adaptive brain emotional decayed learning for online prediction of geomagnetic activity indices, Neurocomputing 126 (2014) 188–196. E. Lotfi, M.R. Akbarzadeh-T, Practical emotional neural networks, Neural Networks 59 (2014) 61–72. http://dx.doi.org/10.1016/j.neunet.2014.06.012. E. Lotfi, S. Setayeshi, S. Taimory, A neural basis computational model of emotional brain for online visual object recognition, Appl. Artif. Intell. 28 (2014) 1–21. http://dx.doi.org/10.1080/08839514.2014.952924. Z. Liu, D. Chen, Y. Xu, J. Liu, Logistic support vector machines and their application to gene expression data, Int. J. Bioinform. Res. Appl. 1 (2) (2005) 169–182. C. Lucas, D. Shahmirzadi, N. Sheikholeslami, Introducing BELBIC: brain emotional learning based intelligent controller, Int. J. Intell. Autom. Soft Comput. 10 (2004) 11–21. M. Meselhy Eltoukhy, I. Faye, B. Belhaouari Samir, A statistical based feature extraction method for breast cancer diagnosis in digital mammogram using multiresolution representation, Comput. Biol. Med. 42 (1) (2012) 123–128.


[47] V.S. Tseng, H.H. Yu, Microarray data classification by multi-information based gene scoring integrated with Gene Ontology, Int. J. Data Min. Bioinform. 5 (4) (2011) 402–416.
[48] M. Xiong, L. Jin, W. Li, E. Boerwinkle, Computational methods for gene expression-based tumor classification, Biotechniques 29 (6) (2000) 1264–1271.
[50] M. Reboiro-Jato, D. Glez-Peña, F. Díaz, F. Fdez-Riverola, A novel ensemble approach for multicategory classification of DNA microarray data using biological relevant gene sets, Int. J. Data Min. Bioinform. 6 (6) (2012) 602–616.
[51] L. Nanni, A. Lumini, Ensemblator: an ensemble of classifiers for reliable classification of biological data, Pattern Recognit. Lett. 28 (5) (2007) 622–630.
[53] T. Prasartvit, A. Banharnsakun, B. Kaewkamnerdpong, T. Achalakul, Reducing bioinformatics data dimension with ABC-kNN, Neurocomputing 116 (2013) 367–381, http://dx.doi.org/10.1016/j.neucom.2012.01.045.
[54] M. Perez, D.M. Rubin, L.E. Scott, T. Marwala, W. Stevens, A hybrid fuzzy-SVM classifier applied to gene expression profiling for automated leukaemia diagnosis, in: IEEE 25th Convention of Electrical and Electronics Engineers in Israel (IEEEI 2008), IEEE, 2008, pp. 041–045.
[55] Y. Peng, A novel ensemble machine learning for robust microarray data classification, Comput. Biol. Med. 36 (6) (2006) 553–573.
[56] L.P. Petalidis, A. Oulas, M. Backlund, M.T. Wayland, L. Liu, K. Plant, V.P. Collins, Improved grading and survival prediction of human astrocytic brain tumors by artificial neural network analysis of gene expression microarray data, Mol. Cancer Ther. 7 (5) (2008) 1013–1024.
[57] L.E. Peterson, M. Ozen, H. Erdem, A. Amini, L. Gomez, C.C. Nelson, M. Ittmann, Artificial neural network analysis of DNA microarray-based prostate cancer recurrence, in: Proceedings of the 2005 IEEE Symposium on Computational Intelligence in Bioinformatics and Computational Biology (CIBCB'05), IEEE, 2005, pp. 1–8.
[58] L.E. Peterson, M.A. Coleman, Machine learning-based receiver operating characteristic (ROC) curves for crisp and fuzzy classification of DNA microarrays in cancer research, Int. J. Approx. Reason. 47 (1) (2008) 17–36.
[59] I. Porto-Díaz, V. Bolón-Canedo, A. Alonso-Betanzos, O. Fontenla-Romero, A study of performance on microarray data sets for a classifier based on information theoretic learning, Neural Netw. 24 (8) (2011) 888–896.
[60] S. Saha, A. Ekbal, K. Gupta, S. Bandyopadhyay, Gene expression data clustering using a multiobjective symmetry based clustering technique, Comput. Biol. Med. 43 (11) (2013) 1965–1977.
[61] A. Statnikov, C.F. Aliferis, I. Tsamardinos, D. Hardin, S. Levy, A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis, Bioinformatics 21 (5) (2005) 631–643.
[62] X. Sun, Y. Liu, M. Xu, H. Chen, J. Han, K. Wang, Feature selection using dynamic weights for classification, Knowl.-Based Syst. 37 (2013) 541–549, http://dx.doi.org/10.1016/j.knosys.2012.10.001.
[63] M. Song, S. Rajasekaran, A greedy algorithm for gene selection based on SVM and correlation, Int. J. Bioinform. Res. Appl. 6 (3) (2010) 296–307.
[64] T.Z. Tan, C. Quek, G.S. Ng, Ovarian cancer diagnosis by hippocampus and neocortex-inspired learning memory structures, Neural Netw. 18 (5) (2005) 818–825.
[65] T.Z. Tan, C. Quek, G.S. Ng, E.Y.K. Ng, A novel cognitive interpretation of breast cancer thermography with complementary learning fuzzy neural memory structure, Expert Syst. Appl. 33 (3) (2007) 652–666.


[66] T.Z. Tan, G.S. Ng, C. Quek, Complementary learning fuzzy neural network: an approach to imbalanced dataset, in: International Joint Conference on Neural Networks (IJCNN 2007), IEEE, 2007, pp. 2306–2311.
[67] T.Z. Tan, C. Quek, G.S. Ng, K. Razvi, Ovarian cancer diagnosis with complementary learning fuzzy neural network, Artif. Intell. Med. 43 (3) (2008) 207–222.
[68] M. Takahashi, H. Hayashi, Y. Watanabe, K. Sawamura, N. Fukui, J. Watanabe, T. Someya, Diagnostic classification of schizophrenia by neural network analysis of blood-based gene expression signatures, Schizophr. Res. 119 (1) (2010) 210–218.
[69] D.L. Tong, A.C. Schierz, Hybrid genetic algorithm-neural network: feature extraction for unpreprocessed microarray data, Artif. Intell. Med. 53 (1) (2011) 47–56.
[70] M. Tong, K.H. Liu, C. Xu, W. Ju, An ensemble of SVM classifiers based on gene pairs, Comput. Biol. Med. 43 (6) (2013) 729–737.
[71] M.H. Tseng, H.C. Liao, The genetic algorithm for breast tumor diagnosis – the case of DNA viruses, Appl. Soft Comput. 9 (2) (2009) 703–710.
[72] P. Vadakkepat, L.A. Poh, Fuzzy-rough discriminative feature selection and classification algorithm, with application to microarray and image datasets, Appl. Soft Comput. 11 (4) (2011) 3429–3440.
[73] V. Vinaya, N. Bulsara, C.J. Gadgil, M. Gadgil, Comparison of feature selection and classification combinations for cancer classification using microarray data, Int. J. Bioinform. Res. Appl. 5 (4) (2009) 417–431.
[74] S.L. Wang, X. Li, S. Zhang, J. Gui, D.S. Huang, Tumor classification by combining PNN classifier ensemble with neighborhood rough set based gene reduction, Comput. Biol. Med. 40 (2) (2010) 179–189.
[75] Y.F. Wang, Z.G. Yu, V. Anh, Fuzzy C-means method with empirical mode decomposition for clustering microarray data, Int. J. Data Min. Bioinform. 7 (2) (2013) 103–117.
[76] A. Yardimci, Soft computing in medicine, Appl. Soft Comput. 9 (3) (2009) 1029–1043.
[77] S.H. Yeh, C.H. Lin, P.W. Gean, Acetylation of nuclear factor-κB in rat amygdala improves long-term but not short-term retention of fear memory, Mol. Pharmacol. 65 (5) (2004) 1286–1292.
[78] K.Y. Yeung, W.L. Ruzzo, Principal component analysis for clustering gene expression data, Bioinformatics 17 (9) (2001) 763–774.
[79] Y. Zhang, J. Xuan, R. Clarke, H.W. Ressom, Module-based breast cancer classification, Int. J. Data Min. Bioinform. 7 (3) (2013) 284–302.
[80] C. Zhang, S. Zhang, A supervised orthogonal discriminant projection for tumor classification using gene expression data, Comput. Biol. Med. 43 (5) (2013) 568–575, http://dx.doi.org/10.1016/j.compbiomed.2013.01.019.
[81] Z. Zainuddin, P. Ong, Reliable multiclass cancer classification of microarray gene expression profiles using an improved wavelet neural network, Expert Syst. Appl. 38 (11) (2011) 13711–13722.
[82] X.L. Xia, K. Li, G.W. Irwin, Two-stage gene selection for support vector machine classification of microarray data, Int. J. Model. Identif. Control 8 (2) (2009) 164–171.
[83] M. Xiong, X. Fang, J. Zhao, Biomarker identification by feature wrappers, Genome Res. 11 (11) (2001) 1878–1887.
[84] H. Xiong, S. Shekhar, P.N. Tan, V. Kumar, Exploiting a support-based upper bound of Pearson's correlation coefficient for efficiently identifying strongly correlated pairs, in: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 2004, pp. 334–343.
