
Granular Neural Networks: Concepts and Development Schemes

Mingli Song and Witold Pedrycz, Fellow, IEEE

Abstract— In this paper, we introduce a concept of a granular neural network and develop its comprehensive design process. The proposed granular network is formed on the basis of a given (numeric) neural network whose structure is augmented by the formation of granular connections (realized as intervals) spanned over the numeric ones. Owing to the simplicity of the underlying processing, interval connections become an appealing alternative among information granules and help clarify the main idea. We introduce a concept of information granularity and its quantification (viewed as a level of information granularity). Being treated as an essential design asset, the assumed level of information granularity is distributed (allocated) among the connections of the network in several different ways so that a certain performance index becomes maximized. Due to the high dimensionality of some protocols of allocation of information granularity and the nature of the allocation process itself, a single-objective version of particle swarm optimization is considered a suitable optimization vehicle. As we are concerned with the granular output of the network, which has to be evaluated with regard to the numeric target of the data, two criteria are considered, namely, coverage of numeric data and specificity of information granules (intervals). A series of numeric studies completed for synthetic data and data coming from the machine learning and StatLib repositories provides a useful insight into the effectiveness of the proposed algorithm.

Index Terms— Granular connections, granular neural networks, interval analysis, optimal allocation of information granularity, particle swarm optimization (PSO).

I. INTRODUCTION

MOST neural networks encountered in the literature are numeric constructs realizing a certain nonlinear mapping. A conceptually viable and practically useful generalization of numeric neural networks comes in the form of nonnumeric mappings realized by neural networks. In this case, it is legitimate to refer to such networks as granular neural networks. The nonnumeric (granular) nature of the mapping

arises because of the granular character of the connections. In this case, any numeric input to such a network produces a granular output. There are several compelling reasons behind the realization of this category of neural networks. First, by establishing granular outputs one can effectively gauge the performance of the already constructed numeric neural network in the presence of training data. Second, when dealing with new data, the network forms granular outputs, which are instrumental in the quantification of the quality of the obtained result. For instance, in the case of prediction, we are provided with a comprehensive forecasting outcome; instead of a single numeric result, an information granule is formed whose location and size lead to a description of the quality of prediction. The term "granular" pertains to the nature of the developed construct and by no means is confined to a certain specific type of neural network. Instead, it concerns a general augmentation of the neural architecture it builds upon. The proposed concept applies equally well to multilayer perceptrons (MLPs) or radial basis function neural networks; cf., [1], [2], [4], [5], [9], [10], [12], [13], [21], and [22]. Granular constructs have also been incorporated in a plethora of neurofuzzy systems; cf., [7], [11], [15], [16], [18], [19], and [23]–[26]. At this point, it is instructive to relate the proposed approach to what is known in the literature as interval neural networks, especially interval MLPs; see [1], [13], [17], [20], [21]. Ishibuchi et al. [1] proposed a neural network with interval weights and interval biases and derived a learning algorithm supporting its development. The studies presented in [2], being a continuation of the work reported in [1], generalize numeric inputs of fuzzy neural networks to their fuzzy set-based counterparts. In [10], a granular neural network using the backpropagation algorithm and fuzzy neural networks is used to handle numeric-linguistic data fusion, providing a mechanism of knowledge discovery in numeric-linguistic databases. It is noticeable that in all these cases, the corresponding interval neural networks are built from scratch. The paper reported in [22] introduces the concept of granular neural networks and outlines their design process. Here, we concentrate on the actual development scheme of granular neural networks. In the former studies, a concept of granular neural networks was introduced as a standalone construct. In contrast, in this paper, we introduce a granular neural network constructed on the basis of an existing neural network, and show the role of information granularity as an important design asset whose optimal allocation makes the resulting granular neural network


more in rapport with the real world. In addition, in the literature [1], [2], [22], the design of granular neural networks is done from scratch on the basis of numeric or granular data. Quite often the quality and complexity of the underlying optimization might be negatively affected by the significant number of parameters to be optimized. In this paper, we rely on an already available neural network (whose design could immensely benefit from the wealth of existing design approaches) and on its basis construct its generalized granular counterpart. In the proposed neural architecture, we are concerned with granular connections (weights) of the neurons. From a computational perspective, intervals offer a computationally appealing alternative; while the underlying concepts are of general character and relevant to other formalisms of information granules, the more detailed investigations are focused on interval-valued (interval) neural networks. Furthermore, given the focus of this paper, the terms information granule and interval are used interchangeably. The entire design process behind the proposed networks is outlined as follows. A starting point of the overall design process is a numeric network that has already been developed by means of one of the well-established learning strategies. Then a dataset (the same as the training dataset or a new one) is used to construct a granular network, viz., to form interval connections on the basis of the given network. In this sense, the resulting granular construct augments the topology of the existing neural architecture. The design process (viz., the formation of information granules of the connections) is well articulated and translates into an optimization problem. An allocation of granularity following some protocols leads to the optimization of some performance index gauging the quality of the resulting granular neural network. Along with the granular (interval) neural networks, some comments are worth making in the context of applications of such constructs. The granular network produces results being in better rapport with reality when dealing with training data. This is accomplished by showing intervals of possible output values rather than a single numeric entity. An interesting application scenario refers to knowledge (experience) transfer as elaborated on in [31]. A neural network is built on the basis of existing data. Then one intends to use it for a new yet somewhat related problem. The currently available data are very scarce, so building a sound model on their basis is not justifiable. In this case, we consider the usage of the existing neural network, viewing it as a source of knowledge. However, being cognizant of the differences (and some similarities) between the current circumstances and the environment within which the neural network was designed, one can anticipate that the results produced by the network have to be treated with caution and represented as information granules instead of single numeric entities. This is accomplished by forming a granular neural network on the basis of the original network that is made granular (accepting interval connections) on the basis of the existing numeric evidence (experimental data). The main objective of this paper is to develop a comprehensive design process for granular neural networks through an optimal allocation of information granularity where the levels


of information granularity distributed throughout the network are assigned to each connection in an optimal fashion. In this paper, we offer a sound motivation behind the emergence of granular connections. Information granularity is regarded as an important and practically useful design asset whose optimal allocation helps optimize the resulting construct. It is also worth stressing that granular neural networks are constructed based on an already developed numeric neural network (we are not concerned about the way in which its design has been completed). In essence, the design of the granular neural network comes as an enhancement of the well-established practice of the formation of neural networks and augments the originally formed network with a sound quantification of the features of the numeric construct. The concept of information granules is formalized in different ways depending upon the underlying formalism. Fuzzy sets, sets, and rough sets are just representative examples of information granules. The number of information granules implies a level of generality one assumes when dealing with the problem at hand. The level of abstraction (generality) associates with information granularity. In a nutshell, information granularity is concerned with the size (specificity) of information granules. Depending upon the formalism being used, information granularity can be described and quantified in different ways. Commonly, as we are focused on the size of a granule, its granularity is expressed by counting the number of elements embraced by the granule (discrete case) or by the length of the granule (continuous case). In situations where grades of membership are involved (as in fuzzy sets), certain generalizations are sought, e.g., the sigma count of a fuzzy set (in which a summation is completed over the grades of membership). The approach presented here builds upon the existing numeric neural network and augments it by forming information granules over the existing numeric weights (connections) of the neural network. By doing this, we form a granular neural network, which in light of the coverage criterion (being treated as a performance index) becomes more in rapport with experimental data. It has to be stressed that for any numeric input the granular neural network produces a granular output, so a range of feasible output values (output information granules) is produced. The optimized performance index makes sure that the coverage criterion is satisfied to the highest extent (given a certain predetermined level of information granularity). The quality of the granular neural network is optimized, and the criteria of coverage and specificity of granular results contribute here to a sound optimization framework. All in all, a comparative analysis with a numeric neural network cannot be realized in a direct manner as we are concerned with numeric versus granular results. We have emphasized that the granular neural network is an augmented construct built upon the original numeric neural network. In this sense, granular neural networks are not regarded as architectures that are competitive with numeric neural networks; in contrast, they are conceptually and practically useful models built on the basis of already existing neural networks.
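To make the quantification of information granularity mentioned above concrete, the short sketch below computes the granularity of an interval (its length, continuous case), a finite set (its cardinality, discrete case), and a fuzzy set described by its membership grades (the sigma count). The helper names are illustrative assumptions, not taken from the paper.

# Illustrative only: hypothetical helpers quantifying a level of information granularity.

def granularity_of_interval(lower, upper):
    """Continuous case: granularity expressed by the length of the interval."""
    return upper - lower

def granularity_of_set(elements):
    """Discrete case: granularity expressed by the number of embraced elements."""
    return len(set(elements))

def sigma_count(membership_grades):
    """Fuzzy-set case: sigma count, i.e., the sum of the grades of membership."""
    return sum(membership_grades)

if __name__ == "__main__":
    print(granularity_of_interval(0.35, 0.65))      # 0.30
    print(granularity_of_set([1, 3, 5, 7]))         # 4
    print(sigma_count([0.2, 0.9, 1.0, 0.4, 0.1]))   # 2.6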


This paper is structured as follows. Section II describes the architecture of interval-valued neural networks and presents the underlying processing giving rise to granular outputs. Section III elaborates on the design process with its two main aspects stressed, namely an optimal allocation of information granularity and an objective function guiding the allocation process. The optimization tool of particle swarm optimization (PSO) used in this paper is discussed in Section IV; its single-objective version is presented. Experimental studies are covered in Section V. As noted earlier, throughout this paper, the terms interval-valued neural network and granular neural network are used interchangeably (even though in general the term "granular" exhibits a broader connotation). Capital letters are used to denote intervals (interval connections).

II. ARCHITECTURE OF INTERVAL-VALUED NEURAL NETWORKS

The architecture of a granular neural network along with the associated learning scheme is formed by starting with a certain already designed (numeric) neural network and augmenting its structure by considering granular connections spanned over the numeric weights (connections). We are concerned with an MLP [27], which is one of the commonly used topologies of neural networks and comes with a wealth of learning schemes.

A. Interval Operations

Before we proceed with the detailed considerations on granular neural networks, let us briefly recall some results of interval mathematics. Given the arguments (variables) represented in the form of intervals, say X = [a, b], Y = [c, d], etc., the algebraic operations are defined as follows [13]:

Addition: X + Y = [a + c, b + d]   (1)

Subtraction: X − Y = [a − d, b − c]   (2)

Multiplication: X × Y = [min(ac, ad, bc, bd), max(ac, ad, bc, bd)]   (3)

Division (excluding division by an interval containing 0): X / Y = X × (1/Y) with 1/Y = [1/d, 1/c]   (4)

Furthermore, when it comes to the mapping (function) of intervals, we have the following main results:

for a nondecreasing mapping f(X) = f([a, b]) = [f(a), f(b)]   (5)

for a nonincreasing mapping f(X) = f([a, b]) = [f(b), f(a)].   (6)
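As a concrete illustration of operations (1)-(6), here is a minimal Python sketch; the Interval class and its method names are illustrative assumptions and are not part of the paper.

import math

# A minimal sketch of the interval operations (1)-(6).
class Interval:
    def __init__(self, a, b):
        self.a, self.b = min(a, b), max(a, b)

    def __add__(self, other):                      # addition (1)
        return Interval(self.a + other.a, self.b + other.b)

    def __sub__(self, other):                      # subtraction (2)
        return Interval(self.a - other.b, self.b - other.a)

    def __mul__(self, other):                      # multiplication (3)
        products = [self.a * other.a, self.a * other.b,
                    self.b * other.a, self.b * other.b]
        return Interval(min(products), max(products))

    def __truediv__(self, other):                  # division (4), 0 not in the divisor
        if other.a <= 0.0 <= other.b:
            raise ZeroDivisionError("division by an interval containing 0")
        return self * Interval(1.0 / other.b, 1.0 / other.a)

    def apply_monotone(self, f, nondecreasing=True):
        # (5) for a nondecreasing f, (6) for a nonincreasing f
        lo, hi = f(self.a), f(self.b)
        return Interval(lo, hi) if nondecreasing else Interval(hi, lo)

    def __repr__(self):
        return f"[{self.a:.4f}, {self.b:.4f}]"

# Example:
X, Y = Interval(1.0, 2.0), Interval(-0.5, 0.5)
print(X + Y, X - Y, X * Y)                 # [0.5, 2.5] [0.5, 2.5] [-1.0, 1.0]
print(X.apply_monotone(math.tanh))         # tanh is nondecreasing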

B. Architectures of Interval-Weight Neural Networks

The granular neural network under consideration, presented in Fig. 1, comprises a single hidden layer consisting of n1 neurons and a single neuron located in the output layer. Features (inputs) to the network are organized in a vector form x = [x1, x2, ..., xn]^T. The weight (connection) connecting the ith neuron in the input layer to the jth neuron in the hidden layer comes in the form of an interval and is denoted by Wji = [w_ji^-, w_ji^+]. The weights between the hidden layer and the output layer are also interval valued. Each neuron comes with an interval bias. In virtue of the interval character of the connections used in the network, for any numeric input, the result of processing becomes an interval, say Y = [y^-, y^+].

Fig. 1. Architecture of a granular network.

As indicated, in our design, we proceed with the already constructed MLP (e.g., realized in terms of the Levenberg–Marquardt backpropagation learning method; see [3]). The choice of the size of the hidden layer is decided upon during this design phase. The activation functions of the neurons in the hidden layer and the output layer are denoted by f1 and f2, respectively. The detailed formulas are outlined as follows:

hidden layer
o_j = f1(z_j), z_j = Σ_{i=1}^{n} w_ji x_i + b_j, j = 1, 2, ..., n1
f1(z) = 2 / (1 + e^(-2z)) − 1

output layer
y = f2(z), z = Σ_{j=1}^{n1} w_j o_j + b, f2(z) = z.

When it comes to the interval-valued neural network, the aforementioned formulas are generalized in the following way:

hidden layer
O_j = [o_j^-, o_j^+] = [f1(z_j^-), f1(z_j^+)]
z_j^- = Σ_{i=1}^{n} w_ji^- x_i + b_j^-, z_j^+ = Σ_{i=1}^{n} w_ji^+ x_i + b_j^+, j = 1, 2, ..., n1


output layer
Y = [y^-, y^+] = [f2(z^-), f2(z^+)]
z^- = Σ_{j=1}^{n1} min(w_j^- o_j^-, w_j^- o_j^+, w_j^+ o_j^-, w_j^+ o_j^+) + b^-
z^+ = Σ_{j=1}^{n1} max(w_j^- o_j^-, w_j^- o_j^+, w_j^+ o_j^-, w_j^+ o_j^+) + b^+.

As the outputs of the granular neural network are intervals while the targets coming from the experimental data are numeric, we have to define a suitable performance index (objective function), whose optimization (maximization or minimization) is realized through a suitable allocation (distribution) of information granularity.

III. DESIGNING GRANULAR CONNECTIONS

The challenging yet highly important issue is how to construct interval-valued weights and biases of a network. The available information granularity (more specifically, its level), being treated as an important design asset, has to be carefully distributed among all the connections of the network so that the interval-valued output of the neural network covers (includes) the experimental datum. In what follows, we propose several protocols of allocation of information granularity and discuss two indices whose optimization is realized through this allocation process.

A. Allocation of Information Granularity

Given a level of information granularity ε assuming values in the unit interval, it is allocated to the individual weights and biases. The way of building intervals around the original numeric values of the weights and biases is referred to as granularity allocation. An allocation leading to the optimization of a given performance index is referred to as optimal information granularity allocation. The original weight or bias, denoted symbolically by w_ji and b_k, is made interval-valued by forming bounds around the original numeric value in the following way:

w_ji^- = w_ji − ε^- |w_ji| and b_k^- = b_k − ε^- |b_k|   (7)
w_ji^+ = w_ji + ε^+ |w_ji| and b_k^+ = b_k + ε^+ |b_k|.   (8)
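To tie (7)-(8) to the interval forward pass of Section II, the sketch below granulates the numeric weights and biases of a single-hidden-layer MLP and propagates a numeric input to the granular output Y = [y^-, y^+]. It assumes inputs normalized to the unit interval (so x ≥ 0, which justifies the simple hidden-layer bounds); all function and variable names are illustrative, not taken from the paper.

import numpy as np

def tansig(z):
    # f1 of the hidden layer: 2 / (1 + exp(-2z)) - 1 (nondecreasing, cf. (5))
    return 2.0 / (1.0 + np.exp(-2.0 * z)) - 1.0

def granulate(value, eps_minus, eps_plus):
    """Build the interval around a numeric weight/bias as in (7)-(8)."""
    return value - eps_minus * np.abs(value), value + eps_plus * np.abs(value)

def granular_forward(x, W1, b1, w2, b2, eps_minus, eps_plus):
    # W1: (n1, n) hidden weights, b1: (n1,), w2: (n1,), b2: scalar.
    # eps_minus / eps_plus: dictionaries of the same shapes, one level per connection.
    W1_lo, W1_hi = granulate(W1, eps_minus["W1"], eps_plus["W1"])
    b1_lo, b1_hi = granulate(b1, eps_minus["b1"], eps_plus["b1"])
    w2_lo, w2_hi = granulate(w2, eps_minus["w2"], eps_plus["w2"])
    b2_lo, b2_hi = granulate(b2, eps_minus["b2"], eps_plus["b2"])

    # Hidden layer: with x >= 0, the bounds of z follow directly from the weight bounds.
    z_lo = W1_lo @ x + b1_lo
    z_hi = W1_hi @ x + b1_hi
    o_lo, o_hi = tansig(z_lo), tansig(z_hi)

    # Output layer: min/max over the four products for every hidden neuron, then sum.
    prods = np.stack([w2_lo * o_lo, w2_lo * o_hi, w2_hi * o_lo, w2_hi * o_hi])
    y_lo = prods.min(axis=0).sum() + b2_lo
    y_hi = prods.max(axis=0).sum() + b2_hi
    return y_lo, y_hi          # f2 is the identity

For protocol C1 described next, for instance, every entry of eps_minus and eps_plus would simply be set to ε/2.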

The resulting interval granules of the weights and biases are described as G(w_ji) = Wji = [w_ji^-, w_ji^+] or G(b_k) = [b_k^-, b_k^+]. In this notation, we use the symbol G to stress an operation of the formation of the granular connection. For a given value of ε and a total number of weights and biases equal to h, the overall balance of granularity is equal to hε, namely

h ε = Σ_{i=1}^{h} (ε_i^- + ε_i^+).   (9)

The allocation of information granularity to individual connections can be realized in several different ways depending on how much diversity one would like to accommodate in the allocation process. The following protocols are studied;


note that in all cases the balance of granularity expressed by (9) is satisfied.
1) C1: uniform allocation of information granularity. This protocol is the simplest one. Each connection is affected in the same way. In essence, this allocation does not call for any optimization. All weights (connections) are replaced by intervals constructed with the use of the same value of ε. The intervals themselves are distributed symmetrically around the original numeric value of the connection, meaning that we have ε^- = ε^+ = ε/2.
2) C2: uniform allocation of information granularity with asymmetric positioning of the intervals around the original connections of the network. In this case, each connection uses the same level of granularity ε. Instead of ε^- = ε^+ = ε/2 for all connections, ε^- and ε^+ may differ. For each connection, the condition ε^- + ε^+ = ε is still satisfied.
3) C3: nonuniform allocation of information granularity with symmetrically distributed intervals of information granules. The levels of granularity are different for different connections and equal to ε_i. Because of the symmetry, we have ε_i^- = ε_i^+ = ε_i/2.
4) C4: nonuniform allocation of information granularity with asymmetrically distributed intervals of information granules. The levels of granularity of different connections may be different and we have ε_i^- + ε_i^+ = ε_i.
5) C5: random allocation of information granularity. This protocol is proposed as a reference method to demonstrate how much improvement is achieved by optimizing the granular connections through the other protocols.
Each protocol presented previously implies a certain way of realizing the allocation of information granularity. In the cases of C1 and C2, we envision no optimization or a very limited 1-D optimization. As for protocols C3 and C4, the allocation process has to be realized through some optimization technique since the problem is of high dimensionality. Given the increasing level of flexibility in the realization of the granularity allocation protocols, C4 offers the highest level of flexibility.

B. Fitness Function

As we are concerned with the granular output of the network, which has to be evaluated with regard to the numeric target, two criteria (performance indices) are worth considering. The first one looks at the quantification of the concept of coverage, i.e., the extent to which the target values are covered (included) in the corresponding granular outputs. The other one is focused on expressing the level of specificity of the information granules produced by the network. In this paper, we concentrate on the first criterion. Let us assume that for the purpose of the evaluation we have some data F coming in the form of pairs (x_t, target_t), t = 1, 2, ..., M.

Coverage: The coverage criterion is quantified by an index Q1, which is described as a ratio

Q1 = (number of data inside the intervals formed by the granular neural network) / M.   (10)
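The following sketch shows one plausible way of generating allocation vectors for protocols C1-C4 that respect the balance (9), together with the coverage Q1 of (10). The random sampling and the rescaling step are assumptions made purely for illustration; in the paper, the levels for C2-C4 are adjusted by PSO rather than sampled.

import numpy as np

def allocate(protocol, eps, h, seed=0):
    """Return (eps_minus, eps_plus), each of length h, so that their sum equals h*eps, cf. (9)."""
    rng = np.random.default_rng(seed)
    if protocol == "C1":                        # uniform, symmetric
        lo = hi = np.full(h, eps / 2.0)
    elif protocol == "C2":                      # uniform level, asymmetric split
        lo = rng.uniform(0.0, eps, size=h)
        hi = eps - lo
    elif protocol == "C3":                      # nonuniform, symmetric
        levels = rng.uniform(0.0, eps, size=h)
        levels *= h * eps / levels.sum()        # illustrative rescaling to meet (9); no clipping
        lo = hi = levels / 2.0
    elif protocol == "C4":                      # nonuniform, asymmetric
        lo = rng.uniform(0.0, eps, size=h)
        hi = rng.uniform(0.0, eps, size=h)
        scale = h * eps / (lo.sum() + hi.sum())
        lo, hi = lo * scale, hi * scale
    else:
        raise ValueError(protocol)
    return lo, hi

def coverage_q1(targets, y_lo, y_hi):
    """Coverage (10): fraction of targets falling inside the granular outputs."""
    inside = (y_lo <= targets) & (targets <= y_hi)
    return float(np.mean(inside))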


Specificity: It is expressed in terms of the average length of the intervals produced by the granular neural network for the inputs coming from the dataset F

length = (1/M) Σ_{t=1}^{M} (y_t^+ − y_t^-).   (11)

In the optimization of the allocation of granularity, we consider several design scenarios.
1) Use of the Coverage Criterion: This gives rise to a single-objective optimization.
2) Use of Both Criteria (Coverage and Specificity): These two criteria, given their character, are very likely in conflict. In this case, a two-objective optimization producing a Pareto front is an option worth pursuing.
In summary, the overall design process consists of the following steps.
Step 1: Train a numeric neural network with the existing training data using a certain method.
Step 2: Assign a level of information granularity ε to the network, and use a protocol to build interval (or other types of information granule) connections.
Step 3: Apply single-objective PSO to adjust the allocation of granularity, guided by the maximization of an objective function that captures the coverage requirement.
Step 4: Use the optimized set of granularity levels to form interval connections on the original network. Hence, a granular network is formed.

IV. SINGLE-OBJECTIVE PARTICLE SWARM OPTIMIZATION

PSO has been widely used in the literature as an effective optimization vehicle [6], [8]. In this section, we briefly review an essential mode of PSO, namely its single-objective version, which is used in the subsequent experiments reported in this paper. In PSO, a swarm of particles is used where each of them represents a potential solution (in our case, a vector of optimized levels of information granularity). Each particle, say particle i, is associated with two vectors, i.e., the current position vector s_i = [s_i1, s_i2, ..., s_iD] and its current velocity vector v_i = [v_i1, v_i2, ..., v_iD], where D stands for the dimensionality of the search space. The velocity and the position of each particle are initialized randomly; the entries are located within the unit interval. The PSO algorithm updates the particles' velocities and positions iteratively until a stopping criterion has been satisfied. The update expressions of the velocity and position of the ith particle for the coordinate d are as follows:

v_id(t + 1) = η(t) v_id(t) + c1 r_1d (P_pbest,id(t) − s_id(t)) + c2 r_2d (P_gbest,d(t) − s_id(t))   (12)
s_id(t + 1) = s_id(t) + v_id(t + 1)   (13)

where t is the current generation, P_pbest,id represents the best position found by the ith particle so far, and P_gbest,d is the best position encountered in the current generation. c1 and c2

are two constants typically set to 2. r_1d and r_2d are two uniformly distributed random numbers independently generated within [0, 1]. There are also limits imposed on the velocities, say v_max and v_min; here we set v_max = 1 and v_min = −1. η(t) is the generation-based inertia weight, which influences the convergence of the swarm. It changes linearly over successive generations, starting from its maximal value

η(t) = ((t_max − t) / t_max) (η_max − η_min) + η_min   (14)

where t_max denotes the maximal number of generations. The maximal and minimal weights η_max and η_min are set to 0.9 and 0.4, respectively [29]. In this paper, the particles represent possible solutions for the levels of granularity: ε_1, ε_2, ..., ε_h for protocol C3 and ε_1^-, ε_2^-, ..., ε_h^-, ε_1^+, ε_2^+, ..., ε_h^+ for protocol C4. In other words, the search space is [0, ε]^h for protocol C3 and [0, ε]^(2h) for protocol C4, where h is the total number of connections and biases of the network. We initialize the particles randomly to proceed with the search of the space. For protocol C3, D = h; for protocol C4, D = 2h. The pseudocode of the single-objective PSO is presented as Algorithm 1.

Algorithm 1 Single-Objective PSO
Create an initial swarm of particles
t = 0
While t < maximal number of generations do
  t = t + 1
  For each particle i do
    Evaluate the fitness q1[s_i(t)]
    If q1[s_i(t)] > q1[s_i(t−1)]
      Replace s_i(t−1) with s_i(t)
    Endif
  Endfor
  Find the global best, i.e., the particle with the largest q1
  Save the position of this particle and its q1
  For each particle i do
    Update the velocity v_i(t) according to (12)
    If v_i(t) satisfies the limits imposed on the velocity
      Update the position s_i(t) by (13)
    Endif
  Endfor
Endwhile

V. EXPERIMENTAL STUDIES

In this section, we present a series of numeric experiments to illustrate the proposed algorithm, show its development, and quantify the resulting performance. Both synthetic data and benchmark datasets coming from the machine learning repository (http://archive.ics.uci.edu/ml/datasets.html) as well as the StatLib website (http://lib.stat.cmu.edu/datasets/) are used. In all the experiments, we use the tenfold cross-validation method to avoid overfitting and also to make our method easily comparable with other algorithms. The performance of the numeric neural network is quantified by means of the commonly used mean squared error (MSE).


In all experiments, we start with a neural network with a single hidden layer. The activation function used for the neurons in the hidden layer is the sigmoidal one, while for the output layer we use a linear function. The number of neurons in the hidden layer was selected experimentally by analyzing the performance of the network vis-à-vis the size of the hidden layer. The training of the network was realized using a standard Levenberg–Marquardt minimization method, which was run for a maximum of 10 000 epochs (this number was sufficient given the fast rate of convergence of the method). The data were preprocessed by normalizing the input variables into the unit interval. The initial values of the weights (connections) were set up randomly within the range [−1, 1]. The construction of the granular neural network was realized by running the protocols of information granularity allocation presented before and using either the training data (the same as those utilized in the formation of the original neural network), referred to as case (a), or the testing data, referred to as case (b). The setup of the PSO is as follows: the number of generations is 100, the size of the population is 100, and c1 = c2 = 2. The values of these parameters are in line with those reported in the literature [28]. We also ran some other combinations of the size of the population and the number of generations, but no improvement of the fitness function was observed when increasing the values of these two parameters. Computing overhead is another major concern when we compare different protocols of granularity allocation. To quantify the overall performance, we report the actual running time for protocols C1–C4 and the time needed to train a numeric neural network in MATLAB. According to the design process presented earlier, we assume that a numeric network has been trained (with the use of the training data). Then, depending on the dataset used to construct the granular network, we consider two cases: 1) case (a), where the data are the original training data, and 2) case (b), where the data are the testing data. To quantify the performance of the granular neural network at the global level (not for a single value of ε), we compute the area under curve (AUC) for the coverage plot. The AUC is the definite integral of the coverage regarded as a function of ε, i.e., the area of the region in the xy plane bounded by the graph of the objective function (coverage), the x-axis (ε), and the vertical lines ε = 0 and ε = 1. Since the coverage values lie in [0, 1], the region lies entirely above the x-axis and the areas add up to the total. The performance of a granular neural network is thus quantified by the value of the AUC, which allows us to investigate the performance of the method at a global level rather than locally.

A. Two-Input Synthetic Data

The two-variable function is a sine wave, which has been used as regression benchmark data [14]

y = 0.8 × sin(x1/4) × sin(x2/2)   (15)

where x1 is in [0, 10] and x2 is in [−5, 5]. There are 600 input–output pairs used in the following experiment (following


Fig. 2. Nonlinear two-variable function along with training data and testing data.

Fig. 3. Performance index as a function of the number of neurons in the hidden layer.

a uniform distribution over the input space). The function along with the superimposed training data and testing data is illustrated in Fig. 2. The tenfold cross-validation method is applied here; thus each time there are 540 training data and 60 testing data. The performance of the constructed neural networks, quantified in terms of the MSE and the standard deviation, is illustrated in Fig. 3. The overall trend of the MSE is decreasing, whereas the standard deviation fluctuates. Through a visual inspection, we choose the number of neurons in the hidden layer to be equal to 10. Let us start with the optimization with the average coverage treated as the fitness function and the data formed following case (a). The plots of the average coverage regarded as a function of ε are included in Fig. 4. The values of the AUC are computed over the unit interval; however, the plot is shown only for a portion of the entire range of ε. The corresponding average values of the AUC and the standard deviation on training data in case (a) are as follows:


Fig. 4. Average coverage as a function of ε for the four protocols of information granularity allocation used in this paper in case (a).

Fig. 5. Average coverage as a function of ε in case (b).

1) protocol C1: AUC = 0.9859 ± 0.0058;
2) protocol C2: AUC = 0.9870 ± 0.0052;
3) protocol C3: AUC = 0.9935 ± 0.0014;
4) protocol C4: AUC = 0.9944 ± 0.0004;
5) protocol C5: AUC = 0.9856 ± 0.0073.
The plots of the coverage regarded as a function of ε for case (b) are shown in Fig. 5. As shown in this figure, the average value of the AUC obtained for protocol C4 is slightly higher than the one produced for C3. However, both are higher than those produced by the other three protocols. This is not surprising, as they reflect the increasing flexibility of the successive protocols of allocation of granularity and a better, more effective usage of information granularity. The corresponding average values of the AUC and the standard deviation on training data in case (b) are as follows:
1) protocol C1: AUC = 0.9859 ± 0.0067;
2) protocol C2: AUC = 0.9870 ± 0.0062;
3) protocol C3: AUC = 0.9936 ± 0.0011;
4) protocol C4: AUC = 0.9944 ± 0.0005;
5) protocol C5: AUC = 0.9859 ± 0.0061.

Fig. 6. Performance of PSO expressed in terms of the fitness function obtained in consecutive generations; ε = 0.01 and protocol C3.

Fig. 7. Original neural network with numeric weights. The biases of the neurons are not shown.

Fig. 6 displays the values of the fitness function reported in consecutive generations; it is noticeable that the convergence of the optimization process (increasing coverage) occurs within the first ten generations. Now the construction process of a granular neural network is complete. As a comparison, we first illustrate the numeric weights of the original neural network in Fig. 7. The weights in the hidden layer are in the range [−2.14, 3.78] and in the range [−1.15, 1.42] in the output layer. A solid line represents a positive value and a dotted line represents a negative one. The darker the line, the larger the absolute value of the weight. Since we use the tenfold cross-validation method, there are a total of ten granular networks. We now show one of them in terms of the interval weights (levels of information granularity). We give the example of C3 instead of C4 since C3 and C4 have almost the same accuracy but C3 is more proficient.
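The AUC values listed above summarize the coverage curve over the whole range of ε. A minimal sketch of this computation is given below; the grid resolution and the evaluate_coverage callback are assumptions made for illustration only.

import numpy as np

def auc_of_coverage(evaluate_coverage, num_points=101):
    # Sample the coverage on a grid of eps in [0, 1] and integrate by the trapezoidal rule.
    eps_grid = np.linspace(0.0, 1.0, num_points)
    cov = np.array([float(evaluate_coverage(e)) for e in eps_grid])
    widths = np.diff(eps_grid)
    return float(np.sum((cov[1:] + cov[:-1]) * widths / 2.0))

# Example with a synthetic, saturating coverage curve:
print(auc_of_coverage(lambda e: min(1.0, 0.5 + 5.0 * e)))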


Fig. 8. Optimized allocation of granularity levels realized by running protocol C3 for case (a) with ε = 0.01. The biases of the neurons are not shown.


Fig. 10. Optimized allocation of granularity levels realized by running protocol C3 for case (a) with ε = 1. The biases of the neurons are not shown.

Fig. 9. Optimized allocation of granularity of level 0.01 (ε) for C3 for case (b).

Fig. 11. Optimized allocation of granularity of level 1 (ε) for C3 for case (b).

The granularities of the network trained for case (a) are displayed in Fig. 8 (ε = 0.01) and for case (b) in Fig. 9. As in Fig. 7, the darkness of a line shows how large the granularity is: the thicker the line, the higher the level of granularity associated with the corresponding connection. When ε = 0.01, for both cases (a) and (b) we have some zero granularities, denoted by dashed lines in Figs. 8 and 9. The values of the other granularities are enlarged 200 times to provide a clear view. The optimized coverage of case (a) is 0.9. We can see in Fig. 8 that, besides the zero granularity levels, the other granularities are of almost the same size. It is obvious that the distribution of information granularity in case (b) is different from case (a); however, their coverage is similar [0.9 for case (b)]. Since each interval is constructed by extending the numeric value to a lower bound and an upper bound separately, we actually form two subintervals for each connection, and in this way we have to distribute the bounds of granularity to each connection. Here, we illustrate protocol C3, which requires the two subintervals to have the same level of granularity. There are some connections with zero values of the levels of granularity, which means that these connections are effectively retained as numeric ones. When ε = 1, for both cases (a) and (b) we cannot find any zero granularity and, in general, each connection is assigned a value of granularity. In addition, the result of single-objective optimization produces less specific (broader) intervals, which come with higher coverage values. Refer to Figs. 10 and 11 for the allocation of information granularity for cases (a) and (b), respectively. It is interesting to look at the computing overhead. As an example, let us take ε = 0.01. The training time of the original neural network is 2.9 s (average). For the average computing time when using C1, C2, C3, and C4 with the parameters of PSO set as before (processor: 2.50 GHz), we obtain the following: case (a): TC1 = 0.04 s, TC2 = 2.8 s, TC3 = 258 s, and TC4 = 515 s; case (b): TC1 = 0.03 s, TC2 = 2.8 s, TC3 = 31 s, and TC4 = 62 s.


Fig. 12. Performance index Q2 as a function of ε in case (a).

Fig. 13. Performance index Q2 as a function of ε in case (b).

Another criterion mentioned in Section III is the average length of the intervals [formula (11)]. It is more complicated to use two objectives than to consider only coverage. Here, we give one example in which the objective function is defined as

Q2 = coverage / log10(length).   (16)

The values of the objective function as a function of ε are displayed in Fig. 12. We can easily find a best result when ε is equal to 0.15 for protocol C4. This differs from the one-objective case, in which the best result remains unchanged beyond a certain value of ε. The corresponding average values of the AUC and the standard deviation on training data in case (b) are as follows:
1) protocol C1: AUC = 1.7337 ± 2.74;
2) protocol C2: AUC = 35.3 ± 101.52;
3) protocol C3: AUC = 408 890 ± 835 550;
4) protocol C4: AUC = 880 620 ± 1 827 000;
5) protocol C5: AUC = 1.6332 ± 1.47.
Fig. 13 shows the values of the objective function as a function of ε in case (b). It displays a result similar to that of Fig. 12: there is one optimized result between 0 and 1 for protocol C4. However, the fluctuation is more frequent.

B. Auto MPG Dataset

The tenfold cross-validation of the data results in around 283 data forming a training set and 31 data used for testing at each time. The results of the development of the neural network are summarized in Fig. 14. The number of neurons in the hidden layer is set to be equal to 8. As before, we look at the results of cases (a) and (b).

Fig. 14. Performance index MSE versus the size of the hidden layer.

Fig. 15. Coverage as a function of ε for case (a).

Case (a): The average values of the performance index shown in Fig. 15 exhibit a saturation effect for higher values of the level of granularity. As before, it is apparent that C3 and C4 produce higher coverage than the first two allocation protocols.


Fig. 16. Performance index as a function of generation. (a) ε = 0.01. (b) ε = 0.03.

Fig. 17. Optimized allocation of granularity levels realized by running protocol C3 for case (a) with ε = 0.01.

The corresponding average values of the AUC and the standard deviation on training data are as follows:
1) C1: AUC = 0.9191 ± 0.0139;
2) C2: AUC = 0.9233 ± 0.0122;
3) C3: AUC = 0.9746 ± 0.0039;
4) C4: AUC = 0.9773 ± 0.0029;
5) C5: AUC = 0.9152 ± 0.0183.

The performance of PSO for two different values, ε = 0.01 and ε = 0.03, is shown in Fig. 16. Let us look at the constructed granular neural network when ε = 0.01. Fig. 17 shows the allocation of information granularity, in which dotted lines express the zero granularities and solid lines represent the other levels. We find that all the other levels are the same in both the hidden layer and the output layer.

Case (b): The plots of the average coverage regarded as a function of ε are included in Fig. 18.

Fig. 18. Coverage as a function of ε for case (b).

The corresponding

average values of the AUC and the standard deviation on training data are as follows:
1) C1: AUC = 0.9024 ± 0.0263;
2) C2: AUC = 0.9142 ± 0.0241;
3) C3: AUC = 0.9653 ± 0.0107;
4) C4: AUC = 0.9705 ± 0.0089;
5) C5: AUC = 0.9008 ± 0.0259.
Since there are eight neurons in the hidden layer and six inputs, and the granularity levels differ considerably, it is difficult to compare their values in a single figure; hence we do not show the allocation of granularity here. The training time of the original neural network is 2.5 s (average). The actual average running times for the two scenarios are as follows: 1) case (a): TC1 = 0.04 s, TC2 = 3 s, TC3 = 276 s, and TC4 = 553 s; 2) case (b): TC1 = 0.04 s, TC2 = 3 s, TC3 = 33 s, and TC4 = 66 s. We also experimented with three other datasets: the Boston Housing data, the Bodyfat data, and the PM10 data. Table I contains the AUC values for the optimized distribution of granularity governed by the coverage objective function obtained for case (a).


TABLE I
RESULTS FOR CASE (a): AVERAGE AUC VALUES OF THE COVERAGE OBJECTIVE FUNCTION OBTAINED FOR DIFFERENT PROTOCOLS OF ALLOCATION OF INFORMATION GRANULARITY FOR THE TRAINING DATA AND TESTING DATA (ENTRIES ARE PAIRS OF RESULTS: AUC TRAINING DATA / AUC TESTING DATA)

Data      | C1            | C2            | C3            | C4            | C5
Synthetic | 0.9859/0.9858 | 0.9870/0.9866 | 0.9935/0.9930 | 0.9944/0.9376 | 0.9856/0.9857
Auto MPG  | 0.9191/0.9024 | 0.9233/0.9012 | 0.9746/0.9666 | 0.9773/0.6736 | 0.9152/0.8987
Housing   | 0.8347/0.7787 | 0.8490/0.7876 | 0.9538/0.9329 | 0.9620/0.6408 | 0.8322/0.7749
Bodyfat   | 0.9715/0.9667 | 0.9776/0.9688 | 0.9912/0.9829 | 0.9930/0.9543 | 0.9740/0.9654
PM10      | 0.9527/0.9448 | 0.9542/0.9455 | 0.9770/0.9732 | 0.9790/0.8450 | 0.9524/0.9448

TABLE II
RESULTS FOR CASE (b): AVERAGE AUC VALUES OF THE COVERAGE OBJECTIVE FUNCTION OBTAINED FOR DIFFERENT PROTOCOLS OF ALLOCATION OF INFORMATION GRANULARITY FOR THE TRAINING DATA AND TESTING DATA (ENTRIES ARE PAIRS OF RESULTS: AUC TRAINING DATA / AUC TESTING DATA)

Data      | C1            | C2            | C3            | C4            | C5
Synthetic | 0.9858/0.9859 | 0.9870/0.9865 | 0.9936/0.9931 | 0.9944/0.9555 | 0.9859/0.9861
Auto MPG  | 0.9024/0.9191 | 0.9142/0.9112 | 0.9653/0.9582 | 0.9705/0.8012 | 0.9008/0.9161
Housing   | 0.7787/0.8347 | 0.8163/0.8204 | 0.9365/0.9324 | 0.9559/0.7321 | 0.7734/0.8276
Bodyfat   | 0.9667/0.9751 | 0.9712/0.9742 | 0.9862/0.9868 | 0.9890/0.9704 | 0.9655/0.9747
PM10      | 0.9448/0.9527 | 0.9488/0.9505 | 0.9709/0.9711 | 0.9740/0.8937 | 0.9436/0.9515

As could have been anticipated, the highest values of the AUC are obtained when running protocol C4 (which offers the highest level of flexibility). This mostly happens for the training data; however, in the case of the testing data, C3 shows better performance. Table II shows the AUC values for the optimized distribution of granularity governed by the coverage objective function obtained for case (b). A similar situation occurs here: for the training data, C4 shows better performance; however, for the testing data, C3 shows better performance.

VI. CONCLUSION

The concept of granular neural networks along with the underlying design practices opens a new perspective on the realization of neural architectures at a higher level of abstraction. The granularity of the connections of the network (which is subject to intensive optimization) delivers architectures and results of neurocomputing at this higher level of abstraction. We stressed the role of information granularity as

an important design asset. The two cases investigated in this paper are reflective of two categories of problems: in the first one, the granular outputs help quantify the performance of the network and link it with its further fabrication along with the analysis of fault tolerance; in the second one, the level of granularity of the corresponding connection can relate to the corresponding level of fault tolerance. The obtained experimental results (when using synthetic and real-world data) quantify the performance of the granular neural network, demonstrating its capabilities for the single-objective optimization scenario and several protocols of allocation of information granularity. It is worth stressing that the interval-valued connections (weights) of the networks were studied here as one of the simplest alternatives of granular constructs. Two promising generalizations worth pursuing could include fuzzy neural networks (in which case we admit fuzzy numbers built around numeric connections) and probabilistic neural networks (with the optimal allocation of granularity expressed in terms of the standard deviations of the probabilistic connections regarded as respective random variables).

REFERENCES

[1] H. Ishibuchi and H. Tanaka, "An architecture of neural networks with interval weights and its application to fuzzy regression analysis," Fuzzy Sets Syst., vol. 57, no. 1, pp. 27–39, Jul. 1993.
[2] H. Ishibuchi, K. Kwon, and H. Tanaka, "A learning algorithm of fuzzy neural networks with triangular fuzzy weights," Fuzzy Sets Syst., vol. 71, no. 3, pp. 277–293, May 1995.
[3] M. T. Hagan and M. B. Menhaj, "Training feedforward networks with the Marquardt algorithm," IEEE Trans. Neural Netw., vol. 5, no. 6, pp. 989–993, Nov. 1994.
[4] R. J. Kuo, P. Wu, and C. P. Wang, "An intelligent sales forecasting system through integration of artificial neural networks and fuzzy neural networks with fuzzy weight elimination," Neural Netw., vol. 15, no. 7, pp. 909–925, Sep. 2002.
[5] L. Huang, B. Zhang, and Q. Huang, "Robust interval regression analysis using neural networks," Fuzzy Sets Syst., vol. 97, no. 3, pp. 337–347, Aug. 1998.
[6] S. H. Nabavi-Kerizi, M. Abadi, and E. Kabir, "A PSO-based weighting method for linear combination of neural networks," Comput. Electr. Eng., vol. 36, no. 5, pp. 886–894, 2010.
[7] G. Panoutsos and M. Mahfouf, "A neural-fuzzy modeling framework based on granular computing: Concepts and applications," Fuzzy Sets Syst., vol. 161, no. 21, pp. 2808–2830, 2010.
[8] C. M. Fonseca and P. J. Fleming, "Multiobjective optimization and multiple constraint handling with evolutionary algorithms. I. A unified formulation," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 28, no. 1, pp. 26–37, Jan. 1998.
[9] H. Park, W. Pedrycz, and S. Oh, "Granular neural networks and their development through context-based clustering and adjustable dimensionality of receptive fields," IEEE Trans. Neural Netw., vol. 20, no. 10, pp. 1604–1616, Oct. 2009.
[10] Y. Zhang, M. D. Fraser, R. A. Gagliano, and A. Kandel, "Granular neural networks for numerical-linguistic data fusion and knowledge discovery," IEEE Trans. Neural Netw., vol. 11, no. 3, pp. 658–667, May 2000.
[11] G. Bortolan, "An architecture of fuzzy neural networks for linguistic processing," Fuzzy Sets Syst., vol. 100, nos. 1–3, pp. 197–215, 1998.
[12] Y. Zhang, B. Jin, and Y. Tang, "Granular neural networks with evolutionary interval learning," IEEE Trans. Fuzzy Syst., vol. 16, no. 2, pp. 309–319, Apr. 2008.
[13] E. de Weerdt, Q. P. Chu, and J. A. Mulder, "Neural network output optimization using interval analysis," IEEE Trans. Neural Netw., vol. 20, no. 4, pp. 638–653, Apr. 2009.
[14] D. Wedge, D. Ingram, D. McLean, C. Mingham, and Z. Bandar, "On global-local artificial neural networks for function approximation," IEEE Trans. Neural Netw., vol. 17, no. 4, pp. 942–952, Jul. 2006.
[15] J. Buckley and Y. Hayashi, "Fuzzy neural networks: A survey," Fuzzy Sets Syst., vol. 66, no. 1, pp. 1–14, 1994.
[16] P. Liu and H. Li, "Efficient learning algorithms for three-layer regular feedforward fuzzy neural networks," IEEE Trans. Neural Netw., vol. 15, no. 3, pp. 545–558, May 2004.
[17] H. Ishibuchi and M. Nii, "Improving the generalization ability of neural networks by interval arithmetic," in Proc. Int. Conf. Knowl.-Based Intell. Electron. Syst., vol. 1, Apr. 1998, pp. 231–236.
[18] C. F. Juang and C. T. Lin, "A recurrent self-organizing neural fuzzy inference network," IEEE Trans. Neural Netw., vol. 10, no. 4, pp. 828–845, Jul. 1999.
[19] Y. Ishibuchi, in Fuzzy Modelling, Paradigms and Practice, W. Pedrycz, Ed. Boston, MA: Kluwer, 1996, pp. 185–202.
[20] R. Fang, J. Zhou, F. Liu, and B. Peng, "Short-term load forecasting using interval arithmetic backpropagation neural network," in Proc. Int. Conf. Mach. Learn. Cybern., Aug. 2006, pp. 2872–2876.
[21] Z. A. Garczarczyk, "Interval neural networks," in Proc. IEEE Int. Symp. Circuits Syst., vol. 3, May 2000, pp. 567–570.
[22] W. Pedrycz and G. Vukovich, "Granular neural networks," Neurocomputing, vol. 36, nos. 1–4, pp. 205–224, 2001.
[23] C. Juang, R. Huang, and Y. Lin, "A recurrent self-evolving interval type-2 fuzzy neural network for dynamic system processing," IEEE Trans. Fuzzy Syst., vol. 17, no. 5, pp. 1092–1105, Oct. 2009.
[24] P. Melin, O. Mendoza, and O. Castillo, "Face recognition with an improved interval type-2 fuzzy logic Sugeno integral and modular neural networks," IEEE Trans. Syst., Man, Cybern. A, Syst. Humans, vol. 41, no. 5, pp. 1001–1012, Sep. 2011.
[25] C. Wang, C. Cheng, and T. Lee, "Dynamical optimal training for interval type-2 fuzzy neural network (T2FNN)," IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 34, no. 3, pp. 1462–1477, Jun. 2004.
[26] C. Juang, R. Huang, and W. Cheng, "An interval type-2 fuzzy-neural network with support-vector regression for noisy regression problems," IEEE Trans. Fuzzy Syst., vol. 18, no. 4, pp. 686–699, Aug. 2010.
[27] M. Rocha, P. Cortez, and J. Neves, "Evolutionary design of neural networks for classification and regression," in Adaptive Natural Computing Algorithms. New York: Springer-Verlag, 2005, pp. 304–307.
[28] J. Kennedy and R. C. Eberhart, Swarm Intelligence. San Mateo, CA: Morgan Kaufmann, 2001.
[29] E. Zitzler, K. Deb, and L. Thiele, "Comparison of multiobjective evolutionary algorithms: Empirical results," Evol. Comput., vol. 8, no. 2, pp. 173–195, Jun. 2000.
[30] J. Horn, Multiple Decision Making, in Handbook of Evolutionary Computation, D. Fogel and Z. Michalewicz, Eds. London, U.K.: Oxford Univ. Press, 1997.
[31] W. Pedrycz, B. Russo, and G. Succi, "Knowledge transfer in system modeling and its realization through an optimal allocation of information granularity," Appl. Soft Comput., vol. 12, no. 8, pp. 1985–1995, Aug. 2012.


Mingli Song received the B.S. and M.S. degrees in automation from the Dalian University of Technology, Dalian, China, in 2006 and 2008, respectively, and the Ph.D. degree from the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada, in 2012. She is currently an Instructor with the School of Computer, Communication University of China, Beijing, China. Her current research interests include knowledge discovery, fuzzy modeling, and granular computing.

Witold Pedrycz (M'88–SM'90–F'99) is currently a Professor and the Canada Research Chair (CRC) in Computational Intelligence with the Department of Electrical and Computer Engineering, University of Alberta, Edmonton, AB, Canada. He has authored or co-authored numerous papers in journals and conferences, and has authored 14 research monographs on computational intelligence and software engineering. His current research interests include computational intelligence, fuzzy modeling and granular computing, knowledge discovery and data mining, fuzzy control, pattern recognition, knowledge-based neural networks, relational computing, and software engineering.
Dr. Pedrycz was a recipient of the Norbert Wiener Award from the IEEE Systems, Man, and Cybernetics Council in 2007, the IEEE Canada Computer Engineering Medal in 2008, and the Cajastur Prize for Soft Computing from the European Centre for Soft Computing for Pioneering and Multifaceted Contributions to Granular Computing in 2009. He is intensively involved in editorial activities. He is the Editor-in-Chief of Information Sciences and the IEEE Transactions on Systems, Man, and Cybernetics, Part A. He is currently an Associate Editor of the IEEE Transactions on Fuzzy Systems and a member of a number of editorial boards of other international journals. In 2009, he was elected as a Foreign Member of the Polish Academy of Sciences. In 2012, he was elected as a Fellow of the Royal Society of Canada. He has been a member of numerous program committees of IEEE conferences on fuzzy sets and neurocomputing.
