IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 27, NO. 2, FEBRUARY 2016

435

QoS Differential Scheduling in Cognitive-Radio-Based Smart Grid Networks: An Adaptive Dynamic Programming Approach Rong Yu, Member, IEEE, Weifeng Zhong, Student Member, IEEE, Shengli Xie, Senior Member, IEEE, Yan Zhang, Senior Member, IEEE, and Yun Zhang Abstract— As the next-generation power grid, smart grid will be integrated with a variety of novel communication technologies to support the explosive data traffic and the diverse requirements of quality of service (QoS). Cognitive radio (CR), which has the favorable ability to improve the spectrum utilization, provides an efficient and reliable solution for smart grid communications networks. In this paper, we study the QoS differential scheduling problem in the CR-based smart grid communications networks. The scheduler is responsible for managing the spectrum resources and arranging the data transmissions of smart grid users (SGUs). To guarantee the differential QoS, the SGUs are assigned to have different priorities according to their roles and their current situations in the smart grid. Based on the QoS-aware priority policy, the scheduler adjusts the channels allocation to minimize the transmission delay of SGUs. The entire transmission scheduling problem is formulated as a semi-Markov decision process and solved by the methodology of adaptive dynamic programming. A heuristic dynamic programming (HDP) architecture is established for the scheduling problem. By the online network training, the HDP can learn from the activities of primary users and SGUs, and adjust the scheduling decision to achieve the purpose of transmission delay minimization. Simulation results illustrate that the proposed priority policy ensures the low transmission delay of high priority SGUs. In addition, the emergency data transmission delay is also reduced to a significantly low level, guaranteeing the differential QoS in smart grid. Index Terms— Adaptive dynamic programming (ADP), cognitive radio (CR), quality of service (QoS), smart grid communications network, transmission scheduling.

T

I. I NTRODUCTION HE traditional power grid is a centrally controlled system with unidirectional information collection and

Manuscript received August 15, 2014; revised January 4, 2015; accepted March 1, 2015. Date of publication April 22, 2015; date of current version January 18, 2016. This work was supported in part by the National Natural Science Foundation of China under Grant 61422201, Grant 61370159, Grant 61333013, Grant 61322306, Grant U1301255, Grant U1201253, and Grant U1401252, in part by the Guangdong Province Natural Science Foundation under Grant S2011030002886, in part by the High Education Excellent Young Teacher Program of Guangdong Province under Grant YQ2013057, in part by the Science and Technology Program of Guangzhou under Grant 2014J2200097 through the Zhujiang New Star Program, in part by the Research Council of Norway under Grant 217006/E20, and in part by the European Commission Seventh Framework Programme through CROWN Project under Grant PIRSES-GA-2013-627490. R. Yu, W. Zhong, S. Xie, and Y. Zhang are with School of Automation, the Guangdong University of Technology and the Guangdong Key Laboratory of IoT Information Technology, Guangzhou 510006, China (e-mail: yurong@ ieee.org; [email protected]; [email protected]; [email protected]). Y. Zhang is with the Simula Research Laboratory, University of Oslo, Oslo N-0317, Norway (e-mail: [email protected]). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TNNLS.2015.2411673

unidirectional electricity delivery, which causes many problems and challenges, such as energy wasting, pollution, electricity demand prediction, integration of distributed energy resources, and security [1]–[3]. Therefore, the evolution of power grid has become an urgent worldwide concern. The smart grid is regarded as the next-generation power grid, providing with renewable energy, intelligent control, high efficiency, and high reliability [2], [4]–[6]. In order to implement the desirable functions of smart grid, novel information technologies should be integrated, including embedded sensing, wireless communications, pervasive computing, and adaptive control [7]–[9]. To balance the energy supply and demand, efficient two-way communications between the customers and the utilities are required for information exchange in real time [10]. The huge amount of data of control commands, monitoring, and smart meters reading will be transmitted through the smart grid communications networks [11]. To cover a large number of network nodes in a wide area, wireless communications is a preferred option for the smart grid communications networks. However, the free industrial, scientific and medical (ISM) frequency band has become crowded without the guarantee of quality of service (QoS). Purchasing licensed frequency bands will increase the burden of utilities [12]. Meanwhile, other licensed frequency bands are used in a fixed and an inefficient way [13]. For all these challenges, cognitive radio (CR), as a promising technology, improves the spectrum resource utilization and provides considerable bandwidth of large-scale data transmission. The CR technology allows two radio systems, i.e., the primary system and the secondary system, to share the same frequency band. Primary system has the license and priority to use the legal spectrum. Secondary system has no priority to use the spectrum, but could opportunistically access to the spectrum when they are not occupied by the primary system. The user in the primary system is called as primary user (PU) and the user in the secondary system is called as secondary user. By introducing CR technology, the smart grid users (SGUs) can use potentially all available spectrum resources and improve the spectrum utilization, supporting the huge data traffic in smart grid. In the study of the CR-based smart grid communications networks [14]–[17], spectrum access schemes are designed to optimize the QoS in smart grid. Different services in smart grid have different QoS requirements. For example,

2162-237X © 2015 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.

436

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 27, NO. 2, FEBRUARY 2016

the substations transform the high-voltage electricity into the low-voltage electricity and distribute it to the neighborhood distribution grid. The SGUs, who require high QoS, like substations, should use the available spectrum for vital data transmission with high priority. The priority-based optimization in CR-based smart grid communications networks is studied in [18]. The meter reading service requires low QoS in smart grid, since there is no need to upload the energy consumption data in real time. However, more than meter reading, the household smart meter also can detect and report faults. If there is an emergency, e.g., device damage, the smart meter has to report the emergency, which requires rather high QoS for the reliability of smart grid. To guarantee the differential QoS, we readjust the priorities of SGUs according to their situations. The available channels are allocated to the SGUs based on their priorities. In this paper, we investigate the transmission scheduling of SGUs in CR-based smart grid communications network satisfying the potential requirement of differential QoS. The main contributions of this paper are as follows. 1) A QoS differentiation policy with diverse priorities is devised. First, the SGUs are divided into different classes based on their roles in the smart grid. The higher class SGUs have higher priority to use the available spectrum resources for ensuring their QoS. Second, to enhance the reliability of smart grid, an SGU is allowed to upgrade its priority whenever it has emergency information to report, even if it belongs to the lowest class. As a consequence, the priorities of SGUs jointly depend on their roles and the current situations. 2) The scheduler will manage spectrum resources and allocate channels to SGUs based on their current priorities, minimizing the long-term transmission delay. The entire transmission scheduling process is slotted and formulated as the semi-Markov decision process (SMDP). The methodology of adaptive dynamic programming (ADP) is employed to estimate the long-term transmission delay (corresponding to system cost) by learning from the PUs’ and SGUs’ activities. Through tuning the scheduling decision, the long-term delay is minimized. 3) Extensive simulation experiments are carried out to evaluate the proposed QoS differential scheduling scheme. The numeric results show that the proposed priority policy can keep the low transmission delay for high class SGUs. Furthermore, the significantly low transmission delay could be achieved when emergency events occur to SGUs. This fact verifies that the proposed scheduling scheme is able to provide satisfying differential QoS. The remainder of this paper is organized as follows. The smart grid communications network structure and the QoS differential transmission scheduling scheme are introduced in Section II. In Section III, the transmission scheduling problem is formulated within the framework of SMDP. Section IV discusses the ADP solution for the formulated SMDP problem. The simulation results are shown in Section V. Finally, the conclusion is given in Section VI.

Fig. 1.

CR-based NAN.

II. C OGNITIVE R ADIO -BASED S MART G RID C OMMUNICATIONS N ETWORKS A. Smart Grid Communications Networks In general, the smart grid communications network is established to have hierarchical infrastructure, which can cover the entire power grid from the energy sources to household consumers [14]. The communications architecture has three tiered networks, including home area networks (HANs), neighborhood area networks (NANs), and wide area networks (WANs). The HANs organize various smart devices and applications in houses for energy management and provide demand response based on control commands, i.e., dynamic pricing [19]. The NANs collect energy consumption information from households and neighborhood distribution grid monitoring information, then deliver them to the local access points. The WANs communicate NANs with the utility control center, transmitting power information, and general control commands. In this paper, we consider a cognitive NAN in the smart grid, as shown in Fig. 1. The NAN cognitive gateway (NGW) consists of the access point and the spectrum manager. Energy consumption information is provided from household meters (HAN gateway, HGW) or electric vehicle charging stations. Other information about the neighborhood electricity distribution grid is also needed, which can be provided by substations, renewable energy, detectors, or monitors. This electricity data from SGUs will be received and delivered by the access point. The spectrum manager serves as a scheduler

YU et al.: QoS DIFFERENTIAL SCHEDULING IN CR-BASED SMART GRID NETWORKS

to allocate channels (spectrum resources) to the SGUs. The utility providers will purchase licensed spectrum bands to provide hard QoS for smart grid communications network, and rent other spectrum bands from primary systems, since there are large-scale energy consumption data transmission from countless nodes. In this paper, we discuss the transmission scheduling in the case of unlicensed channels. This means that the scheduler allocates the limited unlicensed channels to SGUs, then the SGUs, as the unlicensed users, perform data transmission without interfering the PUs. Once the PUs, as the licensed users, appear, the scheduler will withdraw the corresponding channels from SGUs, leading to the transmission delay of SGUs. The transmission delay of SGUs may be also caused by spectrum sensing. SGUs may spend specific time for channel detection. The development of the IEEE 802.22 WRAN standard proposes the spectral awareness by the geolocation/ database methods [20]. Hence, we consider that the spectrum manager (scheduler) will obtain the information of the spectrum resources from the spectrum database without performing local sensing. The scheduler has the full knowledge of PU occupation, and allocates the channels to SGUs efficiently. Therefore, the transmission delay of SGUs is mainly caused by the arrivals and departures of PUs. Meanwhile, the channel availability for SGUs is also determined by the PUs’ activities in this paper. In smart grid communications networks, most nodes are static, unlike the mobile networks. Thus, we consider that the condition of the wireless links between SGUs and access points are static, and the transmission rate is stable and constant in the scheduling process. For the SGUs, the regular data packets will be generated from time-to-time. The packets will stack in the buffer following the first-in-first-out (FIFO) rule and wait for transmission scheduling. However, the emergency data packets will be generated whenever they are needed. The emergency packets will directly move to the head of the buffer queue and have higher priority for transmission than regular packets. Thus, the buffer queue system is a priority-based FIFO system. B. QoS Differential Transmission Scheduling The SGUs are categorized into different classes based on their roles in NAN. Fig. 1 shows two classes of SGUs, c1 and c2. The SGUs, who provide vital messages of control, protection, management, belong to the higher class (c1). Those, who carry meter reading or transmit delaytolerant data, belong to the lower class (c2). This means that the data transmission of low-class SGUs always lags behind that of high-class SGUs. However, when some emergencies happen, such as the damage or regular hard check of devices, the smart meters should have high priority for transmission to report the emergencies. Therefore, emergency priority of SGUs should be beyond the SGU class. But the SGUs of c1 still has higher priority than the SGUs of c2, when both of them have emergencies. For the SGUs satisfaction, interruption during data transmission is more unexpected than being blocked occasionally on a new transmission. An interrupted SGU will be more

Fig. 2.

437

QoS differential transmission scheduling.

eager to finish its interrupted transmission than a newly arrived SGU. Therefore, in the same class, the interrupted SGUs should have higher priority than the newly arrived SGUs, decreasing the transmission delay of the interrupted SGUs. Similarly, the interrupted emergency SGUs have higher priority than the newly arrived emergency SGUs. It should be noticed that the interrupted emergency SGUs of c2 have higher priority than the interrupted SGUs of c1. As shown in Fig. 2, there are five priority queues for the differential QoS. Being sorted from high to low priority, they are PUs, interrupted emergency SGUs, newly arrived emergency SGUs, interrupted SGUs, and newly arrived SGUs. Before transmission, SGUs should queue according to their priorities. In each priority queue, SGUs are sorted by their classes in the decreasing order. The SGUs with high class move to the head of the queue. The SGUs with low class move to the tail of the queue. The scheduler preferentially allocates channels to the users in the highest priority queue, which means that the SGU has chance for data transmission only when its above queues are empty. The transmission of low priority SGUs will be interrupted at anytime when the PUs or high-priority SGUs arrive. The interrupted SGUs will go back to their corresponding queues and wait for being assigned. The above QoS-aware priority policy guarantees the differential QoS provision in smart grid. Based on this priority policy, the scheduler will adjust the allocation decision at each time slot to minimize the transmission delay of SGUs according to the channel’s state and SGU’s state. In Section III, we formulate the transmission scheduling procedure as an SMDP problem. III. S EMI -M ARKOV D ECISION P ROCESS F ORMULATION In this section, the problem of transmission scheduling in CR-based smart grid communications networks is formulated as an SMDP. For the SMDP, the transmission scheduling process is divided into a series of stages, and the number of

438

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 27, NO. 2, FEBRUARY 2016

stages cannot be limited. The optimal transmission scheduling decision is made stage-by-stage. Hence, the optimization of the entire process can be achieved. A. Components of SMDP For transmission scheduling, the scheduler will make a decision on the channel allocation to SGUs and minimize the system cost according to the knowledge of the communications networks states, including the availability of channels (the activities of PUs) and SGUs conditions. The components of the SMDP corresponding to the transmission scheduling problem are as follows. 1) Stage: The scheduling process is naturally divided into a series of stages. Stages are indexed by integer k = 1, 2,… The duration of one stage is identically defined as τ . The scheduler makes the channel allocation decision at the beginning of each stage. PUs and SGUs will come at the beginning of each stage and leave at the end of each stage after services. 2) State: The current system state consists of the channel availability and the conditions of SGUs. Let ch n (k) denote the availability of channel n at stage k, n = 1, 2, . . . , N. ch n (k) = 1 means channel n is available for SGUs, and ch n (k) = 0 means channel n is occupied by a PU, which depends on the PU’s activity. For the state of SGUs, there are M SGUs opportunistically accessing to N channels, m = 1, 2, . . . , M. pm (k) denotes the number of packets in the buffer of SGU m at stage k. Im (k) is the interruption indicator. Im (k) = 1 denotes SGU m is interrupted at stage k. Otherwise, Im (k) = 0. E m (k) is the emergency indicator. E m (k) = 1 denotes that SGU m has emergency packets at stage k. Otherwise, E m (k) = 0. Let x m (k) presents the states of SGU m at stage k. We have x m (k) = ( pm (k), Im (k), E m (k)).

(1)

We call the system state at the beginning of stage k as the state of stage k, denoted by x(k). Thus, x(k) is defined by x(k) = (ch n (k), x m (k)|n = 1, . . . , N, m = 1, . . . , M). (2) The value x(k) does not change in the short duration of one stage. The set of all possible states is called as state space, denoted by X. 3) Decision: At stage k, the scheduler should make a decision u(k) according to the state x(k). The decision u(k) is defined by u(k) = (u m (k)|m = 1, 2, . . . , M)

(3)

where u m (k) denotes the channel allocation of SGU m at stage k. If u m (k) = n, this means that the SGU m is permitted to access to channel n. u m (k) = 0 denotes SGU m has no channel for packet transmission. Before SGU m is allocated a channel, i.e., u m (k) > 0, its packets must be nonempty, i.e., pm (k) > 0. The decision space denoted by U. The subset U[x(k)] contains all possible decisions given the system state x(k). The transmission rate mainly depends on the wireless channel condition. We set the transmission rates of all channels are the same, and the Ps number of packets just

can be successfully delivered in one stage. Hence, we have pm (k + 1) = pm (k) − Ps , if u m (k) > 0 and pm (k) > Ps . 4) Policy: Policy is a sequence of decision functions. That is π = [μ(1), μ(2), . . . , μ(k), . . .]. If for all stages k, there is μ(k) ≡ μ, then decision functions do not change with stages and the policy is stationary. Since we only consider the stationary policy in this paper, each decision function μ(k) : X → U is a mapping from the state space X to the decision space U. The stationary policy is denoted by μ. The decision at stage k is also represented by u(k) ≡ μ[x(k)]. 5) Utility Function: The utility function of stage k is determined by state x(k) and decision u(k), described as U [x(k), u(k)]. The packet transmission delay is the measurement to estimate the QoS performance in CR-based smart grid communications networks. Hence, let the utility function have the following form: U [x(k), u(k)] =

M 

Wm (k)τm (k)1{ pm (k)>0}

(4)

m=1

where 1 A is an indicator function, providing that 1 A = 1 when A is true, and 1 A = 0 when A is false. In (4), the transmission delay is computed only when the SGU has packets for transmission. τm (k) denotes the transmission delay of SGU m at stage k, which is caused by blocking and interruption. The transmission delay of SGU m is calculated by  τ, u m (k) = 0 (5) τm (k) = 0, u m (k) > 0. At stage k, when the SGU m is blocked or interrupted, i.e., SGU m has packets in buffer but it has no channel for transmission, SGU m has to wait in its queue in this stage, and the duration of stage is τ . Thus, the delay time τm (k) = τ . Otherwise, the SGU can transmit packets on the given channel, or the SGU is idle and has no packet at stage k. Thus, we have τm (k) = 0. To guarantee the differential QoS of SGUs, the scheduler distributes the channels based on the priorities of SGUs. Wm (k) is the weight of SGU m at stage k, which is related to its role class, interruption, and emergency situations. Let Wm (k) = Wb,m + W I 1{Im (k)=1} + W E 1{Em (k)=1}

(6)

where Wb,m is the basic weight of SGU m according to its class in smart grid communications networks. If SGU m corresponds to class i in smart grid, we have Wb,m = Wc,i , i = 1, 2, . . . , I. Wc,i is the weight of class i . There are total I classes (I  M) and class 1 (c1) has the highest priority with the highest weight, i.e., Wc,1 > Wc,2 > · · · > Wc,I . W I and W E are interruption weight and emergency weight, respectively. For the above weights, the following constrains must be satisfied:

Wc,i

Wc,i > Wc,i+1 + W I + W I < Wc,i+1 + W E .

(7) (8)

Constrain (7) means a regular SGU in class i still has higher priority than an interrupted SGU in class i + 1, keeping the interrupted priority within one class. Constrain (8) means

YU et al.: QoS DIFFERENTIAL SCHEDULING IN CR-BASED SMART GRID NETWORKS

Fig. 3.

439

HDP architecture for the QoS differential transmission scheduling.

an interrupted SGU in class i has lower priority than an emergency SGU in class i + 1, which permits the communications of the emergency SGU beyond the limit of class. The utility function describes the transmission delay at each stage. The transmission delay of the entire scheduling process is defined as the system cost, i.e., the long-term transmission delay. Under policy μ, the system cost is provided by J [x(k)] =

∞ 

U (x( j ), μ[x( j )]) .

(9)

j =k

The number of stages can be infinite. The target of the SMDP problem is to find out the optimal policy μ∗ to minimize the system cost. The optimal policy leads to minimum system cost, denoted by J ∗ . IV. A DAPTIVE DYNAMIC P ROGRAMMING S OLUTION In this section, the methodology of ADP is introduced to solve the corresponding problems in SMDP. The ADP is the powerful tool for controlling nonlinear systems with extensive use. Optimal control schemes are developed for the unknown discrete-time nonlinear systems using ADP [21]–[23]. Newly developed ADP approaches are proposed to obtain near optimal control [24]–[26]. The ADP method is also applied to design decentralized controller for large-scale nonlinear systems [27]. In this paper, we employ the heuristic dynamic programming (HDP) that is the basic architecture of ADP [28]. A typical HDP contains action, model, and critic networks, which adopt neural networks for function approximation. For transmission scheduling problem, the HDP can learn from the activities of PUs and SGUs, and the QoS differential priority policy to estimate the optimal long-term system costs by online network training, even if the scheduler has no full knowledge of the system model. A. Bellman Equation By solving the Bellman optimality equation, the optimal policy μ∗ of the SMDP problem could be obtained. The optimal system cost from stage k is described by J ∗ [x(k)] = min([U (x(k), u(k)] + J ∗ [x(k + 1)]). u(k)

(10)

The principle of optimality is to minimize J in immediate future and subsequently minimize the sum of U over all stages. If there is an available way to compute the optimal cost J ∗ [x(k + 1)], the optimal transmission scheduling policy can be obtained by u∗ (k) = arg min([U (x(k), u(k)] + J ∗ [x(k + 1)]).

(11)

However, due to the huge size of state space and the overwhelming computational requirement, the accurate solution of J ∗ [x(k + 1)] in (11) becomes unrealistic. Hence, we present an HDP architecture for transmission scheduling problem to produce an approximating optimal solution. B. HDP Architecture The HDP architecture for transmission scheduling is shown in Fig. 3, including the scheduler, the system model, and the critic network. If the number of stages is infinite, the long-term system cost is also infinite according to (9), since the transmission delay always occurs with the PUs’ activities. Hence, we rewrite the system cost (9) as the following form: J (k) =

∞ 

γ j −k U ( j ).

(12)

j =k

For presentation, let U (k) = U [x(k), u(k)] and J (k) = J [x(k)]. γ is the discount factor (0 ≤ γ ≤ 1). γ = 0 means that only the current utility value U (k) is considered in system cost, and the future values are neglected. γ = 1 means all future utility values are equally considered in the infinite range. The long-term system cost can be estimated only when γ < 1. In Fig. 3, the scheduler will give a decision u(k) based on the system state x(k) and approximating system cost Jˆ(k). The scheduler behavior can be described by the following equation: u(k) = arg min[U (k) + γ Jˆ(k + 1)].

(13)

The U (k) is computed by x(k), so that the scheduler also has an input of x(k). The system model in Fig. 3 describes the activities of PUs and SGUs. Its inputs are scheduler decision u(k) and system state x(k), and the output is the system state of the next stage xˆ (k + 1). The system state consists of channel

440

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 27, NO. 2, FEBRUARY 2016

state and SGU state. The channel state changes according to the PUs’ activities, without the effect of SGUs and scheduler. For SGU state, the numbers of packets from SGU m at the next stage is provided as follows:  pm (k), um = 0 (14) pm (k + 1) = pm (k) − Ps , u m > 0. The interruption indicator of SGU m at the next stage is provided by  1, pm (k) > 0 and u m = 0 (15) Im (k + 1) = 0, or. The emergency indicator of SGU will change according to the appearance of emergency packets. The emergency packets will be generated based on smart grid service needs. For the critic network in Fig. 3, we employ the back propagation neural network (BPNN). The input of critic network is the system state. The output is the approximating system cost Jˆ(k + 1) approaching the optimal system cost J ∗ by network training. The aim of the critic network training is to minimize the following error function:  1 2 E c (k) = ec (k) E c = 2 k k 1 ˆ = [ J (k) − U (k) − γ Jˆ(k + 1)]2 (16) 2 k

where Jˆ(k) = Jˆ[x(k), u(k), Wc ] and Wc is the critic network weight parameters. When E c (k) = 0 is obtained for stage k, we get the following derivation from (16): Jˆ(k) = U (k) + γ Jˆ(k + 1)

= U (k) + γ [U (k + 1) + γ Jˆ(k + 2)] ∞  = γ j −k U ( j )

(17)

j =k

which is the same to the system cost in (12). Therefore, we train a neural network by minimizing the error function (16). Then, we can obtain the approximating value estimating the system cost defined in (12) for stage k + 1. C. Critic Network Training For the weight update of the critic network, the gradient-based adaptation is adopted to minimize the error function (16). The weight update formulation of critic network is presented as follows:   ∂ E c (k) Wc (k) = lc (k) − ∂Wc (k)   ∂ E c (k) ∂ Jˆ(k) = lc (k) − (18) · ∂ Jˆ(k) ∂Wc (k) Wc (k + 1) = Wc (k) + Wc (k)

(19)

where lc is the positive learning rate of the critic network, which usually decreases with time to a small value for control system. Wc consists of Wc1 and Wc2 . Wc1 denotes the weight matrix between the input and the hidden layer, and

Wc2 denotes the weight matrix between the hidden layer and the output, which are derived by  T 1 Wc1 = − lc ec xˆ T Wc2 · (1 − ch2 · ch2 ) 2 T Wc2 = −lc ec ch2

(20) (21)

where ch1 and ch2 are the input matrix and output matrix of hidden layer. Updating the weight parameters by the above formulations, the output of the critic network will approach to the approximating value of the system cost defined in (12). By adopting this HDP architecture in transmission scheduling problem, the long-term system cost is estimated and minimized. The optimal scheduling policy can be obtained by HDP with a well-trained critic network. V. S IMULATION R ESULTS A. Simulation Setup In the simulations, we consider the cognitive NAN in smart grid. The utility provider rents a spectrum band, which consists of four orthogonal channels with identical bandwidth. These channels are shared by their licensed PUs, N = 4. There are eight SGUs opportunistically accessing to the licensed channels, M = 8. There are two classes of SGUs, c1 and c2. SGUs 1–4 belong to c1, and SGUs 5–8 belong to c2. The transmission scheduling process has 1000 stages, and the duration of one stage τ is 1. The PUs arrive as the Poisson process with arrival rate of λPU . The PUs occupy the channel for the duration of 10τ each time. The activities of PUs are known and recorded in the spectrum database. By accessing to the database, the scheduler can allocate the channels to SGUs when the channels are not occupied by PUs. The SGUs also arrive as Poisson process with arrival rate of λSGU and bring three packets each time when they arrive. Each packet can be transmitted successfully at one stage, i.e., Ps = 1. There are four channels and eight SGUs. Each SGU has 3 states at each stage. Thus, the system state x have 4 + 8 × 3 = 28 components at each stage. The decision u has eight components. Therefore, in the proposed HDP architecture, the scheduler has 28 + 1 = 29 inputs of state and eight outputs of decision. The system model has 28 + 8 = 36 inputs and 28 outputs of next state. The critic network is chosen as 28-35-1 structure of BPNN, including 28 input neurons, 35 hidden layer neurons, and one output neuron. The output layer uses purelin function. The hidden layer uses the sigmoidal function, provided by y=

1 − e−x . 1 + e−x

(22)

The weight parameters of critic network are initialized randomly, which is similar to the training of regulate BPNN for function approximation. Then, the gradient-based method is employed for the weight update to approach the minimum system cost in (18)–(21). The simulation results are presented using the self-developed simulator based on MATLAB.

YU et al.: QoS DIFFERENTIAL SCHEDULING IN CR-BASED SMART GRID NETWORKS

Fig. 4.

Output of the critic network with γ = 0.9.

Fig. 6.

441

Delay per packet of SGU 1 in terms of SGU arrival rate.

there is no need to train the critic network. A higher γ is corresponding to the long-term system cost, which considers more utility values in future stages with the sacrifice of convergence speed. The fluctuation of output is caused by the dynamic arrivals and departures of PUs and SGUs. Therefore, it is not appropriate for critic network training to set the learning rate as a too low or too high value. For a low learning rate, the user’s dynamical activities will overwhelm the weak learning ability of the critic network. A high learning rate will make the network parameters change dramatically, and eventually miss the optimal convergence value. C. Scheduling Performance

Fig. 5.

Output of the critic network with γ = 0.7.

B. Convergence Performance The convergence performance of the proposed HDP architecture is presented to solve the transmission scheduling problem. The learning rate lc is set to 0.2, which is kept in the whole network training process. Because the PUs and SGUs states are changing continuously, the network should keep this learning ability in the whole scheduling process. With the discount factor γ = 0.9, Fig. 4 shows the output of the critic network Jˆ, which is called the approximating optimal system cost. We find that our HDP architecture can estimate the Jˆ approaching to the optimal system cost J ∗ by network training. Fig. 5 shows the approximating optimal system cost with γ = 0.7. Compared with γ = 0.9, the small discount factor γ improves the convergence speed. Because a smaller γ means that the future utility values are less considered in the system cost in network training. If the system cost only contains the utility value in current stage (γ = 0), the scheduler can give an optimal decision immediately, and

The transmission delay is presented to show the performance of the proposed QoS differential scheduling. The delay and the transmitted packet number of each SGU are summed up in the whole process. Thus, we have the overall delay and the throughput. Let the overall delay divide the throughput, which is the delay per packet. The delay per packet is calculated after 100 experiments. Fig. 6 shows the mean and standard deviation of delay per packet of SGU 1, with fixed λPU , when the smart grid communications network introduces the QoS differential priority policy, SGU 1 has the highest priority in c1. Thus, we find the SGU 1 has lower delay than that without priority. Even the arrival rate λSGU of all SGU increase, leading to heavy SGU traffic, the SGU 1 still has low transmission delay guaranteeing the QoS of high priority SGUs. Fig. 7 shows the delay per packet of SGU 1 when all SGUs have fixed λSGU . While the PU traffic is heavy, the transmission delay of SGUs will also be high, in the case of priority policy, since the PUs have the absolute priority to use the licensed channels regardless of the highest priority of SGU 1. But the proposed priority policy still decreases the transmission delay of high priority SGUs compared with the case with no priority. In the experiments, we let SGUs generate emergency packets with the probability of 10%. Figs. 8 and 9 show the

442

Fig. 7.

IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. 27, NO. 2, FEBRUARY 2016

Delay per packet of SGU 1 in terms of PU arrival rate.

Fig. 9.

Delay per emergency packet of SGUs in terms of PU arrival rate.

smart grid and their situations of interruption and emergency, which guarantees the QoS of high-priority SGUs and all emergency SGUs. VI. C ONCLUSION

Fig. 8.

Delay per emergency packet of SGUs in terms of SGU arrival rate.

delay per emergency packet of SGUs with fixed λPU and fixed λSGU , respectively. In our priority policy, the SGU 8 belongs to the lowest class with lowest priority. In both Figs. 8 and 9, we find that the proposed QoS differential priority policy reduces the delay of emergency packets of SGU 1 and SGU 8 significantly. Actually, delays of emergency packets for all SGUs are similarly low in our policy. Because the probability of emergency packets is low, the probability of two or more SGUs simultaneously appearing emergency packets is even lower. In general, if one SGU has emergency packets, it will have the highest priority with very high probability in the scheduling. Therefore, all SGUs have the similar low delay levels. Since the emergency packets are generated randomly, all the standard deviations are rather high, which are not marked in figures for presentation. In general, the QoS differential priority policy assigns the priority levels to the SGUs based on their roles in

In this paper, the QoS differential transmission scheduling is proposed in the CR-based smart grid communications networks. A QoS-aware priority policy is devised for differential QoS provision. The SGUs are assigned to have different classes according to their roles in the smart grid. The priorities of SGUs will be readjusted depending on their current situations of interruption or emergency. Based on the proposed priority policy, the scheduler makes decision of channel allocation for the minimum transmission delay of SGUs. The entire transmission scheduling problem is formulated as an SMDP with infinite stages and solved by the methodology of ADP. An HDP architecture is established, which has the ability of online network training, learning from the activities of PUs and smart grid users, and adjusting scheduling decision to achieve transmission delay minimization. Simulation results illustrate that the proposed priority policy is able to guarantee low transmission delay for high priority SGUs. Meanwhile, the transmission delay of emergency data decreases to a satisfying low level for all users, ensuring the differential QoS provision in smart grid. R EFERENCES [1] H. Farhangi, “The path of the smart grid,” IEEE Power Energy Mag., vol. 8, no. 1, pp. 18–28, Jan./Feb. 2010. [2] H. Nicanfar, P. Jokar, K. Beznosov, and V. C. M. Leung, “Efficient authentication and key management mechanisms for smart grid communications,” IEEE Syst. J., vol. 8, no. 2, pp. 629–640, Jun. 2013. [3] F. Bouhafs, M. Mackay, and M. Merabti, “Links to the future: Communication requirements and challenges in the smart grid,” IEEE Power Energy Mag., vol. 10, no. 1, pp. 24–32, Jan./Feb. 2012. [4] A. R. Metke and R. L. Ekl, “Security technology for smart grid networks,” IEEE Trans. Smart Grid, vol. 1, no. 1, pp. 99–107, Jun. 2010. [5] G. T. Heydt, “The next generation of power distribution systems,” IEEE Trans. Smart Grid, vol. 1, no. 3, pp. 225–235, Dec. 2010.

YU et al.: QoS DIFFERENTIAL SCHEDULING IN CR-BASED SMART GRID NETWORKS

[6] Y. Yan, Y. Qian, H. Sharif, and D. Tipper, “A survey on smart grid communication infrastructures: Motivations, requirements and challenges,” IEEE Commun. Surveys Tuts., vol. 15, no. 1, pp. 5–20, Feb. 2013. [7] C. Bennett and D. Highfill, “Networking AMI smart meters,” in Proc. IEEE Energy 2030 Conf. (ENERGY), Nov. 2008, pp. 1–8. [8] H. Liu, H. Ning, Y. Zhang, and L. T. Yang, “Aggregated-proofs based privacy-preserving authentication for V2G networks in the smart grid,” IEEE Trans. Smart Grid, vol. 3, no. 4, pp. 1722–1733, Dec. 2012. [9] D. He, C. Chen, J. Bu, S. Chan, Y. Zhang, and M. Guizani, “Secure service provision in smart grid communications,” IEEE Commun. Mag., vol. 50, no. 8, pp. 53–61, Aug. 2012. [10] X. Ma, H. Li, and S. Djouadi, “Networked system state estimation in smart grid over cognitive radio infrastructures,” in Proc. 45th Annu. Conf. Inf. Sci. Syst. (CISS), Mar. 2011, pp. 1–5. [11] The Smart Grid Utility Data Market. [Online]. Available: http://www. sbireports.com/Smart-Grid-Utility-2496610, accessed Jan. 2, 2015. [12] C.-H. Lo and N. Ansari, “The progressive smart grid system from both power and communications aspects,” IEEE Commun. Surveys Tuts., vol. 14, no. 3, pp. 799–821, Jul. 2012. [13] Docket No 03-222 Notice of Proposed Rule Making and Order, ET, FCC, Washington, DC, USA, 2003. [14] R. Yu, Y. Zhang, S. Gjessing, C. Yuen, S. Xie, and M. Guizani, “Cognitive radio based hierarchical communications infrastructure for smart grid,” IEEE Netw., vol. 25, no. 5, pp. 6–14, Sep./Oct. 2011. [15] Y. Zhang, R. Yu, M. Nekovee, Y. Liu, S. Xie, and S. Gjessing, “Cognitive machine-to-machine communications: Visions and potentials for the smart grid,” IEEE Netw., vol. 26, no. 3, pp. 6–13, May/Jun. 2012. [16] R. Deng, J. Chen, X. Cao, Y. Zhang, S. Maharjan, and S. Gjessing, “Sensing-performance tradeoff in cognitive radio enabled smart grid,” IEEE Trans. Smart Grid, vol. 4, no. 1, pp. 302–310, Mar. 2013. [17] Q. Li, Z. Feng, W. Li, T. A. Gulliver, and P. Zhang, “Joint spatial and temporal spectrum sharing for demand response management in cognitive radio enabled smart grid,” IEEE Trans. Smart Grid, vol. 5, no. 4, pp. 1993–2001, Jul. 2014. [18] J. Huang, H. Wang, Y. Qian, and C. Wang, “Priority-based traffic scheduling and utility optimization for cognitive radio communication infrastructure-based smart grid,” IEEE Trans. Smart Grid, vol. 4, no. 1, pp. 78–86, Mar. 2013. [19] Y. Zhang, R. Yu, S. Xie, W. Yao, Y. Xiao, and M. Guizani, “Home M2M networks: Architectures, standards, and QoS improvement,” IEEE Commun. Mag., vol. 49, no. 4, pp. 44–52, Apr. 2011. [20] Y. Liu, R. Yu, M. Pan, and Y. Zhang, “Adaptive channel access in spectrum database-driven cognitive radio networks,” in Proc. IEEE Int. Conf. Commun. (ICC), Jun. 2014, pp. 4933–4938. [21] D. Liu, D. Wang, and X. Yang, “An iterative adaptive dynamic programming algorithm for optimal control of unknown discrete-time nonlinear systems with constrained inputs,” Inf. Sci., vol. 220, pp. 331–342, Jan. 2013. [22] D. Liu, D. Wang, D. Zhao, Q. Wei, and N. Jin, “Neural-network-based optimal control for a class of unknown discrete-time nonlinear systems using globalized dual heuristic programming,” IEEE Trans. Autom. Sci. Eng., vol. 9, no. 3, pp. 628–634, Jul. 2012. [23] D. Wang, D. Liu, Q. Wei, D. Zhao, and N. Jin, “Optimal control of unknown nonaffine nonlinear discrete-time systems based on adaptive dynamic programming,” Automatica, vol. 48, no. 8, pp. 1825–1832, 2012. [24] X. Yang, D. Liu, and Y. Huang, “Neural-network-based online optimal control for uncertain non-linear continuous-time systems with control constraints,” IET Control Theory Appl., vol. 7, no. 17, pp. 2037–2047, Nov. 2013. [25] Q. Wei and D. Liu, “Numerical adaptive learning control scheme for discrete-time non-linear systems,” IET Control Theory Appl., vol. 7, no. 11, pp. 1472–1486, Jul. 2013. [26] H. Li and D. Liu, “Optimal control for discrete-time affine non-linear systems using general value iteration,” IET Control Theory Appl., vol. 6, no. 18, pp. 2725–2736, 2012. [27] D. Liu, D. Wang, and H. Li, “Decentralized stabilization for a class of continuous-time nonlinear interconnected systems using online learning optimal control approach,” IEEE Trans. Neural Netw. Learn. Syst., vol. 25, no. 2, pp. 418–428, Feb. 2014. [28] F.-Y. Wang, H. Zhang, and D. Liu, “Adaptive dynamic programming: An introduction,” IEEE Comput. Intell. Mag., vol. 4, no. 2, pp. 39–47, May 2009.

443

Rong Yu (S’05–M’08) received the Ph.D. degree from Tsinghua University, Beijing, China, in 2007. He was with the School of Electronic and Information Engineering, South China University of Technology, Guangzhou, China. He joined the Institute of Intelligent Information Processing, Guangdong University of Technology, Guangzhou, in 2010, where he is currently a Full Professor. His current research interests include wireless communications and networking, including cognitive radio, wireless sensor networks, and home networking. He has authored or co-authored over 70 international journal and conference papers. He is the co-inventor of over 10 patents. Prof. Yu is the member of the Home Networking Standard Committee, China, where he leads the standardization work of three standards. He is currently serving as the Deputy Secretary General of the Internet of Things (IoT) Industry Alliance, Guangdong, and the Deputy Head with the IoT Engineering Center, Guangdong. Weifeng Zhong (S’13) received the B.Eng. degree in automation from the Guangdong University of Technology (GDUT), Guangzhou, China, in 2013, where he is currently pursuing the Ph.D. degree in control science and engineering. He is also a member with the Institute of Intelligent Information Processing, GDUT. His current research interests include vehicle mobility, energy scheduling, vehicle-to-grid networks, and cognitive radio networks in the smart grid.

Shengli Xie (M’01–SM’02) received the M.S. degree in mathematics from Central China Normal University, Wuhan, China, in 1992, and the Ph.D. degree in control theory and applications from the South China University of Technology, Guangzhou, China, in 1997. He is currently a Full Professor and the Head of the Institute of Intelligent Information Processing with the Guangdong University of Technology, Guangzhou. He has authored or co-authored two books and over 150 scientific papers in journals and conference proceedings. His current research interests include wireless networks, automatic control, and blind signal processing. Prof. Xie received the second prize of the China’s State Natural Science Award for his work on blind source separation and identification in 2009. Yan Zhang (M’05–SM’10) received the Ph.D. degree from Nanyang Technological University, Singapore. He is currently with the Simula Research Laboratory, Oslo, Norway, and an Adjunct Associate Professor with the Department of Informatics, University of Oslo, Oslo. His current research interests include wireless networks, cyber physical systems, and smart grid communications. Prof. Zhang is also a Regional Editor, an Associate Editor, on the Editorial Board, or a Guest Editor of a number of international journals. Yun Zhang received the B.S. and M.S. degrees from Hunan University, Changsha, China, in 1982 and 1986, respectively, and the Ph.D. degree from the South China University of Science and Technology, Guangzhou, China, in 1998, all in automatic engineering. He is currently a Professor with the Department of Automation, Guangdong University of Technology, Guangzhou. His current research interests include intelligent control systems, network systems, and signal processing.

QoS Differential Scheduling in Cognitive-Radio-Based Smart Grid Networks: An Adaptive Dynamic Programming Approach.

As the next-generation power grid, smart grid will be integrated with a variety of novel communication technologies to support the explosive data traf...
2MB Sizes 0 Downloads 8 Views