sensors Article

A Novel Dynamic Spectrum Access Framework Based on Reinforcement Learning for Cognitive Radio Sensor Networks Yun Lin 1, *, Chao Wang 1 , Jiaxing Wang 2 and Zheng Dou 1 1 2

*

College of Information and Communication Engineering, Harbin Engineering University, Harbin 150001, China; [email protected] (C.W.); [email protected] (Z.D.) Beijing Huawei Digital Technologies Co., Ltd., Beijing 100032, China; [email protected] Correspondence: [email protected]

Academic Editor: Leonhard M. Reindl Received: 15 July 2016; Accepted: 7 October 2016; Published: 12 October 2016

Abstract: Cognitive radio sensor networks are one of the kinds of application where cognitive techniques can be adopted and have many potential applications, challenges and future research trends. According to the research surveys, dynamic spectrum access is an important and necessary technology for future cognitive sensor networks. Traditional methods of dynamic spectrum access are based on spectrum holes and they have some drawbacks, such as low accessibility and high interruptibility, which negatively affect the transmission performance of the sensor networks. To address this problem, in this paper a new initialization mechanism is proposed to establish a communication link and set up a sensor network without adopting spectrum holes to convey control information. Specifically, firstly a transmission channel model for analyzing the maximum accessible capacity for three different polices in a fading environment is discussed. Secondly, a hybrid spectrum access algorithm based on a reinforcement learning model is proposed for the power allocation problem of both the transmission channel and the control channel. Finally, extensive simulations have been conducted and simulation results show that this new algorithm provides a significant improvement in terms of the tradeoff between the control channel reliability and the efficiency of the transmission channel. Keywords: dynamic spectrum access; control channel; power allocation; reinforcement learning

1. Introduction Cognitive radio (CR) is a promising technology which can fully use the spectrum by dynamically accessing the primary network. Consequently, dynamic spectrum access technology plays a very significant role and has become a hot research topic. As illustrated in Figure 1, dynamic spectrum access strategies can be classified into three models, e.g., the dynamic exclusive use model, the open sharing model, and the hierarchical model. Among those models, the hierarchical model is a hierarchical access structure for primary users (PUs) and secondary users (SUs), and is the most promising and effective one for current spectrum access policies [1]. The basic idea of the hierarchical model is that the SUs can use the licensed spectrum of PUs, as long as they can limit any interference perceived by PUs. Furthermore, there are two models of the spectrum sharing between PUs and SUs, namely spectrum underlay and spectrum overlay. Spectrum underlay introduces severe constraints on the transmission power of the SUs, therefore, it spreads the transmitted signals over a wide frequency band. The SUs can achieve low data rates with very low transmission power. If the PUs transmit in all the time-slots, the spectrum underlay does not need to detect and perceive the spectrum of the PUs.

Sensors 2016, 16, 1675; doi:10.3390/s16101675

www.mdpi.com/journal/sensors

Sensors 2016, 16, 1675 Sensors 2016, 16, 1675

2 of 22 2 of 23

Dynamic Spectrum Access

Dynamic Exclusive Use Model

Spectrum Property Rights

Open Sharing Model (Spectrum Commons Model)

Dynamic Spectrum Allocation

Hierarchical Access Model

Spectrum Underlay (Ultra Wide Band)

Spectrum Overlay (Opportunistic Spectrum Access)

Figure ctrum acce ss models. mode ls . Figure1.1.The Thedynamic dynamicspe spectrum access

Spectrum overlay, first presented by Mitola, can be also regarded as opportunistic spectrum Spectrum overlay, first presented by Mitola, can be also regarded as opportunistic spectrum access access (OSA). Compared to the spectrum underlay, this model needs to detect and per ceive the (OSA). Compared to the spectrum underlay, this model needs to detect and perceive the spectra of spectra of the PUs. It finds spatial and temporal spectrum white space for SUs to use, which is also the PUs. It finds spatial and temporal spectrum white space for SUs to use, which is also termed as termed as the spectrum holes (SHs). Therefore, this model does not need to obey the severe the spectrum holes (SHs). Therefore, this model does not need to obey the severe transmission power transmission power constraints of the SUs, and the SUs can achieve high date rates with high constraints of the SUs, and the SUs can achieve high date rates with high transmission power. transmission power. In most cases, the spectrum overlay and underlay models are used separately. In this paper, In most cases, the spectrum overlay and underlay models are used separately. In this paper, a a hybrid spectrum access model is proposed to use both the overlay and underlay methods hybrid spectrum access model is proposed to use both the overlay and underlay methods simultaneously to further improve the current spectrum efficiency. simultaneously to further improve the current spectrum efficiency. The spectrum hole (SH) is a part of the licensed spectrum which is not being used by the owner The spectrum hole (SH) is a part of the licensed spectrum which is not being used by the owner during a period of time [1]. Among key technologies in CR, the design of the control channel is during a period of time [1]. Among key technologies in CR, the design of the control channel is essential because the SUs need a control channel to coordinate and they have no licensed spectrum to essential because the SUs need a control channel to coordinate and they have no licensed spectrum carry the control information. The vulnerabilities resulting from utilizing a dedicated control channel to carry the control information. The vulnerabilities resulting from utilizing a dedicated control have been well studied. Existing studies of the control channel have shown that using SHs to convey channel have been well studied. Existing studies of the control channel have shown that using SHs control information is only a basic approach and many shortcomings have been pointed out [2–6]. to convey control information is only a basic approach and many shortcomings have been pointed Firstly, the SUs may not have a common SH as control channel which would lead to low connectivity out [2–6]. Firstly, the SUs may not have a common SH as control channel which would lead to low of the SUs. Secondly, the arrival of PU is unknown which causes interruptions in the use of the connectivity of the SUs. Secondly, the arrival of PU is unknown which causes interruptions in the control channel. use of the control channel. As the SUs communicate only in the SHs, the SUs need information about those unused bands As the SUs communicate only in the SHs, the SUs need information about those unused bands in which the PUs are inactive. Each SU should maintain a list of SHs which probably will differ in which the PUs are inactive. Each SU should maintain a list of SHs which probably will differ from from one to another. The SUs can communicate with each other if there is a common SH in their one to another. The SUs can communicate with each other if there is a common SH in their lists. lists. Consequently, there should be a way to pass information about the lists between SUs during the Consequently, there should be a way to pass information about the lists between SUs during the initial communication. initial communication. Most of the existing MAC protocols of CR sensor networks are focused on avoiding common Most of the existing MAC protocols of CR sensor networks are focused on avoiding common control channels. However, in this paper, a new method of spreading the power spectrum density in control channels. However, in this paper, a new method of spreading the power spectrum density in a control channel over an ultra-wide bandwidth is proposed to exploit the underused (gray) spectral a control channel over an ultra-wide bandwidth is proposed to exploit the underused (gray) spectral regions. Like underlay spectrum sharing, the SUs can always access to the spectrum as long as the regions. Like underlay spectrum sharing, the SUs can always access to the spectrum as long as the interference causing by SUs at the PU receiver can satisfactorily meet the threshold constraint [7]. interference causing by SUs at the PU receiver can satisfactorily meet the threshold constraint [7]. According to the above analysis and considering the low power spectrum density of underlay According to the above analysis and considering the low power spectrum density of underlay waveforms, we propose to design a control channel to convey a small amount of control information, waveforms, we propose to design a control channel to convey a small amount of control information, which is termed as SUCCH. At the same time, the spectrum overlay waveform is adopted to exchange which is termed as SUCCH. At the same time, the spectrum overlay waveform is adopted to a large amount of date, which is named as SUTCH. Our study is based on a spectrum sharing exchange a large amount of date, which is named as SUTCH. Our study is based on a spectrum system consisting of two different waveforms. The first one is the Direct Sequence Code Division sharing system consisting of two different waveforms. The first one is the Direct Sequence Code Multiple Access (DS-CDMA), which is defined as the underlay waveform used to convey control Division Multiple Access (DS-CDMA), which is defined as the underlay waveform used to convey information. The second one is the Non-Contiguous Orthogonal Frequency Division Multiplexing control information. The second one is the Non-Contiguous Orthogonal Frequency Division (NC-OFDM), which is defined as the overlay waveform used to convey data information. The spectrum Multiplexing (NC-OFDM), which is defined as the overlay waveform used to convey data of NC-OFDM-based SUs is shared with the PUs which utilize DS-CDMA. Spreading Gain of DS-CDMA information. The spectrum of NC-OFDM-based SUs is shared with the PUs which utilize provides the required anti-jamming capability for the interference which may be caused by the SUs. DS-CDMA. Spreading Gain of DS-CDMA provides the required anti-jamming capability for the In the meantime, based on the properties of the non-continuous power spectrum of NC-OFDM, it is interference which may be caused by the SUs. In the meantime, based on the properties of the more flexible for the SUs to access the SHs which are discontinuous in the frequency spectrum [8]. It is non-continuous power spectrum of NC-OFDM, it is more flexible for the SUs to access the SHs which are discontinuous in the frequency spectrum [8]. It is of great significance to discuss and

Sensors 2016, 16, 1675

3 of 22

of great significance Sensors 2016, 16, 1675to discuss and study this issue, since the existing DS-CDMA is anticipated 3 of 23to be one of the spectrum sharing applications used in the future [9]. study thistoissue, since existing DS-CDMA is anticipated to be one of the spectrum In order set up thethe hybrid spectrum access model, several questions should besharing answered. applications used in the future [9]. The first one is the procedure for network setup between two SUs. The second one is the maximum In order to set up the hybrid spectrum access model, several questions should be answered. access capacity of the SUTCH with different strategies. The third one is the reliability of the SUCCH. The first one is the procedure for network setup between two SUs. The second one is the maximum The fourth one is the power allocation strategies of the SUs between the SUTCH and SUCCH. In the access capacity of the SUTCH with different strategies. The third one is the reliability of the SUCCH. rest of this paper, will be answered in detail. Specifically, SectionIn2the builds The fourth one isthe theabove power questions allocation strategies of the SUs between the SUTCH and SUCCH. application a mechanism establishing the CR sensor networks. Section 3, rest of scenarios this paper,and theproposes above questions will befor answered in detail. Specifically, Section 2Inbuilds a transmission for analyzing the maximum access fornetworks. differentInpolices applicationchannel scenariosmodel and proposes a mechanism for establishing thecapacity CR sensor Section with different objectives inchannel the fading environment discussed. In capacity Section for 4, the reliability of the 3, a transmission model for analyzingwill the be maximum access different polices with different objectives in the fading environment will be discussed. In Section 4, the reliability of SUCCH is analyzed, and a hybrid spectrum access algorithm based on reinforcement learning model is the SUCCH analyzed, and a problem hybrid spectrum access algorithm based on reinforcement learning proposed for the ispower allocation of the SUTCH and the SUCCH. Finally, Section 5 presents model is proposed power6allocation problem of the SUTCH and the SUCCH. Finally, Section our simulation results for andthe Section concludes the paper. 5 presents our simulation results and Section 6 concludes the paper.

2. Application Scenarios

2. Application Scenarios

In this section the application scenario is described as below. As shown in Figure 2, there are four In this section the application scenario is described as below. As shown in Figure 2, there are activefour PUsactive and each one is authorized to use a certain frequency band to communicate. The different PUs and each one is authorized to use a certain frequency band to communicate. The typesdifferent of circlestypes represent the represent interference of each PU, of and six PU, SUs and are shown Figure 2. in In this of circles the ranges interference ranges each six SUs in are shown paperFigure there 2. is In a channel which is termed a SH and a SU that can communicate in this channel because this paper there is a channel which is termed a SH and a SU that can communicate in it is athis channel whose authorized PU iswhose currently inactivePU oristhe SU is beyond interference range of channel because it is a channel authorized currently inactivethe or the SU is beyond the interference range of that PU. that PU. PU1 PU2

SU1

SU2

SU4

SU5

SU3

SU6 PU3 PU4

Figure 2. The SUs among four PUs.

Figure 2. The SUs among four PUs. A SU can establish the connection with another SU as long as they both have a shared SH in

A SUrespective can establish the connection with another SUtoasidentify long asitsthey both have a the shared SH in their lists of SHs, so it is important for a SU neighbors during initial communication used to s et up CR sensor networks. In order to fully utilize the primary spectrum their respective lists of SHs, so it is important for a SU to identify its neighbors during the initial and maximize thetoefficiency of sensor spectrum, underlay overlay transmissions, which exploit the and communication used set up CR networks. In and order to fully utilize the primary spectrum white and grey spaces respectively, should be used together [1,10,11]. However, for spectrum maximize the efficiency of spectrum, underlay and overlay transmissions, which exploit the white and underlay, the SUs need to transmit at low power to avoid any interference with the PUs, whereas the grey spaces respectively, should be used together [1,10,11]. However, for spectrum underlay, the SUs PUs will cause interference with SUs [12]. In consideration of the low power spectrum density of need to transmit at low power to avoid any interference with the PUs, whereas the PUs will cause underlay waveforms, the control channel is designed to convey a small amount of control interference with which SUs [12]. In consideration of the the lowspectrum power spectrum density ofis underlay waveforms, information, is named as SUCCH, while overlay waveform used to exchange the control channel is designed to convey a small amount of control information, which is named as a large amount of data, which is named as SUTCH. Considering the perspective of a SU, the current SUCCH, while the spectrum overlay waveform is used to exchange a large amount of data, which is spectrum usage is depicted in Figure 3. named as SUTCH. Considering the perspective of a SU, the current spectrum usage is depicted in Figure 3.

Sensors 2016, 16, 1675

4 of 22

Sensors 2016, 16, 1675

i i Sensors 2016, 16, 1675 = = 1 2 i i = = 1 2

4 of 23

i = 3

i = 4

i = 3

i = 4

i = 5 i = 5

i = 6 i = 6

j = 1

i = 7



i = 7



j = 2

j = 1



j = 2



i = N i=7 N 7j = M j=4 M 4

i = N i=6 N 6j = M j=3 M 3

i = N i=5 N 5

i = N i=4 N 4j = M j=2 M 2

i = N i=3 N 3j = M j=1 M 1

i = N i=2 N 2j = M j = M

i = N i=1 N 1

SU TCH SU Figure 3. Spe ctral occupancy of the hybrid acce ss. PU Figure 3. Spectral occupancy TCH of the hybrid access. PU

i = N i = N

4 of 23

SU CCH SU CCH

Before explaining the protocol used to set up CR sensor networks, it is necessary to discuss the Figure 3. Spe ctral occupancy of the hybrid acce ss.

Before explaining theand protocol set up networks, it is necessary discuss capabilities of the SUs define used some to terms thatCR willsensor be used in the coming discussion. to A SU can the switch between spectra autonomously and sense the spectrum. Each SU identifies itself by using a can capabilities of the SUs and define some terms that will be used in the coming discussion. A SU Before explaining the protocol used to set up CR sensor networks, it is necessary to discuss the different Orthogonal Variable Spreading Factor (OVSF) [12] over spectrum underlay. The number of switch betweenofspectra sense the be spectrum. Each SU identifies capabilities the SUs autonomously and define someand terms that will used in the coming discussion.itself A SU by canusing the SUs in the current CR sensor networks issense a priori available to all theitself SUs.by a different Orthogonal Variable Spreading Factor (OVSF) [12] over spectrum underlay. Theusing number of switch between spectra autonomously and the information spectrum. Each SU identifies a TheOrthogonal proposed protocol isSpreading firstly discussed under [12] a distributed architecture scenario, whichofis different Variable Factor (OVSF) over spectrum underlay. The number the SUs in the current CR sensor networks is a priori information available to all the SUs. also called Multi-Hop Architecture. Each SU starts to send beacons in different OVSF over the in the current CR sensor networks is a initially priori information available to all the SUs. TheSUs proposed protocol is firstly discussed under a distributed architecture scenario, which is spectrum underlay to indicate its presence. At the mean time every SU monitors the spectrum TheMulti-Hop proposed protocol is firstly discussed under astarts distributed architecture scenario, which is over also called Architecture. Each SU initially to send beacons in different OVSF underlay randomlyArchitecture. selecting a form OVSF while initially starting a timer which counts to TS also calledby Multi-Hop Eachof SU initially starts to send beacons in different OVSF over spectrum underlay its presence. At the meanthe time every SU monitors spectrum underlay seconds. Ifunderlay nonetoofindicate those beacons captured during seconds, SUmonitors willthe change another spectrum to indicate its ispresence. At the meanTStime everytheSU the to spectrum by randomly selecting a form of OVSF while initially starting a timer which counts to T S seconds. form of OVSF in the next time slot. If a beacon is while received by selecting form counts of OVSF, underlay by randomly selecting a form of OVSF initially startingthe a current timer which to the TS If none of those beacons is captured during the T seconds, the SU will change to another form of SUs will sent a response in the same form which is considered as the task of carrying on the S seconds. If none of those beacons is captured during the TS seconds, the SU will change to another negotiations. After theIfcontrol information with each the other, the common SH in the two OVSF in the nextin time slot.time If aslot. beacon is received by selecting current form the SUs form of OVSF the exchanging next a beacon is received by selecting the current formof of OVSF, OVSF, the SUs will start to provide service. The procedure is simply illustrated in Figure 4. will sent a response in the same form which is considered as the task of carrying on the negotiations. SUs will sent a response in the same form which is considered as the task of carrying on the Afternegotiations. exchangingAfter the control information with each other, theeach common SH common in the two will start to exchanging the control information with other, the SHSUs in the two SU2 SU1 SUs service. will startThe to provide service. The procedure is simply illustrated in Figure 4. provide procedure is simply illustrated in Figure 4. Time Slot 1 select a form of OVSF to send a SU1 beacon Time Slot 1 select a form of Time OVSF Slot 2 to send a beacon OVSF not match change a form of Time Slot 2 OVSF OVSF not match change a form of OVSF

select a form of OVSF to send a SU2 beacon

RTS

RTS

RTS

RTS

RTS

RTS

RTS

RTS CTS

get a response Send ACK

select a form of OVSF to send a beacon OVSF not match change a form of OVSF OVSF not match change a form of OVSF OVSF matched a beacon is received Send a response OVSF matched a beacon is received Send a response

CTS ACK

Timeget Slota 3response Send ACK Start to bearer service

ACK

start to bearer service

Time Slot 3

Figure 4. The proce dure of ne twork se tup be twe e n two SUs. Start to bearer service start to bearer service Figure 4. The proce dure of ne twork se tup be twe e n two SUs.

Figure 4. The procedure of network setup between two SUs.

Sensors 2016, 16, 1675

5 of 22

In Figure 4, “Request to Send (RTS)” and “Clear to Send (CTS)” exchange messages to reserve a channel for communications in a similar manner that the IEEE 802.11 Distributed Coordination function (DCF) designs the MAC protocol [13]. RTS or CTS carries information about SUs’ lists of SH Sensors 2016, 1675 5 of 23 and accesses SUs16,states. In Figure 4, “Request to Send (R TS)” and “Clear to Send (CTS)” exchange messages to reserve a 3. Subchannel Selection Policies channel for communications in a similar manner that the IEEE 802.11 Distributed Coordination

Suppose wireless a frequency-selective Additive White about Gaussian Noise (AWGN), functionthe (DCF) designschannel the MACisprotocol [13]. RTS or CTS carries information SUs’ lists of SH and accesses the bandwidth is B SUs Hz, states. and the power spectral density is N0 . In this paper, it is divided into N Rayleigh fading subchannels, and the subchannel coherence bandwidth is ∆f Hz. Therefore, B = N∆f. These 3. Subchannel Selection Policies subchannels are indexed by i = 1, 2, . . . , N, and the gains of every subchannel are independent and Suppose the wireless identically distributed (i.i.d). channel is a frequency-selective Additive White Gaussian Noise (AWGN), the bandwidth is B Hz, and the power spectral density is N0. In this paper, it is divided into N Active PUs use DS-CDMA technology to access the spectrum band with spreading gain G. Rayleigh fading subchannels, and the subchannel coherence bandwidth is ∆f Hz. Th erefore, B = N∆f . According to the Central Limit Theorem, the interference process in the receiver of the SUs caused These subchannels are indexed by i = 1, 2, …, N, and the gains of every subchannel are independent by a large number ofdistributed PUs is considered a Gaussian approximation. Furthermore, according to the and identically (i.i.d). second-order statistics, theDS-CDMA interference processtoisaccess a white [14]. Therefore, in each subchannel, Active PUs use technology theprocess spectrum band with spreading gain G. According to the Central Limit Theorem, the interference process in the receiver of 1)N the SUs caused the average interference introduced by the PUs at the receiver of the SUs is (K − ∆f, K ≥ 1, where 0 by a large number of PUs is considered a Gaussian approximation. Furthermore, according to the K is a system parameter related to the characteristics of PUs network [15]. second-order statistics, the interference process is a white process [ 14]. Therefore, in each As shown in Figure 4, the SUs utilize NC-OFDM to access the SUTCH which is indexed by subchannel, the average interference introduced by the PUs at the receiver of the SUs is (K − 1)N0 ∆f, j = 1, 2, K. .≥. 1, , M, 0 ≤KM N. The SUs spread their SUCCH power of spectrum density where is a≤system parameter related to the characteristics PUs network [15]. over an ultra-wide bandwidth As to exploit underused defined as the interference threshold shown inthe Figure 4, the SUs(gray) utilize spectral. NC-OFDMQtoisaccess the SUTCH which is indexed by j = 1, of the 2, …, is M,the 0 ≤maximum M ≤ N. Th eallowable SUs spreadtemporal their SUCCH power spectrum over PUs, which interference in the density receiver of an theultra-wide PUs caused by bandwidth exploit the underused (gray) spectral. QAs is defined as thein interference of the concurrent activitytoof the SUs in the same subchannel. mentioned Figure 4, threshold the protocol to set up PUs, which is the maximum allowable temporal interference in the receiver of the PUs caused by CR sensor networks is based on the time-slot structure. Therefore, in order to satisfy the interference concurrent activity of the SUs in the same subchannel. As mentioned in Figure 4, the protocol to set threshold constraint, the power of the SUs accessing the SUTCH should be controlled in each time-slot. up CR sensor networks is based on the time-slot structure. Therefore, in order to satisfy the In interference this paper,threshold the structure of the system is depicted in Figure 5.beFor subchannel j, constraint, the accessing power of the SUs accessing the SUTCH should controlled j in each time-slot. the instantaneous gain between the transmitter and receiver of the SU is defined as gss , and the In this paper, the structure of the accessing system depicted in Figure 5. j, the as g j . instantaneous gain between the transmitter of the SU isand the receiver ofFor thesubchannel PU 𝑗is defined ps instantaneous gain between the transmitter and receiver of the SU is defined as 𝑔 , and the j j

𝑠𝑠 Subscripts s and p refer to the secondary and the primary user, respectively. The g and are 𝑗 g instantaneous gain between the transmitter of the SU and the receiver of the PU is defined ssas 𝑔𝑝𝑠 . ps assumed as the stationary and ergodic independent distributed random variables with 𝑗 𝑗 unit-mean. Subscripts s and p refer to the secondary and the primary user, respectively. The 𝑔𝑠𝑠 and 𝑔𝑝𝑠 are j j j j Their Probability Functions (PDFs) are defined as f ss ( gssrandom ) and fvariables Channel assumed asDensity the stationary and ergodic independent distributed with unit-mean. ps ( g ps ), respectively.

j

𝑗

j

𝑗

𝑗

𝑗

Density defined as we 𝑓𝑠𝑠 (𝑔suppose (𝑔𝑝𝑠perfect ) , respectively. gains gTheir g ps are i.i.d., j =Functions 1, 2, . . . (PDFs , M. )Inare this paper, the Channel Side 𝑠𝑠 ) and 𝑓𝑝𝑠 ss andProbability 𝑗

𝑗j

Channel gainspair 𝑔𝑠𝑠 (g and, 𝑔g are i.i.d., 1, 2, …, M.inInthe this transmitters. paper, we suppose the perfect Channel Information (CSI) bej =available Here, the CSI contains the ss 𝑝𝑠ps )𝑗 can 𝑗 Side Information (CSI) pair (𝑔𝑠𝑠 , 𝑔𝑝𝑠 ) can be available in the transmitters. Here, the CSI contains the probability distribution of the channel gain, as well as the actual value at a certain time-slot. Actually, probability distribution of the channel gain, as well as the actual value at a certain time-slot. the CSI pair can be estimated by a spectral coordinator or proper signaling. Note that, the result Actually, the CSI pair can be estimated by a spectral coordinator or proper signaling. Note that, the derivedresult fromderived this assumption is an upper-bound in thein case a perfect CSI from this assumption is an upper -bound the without case without a perfect CSIpair. pair. j

Cross-Sub-Channel 𝑗

g 𝑝𝑠

Secondary-Sub-Channel 𝑗

g 𝑠𝑠

Primary user Receiver

Secondary user Receiver Secondary user Transmitter

Figure 5. The structure of the acce ss ing syste m for subchanne l j.

Figure 5. The structure of the accessing system for subchannel j. In this paper, we focus on the maximum achievable spectrum capacity of SUTCH, which is studied [16,17]. Since more SUs will compete to access to the underused band. this paper, we focus onthan theone maximum achievable spectrum capacity frequency of SUTCH, which

In is studied [16,17]. Since more than one SUs will compete to access to the underused frequency band. The SUs’ total available spectrum capacity is upper-bound by the case of only one SU, which is due to

Sensors 2016, 16, 1675

6 of 22

the fact that SUs will impose interference on each other. Therefore, the discussion of the individual SU can also be used as the upper-bound of the total spectrum capacity of all SUs. At a given time-slot, the power allocation policy of SUTCH is defined as ρψ , which is based on a selection criterion ψ(,..,), and set:   ∆

j

j

µ j = ψ g ps , gss

(1)

For the observing random variables µj , j = 1, . . . , M, the selection sequence γM is defined as follows: ∆ γ M = ( µ r1 , µ r2 , . . . , µ r M ) = ρ ψ ( µ 1 , µ 2 , . . . , µ M ) (2) The M-tuple selection sequence is arranged, so that its first element is the most suitable subchannel for SUTCH based on the selection criteria in Equation (1). The probability distribution function of random variable γj is defined as kj (γ), j = 1, . . . , M. It is important to note that if j1 , j2 are entities in γM and j1 < j2 , then it can be considered that compared to the choice j2 , the SUs can get a better performance by choosing subchannel with index j1 . j j Suppose ψ( g ps , gss ) is constant, which means subchannels are considered equally. The SUs will randomly choose M out of N subchannels without any a priori information. This selection strategy is defined as the uniform subchannel selection, whereas, if the prior information of the subchannel obtained by cooperation or other techniques is −1, the SUs will choose the corresponding value of j j ψ( g ps , gss ). This selection strategy is defined as the non-uniform selection strategy. The transmission power of the SUTCH in the subchannel j is referred to Psj . Ps (Ps1 , . . . , PsM ) is defined as the transmission power vector of SUTCH over M subchannels. Suppose that SUTCH accesses to the chosen subchannel j with the transmission power of Psj , and the corresponding interference at the receiver of the PUs is Qj , where: j

Q j = g ps Psj

(3)

Since the PUs utilize DS-CDMA with spreading gain G, therefore, the narrow-band interference Qj spreads over the whole bandwidth and manifests itself as an equivalent wide-band interference equal to G−1 Qj at the receiver of the PUs. Suppose the SUTCH transmits with the transmission power vector Ps (Ps1 , Ps2 , . . . , PsM ) in M accessible subchannels. Correspondingly, an equivalent narrow-band interference vector Q = (Q1 , Q2 , . . . , QM ) will be imposed on the receivers of the PUs. Meanwhile, the SUCCH transmits with the transmission power vector Psc (Psc1 , Psc2 , . . . , PscN ). Therefore, in order to comply with the interference threshold Q of the PUs, the constraint function is as follows: M

1 G



j gss Psj

j =1

N

+∑

! gips Psci

≤Q

(4)

i =1

In this paper, the objective is to achieve the maximum capacity of SUTCH. As discussed above, the transmitting power of SUTCH in each accessible subchannel should be optimally allocated. Meantime, the interference threshold constraint should also be considered. Consequently, according to selection policy ρψ , for a given Q and for M accessible subchannels, the maximum capacity of SUTCH ψ

is defined as C M , which can be obtained by the following constrained optimization problem: ψ

CM

s.t.

M

= max ∑ ∆ f

R

j

 log 1 +



gss Psj j

KN0 ∆ f + gss Pscj j j  g ps gss  j j j j j j × f ss gss f ps g ps dgss dg ps Ps j=1

1 G

M



j =1

j g ps Psj

M

N

j =1

i =1

N

+ ∑

i =1

(5)

! gips Psci

∑ Psj + ∑ Psci ≤ Ps

≤Q

Sensors 2016, 16, 1675

7 of 22

where, Q is the interference threshold of the PUs, which is the maximum allowable temporal interference in the receiver of the PUs caused by concurrent activity of the SUs in the same subchannel. Ps N0 is the power spectral density, ∆f is the subchannel coherence bandwidth. K is a system parameter related to the characteristics of PUs network [15] within the range of 2–8. Equation (5) is derived from Shannon’s Capacity formula with the SUs power vector Ps and Psc . Equation (6) is the constraint function of interference threshold of the PUs and maximum transmitting power of the SUs. Actually, in contrast to the constraint of maximum transmitting power of the SUs, the constraint function of interference threshold of the PUs is much tighter [18]. Therefore, in this paper, the constraint of maximum transmitting power of SUs is not considered. At the same time, as mentioned above, the SUCCH spreads over an ultra-wide bandwidth to exploit the underused spectrum with a very low PSD, therefore, the interference caused by SUCCH is very low. In this paper, in order to simplify the analysis, the effect of SUCCH will not be considered, and Equation (5) will be further simplified as follows:   j M R gss Psj ψ C M = max ∑ ∆ f log 1 + KN ∆ f Ps j=1

× f ss 1 G

∑ g ps Psj

j

s.t.

j

0

j

g ps gss  j j j j j gss f ps g ps dgss dg ps !



M

j

(6)

≤Q

j =1

  j j Suppose ψ g ps , gss = 1, thus the SUs will randomly choose M out of N subchannels without any priori information by ρ1 , which is a uniform subchannel selection policy. Consequently, substituting j Psj = Q j /g ps , j = 1, . . . , M and defining θQ j , Q j /KN0 ∆ f Equation (6) can be simplified as follows: ρ

C M1

  M R  = max ∑ ∆ f log 1 + νj θQ j h j νj dνj Q j =1

νj

(7)

M

∑ Q j = GQ,

s.t.

j =1 j

0 ≤ Q j ≤ GQ

j

where v j , gss /g ps , 0 ≤ vj ≤ ∞, v j is the reward factor of the subchannel j. θQ j is defined as the spectrum sharing load factor of the subchannel j. q q j j j Suppose the statistics characteristics of g ps , gss is i.i.d. Rayleigh random variables, g ps and j

gss are exponentially distributed random variables with unit-mean, therefore, the PDF of v j can be converted into [17]: j

h j (νj )

=

R ∞ R g ps v j − g j − g j j j d e ps e ss dg ps dgss 0 dv j 0 j

= = = = =

g

j

− g ps jss j j − g jps g ps dg ps e 0 g ps e R ∞ j − g j (1+ ν ) j j dg e ps ps  0 g ps  j ∞ j − g ( 1 + − 1+1νj g ps e ps νj )   ∞ 0 j − g ps (1+νj ) 1 e − 2 ( 1 + νj ) 0 1 0 < νj < ∞ 2 (1+νj )

R∞



R∞ 0

e

j

− g ps (1+νj )

j dg ps



(8)

Substituting Equation (9) into Equation (7), and integrating by part, Equation (10) can be gotten ρ as follows, which is the simplified optimization problem of C M1 :

Sensors 2016, 16, 1675

8 of 22

M

θQj

j =1

θQj − 1

C M1 = max ∑ ∆ f ρ

θQ

  log θQ j (9)

M

s.t.

∑ θQj = GNθQ ,

0 ≤ θQ j ≤ GNθQ

j =1

where, θQ is defined as the spectrum sharing load factor, and θQ = (θQ1 , θQ2 , . . . , θQM ) is defined as the spectrum sharing load vector: ∆

θQ =

Q Q = KN0 N∆ f KN0 B

(10)

Furthermore, the following pseudo linear approximation is used to get an approximate solution for Equation (10) [16]: x log( x ) ≈ −1.2015 − 0.0052x + 1.0772 × log (3.0262x + 308829) x−1

(11)

Substituting Equation (12) into Equation (10), the Lagrangian function of the optimization problem Equation (10) is shown as follows [19,20]: L(θQ , λ) =

  M ∑ −1.2015 + −0.0052 × θQ j + 1.0772 × log 3.0262 × θQ j + 3.8829 j =1 !

−λ

M

(12)

∑ θQ j − GNθQ

j =1

where λ is the Lagrangian coefficient. The derivative with respect to the θQ j on Equation (13) is taken, and then it is equal to zero, the following formula can be obtained: ∗ θQ = j

1.0772 3.8829 − . λ∗ + 0.0052 3.0262

(13)

Substituting Equation (14) into Equation (10), the following formula can obtained: M





j =1

 1.0772 3.8829 − = GNθQ λ∗ + 0.0052 3.0262

(14)

Equivalently, Equation (16) can be derived from Equation (15): λ∗ = −0.0052 +

1.0772 GNθ Q M

+

3.8829 3.0262

.

(15)

Eventually, substituting Equation (16) into Equation (14) gives: θθ∗j =

GNθQ , M

j = 1, 2, . . . , M

(16)

Note that Equation (17) suggests that for given G, N, M and θ Q , the maximum capacity is achieved by dividing the total acceptable interference GNθ Q into equal portions for M accessible subchannels. Actually, it is a direct consequence of selecting M out of N subchannels without any prior knowledge. Furthermore, according to Equation (3) and θQ j , Q j /KN0 ∆ f , the optimal transmitting power vector P∗s can be obtained as follows: P∗s =

1 GQ 1 GQ 1 GQ , ,..., M g1ps M g2ps M g ps M

! (17)

Sensors 2016, 16, 1675

9 of 22

Equation (18) suggests that the interference share for each accessible subchannel j, θQ j is mapped j

j

to the corresponding transmission power Psj , proportional to 1/g ps . So, if g ps is large, then the SUs will creates a large interference in the receivers of Pus. In this case, Equation (18) suggests a lower SUs transmission power in accessible subchannel j. Equivalently, substituting Equation (18) into Equation (10), Equation (19) can be derived: ρ C M1

GNθQ ≈ M∆ f log GNθQ − M



GNθQ M

 (18)

In a practical case, Q = G−1 N0 B and M < N, the spectrum sharing load factor can be obtained from Equation (17) as θθ j = N/KM, which is much higher than unity. As mentioned above, ρ1 randomly choose subchannels, which ignores the fact that it is more reasonable for the SUs to allocate higher transmission power to certain subchannels because of their corresponding CSIs, so it is essential to discuss the non-uniform selection policy for SUTCH with j j a prior knowledge of CSIs pair (gss , g ps ), since it will lead to a larger capacity or a smaller interference on the PUs. Actually, an appropriate selection policy should consider the interference of the PUs receivers j caused by SUs transmission. Such policy should select the lower subchannel gain of g ps , because it j

will create a lower interference in the receivers of the PUs. Therefore, a lower g ps will give the SUs the flexibility of allocating a higher power, which will result in a higher capacity. Such a selection policy is named as SU-PU-based selection policy, which is simplified as ρ ps . In order to implement ρ ps , the SUs j

requires g ps during each time-slot. Therefore, a signaling channel between the receivers of the PUs and the transmitters of the SUs is required. Similar to ρ ps , another selection policy can be derived. It will select those subchannels which achieve the highest capacity corresponding to allocating the transmitting power of SUs. Such policy j selects the subchannel with the higher gss , because it will create a higher power in the receivers of the SUs. Such selection is name as SU-SU-based selection policy, which is simplified as ρss . In order to j implement ρss , the SUs requires gss during each time-slot. Therefore, a signaling channel between the receivers of the SUs and the transmitters of the SUs is also required. In the following, the maximum capacity is derived with different selection policy ρ ps and ρss . Considering ρ ps , the selection criteria can be assumed as follows:   j j j ψ g ps , gss = g ps

(19)

j

Consequently, µ j = g ps and based on µj , j = 1, 2, . . . , M, the selection sequence is defined as follows: ∆ γ M = (µ1 , µ2 , . . . , µ M ) = ρ ps (µ1 , µ2 , . . . , µ M ) (20) where µ1 ≤ µ1 ≤ . . . ≤ µM . Using order statistics [21], the probability distribution function of µj , ∀j is shown as follows:  N−j j −1 k j (µ) = Nj Fµ (µ) 1 − Fµ (µ) f µ (µ) (21) where:



Nj =

N! , ( j − 1) ! ( N − j ) !

(22)

and fµ (µ), Fµ (µ) are the probability density function and probability distribution function of µ. Assuming the same assumption as discussed above in Equation (9) we obtain: f µ (µ) = e−µ , Fµ (µ) = 1 − e−µ .

(23)

Sensors 2016, 16, 1675

10 of 22

Equivalently: k j (µ) = Nj (1 − e−µ )

j −1 − µ ( N − j +1)

e

.

(24)

Using a binomial expansion to replace (1 − e−µ )j −1 in Equation (25) gives: j −1

j −1 − µ ( N − l )

k J (µ) = Nj ∑ Fl

e

,

(25)

l =0

! j−1 where, , (−1) j−1−l . l Thus, the optimization problem of maximizing the capacity of SUTCH, while satisfying the tolerable interference constraint of the PUs with selection policy ρ ps is shown as follows: j −1 Fl

ρ ps CM

s.t.

M j −1

= max ∑ ∑

θQ j=1 l =0

h i j−1 θ Q j log ( N −l )θ Q j ∆ f Nj Fl , ( N − l ) θ Q −1 j

M

∑ θQ j = GNθQ j ,

(26)

0 ≤ θQ j ≤ GNθQ .

j =1

However, in practice, M < N, thus, NθQ j  1. Therefore, Equation (27) can be approximated by Equation (28): h i j −1 M j −1 ρ ps NF C M ≈ max ∑ ∑ ∆ f Nj −l l log ( N − l ) θQ j , θQ j=1 l =0 (27) M s.t. θ = GNθ 0 ≤ θ ≤ GNθ . ∑ Qj Qj , Qj Q j =1

The Lagrange multiplier algorithm can be used to solve the optimization problem in Equation (28) [19]: h i M j −1 N F j −1 L(θQ j , λ) = ∑ ∑ Nj −l l log ( N − l ) θQ j j =1 l =0 ! (28) M −λ ∑ θQ j − GNθQ j =1

where, λ is the Lagrangian coefficient. Taking the derivative with respect to the θQ j on Equation (29) and setting it equal to zero gives: ∗ θQ = j j −1

j −1

where, v j , ∑l −0 Nj Fl

1 υ, λ∗ j

(29)

/N − l. Substituting Equation (30) into Equation (28): λ∗ =

M 1 υj . GNθQ j∑ =1

(30)

Substituting Equation (31) into Equation (30): ∗ θQ = GNθQ j

υj M

.

(31)

∑ υj

j =1

Furthermore, according to Equation (3) and θQ j , Q j /KN0 ∆ f , the optimal transmitting power vector P∗s with selection policy ρ ps can be obtained as follows:

Sensors 2016, 16, 1675

11 of 22

P∗s

=

GQ

υ2

υ1

υM

!

, j ,..., j j g ps g ps g ps

M

∑ υj

(32)

j =1

Equivalently, substituting Equation (33) into Equation (28) yields the approximated maximum achievable capacity of the SUTCH with selection policy ρ ps , which is shown in Equation (34):  ρ ps

CM ≈

M j −1

∑∑

j =1 l =0

j −1

∆ f Nj Fl N−l



 υj    log ( N − l ) GNθQ  M   υ ∑ j

(33)

j =1

Considering ρss , the selection criteria can be assumed as follows:   j j j ψ g ps , gss = gss

(34)

j

Consequently, µ j = gss and based on µj , j = 1, 2, . . . , M, the selection sequence is defined as follows: ∆ γ M = (µ1 , µ2 , . . . , µm ) = ρss (µ1 , µ2 , . . . µm ). (35) where µ1 ≥ µ2 ≥ . . . ≥ µM . Using order statistics [21], the probability distribution function of µj , ∀j is shown as follows:   j −1 N−j k j (µ) = Nj Fµ (µ) 1 − Fµ (µ) f µ ( µ ), (36) Using a binomial expansion to replace (1 − e−µ )N−j in Equation (37) one obtains: N−j

N − j −µ(l + j)

k J (µ) = Nj ∑ Fl

e

,

(37)

l =0

! N−j where , (−1)l . l Thus the optimization problem of maximizing the capacity of the SUTCH while satisfying the tolerable interference constraints of the PUs with selection policy ρ ps is shown as follows: N−j Fl

ρ C Mss

s.t.

M N−j

= max ∑ ∑ ∆ f θ Q j =1 l =0

θQ

j l+j

N−j

Nj Fl l+j

θQ

j l+j

M

∑ θQ j = GNθQ j ,

−1

 log

θQ

j



l+j

(38)

0 ≤ θQ j ≤ GNθQ .

j =1

Utilizing the following approximation for small values of θQ j /l + j, l = 0, 1, . . . , N − j as: ∗ θQ = GNθQ j

χ2j M

(39)

∑ χ2j

j =1

where: ∆

χj =

N−j

N−j



l =0

Nj

Fl

2(l + j)3/2

(40)

Furthermore, according to Equation (3) and θQ j , Q j /KN0 ∆ f , the optimal transmitting power vector P∗s with selection policy ρss can be achieved as follows:

Sensors 2016, 16, 1675

12 of 22

P∗s

GQ

=

M

∑ χ2j

χ2M χ21 χ22 , , . . . , j j j gss gss gss

! (41)

j =1

Equivalently, substituting (41) into (39) yields the approximated maximum achievable capacity of the SUTCH under SU-SU-based selection policy is shown as follows: 1/2

 M N−j

ρ

C Mss ≈

∑∑

N−j

∆f

j =1 l =0

Nj Fl

(l + j)3/2

 χ2j    GNθ   Q M   ∑ χ2j

(42)

j =1

4. Reinforcement Learning for Improving Performance In Section 3, the maximum achievable capacity of the SUTCH is analyzed. In Section 4, the reliability of the SUCCH is taken into consideration by the Bit Error Rate (BER). Suppose the signal waveform of the SUCCH is as follows: (

√ s1 ( t ) = ε b √ s2 ( t ) = − ε b

(43)

Suppose the two signal waveforms in Equation (45) are transmitted with the same probability. Since the SUCCH spreads its power spectrum density over an ultra-wide bandwidth to exploit the underused (gray) spectral regions, the interference process caused by the PUs and the SUCCH can be considered as a Gaussian approximation. If the SUCCH transmits s1 (t), after the despread-demodulation algorithm at the receiver of the SUCCH, the received signal is as follows: r=



εb +

1 GSUCCH

n+

Y

M

y =1

j =1

∑ σPU + ∑ σSUCCH

! (44)

where n is additive Gaussian white noise with mean zero, variance N0 /2 and σPU , σSUCCH represent the interference caused by the PUs and the SUTCH. GSUCCH is the spreading gain of the SUCCH. The receiving signal of the SUCCH is compared with the threshold zero, which is as follows: S1



r < 0.

(45)

s2

Suppose the PUs and the SUCCH are i.i.d. random processes, then two probability density functions of r are given as follows: p (r | s1 ) =

p (r | s2 ) =

1 v u u t

2π GSUCCH

Y M N0 2 2 2 + ∑ σpu + ∑ σSUTCH y =1 j =1

1 v u u t

2π GSUCCH

Y M N0 2 2 2 + ∑ σpu + ∑ σSUTCH y =1 j =1

!e

!e

√ −(r − ε b )2 /N0

√ −(r + ε b )2 /N0

Consequently, the average error probability of the SUCCH is as follows:

(46)

Sensors 2016, 16, 1675

13 of 22

Pe

1 2

["P (e|s1 ) + P (e|s2 )] # + R0 R∞ 1 =2 p (r |s1 )dr + p (r |s2 )dr −∞ 0    v u GSUCCH ε b  u = Q Y M  t =

N0 2 +

(47)

2 + 2 ∑ σPU ∑ σSUCCH

y =1

j =1

Suppose the control information of the SUCCH consists of 8 bits. According to Figure 4, the transmitter and receiver of the SUs need to coordinate access to the spectrum three times. Therefore, the probability of successful establishment for the SUCCH can be concluded. Furthermore, the total interference caused by the SUs is divided into two parts: QSUTCH and QSUCCH . QSUTCH represents the interference caused by the activity of the SUTCH, while QSUCCH represents the interference caused by the activity of the SUTCH. The loading factor Г is defined as the radio of QSUTCH and QSUCCH , which is as follows: ∆ Q Γ = SUCCH , 0 < Γ < 1 (48) QSUTCH In consideration of the link access protocol design described above and the probability of successful establishment for SUCCH, the lower PSD of SUCCH means it may take more time to complete the setup procedure for the SUs. In other words, accessible subchannels will remain idle for a long period of time, which will lead to spectrum resource waste. However, increasing the transmitting power of the SUCCH will decrease the transmitting power of the SUTCH, because of the total interference constraint caused by the SUs is certain at a time-slot. Lower transmitting power of the SUTCH will lead to reduce the capacity of data. Therefore, it’s a trade-off, which is essential to choose the appropriate transmitting power of SUTCH according to the characteristic of the activity of the PU. For this purpose, a hybrid access method based on Reinforcement Learning model is proposed to solve this problem. The most prominent feature of Reinforcement Learning model is its autonomous learning and online learning ability. By trial and error, Reinforcement Learning model can get a better strategy based on the subchannel environment. The Cross model [22] is now widely recognized as one of the Reinforcement Learning models with memory-less characteristics, which means the learning process is a Markov Decision Process (MDP). The basic idea is to follow the rules of “Results” [23], namely, if system is rewarded by choosing a strategy, then the next period will get higher probability of choosing such strategy. On the contrary, if it is punished, the next period will reduce the probability of choosing such strategy. Bush and Mosteller [24] introduced the Bush-Mosteller model in 1955 [25]. Afterwards, Roth and Erev improved this model and introduced the Roth-Ever model. Nowadays, as two models of reinforcement learning, both of them [26] are widely adopted. They are easy to realize and have very low computation complexity, which fit for the real-time applications. Therefore, in this paper, these two models are introduced and some necessary modifications are adopted for the application, so the model of MDP Cross and Statistical Mean are proposed. As mentioned above, the process of connection setup is defined as the time-slotted. The optional strategies for the SUs are defined as follows: Asu = (Γ1 , Γ2 , . . . , Γn , . . . , Γn0 , Γ R )

(49)

where Asu is the vector of optional strategies, R the number of the strategy, n is the chosen strategy and n0 are not chosen strategies in a certain time-slot. Consequently, during the time-slot k to access the initial stage, the SUs can update the probability of choosing strategy n and n0 by the following formula:

Sensors 2016, 16, 1675

14 of 22

pn (k + 1) = pn (k) + R [u (k)] × (1 − pn ) 0 0 0 pn (k + 1) = pn (k ) − R [u (k)] × pn (k ) R [u (k)] = α × u (k) + β

n = Asu (k) n0 6= Asu (k)

(50)

where Asu (k) is the accessible strategy of the SUs at the time-slot k, which can be seen the action of MDP. pn (k) is the probability of the accessible strategy n of the SUs at time-slot k, pn (k) is the probability of the unused strategy n0 of the SUs at time-slot k, which can be seen as the state of MDP. u(k) is the reward function of the accessible performance of the SUs, which can be seen as the reward of MDP. α and β are the adjustment factors, which can be used to determine the updating rate of u(k). R[u(k)] is defined as the monotone function of u(k), which is −1 < R[u(k)] < 1. When the SUTCH successfully accesses idle subchannels, it obtains the reward, which is defined as follows: ∂1 I (k) CSUTCH (k) T (k)

(51)

where, T(k) is the transmission duration of the SUs in time-slot k and ∂1 is a weighting factor and I(k) is indicator function, which is defined as follows: ( I (k ) = 1 SUTCH successfully access at time-slot k (52) I (k ) = 0 SUTCH fail to access at time-slot k When the SUTCH fails to access the idle subchannels, it wastes the opportunity for transmission and pays the cost, which is shown as follows:

− ∂2 I (k) CSUTCH (k) T 0 (k)

(53)

where, T0 (k) is the access duration of the SUs and ∂2 is also a weighting factor. Equivalently: u (k) = ∂1 I (k) CSUTCH (k ) T (k) − ∂2 I 0 (k) CSUTCH (k) T 0 (k )

0 ≤ ∂i ≤ 1, i = 1, 2

(54)

In order to weaken the impact of weighting on updating the probability of the choosing strategy, Equation (52) can be further defined as follows: pn (k + 1) = pn (k) + ε × [1 − pn (k)] pn (k + 1) = pn (k) − ε × phn (k) i 0 0 0 p n ( k + 1) = p n ( k ) + ε × 1 − p n ( k ) 0

0

0

p n ( k + 1) = p n ( k ) − ε × p n ( k )

n = Asu (k) , u (k) > 0 n = Asu (k ) , u (k) < 0 n0 6= Asu (k) , u (k) < 0

(55)

n0 6= Asu (k) , u (k) > 0

where, ε = R[u(k)] = α × u(k)+ β. The solution to update the probability of choosing strategy is the model of MDP Cross. If the u(k) > 0, which means the accessible strategy n is fit for the current 0 subchannel environment. Therefore, the pn (k + 1) should be increased, while the pn (k + 1) should be decreased. However, if the u(k) < 0, which means the accessible strategy n is not fit for the current 0 subchannel environment, therefore, the pn (k + 1) should be decreased, while the pn (k + 1) should be increased. In practice, the probability of choosing a strategy is usually not only dependent on the latest result, it also takes the “system history” into account. “System history” presents users with more information about the status of environment. In order to incorporate the “system history”, the Statistical Mean is proposed, in which the reward function is modified as follows: n ( k ) /F n pnsuc (k) = Fsuc access ( k ) n n n p f ail (k ) = Ff ail (k) /Faccess (k)

u (k) = ∂1 pnsuc (k) − ∂2 pnfail (k)

(56)

Sensors 2016, 16, 1675

15 of 22

n ( k ) represents the amount of data traffic which SUTCH has transmitted based on strategy n where, Fsuc n at time-slot k, Faccess (k) and Ffnail (k) are the idea and wasted amount, respectively. Therefore the probability of choosing a strategy in the Statistical Mean is shown as follows:

pn (k + 1) = pn (k) + ε × [1 − pn (k)] pn (k + 1) = pn (k) − ε × phn (k) i 0 0 0 p n ( k + 1) = p n ( k ) + ε × 1 − p n ( k )  0 0 Sensors 2016, 16, 1675 n0 pn (k + 1) = p ( k ) − ε × p n ( k )

n = Asu (k), ∀ j, j 6= n, un (k + 1) > u j (k + 1) n = Asu (k), ∃ j, j 6= n, un (k + 1) ≤ u j (k + 1) 0

n0 6= Asu (k), un (k + 1) ≤ un (k + 1) 0

n0 6= Asu (k ), un (k + 1) > un (k + 1)

(57) 16 of 23

5. Simulation Study 5. Simulation Study In this section, the achievable spectrum efficiencies with different subchannel selection policies In this section, the achievable spectrum efficiencies with different subchannel selection policies are compared. Here, the spectrum sharing load factor is θQ = −30 dB and the number of subchannels are compared. Here, the spectrum sharing load factor is𝑗 θ Q𝑗= −30 dB and the number of subchannels is is N = 40. The mean values of random variables j j 𝑔𝑝𝑠 , 𝑔𝑠𝑠 are denoted by 𝜆 𝑝𝑠 , 𝜆 𝑠𝑠 , respectively. The N = 40. The mean values of random variables g ps , gss are denoted by λ ps , λss , respectively. The achieved achieved spectrum efficiency is defined as follows: spectrum efficiency is defined as follows: ρ

 ρψ / M f C C

(58) (58)

 C M Cρψ ρ= M /M∆ f

Here, in order to facilitate the comparison, 𝐶 is defined as the achieved spectrum efficiency Here, in order to facilitate the comparison, Cρ1𝜌1is defined as the achieved spectrum efficiency with uniform subchannel selection, 𝐶𝜌𝑠𝑠 is defined as the achieved spectrum efficiency with the with uniform subchannel selection, Cρss is defined as the achieved spectrum efficiency with the SU-SU-based selection policy, 𝐶 𝑝𝑠 is defined as the achieved spectrum efficiency with the SU-SU-based selection policy, Cρ ps is 𝜌defined as the achieved spectrum efficiency with the SU-PU-based SU-PU-based selection policy.selection policy. thefirst first simulation, suppose interference threshold is a constant 𝜆 ,=and 𝜆 𝑠𝑠,the and InInthe simulation, suppose thethe interference threshold is a constant and λand Cρthe ps = λ𝑝𝑠 ss ψ 𝐶 is analyzed by increasing M, which is depicted in Figure 6. 𝜌𝜓 is analyzed by increasing M, which is depicted in Figure 6. 3.5

Achieved Spectral Efficiency(bit/s/Hz)

C1 Css

3

Cps 2.5

2

1.5

1

0.5

0

2

4

6

8

10

M Figure 6. Achie ve d spectral spe ctralefficiency e fficie ncyofofthe the SUTCH with threselection e se le ction policie s with Achieved SUTCH with three policies with M. M.

Asdepicted depictedininFigure Figure6,6,C𝐶ρ 𝜌1isislower lowerthan thanthat thatofofC𝐶 and 𝐶 , therefore, it indicates that ρ1 As ρss𝜌𝑠𝑠and Cρ ps𝜌,𝑝𝑠therefore, it indicates that ρ1 1 hasa apoorer poorerperformance performancecompared comparedtotoρρ andρρpsps.. For M = 1, the gap andC𝐶ρss𝜌𝑠𝑠isislarge. large. has gap between C𝐶ρ𝜌11 and ssssand However, with the increase of M, the gap is reduced. This result is reasonable because the tap However, with the increase of M, the gap is reduced. This result is reasonable because the tap is relatedis to the M/N and the larger tap is. The forM/N, a larger torelated the M/N ratio, andratio, the larger M/N, theM/N, lowerthe thelower tap is.the The reason is reason that forisa that larger the M/N, set of M subchannels byC𝐶ρ𝜌ss1 probably and 𝐶𝜌𝑠𝑠 has probably a large overlap. ofthe M set subchannels accessibleaccessible by Cρ1 and a large has overlap. Withthe the increase of the M, rate the rate of decrease ρps is reduced with the slowest This is With increase of M, of decrease of ρps of is reduced with the slowest rate. Thisrate. is mainly mainly due tothat the fact that the total interference of the receivers of the is a constant. due to the fact the total interference thresholdthreshold of the receivers of the PUs is a PUs constant. At the j 𝑗 same time, ρps selects these subchannels with the lower g psthe , which enables the SUsenables transmitters to At the same time, ρ p s selects these subchannels with lower 𝑔𝑝𝑠 , which the SUs send the maximum transmitting power, without generating high interference on the receivers of transmitters to send the maximum transmitting power, without generating high interference onthe the receivers of the Pus and satisfying the constraint of the interference threshold of the PUs. According to Figure 6, for a large number of accessible subchannels with constant interference constraint, ρps achieves a better performance. In the second simulation, the influence of the number of subchannels N is analyzed. Suppose M = 1, 𝜆𝑝𝑠 = 𝜆 𝑠𝑠 , the 𝐶𝜌 is analyzed by increasing N. The result is depicted in Figure 7.

Sensors 2016, 16, 1675

16 of 22

Pus and satisfying the constraint of the interference threshold of the PUs. According to Figure 6, for a large number of accessible subchannels with constant interference constraint, ρps achieves a better performance. In the second simulation, the influence of the number of subchannels N is analyzed. Suppose 17 M =Sensors 1, λ ps2016, = λ16, the Cρψ is analyzed by increasing N. The result is depicted in Figure 7. ss ,1675 Sensors 2016, 16, 1675 17 of of 23 23

AchievedSpectral SpectralEfficiency(bit/s/Hz) Efficiency(bit/s/Hz) Achieved

3.5 3.5

C C11 C Css

33

ss

C Cps ps

2.5 2.5 22 1.5 1.5 11 0.5 0.50 0

22

44

M M

66

88

10 10

Figure Achie spe ee fficie the SUTCH unde thre se policie with N. Figure7.7. 7.Achieved Achie ve ve d dspectral spe ctral ctralefficiency fficie ncy ncyofof ofthe theSUTCH SUTCHunder unde rrthree thre eeselection se le le ction ction policie sswith withN. N. Figure policies

As As seen seen in in Figure Figure 7, 7, for for all all the the different different subchannel subchannel selection selection policies, policies, the the 𝐶 𝐶𝜌𝜌𝜓 increases increases with with

As seen inof Figure 7, for all the different subchannelselecting selection policies, the Cρ𝜓ψ increases with the for the increase increase of N. N. This This is is because because that that the the probability probability of of selecting proper proper subchannels subchannels for SUTCH SUTCH is is the increasing increase ofwith N. This is because that the probability of selecting proper subchannels for SUTCH is increasing with N. N. Furthermore, Furthermore, it it is is interesting interesting to to find find that that the the gap gap between between these these three three selection selection increasing with N. Furthermore, it is interesting to find that the gap between these three selection policies policies also also increases increases with with the the increase increase of of N N and and ρ ρssss outperforms outperforms the the others others in in this this simulation. simulation. policies also increases with the increase of N and ρ outperforms others in this simulation. ss In the third simulation, both the influences of g ps and g ss arethe evaluated. Suppose In the third simulation, both the influences of g ps and g ss are evaluated. Suppose N N == 40, 40, M M == 1. 1. In the third simulation, both the influences of g and g are evaluated. Suppose N = 40, M = 1. The 𝐶 is analyzed with 𝜆 /𝜆 for different θ Q values. The simulation result is depicted in Figure ps ss 𝑠𝑠 The 𝐶𝜌𝜌𝜓𝜓 is analyzed with 𝜆 𝑝𝑠 /𝜆 for different θ Q values. The simulation result is depicted in Figure 𝑝𝑠 𝑠𝑠 The8.Cρψ is analyzed with λ ps /λss for different θ Q values. The simulation result is depicted in Figure 8. 8.

AchievedSpectral SpectralEfficiency(bit/s/Hz) Efficiency(bit/s/Hz) Achieved

80 80 70 70

C C11 Q=-30dB Q=-30dB

60 60

C Q=-30dB Css Q=-30dB ss

50 50

C Q=-30dB Cps Q=-30dB ps C C11 Q=-20dB Q=-20dB

40 40

C Q=-20dB Css Q=-20dB ss C Q=-20dB Cps Q=-20dB ps

30 30 20 20 10 10 00 -10 -10

-5 -5

00

 ps// ss ps

55

10 10

ss

Figure 8. Achie spe ee fficie of the SUTCH unde thre se policie ss with 𝜆𝜆𝑝𝑠 /𝜆 . Figure Achie ve ve d dspectral spe ctral ctral fficie ncy ncy the SUTCH unde rrthree thre eeselection se le le ction ction policiewith withλ ps /𝜆ss𝑠𝑠 𝑝𝑠 𝑠𝑠. . Figure 8. 8.Achieved efficiency ofof the SUTCH under policies /λ

As As depicted depicted in in Figure Figure 8, 8, it it is is clearly clearly observed observed that that the the 𝐶 𝐶𝜌𝜌𝜓𝜓 of of the the SUTCH SUTCH decreases decreases with with the the increase of 𝜆 /𝜆 . Meantime, the 𝐶 of the SUTCH decreases with the decrease of θ Q . This is due increase of 𝜆 𝑝𝑠 of the SUTCH decreases with the decrease of θQ. This is due 𝑝𝑠 /𝜆 𝑠𝑠 𝑠𝑠 . Meantime, the 𝐶𝜌 𝜌𝜓 𝜓 to the fact that with the increase of 𝜆 /𝜆 ,, the attenuation of g ps is decreased while that of g ss is to the fact that with the increase of 𝜆 𝑝𝑠 𝑝𝑠 /𝜆𝑠𝑠 𝑠𝑠 the attenuation of g ps is decreased while that of g ss is increased. Consequently, the 𝐶 of SUTCH is lower increased. Consequently, the 𝐶𝜌𝜓 of SUTCH is lower with with the the same same transmitting transmitting power power .. On On the the 𝜌𝜓

Sensors 2016, 16, 1675

17 of 22

As depicted in Figure 8, it is clearly observed that the Cρψ of the SUTCH decreases with the increase of 16, λ ps1675 /λss . Meantime, the Cρψ of the SUTCH decreases with the decrease of θ Q . This 18 is due Sensors 2016, of 23 to the fact that with the increase of λ ps /λss , the attenuation of gps is decreased while that of gss is other hand, with the decrease the power allocated selected subchannel bound to be increased. Consequently, the Cof ofQ,SUTCH is lower with to theeach same transmitting power.is On the other ρψ θ reduced, which will lead deterioration in the of the SUTCH. hand, with the decrease ofto θ Qthe , the power allocated to 𝐶 each selected subchannel is bound to be reduced, 𝜌𝜓 which will lead to the deteriorationthe in the SUTCH. Compared comprehensively, 𝐶𝜌𝜓Cρof thethe SUTCH with ρ1 has the lowest value, since it just ψ of Compared comprehensively, the C of the SUTCH with ρ1 under has thedifferent lowest conditions, value, sincethe it ρψ ignores any a priori knowledge of subchannel’s status. However, just ignores any a priori knowledge of subchannel’s status. conditions, performance of the ρss and ρps are different. When the ratio However, of M/N is under small, different the best subchannel the performance of the ρ and ρ are different. When the ratio of M/N is small, the best subchannel ss ps selection policy is ρss . However, if the ratio of M/N is large, the best subchannel selection policy is selection policy is ρss . However, if the ratio of M/N is large, the best subchannel selection policy is ρps . ρps. In in Equation (49), the the BER BER of of SUCCH SUCCH is is derived. derived. In the the fourth fourth simulation, simulation, as as mentioned mentioned above, above, in Equation (49), Therefore, Monte Carlo Simulation is used to prove its rationality. The simulation parameters Therefore, Monte Carlo Simulation is used to prove its rationality. The simulation parameters are are 2 2 = σ22 shown shown in in Table Table1. 1. Suppose Suppose σ𝜎PU = 𝜎SUTCH.. 𝑃𝑈

SUTCH

Table 1. 1. Simulation Simulation parame parameters. Table te rs.

Parameter Value Parameter Value N N 4040 Number of active PUs and SUs [1, 40] Number active PUs and SUs [1, 40] GGSUCCH 2048 SUCCH 2048 Loading factor [1/160, 1/200, ...,..., 1/400] Loading factorГ Г [1/160, 1/200, 1/400] Random test times for each Г 750,000 Random test times for each Г 750,000

In Figure 9, the Simulation Simulation BER is calculated In Figure 9, the BER is calculated by by Monte Monte Carlo Carlo Simulation Simulation Experiment, Experiment, while while the the Theoretical BER is calculated by Equation (49). As depicted in Figure 9, the simulation BER follows the Theoretical BER is calculated by Equation (49). As depicted in Figure 9, the simulation BER follows Theoretical BERBER veryvery closely. the Theoretical closely. 10

-1

Theoretical BER Simulation BER

-2

BER

10

10

-3

-4

10 0.25%

0.28%

0.31%

0.42%

0.5%

0.63%



Figure ore tical and simulation BER BER versus ve rsusdifferent diffe re ntГ.Г. Figure9.9.The Theoretical and simulation

As mentioned in Section 4, the trade-off problem between the reliability of the SUCCH and the As mentioned in Section 4, the trade-off problem between the reliability of the SUCCH and the efficiency of the SUTCH is discussed. Here, suppose the arrival rate of the authorized PUs accessing efficiency of the SUTCH is discussed. Here, suppose the arrival rate of the authorized PUs accessing to to the subchannels follows a Poisson distribution. Simulation parameters are shown in Table 2. the subchannels follows a Poisson distribution. Simulation parameters are shown in Table 2. Suppose 𝑗 Suppose 𝜆 𝑚 represents the arrival rate of the PUs in accessible subchannels. j λm represents the arrival rate of the PUs in accessible subchannels. In the fifth simulation, the achieved spectral efficiency, achieved data traffic and unused data In the fifth simulation, the achieved spectral efficiency, achieved data traffic and unused data traffic are used to compare the accessible performance of the three different selection policies. Here, traffic are used to compare the accessible performance of the three different selection policies. Here, achieved spectral efficiency represents the proportion between data traffic and unus ed data traffic. achieved spectral efficiency represents the proportion between data traffic and unused data traffic. Data traffic is the total amount of unit data traffic when the SUTCH has successfully accessed to the idle subchannel, while unused data traffic is the achievable amount of unit data traffic during the time cost in establishing the connection.

Sensors 2016, 16, 1675

18 of 22

Data traffic is the total amount of unit data traffic when the SUTCH has successfully accessed 19 to of the Sensors 2016, 16, 1675 23 idle subchannel, while unused data traffic is the achievable amount of unit data traffic during the time cost in establishing the connection. Table 2. Simulation parame te rs. Parameters Values Table 2. Simulation parameters. N 40 Parameters Values M [0, 6] Number of the N active PUs 4034 M the SUs [0, 6] Number of 1 Number λ ofpsthe 34 , λssactive PUs 1, 1 Number of the SUs 1 𝑗 [80, 160] 𝜆 𝑚 , j λ= 1,, λ2, …, 6 1, 1 ps ss j Q 0.0001 W [80, 160] λm , j = 1, 2, . . . , 6 QSUCCH [0.01, 0.02, Q /Q 0.0001 W..., 0.1] QSUCCH [0.01, 0.02, θQ /Q −30 ..., dB0.1] θ − 30 dB Q G 128 G 128 e 0.01 and 0.05 e 0.01 and 0.05 ∂∂11,, ∂∂22 0.005, 0.005 0.005, 0.005 RR 1010 n (k), n = [1, R] [1/R] PPn(k), n = [1, R] [1/R] Learning time 100 (times of SUs access) Learning time 100 (times of SUs access) 10–12, the the different different performances performances of of the the three three strategies strategies are are shown shownin indetail. detail. Random Random In Figures 10–12, strategy has hasthe the worst accessible performance, because it simply thefactor loading factor Г strategy worst accessible performance, because it simply chooseschooses the loading Г randomly randomly without proper strategies. accessible strategies. Meanwhile, the accessible performance of MDP without proper accessible Meanwhile, the accessible performance of MDP Cross is Cross is better that of Statistical Mean. Furthermore, the fluctuation of performance curve of better than thatthan of Statistical Mean. Furthermore, the fluctuation of performance curve of MDP MDP is Cross is than lowerthat than of Statistical is the duefact to that, the fact that, in the simulation, Cross lower of that Statistical Mean. ItMean. is dueIt to in the simulation, suppose j 𝑗 suppose 𝜆 , j = 1, 2, …, 6 ∈ [80, 160] the state parameters of the accessible subchannel are changing λ , j = 1, 2, . . . , 6 ∈ [80, 160] the state parameters of the accessible subchannel are changing very m 𝑚 very fast, therefore, it is a quick-changing subchannel environment. In the quick-changing fast, therefore, it is a quick-changing subchannel environment. In the quick-changing subchannel subchannel environment, the information history stateofinformation subchannelisenvironment is changing very environment, the history state subchannelof environment changing very fast. However, fast. However, will useinformation, a lot of history information, so the fast-changing of history Statistical MeanStatistical will use a Mean lot of history so the fast-changing of history information will information will make bad influence on choosing the optimal strategy of Г. Thstrategy erefore, make a bad influence on achoosing the optimal allocation strategy of allocation Г. Therefore, the accessible theMDP accessible Cross fits bettersubchannel in the quick-changing subchannel environment . of Cross strategy fits betterofinMDP the quick-changing environment. 1 MDP Cross Random Statistical Mean

Achieved Spectral Efficiency

0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0

20

40 60 Learning Time

80

100

Figure 10.10. Achie ve d spe ctral eefficiency fficie ncy of SUTCH under unde r three thre estrategies strate giewith s with le arning time . Figure Achieved spectral of the the SUTCH learning time.

Sensors 2016, 16, 1675 Sensors 2016, 16, 1675 Sensors 2016, 16, 1675

19 of 22 20 of 23 20 of 23

Achieved Data Traffic Achieved Data Traffic Achieved Data Traffic

Sensors 2016, 16, 1675

20 of 23

1000 1000 1000 800 800 800 600 600 600

MDP Cross MDP Cross Random MDP Cross Random Statistical Mean Random Statistical Mean Statistical Mean 200 2000 20 40 60 80 100 0 20 40 60 80 100 Learning Time 200 Learning Time 0 20 40 60 80 100 Learning Time Figure 11. Achie ve d data traffic of the SUTCH unde r thre e strate gie with le arning time Figure 11. Achieved data traffic of the SUTCH under three strategiess with learning time.. 400 400 400

Unused DataData Traffic Unused Unused DataTraffic Traffic

Figure 11. Achie ve d data traffic of the SUTCH unde r thre e strate gie s with le arning time . Figure 11. Achie ve 900 d data traffic of the SUTCH unde r thre e strate gie s with le arning time . 900 MDP Cross MDP Cross 900 Random 800 Random 800 MDP Cross Statistical Mean Statistical Mean Random 800 700 700 Statistical Mean 700 600 600 600 500 500 500 400 400 400 300 300 300 200 20 40 60 80 100 2000 0 20 40 60 80 100 Learning Time 200 Learning Time 0 20 40 60 80 100 Learning Time Figure 12. Unuse d data traffic of the SUTCH unde r thre e strate gie s with le arning time . Figure 12. Unuse d data traffic of the SUTCH unde r thre e strate gie s with le arning time . Figure 12. Unused data traffic of the SUTCH under three strategies with learning time. Figure 12. Unuse d data traffic of the SUTCH unde r thre e strate gie s with le arning time .

Achieved Spectral Efficiency Achieved Spectral Efficiency Achieved Spectral Efficiency

In the sixth simulation, different performances of the three strategies under constant In the sixth simulation, different performances of the𝑗 three strategies under constant application scenarios are shown in Figures 13–15. Suppose 𝜆𝑗𝑚 three is defined as constant constant, which is In In thethe sixth simulation, different the three under sixth simulation, different performances of the strategies under application constant application scenarios are shown inperformances Figures 13–15.ofSuppose 𝜆strategies 𝑚 is defined as constant, which is j shown as follows: 𝑗 scenarios arefollows: shown in Figures 13–15. λm is defined which is shown as follows: shown as application scenarios are shown in Suppose Figures 13–15. Supposeas𝜆 constant, 𝑚 is defined as constant, which is shown as follows: m11 , m22 , m33 , m44 , m55 , mi66   1/ 90,1/100,1/110,1/120,1/130,1/140 h 2 m , 3m , m4, m 5, m ,6m   1/ 90,1/100,1/110,1/120,1/130,1/140 λ1m , λ , λ 2 , λ3m , λ4 m ,5λm6 = [1/90, 1/100, 1/110, 1/120, 1/130, 1/140] mm1 , mm ,0.8 m , m , m , m   1/ 90,1/100,1/110,1/120,1/130,1/140

0.8 0.8

0.6 0.6 0.6 MDP Cross MDP Cross Random MDP Cross Random Statistical Mean Random Statistical Mean Statistical Mean

0.4 0.4 0.4 0.2 0.2 0.2 0 00 0 0 0

20 20 20

40 60 40 60 Learning Time Learning Time 40 60 Learning Time Figure 13. Achie ve d data traffic of the SUTCH unde r thre e

80 80 80

100 100 100

strate gie s with le arning time . Figure 13. Achie ve d data traffic of the SUTCH unde r thre e strate gie s with le arning time . Figure 13. ve d data r thre e strate gie s with with le arning time Figure 13. Achie Achieved data traffic traffic of of the the SUTCH SUTCHunde under three strategies learning time..

Sensors 2016, 16, 1675 Sensors 2016, 16, 1675 Sensors 2016, 16, 1675

20 of 22 21 of 23 21 of 23

Achieved AchievedData DataTraffic Traffic

1000 1000 800 800 600 600 MDP Cross Cross MDP Random Random Statistical Mean Mean Statistical

400 400 200 200 00

20 20

40 60 40 60 Learning Time Learning Time

80 80

100 100

Figure 14. Achie ve d data traffic of the SUTCH unde r three thre e strategies strate gie s withlearning le arning time . Figure Achieved datatraffic trafficof ofthe the SUTCH SUTCH unde under Figure 14.14. Achie ve d data r thre e strate gie swith with le arningtime. time . 900 900

MDP Cross MDP Cross Random Random Statistical Mean Statistical Mean

Unused UnusedData DataTraffic Traffic

800 800 700 700 600 600 500 500 400 400 300 300 200 200 0 0

20 20

40 60 40 60 Learning Time Learning Time

80 80

100 100

Figure 15. Unuse d data traffic of the SUTCH unde r thre e strate gie s with le arning time . Figure 15.15. Unuse d data r thre strate gie swith withlearning le arningtime. time . Figure Unused datatraffic trafficof ofthe theSUTCH SUTCHunde under threee strategies

As shown in these figures, the Random strategy still has the worst accessible performance. As shown in these figures, the Random strategy still has the worst accessible performance. As shown these figures, the Random strategy still hasisthe worst accessible Meanwhile, theinaccessible performance of Statistical Mean better than that of performance. MDP Cross. Meanwhile, the accessible performance of Statistical Mean is better than that of MDP Cross. Meanwhile, of Statistical is better thatthan of MDP Furthermore,the the accessible fluctuationperformance of the performance curve ofMean Statistical Meanthan is lower that ofCross. MDP Furthermore, the fluctuation of the performance curve of Statistical Mean is lower than that of MDP Furthermore, thetofluctuation of the curve of Statistical Mean is lower that of MDP Cross. It is due the fact that, in performance a slow -changing subchannel environment, the than slow-changing of Cross. It is due to the fact that, in a slow -changing subchannel environment, the slow-changing of Cross. It is due to the fact that, in a slow-changing subchannel environment, the slow-changing the history information will have a good influence on choosing the optimal allocation strategy of Г the history information will have a good influence on choosing the optimal allocation strategy of Г.. of the history will have good influence allocation strategy Therefore, theinformation accessible strategy of aStatistical Mean on fits choosing better in the theoptimal slow-changing subchannel Therefore, the accessible strategy of Statistical Mean fits better in the slow-changing subchannel of Г. Therefore, the accessible strategy of Statistical Mean fits better in the slow-changing environment. environment. subchannel environment. In addition, as shown from Figure 10 to Figure 15, both Statistical Mean and MDP Cross can In addition, as shown from Figure 10 to Figure 15, both Statistical Mean and MDP Cross can In addition, as from Figureenvironment, 10 to Figure 15, both Statistical andstate MDPinCross can learn learn and adapt toshown the subchannel and converge to Mean a stable a short time. learn and adapt to the subchannel environment, and converge to a stable state in a short time. and adapt to the subchannel environment, and converge to a stable state in a short time. Meanwhile, Meanwhile, they have the same rate of convergence. According to the an alysis in Section 4, both Meanwhile, they have the same rate of convergence. According to the an alysis in Section 4, both they have the same of convergence. According to thecomplexity. analysis in Section 4, both Mean Statistical Mean andrate MDP Cross have low computation Therefore, theyStatistical can be adopted Statistical Mean and MDP Cross have low computation complexity. Therefore, they can be adopted and MDP Cross have low computation complexity. Therefore, they can be adopted in practice. in practice. in practice.

6. Conclusions 6. Conclusions 6. Conclusions Dynamic spectrum access is an important and necessary technology for future cognitive sensor Dynamic spectrum access is an important and necessary technology for future cognitive sensor Dynamic access and is andiscussed importanta new and necessary forsensor futurenetworks cognitivewithout sensor networks. Thisspectrum paper identified mechanismtechnology to set up CR networks. This paper identified and discussed a new mechanism to set up CR sensor networks networks. This holes papertoidentified and discussed a new mechanismchannel to set model up CR was sensor networks using spectrum convey control information. A transmission discussed for without using spectrum holes to convey control information . A transmission channel model was without using spectrum holes to convey control information . A transmission channel model was analyzing the maximum access capacity of different policies and objectives in the fading environment. discussed for analyzing the maximum access capacity of different policies and objectives in the discussed for analyzing the maximum access capacity of different policies and objectives in the fading environment. The maximum achievable capacity of the SUTCH under ρ1 achieves the fading environment. The maximum achievable capacity of the SUTCH under ρ1 achieves the poorest performance, since it totally ignores any prior knowledge of the subchannel’s status. When poorest performance, since it totally ignores any prior knowledge of the subchannel’s status. When

Sensors 2016, 16, 1675

21 of 22

The maximum achievable capacity of the SUTCH under ρ1 achieves the poorest performance, since it totally ignores any prior knowledge of the subchannel’s status. When M/N is small, the best policy for subchannel selection is ρss . In contrast when this ratio is higher, ρss is better. To solve the trade-off between transmitting power of SUTCH and SUCCH’s capacity, a hybrid access method based on Reinforcement Learning model of MDP Cross and Statistical Mean is also proposed. Both of them outperform the Random strategy, which verified the effectiveness of the proposed methods. In addition, Statistical Mean is more suitable for slow variation application scenarios while MDP Cross performs better in fast variation scenarios. As is well known, there are many standard structure and policy of reinforcement learning, such as Q-learning and greedy algorithm. Therefore, in the next research, the different learning function and policy should be discussed, which can make a better trade-off between the performance and computation complexity. Acknowledgments: This paper is supported by the Key Development Program of Basic Research of China (JCKY2013604B001), Nation Nature Science Foundation of China (61301095), Nature Science Foundation of Heilongjiang Province of China (F201408) and the Fundamental Research Funds for the Central Universities (No. HEUCF100814 and HEUCF100816). Author Contributions: Yun Lin analyzed the basic theory and wrote the paper, Chao wang designed the experiments and finish the simulation. Jiaxing wang analyzed the simulation result. Zheng Dou checked the research result. Conflicts of Interest: The authors declare no conflict of interest.

References 1. 2. 3. 4.

5. 6. 7. 8. 9. 10.

11.

12. 13.

Zhao, Q.; Sadler, B.M. A Survey of Dynamic Spectrum Access: Signal Processing, Networking, and Regulatory Policy. IEEE Signal Process. Mag. 2007, 24, 79–89. [CrossRef] Joshi, G.P.; Nam, S.Y.; Kim, S.W. Cognitive Radio Wireless Sensor Networks: Applications, Challenges and Research Trends. Sensors 2013, 13, 11197–11228. [CrossRef] [PubMed] Liu, X. A Novel Wireless Power Transfer-Based Weighed Clustering Cooperative Spectrum Sensing Method for Cognitive Sensor Networks. Sensors 2015, 15, 27760–27782. [CrossRef] [PubMed] Kondareddy, Y.R.; Agrawal, P.; Sivalingam, K. Cognitive radio network setup without a common control channel. In Proceedings of the 2008 IEEE Military Communications Conference (MILCOM 2008), San Diego, CA, USA, 16–19 November 2008; pp. 1–6. Baldo, N.; Asterjadhi, A.; Zorzi, M. Dynamic spectrum access using a network coded cognitive control channel. IEEE Trans. Wirel. Commun. 2010, 9, 2575–2587. [CrossRef] Cormio, C.; Chowdhury, K.R. An Energy-Efficient Spectrum-Aware Reinforcement Learning-Based Clustering Algorithm for Cognitive Radio Sensor Networks. Sensors 2015, 15, 19783–19818. Khoshkholgh, M.G.; Navaie, K.; Yanikomeroglu, H. Access strategies for spectrum sharing in fading environment: Overlay, underlay and mixed. IEEE Trans. Mob. Comput. 2010, 9, 1780–1793. [CrossRef] Gastpar, M. On Capacity under Receive and Spatial Spectrum Sharing Constraints. IEEE Trans. Inf. Theory 2007, 53, 471–487. [CrossRef] Viterbi, A.J. CDMA: Principles of Spread Spectrum Communication; Addison-Wesley: Redwood City, CA, USA, 1995. Chakravarthy, V.; Wu, Z.; Temple, M.; Garber, F. Novel Overlay/Underlay Cognitive Radio Waveforms Using SD-SMSE Framework to Enhance Spectrum Efficiency—Part I: Theoretical Framework and Analysis in AWGN Channel. IEEE Trans. Commun. 2009, 57, 3794–3804. [CrossRef] Chakravarthy, V.; Wu, Z.; Temple, M. Novel Overlay/Underlay Cognitive Radio Waveforms Using SD-SMSE Framework to Enhance Spectrum Efficiency—Part II: Analysis in Fading Channel. IEEE Trans. Commun. 2010, 58, 1868–1876. [CrossRef] Jasbi, F.; So, D.K. Hybrid Overlay/Underlay Cognitive Radio Network with MC-CDMA. IEEE Trans. Veh. Technol. 2016, 65, 2038–2047. [CrossRef] Su, H.; Zhang, X. Cross-Layer Based Opportunistic MAC Protocols for QOS Provisioning over Cognitive Radio Wireless Networks. IEEE J. Sel. Areas Commun. 2008, 26, 118–129. [CrossRef]

Sensors 2016, 16, 1675

14. 15. 16. 17. 18. 19. 20. 21. 22. 23. 24. 25. 26.

22 of 22

Gupta, P.; Kumar, P.R. The Capacity of Wireless Networks. IEEE Trans. Inf. Theory 2000, 46, 388–404. [CrossRef] Tse, D.; Viswanath, P. Fundamentals of Wireless Communication; Cambridge University Press: Cambridge, UK, 2004. Jafar, S.A.; Srinivasa, S. Capacity Limits of Cognitive Radio with Distributed and Dynamic Spectral Activity. IEEE J. Sel. Areas Commun. 2007, 25, 529–537. [CrossRef] Ghasemi, A.; Sousa, E.S. Fundamental Limits of Spectrum Sharing in Fading Environments. IEEE Trans. Wirel. Commun. 2007, 6, 649–658. [CrossRef] Ross, S.M. A First Course in Probability; University of Southern California Press: Los Angeles, CA, USA, 2012. Khoshkholgh, M.G.; Navaie, K.; Yanikomeroglu, H. Achievable Capacity in Hybrid DS-CDMA/OFDM Spectrum-Sharing. IEEE Trans. Mob. Comput. 2010, 9, 765–777. [CrossRef] Boyd, S.; Vandenberghe, L. Convex Optimization; Cambridge University Press: Cambridge, UK, 2004. Papoulis, A.; Pillai, S.U. Probability, Random Variables, and Stochastic Processes, 4th ed.; McGraw-Hill: New York, NY, USA, 2002. Kaelbling, L.P. Reinforcement Learning: A Survey. J. Artif. Intell. Res. 1996, 4, 237–285. Brodersen, R.W.; Wolisz, A.; Cabric, D.; Mishra, S.M.; Willkomm, D. CORVUS: A Cognitive Radio Approach for Usage of Virtual Unlicensed Spectrum; White Paper; University of Berkeley: Berkeley, CA, USA, 2004. Bush, R.R.; Mosteller, F. Stochastic Models for Learning; John Wiley & Sons: New York, NY, USA, 1955. Roth A, E.; Erev, I. Learning in Extensive Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Run. Games Econ. Behav. 1995, 8, 164–212. [CrossRef] Li, J.; Bai, C.; Peng, H. Review of Learning Model and Experiment Based on Learning Theory. Sci. Technol. Manag. Res. 2013, 6, 143–150. © 2016 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC-BY) license (http://creativecommons.org/licenses/by/4.0/).

A Novel Dynamic Spectrum Access Framework Based on Reinforcement Learning for Cognitive Radio Sensor Networks.

Cognitive radio sensor networks are one of the kinds of application where cognitive techniques can be adopted and have many potential applications, ch...
6MB Sizes 1 Downloads 8 Views