ISA Transactions

Contents lists available at ScienceDirect

ISA Transactions journal homepage: www.elsevier.com/locate/isatrans

Improved system identification with Renormalization Group☆

Qing-Guo Wang a,*, Chao Yu a, Yong Zhang b

a Department of Electrical and Computer Engineering, National University of Singapore, 117576, Singapore
b Qiming Venture Partners, Shanghai 200121, China

ARTICLE INFO / ABSTRACT

Article history:
Received 17 July 2013
Received in revised form 12 September 2013
Accepted 6 October 2013
This paper was recommended for publication by Dr. A.B. Rad.

This paper proposes an improved system identification method with Renormalization Group. Renormalization Group is applied to a fine data set to obtain a coarse data set. The least squares algorithm is performed on the coarse data set. The theoretical analysis under certain conditions shows that the parameter estimation error could be reduced. The proposed method is illustrated with examples.
© 2013 ISA. Published by Elsevier Ltd. All rights reserved.

Keywords: System identification; Least squares estimate; Renormalization Group

1. Introduction

System identification is concerned with building mathematical models of dynamical systems from measured data [1–3]. The ordinary least squares (OLS) method has long been the dominant algorithm for parameter estimation due to its simplicity in concept and convenience in implementation [4,5]. Given a data set, one can always compute an OLS estimate. It is known, however, that the OLS estimate can be biased for a regression model with noise [6,7], and it is very challenging to analyze the properties of the OLS estimate analytically.

With regard to the asymptotic properties of least squares estimates, it is known that the OLS estimate converges to the true value when the system is disturbed by white noise. When the system is disturbed by correlated noise, the OLS estimate can be biased. Griliches [8] gives the expression of the bias for ARAR(1,1) models. Phillips and Inder [9,10] analyze the bias for simple first order ARARX models. Stocker [11] presents an expression of the bias for a common model but gives only very simple examples for illustration. He argues that for complicated models (e.g. models of higher order or with several exogenous variables), it is not practicable any more to obtain fully parameterized bias formulas; these would become very extensive even if the order of the model or the number of exogenous variables increases only slightly. Zheng [12,13] proposes a bias-eliminated least squares (BELS) algorithm to identify system parameters. The instrumental variable (IV) method [14–16] is also a very efficient way to handle correlated noise and reduce the estimate bias.

When the number of data points is limited, it is more difficult to analyze the bias. Many papers in the literature discuss the finite-sample bias of the OLS estimate for very simple models. Even when the system is disturbed by white noise, Hurwicz [17] proves that the OLS method yields biased estimates of regression coefficients, and explicit formulas for the bias can be obtained only in very small samples. The general structure of the bias is revealed by Shaman and Stine [18,19]. Breton and Pham [20] provide an exact formula for the bias. Patterson [21] exploits Shaman and Stine's characterization of the bias in higher order models. When the system is disturbed by correlated noise, unfortunately, it has proved difficult to investigate the finite-sample bias analytically [22]; in the absence of such results, Monte Carlo experiments provide an alternative source of information [22]. Sargent [22] and Tjøstheim and Paulsen [23] analyze the bias through simulation. Maeshiro [24] shows that the bias of the OLS estimate is determined by two effects, the dynamic effect and the correlation effect. The former is the bias of the parameters when the disturbance is white noise; the latter is the effect that contaminates the parameters when the disturbance is correlated with the lagged dependent variable. When the two effects have opposite signs, the OLS estimate performs well in terms of bias.

☆ This research is funded by the Republic of Singapore's National Research Foundation through a Grant to the Berkeley Education Alliance for Research in Singapore (BEARS) for the Singapore-Berkeley Building Efficiency and Sustainability in the Tropics (SinBerBEST) Program. BEARS has been established by the University of California, Berkeley as a center for intellectual excellence in research and education in Singapore.
* Corresponding author. Tel.: +65 65162282. E-mail addresses: [email protected] (Q.-G. Wang), [email protected] (C. Yu), [email protected] (Y. Zhang).
0019-0578/$ - see front matter © 2013 ISA. Published by Elsevier Ltd. All rights reserved. http://dx.doi.org/10.1016/j.isatra.2013.10.003
Please cite this article as: Wang Q-G, et al. Improved system identification with Renormalization Group. ISA Transactions (2014), http://dx.doi.org/10.1016/j.isatra.2013.10.003

In this paper, we present an improved system identification method with Renormalization Group (RG). Technically, the proposed

method groups the given data set into a RG data set. Performing the least squares algorithm, the proposed method obtains one estimate based on the given data and another based on the RG data. Through comparison of the two estimates, we find that the proposed method can give a better estimate under certain conditions. The contributions of this paper are summarized as follows:

- The proposed method forms a data set based on the system inputs and outputs, and produces a new data set from the given data set with the members of the former different from those of the latter; none of the existing methods has a similar idea.
- We present theoretical analysis and simulation results for an academic model to illustrate the effectiveness of our method.

The rest of this paper is organized as follows. Section 2 states our problem and motivation. Section 3 details the asymptotic analysis. Section 4 discusses the finite-sample case. Section 5 presents the simulation examples. Section 6 concludes the paper.

2. Problem statement and motivation

In this section, we first describe the system and give some details of OLS estimation. Then we state the problem and give our motivation. Finally, we illustrate the idea of the proposed method with a simple example.

2.1. System description

Consider a linear dynamic model

y_t = α_1 y_{t-1} + α_2 y_{t-2} + ⋯ + α_n y_{t-n} + β_0 u_t + β_1 u_{t-1} + β_2 u_{t-2} + ⋯ + β_m u_{t-m} + ε_t,   (1)

where u_t is the input, y_t is the output and ε_t is the noise. Let e_t be a white noise with the properties:

- E[e_t] = 0,
- E[e_t^2] = σ_e^2,
- E[e_t e_s] = 0 for all t ≠ s.

In (1), ε_t may take three forms:

ε_t = e_t,   (2)

ε_t = λ_1 ε_{t-1} + λ_2 ε_{t-2} + ⋯ + λ_n ε_{t-n} + e_t,   (3)

ε_t = e_t + λ_1 e_{t-1} + λ_2 e_{t-2} + ⋯ + λ_n e_{t-n}.   (4)

According to [1], we summarize some special cases of (1) in Table 1.

Table 1
Some models as special cases.

α_i      β_i      ε_t   Name of model structure
α_i ≠ 0  β_i = 0  (2)   AR
α_i ≠ 0  β_i = 0  (3)   ARAR
α_i ≠ 0  β_i = 0  (4)   ARMA
α_i ≠ 0  β_i ≠ 0  (2)   ARX
α_i ≠ 0  β_i ≠ 0  (3)   ARARX
α_i ≠ 0  β_i ≠ 0  (4)   ARMAX

2.2. OLS estimation

We rewrite (1) as

y_t = φ_t^T θ + ε_t,   (5)

where

θ^T = [α_1, …, α_n, β_0, …, β_m],   (6)

and

φ_t^T = [y_{t-1}, …, y_{t-n}, u_t, …, u_{t-m}].   (7)

The parameter vector θ is chosen to minimize the loss function

J = (1/2) Σ_{t=1}^{N} (y_t − φ_t^T θ)^2.   (8)

The OLS estimate is given by

θ̂ = (Φ^T Φ)^{-1} (Φ^T Y),   (9)

where Y = [y_1, …, y_N]^T, Φ = [φ_1, …, φ_N]^T and E = [ε_1, …, ε_N]^T. The estimation error is given by

Δθ = θ̂ − θ = (Φ^T Φ)^{-1}(Φ^T Y) − θ = (Φ^T Φ)^{-1}(Φ^T (Φθ + E)) − θ = (Φ^T Φ)^{-1}(Φ^T E).   (10)

The bias of the estimate is the expectation of Δθ, that is,

E[Δθ] = E[θ̂] − θ = E[(Φ^T Φ)^{-1}(Φ^T E)],   (11)

and the variance of the estimate is given by

cov[θ̂] = E[(θ̂ − E[θ̂])(θ̂ − E[θ̂])^T].   (12)

When ε_t is white noise, we have [25] E[Δθ] = 0 and

cov[θ̂] = σ_e^2 (Φ^T Φ)^{-1}.

When ε_t is correlated noise, the OLS estimate can be biased. With different weights w_t assigned to the measurements [1], the criterion (8) becomes

J = (1/2) Σ_{t=1}^{N} w_t (y_t − φ_t^T θ)^2.   (13)

The resulting estimate is given by

θ̂_W = (Φ^T W Φ)^{-1} (Φ^T W Y),   (14)

where W = diag(w_1, …, w_N) is the weighting matrix. The estimation error is given by

Δθ_W = θ̂_W − θ = (Φ^T W Φ)^{-1}(Φ^T W E).   (15)

The bias of the estimate is given by

E[Δθ_W] = E[(Φ^T W Φ)^{-1}(Φ^T W E)],   (16)

and the variance of the estimate has the same form as (12). When ε_t is white noise with E(E) = 0 and E(E E^T) = Q, the estimate is unbiased. Furthermore, it has been shown [26] that when W = Q^{-1}, θ̂_W is a best linear unbiased estimate (BLUE) and the variance of the estimate is given by

cov[θ̂_W] = (Φ^T Q^{-1} Φ)^{-1}.

When ε_t is correlated noise, the weighted least squares (WLS) estimate can also be biased. The weighted least squares method is efficient, but it depends on knowing the variance structure, which is seldom available in practice.

2.3. Idea of the proposed method

For system identification, one can always fit a model based on system inputs and outputs by OLS. However, when the system is disturbed with correlated noise, the OLS estimate may be biased. This paper studies the estimation error of OLS analytically and finds a way to reduce it.

We present an improved identification method using Renormalization Group (RG). RG was first proposed to study the


Fig. 1. Data grouping. (a) Spatial distribution of fine data, (b) system response.

critical phenomena in the quantum field, in 1971, by Wilson, who won the Nobel Prize in Physics in 1982 for this great contribution [27]. RG is widely used to analyze various physical problems [28–30]. Renormalization Group designs a Renormalization Group Transformation (RGT) to relate macroscopic physical quantities to microscopic ones and invokes "scale invariance" to solve the problem.

Considering system (5), we first form a data point S_t^T = (y_t, φ_t^T), t = 1, 2, …, N, and construct a data set S = {S_1, S_2, …, S_N}. Then we perform a RGT on S to obtain a transformed data set S_R = {S_{R1}, S_{R2}, …, S_{RK}}, with S_{Rj}^T = (y_{Rj}, φ_{Rj}^T), j = 1, 2, …, K. A RGT on S groups a number of data points of S into one data point of S_R in a systematic way. There are different ways to do this grouping; one example is to group all the data of S by the K-means clustering method [31]. We define r = K/N as the transformation ratio. For each group, the coarse data may be determined by simple linear superposition of all the fine data that belong to this group. Then the weighted least squares method is performed on the coarse data. For easy reference, we call the data set S the fine data set and the least squares estimate based on S the OLS estimate. The data set S_R obtained from a RGT on the fine data set is called the coarse data set, and the resulting estimate the Renormalization Group Weighted Least Squares (RGWLS) estimate. In this paper, through both theoretical analysis and simulation examples, we will show that the estimation error of RGWLS can be smaller than that of the OLS estimate under certain conditions.

To see the idea of the proposed method quickly, we consider a simple model

y_t = 0.8 y_{t-1} + 0.5 u_t + γ (e_t − 0.8 e_{t-1}),

where y_0 = 0, u_t is a unit step signal and e_t is a white noise. Let E[e_t^2] = 1 and γ = 0.0625. The model is simulated over 30 steps. We have

θ^T = [0.8, 0.5], φ_t^T = [y_{t-1}, u_t], t = 1, 2, …, 30,
Y = [y_1, …, y_30]^T, Φ = [φ_1, …, φ_30]^T.

Then the OLS estimate is obtained from (9). Next we carry out the RG method. We first form the fine data as S_t^T = (y_t, φ_t^T) = (y_t, y_{t-1}, u_t) = (y_t, y_{t-1}, 1) and the fine data set as S = {S_1, S_2, …, S_30}. Then a RGT is performed on the fine data set S to get the coarse data set S_R. In this case, it is done by the K-means clustering method with the Euclidean distance measure and K = 5, and for each group the coarse data is obtained by linear superposition of all the fine data in that group. We have

S_{R1} = S_1, S_{R2} = S_2, S_{R3} = S_3 + S_4, S_{R4} = S_5 + ⋯ + S_8, S_{R5} = S_9 + ⋯ + S_30.

It follows that

φ_{R1} = φ_1, φ_{R2} = φ_2, φ_{R3} = φ_3 + φ_4, φ_{R4} = φ_5 + ⋯ + φ_8, φ_{R5} = φ_9 + ⋯ + φ_30,
y_{R1} = y_1, y_{R2} = y_2, y_{R3} = y_3 + y_4, y_{R4} = y_5 + ⋯ + y_8, y_{R5} = y_9 + ⋯ + y_30,
ε_{R1} = ε_1, ε_{R2} = ε_2, ε_{R3} = ε_3 + ε_4, ε_{R4} = ε_5 + ⋯ + ε_8, ε_{R5} = ε_9 + ⋯ + ε_30.

The spatial distribution of the different groups of fine data is shown in Fig. 1(a) and the system response in Fig. 1(b). It is obvious that y_{Rj}, φ_{Rj} and ε_{Rj} satisfy

y_{Rj} = φ_{Rj}^T θ + ε_{Rj}, j = 1, 2, 3, 4, 5.

Choosing the weighting matrix as W = diag(1, 1, 1/4, 1/16, 1/484), the RGWLS estimate is obtained from (14). Employing the Monte Carlo method, the simulation is repeated 100 times. We get E(‖Δθ‖_2) = 0.38 and E(‖Δθ_R‖_2) = 0.19, where ‖·‖_2 denotes the l2-norm. Therefore, the RGWLS estimate is better than the OLS estimate here.
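The example above can be reproduced in outline. Below is a minimal sketch, assuming NumPy; the five groups are hard-coded to the clusters reported in the text instead of being recomputed by K-means, and a single noise realization is used rather than the 100 Monte Carlo runs, so the numbers will differ from the averages quoted above.

```python
import numpy as np

rng = np.random.default_rng(0)
N, gamma = 30, 0.0625

# simulate y_t = 0.8*y_{t-1} + 0.5*u_t + gamma*(e_t - 0.8*e_{t-1}), u_t = 1
e = rng.standard_normal(N + 1)
y = np.zeros(N + 1)
for t in range(1, N + 1):
    y[t] = 0.8 * y[t - 1] + 0.5 + gamma * (e[t] - 0.8 * e[t - 1])

Phi = np.column_stack([y[:-1], np.ones(N)])   # phi_t = [y_{t-1}, u_t]
Y = y[1:]

# OLS estimate (9) on the fine data
theta_ols = np.linalg.lstsq(Phi, Y, rcond=None)[0]

# RGT: the groups reported in the text, {1}, {2}, {3,4}, {5..8}, {9..30}
groups = [[0], [1], [2, 3], list(range(4, 8)), list(range(8, 30))]
Phi_R = np.array([Phi[g].sum(axis=0) for g in groups])
Y_R = np.array([Y[g].sum() for g in groups])

# RGWLS estimate (14) with w_j = 1/n_j^2, i.e. W = diag(1, 1, 1/4, 1/16, 1/484)
w = np.array([1.0 / len(g) ** 2 for g in groups])
A = w[:, None] * Phi_R                         # W @ Phi_R
theta_rgwls = np.linalg.solve(Phi_R.T @ A, A.T @ Y_R)
print(theta_ols, theta_rgwls)                  # true theta is [0.8, 0.5]
```

Because u_t is a step and y_t settles quickly, the two regressor columns are nearly collinear, which is exactly why individual estimates can deviate noticeably from [0.8, 0.5].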

3. Asymptotic analysis

It is known that the bias of the OLS estimate depends on the number of data points N when N is finite. If the system is disturbed by white noise, it is possible to obtain explicit formulas for the bias in very small samples [17–21]. However, if the system is disturbed by colored noise, it has proved difficult to investigate the finite-sample bias analytically [22]. In the absence of such results, researchers [22–24] perform Monte Carlo experiments to provide alternative sources of information. In this section, we discuss the properties of the OLS and RGWLS estimates as N → ∞ analytically. Multiplying and dividing (10) by N, we have

lim_{N→∞} Δθ = lim_{N→∞} [ (Φ^T Φ / N)^{-1} (Φ^T E / N) ].   (17)

Property 1 (Salas et al. [32]). If the limits lim_{N→∞} H(N) and lim_{N→∞} G(N) both exist, and lim_{N→∞} G(N) is nonsingular, then

lim_{N→∞} [G^{-1}(N) H(N)] = (lim_{N→∞} G(N))^{-1} (lim_{N→∞} H(N)).   (18)


With the above property, (17) is equivalent to

lim_{N→∞} Δθ = [lim_{N→∞} Φ^T Φ / N]^{-1} [lim_{N→∞} Φ^T E / N]
= [lim_{N→∞} (1/N) Σ_{t=1}^{N} φ_t φ_t^T]^{-1} [lim_{N→∞} (1/N) Σ_{t=1}^{N} φ_t ε_t].   (19)

When system (1) is asymptotically stable with ε_t a stationary stochastic process that is independent of the input signal, Söderström and Stoica [33,14,34] show that the sums Σ_{t=1}^{N} (φ_t φ_t^T)/N and Σ_{t=1}^{N} (φ_t ε_t)/N tend to the corresponding expected values with probability one as the number of data points N tends to infinity. Then (19) becomes

lim_{N→∞} Δθ = [E(φ_t φ_t^T)]^{-1} [E(φ_t ε_t)].   (20)

As can be seen from (20), when E(φ_t φ_t^T) is nonsingular and E(φ_t ε_t) = 0, θ̂ converges to θ. Otherwise, the OLS estimate can be biased.

The instrumental variable (IV) method, an efficient way to reduce the bias of the OLS estimate as N → ∞, was introduced by Reiersøl [35] and has been popular for quite a long period. The basic IV method [14] for estimating θ in (5) is given by

lim_{N→∞} θ̂ = [lim_{N→∞} (1/N) Σ_{t=1}^{N} Z_t φ_t^T]^{-1} [lim_{N→∞} (1/N) Σ_{t=1}^{N} Z_t y_t],   (21)

where the elements of Z_t are called instruments or instrumental variables; they can be chosen in different ways. A calculation for the IV estimate (21) gives

lim_{N→∞} Δθ = [lim_{N→∞} (1/N) Σ_{t=1}^{N} Z_t φ_t^T]^{-1} [lim_{N→∞} (1/N) Σ_{t=1}^{N} Z_t ε_t] = [E(Z_t φ_t^T)]^{-1} [E(Z_t ε_t)].   (22)

In order for θ̂ to converge to θ, it is necessary that E(Z_t φ_t^T) is nonsingular and E(Z_t ε_t) = 0. As to the accuracy of the IV estimates, it is shown that √N (θ̂ − θ) converges in distribution to a Gaussian distribution, denoted by

√N (θ̂ − θ) ~ N(0, P_IV).

Explicit expressions for the covariance matrix P_IV are given in [36,14].

3.1. The asymptotic properties of the OLS estimate

It can be seen from (20) that when ε_t is white noise, the OLS estimate converges to the true value. However, when ε_t is colored noise, the OLS estimate can be biased. It is known that Δθ is related to the model structure, the input signal and the noise properties, but these factors are not reflected in (20) explicitly. Therefore, to formulate the estimation error and illustrate the proposed method easily, we consider a simple ARMA(1,1) model,

y_t = α_1 y_{t-1} + ε_t, |α_1| < 1,   (23)

where ε_t = e_t + λ_1 e_{t-1}. According to (6) and (7), we have θ = α_1 and φ_t = y_{t-1}. It follows from (20) that

lim_{N→∞} Δα_1 = [E(y_{t-1}^2)]^{-1} [E(y_{t-1} ε_t)].   (24)

Introducing the lag operator L z_t = z_{t-1} into (23), we obtain

y_t = ε_t / (1 − α_1 L) = Σ_{p=0}^{∞} α_1^p ε_{t-p}.   (25)

Then

E(y_{t-1}^2) = E[(Σ_{p=0}^{∞} α_1^p ε_{t-1-p})^2] = Σ_p Σ_q α_1^p α_1^q E(ε_{t-1-p} ε_{t-1-q})
= Σ_p Σ_q α_1^p α_1^q E[(e_{t-1-p} + λ_1 e_{t-2-p})(e_{t-1-q} + λ_1 e_{t-2-q})]
= Σ_{p=q} α_1^{p+q} (1 + λ_1^2) σ_e^2 + 2 Σ_{q=p+1} α_1^{p+q} λ_1 σ_e^2
= Σ_{p=0}^{∞} α_1^{2p} (1 + λ_1^2) σ_e^2 + 2 Σ_{p=0}^{∞} α_1^{2p+1} λ_1 σ_e^2
= (1 + λ_1^2) σ_e^2 / (1 − α_1^2) + 2 λ_1 α_1 σ_e^2 / (1 − α_1^2) = (1 + λ_1^2 + 2 λ_1 α_1) σ_e^2 / (1 − α_1^2).   (26)

We also have

E(y_{t-1} ε_t) = E(Σ_{p=0}^{∞} α_1^p ε_{t-1-p} ε_t) = Σ_{p=0}^{∞} α_1^p E[(e_{t-1-p} + λ_1 e_{t-2-p})(e_t + λ_1 e_{t-1})],   (27)

where

E[(e_{t-1-p} + λ_1 e_{t-2-p})(e_t + λ_1 e_{t-1})] = λ_1 σ_e^2 for p = 0, and 0 for p > 0.

Hence

E(y_{t-1} ε_t) = λ_1 σ_e^2.   (28)

Substituting (26) and (28) into (24) gives

lim_{N→∞} Δα_1 = [(1 + λ_1^2 + 2 λ_1 α_1) σ_e^2 / (1 − α_1^2)]^{-1} (λ_1 σ_e^2) = λ_1 (1 − α_1^2) / (1 + λ_1^2 + 2 λ_1 α_1).   (29)

In order to obtain a neater result, we consider the special case λ_1 = −α_1. Then (29) becomes

lim_{N→∞} Δα_1 = −α_1.   (30)

3.2. The asymptotic properties of the RGWLS estimate

RG is performed on the fine data set to obtain the coarse data set. The RGT groups all fine data by the K-means clustering method and takes the linear superposition of all the fine data in each group as the coarse data. Introduce the notations

y_{Rj} = Σ_{t∈j} y_t, φ_{Rj} = Σ_{t∈j} φ_t, ε_{Rj} = Σ_{t∈j} ε_t, j = 1, …, K,   (31)

where y_{Rj}, φ_{Rj} and ε_{Rj} satisfy

y_{Rj} = φ_{Rj}^T θ + ε_{Rj}.   (32)

The parameter θ is chosen to minimize the loss function

J_R = (1/2) Σ_{j=1}^{K} w_j (y_{Rj} − φ_{Rj}^T θ_R)^2,   (33)

where w_j is the chosen weight. The RGWLS estimate is given by

θ̂_R = (Φ_R^T W Φ_R)^{-1} (Φ_R^T W Y_R),   (34)

where Y_R = [y_{R1}, …, y_{RK}]^T, Φ_R = [φ_{R1}, …, φ_{RK}]^T, E_R = [ε_{R1}, …, ε_{RK}]^T, and W = diag(w_1, …, w_K). The estimation error of RGWLS is given by

Δθ_R = θ̂_R − θ = (Φ_R^T W Φ_R)^{-1}(Φ_R^T W Y_R) − θ = (Φ_R^T W Φ_R)^{-1}(Φ_R^T W (Φ_R θ + E_R)) − θ = (Φ_R^T W Φ_R)^{-1}(Φ_R^T W E_R).   (35)
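The closed-form limits (29) and (30) can be checked numerically. Below is a minimal sketch, assuming NumPy; the parameter values and sample size are illustrative choices, not the paper's simulation settings.

```python
import numpy as np

rng = np.random.default_rng(1)
alpha1, lam1, N = 0.8, -0.8, 200_000   # the special case lam1 = -alpha1

# simulate y_t = alpha1*y_{t-1} + eps_t with eps_t = e_t + lam1*e_{t-1}
e = rng.standard_normal(N + 1)
eps = e[1:] + lam1 * e[:-1]
y = np.zeros(N + 1)
for t in range(1, N + 1):
    y[t] = alpha1 * y[t - 1] + eps[t - 1]

# OLS of y_t on y_{t-1}
a_hat = (y[:-1] @ y[1:]) / (y[:-1] @ y[:-1])

# (29): lim Delta_alpha1 = lam1*(1 - alpha1^2) / (1 + lam1^2 + 2*lam1*alpha1)
pred = lam1 * (1 - alpha1**2) / (1 + lam1**2 + 2 * lam1 * alpha1)
print(a_hat - alpha1, pred)   # both close to -alpha1 = -0.8, as (30) predicts
```

With λ_1 = −α_1 the process y_t is (asymptotically) white, so the OLS coefficient estimate tends to zero and the error tends to −α_1.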


Next we analyze the properties of lim_{N→∞} Δθ_R. Let n_j be the number of fine data in the jth group, so that

n_1 + ⋯ + n_K = N.   (36)

Suppose each n_j tends to infinity. Similarly to (17) and (19), we have

lim_{N→∞} Δθ_R = lim_{N→∞} [(Φ_R^T W Φ_R)^{-1} (Φ_R^T W E_R)]
= [lim_{N→∞} Φ_R^T W Φ_R]^{-1} [lim_{N→∞} Φ_R^T W E_R]
= [lim_{N→∞} Σ_{j=1}^{K} w_j φ_{Rj} φ_{Rj}^T]^{-1} [lim_{N→∞} Σ_{j=1}^{K} w_j φ_{Rj} ε_{Rj}]
= [Σ_{j=1}^{K} lim_{n_j→∞} (w_j φ_{Rj} φ_{Rj}^T)]^{-1} [Σ_{j=1}^{K} lim_{n_j→∞} (w_j φ_{Rj} ε_{Rj})].   (37)

Let w_j = 1/n_j^2. Substituting (31) into (37) gives

lim_{N→∞} Δθ_R = [Σ_{j=1}^{K} lim_{n_j→∞} (1/n_j^2)(Σ_{t∈j} φ_t)(Σ_{s∈j} φ_s^T)]^{-1} [Σ_{j=1}^{K} lim_{n_j→∞} (1/n_j^2)(Σ_{t∈j} φ_t)(Σ_{s∈j} ε_s)].   (38)

As n_j → ∞, suppose that (1/n_j^2)(Σ_{t∈j} φ_t)(Σ_{s∈j} φ_s^T) and (1/n_j^2)(Σ_{t∈j} φ_t)(Σ_{s∈j} ε_s) converge to E(φ_t φ_s^T)|_{t,s∈j} and E(φ_t ε_s)|_{t,s∈j}, respectively. Then (38) is rewritten as

lim_{N→∞} Δθ_R = [Σ_{j=1}^{K} E(φ_t φ_s^T)|_{t,s∈j}]^{-1} [Σ_{j=1}^{K} E(φ_t ε_s)|_{t,s∈j}].   (39)

Compared with (20), the estimation error of RGWLS in (39) is more complicated to analyze because of the data grouping. To make a quantitative description of lim_{N→∞} Δθ_R, we again consider the simple model (23). It follows from (39) that

lim_{N→∞} Δα_{1R} = [Σ_{j=1}^{K} E(y_{t-1} y_{s-1})|_{t,s∈j}]^{-1} [Σ_{j=1}^{K} E(y_{t-1} ε_s)|_{t,s∈j}].   (40)

In (40),

E(y_{t-1} y_{s-1})|_{t,s∈j} = E[(Σ_{p=0}^{∞} α_1^p ε_{t-p-1})(Σ_{q=0}^{∞} α_1^q ε_{s-q-1})]|_{t,s∈j} = Σ_p Σ_q α_1^p α_1^q E(ε_{t-p-1} ε_{s-q-1})|_{t,s∈j}.   (41)

When t = s,

E(y_{t-1} y_{s-1})|_{t,s∈j} = Σ_p Σ_q α_1^p α_1^q E[(e_{t-p-1} + λ_1 e_{t-p-2})(e_{t-q-1} + λ_1 e_{t-q-2})]|_{t∈j}
= (Σ_{p=q} α_1^{2p} (1 + λ_1^2) + Σ_{p=q+1} α_1^{2q+1} λ_1 + Σ_{q=p+1} α_1^{2p+1} λ_1) σ_e^2
= (1 + λ_1^2 + 2 λ_1 α_1) σ_e^2 / (1 − α_1^2) |_{t∈j};   (42)

when t < s,

E(y_{t-1} y_{s-1})|_{t,s∈j} = Σ_p Σ_q α_1^p α_1^q E[(e_{t-p-1} + λ_1 e_{t-p-2})(e_{s-q-1} + λ_1 e_{s-q-2})]|_{t,s∈j}
= (Σ_{p=q+t-s} α_1^{2q+s-t} (1 + λ_1^2) + Σ_{p=q+t-s-1} α_1^{2q+s-t+1} λ_1 + Σ_{p=q+t-s+1} α_1^{2q+s-t-1} λ_1) σ_e^2
= [α_1^{s-t} (1 + λ_1^2) + α_1^{s-t+1} λ_1 + α_1^{s-t-1} λ_1] σ_e^2 / (1 − α_1^2) |_{t,s∈j};   (43)

when t > s, by symmetry,

E(y_{t-1} y_{s-1})|_{t,s∈j} = [α_1^{t-s} (1 + λ_1^2) + α_1^{t-s+1} λ_1 + α_1^{t-s-1} λ_1] σ_e^2 / (1 − α_1^2) |_{t,s∈j}.   (44)

Substituting λ_1 = −α_1 into (42)–(44) gives

E(y_{t-1} y_{s-1})|_{t,s∈j} = σ_e^2 for t = s, and 0 for t ≠ s.   (45)

Hence

Σ_{j=1}^{K} E(y_{t-1} y_{s-1})|_{t,s∈j} = K σ_e^2.   (46)

From (40), we also have

E(y_{t-1} ε_s)|_{t,s∈j} = E[(Σ_{p=0}^{∞} α_1^p ε_{t-p-1}) ε_s]|_{t,s∈j}.   (47)

When t = s,

E(y_{t-1} ε_s)|_{t,s∈j} = Σ_{p=0}^{∞} α_1^p E[(e_{t-p-1} + λ_1 e_{t-p-2})(e_t + λ_1 e_{t-1})]|_{t∈j} = λ_1 σ_e^2 |_{t∈j};   (48)

when t < s,

E(y_{t-1} ε_s)|_{t,s∈j} = Σ_{p=0}^{∞} α_1^p E[(e_{t-p-1} + λ_1 e_{t-p-2})(e_s + λ_1 e_{s-1})]|_{t,s∈j} = 0 |_{t,s∈j};   (49)

when t > s, writing t = s + k with k ≥ 1,

E(y_{t-1} ε_s)|_{t,s∈j} = [α_1^{k-1} (1 + λ_1^2) + α_1^{k} λ_1 + α_1^{k-2} λ_1] σ_e^2,   (50)

where the term α_1^{k-2} λ_1 σ_e^2 is present only for k ≥ 2.
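The quantity f_j = d_j/(n_j − 1) that appears in (52)–(54) depends only on how often consecutive data points fall in the same group, so it can be computed directly from the group labels. Below is a small sketch, assuming NumPy; the labels encode the grouping of the Section 2 example and the value of α_1 is that example's coefficient, used here purely for illustration.

```python
import numpy as np

def f_values(labels):
    """f_j = d_j/(n_j - 1), where d_j counts t with S_t and S_{t+1} both in
    group j; groups with a single member get f_j = 0."""
    labels = np.asarray(labels)
    out = []
    for j in np.unique(labels):
        n_j = np.sum(labels == j)
        if n_j == 1:
            out.append(0.0)
            continue
        d_j = np.sum((labels[:-1] == j) & (labels[1:] == j))
        out.append(d_j / (n_j - 1))
    return np.array(out)

# grouping of the Section 2 example: {1}, {2}, {3,4}, {5..8}, {9..30}
labels = [0, 1, 2, 2] + [3] * 4 + [4] * 22
f = f_values(labels)
alpha1 = 0.8
print(f, -alpha1 + f.mean())   # (54): predicted limit -alpha1 + mean(f_j)
```

For this grouping every multi-member group is a contiguous run, so f_j = 1 there, and the predicted limit of Δα_{1R} is −0.8 + 0.6 = −0.2, much smaller in magnitude than the OLS limit −0.8.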


Substituting λ_1 = −α_1 into (48)–(50) gives

E(y_{t-1} ε_s)|_{t,s∈j} = −α_1 σ_e^2 for t = s; 0 for t < s; σ_e^2 for t = s + 1; 0 for t > s + 1.   (51)

Denoting f_j = d_j/(n_j − 1), where d_j is the number of data points S_t such that S_t and S_{t+1} both belong to the jth group, we have

E(y_{t-1} ε_s)|_{t,s∈j} = −α_1 σ_e^2 + f_j σ_e^2.   (52)

Hence

Σ_{j=1}^{K} E(φ_t ε_s)|_{t,s∈j} = −K α_1 σ_e^2 + Σ_{j=1}^{K} f_j σ_e^2.   (53)

Substituting (46) and (53) into (40) gives

lim_{N→∞} Δα_{1R} = −α_1 + (Σ_{j=1}^{K} f_j)/K.   (54)

Dividing (54) by (30), we have

(lim_{N→∞} Δα_{1R}) / (lim_{N→∞} Δα_1) = (−α_1 + (Σ_{j=1}^{K} f_j)/K) / (−α_1).   (55)

With |α_1| < 1, it is easy to verify that

|(lim_{N→∞} Δα_{1R}) / (lim_{N→∞} Δα_1)| < 1   (56)

if and only if

(Σ_{j=1}^{K} f_j)/(2K) < α_1 < 1 and 0 < (Σ_{j=1}^{K} f_j)/K < 1.   (57)

When (Σ_{j=1}^{K} f_j)/K approaches α_1, Δα_{1R} tends to zero and the RGWLS estimate has the greatest improvement. In addition, when K = N, then n_j = 1 and f_j = 0, and (54) becomes

lim_{N→∞} Δα_{1R} = −α_1,   (58)

which is the same as (30), implying that RGWLS reduces to OLS.

4. Finite-sample analysis

In this section, we study the properties of the OLS and RGWLS estimates when N is finite. We discuss the tradeoff between the signal-to-noise ratio (SNR) and N, and compare the RGWLS method with the generalized least squares (GLS) method. If the system is disturbed with colored noise, it is difficult to develop the finite-sample properties of the least squares estimate analytically [22]. We provide examples for illustration in Section 5.

4.1. Tradeoff between SNR and N

It is known that the finite-sample properties of the OLS estimate are related to the SNR and the number of data points N. Generally, a higher SNR and a larger N lead to a better OLS estimate. There might be cases in which the SNR is high only for part of the total data, so that as N increases, the SNR decreases. For such cases, there should be a tradeoff between SNR and N. The following example is given for illustration. Consider a model

y_t = 2 u_t + γ e_t,   (59)

where θ = 2, u_t = 0.01t, e_t is white noise, and γ is a chosen parameter used to adjust the noise level, i.e. the SNR. Suppose the system is simulated for 1000 steps with γ = 100 when t ≤ 500 and γ = 1 when t > 500. We study the effects of SNR on estimation using OLS. The K-means clustering method is used to obtain 20 groups of data based on SNR levels, so that these 20 groups have different levels of SNR. We rank them from the 1st to the 20th group, with the 1st having the highest SNR and the 20th the lowest. Then we run two kinds of simulations. Firstly, we perform OLS on each group of data and obtain θ̂_1, …, θ̂_20 and Δθ̂_1, …, Δθ̂_20, where Δθ̂_i = θ̂_i − θ. Secondly, we perform OLS on the accumulated groups, that is, on the first group to get θ̄_1 and Δθ̄_1 (Δθ̄_1 = θ̄_1 − θ), then on the first two groups combined to get θ̄_2 and Δθ̄_2, and so on. Both kinds of simulations are repeated 1000 times (Monte Carlo method), and E|Δθ̂_1|, …, E|Δθ̂_20| and E|Δθ̄_1|, …, E|Δθ̄_20| are approximated from these 1000 runs each. The resulting E|Δθ̂_i| of the different groups are shown in Fig. 2(a) and the resulting E|Δθ̄_i| of the accumulated groups in Fig. 2(b). It is seen from Fig. 2(a) that, for this example, E|Δθ̂_i| increases with i, which means that the estimate is better for a group with a higher SNR. Fig. 2(b) shows that the best estimate is θ̄_4. This means that the OLS estimate from a subset of the total data can be better than that from the total data if the SNR of the subset is higher than that of the whole set.

4.2. RGWLS and GLS

According to (31), the data matrices can be expressed as

Y_R = T Y, Φ_R = T Φ, E_R = T E,   (60)

where T is a K × N mutation matrix whose (j, t) entry is 1 if S_t belongs to the jth group and 0 otherwise, e.g.

T = [ 0 1 0 0 0 0 ⋯
      1 0 0 0 1 0 ⋯
      0 0 1 1 0 1 ⋯
              ⋮       ].

Substituting (60) into (35) yields

Δθ_R = (Φ^T M Φ)^{-1} (Φ^T M E),   (61)

where M = T^T W T. Then we have

E(Δθ_R Δθ_R^T) = E[(Φ^T M Φ)^{-1} (Φ^T M E E^T M Φ)(Φ^T M Φ)^{-1}].   (62)

It is difficult to carry the analysis further if Φ is stochastic and Φ and E are correlated.

Assume instead that Φ is non-stochastic and that Φ and E are uncorrelated with E(E) = 0 and E(E E^T) = R. The GLS estimate θ̂_G is then the BLUE of θ. Let G be an N × N matrix and consider the transformed specification

G Y = G Φ θ + G E.   (63)

The resulting OLS estimate is

θ̂_G = (Φ^T G^T G Φ)^{-1} (Φ^T G^T G Y),   (64)

and the estimation error is

Δθ_G = θ̂_G − θ = (Φ^T G^T G Φ)^{-1} (Φ^T G^T G E) = (Φ^T Q Φ)^{-1} (Φ^T Q E),   (65)

where Q = G^T G. Then we have

E(Δθ_G Δθ_G^T) = E[(Φ^T Q Φ)^{-1} (Φ^T Q E E^T Q Φ)(Φ^T Q Φ)^{-1}]
= (Φ^T Q Φ)^{-1} (Φ^T Q E(E E^T) Q Φ)(Φ^T Q Φ)^{-1}
= (Φ^T Q Φ)^{-1} (Φ^T Q R Q Φ)(Φ^T Q Φ)^{-1}.   (66)

Söderström [34] shows that when Q = R^{-1}, θ̂_G is the BLUE of θ, and the desired transformation G can be designed based on the matrix Q. There is a big difference between our method and the GLS method: it is difficult to analyze (62) since the assumptions for GLS do not hold there, and the mathematical expectation cannot be taken through the nonlinear operators. Therefore, we provide examples in Section 5 to show the effect of our method in finite-sample cases.
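The GLS construction of (63)–(66) with Q = R^{-1} can be sketched numerically. Below is a minimal sketch, assuming NumPy; the regressors, the AR(1) noise model and all parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
N, rho, s2 = 400, 0.9, 0.5
theta = np.array([1.0, 2.0])
Phi = np.column_stack([np.ones(N), np.linspace(0.0, 1.0, N)])  # non-stochastic

# AR(1)-correlated noise E_t = rho*E_{t-1} + e_t with var(e_t) = s2,
# whose known covariance is R_ij = s2 * rho^|i-j| / (1 - rho^2)
e = np.sqrt(s2) * rng.standard_normal(N)
E = np.zeros(N)
E[0] = e[0] / np.sqrt(1 - rho**2)
for t in range(1, N):
    E[t] = rho * E[t - 1] + e[t]
Y = Phi @ theta + E

idx = np.arange(N)
R = s2 * rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho**2)

# GLS with Q = R^{-1}: theta_G = (Phi' Q Phi)^{-1} (Phi' Q Y), cf. (64)-(65)
Q = np.linalg.inv(R)
theta_gls = np.linalg.solve(Phi.T @ Q @ Phi, Phi.T @ Q @ Y)
theta_ols = np.linalg.lstsq(Phi, Y, rcond=None)[0]
print(theta_ols, theta_gls)   # both estimate [1.0, 2.0]; GLS is BLUE here
```

This illustrates the contrast drawn above: GLS needs the noise covariance R, whereas RGWLS only needs a grouping of the data.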


Fig. 2. Example for illustration. (a) E(|Δθ̂|) for different groups of data, (b) E(|Δθ̄|) for accumulated groups of data.

5. Simulation examples

In this part, examples are given to illustrate the proposed method. We take a very large N for asymptotic analysis in the first two examples; finite-sample cases are considered in the remaining examples. When N is large, we introduce the index

REEI = ‖Δθ_R‖_2 / ‖Δθ‖_2,

which describes the ratio between the estimation errors of the RGWLS and OLS estimates. If REEI < 1, the RGWLS estimate is an improvement. We discuss how the number of data points, the noise level and the transformation ratio affect REEI. Similarly, we introduce

RIV = ‖Δθ_IV‖_2 / ‖Δθ‖_2,

where Δθ_IV is the estimation error obtained with the IV method. The instrumental variable Z_t can be chosen according to [37].

Example 1. Consider an ARMA(1,1) model

y_t = 0.95 y_{t-1} + γ ε_t,   (67)

where ε_t = e_t − 0.95 e_{t-1}, y_0 = 1000, and γ is a chosen parameter used to adjust the noise level. Choose Z_t = 1/t^2, where t = 1, 2, …, N − 1. Simulation results are shown in Tables 2 and 3.

Table 2. REEI of Example 1.
Table 3. RIV of Example 1.

Example 2. Consider a second-order system with exogenous input

y_t = 1.5 y_{t-1} − 0.7 y_{t-2} + u_{t-1} + γ ε_t,   (68)

where u_t = 1, ε_t = e_t − 1.5 e_{t-1} + 0.7 e_{t-2} and y_0 = y_1 = 0. Choose Z_t = [1/t^2, 1, t], where t = 1, 2, …, N − 2. Simulation results are shown in Tables 4 and 5.

Table 4. REEI of Example 2.

Generally, as can be observed from Tables 2–5:

- for all simulation cases, REEI < 1;
- for the same γ and r, a larger N leads to a larger REEI;
- for the same N and r, a smaller γ leads to a smaller REEI;
- for the same N and γ, a smaller r results in a smaller REEI;
- REEI < RIV for some cases but not all. Note that the result of our method depends on the choice of grouping method and parameters, while the result of IV relies much on the selected instrumental variables. The RG method provides a new option for system identification under colored noise, with results at least comparable to the IV method. Since RG is new, future research may advance it with much better results.

Although it is difficult to analyze the finite-sample properties of the least squares estimate analytically, we perform Monte Carlo experiments to get approximate results. In the following, two examples are presented for illustration. The simulation for each example is repeated 100 times to obtain E(‖Δθ‖_2) and E(‖Δθ_R‖_2). Denote

REEF = E(‖Δθ_R‖_2) / E(‖Δθ‖_2),   (69)

which describes the ratio between the finite-sample errors of the RGWLS and OLS estimates. If REEF < 1, the RGWLS estimate is better than the OLS estimate. We study how N, γ and r affect REEF.

Example 3. Considering model (67), simulation results are shown in Table 6.


Table 5 RIV of Example 2.

Table 6 REEF of Example 3.

Table 7 REEF of Example 4.

Example 4. Considering model (68), simulation results are shown in Table 7. As can be observed from Tables 6 and 7, REEF o1 for all simulation cases. Although it is difficult to make theoretical analysis, the proposed method still leads to a better estimate. In addition, N, γ and r are all not necessarily linked to REEF, which differs from the asymptotic case. In the future, we will try to study how other factors influence the result, such as model structures and input signals. 6. Conclusions In this paper, an improved system identification method has been proposed based on Renormalization Group. Given a fine data set, it is easy to obtain the OLS estimate. By Renormalization Group, a coarse data set is created from the original fine data set. This new data set enables us to get the RGWLS estimate. Through comparisons, we find that the estimation error of RGWLS estimate could be smaller than that of the OLS estimate. Thus, the proposed method could have improvement. It should be pointed out that the proposed method requires sufficient data. Furthermore, its in-depth theory and additional potential values are to be investigated. References [1] Lennart L. System identification: theory for the user. Upper Saddle River, NJ: PTR Prentice Hall; 1999.

[2] Schoukens J. System identification. Wiley; 2012.
[3] Ding F. System identification: new theory and methods; 2013.
[4] Vecchia D, Splett J. Outlier-resistant methods for estimation and model fitting. ISA Trans 1994;33:411–20.
[5] Klopfenstein Jr R. Data smoothing using a least squares fit C++ class. ISA Trans 1998;37:3–19.
[6] Peyton Jones JC, Muske KR. Identification and adaptation of linear look-up table parameters using an efficient recursive least-squares technique. ISA Trans 2009;48:476–83.
[7] Natarajan K, Gilbert A, Patel B, Siddha R. Frequency response adaptation of PI controllers based on recursive least-squares process identification. ISA Trans 2006;45:517–28.
[8] Griliches Z. A note on serial correlation bias in estimates of distributed lags. Econom: J Econom Soc 1961:65–73.
[9] Phillips PCB, Wickens MR. Exercises in econometrics, vol. 2. P. Allan; 1978.
[10] Inder BA. Bias in the ordinary least squares estimator in the dynamic linear regression model with autocorrelated disturbances. Commun Stat Simul Comput 1987;16:791–815.
[11] Stocker T. On the asymptotic bias of OLS in dynamic regression models with autocorrelated errors. Stat Pap 2007;48:81–93.
[12] Zheng WX. On a least-squares-based algorithm for identification of stochastic linear systems. IEEE Trans Signal Process 1998;46:1631–8.
[13] Zheng WX. On least-squares identification of ARMAX models. In: Proceedings of the 2002 IFAC 15th triennial world congress, Barcelona, Spain; 2002.
[14] Söderström T, Stoica P. Instrumental variable methods for system identification. Berlin: Springer; 1983.
[15] Victor S, Malti R, Oustaloup A. Instrumental variable method with optimal fractional differentiation order for continuous-time system identification. In: System identification, vol. 15. p. 904–9.
[16] Liu X, Wang J, Zheng W. Convergence analysis of refined instrumental variable method for continuous-time system identification. Control Theory Appl IET 2011;5:868–77.
[17] Hurwicz L. Least squares bias in time series. Stat Inf Dyn Econ Models 1950:365–83.
[18] Shaman P, Stine RA. The bias of autoregressive coefficient estimators. J Am Stat Assoc 1988;83:842–8.
[19] Stine RA, Shaman P. A fixed point characterization for bias of autoregressive estimators. Ann Stat 1989:1275–84.
[20] Breton A, Pham DT. On the bias of the least squares estimator for the first order autoregressive process. Ann Inst Stat Math 1989;41:555–63.
[21] Patterson K. Finite sample bias of the least squares estimator in an AR(p) model: estimation, inference, simulation and examples. Appl Econ 2000;32:1993–2005.
[22] Sargent TJ. Some evidence on the small sample properties of distributed lag estimators in the presence of autocorrelated disturbances. Rev Econ Stat 1968;50:87–95.
[23] Tjøstheim D, Paulsen J. Bias of some commonly-used time series estimates. Biometrika 1983;70:389–99.
[24] Maeshiro A. Teaching regressions with a lagged dependent variable and autocorrelated disturbances. J Econ Educ 1996:72–84.
[25] Eykhoff P. System identification: parameter and state estimation. NY: John Wiley & Sons; 1974.
[26] Regression analysis tutorial. University of California at Berkeley; 1967.
[27] Wilson KG. Renormalization group and critical phenomena. I. Renormalization group and the Kadanoff scaling picture. Phys Rev B 1971;4:3174.
[28] Arulsamy AD. Renormalization group method based on the ionization energy theory. Ann Phys 2011;326:541–65.
[29] Hung PQ, Xiong C. Renormalization group fixed point with a fourth generation: Higgs-induced bound states and condensates. Nucl Phys B 2011.
[30] Dumitru A, Jalilian-Marian J, Lappi T, Schenke B, Venugopalan R. Renormalization group evolution of multi-gluon correlators in high energy QCD. Phys Lett B 2011;706:219–24.
[31] Friedman J, Hastie T, Tibshirani R. The elements of statistical learning: data mining, inference, and prediction. New York, NY: Springer-Verlag; 2009.
[32] Salas S, Hille E, Etgen G. Calculus: one and several variables. Wiley; 1990.
[33] Söderström T. Ergodicity results for sample covariances. Probl Control Inf Theory 1975;4:131–8.
[34] Söderström T, Stoica P. System identification. Prentice-Hall, Inc.; 1988.
[35] Reiersøl O. Confluence analysis by means of lag moments and other methods of confluence analysis. Econom: J Econom Soc 1941;1:1–24.
[36] Chung KL. A course in probability theory. New York: Harcourt, Brace & World; 1968.
[37] Wang Q-G, Guo X, Zhang Y. Direct identification of continuous time delay systems from step responses. J Process Control 2001;11:531–42.
