
Randomized Gradient-Free Method for Multiagent Optimization Over Time-Varying Networks

Deming Yuan and Daniel W. C. Ho, Senior Member, IEEE

Abstract— In this brief, we consider multiagent optimization over a network in which multiple agents seek to minimize a sum of nonsmooth but Lipschitz continuous functions, subject to a convex state constraint set. The underlying network topology is modeled as time varying. We propose a randomized derivative-free method in which, at each update, random gradient-free oracles are used in place of subgradients (SGs). In contrast to existing work, we do not require that agents be able to compute the SGs of their objective functions. We establish convergence of the method to an approximate solution of the multiagent optimization problem, within an error level that depends on the smoothing parameter and the Lipschitz constant of each agent's objective function. Finally, a numerical example is provided to demonstrate the effectiveness of the method.

Index Terms— Average consensus, distributed multiagent system, distributed optimization, networked control systems.

I. INTRODUCTION

In recent years, the development of distributed methods for dealing with convex optimization problems has attracted considerable research interest [5]–[7], [12]–[17], [21]–[23]. Such distributed methods are, in general, a combination of the standard subgradient (SG) methods of convex optimization theory and the average consensus algorithms of multiagent systems, which compute the average of the states distributed among the agents in the network in an iterative fashion [1]–[4], [10], [18], [20]. Average consensus algorithms arise in many applications, including distributed coordination of multiple autonomous vehicles [25], [26] and synchronization of complex networks [11], [24].

The problem of minimizing a sum of convex objective functions distributed among multiple agents over a network has attracted much recent research interest. Nedic and Ozdaglar [17] develop a general framework for the multiagent optimization problem over a network; they propose distributed SG methods and analyze their convergence properties. Nedic et al. [13] further consider the case in which the agents' states are constrained to convex sets and propose the projected consensus algorithm. Ram et al. [12] consider the case of noisy SGs corrupted by stochastic errors. Zhu and Martinez [6] further take inequality and equality constraints into consideration and propose the distributed Lagrangian primal–dual SG method, the basic idea of which is to characterize the primal–dual optimal solutions as the saddle points of the associated Lagrangian. Johansson et al. [23] propose a variant of the distributed SG method,

in which the estimate is first adjusted along the negative SG direction of the local cost function, and the updated information is then shared with neighboring agents by executing several consensus steps (lower bounded by some positive integer). Inspired by [23], Yuan et al. [5] further incorporate global inequality constraints and investigate the convergence properties of the method. These methods and algorithms are synchronous, since they require all agents in the network to update at the same time. From a different viewpoint, [22] develops an asynchronous distributed algorithm for the multiagent optimization problem and establishes convergence results for it.

The aforementioned methods, however, rely on the assumption that the SGs of the objective functions are available to the respective agents. In some cases, it may be computationally infeasible or expensive to calculate the exact SGs (see [27], [29], and references therein). It is therefore desirable to develop gradient-free methods for solving convex optimization problems in a distributed setting. On a much broader scale, our work in this brief is related to distributed stochastic gradient methods [8], [9] (for recent convergence results on the asymptotic behavior of distributed stochastic gradient methods with decreasing step sizes, see [19]).

The main goal of this brief is to study the convergence properties of a randomized derivative-free method for the multiagent optimization problem, in which the objective function is a sum of the agents' local nonsmooth objective functions and the underlying network topology is modeled as time varying. Different from existing results in the literature, we use the random gradient-free oracles developed in [27], rather than the SGs of the objective functions, to build the distributed method for the multiagent optimization problem; such gradient-free methods have so far been absent in this field. We estimate the convergence performance of the proposed method and show that each agent can achieve an approximate solution within an error level depending on the smoothing parameter and Lipschitz constant of each objective function; in particular, we establish convergence of the method to an approximate solution of the multiagent optimization problem by choosing the smoothing parameters appropriately. In contrast to the SG-based methods in [7], [12], and [17], our proposed method applies when the SGs of the objective functions cannot be evaluated, or when it is computationally demanding to find them. Note also that, compared with [27], the proposed randomized gradient-free (RGF) method extends the method in [27] to the multiagent scenario. Compared with previous work, our convergence results are different, and the main contributions of this brief are twofold. 1) First, different from the methods considered in existing papers, which rely on computing the SGs of each agent's objective function, we propose a derivative-free method based on random gradient-free oracles for the multiagent optimization problem. 2) Second, we obtain error bounds on the convergence of the method and establish the relation between the error level and the smoothing parameters.



II. PROBLEM FORMULATION

A. Notation and Terminology

Let $\mathbb{R}^n$ be the $n$-dimensional vector space. We write $\|x\|$ for the Euclidean norm of a vector $x$ and $\Pi_X[x]$ for the Euclidean projection of a vector $x$ onto the set $X$; $\mathbf{1} \in \mathbb{R}^n$ represents the vector with all entries equal to one. We use $[x]_i$ to denote the $i$th component of a vector $x$, and $x^T$ to denote the transpose of $x$. For a matrix $P$, $[P]_{ij}$ represents the element in the $i$th row and $j$th column of $P$, and its transpose is denoted $P^T$. For $k > s$, we write $P(k:s) = P(k)P(k-1)\cdots P(s+1)P(s)$, and $P(k:k) = P(k)$. We use $\mathbb{E}[x]$ to denote the expected value of a random variable $x$. For a function $f$, its gradient at a point $x$ is represented by $\nabla f(x)$.

B. Multiagent Optimization and Its Smoothed Version

We consider the constrained multiagent optimization problem

$$\min_{x \in X} \; f(x) \triangleq \sum_{i=1}^{n} f_i(x) \qquad (1)$$
where $x \in \mathbb{R}^m$ is a global decision vector; $f_i : \mathbb{R}^m \to \mathbb{R}$ is the convex objective function of agent $i$, known only to agent $i$, and $f_i$ is Lipschitz continuous over $X$ with Lipschitz constant $G_0(f_i)$; and $X \subseteq \mathbb{R}^m$ is a nonempty closed convex set. We denote an optimal solution of (1) by $x^\star$ and the optimal value by $f^\star = f(x^\star)$. The assumption that each $f_i$ is Lipschitz continuous is satisfied, for example, when the set $X$ is compact, or when each $f_i$ is polyhedral, i.e., the pointwise maximum of a finite collection of affine functions.

In this brief, we are interested in the case in which each function $f_i$ is nonsmooth. Because in many cases the SGs of a nonsmooth function cannot be evaluated easily and efficiently, we focus on derivative-free optimization methods for solving (1). We now present the smoothed version of (1), given by

$$\min_{x \in X} \; f_\mu(x) \triangleq \sum_{i=1}^{n} f_{\mu_i}^i(x) \qquad (2)$$

where $f_{\mu_i}^i(x)$ represents the Gaussian approximation of $f_i(x)$, given explicitly by [27]

$$f_{\mu_i}^i(x) = \frac{1}{\kappa} \int_{\mathbb{R}^m} f_i(x + \mu_i \xi)\, e^{-\frac{1}{2}\|\xi\|^2}\, d\xi$$

with $\kappa = \int_{\mathbb{R}^m} e^{-\frac{1}{2}\|\xi\|^2}\, d\xi = (2\pi)^{m/2}$, and $\mu_i$ is a nonnegative scalar representing the smoothing parameter of the function $f_{\mu_i}^i$, for all $i$. Note that the smoothed functions $f_{\mu_i}^i$ are relatively well behaved; we make this precise in Lemma 1 below.

C. Network Model

We assume that the $n$ agents communicate over a network with time-varying topology. The network is modeled as a directed graph with node set $V = \{1, \ldots, n\}$ and a time-varying link set. We define $P(k)$, with entries $[P(k)]_{ij}$, to be the communication matrix for the network that respects the structure of the graph at time $k$; this induces the link set $E(P(k))$, the set of activated links at time $k$, defined as $E(P(k)) = \{(j, i) \mid [P(k)]_{ij} > 0,\; i, j \in V\}$. Note that each link set $E(P(k))$ includes the self-edges $(i, i)$ for all $i$. The agents' connectivity at each time instant $k$ can be represented by the directed graph $G(k) = (V, E(P(k)))$. For the communication matrix $P(k)$ and the agent network, we make the following standard assumptions [28].

Assumption 1: For all $k \ge 0$, the communication matrix $P(k)$ satisfies the following properties.
1) $P(k)$ is doubly stochastic.
2) The diagonal entries of $P(k)$ are positive, i.e., there exists a scalar $\nu > 0$ such that $[P(k)]_{ii} \ge \nu$ for all $i \in V$. Additionally, if $[P(k)]_{ij} > 0$, then $[P(k)]_{ij} \ge \nu$, for all $i, j \in V$.

Assumption 2: For all $k \ge 0$, there exists a positive integer $T$ such that the graph $(V, E(P(kT)) \cup \cdots \cup E(P((k+1)T - 1)))$ is strongly connected.

III. RGF METHOD

In this section, motivated by the gradient-free method in [27] and the projected consensus algorithm in [13], we present the RGF method for solving (1), which is given in Algorithm 1.

Algorithm 1 RGF Method
Initialize:
1. Choose a random $x_0^i \in X$ for each $i \in V$.
2. For each $i \in V$, a random sequence $\{\xi_k^i\}_{k\ge 0}$ is locally generated in an i.i.d. manner according to the Gaussian distribution.
Iteration ($k \ge 0$), for $i = 1, \ldots, n$:
1. Compute the weighted average $\vartheta_{k+1}^i = \sum_{j=1}^{n} [P(k)]_{ij} x_k^j$.
2. Compute $x_{k+1}^i = \Pi_X\big[\vartheta_{k+1}^i - \gamma_k g_{\mu_i}(x_k^i)\big]$, where $\gamma_k > 0$ is the step size and $g_{\mu_i}(x_k^i)$ is the random gradient-free oracle, given explicitly by
$$g_{\mu_i}(x_k^i) = \frac{f_i\big(x_k^i + \mu_i \xi_k^i\big) - f_i\big(x_k^i\big)}{\mu_i}\; \xi_k^i.$$

For each agent $i$, the RGF method generates a random sequence $\{\xi_k^i\}_{k\ge 0}$; we use $\mathcal{F}_k$ to denote the $\sigma$-field generated by the entire history of the random variables up to iteration $k$, that is,

$$\mathcal{F}_k = \big\{x_0^i,\, i \in V;\; \xi_s^i,\, i \in V,\; 0 \le s \le k-1\big\}, \quad \text{with } \mathcal{F}_0 = \big\{x_0^i,\, i \in V\big\}.$$
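To make the update rule concrete, here is a minimal NumPy sketch of the oracle and one synchronous iteration of Algorithm 1; the function names, the box constraint used for `project`, and all parameter choices are our own illustrative assumptions, not part of the brief.

```python
import numpy as np

def rgf_oracle(f_i, x, mu_i, rng):
    # Random gradient-free oracle of Algorithm 1: a finite-difference
    # estimate of the smoothed gradient along a Gaussian direction xi.
    xi = rng.standard_normal(x.shape)            # xi_k^i ~ N(0, I_m)
    return (f_i(x + mu_i * xi) - f_i(x)) / mu_i * xi

def rgf_iteration(fs, x, P_k, gamma_k, mu, project, rng):
    # One synchronous iteration: consensus step, then projected oracle
    # step.  x is an (n, m) array of agent estimates; P_k is an (n, n)
    # doubly stochastic weight matrix (Assumption 1).
    theta = P_k @ x                              # weighted averages
    x_next = np.empty_like(x)
    for i, f_i in enumerate(fs):
        g = rgf_oracle(f_i, x[i], mu[i], rng)    # two evaluations of f_i
        x_next[i] = project(theta[i] - gamma_k * g)
    return x_next

# Example of a caller-supplied projection Pi_X for an assumed box X = [-5, 5]^m.
project = lambda v: np.clip(v, -5.0, 5.0)
```

Note that, consistent with Remark 1 below, each agent evaluates its local objective exactly twice per iteration.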

Remark 1: Note that the RGF method is synchronous, since all agents need to use the same step-size values; this means that each agent needs to coordinate its step size with those of its neighbors. Note also that, to implement the RGF method, each agent needs to make two function evaluations per iteration.

Now, we introduce the following lemma, collected from [27, Theorems 1 and 4, Eqs. (11) and (22)], which provides some important properties of the function $f_{\mu_i}^i(x)$ and the random gradient-free oracle $g_{\mu_i}(x_k^i)$.

Lemma 1: For each $i \in V$, we have the following.
(a) $f_{\mu_i}^i(x)$ is convex and differentiable, and it satisfies
$$f_i(x) \le f_{\mu_i}^i(x) \le f_i(x) + \sqrt{m}\,\mu_i G_0(f_i).$$
(b) The gradient $\nabla f_{\mu_i}^i(x)$ satisfies
$$\mathbb{E}\big[g_{\mu_i}(x_k^i) \,\big|\, \mathcal{F}_k\big] = \nabla f_{\mu_i}^i(x_k^i).$$
(c) The random gradient-free oracle $g_{\mu_i}(x_k^i)$ satisfies
$$\mathbb{E}\big[\|g_{\mu_i}(x_k^i)\|^2 \,\big|\, \mathcal{F}_k\big] \le (m+4)^2 G_0^2(f_i).$$
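As an illustrative sanity check (ours, not the brief's), the two bounds in Lemma 1(a) can be verified numerically by a Monte Carlo estimate of the Gaussian approximation (2), here for $f(x) = \|x\|_1$, whose Lipschitz constant with respect to the Euclidean norm is $G_0 = \sqrt{m}$; the sample size and tolerance are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
m, mu = 5, 0.1
G0 = np.sqrt(m)                       # Lipschitz constant of the 1-norm
x = rng.standard_normal(m)

# Monte Carlo estimate of the Gaussian approximation f_mu(x) in Eq. (2).
xi = rng.standard_normal((200_000, m))
f = np.abs(x).sum()
f_mu = np.abs(x + mu * xi).sum(axis=1).mean()

tol = 1e-2                            # slack for Monte Carlo noise
print(f <= f_mu + tol)                          # f(x) <= f_mu(x)
print(f_mu <= f + np.sqrt(m) * mu * G0 + tol)   # f_mu(x) <= f(x) + sqrt(m)*mu*G0
```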

We now provide a result on the convergence of the RGF method; in particular, the following theorem shows the basic convergence property of the local sequence $\{x_k^i\}_{k\ge 0}$ via the local weighted average vector defined at each agent $i \in V$:

$$\hat{x}_t^i = \frac{\sum_{k=0}^{t-1} \gamma_k x_k^i}{\sum_{k=0}^{t-1} \gamma_k}, \qquad t \ge 1. \qquad (3)$$
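A hypothetical helper for forming (3) from an agent's stored iterates and step sizes might look as follows; in practice, the average can also be maintained recursively without storing the whole history.

```python
import numpy as np

def weighted_average(iterates, gammas):
    # \hat{x}_t^i of Eq. (3): step-size-weighted average of the local
    # iterates x_0^i, ..., x_{t-1}^i.
    g = np.asarray(gammas)[:, None]              # shape (t, 1)
    return (g * np.asarray(iterates)).sum(axis=0) / g.sum()
```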


Theorem 1: Let Assumptions 1 and 2 be satisfied, and let the sequence $\{x_k^i\}_{k\ge 0}$ be generated by the RGF method. Assume further that each $f_i$ is Lipschitz continuous with constant $G_0(f_i)$. Then, for each $j \in V$ and all $t \ge 1$, we have

$$\mathbb{E}\big[f(\hat{x}_t^j)\big] - f^\star \le \sqrt{m}\,\bar{G}_0 \sum_{i=1}^{n} \mu_i + \frac{1}{\sum_{k=0}^{t-1}\gamma_k}\Bigg[\frac{1}{2}\sum_{i=1}^{n}\mathbb{E}\big[\|x_0^i - x^\star\|^2\big] + \frac{1}{2}\, n(m+4)^2 \bar{G}_0^2 \sum_{k=0}^{t-1}\gamma_k^2 + n(m+5)\,\bar{G}_0 \sum_{k=0}^{t-1}\gamma_k \Delta_k\Bigg] \qquad (4)$$

where $\bar{G}_0 = \max_{i\in V} G_0(f_i)$ and $\Delta_k = \max_{i,j\in V} \mathbb{E}\big[\|x_k^i - x_k^j\|\big]$.

Proof: See Appendix A.

Remark 2: Theorem 1 presents the basic convergence result of the RGF method. The difference $\mathbb{E}[f(\hat{x}_t^j)] - f^\star$, which can be locally accessed by each agent $j$, is upper bounded by four terms. In particular, the first term in the upper bound (4) is the penalty incurred by using the gradient-free oracle instead of the SG; the second and third terms are optimization error terms that are common to the gradient-free method in [27]; compared with the error bound in [27] (see [27, Th. 6]), the fourth term is an additional penalty incurred by having different estimates in the network, which is the cost of aligning each agent's decision with the decisions of its neighbors.

The next lemma provides an estimate of the disagreement among the agents.

Lemma 2: Let Assumptions 1 and 2 be satisfied, and let the sequence $\{x_k^i\}_{k\ge 0}$ be generated by the RGF method with a step-size sequence $\{\gamma_k\}_{k\ge 0}$ satisfying $\lim_{k\to\infty}\gamma_k = \tilde{\gamma}$, where $\tilde{\gamma} \ge 0$. Assume further that each $f_i$ is Lipschitz continuous with constant $G_0(f_i)$. Then, for each $i \in V$, we have

$$\limsup_{k\to\infty} \mathbb{E}\big[\|x_k^i - \bar{x}_k\|\big] \le (m+4)\bigg(2 + \frac{n\tilde{B}}{B^2(1-\tilde{B})}\bigg)\bar{G}_0\,\tilde{\gamma}$$

where $\bar{x}_k = \frac{1}{n}\sum_{i=1}^{n} x_k^i$, $B = 1 - \nu/(4n^2)$, and $\tilde{B} = B^{1/T}$.

Proof: See Appendix B.

Equipped with Theorem 1 and Lemma 2, we are now ready to present the main result.

Theorem 2: Under the conditions of Lemma 2, suppose moreover that $\sum_{k=0}^{\infty}\gamma_k = \infty$ if $\tilde{\gamma} = 0$. Then, for each $j \in V$, we have

$$\limsup_{t\to\infty} \mathbb{E}\big[f(\hat{x}_t^j)\big] - f^\star \le \sqrt{m}\,\bar{G}_0 \sum_{i=1}^{n}\mu_i + C\,\bar{G}_0^2\,\tilde{\gamma} \qquad (5)$$

where $C = n(m+5)^2\big(\frac{9}{2} + \frac{2n\tilde{B}}{B^2(1-\tilde{B})}\big)$.

Proof: See Appendix C.

Remark 3: Theorem 2 shows that the difference $\limsup_{t\to\infty} \mathbb{E}[f(\hat{x}_t^j)] - f^\star$ is upper bounded by two terms. The first term in the upper bound (5) is, as we have stated earlier, the penalty incurred by optimizing the smoothed version (2) instead of the original problem (1); the second is the penalty incurred by letting the step-size sequence $\{\gamma_k\}_{k\ge 0}$ not converge to 0. When the step size is diminishing, i.e., $\tilde{\gamma} = 0$, we can see from Theorem 2 that the second term on the right-hand side of (5) vanishes, which implies $\limsup_{t\to\infty} \mathbb{E}[f(\hat{x}_t^j)] - f^\star \le \sqrt{m}\,\bar{G}_0 \sum_{i=1}^{n}\mu_i$.
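For intuition about the scale of the error level in (5), the following snippet simply evaluates the right-hand side of (5) for illustrative parameter values; $n$, $m$, $\nu$, $T$, $\mu_i$, $\bar{G}_0$, and $\tilde{\gamma}$ are all assumed here.

```python
import numpy as np

n, m, nu, T = 10, 5, 0.1, 2            # assumed network parameters
G0, mu, g_tilde = 1.0, 1e-3, 0.0       # g_tilde = 0: diminishing steps

B = 1 - nu / (4 * n**2)
B_tilde = B ** (1.0 / T)
C = n * (m + 5)**2 * (4.5 + 2 * n * B_tilde / (B**2 * (1 - B_tilde)))

smoothing_penalty = np.sqrt(m) * G0 * (n * mu)   # sqrt(m)*G0_bar*sum_i mu_i
step_size_penalty = C * G0**2 * g_tilde          # vanishes when g_tilde = 0
print(smoothing_penalty + step_size_penalty)     # error level of Eq. (5)
```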

Remark 4: Compared with the error bounds in the literature on multiagent optimization (see [17, Proposition 3] and [23, Th. 1]), the error bound in Theorem 2 contains an additional term, namely $\sqrt{m}\,\bar{G}_0\sum_{i=1}^{n}\mu_i$. As we have stated earlier, this is the penalty incurred by optimizing the smoothed version (2) instead of the original problem (1). In addition, the error bound of Theorem 2 scales as $m^2$ in the number $m$ of decision variables, which is a natural penalty for using gradient-free oracles, as can be seen from Lemma 1(c) and [27] as well.

IV. NUMERICAL EXAMPLE

In this section, we consider a distributed convex optimization problem that is a variant of Nesterov's nonsmooth test problem [27], given by

$$\min_{x\in\mathbb{R}^m} \sum_{i=1}^{n} a_i\Bigg[\,|[x]_1 - 1| + \sum_{s=1}^{m-1} \big|1 + [x]_{s+1} - 2[x]_s\big|\Bigg] \qquad (6)$$

where the problem data $a_i$, $i = 1, \ldots, n$, are positive real numbers. This problem satisfies $\bar{G}_0 \le 3(\max_{i\in V} a_i)\sqrt{m}$.

We now study the convergence of the method. Specifically, we run the method over a ring network topology, in which the agents (nodes) are connected to form a single undirected cycle. The network is time varying in the following sense: at each time instant $2k$ ($k \ge 0$), half of the links (edges) are activated randomly, and at each time instant $2k+1$ ($k \ge 0$), the other half of the links are activated. The communication matrices are generated according to the Metropolis-based weights [28, Example 10.1], and it is easy to see that they are guaranteed to be doubly stochastic. The problem data $a_i$ were generated from a uniform distribution on $[0.5, 1.5]$; the initial estimates were chosen as $x_0^i = 0$ for all $i$; and the step sizes used for both the RGF method and the SG-based method were $\gamma_k = 1/\sqrt{k+1}$ for all $k \ge 0$. The simulation results for our method are averaged over 50 trials.

First, we study the effects of the number of agents $n$ and the value of the smoothing parameters $\mu_i$ on the convergence of the RGF method; the performance in each case is evaluated in terms of the expected number of iterations needed to converge to within a given accuracy. The simulation results are shown in Table I.

[TABLE I: Total Number of Iterations]

Table I shows that the RGF method requires more iterations to reach accuracies of $10^{-1}$ and $10^{-2}$ as $n$ grows. It also shows that the RGF method with $\mu_i = 10^{-3}$ demonstrates almost the same efficiency as the RGF method with $\mu_i = 10^{-8}$. Note that we choose $m = 1$ in this setting.

Next, we compare the RGF method with the SG-based method from the literature on multiagent optimization. We evaluate each method's performance in terms of the expected number of iterations needed to converge to within accuracies of $10^{-2}$ and $10^{-3}$, respectively; here $n = 10$. The simulation results are shown in Table II.

[TABLE II: Total Number of Iterations]

We can see from Table II that both methods require more iterations to achieve an accuracy of $10^{-2}$ or $10^{-3}$ as $m$ grows; the RGF method needs more iterations than the SG-based method to achieve a given accuracy, but this is more or less expected, since the SG-based method makes use of the full computation of the SG.
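The following self-contained sketch reproduces the flavor of this experiment under stated simplifications of ours: it alternates two fixed halves of the ring (the brief activates random halves), runs a single trial over an arbitrary horizon, and takes $X = \mathbb{R}^m$ so the projection is the identity.

```python
import numpy as np

def local_objective(a_i):
    # Local term of Eq. (6) for problem data a_i.
    return lambda x: a_i * (abs(x[0] - 1.0)
                            + np.abs(1.0 + x[1:] - 2.0 * x[:-1]).sum())

def metropolis_weights(n, edges):
    # Metropolis-based weights [28, Example 10.1] for the activated
    # edges; symmetric with unit row sums, hence doubly stochastic.
    P = np.eye(n)
    deg = np.zeros(n)
    for i, j in edges:
        deg[i] += 1.0; deg[j] += 1.0
    for i, j in edges:
        w = 1.0 / (1.0 + max(deg[i], deg[j]))
        P[i, j] = P[j, i] = w
        P[i, i] -= w; P[j, j] -= w
    return P

rng = np.random.default_rng(1)
n, m, mu = 10, 5, 1e-3                           # assumed sizes
a = rng.uniform(0.5, 1.5, n)                     # problem data a_i
fs = [local_objective(ai) for ai in a]
ring = [(i, (i + 1) % n) for i in range(n)]
halves = [ring[0::2], ring[1::2]]                # alternately activated

x = np.zeros((n, m))                             # x_0^i = 0
for k in range(5000):
    P = metropolis_weights(n, halves[k % 2])
    gamma = 1.0 / np.sqrt(k + 1.0)
    theta = P @ x
    for i in range(n):
        xi = rng.standard_normal(m)
        g = (fs[i](x[i] + mu * xi) - fs[i](x[i])) / mu * xi
        x[i] = theta[i] - gamma * g              # X = R^m: no projection
print(sum(f(x[0]) for f in fs))                  # objective at agent 1
```

The global optimum of (6) is attained at the all-ones vector, so the printed objective value should decay toward zero as the horizon grows.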

V. CONCLUSION

In this brief, we have considered the constrained multiagent optimization problem. By smoothing the multiagent optimization problem under a time-varying network framework, we have proposed the RGF method, which provides approximate solutions whose error level is characterized by the smoothing parameters of the objective functions. Finally, a numerical example was given to validate the theoretical results. Several interesting questions remain to be explored. For instance, it would be of interest to develop and analyze an asynchronous version of the proposed RGF method; to present an in-depth analysis of the optimal function-dependent choice of the smoothing parameters; and to derive the expected convergence rate of the proposed RGF method and to study the dependence of the expected error bounds on the problem parameters (such as the dimension of the decision variable).

APPENDIX A
PROOF OF THEOREM 1

By definition of the update for $x_{k+1}^i$, we have

$$\|x_{k+1}^i - x^\star\|^2 = \big\|\Pi_X\big[\vartheta_{k+1}^i - \gamma_k g_{\mu_i}(x_k^i)\big] - x^\star\big\|^2 \le \big\|\vartheta_{k+1}^i - \gamma_k g_{\mu_i}(x_k^i) - x^\star\big\|^2 = \|\vartheta_{k+1}^i - x^\star\|^2 + \gamma_k^2\|g_{\mu_i}(x_k^i)\|^2 - 2\gamma_k\, g_{\mu_i}(x_k^i)^T\big(\vartheta_{k+1}^i - x^\star\big)$$

where the inequality follows from the nonexpansive property of the Euclidean projection operation $\Pi_X[\cdot]$, that is, $\|\Pi_X[a] - \Pi_X[b]\| \le \|a - b\|$ for all $a, b \in \mathbb{R}^m$. Then, taking the conditional expectation yields

$$\mathbb{E}\big[\|x_{k+1}^i - x^\star\|^2 \,\big|\, \mathcal{F}_k\big] \le \|\vartheta_{k+1}^i - x^\star\|^2 + \gamma_k^2\,\mathbb{E}\big[\|g_{\mu_i}(x_k^i)\|^2 \,\big|\, \mathcal{F}_k\big] - 2\gamma_k\,\mathbb{E}\big[g_{\mu_i}(x_k^i) \,\big|\, \mathcal{F}_k\big]^T\big(\vartheta_{k+1}^i - x^\star\big) \le \|\vartheta_{k+1}^i - x^\star\|^2 + \gamma_k^2 (m+4)^2 G_0^2(f_i) - 2\gamma_k \nabla f_{\mu_i}^i(x_k^i)^T\big(\vartheta_{k+1}^i - x^\star\big) \qquad (7)$$

where the last inequality follows from Lemma 1(b) and (c). On the other hand, by Lemma 1(a), we have

$$\nabla f_{\mu_i}^i(x_k^i)^T\big(\vartheta_{k+1}^i - x^\star\big) = \nabla f_{\mu_i}^i(x_k^i)^T\big(x_k^i - x^\star\big) + \nabla f_{\mu_i}^i(x_k^i)^T\big(\vartheta_{k+1}^i - x_k^i\big) \ge f_i(x_k^i) - f_{\mu_i}^i(x^\star) - \|\nabla f_{\mu_i}^i(x_k^i)\|\,\|\vartheta_{k+1}^i - x_k^i\|$$

in which the term $\|\nabla f_{\mu_i}^i(x_k^i)\|$ can be bounded using Lemma 1(b) and Jensen's inequality; in particular,

$$\|\nabla f_{\mu_i}^i(x_k^i)\| = \big\|\mathbb{E}\big[g_{\mu_i}(x_k^i) \,\big|\, \mathcal{F}_k\big]\big\| \le \mathbb{E}\big[\|g_{\mu_i}(x_k^i)\| \,\big|\, \mathcal{F}_k\big] \le \Big(\mathbb{E}\big[\|g_{\mu_i}(x_k^i)\|^2 \,\big|\, \mathcal{F}_k\big]\Big)^{1/2} \le (m+4)G_0(f_i).$$

Then, it follows that

$$2\gamma_k\big(f_i(x_k^i) - f_{\mu_i}^i(x^\star)\big) \le \|\vartheta_{k+1}^i - x^\star\|^2 - \mathbb{E}\big[\|x_{k+1}^i - x^\star\|^2 \,\big|\, \mathcal{F}_k\big] + \gamma_k^2(m+4)^2 G_0^2(f_i) + 2\gamma_k(m+4)G_0(f_i)\,\|\vartheta_{k+1}^i - x_k^i\|. \qquad (8)$$

Summing these inequalities over $i \in V$ and reordering the terms, we obtain

$$2\gamma_k \sum_{i=1}^{n}\big(f_i(x_k^i) - f_{\mu_i}^i(x^\star)\big) \le \sum_{i=1}^{n}\Big(\|\vartheta_{k+1}^i - x^\star\|^2 - \mathbb{E}\big[\|x_{k+1}^i - x^\star\|^2 \,\big|\, \mathcal{F}_k\big]\Big) + \gamma_k^2(m+4)^2\sum_{i=1}^{n} G_0^2(f_i) + 2\gamma_k(m+4)\sum_{i=1}^{n} G_0(f_i)\,\|\vartheta_{k+1}^i - x_k^i\|.$$

Recalling the assumptions made on the communication matrix $P(k)$ (see Assumption 1), we see that, by the convexity of $\|\cdot\|^2$ and double stochasticity,

$$\sum_{i=1}^{n}\|\vartheta_{k+1}^i - x^\star\|^2 \le \sum_{i=1}^{n}\sum_{j=1}^{n}[P(k)]_{ij}\,\|x_k^j - x^\star\|^2 = \sum_{j=1}^{n}\|x_k^j - x^\star\|^2;$$

on the other hand, according to the definition of $\Delta_k$, we obtain

$$\sum_{i=1}^{n}\mathbb{E}\big[\|\vartheta_{k+1}^i - x_k^i\|\big] \le \sum_{i=1}^{n}\sum_{j=1}^{n}[P(k)]_{ij}\,\mathbb{E}\big[\|x_k^j - x_k^i\|\big] \le \sum_{i=1}^{n}\sum_{j=1}^{n}[P(k)]_{ij}\,\Delta_k = n\Delta_k.$$

Taking the total expectation in (8) and then substituting the preceding inequalities into (8), we further obtain

$$2\gamma_k \sum_{i=1}^{n}\mathbb{E}\big[f_i(x_k^i) - f_{\mu_i}^i(x^\star)\big] \le \sum_{i=1}^{n}\Big(\mathbb{E}\big[\|x_k^i - x^\star\|^2\big] - \mathbb{E}\big[\|x_{k+1}^i - x^\star\|^2\big]\Big) + \gamma_k^2\, n(m+4)^2 \bar{G}_0^2 + 2n(m+4)\bar{G}_0\,\gamma_k\Delta_k.$$

Now, let $j \in V$ be an arbitrary but fixed index. From the fact that each function $f_i(x)$ is Lipschitz continuous with constant $G_0(f_i)$, it follows that

$$\mathbb{E}\big[f_i(x_k^i) - f_i(x_k^j)\big] \ge -G_0(f_i)\,\mathbb{E}\big[\|x_k^i - x_k^j\|\big] \ge -\bar{G}_0\,\Delta_k.$$

Substituting this relation into the preceding inequality, summing the relation from $k = 0$ to $t - 1$, and then dividing both sides by $2\sum_{k=0}^{t-1}\gamma_k$ yields

$$\frac{\sum_{k=0}^{t-1}\gamma_k\,\mathbb{E}\big[f(x_k^j)\big]}{\sum_{k=0}^{t-1}\gamma_k} - f(x^\star) \le \sqrt{m}\,\bar{G}_0\sum_{i=1}^{n}\mu_i + \frac{1}{\sum_{k=0}^{t-1}\gamma_k}\Bigg[\frac{1}{2}\sum_{i=1}^{n}\mathbb{E}\big[\|x_0^i - x^\star\|^2\big] + \frac{1}{2}\,n(m+4)^2\bar{G}_0^2\sum_{k=0}^{t-1}\gamma_k^2 + n(m+5)\bar{G}_0\sum_{k=0}^{t-1}\gamma_k\Delta_k\Bigg]$$

where we have used Lemma 1(a), i.e., $f_{\mu_i}^i(x^\star) \le f_i(x^\star) + \sqrt{m}\,\mu_i G_0(f_i)$. The desired result follows from noting that $\sum_{k=0}^{t-1}\gamma_k\,\mathbb{E}[f(x_k^j)]/\sum_{k=0}^{t-1}\gamma_k \ge \mathbb{E}[f(\hat{x}_t^j)]$, by the convexity of $f$ and the definition (3).

APPENDIX B
PROOF OF LEMMA 2

First, we introduce the following auxiliary variable for each $i \in V$ and $k \ge 0$:

$$\varphi_k^i = x_{k+1}^i - \sum_{j=1}^{n}[P(k)]_{ij}\, x_k^j.$$

From the expression for $\varphi_k^i$ it follows that $x_{k+1}^i = \varphi_k^i + \sum_{j=1}^{n}[P(k)]_{ij}\, x_k^j$; applying this relation recursively, we get

$$x_{k+1}^i = \varphi_k^i + \sum_{j=1}^{n}\sum_{\tau=0}^{k-1}[P(k:\tau+1)]_{ij}\,\varphi_\tau^j + \sum_{j=1}^{n}[P(k:0)]_{ij}\, x_0^j.$$

On the other hand, from the expression for $\bar{x}_k$ and the properties of the weight matrix $P(k)$, it follows that $\bar{x}_{k+1} = \frac{1}{n}\sum_{i=1}^{n}\big[\varphi_k^i + \sum_{j=1}^{n}[P(k)]_{ij}\, x_k^j\big] = \frac{1}{n}\sum_{i=1}^{n}\varphi_k^i + \bar{x}_k$, which yields the evolution $\bar{x}_{k+1} = \frac{1}{n}\sum_{j=1}^{n}\sum_{\tau=0}^{k}\varphi_\tau^j + \frac{1}{n}\sum_{j=1}^{n} x_0^j$. Consequently, it follows that

$$\|x_{k+1}^i - \bar{x}_{k+1}\| \le \|\varphi_k^i\| + \frac{1}{n}\sum_{j=1}^{n}\|\varphi_k^j\| + \sum_{j=1}^{n}\sum_{\tau=0}^{k-1}\big|[P(k:\tau+1)]_{ij} - 1/n\big|\,\|\varphi_\tau^j\| + \sum_{j=1}^{n}\big|[P(k:0)]_{ij} - 1/n\big|\,\|x_0^j\|.$$

Then, by resorting to [28, Th. 10.2], for any $k \ge s$ we have

$$\big|[P(k:s)]_{ij} - 1/n\big| \le B^{(k-s+1)/T - 2}.$$

It remains to bound $\|\varphi_\tau^j\|$. By definition of $\varphi_\tau^j$, we see that $\|\varphi_\tau^j\| = \big\|\Pi_X\big[\vartheta_{\tau+1}^j - \gamma_\tau g_{\mu_j}(x_\tau^j)\big] - \vartheta_{\tau+1}^j\big\| \le \gamma_\tau\,\|g_{\mu_j}(x_\tau^j)\|$. Combining the preceding results, taking the conditional expectation and using the relation $\mathbb{E}\big[\|g_{\mu_j}(x_\tau^j)\| \,\big|\, \mathcal{F}_\tau\big] \le (m+4)G_0(f_j)$, and then taking the total expectation, we obtain

$$\mathbb{E}\big[\|x_{k+1}^i - \bar{x}_{k+1}\|\big] \le 2(m+4)\bar{G}_0\,\gamma_k + n B^{(k+1)/T - 2}\max_{i\in V}\mathbb{E}\big[\|x_0^i\|\big] + n(m+4)\bar{G}_0\sum_{\tau=0}^{k-1} B^{(k-\tau)/T - 2}\,\gamma_\tau.$$

Therefore, the desired result follows from noting $\lim_{k\to\infty} B^{(k+1)/T - 2} = 0$ and the following estimate:

$$\lim_{k\to\infty}\sum_{\tau=0}^{k-1} B^{(k-\tau)/T - 2}\,\gamma_\tau \le B^{-2}\tilde{B}\lim_{k\to\infty}\sum_{\tau=0}^{k-1}\tilde{B}^{k-1-\tau}\,\gamma_\tau \le \frac{\tilde{B}\,\tilde{\gamma}}{B^2(1-\tilde{B})}.$$

This completes the proof.

APPENDIX C
PROOF OF THEOREM 2

First, it is easy to see that $\sum_{k=0}^{\infty}\gamma_k = \infty$. Hence, for the second term on the right-hand side of (4), we have

$$\lim_{t\to\infty}\frac{1}{\sum_{k=0}^{t-1}\gamma_k}\cdot\frac{1}{2}\sum_{i=1}^{n}\mathbb{E}\big[\|x_0^i - x^\star\|^2\big] = 0.$$

On the other hand, it is easy to show that $\limsup_{t\to\infty}\sum_{k=0}^{t-1}\gamma_k^2\big/\sum_{k=0}^{t-1}\gamma_k \le \limsup_{k\to\infty}\gamma_k = \tilde{\gamma}$, which implies that the third term on the right-hand side of (4) is bounded as follows:

$$\limsup_{t\to\infty}\frac{1}{\sum_{k=0}^{t-1}\gamma_k}\cdot\frac{1}{2}\,n(m+4)^2\bar{G}_0^2\sum_{k=0}^{t-1}\gamma_k^2 \le \frac{1}{2}\,n(m+4)^2\bar{G}_0^2\,\tilde{\gamma}.$$

Using the same reasoning and noting that $\Delta_k \le \max_{i,j\in V}\big\{\mathbb{E}[\|x_k^i - \bar{x}_k\|] + \mathbb{E}[\|x_k^j - \bar{x}_k\|]\big\}$, from Lemma 2 we conclude that

$$\limsup_{t\to\infty}\frac{1}{\sum_{k=0}^{t-1}\gamma_k}\, n(m+5)\bar{G}_0\sum_{k=0}^{t-1}\gamma_k\Delta_k \le n(m+5)\bar{G}_0\limsup_{k\to\infty}\Delta_k \le 2n(m+4)(m+5)\bigg(2 + \frac{n\tilde{B}}{B^2(1-\tilde{B})}\bigg)\bar{G}_0^2\,\tilde{\gamma}.$$

Hence, combining these estimates and using Theorem 1, we obtain

$$\limsup_{t\to\infty}\mathbb{E}\big[f(\hat{x}_t^j)\big] - f^\star \le \sqrt{m}\,\bar{G}_0\sum_{i=1}^{n}\mu_i + \frac{1}{2}\,n(m+4)^2\bar{G}_0^2\,\tilde{\gamma} + 2n(m+4)(m+5)\bigg(2 + \frac{n\tilde{B}}{B^2(1-\tilde{B})}\bigg)\bar{G}_0^2\,\tilde{\gamma};$$

therefore, the desired result follows from some algebra (bounding $m+4$ by $m+5$).

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their careful reading and constructive comments that improved this brief.

REFERENCES

[1] W. Ren and R. W. Beard, “Consensus seeking in multiagent systems under dynamically changing interaction topologies,” IEEE Trans. Autom. Control, vol. 50, no. 5, pp. 655–661, May 2005.
[2] R. Olfati-Saber and R. M. Murray, “Consensus problems in networks of agents with switching topology and time-delays,” IEEE Trans. Autom. Control, vol. 49, no. 9, pp. 1520–1533, Sep. 2004.
[3] L. Xiao and S. Boyd, “Fast linear iterations for distributed averaging,” Syst. Control Lett., vol. 53, no. 1, pp. 65–78, Sep. 2004.
[4] S. Boyd, A. Ghosh, B. Prabhakar, and D. Shah, “Randomized gossip algorithms,” IEEE Trans. Inf. Theory, vol. 52, no. 6, pp. 2508–2530, Jun. 2006.
[5] D. Yuan, S. Xu, and H. Zhao, “Distributed primal–dual subgradient method for multiagent optimization via consensus algorithms,” IEEE Trans. Syst., Man, Cybern. B, Cybern., vol. 41, no. 6, pp. 1715–1724, Dec. 2011.
[6] M. Zhu and S. Martinez, “On distributed convex optimization under inequality and equality constraints,” IEEE Trans. Autom. Control, vol. 57, no. 1, pp. 151–164, Jan. 2012.
[7] J. C. Duchi, A. Agarwal, and M. J. Wainwright, “Dual averaging for distributed optimization: Convergence analysis and network scaling,” IEEE Trans. Autom. Control, vol. 57, no. 3, pp. 592–606, Mar. 2012.
[8] J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans, “Distributed asynchronous deterministic and stochastic gradient optimization algorithms,” IEEE Trans. Autom. Control, vol. 31, no. 9, pp. 803–812, Sep. 1986.
[9] J. N. Tsitsiklis, “Problems in decentralized decision making and computation,” Ph.D. dissertation, Dept. Electr. Eng. Comput. Sci., Massachusetts Inst. Technol., Cambridge, MA, USA, 1984.
[10] D. Yuan, S. Xu, H. Zhao, and Y. Chu, “Distributed average consensus via gossip algorithm with real-valued and quantized data for 0

Randomized gradient-free method for multiagent optimization over time-varying networks.

In this brief, we consider the multiagent optimization over a network where multiple agents try to minimize a sum of nonsmooth but Lipschitz continuou...
390KB Sizes 0 Downloads 6 Views