IEEE Transactions on Automatic Control, Vol.59, No.5, 1131-1146, 2014
Fast Distributed Gradient Methods
We study distributed optimization problems when N nodes minimize the sum of their individual costs subject to a common vector variable. The costs are convex, have Lipschitz continuous gradient (with constant L), and bounded gradient. We propose two fast distributed gradient algorithms based on the centralized Nesterov gradient algorithm and establish their convergence rates in terms of the per-node communications K and the per-node gradient evaluations k. Our first method, Distributed Nesterov Gradient, achieves rates O(log K/K) and O(log k/k). Our second method, Distributed Nesterov Gradient with Consensus iterations, assumes that all nodes know L and mu(W), the second largest singular value of the N x N doubly stochastic weight matrix W. It achieves rates O(1/K^(2-epsilon)) and O(1/k^2) (epsilon > 0 arbitrarily small). Further, for both methods we give the explicit dependence of the convergence constants on N and W. Simulation examples illustrate our findings.
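The setting in the abstract can be illustrated with a minimal sketch of a distributed Nesterov-style iteration: each node mixes its neighbors' auxiliary variables through W, takes a local gradient step, and then applies a momentum update. Everything specific here is an assumption made for illustration and is not taken from the abstract: the step size c/(k+1), the momentum k/(k+3), the scalar Huber costs (chosen because they are convex with Lipschitz continuous and bounded gradient), the 4-node ring weight matrix, and the names `huber_grad` and `distributed_nesterov`.

```python
def huber_grad(x, a, delta=5.0):
    """Gradient of a Huber cost centered at a; clipped, hence bounded by delta."""
    return max(-delta, min(delta, x - a))


def distributed_nesterov(a, W, num_iters, c=0.5):
    """Run a distributed Nesterov-style iteration on scalar Huber costs.

    a[i] is the center of node i's cost, W is a doubly stochastic weight
    matrix, and c scales the diminishing step size (assumed, not from the paper).
    """
    n = len(a)
    x = [0.0] * n  # per-node solution estimates
    y = [0.0] * n  # per-node auxiliary (momentum) variables
    for k in range(num_iters):
        alpha = c / (k + 1)  # assumed diminishing step size
        beta = k / (k + 3)   # assumed Nesterov-style momentum weight
        # Consensus step on y (one communication round) plus a local gradient step.
        x_new = [
            sum(W[i][j] * y[j] for j in range(n)) - alpha * huber_grad(y[i], a[i])
            for i in range(n)
        ]
        # Momentum extrapolation using the previous estimate.
        y = [x_new[i] + beta * (x_new[i] - x[i]) for i in range(n)]
        x = x_new
    return x


# Four nodes on a ring; each row and column of W sums to 1.
a = [0.0, 0.5, 1.0, 1.5]
W = [
    [0.50, 0.25, 0.00, 0.25],
    [0.25, 0.50, 0.25, 0.00],
    [0.00, 0.25, 0.50, 0.25],
    [0.25, 0.00, 0.25, 0.50],
]
estimates = distributed_nesterov(a, W, num_iters=2000)
print(estimates)
```

In this toy instance the sum of the costs is minimized at the mean of the centers, 0.75, so every node's estimate should approach that value. Each pass of the loop uses one communication round and one gradient evaluation per node, which is the sense in which K and k coincide for a method without inner consensus iterations.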