A Distributed Newton Method for Network Utility Maximization


Authors: Ermin Wei, Asuman Ozdaglar, Ali Jadbabaie

October 29, 2018

Abstract

Most existing work uses dual decomposition and first-order methods to solve Network Utility Maximization (NUM) problems in a distributed manner, which suffer from slow rates of convergence. This paper develops an alternative, fast-converging, distributed Newton-type algorithm for solving NUM problems with self-concordant utility functions. By using novel matrix splitting techniques, both primal and dual updates for the Newton step can be computed using iterative schemes in a decentralized manner. We propose a stepsize rule and provide a distributed procedure to compute it in finitely many iterations. The key feature of our direction and stepsize computation schemes is that both are implemented using the same distributed information exchange mechanism employed by first-order methods. We show that even when the Newton direction and the stepsize in our method are computed within some error (due to finite truncation of the iterative schemes), the resulting objective function value still converges superlinearly, in terms of primal iterations, to an explicitly characterized error neighborhood. Simulation results demonstrate significant convergence rate improvement of our algorithm relative to the existing first-order methods based on dual decomposition.

*This work was supported by the National Science Foundation under Career grant DMI-0545910, the DARPA ITMANET program, ONR MURI N000140810747, and the AFOSR Complex Networks Program.
Author affiliations: Ermin Wei and Asuman Ozdaglar are with the Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology; Ali Jadbabaie is with the Department of Electrical and Systems Engineering and the GRASP Laboratory, University of Pennsylvania.

1 Introduction

Most of today's communication networks are large-scale and comprise agents with heterogeneous preferences. Lack of access to centralized information in such networks necessitates the design of distributed control algorithms that can operate based on locally available information. Some applications include routing and congestion control in the Internet, data collection and processing in sensor networks, and cross-layer design in wireless networks. This work focuses on the rate control problem in wireline networks, which can be formulated in the Network Utility Maximization (NUM) framework proposed in [22] (see also [25], [33], and [11]). NUM problems are characterized by a fixed network and a set of sources, which send information over the network along predetermined routes. Each source has a local utility function over the rate at which it sends information. The goal is to determine the source rates that maximize the sum of utilities subject to link capacity constraints. The standard approach for solving NUM problems relies on dual decomposition and subgradient (or first-order) methods, which, through a price exchange mechanism among the sources and the links, yield algorithms that can operate on the basis of local information.¹ One major shortcoming of this approach is the slow rate of convergence. In this paper, we propose a novel Newton-type second-order method for solving the NUM problem in a distributed manner, which leads to significantly faster convergence.
Our approach involves transforming the inequality-constrained NUM problem into an equality-constrained one through the introduction of slack variables and logarithmic barrier functions, and using an equality-constrained Newton method for the reformulated problem. There are two challenges in implementing this method in a distributed manner. The first challenge is the computation of the Newton direction. This computation involves a matrix inversion, which is costly and requires global information. We solve this problem by using an iterative scheme based on a novel matrix splitting technique. Since the objective function of the (equality-constrained) NUM problem is separable, i.e., it is the sum of functions over each of the variables, this splitting enables computation of the Newton direction using decentralized algorithms based on limited "scalar" information exchange between sources and links. This exchange involves destinations iteratively sending route prices (aggregated link prices, or dual variables, along a route) to the sources, and sources sending the route price scaled by the Hessian to the links along their routes. Therefore, our algorithm has a comparable level of information exchange with the first-order methods applied to the NUM problem. The second challenge is related to the computation of a stepsize rule that can guarantee local superlinear convergence of the primal iterations. Instead of the iterative backtracking rules typically used with Newton methods, we propose a stepsize rule which is inversely proportional to the inexact Newton decrement (where the inexactness arises due to errors in the computation of the Newton direction) if this decrement is above a certain threshold, and which takes the form of a pure Newton step otherwise.
Computation of the inexact Newton decrement involves aggregating local information from the sources and links in the network. We propose a novel distributed procedure for computing the inexact Newton decrement in a finite number of steps, using again the same information exchange mechanism employed by first-order methods.

Since our method uses iterative schemes to compute the Newton direction, exact computation is not feasible. Another major contribution of our work is to consider a truncated version of this scheme, allow error in the stepsize computation, and present a convergence rate analysis of the constrained Newton method when the stepsize and the Newton direction are estimated with some error. We show that when these errors are sufficiently small, the value of the objective function converges superlinearly, in terms of primal iterations, to a neighborhood of the optimal objective function value, whose size is explicitly quantified as a function of the errors and the bounds on them.

Our work contributes to the growing literature on distributed optimization and control of multi-agent networked systems. There are two standard approaches for designing distributed algorithms for such problems. The first approach, as mentioned above, uses dual decomposition and subgradient methods, which for some problems, including NUM problems, lead to iterative distributed algorithms (see [22], [25]). Subsequent work by Athuraliya and Low in [1] uses diagonal scaling to approximate Newton steps to speed up the subgradient algorithm while maintaining its distributed nature.

¹The price exchange mechanism involves destinations (end nodes of a route) sending route prices (aggregated over the links along the route) to sources, sources updating their rates based on these prices, and finally links updating prices based on the new rates sent over the network.
Despite improvements in speed over the first-order methods, as we shall see, the performance of this modified algorithm does not achieve the rate gains obtained by second-order methods.

The second approach involves consensus-based schemes, in which agents exchange local estimates with their neighbors with the goal of aggregating information over an exogenous (fixed or time-varying) network topology (see [34], [8], [29], [35], [17], [31], [18] and [32]). It has been shown that, under some mild assumptions on the connectivity of the graph and the updating rules, the distance from the vector formed by the current estimates to consensus diminishes linearly. Consensus schemes can be used to compute the average of local values, or more generally as a building block for developing distributed optimization algorithms with linear/sublinear rates of convergence ([28]). The stepsize for the distributed Newton method could be computed using consensus-type algorithms. However, since the distributed Newton method achieves a quadratic rate of convergence for the primal iterations, using consensus would result in prohibitively slow stepsize computation at each iteration, and it is hence avoided in our method.

Other than the papers cited above, our paper is also related to [4], [23], [6] and [18]. In [4], Bertsekas and Gafni studied a projected Newton method for optimization problems with twice differentiable objective functions and simplex constraints. They proposed finding the Newton direction (exactly or approximately) using a conjugate gradient method. This work showed that when applied to multi-commodity network flow problems, the conjugate gradient iterations can be obtained using simple graph operations; however, it did not investigate distributed implementations.
Similarly, in [23], Klincewicz proposed a Newton method for network flow problems that computes the dual variables at each step using an iterative conjugate gradient algorithm. He showed that the conjugate gradient iterations can be implemented using a "distributed" scheme that involves simple operations and information exchange along a spanning tree. Spanning tree based computations involve passing all information to a centralized node and may therefore be restrictive for NUM problems, which are characterized by decentralized (potentially autonomous) sources.

In [6], the authors developed a distributed Newton-type method for the NUM problem using a belief propagation algorithm. Belief propagation algorithms, while performing well in practice, lack systematic convergence guarantees. Another recent paper [18] studied a Newton method for equality-constrained network optimization problems and presented a convergence analysis under Lipschitz assumptions. In this paper, we focus on an inequality-constrained problem, which is reformulated as an equality-constrained problem using barrier functions. Therefore, this problem does not satisfy Lipschitz assumptions. Instead, we assume that the utility functions are self-concordant and present a novel convergence analysis using properties of self-concordant functions.

Our analysis of the convergence of the algorithm also relates to work on convergence rate analysis of inexact Newton methods (see [14], [20]). These works focus on providing conditions on the amount of error at each iteration, relative to the norm of the gradient at the current iterate, that ensure superlinear convergence to the exact optimal solution (essentially requiring the error to vanish in the limit). Even though these analyses can provide a superlinear rate of convergence, the vanishing error requirement can be too restrictive for practical implementations.
Another novel feature of our analysis is the consideration of convergence to an approximate neighborhood of the optimal solution. In particular, we allow a fixed error level to be maintained at each step of the Newton direction computation and show that superlinear convergence to an error neighborhood is achieved by the primal iterates, where the size of the neighborhood can be controlled by tuning the parameters of the algorithm. Hence, our work also contributes to the literature on error analysis for inexact Newton methods.

The rest of the paper is organized as follows: Section 2 defines the problem formulation and related transformations. Section 3 describes the exact constrained primal-dual Newton method for this problem. Section 4 presents a distributed iterative scheme for computing the dual Newton step and the distributed inexact Newton-type algorithm. Section 5 contains the rate of convergence analysis for our algorithm. Section 6 presents simulation results that demonstrate the convergence speed improvement of our algorithm over the existing methods with linear convergence rates. Section 7 contains our concluding remarks.

Basic Notation and Notions: A vector is viewed as a column vector, unless clearly stated otherwise. We write $\mathbb{R}_+$ to denote the set of nonnegative real numbers, i.e., $\mathbb{R}_+ = [0, \infty)$. We use subscripts to denote the components of a vector and superscripts to index a sequence, i.e., $x_i$ is the $i$th component of vector $x$ and $x^k$ is the $k$th element of a sequence. When $x_i \ge 0$ for all components $i$ of a vector $x$, we write $x \ge 0$. For a matrix $A$, we write $A_{ij}$ to denote the entry in the $i$th row and $j$th column, $[A]_i$ to denote the $i$th column of $A$, and $[A]^j$ to denote the $j$th row of $A$. We write $I(n)$ to denote the identity matrix of dimension $n \times n$. We use $x'$ and $A'$ to denote the transpose of a vector $x$ and a matrix $A$, respectively.
For a real-valued function $f : X \to \mathbb{R}$, where $X$ is a subset of $\mathbb{R}^n$, the gradient vector and the Hessian matrix of $f$ at $x$ in $X$ are denoted by $\nabla f(x)$ and $\nabla^2 f(x)$, respectively. We use the vector $e$ to denote the vector of all ones.

A real-valued convex function $g : X \to \mathbb{R}$, where $X$ is a subset of $\mathbb{R}$, is self-concordant if it is three times continuously differentiable and $|g'''(x)| \le 2 g''(x)^{3/2}$ for all $x$ in its domain.² For real-valued functions on $\mathbb{R}^n$, a convex function $g : X \to \mathbb{R}$, where $X$ is a subset of $\mathbb{R}^n$, is self-concordant if it is self-concordant along every direction in its domain, i.e., if the function $\tilde g(t) = g(x + tv)$ is self-concordant in $t$ for all $x$ and $v$. Operations that preserve the self-concordance property include summing, scaling by a factor $\alpha \ge 1$, and composition with an affine transformation (see [9], Chapter 9, for more details).

2 Network Utility Maximization Problem

We consider a network represented by a set $\mathcal{L} = \{1, \dots, L\}$ of (directed) links of finite nonzero capacity given by $c = [c_l]_{l \in \mathcal{L}}$ with $c > 0$. The network is shared by a set $\mathcal{S} = \{1, \dots, S\}$ of sources, each of which transmits information along a predetermined route. For each link $l$, let $S(l)$ denote the set of sources that use it. For each source $i$, let $L(i)$ denote the set of links it uses. We also denote the nonnegative source rate vector by $s = [s_i]_{i \in \mathcal{S}}$. The capacity constraint at the links can be compactly expressed as

    Rs \le c,

where $R$ is the routing matrix³ of dimension $L \times S$, i.e.,

    R_{ij} = 1 if link $i$ is on the route of source $j$, and R_{ij} = 0 otherwise.    (1)

We associate a utility function $U_i : \mathbb{R}_+ \to \mathbb{R}$ with each source $i$, i.e., $U_i(s_i)$ denotes the utility of source $i$ as a function of the source rate $s_i$. We assume the utilities are additive, so that the overall utility of the network is given by $\sum_{i=1}^S U_i(s_i)$.
Thus the Network Utility Maximization (NUM) problem can be formulated as

    maximize   \sum_{i=1}^S U_i(s_i)    (2)
    subject to Rs \le c,  s \ge 0.

We adopt the following assumption.

Assumption 1. The utility functions $U_i : \mathbb{R}_+ \to \mathbb{R}$ are strictly concave and monotonically nondecreasing on $(0, \infty)$. The functions $-U_i : \mathbb{R}_+ \to \mathbb{R}$ are self-concordant on $(0, \infty)$.

The self-concordance assumption is satisfied by standard utility functions considered in the literature, for instance logarithmic (i.e., weighted proportionally fair [33]) utility functions and concave quadratic utility functions, and is adopted here to allow a self-concordant analysis in establishing local quadratic convergence. We use $h(x)$ to denote the (negative of the) objective function of problem (2), i.e., $h(x) = -\sum_{i=1}^S U_i(x_i)$, and $h^*$ to denote the (negative of the) optimal value of this problem.⁴ Since $h(x)$ is continuous and the feasible set of problem (2) is compact, it follows that problem (2) has an optimal solution, and therefore $h^*$ is finite. Moreover, the interior of the feasible set is nonempty, i.e., there exists a feasible solution $x$ with $x_i = \frac{c}{S+1}$ for all $i \in \mathcal{S}$, with $c > 0$.⁵

²Self-concordant functions are defined through the following more general definition: a real-valued, three times continuously differentiable convex function $g : X \to \mathbb{R}$, where $X$ is a subset of $\mathbb{R}$, is self-concordant if there exists a constant $a > 0$ such that $|g'''(x)| \le 2 a^{-1/2} g''(x)^{3/2}$ for all $x$ in its domain [30], [19]. Here we focus on the case $a = 1$ for notational simplicity in the analysis.

³This is also referred to as the link-source incidence matrix in the literature. Without loss of generality, we assume that each source flow traverses at least one link, each link is used by at least one source, and the links form a connected graph.
To facilitate the development of a distributed Newton-type method, we consider a related equality-constrained problem, obtained by introducing nonnegative slack variables $[y_l]_{l \in \mathcal{L}}$ for the capacity constraints, defined by

    \sum_{j=1}^S R_{lj} s_j + y_l = c_l   for l = 1, 2, \dots, L,    (3)

and logarithmic barrier functions for the nonnegativity constraints (which can be done since the feasible set of (2) has a nonempty interior).⁶ We denote the new decision vector by $x = ([s_i]'_{i \in \mathcal{S}}, [y_l]'_{l \in \mathcal{L}})'$. This problem can be written as

    minimize   -\sum_{i=1}^S U_i(x_i) - \mu \sum_{i=1}^{S+L} \log(x_i)    (4)
    subject to Ax = c,

where $A$ is the $L \times (S+L)$-dimensional matrix given by

    A = [R \; I(L)],    (5)

and $\mu$ is a nonnegative barrier function coefficient. We use $f(x)$ to denote the objective function of problem (4), i.e.,

    f(x) = -\sum_{i=1}^S U_i(x_i) - \mu \sum_{i=1}^{S+L} \log(x_i),    (6)

and $f^*$ to denote the optimal value of this problem, which is finite for positive $\mu$.⁷

By Assumption 1, the function $f(x)$ is separable, strictly convex, and has a positive definite diagonal Hessian matrix on the positive orthant. The function $f(x)$ is also self-concordant for $\mu \ge 1$, since both summing and scaling by a factor $\mu \ge 1$ preserve the self-concordance property. We write the optimal solution of problem (4) for a fixed barrier function coefficient $\mu$ as $x(\mu)$.

⁴We consider the negative of the objective function value in order to work with a minimization problem.

⁵One possible value for $c$ is $c = \min_l \{c_l\}$.

⁶We adopt the convention that $\log(x) = -\infty$ for $x \le 0$.

⁷This problem has a feasible solution, hence $f^*$ is upper bounded. Each of the variables $x_i$ is upper bounded by $\bar c$, where $\bar c = \max_l \{c_l\}$; hence, by monotonicity of the utility and logarithm functions, the optimal objective function value is lower bounded. Note that in the optimal solution of problem (4), $x_i \ne 0$ for all $i$, due to the logarithmic barrier functions.
One can show that as the barrier function coefficient $\mu$ approaches 0, the optimal solution of problem (4) approaches that of problem (2), when the constraint set in (2) has nonempty interior and is convex [3], [15]. Hence, by continuity from Assumption 1, $h(x(\mu))$ approaches $h^*$. Therefore, in the rest of this paper, unless clearly stated otherwise, we study iterative distributed methods for solving problem (4) for a given $\mu$. In order to preserve the self-concordance property of the function $f$, which will be used in our convergence analysis, we first develop a Newton-type algorithm for $\mu \ge 1$. In Section 5.3, we show that problem (4) for any $\mu > 0$ can be tackled by solving two instances of problem (4) with different coefficients $\mu \ge 1$, leading to a solution $x(\mu)$ that satisfies $\frac{h(x(\mu)) - h^*}{h^*} \le a$ for any positive scalar $a$.

3 Exact Newton Method

For each fixed $\mu$, problem (4) is feasible and has a convex objective function, affine constraints, and a finite optimal value $f^*$. Therefore, we can use a strong duality theorem to show that, for problem (4), there is no duality gap and there exists a dual optimal solution (see [5]). Moreover, since the matrix $A$ has full row rank, we can use a (feasible start) equality-constrained Newton method to solve problem (4) (see [9], Chapter 10).

3.1 Feasible Initialization

We initialize the algorithm with some feasible and strictly positive vector $x^0$. For example, one such initial vector is given by

    x^0_i = \frac{c}{S+1}   for i = 1, 2, \dots, S,    (7)
    x^0_{l+S} = c_l - \sum_{j=1}^S R_{lj} \frac{c}{S+1}   for l = 1, 2, \dots, L,

where $c_l$ is the finite capacity of link $l$, $c$ is the minimum (nonzero) link capacity, $S$ is the total number of sources in the network, and $R$ is the routing matrix [cf. Eq. (1)].
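As a sketch, the initialization (7) can be computed and checked numerically. The network below (routing matrix and capacities) is a hypothetical illustration, not an instance from the paper:

```python
import numpy as np

# Hypothetical 5-link, 2-source network (illustration values): source 1
# uses links 1, 3, 4 and source 2 uses links 2, 3, 5.
R = np.array([[1, 0],
              [0, 1],
              [1, 1],
              [1, 0],
              [0, 1]], dtype=float)          # routing matrix, Eq. (1)
c = np.array([1.0, 2.0, 1.5, 1.0, 2.0])      # link capacities
L, S = R.shape
A = np.hstack([R, np.eye(L)])                # A = [R  I(L)], Eq. (5)

c_min = c.min()                              # footnote 5: c = min_l {c_l}
s0 = np.full(S, c_min / (S + 1))             # source rates, Eq. (7)
y0 = c - R @ s0                              # slack variables, Eqs. (7) and (3)
x0 = np.concatenate([s0, y0])

assert np.all(x0 > 0)                        # strictly positive start
assert np.allclose(A @ x0, c)                # feasibility: A x0 = c
```

Strict positivity of the slacks follows because each $c_l \ge c$ and each link carries at most $S$ flows of rate $c/(S+1)$.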
3.2 Iterative Update Rule

Given an initial feasible vector $x^0$, the algorithm generates the iterates by

    x^{k+1} = x^k + d^k \Delta x^k,    (8)

where $d^k$ is a positive stepsize and $\Delta x^k$ is the (primal) Newton direction, given as the solution of the following system of linear equations:⁸

    \begin{pmatrix} \nabla^2 f(x^k) & A' \\ A & 0 \end{pmatrix} \begin{pmatrix} \Delta x^k \\ w^k \end{pmatrix} = - \begin{pmatrix} \nabla f(x^k) \\ 0 \end{pmatrix}.    (9)

We will refer to $x^k$ as the primal vector and $w^k$ as the dual vector (and to their components as the primal and dual variables, respectively). We also refer to $w^k$ as the price vector, since the dual variables $[w^k_l]_{l \in \mathcal{L}}$ associated with the link capacity constraints can be viewed as prices for using the links. For notational convenience, we will use $H_k = \nabla^2 f(x^k)$ to denote the Hessian matrix in the rest of the paper. Solving for $\Delta x^k$ and $w^k$ in the preceding system yields

    \Delta x^k = -H_k^{-1} (\nabla f(x^k) + A' w^k),    (10)
    (A H_k^{-1} A') w^k = -A H_k^{-1} \nabla f(x^k).    (11)

This system has a unique solution for all $k$. To see this, note that $H_k$ is a diagonal matrix with entries

    (H_k)_{ii} = -\frac{\partial^2 U_i(x^k_i)}{\partial x_i^2} + \frac{\mu}{(x^k_i)^2}   for 1 \le i \le S,
    (H_k)_{ii} = \frac{\mu}{(x^k_i)^2}   for S+1 \le i \le S+L.    (12)

By Assumption 1, the functions $U_i$ are strictly concave, which implies $\frac{\partial^2 U_i(x^k_i)}{\partial x_i^2} \le 0$. Moreover, the primal vector $x^k$ is bounded (since the method maintains feasibility) and, as we shall see in Section 4.4, can be guaranteed to remain strictly positive by a proper choice of stepsize. Therefore, the entries $(H_k)_{ii} > 0$ are well defined for all $i$, implying that the Hessian matrix $H_k$ is invertible. Due to the structure of $A$ [cf. Eq. (5)], the column span of $A$ is the entire space $\mathbb{R}^L$, and hence the matrix $A H_k^{-1} A'$ is also invertible.⁹ This shows that the preceding system of linear equations can be solved uniquely for all $k$.
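The steps (10)-(12) can be sketched numerically. The instance below (logarithmic utilities $U_i(s) = \log s$, $\mu = 1$, and an illustrative routing matrix) is a hypothetical example, not one from the paper:

```python
import numpy as np

# Hypothetical instance: U_i(s) = log(s) (self-concordant), mu = 1,
# and an illustrative 5-link / 2-source routing matrix.
R = np.array([[1, 0], [0, 1], [1, 1], [1, 0], [0, 1]], dtype=float)
c = np.array([1.0, 2.0, 1.5, 1.0, 2.0])
L, S = R.shape
A = np.hstack([R, np.eye(L)])                 # A = [R  I(L)], Eq. (5)
mu = 1.0

s0 = np.full(S, c.min() / (S + 1))            # feasible start, Eq. (7)
x = np.concatenate([s0, c - R @ s0])

src = np.arange(S + L) < S                    # True for source coordinates
grad = np.where(src, -(1 + mu) / x, -mu / x)  # gradient of f in Eq. (6)
H_diag = np.where(src, (1 + mu) / x**2, mu / x**2)   # Hessian diag, Eq. (12)
H_inv = np.diag(1.0 / H_diag)

# Dual vector from Eq. (11), then primal Newton direction from Eq. (10)
w = np.linalg.solve(A @ H_inv @ A.T, -A @ H_inv @ grad)
dx = -H_inv @ (grad + A.T @ w)

# (dx, w) satisfies the KKT system (9): H dx + A' w = -grad f, A dx = 0
assert np.allclose(H_diag * dx + A.T @ w, -grad)
assert np.allclose(A @ dx, 0)
```

The two final assertions check exactly the two block rows of system (9), which hold by construction of (10) and (11).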
The objective function $f$ is separable in $x_i$; therefore, given the values $w^k_l$ for $l$ in $L(i)$, the Newton direction $\Delta x^k_i$ can be computed by each source $i$ using local information available to that source. However, the computation of the vector $w^k$ at a given primal solution $x^k$ cannot be implemented in a decentralized manner, since the evaluation of the matrix inverse $(A H_k^{-1} A')^{-1}$ requires global information. The following section provides a distributed inexact Newton method, based on computing the vector $w^k$ using a decentralized iterative scheme.

⁸This is a primal-dual method, with the vectors $\Delta x^k$ and $w^k$ acting as primal direction and dual variables, respectively.

⁹If for some $x \in \mathbb{R}^L$ we have $A H_k^{-1} A' x = 0$, then $x' A H_k^{-1} A' x = \| H_k^{-1/2} A' x \|^2 = 0$, which implies $\| A' x \|_2 = 0$, because the matrix $H_k$ is invertible. The rows of the matrix $A'$ span $\mathbb{R}^L$; therefore we have $x = 0$. This shows that the matrix $A H_k^{-1} A'$ is invertible.

4 Distributed Inexact Newton Method

In this section, we introduce a distributed Newton method that uses ideas from matrix splitting in order to compute the dual vector $w^k$ at each $k$ using an iterative scheme. Before proceeding to present the details of the algorithm, we first introduce some preliminaries on matrix splitting.

4.1 Preliminaries on Matrix Splitting

Matrix splitting can be used to solve a system of linear equations given by

    Gy = a,

where $G$ is an $n \times n$ matrix and $a$ is an $n$-dimensional vector. Suppose that the matrix $G$ can be expressed as the sum of an invertible matrix $M$ and a matrix $N$, i.e.,

    G = M + N.    (13)

Let $y^0$ be an arbitrary $n$-dimensional vector. A sequence $\{y^k\}$ can be generated by the following iteration:

    y^{k+1} = -M^{-1} N y^k + M^{-1} a.    (14)

It can be seen that the sequence $\{y^k\}$ converges as $k \to \infty$ if and only if the spectral radius of the matrix $M^{-1} N$ is strictly bounded above by 1.
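As a concrete illustration of iteration (14), the following sketch solves a small system with one particular splitting and checks the spectral radius condition numerically; the matrices $G$, $a$, and the choice $M = 2\,\mathrm{diag}(G)$ are hypothetical illustration values:

```python
import numpy as np

# Solve G y = a by matrix splitting, Eqs. (13)-(14).
G = np.array([[4.0, 1.0],
              [1.0, 3.0]])           # symmetric positive definite
a = np.array([1.0, 2.0])
M = 2.0 * np.diag(np.diag(G))        # invertible (diagonal) part
N = G - M                            # so that G = M + N, Eq. (13)

# Convergence condition: spectral radius of M^{-1} N strictly below 1
rho = max(abs(np.linalg.eigvals(np.linalg.solve(M, N))))
assert rho < 1

y = np.zeros(2)                      # arbitrary initial vector y^0
for _ in range(200):
    y = -np.linalg.solve(M, N @ y) + np.linalg.solve(M, a)   # Eq. (14)

# The limit solves the original system G y = a
assert np.allclose(G @ y, a, atol=1e-9)
```

For this splitting, $M - N = 2M - G$ is strictly diagonally dominant with positive diagonal, which (by the results stated next) guarantees the spectral radius condition checked above.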
When the sequence $\{y^k\}$ converges, its limit $y^*$ solves the original linear system, i.e., $G y^* = a$ (see [2] and [13] for more details). Hence, the key to solving the linear equation via matrix splitting is bounding the spectral radius of the matrix $M^{-1} N$. Such a bound can be obtained using the following result (see Theorem 2.5.3 of [13]).

Theorem 4.1. Let $G$ be a real symmetric matrix. Let $M$ and $N$ be matrices such that $G = M + N$, and assume that $M$ is invertible and that both matrices $M + N$ and $M - N$ are positive definite. Then the spectral radius of $M^{-1} N$, denoted by $\rho(M^{-1} N)$, satisfies $\rho(M^{-1} N) < 1$.

By the above theorem, if $G$ is a real, symmetric, positive definite matrix and $M$ is a nonsingular matrix, then one sufficient condition for the iteration (14) to converge is that the matrix $M - N$ is positive definite. This can be guaranteed using the Gershgorin Circle Theorem, which we introduce next (see [37] for more details).

Theorem 4.2. (Gershgorin Circle Theorem) Let $G$ be an $n \times n$ matrix, and define $r_i(G) = \sum_{j \ne i} |G_{ij}|$. Then each eigenvalue of $G$ lies in one of the Gershgorin sets $\{\Gamma_i\}$, with $\Gamma_i$ defined as disks in the complex plane, i.e.,

    \Gamma_i = \{ z \in \mathbb{C} \mid |z - G_{ii}| \le r_i(G) \}.

One corollary of the above theorem is that if a matrix $G$ is strictly diagonally dominant, i.e., $|G_{ii}| > \sum_{j \ne i} |G_{ij}|$, with $G_{ii} > 0$ for all $i$, then the real parts of all of its eigenvalues lie in the positive half of the real line, and thus the matrix is positive definite. Hence, a sufficient condition for the matrix $M - N$ to be positive definite is that $M - N$ is strictly diagonally dominant with strictly positive diagonal entries.

4.2 Distributed Computation of the Dual Vector

We use the matrix splitting scheme introduced in the preceding section to compute the dual vector $w^k$ in Eq. (11) in a distributed manner.
Let $D_k$ be a diagonal matrix with diagonal entries

    (D_k)_{ll} = (A H_k^{-1} A')_{ll},    (15)

and let the matrix $B_k$ be given by

    B_k = A H_k^{-1} A' - D_k.    (16)

Let the matrix $\bar B_k$ be a diagonal matrix with diagonal entries

    (\bar B_k)_{ii} = \sum_{j=1}^L (B_k)_{ij}.    (17)

By splitting the matrix $A H_k^{-1} A'$ as the sum of $D_k + \bar B_k$ and $B_k - \bar B_k$, we obtain the following result.

Theorem 4.3. For a given $k > 0$, let $D_k$, $B_k$, $\bar B_k$ be the matrices defined in Eqs. (15), (16) and (17). Let $w(0)$ be an arbitrary initial vector and consider the sequence $\{w(t)\}$ generated by the iteration

    w(t+1) = (D_k + \bar B_k)^{-1} (\bar B_k - B_k) w(t) + (D_k + \bar B_k)^{-1} (-A H_k^{-1} \nabla f(x^k)),    (18)

for all $t \ge 0$. Then the spectral radius of the matrix $(D_k + \bar B_k)^{-1} (B_k - \bar B_k)$ is strictly bounded above by 1; hence the sequence $\{w(t)\}$ converges as $t \to \infty$, and its limit is the solution of Eq. (11).

Proof. We split the matrix $A H_k^{-1} A'$ as

    A H_k^{-1} A' = (D_k + \bar B_k) + (B_k - \bar B_k)    (19)

and use the iterative scheme presented in Eqs. (13) and (14) to solve Eq. (11). For all $k$, both the real matrix $H_k$ and its inverse $H_k^{-1}$ are positive definite and diagonal. The matrix $A$ has full row rank and is element-wise nonnegative. Therefore, the product $A H_k^{-1} A'$ is real, symmetric, element-wise nonnegative and positive definite. We let

    Q_k = (D_k + \bar B_k) - (B_k - \bar B_k) = D_k + 2 \bar B_k - B_k    (20)

denote the difference matrix. By the definition of $\bar B_k$ [cf. Eq. (17)], the matrix $2 \bar B_k - B_k$ is diagonally dominant with nonnegative diagonal entries. Moreover, due to the strict positivity of the second derivatives of the logarithmic barrier functions, we have $(D_k)_{ii} > 0$ for all $i$. Therefore, the matrix $Q_k$ is strictly diagonally dominant. By Theorem 4.2, such matrices are positive definite.
Therefore, by Theorem 4.1, the spectral radius of the matrix $(D_k + \bar B_k)^{-1} (B_k - \bar B_k)$ is strictly bounded above by 1. Hence, the splitting scheme (19) guarantees that the sequence $\{w(t)\}$ generated by iteration (18) converges to the solution of Eq. (11). ∎

This provides an iterative scheme to compute the dual vector $w^k$ at each primal iteration $k$. We will refer to the iterative scheme defined in Eq. (18) as the dual iteration.

There are many ways to split the matrix $A H_k^{-1} A'$. The particular one in Eq. (19) is chosen here due to three desirable features. First, it guarantees that the difference matrix $Q_k$ [cf. Eq. (20)] is strictly diagonally dominant, and hence ensures convergence of the sequence $\{w(t)\}$. Second, with this splitting scheme the matrix $D_k + \bar B_k$ is diagonal, which eliminates the need for global information when calculating its inverse. The third feature enables us to study the convergence rate of iteration (18) in terms of a dual (routing) graph, which we introduce next.

Definition 1. Consider a network $G = \{\mathcal{L}, \mathcal{S}\}$, represented by a set $\mathcal{L} = \{1, \dots, L\}$ of (directed) links and a set $\mathcal{S} = \{1, \dots, S\}$ of sources. The links form a strongly connected graph, and each source sends information along a predetermined route. The weighted dual (routing) graph is $\tilde G = \{\tilde N, \tilde L\}$, where $\tilde N$ is the set of nodes and $\tilde L$ is the set of (directed) links, defined by:

A. $\tilde N = \mathcal{L}$;

B. a link is present from node $L_i$ to node $L_j$ in $\tilde G$ if and only if there is some common flow between $L_i$ and $L_j$ in $G$;

C. the weight $\tilde W_{ij}$ on the link from node $L_i$ to $L_j$ is given by

    \tilde W_{ij} = (D_k + \bar B_k)^{-1}_{ii} (B_k)_{ij} = (D_k + \bar B_k)^{-1}_{ii} (A H_k^{-1} A')_{ij} = (D_k + \bar B_k)^{-1}_{ii} \sum_{s \in S(i) \cap S(j)} (H_k^{-1})_{ss},

where the matrices $D_k$, $B_k$, and $\bar B_k$ are defined in Eqs. (15), (16) and (17).

One example of a network and its dual graph are presented in Figures 1 and 2.
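The dual iteration of Theorem 4.3 can be sketched numerically. The instance below (log utilities, $\mu = 1$, and an illustrative routing matrix) is hypothetical, not from the paper:

```python
import numpy as np

# Dual iteration (18) on a hypothetical log-utility instance.
R = np.array([[1, 0], [0, 1], [1, 1], [1, 0], [0, 1]], dtype=float)
c = np.array([1.0, 2.0, 1.5, 1.0, 2.0])
L, S = R.shape
A = np.hstack([R, np.eye(L)])
mu = 1.0
s0 = np.full(S, c.min() / (S + 1))
x = np.concatenate([s0, c - R @ s0])
src = np.arange(S + L) < S
grad = np.where(src, -(1 + mu) / x, -mu / x)
H_inv = np.diag(np.where(src, x**2 / (1 + mu), x**2 / mu))

G = A @ H_inv @ A.T                # the matrix in Eq. (11)
D = np.diag(np.diag(G))            # Eq. (15)
B = G - D                          # Eq. (16)
B_bar = np.diag(B.sum(axis=1))     # Eq. (17)

M_inv = np.linalg.inv(D + B_bar)   # diagonal, hence trivially invertible
rho = max(abs(np.linalg.eigvals(M_inv @ (B - B_bar))))
assert rho < 1                     # guaranteed by Theorem 4.3

w = np.zeros(L)
for _ in range(500):               # dual iteration, Eq. (18)
    w = M_inv @ (B_bar - B) @ w + M_inv @ (-A @ H_inv @ grad)

assert np.allclose(G @ w, -A @ H_inv @ grad, atol=1e-8)   # solves Eq. (11)
```

Note that $D + \bar B$ is diagonal, so each component of the update uses only one scalar division, which is the second desirable feature of the splitting discussed above.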
Note that the unweighted indegree and outdegree of a node are the same in the dual graph; however, the weights differ depending on the direction of the links. The splitting scheme in Eq. (19) involves the matrix $(D_k + \bar B_k)^{-1} (\bar B_k - B_k)$, which is the weighted Laplacian matrix of the dual graph.¹⁰ The weighted out-degree of node $i$ in the dual graph, i.e., the diagonal entry $(D_k + \bar B_k)^{-1}_{ii} (\bar B_k)_{ii}$ of the Laplacian matrix, can be viewed as a measure of the congestion level of a link in the original network, since the neighbors in the dual graph represent links that share flows in the original network. We show in Section 5.1 that the spectral properties of the Laplacian matrix of the dual graph dictate the convergence speed of the dual iteration (18). We next rewrite iteration (18), analyze the information exchange required to implement it, and develop a distributed computation procedure to calculate the dual vector.

[Figure 1: A sample network. Each source-destination pair is displayed with the same color. We use $x_i$ to denote the flow corresponding to the $i$th source-destination pair and $L_i$ to denote the $i$th link. The links carry the flows $L_1: x_1$; $L_2: x_2$; $L_3: x_1, x_2$; $L_4: x_1$; $L_5: x_2$.]

[Figure 2: Dual graph for the network in Figure 1; each link in this graph corresponds to the flows shared between the links in the original network.]

¹⁰We adopt the following definition for the weighted Laplacian matrix of a graph. Consider a weighted directed graph $G$ with weight $W_{ij}$ associated with the link from node $i$ to node $j$. We let $W_{ij} = 0$ whenever the link is not present. These weights form a weighted adjacency matrix $W$. The weighted out-degree matrix $D$ is defined as the diagonal matrix with $D_{ii} = \sum_j W_{ij}$, and the weighted Laplacian matrix $L$ is defined as $L = D - W$. See [7], [12] for more details on graph Laplacian matrices.
For notational convenience, we define the price of the route for source $i$, $\pi_i(t)$, as the sum of the dual variables associated with the links used by source $i$ at the $t$th dual iteration, i.e., $\pi_i(t) = \sum_{l \in L(i)} w_l(t)$. Similarly, we define the weighted price of the route for source $i$, $\Pi_i(t)$, as the price of the route for source $i$ weighted by the $i$th diagonal element of the inverse Hessian matrix, i.e., $\Pi_i(t) = (H_k^{-1})_{ii}\sum_{l \in L(i)} w_l(t)$.

Lemma 4.4. For each primal iteration $k$, the dual iteration (18) can be written as
$$w_l(t+1) = \frac{1}{(H_k^{-1})_{(S+l)(S+l)} + \sum_{i \in S(l)}\Pi_i(0)}\Bigg[\bigg(\sum_{i \in S(l)}\Pi_i(0) - \sum_{i \in S(l)}(H_k)^{-1}_{ii}\bigg)w_l(t) - \sum_{i \in S(l)}\Pi_i(t) \qquad (21)$$
$$+ \sum_{i \in S(l)}(H_k)^{-1}_{ii}\,w_l(t) - \sum_{i \in S(l)}(H_k^{-1})_{ii}\nabla_i f(x_k) - (H_k^{-1})_{(S+l)(S+l)}\nabla_{S+l} f(x_k)\Bigg],$$
where $\Pi_i(0)$ is the weighted price of the route for source $i$ when $w(0) = [1, 1, \dots, 1]'$.

Proof. Recall the definition of the matrix $A$: $A_{li} = 1$ for $i = 1, 2, \dots, S$ if source $i$ uses link $l$, i.e., $i \in S(l)$, and $A_{li} = 0$ otherwise. Therefore, we can write the price of the route for source $i$ as $\pi_i(t) = \sum_{l=1}^L A_{li} w_l(t) = [A']_i w(t)$. Similarly, since the Hessian matrix $H_k$ is diagonal, the weighted price can be written as
$$\Pi_i(t) = (H_k)^{-1}_{ii}[A']_i w(t) = [H_k^{-1}A']_i w(t). \qquad (22)$$
On the other hand, since $A = [R \ \ I(L)]$, where $R$ is the routing matrix, we have
$$(AH_k^{-1}A'w(t))_l = \sum_{i=1}^S \big([A]^i[H_k^{-1}A']_i w(t)\big)_l + (H_k^{-1})_{(S+l)(S+l)}w_l(t) = \sum_{i=1}^S A_{li}\big([H_k^{-1}A']_i w(t)\big) + (H_k^{-1})_{(S+l)(S+l)}w_l(t).$$
Using the definition of the matrix $A$ one more time, this implies
$$(AH_k^{-1}A'w(t))_l = \sum_{i \in S(l)}[H_k^{-1}A']_i w(t) + (H_k^{-1})_{(S+l)(S+l)}w_l(t) = \sum_{i \in S(l)}\Pi_i(t) + (H_k^{-1})_{(S+l)(S+l)}w_l(t), \qquad (23)$$
where the last equality follows from Eq. (22). Using Eq. (16), the above relation implies that $((B_k + D_k)w(t))_l = \sum_{i \in S(l)}\Pi_i(t) + (H_k^{-1})_{(S+l)(S+l)}w_l(t)$.

We next rewrite $(\bar B_k)_{ll}$. Using the fact that $w(0) = [1, 1, \dots, 1]'$, we have
$$(AH_k^{-1}A'w(0))_l = ((B_k + D_k)w(0))_l = \sum_{j=1}^L (B_k)_{lj} + (D_k)_{ll}.$$
Using the definition of $\bar B_k$ [cf. Eq. (17)], this implies
$$(\bar B_k)_{ll} = \sum_{j=1}^L (B_k)_{lj} = (AH_k^{-1}A'w(0))_l - (D_k)_{ll} = \sum_{i \in S(l)}\Pi_i(0) + (H_k^{-1})_{(S+l)(S+l)} - (D_k)_{ll}.$$
This calculation can be simplified further using
$$(D_k)_{ll} = (AH_k^{-1}A')_{ll} = \sum_{i \in S(l)}(H_k)^{-1}_{ii} + (H_k)^{-1}_{(S+l)(S+l)} \qquad (24)$$
[cf. Eq. (15)], yielding
$$(\bar B_k)_{ll} = \sum_{i \in S(l)}\Pi_i(0) - \sum_{i \in S(l)}(H_k)^{-1}_{ii}. \qquad (25)$$
Following the same argument, the value $(B_k w(t))_l$ for all $t$ can be written as
$$(B_k w(t))_l = (AH_k^{-1}A'w(t))_l - (D_k w(t))_l = \sum_{i \in S(l)}\Pi_i(t) + (H_k^{-1})_{(S+l)(S+l)}w_l(t) - (D_k)_{ll}w_l(t) = \sum_{i \in S(l)}\Pi_i(t) - \sum_{i \in S(l)}(H_k)^{-1}_{ii}w_l(t),$$
where the first equality follows from Eq. (17), the second equality follows from Eq. (23), and the last equality follows from Eq. (24). Finally, we can write $(AH_k^{-1}\nabla f(x_k))_l$ as
$$(AH_k^{-1}\nabla f(x_k))_l = \sum_{i \in S(l)}(H_k^{-1})_{ii}\nabla_i f(x_k) + (H_k^{-1})_{(S+l)(S+l)}\nabla_{S+l} f(x_k).$$
Substituting the preceding into (18), we obtain the desired iteration (21).

We next analyze the information exchange required to implement iteration (21) among the sources and links in the network. We first observe the local information available to sources and links.
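To make the structure of iteration (21) concrete, the following toy sketch (hypothetical routes and numerical values, not from the paper) accumulates the per-link sums appearing in (21) from messages pushed by the sources, and computes the weighted route prices fed back to them.

```python
# Hypothetical instance mirroring the quantities in iteration (21);
# all routes and values below are illustrative only.
routes = {0: [0, 2], 1: [1, 2]}            # source i -> list of links on its route
H_ii = {0: 2.0, 1: 4.0}                    # diagonal Hessian entries at the sources
grad = {0: -1.0, 1: 0.5}                   # gradient components at the sources
links = [0, 1, 2]

# Each source pushes (H_k)_ii and grad_i to the links on its route;
# each link accumulates the two fixed sums it needs in iteration (21).
sum_Hinv = {l: 0.0 for l in links}
sum_Hinv_grad = {l: 0.0 for l in links}
for i, route in routes.items():
    for l in route:
        sum_Hinv[l] += 1.0 / H_ii[i]
        sum_Hinv_grad[l] += grad[i] / H_ii[i]

# With w(0) = 1, the route price pi_i(0) is just the hop count, aggregated
# along the route; the source scales it by (H_k^{-1})_ii to get Pi_i(0).
w0 = {l: 1.0 for l in links}
Pi0 = {i: sum(w0[l] for l in route) / H_ii[i] for i, route in routes.items()}

# The same feedback computes Pi_i(t) for any later dual iterate w(t).
def weighted_prices(w):
    return {i: sum(w[l] for l in route) / H_ii[i] for i, route in routes.items()}

assert abs(sum_Hinv[2] - (1 / 2.0 + 1 / 4.0)) < 1e-12   # link 2 is shared
assert abs(Pi0[0] - 2.0 / 2.0) < 1e-12                  # two hops, scaled by 1/H_00
```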
Each source $i$ knows the $i$th diagonal entry of the Hessian, $(H_k)_{ii}$, and the $i$th component of the gradient, $\nabla_i f(x_k)$. Similarly, each link $l$ knows the $(S+l)$th diagonal entry of the Hessian, $(H_k)_{(S+l)(S+l)}$, and the $(S+l)$th component of the gradient, $\nabla_{S+l} f(x_k)$. In addition to this locally available information, each link $l$, when executing iteration (21), needs to compute the terms
$$\sum_{i \in S(l)}(H_k)^{-1}_{ii}, \qquad \sum_{i \in S(l)}(H_k^{-1})_{ii}\nabla_i f(x_k), \qquad \sum_{i \in S(l)}\Pi_i(0), \qquad \sum_{i \in S(l)}\Pi_i(t).$$
The first two terms can be computed by link $l$ if each source sends its local information to the links along its route once per primal iteration $k$. The third term can be computed by link $l$, again once for every $k$, if the route price $\pi_i(0)$ (aggregated along the links of a route when all link prices equal 1) is sent by the destination to source $i$, which then evaluates and sends the weighted price $\Pi_i(0)$ to the links along its route. The fourth term can be computed with a similar feedback mechanism; however, the computation of this term needs to be repeated for every dual iteration $t$.

Figure 3: Direction of information flow for steps 1.a, 1.c, and 2.b, from sources to the links they use.

Figure 4: Direction of information flow for steps 1.b and 2.a, from links to the sources using them.

The preceding information exchange suggests the following distributed implementation of (21) (at each primal iteration $k$) among the sources and links, where each source or link is viewed as a processor, information available at source $i$ can be passed to the links it traverses, i.e., $l \in L(i)$, and information about the links along a route can be aggregated and sent back to the corresponding source using a feedback mechanism:

1. Initialization.

1.a Each source $i$ sends its local information $(H_k)_{ii}$ and $\nabla_i f(x_k)$ to the links along its route, $l \in L(i)$.
Each link $l$ computes $(H_k)^{-1}_{(S+l)(S+l)}$, $\sum_{i \in S(l)}(H_k)^{-1}_{ii}$, $(H_k^{-1})_{(S+l)(S+l)}\nabla_{S+l} f(x_k)$, and $\sum_{i \in S(l)}(H_k^{-1})_{ii}\nabla_i f(x_k)$.

1.b Each link $l$ starts with price $w_l(0) = 1$. The link prices $w_l(0)$ are aggregated along route $i$ to compute $\pi_i(0) = \sum_{l \in L(i)} w_l(0)$ at the destination. This information is sent back to source $i$.

1.c Each source computes the weighted price $\Pi_i(0) = (H_k^{-1})_{ii}\sum_{l \in L(i)} w_l(0)$ and sends it to the links along its route, $l \in L(i)$.

1.d Each link $l$ then initializes with an arbitrary price $w_l(1)$.

2. Dual iteration.

2.a The link prices $w_l(t)$ are updated using (21) and aggregated along route $i$ to compute $\pi_i(t)$ at the destination. This information is sent back to source $i$.

2.b Each source computes the weighted price $\Pi_i(t)$ and sends it to the links along its route, $l \in L(i)$.

The direction of information flow can be seen in Figures 3 and 4. Note that the sources need to send their Hessian and gradient information only once per primal iteration, since these values do not change across the dual iterations. Moreover, this algorithm has a level of information exchange comparable to that of the subgradient-based algorithms applied to the NUM problem (2) (see [1], [21], [25], [27] for more details). In both types of algorithms, only the sum of the prices of the links along a route is fed back to the source, and the links update prices based on scalar information sent from the sources using that link. The computation here is slightly more involved, since it requires scaling by Hessian matrix entries; however, all operations are scalar-based and hence impose no degradation on the performance of the algorithm.

4.3 Distributed Computation of the Primal Newton Direction

Once the dual variables are computed, the primal Newton direction can be obtained according to Eq.
(10) as
$$(\Delta x_k)_i = -(H_k)^{-1}_{ii}\big(\nabla_i f(x_k) + (A'w_k)_i\big) = -\big((H_k)^{-1}_{ii}\nabla_i f(x_k) + \Pi_i\big), \qquad (26)$$
where $\Pi_i$ is the weighted price of the route for source $i$ computed at termination of the dual iteration. Hence, the primal Newton direction can be computed using local information by each source. However, because the dual variable computation involves an iterative scheme, the exact value of $w_k$ is not available. Therefore, the direction $\Delta x_k$ computed using Eq. (26) may violate the equality constraints in problem (4). To maintain feasibility of the generated primal vectors, the calculation of the inexact Newton direction at a primal vector $x_k$, which we denote by $\Delta\tilde x_k$, is separated into two stages. In the first stage, the first $S$ components of $\Delta\tilde x_k$, denoted by $\Delta\tilde s_k$, are computed via Eq. (26) using the dual variables obtained from the iterative scheme, i.e.,
$$\Delta\tilde s_k^i = -(H_k)^{-1}_{ii}\big(\nabla_i f(x_k) + [R']_i w_k\big). \qquad (27)$$
In the second stage, the last $L$ components of $\Delta\tilde x_k$ (corresponding to the slack variables) are computed to ensure that the condition $A\Delta\tilde x_k = 0$ is satisfied, i.e.,
$$\Delta\tilde x_k = \begin{pmatrix} \Delta\tilde s_k \\ -R\,\Delta\tilde s_k \end{pmatrix}. \qquad (28)$$
This calculation involves each link computing the slack introduced by the first $S$ components of $\Delta\tilde x_k$.

The algorithm generates the primal vectors as follows: let $x_0$ be an initial strictly positive feasible primal vector (see Eq. (7) for one possible choice). For any $k \ge 0$, we have
$$x_{k+1} = x_k + d_k\,\Delta\tilde x_k, \qquad (29)$$
where $d_k$ is a positive stepsize and $\Delta\tilde x_k$ is the inexact Newton direction at primal vector $x_k$ (obtained through an iterative dual variable computation scheme and a two-stage primal direction computation that maintains feasibility). We will refer to this algorithm as the (distributed) inexact Newton method.
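The feasibility-preserving property of the two-stage computation can be checked on a toy instance (hypothetical sizes and values, not from the paper): however inexact the dual vector $w_k$ is, the construction in Eq. (28) satisfies $A\Delta\tilde x_k = 0$ exactly.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical dimensions: S = 3 sources, L = 2 links (illustrative only).
S, L = 3, 2
R = rng.integers(0, 2, size=(L, S)).astype(float)   # routing matrix
A = np.hstack([R, np.eye(L)])                       # A = [R  I]
H_inv = np.diag(rng.uniform(0.5, 2.0, S + L))       # diagonal inverse Hessian
grad = rng.standard_normal(S + L)
w = rng.standard_normal(L)                          # an inexact dual vector

# Stage 1: source components via Eq. (27).
ds = -(H_inv[:S, :S] @ (grad[:S] + R.T @ w))

# Stage 2: slack components chosen so that A @ dx = 0 (Eq. (28)).
dx = np.concatenate([ds, -R @ ds])

assert np.allclose(A @ dx, 0.0)                     # feasibility holds exactly
```

The design choice here is that feasibility is enforced structurally (each link absorbs the slack induced by the source components), rather than by solving the dual system exactly.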
4.4 Stepsize Rule

We next describe a stepsize rule that can be computed in a distributed manner while achieving a local superlinear convergence rate (to an error neighborhood) for the primal iterations. This rule will further guarantee that the primal vectors $x_k$ generated by the algorithm remain strictly positive for all $k$, hence ensuring that the Hessian matrix is well-defined at all iterates (see Eq. (12) and Theorem 4.6).

Our stepsize rule is based on an inexact version of the Newton decrement. At a given primal vector $x_k$ (with Hessian matrix $H_k$), we define the exact Newton direction, denoted by $\Delta x_k$, as the exact solution of the system of equations (9). The exact Newton decrement $\lambda(x_k)$ is defined as
$$\lambda(x_k) = \sqrt{(\Delta x_k)' H_k\, \Delta x_k}. \qquad (30)$$
Similarly, the inexact Newton decrement $\tilde\lambda(x_k)$ is given by
$$\tilde\lambda(x_k) = \sqrt{(\Delta\tilde x_k)' H_k\, \Delta\tilde x_k}, \qquad (31)$$
where $\Delta\tilde x_k$ is the inexact Newton direction at primal vector $x_k$. Note that both $\lambda(x_k)$ and $\tilde\lambda(x_k)$ are nonnegative and well-defined because the matrix $\nabla^2 f(x_k)$ is positive definite.

Our stepsize rule involves the inexact Newton decrement $\tilde\lambda(x_k)$; we use $\theta_k$ to denote the approximate value of $\tilde\lambda(x_k)$ obtained through some distributed computation procedure. One such procedure, with finite termination yielding $\theta_k = \tilde\lambda(x_k)$ exactly, is described in Appendix A. However, other estimates $\theta_k$ can be used, which can potentially be obtained by exploiting the diagonal structure of the Hessian matrix, writing the inexact Newton decrement as
$$\tilde\lambda(x_k) = \sqrt{\sum_{i \in \mathcal S \cup \mathcal L} (\Delta\tilde x_k)_i^2 (H_k)_{ii}} = \sqrt{(L+S)\,\bar y}, \qquad \text{where } \bar y = \frac{1}{S+L}\sum_{i \in \mathcal S \cup \mathcal L} (\Delta\tilde x_k)_i^2 (H_k)_{ii},$$
and using consensus-type algorithms. Given the scalar $\theta_k$, an approximation to the inexact Newton decrement $\tilde\lambda(x_k)$, at each iteration $k$ we choose the stepsize $d_k$ as follows. Let $V$ be some positive scalar with $0 < V < 0.267$.
We have
$$d_k = \begin{cases} \dfrac{b}{\theta_k + 1} & \text{if } \theta_{\bar k} \ge V \text{ for all } \bar k \le k, \\[4pt] 1 & \text{otherwise,} \end{cases} \qquad (32)$$
where $b \in (0,1)$. The upper bound on $V$ will be used in the analysis of the quadratic convergence phase of our algorithm [cf. Assumption 4]. This bound will also ensure the strict positivity of the generated primal vectors [cf. Theorem 4.6].

There can be two sources of error in the execution of the algorithm. The first is in the computation of the inexact Newton direction, which arises due to the iterative computation of the dual vector $w_k$ and the modification we use to maintain feasibility. The second source of error is in the stepsize rule, which is a function of $\theta_k$, an approximation to the inexact Newton decrement $\tilde\lambda(x_k)$. We next state two assumptions that quantify the bounds on these errors.

Assumption 2. Let $\{x_k\}$ denote the sequence of primal vectors generated by the distributed inexact Newton method. Let $\Delta x_k$ and $\Delta\tilde x_k$ denote the exact and inexact Newton directions at $x_k$, and let $\gamma_k$ denote the error in the Newton direction computation, i.e.,
$$\Delta x_k = \Delta\tilde x_k + \gamma_k. \qquad (33)$$
For all $k$, $\gamma_k$ satisfies
$$|\gamma_k'\,\nabla^2 f(x_k)\,\gamma_k| \le p^2\,(\Delta\tilde x_k)'\nabla^2 f(x_k)\Delta\tilde x_k + \epsilon \qquad (34)$$
for some positive scalars $p < 1$ and $\epsilon$.

This assumption imposes a bound on the weighted norm of the Newton direction error $\gamma_k$ as a function of the weighted norm of $\Delta\tilde x_k$ and a constant $\epsilon$. Note that without the constant $\epsilon$, we would require this error to vanish when $x_k$ is close to the optimal solution, i.e., when $\Delta\tilde x_k$ is small, which is impractical for implementation purposes. Given $p$ and $\epsilon$, one can devise distributed schemes for determining the number of dual iterations needed so that the resulting error $\gamma_k$ satisfies this assumption (see Appendix B).

We bound the error in the inexact Newton decrement calculation as follows.

Assumption 3. Let $\tau_k$ denote the error in the Newton decrement calculation, i.e., $\tau_k = \tilde\lambda(x_k) - \theta_k$.
(35)
For all $k$, $\tau_k$ satisfies
$$|\tau_k| \le \left(\frac{1}{b} - 1\right)(1+V).$$
This assumption will be used in establishing the strict positivity of the generated primal vectors $x_k$. Given $b$ and $V$, using convergence rate results for average consensus schemes (see [32], [28]), one can provide a lower bound on the number of average consensus steps needed so that the error $\tau_k$ satisfies this assumption. When the method presented in Appendix A is used to compute $\theta_k$, we have $\tau_k = 0$ for all $k$, and the preceding assumption is clearly satisfied.

Throughout the rest of the paper, we assume that the conditions in Assumptions 1-3 hold.

We next show that the stepsize choice in (32) guarantees strict positivity of the primal vectors $x_k$ generated by our algorithm. This is important since it ensures that the Hessian $H_k$, and therefore the (inexact) Newton direction, is well-defined at each iteration. We proceed by first establishing a bound on the error in the stepsize calculation.

Lemma 4.5. Let $\theta_k$ be an approximation of the inexact Newton decrement $\tilde\lambda(x_k)$ defined in (31). For $\theta_k \ge V$, we have
$$\frac{2b-1}{\tilde\lambda(x_k)+1} \le \frac{b}{\theta_k+1} \le \frac{1}{\tilde\lambda(x_k)+1}, \qquad (36)$$
where $b \in (0,1)$ is the constant used in the stepsize choice (32).

Proof. By Assumption 3 and the fact that $\theta_k \ge V$, we have
$$|\tilde\lambda(x_k) - \theta_k| \le \left(\frac{1}{b}-1\right)(1+V) \le \left(\frac{1}{b}-1\right)(1+\theta_k). \qquad (37)$$
Multiplying both sides by the positive scalar $b$, the above relation implies
$$b\theta_k - b\tilde\lambda(x_k) \le (1-b)(1+\theta_k),$$
which yields
$$(2b-1)\theta_k + (2b-1) \le b\tilde\lambda(x_k) + b.$$
Dividing both sides of the above relation by the positive scalar $(\theta_k+1)(\tilde\lambda(x_k)+1)$, we obtain the first inequality in Eq. (36). Similarly, using Eq. (37) we can establish
$$b\tilde\lambda(x_k) - b\theta_k \le (1-b)(1+\theta_k),$$
which can be rewritten as
$$b\tilde\lambda(x_k) + b \le \theta_k + 1.$$
Dividing both sides of the preceding relation by the positive scalar $(\theta_k+1)(\tilde\lambda(x_k)+1)$, we obtain the second inequality in Eq. (36).

With this bound on the stepsize error, we can show that, starting from a strictly positive feasible solution, the primal vectors $x_k$ generated by our algorithm remain strictly positive for all $k$.

Theorem 4.6. Given a strictly positive feasible primal vector $x_0$, let $\{x_k\}$ be the sequence generated by the inexact distributed Newton method (29). Assume that the stepsize $d_k$ is selected according to Eq. (32) and the constant $b$ satisfies $\frac{V+1}{2V+1} < b < 1$. Then the primal vector $x_k$ is strictly positive for all $k$.

Proof. We prove the claim by induction. The base case $x_0 > 0$ holds by the assumption of the theorem. Since the $U_i$ are strictly concave [cf. Assumption 1], for any $x_k$ we have $-\frac{\partial^2 U_i}{\partial x_i^2}(x_k^i) \ge 0$. Given the form of the Hessian matrix [cf. Eq. (12)], this implies $(H_k)_{ii} \ge \frac{\mu}{(x_k^i)^2}$ for all $i$, and therefore
$$\tilde\lambda(x_k) = \left(\sum_{i=1}^{S+L} (\Delta\tilde x_k^i)^2 (H_k)_{ii}\right)^{1/2} \ge \left(\sum_{i=1}^{S+L} \mu\left(\frac{\Delta\tilde x_k^i}{x_k^i}\right)^2\right)^{1/2} \ge \max_i \left|\sqrt{\mu}\,\frac{\Delta\tilde x_k^i}{x_k^i}\right|,$$
where the last inequality follows from the nonnegativity of the terms $\mu(\Delta\tilde x_k^i / x_k^i)^2$. Taking reciprocals on both sides, the above relation implies
$$\frac{1}{\tilde\lambda(x_k)} \le \frac{1}{\max_i \left|\sqrt{\mu}\,\frac{\Delta\tilde x_k^i}{x_k^i}\right|} = \frac{1}{\sqrt{\mu}}\min_i \left|\frac{x_k^i}{\Delta\tilde x_k^i}\right| \le \min_i \left|\frac{x_k^i}{\Delta\tilde x_k^i}\right|, \qquad (38)$$
where the last inequality follows from the fact that $\mu \ge 1$. We show the inductive step by considering two cases.

• Case i: $\theta_k \ge V$. By Lemma 4.5, the stepsize $d_k$ satisfies $d_k \le 1/(1+\tilde\lambda(x_k)) < 1/\tilde\lambda(x_k)$. Using Eq. (38), this implies $d_k < \min_i |x_k^i / \Delta\tilde x_k^i|$. Hence if $x_k > 0$, then $x_{k+1} = x_k + d_k\Delta\tilde x_k > 0$.

• Case ii: $\theta_k < V$. By Assumption 3, we have $\tilde\lambda(x_k) < V + \left(\frac{1}{b}-1\right)(1+V)$.
Using the fact that $b > \frac{V+1}{2V+1}$, we obtain
$$\tilde\lambda(x_k) < V + \left(\frac{1}{b}-1\right)(1+V) < V + \left(\frac{2V+1}{V+1}-1\right)(1+V) = 2V \le 1,$$
where the last inequality follows from the fact that $V < 0.267$. Hence we have
$$d_k = 1 < \frac{1}{\tilde\lambda(x_k)} \le \min_i \left|\frac{x_k^i}{\Delta\tilde x_k^i}\right|,$$
where the last inequality follows from Eq. (38). Once again, if $x_k > 0$, then $x_{k+1} = x_k + d_k\Delta\tilde x_k > 0$.

In both cases we have $x_{k+1} = x_k + d_k\Delta\tilde x_k > 0$, which completes the induction.

In the rest of the paper, we assume that the constant $b$ used in the definition of the stepsize satisfies $\frac{V+1}{2V+1} < b < 1$.

5 Convergence Analysis

We next present our convergence analysis for both the primal and dual iterations. We first establish convergence of the dual iterations.

5.1 Convergence in Dual Iterations

We characterize the rate of convergence of the dual iteration (18). We will use the following lemma [36].

Lemma 5.1. Let $M$ be an $n \times n$ matrix, and assume that its spectral radius, denoted by $\rho(M)$, satisfies $\rho(M) < 1$. Let $\{\lambda_i\}_{i=1,\dots,n}$ denote the eigenvalues of $M$, with $1 > |\lambda_1| \ge |\lambda_2| \ge \dots \ge |\lambda_n|$, and let $v_i$ denote the corresponding unit-length right eigenvectors. Assume the matrix has $n$ linearly independent eigenvectors.^{11} Then for the sequence $w(t)$ generated by the iteration
$$w(t+1) = M w(t), \qquad (39)$$
we have
$$\|w(t) - w^*\|_2 \le |\lambda_1|^t\,\alpha \qquad (40)$$
for some positive scalar $\alpha$, where $w^*$ is the limit of iteration (39) as $t \to \infty$.

^{11} An alternative assumption is that the algebraic multiplicity of each $\lambda_i$ is equal to its corresponding geometric multiplicity, since eigenvectors associated with different eigenvalues are independent [24].

We use $M$ to denote the $L \times L$ matrix $M = (D_k + \bar B_k)^{-1}(\bar B_k - B_k)$ and $z$ to denote the vector $z = (D_k + \bar B_k)^{-1}(-AH_k^{-1}\nabla f(x_k))$.
We can rewrite iteration (18) as $w(t+1) = Mw(t) + z$, which implies
$$w(t+q) = M^q w(t) + \sum_{i=0}^{q-1} M^i z = M^q w(t) + (I - M^q)(I-M)^{-1} z.$$
This alternative representation is possible because $\rho(M) < 1$, which follows from Theorem 4.3. After rearranging the terms, we obtain
$$w(t+q) = M^q\big(w(t) - (I-M)^{-1}z\big) + (I-M)^{-1}z.$$
Therefore, starting from some arbitrary initial vector $w(0)$, the convergence speed of the sequence $w(t)$ coincides with that of the sequence $u(t)$ generated by $u(t+q) = M^q u(0)$, where $u(0) = w(0) - (I-M)^{-1}z$.

We next show that the matrix $M$ has $L$ linearly independent eigenvectors, so that the preceding lemma applies. First note that since the nonnegative matrix $A$ has full row rank and the Hessian matrix $H_k$ has positive diagonal elements, the product matrix $AH_k^{-1}A'$ has positive diagonal elements and nonnegative entries. This shows that the matrix $D_k$ [cf. Eq. (15)] has positive diagonal elements and the matrix $\bar B_k$ [cf. Eq. (17)] has nonnegative entries. Therefore the matrix $(D_k + \bar B_k)^{-1/2}$ is diagonal and nonsingular. Hence, using the relation $\tilde M = (D_k + \bar B_k)^{1/2} M (D_k + \bar B_k)^{-1/2}$, we see that the matrix $M = (D_k + \bar B_k)^{-1}(\bar B_k - B_k)$ is similar to the matrix $\tilde M = (D_k + \bar B_k)^{-1/2}(\bar B_k - B_k)(D_k + \bar B_k)^{-1/2}$. From the definition of $B_k$ [cf. Eq. (16)] and the symmetry of the matrix $AH_k^{-1}A'$, we conclude that the matrix $B_k$ is symmetric. This shows that the matrix $\tilde M$ is symmetric and hence diagonalizable, which implies that the matrix $M$ is also diagonalizable and therefore has $L$ linearly independent eigenvectors.^{12}

We can use Lemma 5.1 to infer that
$$\|w(t) - w^*\|_2 = \|u(t) - u^*\|_2 \le |\lambda_1|^t\,\alpha,$$
where $\lambda_1$ is the eigenvalue of $M$ with largest magnitude, and $\alpha$ is a constant that depends on the initial vector $u(0) = w(0) - (I-M)^{-1}z$.
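The geometric decay at rate $|\lambda_1|$ can be observed numerically. The sketch below (a hypothetical 2-link, 3-source instance, not from the paper) runs the affine dual iteration and compares the empirical per-step contraction of the error with $|\lambda_1|$.

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical instance (illustrative sizes): S = 3, L = 2, A = [R  I].
R = np.array([[1., 1., 0.],
              [0., 1., 1.]])
L = R.shape[0]
A = np.hstack([R, np.eye(L)])
H_inv = np.diag(rng.uniform(0.5, 2.0, A.shape[1]))
g = rng.standard_normal(A.shape[1])

P = A @ H_inv @ A.T
D = np.diag(np.diag(P))
B = P - D
B_bar = np.diag(B.sum(axis=1))
M = np.linalg.inv(D + B_bar) @ (B_bar - B)
z = np.linalg.inv(D + B_bar) @ (-A @ H_inv @ g)

lam1 = max(abs(np.linalg.eigvals(M)))
w_star = np.linalg.solve(np.eye(L) - M, z)     # fixed point of w <- M w + z

w = np.ones(L)
errs = []
for _ in range(30):
    w = M @ w + z
    errs.append(np.linalg.norm(w - w_star))

# The error decays geometrically; its per-step ratio tracks |lambda_1| < 1.
assert lam1 < 1.0
assert abs(errs[-1] / errs[-2] - lam1) < 1e-6
```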
Hence $\lambda_1$ determines the speed of convergence of the dual iteration. We next analyze the relationship between $\lambda_1$ and the dual graph topology. First note that the matrix $M = (D_k + \bar B_k)^{-1}(\bar B_k - B_k)$ is the weighted Laplacian matrix of the dual graph [cf. Section 4.2] and is therefore positive semidefinite [12]. We then have $\rho(M) = |\lambda_1| = \lambda_1 \ge 0$. From graph theory [26], Theorem 4.3, and the above analysis, we have
$$\frac{4\,\mathrm{mc}(M)}{L} \le \lambda_1 \le \min\left\{2\max_{l \in \mathcal L}\big((D_k + \bar B_k)^{-1}\bar B_k\big)_{ll},\ 1\right\}, \qquad (41)$$
where $\mathrm{mc}(M)$ is the weighted maximum cut of the dual graph, i.e.,
$$\mathrm{mc}(M) = \max_{S \subset \tilde N}\left\{\sum_{i \in S,\, j \notin S} \tilde W_{ij} + \sum_{i \in S,\, j \notin S} \tilde W_{ji}\right\},$$
where $\tilde W_{ij}$ is the weight associated with the link from node $i$ to $j$.

^{12} If a square matrix $A$ of size $n \times n$ is symmetric, then $A$ has $n$ linearly independent eigenvectors. If a square matrix $B$ of size $n \times n$ is similar to a symmetric matrix, then $B$ has $n$ linearly independent eigenvectors [16].

The above relation shows that a large maximum cut of the dual graph yields a large lower bound on $\lambda_1$, implying that the dual iteration cannot terminate in very few iterates. When the maximum weighted out-degree in the dual graph, i.e., $\max_{l \in \mathcal L}\big((D_k + \bar B_k)^{-1}\bar B_k\big)_{ll}$, is small, the above relation provides a small upper bound on $\lambda_1$, suggesting that the dual iteration converges fast.

We finally illustrate the relationship between the dual graph topology and the underlying network properties by means of two simple examples that highlight how different network structures can affect the dual graph and hence the convergence rate of the dual iteration. In particular, we show that the dual iteration converges more slowly for a network with a more congested link. Consider the two networks given in Figures 5 and 7, whose corresponding dual graphs are presented in Figures 6 and 8, respectively. Both networks have 3 source-destination pairs and 7 links.
However, in Figure 5 all three flows use the same link, $L_4$, whereas in Figure 7 at most two flows share any link. This difference in network topology results in different degree distributions in the dual graphs, as shown in Figures 6 and 8. To be concrete, let $U_i(s_i) = 15\log(s_i)$ for all sources $i$ in both graphs, and let the link capacities be $c_l = 35$ for all links $l$. We apply our distributed Newton algorithm to both problems. For the primal iteration at which all source rates equal 10, the largest weighted out-degrees in the dual graphs of the two examples are 0.46 for Figure 6 and 0.095 for Figure 8, which implies upper bounds on $\lambda_1$ of the corresponding dual iterations of 0.92 and 0.19, respectively [cf. Eq. (41)]. The weighted maximum cut for Figure 6 is obtained by isolating the node corresponding to $L_4$, with weighted maximum cut value 0.52. The maximum cut for Figure 8 is formed by isolating the set $\{L_4, L_6\}$, with weighted maximum cut value 0.17. Based on (41), these graph cuts generate lower bounds on $\lambda_1$ of 0.30 and 0.096, respectively. Combining the upper and lower bounds, we obtain the intervals $[0.30, 0.92]$ and $[0.096, 0.19]$ for $\lambda_1$, respectively. Recall that a large spectral radius corresponds to slow convergence of the dual iteration [cf. Eq. (40)]; therefore these bounds guarantee that the dual iteration for the network in Figure 7, which is less congested, converges faster than for the one in Figure 5. Numerical results give largest eigenvalues of 0.47 and 0.12, respectively, consistent with this prediction.

Figure 5: A sample network (links $L_1$: $x_1$; $L_2$: $x_2$; $L_3$: $x_3$; $L_4$: $x_1, x_2, x_3$; $L_5$: $x_1$; $L_6$: $x_2$; $L_7$: $x_3$). Each source-destination pair is displayed with the same color. We use $x_i$ to denote the flow corresponding to the $i$th source-destination pair and $L_i$ to denote the $i$th link.
All 3 flows traverse link $L_4$.

Figure 6: Dual graph for the network in Figure 5; each link in this graph corresponds to the flows shared between the links in the original network. The node corresponding to link $L_4$ has a high unweighted out-degree, equal to 6.

Figure 7: A sample network (links $L_1$: $x_1$; $L_2$: $x_3$; $L_3$: $x_2$; $L_4$: $x_1, x_3$; $L_5$: $x_1$; $L_6$: $x_2, x_3$; $L_7$: $x_2$). Each source-destination pair is displayed with the same color. We use $x_i$ to denote the flow corresponding to the $i$th source-destination pair and $L_i$ to denote the $i$th link. Each link has at most 2 flows traversing it.

Figure 8: Dual graph for the network in Figure 7; each link in this graph corresponds to the flows shared between the links in the original network. The nodes corresponding to links $L_4$ and $L_6$ both have a relatively high out-degree, equal to 4.

5.2 Convergence in Primal Iterations

We next present our convergence analysis for the primal sequence $\{x_k\}$ generated by the inexact Newton method (29). For the $k$th iteration, we define the function $\tilde f_k : \mathbb R \to \mathbb R$ as
$$\tilde f_k(t) = f(x_k + t\,\Delta\tilde x_k), \qquad (42)$$
which is self-concordant because the objective function $f$ is self-concordant. Note that the values $\tilde f_k(0)$ and $\tilde f_k(d_k)$ are the objective function values at $x_k$ and $x_{k+1}$, respectively. Therefore $\tilde f_k(d_k) - \tilde f_k(0)$ measures the decrease in the objective function value at the $k$th iteration. We will refer to the function $\tilde f_k$ as the objective function along the Newton direction. Before proceeding further, we first introduce some properties of self-concordant functions and the Newton decrement, which will be used in our convergence analysis.
5.2.1 Preliminaries

Using the definition of a self-concordant function, we have the following result (see [9] for the proof).

Lemma 5.2. Let $\tilde f : \mathbb R \to \mathbb R$ be a self-concordant function. Then for all $t \ge 0$ in the domain of the function $\tilde f$ with $t\,\tilde f''(0)^{1/2} < 1$, the following inequality holds:
$$\tilde f(t) \le \tilde f(0) + t\,\tilde f'(0) - t\,\tilde f''(0)^{1/2} - \log\big(1 - t\,\tilde f''(0)^{1/2}\big). \qquad (43)$$

We will use the preceding lemma to prove a key relation in analyzing the convergence properties of our algorithm [see Lemma 5.8]. The next lemma relates the weighted norms of a vector $z$, with weights $\nabla^2 f(x)$ and $\nabla^2 f(y)$ for some $x$ and $y$. This lemma plays an essential role in establishing properties of the Newton decrement (see [19], [30] for more details).

Lemma 5.3. Let $f : \mathbb R^n \to \mathbb R$ be a self-concordant function. Suppose the vectors $x$ and $y$ are in the domain of $f$ and $\tilde\lambda = \big((x-y)'\nabla^2 f(x)(x-y)\big)^{1/2} < 1$. Then for any $z \in \mathbb R^n$, the following inequality holds:
$$(1-\tilde\lambda)^2\, z'\nabla^2 f(x)\,z \le z'\nabla^2 f(y)\,z \le \frac{1}{(1-\tilde\lambda)^2}\, z'\nabla^2 f(x)\,z. \qquad (44)$$

The next two lemmas establish properties of the Newton decrement generated by the equality-constrained Newton method.^{13} The first lemma extends results in [19] and [30] to allow inexactness in the Newton direction and reflects the effect of the error in the current step on the Newton decrement in the next step.^{14}

^{13} We use the same notation in these lemmas as in (4)-(6), since these relations will be used in the convergence analysis of the inexact Newton method applied to problem (4).

^{14} We use the same notation in the subsequent lemmas as in problem formulation (4), although the results hold for general optimization problems with self-concordant objective functions and linear equality constraints.

Lemma 5.4. Let $f : \mathbb R^n \to \mathbb R$ be a self-concordant function.
Consider solving the equality-constrained optimization problem
$$\text{minimize } f(x) \quad \text{subject to } Ax = c \qquad (45)$$
using an (exact) Newton method with feasible initialization, where the matrix $A \in \mathbb R^{L\times(L+S)}$ has full row rank, i.e., $\mathrm{rank}(A) = L$. Let $\Delta x$ be the exact Newton direction at $x$, i.e., $\Delta x$ solves the following system of linear equations:
$$\begin{pmatrix} \nabla^2 f(x) & A' \\ A & 0 \end{pmatrix}\begin{pmatrix} \Delta x \\ w \end{pmatrix} = -\begin{pmatrix} \nabla f(x) \\ 0 \end{pmatrix}. \qquad (46)$$
Let $\Delta\tilde x$ denote any direction with $\gamma = \Delta x - \Delta\tilde x$, and let $x(t) = x + t\,\Delta\tilde x$ for $t \in [0,1]$. Let $z$ be the exact Newton direction at $x + \Delta\tilde x$. If $\tilde\lambda = \sqrt{\Delta\tilde x'\,\nabla^2 f(x)\,\Delta\tilde x} < 1$, then we have
$$z'\nabla^2 f(x + \Delta\tilde x)\,z \le \frac{\tilde\lambda^2}{1-\tilde\lambda}\sqrt{z'\nabla^2 f(x)\,z} + |\gamma'\nabla^2 f(x)\,z|.$$

Proof. We first transform problem (45) into an unconstrained one via the elimination technique, establish the equivalence of the Newton decrements and the primal Newton directions between the two problems following the lines in [9], then derive the result for the unconstrained problem, and lastly map the result back to the original constrained problem.

Since the matrix $A$ has full row rank, i.e., $\mathrm{rank}(A) = L$, in order to eliminate the equality constraints we let the matrix $K \in \mathbb R^{(S+L)\times S}$ be any matrix whose range is the null space of $A$, with $\mathrm{rank}(K) = S$, and let the vector $\hat x \in \mathbb R^{S+L}$ be a feasible solution of problem (45), i.e., $A\hat x = c$. Then we have the parametrization of the affine feasible set as $\{x \mid Ax = c\} = \{Ky + \hat x \mid y \in \mathbb R^S\}$. The eliminated, equivalent optimization problem becomes
$$\text{minimize}_{y \in \mathbb R^S}\quad F(y) = f(Ky + \hat x). \qquad (47)$$
We next show that the primal Newton directions for the constrained problem (45) and the unconstrained problem (47) are isomorphic, where a feasible solution $x$ of problem (45) is mapped to the $y$ in problem (47) with $Ky + \hat x = x$. We start by showing that each $\Delta y$ in the unconstrained problem corresponds uniquely to the Newton direction in the constrained problem.
For the unconstrained problem, the gradient and Hessian are given by
$$\nabla F(y) = K'\nabla f(Ky + \hat x), \qquad \nabla^2 F(y) = K'\nabla^2 f(Ky + \hat x)K. \qquad (48)$$
Note that the objective function $f$ is three times continuously differentiable, which implies that its Hessian matrix $\nabla^2 f(Ky + \hat x)$ is symmetric; therefore $\nabla^2 F(y)$ is symmetric, i.e., $\nabla^2 F(y)' = \nabla^2 F(y)$.

The Newton direction for problem (47) is given by
$$\Delta y = -\big(\nabla^2 F(y)\big)^{-1}\nabla F(y) = -(K'\nabla^2 f(x)K)^{-1}K'\nabla f(x).^{15} \qquad (49)$$
We choose
$$w = -(AA')^{-1}A\big(\nabla f(x) + \nabla^2 f(x)\Delta x\big) \qquad (50)$$
and show that $(\Delta x, w)$, where
$$\Delta x = K\Delta y, \qquad (51)$$
is the unique solution pair of the linear system (46) for the constrained problem (45). To establish the first equation, i.e., $\nabla^2 f(x)\Delta x + A'w = -\nabla f(x)$, we use the property that $\begin{pmatrix} K' \\ A \end{pmatrix}u = \begin{pmatrix} K'u \\ Au \end{pmatrix} = 0$ for some $u \in \mathbb R^{S+L}$ implies $u = 0$.^{16} We have
$$\begin{pmatrix} K' \\ A \end{pmatrix}\big(\nabla^2 f(x)\Delta x + A'w + \nabla f(x)\big) = \begin{pmatrix} K'\nabla^2 f(x)K\big(-(K'\nabla^2 f(x)K)^{-1}K'\nabla f(x)\big) + K'A'w + K'\nabla f(x) \\ A\nabla^2 f(x)\Delta x - A\big(\nabla f(x) + \nabla^2 f(x)\Delta x\big) + A\nabla f(x) \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \end{pmatrix},$$
where the first equality follows from the definitions of $\Delta x$, $\Delta y$, and $w$ [cf. Eqs. (51), (49), and (50)] and the second equality follows from the fact that $K'A'w = 0$ for any $w$.^{17} Therefore we conclude that the first equation in (46) holds. Since the range of the matrix $K$ is the null space of the matrix $A$, we have $AKy = 0$ for all $y$; therefore the second equation in (46) holds, i.e., $A\Delta x = 0$.

For the converse, given a Newton direction $\Delta x$ defined as the solution of the system (46) for the constrained problem (45), we can uniquely recover a vector $\Delta y$ such that $K\Delta y = \Delta x$. This is because $A\Delta x = 0$ from (46), and hence $\Delta x$ is in the null space of the matrix $A$, i.e., the column space of the matrix $K$. The matrix $K$ has full rank, thus there exists a unique such $\Delta y$.
Therefore the (primal) Newton directions for problems (47) and (45) are isomorphic under the mapping $K$. In what follows, we perform our analysis for the unconstrained problem (47) and then use the isomorphism to show that the result holds for the equality constrained problem (45).

^{15} The matrix $K'\nabla^2 f(x)K$ is invertible: if $K'\nabla^2 f(x)Ky = 0$ for some $y\in\mathbb{R}^S$, then $y'K'\nabla^2 f(x)Ky = \big\|(\nabla^2 f(x))^{\frac12}Ky\big\|^2 = 0$, which implies $\|Ky\|_2 = 0$, because the matrix $\nabla^2 f(x)$ is positive definite for all $x$. The matrix $K$ has full column rank, therefore $y = 0$. This shows that the matrix $K'\nabla^2 f(x)K$ is invertible.

^{16} If $K'u = 0$, then the vector $u$ is orthogonal to the column space of the matrix $K$, i.e., to the null space of the matrix $A$. If $Au = 0$, then $u$ is in the null space of the matrix $A$. Hence the vector $u$ belongs to the set $\mathrm{nul}(A)\cap(\mathrm{nul}(A))^{\perp}$, which implies $u = 0$.

^{17} Let $K'A'w = u$; then $\|u\|_2^2 = u'K'A'w = w'AKu$. Since the range of the matrix $K$ is the null space of the matrix $A$, we have $AKu = 0$ for all $u$, hence $\|u\|_2^2 = 0$, implying $u = 0$.

Consider the unconstrained problem (47); let $\Delta y$ denote the exact Newton direction at $y$ [cf. Eq. (49)], let the vector $\Delta\tilde y$ denote any direction in $\mathbb{R}^S$, and set $y(t) = y + t\Delta\tilde y$ and $\tilde\lambda = \sqrt{\Delta\tilde y'\,\nabla^2 F(y)\,\Delta\tilde y}$. Note that with the isomorphism established earlier, we have

\[ \tilde\lambda = \sqrt{\Delta\tilde y'\,\nabla^2 F(y)\,\Delta\tilde y} = \sqrt{\Delta\tilde y'K'\nabla^2 f(Ky+\hat x)K\Delta\tilde y} = \sqrt{\Delta\tilde x'\,\nabla^2 f(x)\,\Delta\tilde x}, \]

where $x = Ky + \hat x$ and $\Delta\tilde x = K\Delta\tilde y$. From the assumption in the lemma, we have $\tilde\lambda < 1$.
For any $t < 1$, $(y - y(t))'\nabla^2 F(y)(y - y(t)) = t^2\tilde\lambda^2 < 1$, and by Lemma 5.3, for any $z_y$ in $\mathbb{R}^S$ we have

\[ (1 - t\tilde\lambda)^2\, z_y'\nabla^2 F(y)z_y \le z_y'\nabla^2 F(y(t))z_y \le \frac{1}{(1-t\tilde\lambda)^2}\, z_y'\nabla^2 F(y)z_y, \]

which implies

\[ z_y'\big(\nabla^2 F(y(t)) - \nabla^2 F(y)\big)z_y \le \Big(\frac{1}{(1-t\tilde\lambda)^2} - 1\Big) z_y'\nabla^2 F(y)z_y, \]  (52)

and

\[ z_y'\big(\nabla^2 F(y) - \nabla^2 F(y(t))\big)z_y \le \big(1 - (1-t\tilde\lambda)^2\big) z_y'\nabla^2 F(y)z_y. \]

Using the fact that $1 - (1-t\tilde\lambda)^2 \le \frac{1}{(1-t\tilde\lambda)^2} - 1$, the preceding relation can be rewritten as

\[ z_y'\big(\nabla^2 F(y) - \nabla^2 F(y(t))\big)z_y \le \Big(\frac{1}{(1-t\tilde\lambda)^2} - 1\Big) z_y'\nabla^2 F(y)z_y. \]  (53)

Combining relations (52) and (53) yields

\[ \big| z_y'\big(\nabla^2 F(y) - \nabla^2 F(y(t))\big)z_y \big| \le \Big(\frac{1}{(1-t\tilde\lambda)^2} - 1\Big) z_y'\nabla^2 F(y)z_y. \]  (54)

Since the function $F$ is convex, the Hessian matrix $\nabla^2 F(y)$ is positive semidefinite. We can therefore apply the generalized Cauchy–Schwarz inequality and obtain

\[ \big| \Delta\tilde y'\big(\nabla^2 F(y(t)) - \nabla^2 F(y)\big)z_y \big| \le \sqrt{\Delta\tilde y'\big(\nabla^2 F(y(t)) - \nabla^2 F(y)\big)\Delta\tilde y}\ \sqrt{z_y'\big(\nabla^2 F(y(t)) - \nabla^2 F(y)\big)z_y} \le \Big(\frac{1}{(1-t\tilde\lambda)^2} - 1\Big)\sqrt{\Delta\tilde y'\nabla^2 F(y)\Delta\tilde y}\ \sqrt{z_y'\nabla^2 F(y)z_y} = \Big(\frac{1}{(1-t\tilde\lambda)^2} - 1\Big)\tilde\lambda\,\sqrt{z_y'\nabla^2 F(y)z_y}, \]  (55)

where the second inequality follows from relation (54), and the equality follows from the definition of $\tilde\lambda$. Define the function $\kappa:\mathbb{R}\to\mathbb{R}$ as $\kappa(t) = \nabla F(y(t))'z_y + (1-t)\Delta\tilde y'\nabla^2 F(y)z_y$; then

\[ \Big|\frac{d}{dt}\kappa(t)\Big| = \big|\Delta\tilde y'\nabla^2 F(y(t))z_y - \Delta\tilde y'\nabla^2 F(y)z_y\big| = \big|\Delta\tilde y'\big(\nabla^2 F(y(t)) - \nabla^2 F(y)\big)z_y\big|, \]

which is the left hand side of (55). Define $\gamma_y = \Delta y - \Delta\tilde y$, which by the isomorphism implies $\gamma = \Delta x - \Delta\tilde x = K\gamma_y$. By rewriting $\Delta\tilde y = \Delta y - \gamma_y$ and observing that the exact Newton direction $\Delta y$ satisfies $\Delta y = -\nabla^2 F(y)^{-1}\nabla F(y)$ [cf. Eq.
(49)] and hence, by the symmetry of the matrix $\nabla^2 F(y)$, $\Delta y'\nabla^2 F(y) = \Delta y'\nabla^2 F(y)' = -\nabla F(y)'$, we obtain

\[ \kappa(0) = \nabla F(y)'z_y + \Delta\tilde y'\nabla^2 F(y)z_y = \nabla F(y)'z_y - \nabla F(y)'z_y - \gamma_y'\nabla^2 F(y)z_y = -\gamma_y'\nabla^2 F(y)z_y. \]

Hence by integration we obtain the bound

\[ |\kappa(t)| \le \tilde\lambda\sqrt{z_y'\nabla^2 F(y)z_y}\int_0^t\Big(\frac{1}{(1-s\tilde\lambda)^2} - 1\Big)ds + |\gamma_y'\nabla^2 F(y)z_y| = \frac{\tilde\lambda^2 t^2}{1-\tilde\lambda t}\sqrt{z_y'\nabla^2 F(y)z_y} + |\gamma_y'\nabla^2 F(y)z_y|. \]

For $t = 1$, $y(1) = y + \Delta\tilde y$, and the above relation implies

\[ |\kappa(1)| = |\nabla F(y+\Delta\tilde y)'z_y| \le \frac{\tilde\lambda^2}{1-\tilde\lambda}\sqrt{z_y'\nabla^2 F(y)z_y} + |\gamma_y'\nabla^2 F(y)z_y|. \]

We now specify $z_y$ to be the exact Newton direction at $y + \Delta\tilde y$; then $z_y$ satisfies $z_y'\nabla^2 F(y+\Delta\tilde y)z_y = |\nabla F(y+\Delta\tilde y)'z_y|$ by the definition of the Newton direction at $y+\Delta\tilde y$ [cf. Eq. (49)], which proves

\[ z_y'\nabla^2 F(y+\Delta\tilde y)z_y \le \frac{\tilde\lambda^2}{1-\tilde\lambda}\sqrt{z_y'\nabla^2 F(y)z_y} + |\gamma_y'\nabla^2 F(y)z_y|. \]

We now use the isomorphism once more to transform the above relation to the equality constrained problem domain. We have $z = Kz_y$, the exact Newton direction at $x + \Delta\tilde x = \hat x + Ky + K\Delta\tilde y$. The left hand side becomes

\[ z_y'\nabla^2 F(y+\Delta\tilde y)z_y = z_y'K'\nabla^2 f(x+\Delta\tilde x)Kz_y = z'\nabla^2 f(x+\Delta\tilde x)z. \]

Similarly, the right hand side satisfies

\[ \frac{\tilde\lambda^2}{1-\tilde\lambda}\sqrt{z_y'\nabla^2 F(y)z_y} + |\gamma_y'\nabla^2 F(y)z_y| = \frac{\tilde\lambda^2}{1-\tilde\lambda}\sqrt{z_y'K'\nabla^2 f(x)Kz_y} + |\gamma_y'K'\nabla^2 f(x)Kz_y| = \frac{\tilde\lambda^2}{1-\tilde\lambda}\sqrt{z'\nabla^2 f(x)z} + |\gamma'\nabla^2 f(x)z|. \]

By combining the above two relations, we have established the desired relation.

One possible matrix $K$ in the above proof for problem (4) is given by

\[ K = \begin{bmatrix} I_{(S)} \\ -R \end{bmatrix}, \]

whose corresponding unconstrained domain consists of the source rate variables.
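The choice $K = [I_{(S)};\, -R]$ can be illustrated concretely. The sketch below is illustrative only: the small routing matrix $R$ and capacities are made up, and the constraint matrix with slack variables is taken to be $A = [R\ \ I]$, an assumption consistent with the surrounding text. It checks that $AK = 0$, so any update of the source rates through $K$ preserves feasibility by adjusting the slacks automatically.

```python
import numpy as np

# Hypothetical routing matrix R (L links x S sources): R[l, s] = 1 iff source s uses link l.
R = np.array([[1., 1., 0.],
              [0., 1., 1.]])
L, S = R.shape
c = np.array([2., 3.])               # illustrative link capacities

A = np.hstack([R, np.eye(L)])        # constraint of problem (4): A [s; y] = c, y = slacks
K = np.vstack([np.eye(S), -R])       # the K suggested in the text: K = [I; -R]

# Columns of K lie in null(A), so any source-rate move dy keeps Ax = c:
s0 = np.array([0.5, 0.5, 0.5])
x = np.concatenate([s0, c - R @ s0])  # a feasible point
dy = np.array([0.1, -0.2, 0.05])      # arbitrary update of the source rates
x_new = x + K @ dy                    # slacks adjust automatically
```

The first $S$ coordinates of $x_{\text{new}}$ are exactly the updated source rates, while the slacks absorb the change, which is the feasibility-maintaining behavior described in the text.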
In the unconstrained domain, the source rates are updated and the matrix $K$ then adjusts the slack variables accordingly to maintain feasibility, which coincides with our inexact distributed algorithm in the primal domain. The above lemma will be used to guarantee the quadratic rate of convergence of the distributed inexact Newton method (29). The next lemma plays a central role in relating the suboptimality gap in the objective function value to the exact Newton decrement (see [9] for more details).

Lemma 5.5. Let $F:\mathbb{R}^n\to\mathbb{R}$ be a self-concordant function. Consider solving the unconstrained optimization problem

\[ \text{minimize}_{x\in\mathbb{R}^n}\ F(x), \]  (56)

using an (unconstrained) Newton method. Let $\Delta x$ be the exact Newton direction at $x$, i.e., $\Delta x = -\nabla^2 F(x)^{-1}\nabla F(x)$. Let $\lambda(x)$ be the exact Newton decrement, i.e., $\lambda(x) = \sqrt{\Delta x'\,\nabla^2 F(x)\,\Delta x}$. Let $F^*$ denote the optimal value of problem (56). If $\lambda(x) \le 0.68$, then we have

\[ F^* \ge F(x) - \lambda(x)^2. \]  (57)

Using the same elimination technique and isomorphism established for Lemma 5.4, the next result follows immediately.

Lemma 5.6. Let $f:\mathbb{R}^n\to\mathbb{R}$ be a self-concordant function. Consider solving the equality constrained optimization problem

\[ \text{minimize } f(x) \quad \text{subject to } Ax = c, \]  (58)

using a constrained Newton method with feasible initialization. Let $\Delta x$ be the exact (primal) Newton direction at $x$, i.e., $\Delta x$ solves the system

\[ \begin{bmatrix} \nabla^2 f(x) & A' \\ A & 0 \end{bmatrix} \begin{bmatrix} \Delta x \\ w \end{bmatrix} = -\begin{bmatrix} \nabla f(x) \\ 0 \end{bmatrix}. \]

Let $\lambda(x)$ be the exact Newton decrement, i.e., $\lambda(x) = \sqrt{\Delta x'\,\nabla^2 f(x)\,\Delta x}$. Let $f^*$ denote the optimal value of problem (58). If $\lambda(x) \le 0.68$, then we have

\[ f^* \ge f(x) - \lambda(x)^2. \]  (59)

Note that the relation on the suboptimality gap in the preceding lemma holds only when the exact Newton decrement is sufficiently small, as quantified by the numerical bound 0.68 (see [9]).
We will use these lemmas in the subsequent sections for the convergence rate analysis of the distributed inexact Newton method applied to problem (4). Our analysis comprises two parts. The first part is the damped convergent phase, in which we provide a constant lower bound on the improvement in the objective function value at each step. The second part is the quadratically convergent phase, in which the suboptimality in the objective function value diminishes quadratically to an error level.

5.2.2 Basic Relations

We first introduce some key relations, which provide a bound on the error in the Newton direction computation. This will be used in both phases of the convergence analysis.

Lemma 5.7. Let $\{x^k\}$ be the primal sequence generated by the inexact Newton method (29). Let $\tilde\lambda(x^k)$ be the inexact Newton decrement at $x^k$ [cf. Eq. (31)]. For all $k$, we have

\[ |(\gamma^k)'\nabla^2 f(x^k)\Delta\tilde x^k| \le p\,\tilde\lambda(x^k)^2 + \tilde\lambda(x^k)\sqrt{\epsilon}, \]

where $\gamma^k$, $p$ and $\epsilon$ are the nonnegative scalars defined in Assumption 2.

Proof. By Assumption 1, the Hessian matrix $\nabla^2 f(x^k)$ is positive definite for all $x^k$. We can therefore apply the generalized Cauchy–Schwarz inequality and obtain

\[ |(\gamma^k)'\nabla^2 f(x^k)\Delta\tilde x^k| \le \sqrt{\big((\gamma^k)'\nabla^2 f(x^k)\gamma^k\big)\big((\Delta\tilde x^k)'\nabla^2 f(x^k)\Delta\tilde x^k\big)} \le \sqrt{\big(p^2\tilde\lambda(x^k)^2 + \epsilon\big)\tilde\lambda(x^k)^2} \le \sqrt{\big(p^2\tilde\lambda(x^k)^2 + \epsilon + 2p\tilde\lambda(x^k)\sqrt{\epsilon}\big)\tilde\lambda(x^k)^2}, \]  (60)

where the second inequality follows from Assumption 2 and the definition of $\tilde\lambda(x^k)$, and the third inequality follows by adding the nonnegative term $2p\sqrt{\epsilon}\,\tilde\lambda(x^k)^3$ under the square root. By the nonnegativity of the inexact Newton decrement $\tilde\lambda(x^k)$, relation (60) implies

\[ |(\gamma^k)'\nabla^2 f(x^k)\Delta\tilde x^k| \le \tilde\lambda(x^k)\big(p\tilde\lambda(x^k) + \sqrt{\epsilon}\big) = p\,\tilde\lambda(x^k)^2 + \tilde\lambda(x^k)\sqrt{\epsilon}, \]

which proves the desired relation.
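The generalized Cauchy–Schwarz inequality used in this proof, $|u'Mv| \le \sqrt{u'Mu}\,\sqrt{v'Mv}$ for a symmetric positive semidefinite $M$, can be sanity-checked numerically; the sketch below (not from the paper) uses a random PSD matrix as a stand-in for the Hessian. The inequality follows from the ordinary Cauchy–Schwarz inequality applied to $M^{1/2}u$ and $M^{1/2}v$.

```python
import numpy as np

np.random.seed(1)
n = 6
B = np.random.randn(n, n)
M = B @ B.T                         # symmetric positive semidefinite matrix
u, v = np.random.randn(n), np.random.randn(n)

# Generalized Cauchy-Schwarz: |u' M v| <= sqrt(u' M u) * sqrt(v' M v)
lhs = abs(u @ M @ v)
rhs = np.sqrt(u @ M @ u) * np.sqrt(v @ M @ v)
```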
Using the preceding lemma, the following basic relation can be established, which will be used to measure the improvement in the objective function value.

Lemma 5.8. Let $\{x^k\}$ be the primal sequence generated by the inexact Newton method (29). Let $\tilde f_k$ be the objective function along the Newton direction and $\tilde\lambda(x^k)$ the inexact Newton decrement at $x^k$ [cf. Eqs. (42) and (31)], respectively. For all $k$ and all $t$ with $0 \le t < 1/\tilde\lambda(x^k)$, we have

\[ \tilde f_k(t) \le \tilde f_k(0) - t(1-p)\tilde\lambda(x^k)^2 - \big(1-\sqrt{\epsilon}\big)t\tilde\lambda(x^k) - \log\big(1 - t\tilde\lambda(x^k)\big), \]  (61)

where $p$ and $\epsilon$ are the nonnegative scalars defined in Assumption 2.

Proof. Recall that $\Delta x^k$ is the exact Newton direction, which solves the system (9). Therefore, for some $w^k$, the following equation is satisfied:

\[ \nabla^2 f(x^k)\Delta x^k + A'w^k = -\nabla f(x^k). \]

By left-multiplying the above relation by $(\Delta\tilde x^k)'$, we obtain

\[ (\Delta\tilde x^k)'\nabla^2 f(x^k)\Delta x^k + (\Delta\tilde x^k)'A'w^k = -(\Delta\tilde x^k)'\nabla f(x^k). \]

Using the facts that $\Delta x^k = \Delta\tilde x^k + \gamma^k$ from Assumption 2 and $A\Delta\tilde x^k = 0$ by the design of our algorithm, the above relation yields

\[ (\Delta\tilde x^k)'\nabla^2 f(x^k)\Delta\tilde x^k + (\Delta\tilde x^k)'\nabla^2 f(x^k)\gamma^k = -(\Delta\tilde x^k)'\nabla f(x^k). \]

By Lemma 5.7, we can bound $(\Delta\tilde x^k)'\nabla^2 f(x^k)\gamma^k$ as

\[ p\,\tilde\lambda(x^k)^2 + \tilde\lambda(x^k)\sqrt{\epsilon} \ge (\Delta\tilde x^k)'\nabla^2 f(x^k)\gamma^k \ge -p\,\tilde\lambda(x^k)^2 - \tilde\lambda(x^k)\sqrt{\epsilon}. \]

Using the definition of $\tilde\lambda(x^k)$ [cf. Eq. (31)] and the preceding two relations, we obtain the following bounds on $(\Delta\tilde x^k)'\nabla f(x^k)$:

\[ -(1+p)\tilde\lambda(x^k)^2 - \tilde\lambda(x^k)\sqrt{\epsilon} \le (\Delta\tilde x^k)'\nabla f(x^k) \le -(1-p)\tilde\lambda(x^k)^2 + \tilde\lambda(x^k)\sqrt{\epsilon}. \]

Differentiating the function $\tilde f_k(t)$ and using the preceding relation yields

\[ \tilde f_k'(0) = \nabla f(x^k)'\Delta\tilde x^k \le -(1-p)\tilde\lambda(x^k)^2 + \tilde\lambda(x^k)\sqrt{\epsilon}. \]  (62)

Moreover, we have

\[ \tilde f_k''(0) = (\Delta\tilde x^k)'\nabla^2 f(x^k)\Delta\tilde x^k = \tilde\lambda(x^k)^2. \]  (63)
The function $\tilde f_k(t)$ is self-concordant for all $k$; therefore, by Lemma 5.2, for $0 \le t < 1/\tilde\lambda(x^k)$ the following relations hold:

\[ \tilde f_k(t) \le \tilde f_k(0) + t\tilde f_k'(0) - t\tilde f_k''(0)^{\frac12} - \log\big(1 - t\tilde f_k''(0)^{\frac12}\big) \le \tilde f_k(0) - t(1-p)\tilde\lambda(x^k)^2 + t\tilde\lambda(x^k)\sqrt{\epsilon} - t\tilde\lambda(x^k) - \log\big(1 - t\tilde\lambda(x^k)\big) = \tilde f_k(0) - t(1-p)\tilde\lambda(x^k)^2 - \big(1-\sqrt{\epsilon}\big)t\tilde\lambda(x^k) - \log\big(1 - t\tilde\lambda(x^k)\big), \]

where the second inequality follows by Eqs. (62) and (63). This proves Eq. (61).

The preceding lemma shows that a careful choice of the stepsize $t$ can guarantee a constant lower bound on the improvement in the objective function value at each iteration. We present the convergence properties of our algorithm in the following two sections.

5.2.3 Damped Convergent Phase

In this section, we consider the case when $\theta^k \ge V$ and the stepsize is $d^k = \frac{b}{\theta^k+1}$ [cf. Eq. (32)]. We will provide a constant lower bound on the improvement in the objective function value in this case. To this end, we first establish the improvement bound for the exact stepsize choice $t = 1/(\tilde\lambda(x^k)+1)$.

Theorem 5.9. Let $\{x^k\}$ be the primal sequence generated by the inexact Newton method (29). Let $\tilde f_k$ be the objective function along the Newton direction and $\tilde\lambda(x^k)$ the inexact Newton decrement at $x^k$ [cf. Eqs. (42) and (31)]. Consider the scalars $p$ and $\epsilon$ defined in Assumption 2 and assume that $0 < p < \frac12$ and

\[ 0 < \epsilon < \Big(\frac{(0.5-p)(2Vb - V + b - 1)}{b}\Big)^2, \]

where $b$ is the constant used in the stepsize rule [cf. Eq. (32)]. For $\theta^k \ge V$ and $t = 1/\big(\tilde\lambda(x^k)+1\big)$, there exists a scalar $\alpha > 0$ such that

\[ \tilde f_k(t) - \tilde f_k(0) \le -\alpha(1+p)\Big(\frac{2Vb - V + b - 1}{b}\Big)^2 \Big/ \Big(1 + \frac{2Vb - V + b - 1}{b}\Big). \]  (64)

Proof. For notational simplicity, let $y = \tilde\lambda(x^k)$ in this proof. We will show that Eq. (64) holds for any positive scalar $\alpha$ with

\[ 0 < \alpha \le \Big(\frac12 - p - \frac{\sqrt{\epsilon}\,b}{2Vb - V + b - 1}\Big)\Big/(p+1). \]
Note that such an $\alpha$ exists since $\epsilon < \big(\frac{(0.5-p)(2Vb-V+b-1)}{b}\big)^2$. By Assumption 3, we have for $\theta^k \ge V$,

\[ y \ge \theta^k - \Big(\frac1b - 1\Big)(1+V) \ge V - \Big(\frac1b - 1\Big)(1+V) = \frac{2Vb - V + b - 1}{b}. \]  (65)

Using $b > \frac{V+1}{2V+1}$, we have $y \ge V - \big(\frac1b - 1\big)(1+V) > 0$, which implies $2Vb - V + b - 1 > 0$. Together with $0 < \alpha \le \big(\frac12 - p - \frac{\sqrt{\epsilon}\,b}{2Vb-V+b-1}\big)/(p+1)$ and $b > \frac{V+1}{2V+1}$, this shows

\[ \sqrt{\epsilon} \le \frac{2Vb - V + b - 1}{b}\Big(\frac12 - p - \alpha(1+p)\Big). \]

Combining the above, we obtain $\sqrt{\epsilon} \le y\big(\frac12 - p - \alpha(1+p)\big)$, which after algebraic manipulation yields

\[ -(1-p)y - \big(1 - \sqrt{\epsilon}\big) + (1+y) - \frac{y}{2} \le -\alpha(1+p)y. \]

From Eq. (65), we have $y > 0$. We can therefore multiply both sides of the above inequality by $y$ and divide by $1+y$ to obtain

\[ -\frac{1-p}{1+y}y^2 - \frac{1-\sqrt{\epsilon}}{1+y}y + y - \frac{y^2}{2(1+y)} \le -\alpha(1+p)\frac{y^2}{1+y}. \]  (66)

Using a second order Taylor expansion of $\log(1+y)$, we have for $y \ge 0$

\[ \log(1+y) \le y - \frac{y^2}{2(1+y)}. \]

Using this relation in Eq. (66) yields

\[ -\frac{1-p}{1+y}y^2 - \frac{1-\sqrt{\epsilon}}{1+y}y + \log(1+y) \le -\alpha(1+p)\frac{y^2}{1+y}. \]

Substituting the value $t = 1/(y+1)$, the above relation can be rewritten as

\[ -(1-p)ty^2 - \big(1-\sqrt{\epsilon}\big)ty - \log(1-ty) \le -\alpha(1+p)\frac{y^2}{1+y}. \]

Using Eq. (61) from Lemma 5.8 and the definition of $y$, we obtain

\[ \tilde f_k(t) - \tilde f_k(0) \le -\alpha(1+p)\frac{y^2}{y+1}. \]

Observe that the function $h(y) = \frac{y^2}{y+1}$ is monotonically increasing in $y$, and for $\theta^k \ge V$, relation (65) gives $y \ge \frac{2Vb-V+b-1}{b}$. Therefore

\[ -\alpha(1+p)\frac{y^2}{y+1} \le -\alpha(1+p)\Big(\frac{2Vb-V+b-1}{b}\Big)^2 \Big/ \Big(1 + \frac{2Vb-V+b-1}{b}\Big). \]

Combining the preceding two relations completes the proof.

Note that our algorithm uses the stepsize $d^k = \frac{b}{\theta^k+1}$ in the damped convergent phase, which is an approximation to the stepsize $t = 1/(\tilde\lambda(x^k)+1)$ used in the previous theorem. The error between the two is bounded by relation (36), as shown in Lemma 4.5.
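The exact stepsize rule $t = 1/(\tilde\lambda(x^k)+1)$ analyzed above can be tried on a small example. The sketch below is not the NUM objective: the function $f(x) = c'x - \sum_i \log x_i$ is a made-up self-concordant test problem with known minimizer $x_i = 1/c_i$, used only to illustrate that exact damped Newton with this rule converges while keeping the iterates in the domain.

```python
import numpy as np

def damped_newton(grad, hess, x0, tol=1e-10, max_iter=50):
    """Exact damped Newton with the self-concordant stepsize t = 1/(lambda + 1),
    the rule that the distributed scheme approximates by d_k = b/(theta_k + 1)."""
    x = x0.astype(float)
    lam = np.inf
    for _ in range(max_iter):
        g, H = grad(x), hess(x)
        dx = -np.linalg.solve(H, g)
        lam = np.sqrt(dx @ H @ dx)        # Newton decrement
        if lam < tol:
            break
        x = x + dx / (1.0 + lam)          # t = 1/(lambda + 1): step stays in the domain
    return x, lam

# Illustrative self-concordant objective: f(x) = c'x - sum_i log(x_i), minimizer 1/c.
c = np.array([1.0, 2.0, 3.0])
x_opt, lam = damped_newton(lambda x: c - 1.0 / x,
                           lambda x: np.diag(1.0 / x ** 2),
                           np.ones(3))
```

Because $t\tilde\lambda = \tilde\lambda/(1+\tilde\lambda) < 1$, each step stays strictly inside the Dikin ellipsoid, so the iterates remain positive without any explicit safeguard.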
We next show that, with this error in the stepsize computation, the improvement in the objective function value in the inexact algorithm is still lower bounded at each iteration. Let $\beta = d^k/t$, where $t = 1/(\tilde\lambda(x^k)+1)$. By the convexity of $f$, we have

\[ f(x^k + \beta t\Delta\tilde x^k) = f\big(\beta(x^k + t\Delta\tilde x^k) + (1-\beta)x^k\big) \le \beta f(x^k + t\Delta\tilde x^k) + (1-\beta)f(x^k). \]

Therefore the objective function value improvement is bounded by

\[ f(x^k + \beta t\Delta\tilde x^k) - f(x^k) \le \beta f(x^k + t\Delta\tilde x^k) + (1-\beta)f(x^k) - f(x^k) = \beta\big(f(x^k + t\Delta\tilde x^k) - f(x^k)\big) = \beta\big(\tilde f_k(t) - \tilde f_k(0)\big), \]

where the last equality follows from the definition of $\tilde f_k(t)$. Using Lemma 4.5, we obtain the bounds $2b - 1 \le \beta \le 1$ on $\beta$. Combining this bound with Theorem 5.9, we obtain

\[ f(x^{k+1}) - f(x^k) \le -(2b-1)\,\alpha(1+p)\Big(\frac{2Vb-V+b-1}{b}\Big)^2 \Big/ \Big(1 + \frac{2Vb-V+b-1}{b}\Big). \]  (67)

Hence in the damped convergent phase we can guarantee a constant lower bound on the objective function value improvement at each iteration. This bound is monotone in $b$: the closer the scalar $b$ is to 1, the faster the objective function value improves; however, this also requires the error in the inexact Newton decrement calculation, i.e., $\tilde\lambda(x^k) - \theta^k$, to diminish to 0 [cf. Assumption 3].

5.2.4 Quadratically Convergent Phase

In this phase, there exists $\bar k$ with $\theta^{\bar k} < V$, and the stepsize choice is $d^k = 1$ for all $k \ge \bar k$.^{18} We show that the suboptimality in the primal objective function value diminishes quadratically to a neighborhood of the optimal solution. We proceed by first establishing the following lemma, which relates the exact and the inexact Newton decrements.

Lemma 5.10. Let $\{x^k\}$ be the primal sequence generated by the inexact Newton method (29), and let $\lambda(x^k)$, $\tilde\lambda(x^k)$ be the exact and inexact Newton decrements at $x^k$ [cf. Eqs. (30) and (31)]. Let $p$ and $\epsilon$ be the nonnegative scalars defined in Assumption 2.
We have

\[ (1-p)\tilde\lambda(x^k) - \sqrt{\epsilon} \le \lambda(x^k) \le (1+p)\tilde\lambda(x^k) + \sqrt{\epsilon}. \]  (68)

Proof. By Assumption 1, $\nabla^2 f(x^k)$ is positive definite for all $k$. We can therefore apply the generalized Cauchy–Schwarz inequality and obtain

\[ |(\Delta x^k)'\nabla^2 f(x^k)\Delta\tilde x^k| \le \sqrt{\big((\Delta x^k)'\nabla^2 f(x^k)\Delta x^k\big)\big((\Delta\tilde x^k)'\nabla^2 f(x^k)\Delta\tilde x^k\big)} = \lambda(x^k)\tilde\lambda(x^k), \]  (69)

where the equality follows from the definitions of $\lambda(x^k)$ and $\tilde\lambda(x^k)$. Note that by Assumption 2 we have $\Delta x^k = \Delta\tilde x^k + \gamma^k$, and hence

\[ |(\Delta x^k)'\nabla^2 f(x^k)\Delta\tilde x^k| = |(\Delta\tilde x^k + \gamma^k)'\nabla^2 f(x^k)\Delta\tilde x^k| \ge (\Delta\tilde x^k)'\nabla^2 f(x^k)\Delta\tilde x^k - |(\gamma^k)'\nabla^2 f(x^k)\Delta\tilde x^k| \ge \tilde\lambda(x^k)^2 - p\tilde\lambda(x^k)^2 - \tilde\lambda(x^k)\sqrt{\epsilon}, \]  (70)

where the first inequality follows from a variation of the triangle inequality, and the last inequality follows from Lemma 5.7. Combining the two inequalities (69) and (70), we obtain

\[ \lambda(x^k)\tilde\lambda(x^k) \ge \tilde\lambda(x^k)^2 - p\tilde\lambda(x^k)^2 - \sqrt{\epsilon}\,\tilde\lambda(x^k). \]

By canceling the nonnegative term $\tilde\lambda(x^k)$ on both sides, we have

\[ \lambda(x^k) \ge \tilde\lambda(x^k) - p\tilde\lambda(x^k) - \sqrt{\epsilon}. \]

This shows the first half of relation (68). For the second half, using the definition of $\lambda(x^k)$, we have

\[ \lambda(x^k)^2 = (\Delta x^k)'\nabla^2 f(x^k)\Delta x^k = (\Delta\tilde x^k + \gamma^k)'\nabla^2 f(x^k)(\Delta\tilde x^k + \gamma^k) = (\Delta\tilde x^k)'\nabla^2 f(x^k)\Delta\tilde x^k + (\gamma^k)'\nabla^2 f(x^k)\gamma^k + 2(\Delta\tilde x^k)'\nabla^2 f(x^k)\gamma^k, \]

where the second equality follows from the definition of $\gamma^k$ [cf. Eq. (33)].

^{18} Note that once the condition $\theta^{\bar k} < V$ is satisfied, in all the following iterations we have stepsize $d^k = 1$ and no longer need to compute $\theta^k$.
By the definition of $\tilde\lambda(x^k)$, Assumption 2 and Lemma 5.7, the preceding relation implies

\[ \lambda(x^k)^2 \le \tilde\lambda(x^k)^2 + p^2\tilde\lambda(x^k)^2 + \epsilon + 2p\tilde\lambda(x^k)^2 + 2\sqrt{\epsilon}\,\tilde\lambda(x^k) \le \tilde\lambda(x^k)^2 + p^2\tilde\lambda(x^k)^2 + 2p\tilde\lambda(x^k)^2 + 2\sqrt{\epsilon}(1+p)\tilde\lambda(x^k) + \epsilon = \big((1+p)\tilde\lambda(x^k) + \sqrt{\epsilon}\big)^2, \]

where the second inequality follows by adding the nonnegative term $2\sqrt{\epsilon}\,p\tilde\lambda(x^k)$ to the right hand side. By the nonnegativity of $p$, $\epsilon$, $\lambda(x^k)$ and $\tilde\lambda(x^k)$, we can take the square root of both sides, which completes the proof of relation (68).

Before proceeding to establish quadratic convergence, in terms of the primal iterations, to an error neighborhood of the optimal solution, we need to impose the following bound on the errors in our algorithm in this phase. Recall that $\bar k$ is an index such that $\theta^{\bar k} < V$ and $d^k = 1$ for all $k \ge \bar k$.

Assumption 4. Let $\{x^k\}$ be the primal sequence generated by the inexact Newton method (29). Let $\phi$ be a positive scalar with $\phi \le 0.267$. Let $\xi$ and $v$ be nonnegative scalars defined in terms of $\phi$ as

\[ \xi = \frac{\phi p + \sqrt{\epsilon}}{1 - p - \phi - \sqrt{\epsilon}} + \frac{2\phi\sqrt{\epsilon} + \epsilon}{(1 - p - \phi - \sqrt{\epsilon})^2}, \qquad v = \frac{1}{(1 - p - \phi - \sqrt{\epsilon})^2}, \]

where $p$ and $\epsilon$ are the scalars defined in Assumption 2. The following relations hold:

\[ (1+p)(\theta^{\bar k} + \tau^{\bar k}) + \sqrt{\epsilon} \le \phi, \]  (71)
\[ v(0.68)^2 + \xi \le 0.68, \]  (72)
\[ \frac{0.68 + \sqrt{\epsilon}}{1-p} \le 1, \]  (73)
\[ p + \sqrt{\epsilon} \le 1 - (4\phi^2)^{\frac14} - \phi, \]  (74)

where $\tau^{\bar k} > 0$ is a bound on the error in the Newton decrement calculation at step $\bar k$ [cf. Assumption 3].

The upper bound of $0.267$ on $\phi$ is necessary here to guarantee that relation (74) can be satisfied by some nonnegative scalars $p$ and $\epsilon$. Relation (71) can be satisfied by some nonnegative scalars $p$, $\epsilon$ and $\tau^{\bar k}$, because we have $\theta^{\bar k} < V < 0.267$. Relations (71) and (72) will be used to guarantee that the condition $\lambda(x^k) \le 0.$
68 is satisfied throughout this phase, so that we can use Lemma 5.6 to relate the suboptimality bound to the Newton decrement; relations (73) and (74) will be used to establish the quadratic rate of convergence of the objective function value, as we show in Theorem 5.12. This assumption can be satisfied by first choosing proper values for the scalars $p$, $\epsilon$ and $\tau$ such that all the relations are satisfied, and then adapting both the consensus algorithm for $\theta^{\bar k}$ and the dual iterations for $w^k$ to the desired precision (see the discussions following Assumptions 2 and 3 for how these precision levels can be achieved).

To show the quadratic rate of convergence of the primal iterations, we need the following lemma, which relates the exact Newton decrements at the current and the next step.

Lemma 5.11. Let $\{x^k\}$ be the primal sequence generated by the inexact Newton method (29), and let $\lambda(x^k)$, $\tilde\lambda(x^k)$ be the exact and inexact Newton decrements at $x^k$ [cf. Eqs. (30) and (31)]. Let $\theta^k$ be the computed inexact value of $\tilde\lambda(x^k)$, and let Assumption 4 hold. Then for all $k$ with $\tilde\lambda(x^k) < 1$, we have

\[ \lambda(x^{k+1}) \le v\lambda(x^k)^2 + \xi, \]  (75)

where $\xi$ and $v$ are the scalars defined in Assumption 4, and $p$ and $\epsilon$ are defined as in Assumption 2.

Proof. Given $\tilde\lambda(x^k) < 1$, we can apply Lemma 5.4 with $z = \Delta x^{k+1}$, and we have

\[ \lambda(x^{k+1})^2 = (\Delta x^{k+1})'\nabla^2 f(x^k + \Delta\tilde x^k)\Delta x^{k+1} \le \frac{\tilde\lambda(x^k)^2}{1-\tilde\lambda(x^k)}\sqrt{(\Delta x^{k+1})'\nabla^2 f(x^k)\Delta x^{k+1}} + \big|(\gamma^k)'\nabla^2 f(x^k)\Delta x^{k+1}\big| \le \frac{\tilde\lambda(x^k)^2}{1-\tilde\lambda(x^k)}\sqrt{(\Delta x^{k+1})'\nabla^2 f(x^k)\Delta x^{k+1}} + \sqrt{(\gamma^k)'\nabla^2 f(x^k)\gamma^k}\,\sqrt{(\Delta x^{k+1})'\nabla^2 f(x^k)\Delta x^{k+1}}, \]

where the last inequality follows from the generalized Cauchy–Schwarz inequality. Using Assumption 2, the above relation implies

\[ \lambda(x^{k+1})^2 \le \bigg(\frac{\tilde\lambda(x^k)^2}{1-\tilde\lambda(x^k)} + \sqrt{p^2\tilde\lambda(x^k)^2 + \epsilon}\bigg)\sqrt{(\Delta x^{k+1})'\nabla^2 f(x^k)\Delta x^{k+1}}. \]
By the fact that $\tilde\lambda(x^k) \le \theta^k + \tau \le \phi < 1$, we can apply Lemma 5.3 and obtain

\[ \lambda(x^{k+1})^2 \le \frac{1}{1-\tilde\lambda(x^k)}\bigg(\frac{\tilde\lambda(x^k)^2}{1-\tilde\lambda(x^k)} + \sqrt{p^2\tilde\lambda(x^k)^2 + \epsilon}\bigg)\sqrt{(\Delta x^{k+1})'\nabla^2 f(x^k + \Delta\tilde x^k)\Delta x^{k+1}} = \bigg(\frac{\tilde\lambda(x^k)^2}{(1-\tilde\lambda(x^k))^2} + \frac{\sqrt{p^2\tilde\lambda(x^k)^2 + \epsilon}}{1-\tilde\lambda(x^k)}\bigg)\lambda(x^{k+1}). \]

Dividing both sides by $\lambda(x^{k+1})$ yields

\[ \lambda(x^{k+1}) \le \frac{\tilde\lambda(x^k)^2}{(1-\tilde\lambda(x^k))^2} + \frac{\sqrt{p^2\tilde\lambda(x^k)^2 + \epsilon}}{1-\tilde\lambda(x^k)} \le \frac{\tilde\lambda(x^k)^2}{(1-\tilde\lambda(x^k))^2} + \frac{p\tilde\lambda(x^k) + \sqrt{\epsilon}}{1-\tilde\lambda(x^k)}. \]

From Eq. (68), we have $\tilde\lambda(x^k) \le \frac{\lambda(x^k)+\sqrt{\epsilon}}{1-p}$. Therefore the above relation implies

\[ \lambda(x^{k+1}) \le \bigg(\frac{\lambda(x^k)+\sqrt{\epsilon}}{1-p-\lambda(x^k)-\sqrt{\epsilon}}\bigg)^2 + \frac{p\lambda(x^k)+\sqrt{\epsilon}}{1-p-\lambda(x^k)-\sqrt{\epsilon}}. \]

By Eq. (77), we have $\lambda(x^k) \le \phi$, and therefore the above relation can be relaxed to

\[ \lambda(x^{k+1}) \le \bigg(\frac{\lambda(x^k)}{1-p-\phi-\sqrt{\epsilon}}\bigg)^2 + \frac{\phi p+\sqrt{\epsilon}}{1-p-\phi-\sqrt{\epsilon}} + \frac{2\phi\sqrt{\epsilon}+\epsilon}{(1-p-\phi-\sqrt{\epsilon})^2}. \]

Hence, by the definitions of $\xi$ and $v$, we have $\lambda(x^{k+1}) \le v\lambda(x^k)^2 + \xi$.

In the next theorem, building on the preceding lemma, we apply relation (59) to bound the suboptimality in our algorithm, i.e., $f(x^k) - f^*$, using the exact Newton decrement. We show that, under the above assumption, the objective function value $f(x^k)$ generated by our algorithm converges quadratically in terms of the primal iterations to an explicitly characterized error neighborhood of the optimal value $f^*$.

Theorem 5.12. Let $\{x^k\}$ be the primal sequence generated by the inexact Newton method (29), and let $\lambda(x^k)$, $\tilde\lambda(x^k)$ be the exact and inexact Newton decrements at $x^k$ [cf. Eqs. (30) and (31)]. Let $f(x^k)$ be the corresponding objective function value at the $k$th iteration and $f^*$ the optimal objective function value for problem (4). Let Assumption 4 hold, and let $\xi$ and $v$ be the scalars defined in Assumption 4. Assume that for some $\delta \in [0, 1/2)$,

\[ \xi + v\xi \le \frac{\delta}{4v}. \]
Then for all $m \ge 1$, we have

\[ \lambda(x^{\bar k + m}) \le \frac{1}{2^{2^m}v} + \xi + \frac{\delta}{v}\,\frac{2^{2^m-1}-1}{2^{2^m}}, \]  (76)

and

\[ \limsup_{m\to\infty}\ f(x^{\bar k + m}) - f^* \le \xi + \frac{\delta}{2v}, \]

where $\bar k$ is the iteration index with $\theta^{\bar k} < V$.

Proof. We prove Eq. (76) by induction. First, for $m = 1$: from Assumption 3, we have $\tilde\lambda(x^{\bar k}) \le \theta^{\bar k} + \tau^{\bar k}$. Relation (71) implies $\theta^{\bar k} + \tau^{\bar k} \le \phi < 1$; hence we have $\tilde\lambda(x^{\bar k}) < 1$, and we can apply Lemma 5.11 to obtain

\[ \lambda(x^{\bar k+1}) \le v\lambda(x^{\bar k})^2 + \xi. \]

By Assumption 4 and Eq. (68), we have

\[ \lambda(x^{\bar k}) \le (1+p)(\theta^{\bar k} + \tau^{\bar k}) + \sqrt{\epsilon} \le \phi. \]  (77)

The above two relations imply $\lambda(x^{\bar k+1}) \le v\phi^2 + \xi$. The right hand side is monotonically increasing in $\phi$. Since $\phi \le 0.68$, we have by Eq. (72) that $\lambda(x^{\bar k+1}) \le 0.68$. By relation (74), we obtain

\[ (1 - p - \phi - \sqrt{\epsilon})^4 \ge 4\phi^2. \]

Using the definition of $v$, i.e., $v = \frac{1}{(1-p-\phi-\sqrt{\epsilon})^2}$, the above relation implies $v\phi^2 \le \frac{1}{4v}$. Hence we have

\[ \lambda(x^{\bar k+1}) \le \frac{1}{4v} + \xi. \]

This establishes relation (76) for $m = 1$. We next assume that Eq. (76) holds and $\lambda(x^{\bar k+m}) \le 0.68$ for some $m > 0$, and show that these also hold for $m+1$. From Eqs. (68) and (73), we have

\[ \tilde\lambda(x^{\bar k+m}) \le \frac{\lambda(x^{\bar k+m}) + \sqrt{\epsilon}}{1-p} \le \frac{0.68 + \sqrt{\epsilon}}{1-p} \le 1,^{19} \]

where in the second inequality we used the inductive hypothesis that $\lambda(x^{\bar k+m}) \le 0.68$. Hence we can apply Eq. (75) and obtain

\[ \lambda(x^{\bar k+m+1}) \le v\lambda(x^{\bar k+m})^2 + \xi; \]

using Eq. (72) and $\lambda(x^{\bar k+m}) \le 0.68$ once more, we have $\lambda(x^{\bar k+m+1}) \le 0.68$.
From our inductive hypothesis that (76) holds for $m$, the above relation also implies

\[ \lambda(x^{\bar k+m+1}) \le v\bigg(\frac{1}{2^{2^m}v} + \xi + \frac{\delta}{v}\,\frac{2^{2^m-1}-1}{2^{2^m}}\bigg)^2 + \xi = \frac{1}{2^{2^{m+1}}v} + \frac{\xi}{2^{2^m-1}} + \frac{\delta}{v}\,\frac{2^{2^m-1}-1}{2^{2^{m+1}-1}} + v\bigg(\xi + \frac{\delta}{v}\,\frac{2^{2^m-1}-1}{2^{2^m}}\bigg)^2 + \xi. \]

Using algebraic manipulations and the assumption that $\xi + v\xi \le \frac{\delta}{4v}$, this yields

\[ \lambda(x^{\bar k+m+1}) \le \frac{1}{2^{2^{m+1}}v} + \xi + \frac{\delta}{v}\,\frac{2^{2^{m+1}-1}-1}{2^{2^{m+1}}}, \]

completing the induction and therefore the proof of relation (76).

^{19} Note that we do not need monotonicity of $\tilde\lambda(x^k)$; instead, the error level assumption from relation (73) enables us to use Lemma 5.11 to establish the quadratic rate of convergence.

The induction above shows that the condition $\lambda(x^{\bar k+m}) \le 0.68$ holds for all $m > 0$; we can therefore apply Lemma 5.6 and obtain the following upper bound on the suboptimality:

\[ f(x^{\bar k+m}) - f^* \le \big(\lambda(x^{\bar k+m})\big)^2 \le \lambda(x^{\bar k+m}). \]

Combining this with Eq. (76), we obtain

\[ f(x^{\bar k+m}) - f^* \le \frac{1}{2^{2^m}v} + \xi + \frac{\delta}{v}\,\frac{2^{2^m-1}-1}{2^{2^m}}. \]

Taking the limit superior on both sides of the preceding relation establishes the final result.

The above theorem shows that the objective function value $f(x^k)$ generated by our algorithm converges, in terms of the primal iterations, quadratically to a neighborhood of the optimal value $f^*$, with the neighborhood of size $\xi + \frac{\delta}{2v}$, where

\[ \xi = \frac{\phi p + \sqrt{\epsilon}}{1-p-\phi-\sqrt{\epsilon}} + \frac{2\phi\sqrt{\epsilon}+\epsilon}{(1-p-\phi-\sqrt{\epsilon})^2}, \qquad v = \frac{1}{(1-p-\phi-\sqrt{\epsilon})^2}, \]

and the condition $\xi + v\xi \le \frac{\delta}{4v}$ is satisfied. Note that with the exact Newton algorithm we have $p = \epsilon = 0$, which implies $\xi = 0$, and we can choose $\delta = 0$, which in turn leads to the size of the error neighborhood being 0. This confirms the fact that the exact Newton algorithm converges quadratically to the optimal objective function value.
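The behavior proved in Theorem 5.12 can be visualized by iterating the recursion of Lemma 5.11 directly. The constants below are illustrative only and are not claimed to satisfy Assumption 4; they simply show the qualitative picture: rapid initial decrease, then saturation at an $O(\xi)$ error level rather than at zero.

```python
# Iterate the error recursion of Lemma 5.11: lam_{next} = v * lam^2 + xi,
# with made-up constants v and xi (illustrative, not tied to Assumption 4).
v, xi = 1.5, 0.01
lam = 0.5
trace = [lam]
for _ in range(10):
    lam = v * lam ** 2 + xi
    trace.append(lam)
# lam settles near the smaller root of v*t^2 - t + xi = 0 (about 0.0102 here),
# mirroring the error neighborhood of size xi + delta/(2v) in the theorem.
```

With $\xi = 0$ the same iteration converges to 0 doubly exponentially, matching the exact-Newton special case $p = \epsilon = 0$ noted in the text.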
5.3 Convergence with respect to the Design Parameter $\mu$

In the preceding development, we have restricted our attention to developing an algorithm for a given logarithmic barrier coefficient $\mu$. We next study the convergence property of the optimal objective function value as a function of $\mu$, in order to develop a method that makes the error introduced by the logarithmic barrier functions arbitrarily small. We utilize the following result from [30].

Lemma 5.13. Let $G$ be a closed convex domain, and let the function $g$ be a self-concordant barrier function for $G$. Then for any $x$, $y$ in the interior of $G$, we have $(y-x)'\nabla g(x) \le 1$.

Using this lemma and an argument similar to that in [30], we can establish the following result, which bounds the suboptimality as a function of $\mu$.

Theorem 5.14. Given $\mu \ge 0$, let $x(\mu)$ denote the optimal solution of problem (4) and $h(x(\mu)) = \sum_{i=1}^S -U_i(x_i(\mu))$. Similarly, let $x^*$ denote the optimal solution of problem (2) together with the corresponding slack variables [defined in Eq. (3)], and $h^* = \sum_{i=1}^S -U_i(x^*_i)$. Then the following relation holds:

\[ h(x(\mu)) - h^* \le \mu. \]

Proof. For notational simplicity, we write $g(x) = -\sum_{i=1}^{S+L}\log(x_i)$; the objective function of problem (4) can then be written as $h(x) + \mu g(x)$. By Assumption 1, the utility functions are concave; therefore the negative objective functions in the minimization problems are convex. From convexity, we obtain

\[ h(x^*) \ge h(x(\mu)) + (x^* - x(\mu))'\nabla h(x(\mu)). \]  (78)

By the optimality condition for $x(\mu)$ in problem (4) for a given $\mu$, we have

\[ \big(\nabla h(x(\mu)) + \mu\nabla g(x(\mu))\big)'(x - x(\mu)) \ge 0 \]

for any feasible $x$. Since $x^*$ is feasible, we have

\[ \big(\nabla h(x(\mu)) + \mu\nabla g(x(\mu))\big)'(x^* - x(\mu)) \ge 0, \]

which implies

\[ \nabla h(x(\mu))'(x^* - x(\mu)) \ge -\mu\nabla g(x(\mu))'(x^* - x(\mu)). \]
For any $\mu$, $x(\mu)$ belongs to the interior of the feasible set, and by Lemma 5.13 we have, for all $\tilde\mu$, $\nabla g(x(\mu))'(x(\tilde\mu) - x(\mu)) \le 1$. By continuity of $x(\mu)$ and the fact that the convex set $\{x \mid Ax \le c\}$ is closed, for $A$ and $c$ defined in problem (4), we have $x^* = \lim_{\mu\to 0} x(\mu)$, and hence

\[ \nabla g(x(\mu))'(x^* - x(\mu)) = \lim_{\tilde\mu\to 0}\nabla g(x(\mu))'(x(\tilde\mu) - x(\mu)) \le 1. \]

The preceding two relations imply

\[ \nabla h(x(\mu))'(x^* - x(\mu)) \ge -\mu. \]

In view of relation (78), this establishes the desired result, i.e., $h(x(\mu)) - h^* \le \mu$.

Using the above theorem, we can develop a method to bound the suboptimality between the objective function value our algorithm provides for problem (4) and the exact optimal objective function value for problem (2), i.e., the suboptimality introduced by the barrier functions in the objective function, such that for any positive scalar $a$ the following relation holds:

\[ \frac{h(x(\mu)) - h^*}{h^*} \le a, \]  (79)

where the value $h(x(\mu))$ is the value obtained from our algorithm for problem (4), and $h^*$ is the optimal objective function value for problem (2). We achieve the above bound by implementing our algorithm twice. The first time involves running the algorithm for problem (4) with some arbitrary $\mu$. This leads to a sequence $x^k$ converging to some $x(\mu)$. Let $h(x(\mu)) = \sum_{i=1}^S -U_i(x_i(\mu))$. By Theorem 5.14, we have

\[ h(x(\mu)) - \mu \le h^*. \]  (80)

Let the scalar $M$ be such that $M = (a[h(x(\mu)) - \mu])^{-1}$, and implement the algorithm one more time for problem (4) with $\mu = 1$ and the objective function multiplied by $M$, i.e., the new objective is to minimize $-M\sum_{i=1}^S U_i(x_i) - \sum_{i=1}^{S+L}\log(x_i)$, subject to the link capacity constraints.^{20} We obtain a sequence $\tilde x^k$ converging to some $\tilde x(1)$.
Denote the corresponding objective function value by $h(\tilde x(1))$; then by applying the preceding theorem one more time we have

\[ Mh(\tilde x(1)) - Mh^* \le \mu = 1, \]

which implies

\[ h(\tilde x(1)) - h^* \le a[h(x(\mu)) - \mu] \le ah^*, \]

where the first inequality follows from the definition of the positive scalar $M$, and the second inequality follows from relation (80). Hence we have the desired bound (79). Therefore, even with the introduction of the logarithmic barrier function, the relative error in the objective function value can be bounded by an arbitrarily small positive scalar, at the cost of performing the fast Newton-type algorithm twice.

6 Simulation Results

Our simulation results demonstrate that the decentralized Newton method significantly outperforms the existing methods in terms of number of iterations. For our distributed Newton method, we used the following error tolerance levels: $p = 10^{-3}$, $\epsilon = 10^{-4}$ [cf. Assumption 2], $\tau = 10^{-2}$ [cf. Assumption 3]; and when $\theta^{\bar k} < V = 0.12$, we switch the stepsize choice to $d^k = 1$ for all $k \ge \bar k$. With these error tolerance levels, both Assumptions 2 and 4 can be satisfied. We executed the distributed Newton method twice, with different scaling and barrier coefficients according to Section 5.3 with the relative error tolerance set to $10^{-2}$, to confine the error in the objective function value to be within 1% of the optimal value. For a comprehensive comparison, we count both the primal and dual iterations implemented through the distributed error checking method described in Appendix B.^{21} In particular, in what follows, the number of iterations of our method refers to the sum of the dual iterations at each of the generated primal iterates.

^{20} When $M < 0$, we can simply add a constant to the original objective function to shift it upward; therefore the scalar $M$ can be assumed to be positive without loss of generality.
If no estimate of M is available a priori, we can run the distributed algorithm once more at the beginning to obtain an estimate and generate the constant accordingly.

[21] In these simulations we did not include the number of steps required to compute the stepsize (distributed summation with finite termination) or to implement distributed error checking (maximum consensus), to allow for the possibility that other methods are used for these computations. Note that the number of iterations required by both of these computations is upper bounded by the number of sources, which is a small constant (8, for example) in our simulations.

Figure 9: One sample objective function value of the distributed Newton method against number of iterations. The dotted black lines denote a ±5% interval around the optimal objective function value.

In the simulation results, we compare the performance of our distributed Newton method against both the subgradient method used in [25] and the Newton-type diagonal scaling dual method developed in [1]. Both of these methods were implemented using a constant stepsize that guarantees convergence, as shown in [25] and [1]. A sample evolution of the objective function value of the distributed Newton method is presented in Figure 9, generated for the network in Figure 1. The horizontal line segments correspond to the dual iterations, during which the primal vector stays constant, and each jump in the figure is a primal Newton update. The spike close to the end is a result of rescaling and using a new barrier coefficient in the second round of the distributed Newton algorithm [cf. Section 5.3]. The black dotted lines indicate a ±5% interval around the optimal objective function value.
The other two algorithms were implemented for the same problem, and the objective function values are plotted in Figure 10, with a logarithmically scaled iteration count on the x-axis. We use black dotted lines to indicate a ±5% interval around the optimal objective function value. While the subgradient and diagonal scaling methods have similar convergence behavior, the distributed Newton method significantly outperforms both.

One of the important features of the distributed Newton method is that, unlike the other two algorithms, the generated primal iterates satisfy the link capacity constraints throughout the algorithm. This observation is confirmed by Figure 11, where the minimal slacks in the links are shown for all three algorithms. The black dotted line is the zero line, and a negative slack means the capacity constraint is violated.

Figure 10: One sample objective function value of all three methods against log-scaled iteration count. The dotted black lines denote a ±5% interval around the optimal objective function value.

Figure 11: Sample minimal slack in link capacity of all three methods against log-scaled iteration count. Negative slack means the capacity constraint is violated. The dotted black line denotes 0.

Figure 12: Log-scaled iteration count for the 3 methods implemented over 50 randomly generated networks.
The slacks that our distributed Newton method yields always stay above the zero line, while the other two methods only become feasible at the end.

To test the performance of the methods over general networks, we generated 50 random networks, with number of links L = 15 and number of sources S = 8. Each routing matrix consists of L × R Bernoulli random variables.[22] All three methods were implemented over the 50 networks. We record the number of iterations upon termination for all 3 methods; the results are shown in Figure 12 on a log scale. The mean number of iterations to convergence over the 50 trials is 924 for the distributed Newton method, 20286 for Newton-type diagonal scaling, and 29315 for the subgradient method.

7 Conclusions

This paper develops a distributed Newton-type second-order algorithm for network utility maximization problems, which can achieve a superlinear convergence rate in primal iterates within some error neighborhood. We show that the computation of the dual Newton step can be implemented in a decentralized manner using a matrix splitting scheme. The key feature of this scheme is that its implementation uses an information exchange mechanism similar to that involved in first-order methods applied to this problem. We show that even when the Newton direction and stepsize are computed with some error, the method achieves a superlinear convergence rate in terms of primal iterations to an error neighborhood. Simulation results also indicate significant improvement over traditional distributed algorithms for network utility maximization problems.

[22] When there exists a source that does not use any links or a link that is not used by any source, we discard the routing matrix and generate another one.
Possible future directions include a more detailed analysis of the relationship between the rate of convergence of the dual iterations and the underlying topology of the network, and investigating convergence properties for a fixed finite truncation of the dual iterations.

A Distributed Stepsize Computation

In this section, we describe a distributed procedure with finite termination to compute the stepsize d^k according to Eq. (32). We first note that in Eq. (32) the scalar b ∈ (0, 1) is predetermined, and the only unknown term is the inexact Newton decrement λ̃(x^k). In order to compute the value of λ̃(x^k), we rewrite the inexact Newton decrement, based on definition (31), as

λ̃(x^k) = √( Σ_{i∈S} (Δx̃_i^k)² (H_k)_{ii} + Σ_{l∈L} (Δx̃_{l+S}^k)² (H_k)_{(l+S)(l+S)} ),

or equivalently,

(λ̃(x^k))² = Σ_{i∈S} (Δx̃_i^k)² (H_k)_{ii} + Σ_{l∈L} (Δx̃_{l+S}^k)² (H_k)_{(l+S)(l+S)}.   (81)

In the sequel, we develop a distributed summation procedure to compute this quantity by aggregating the local information available at the sources and links. A key feature of this procedure is that it respects the simple information exchange mechanism used by first-order methods applied to the NUM problem: information about the links along the routes is aggregated and sent back to the sources using a feedback mechanism. Over-counting is avoided using a novel off-line construction, which forms an (undirected) auxiliary graph containing information on sources sharing common links.

Given a network with source set S = {1, 2, ..., S} (each source associated with a predetermined route) and link set L = {1, 2, ..., L}, we define the set of nodes in the auxiliary graph as the set S, i.e., each node corresponds to a source (or, equivalently, a flow) in the original network. The edges are formed between sources that share common links according to the following iterative construction.
In this construction, each source is equipped with a state (or color) and each link is equipped with a set (a subset of sources), which are updated using signals sent by the sources along their routes.

Auxiliary Graph Construction:

• Initialization: Each link l is associated with a set Θ_l = ∅. One arbitrarily chosen source is marked grey, and the rest are marked white. The grey source sends a signal {label, i} along its route. Each link l receiving the signal, i.e., l ∈ L(i), adds i to Θ_l.

• Iteration: In each iteration, the sources first update their states and send out signals according to step (A). Each link l then receives the signals sent in step (A) from the sources i ∈ S(l) and updates the set Θ_l according to step (B).

(A) Each source i:
  (A.a) If it is white, it sums up |Θ_l| along its route, using the values |Θ_l| from the previous iteration.
    (A.a.1) If Σ_{l∈L(i)} |Θ_l| > 0, then source i is marked grey and it sends two signals, {neighbor, i} and {label, i}, along its route.
    (A.a.2) Else, i.e., if Σ_{l∈L(i)} |Θ_l| = 0, source i does nothing for this iteration.
  (A.b) Otherwise, i.e., if it is grey, source i does nothing.

(B) Each link l:
  (B.a) If Θ_l = ∅:
    (B.a.1) If it experiences a signal {label, i} passing through it, it adds i to Θ_l. When there is more than one such signal during the same iteration, only the smallest i is added. The signal keeps traversing the rest of its route.
    (B.a.2) Otherwise, link l simply carries the signal(s) passing through it, if any, to the next link or node.
  (B.b) Else, i.e., if Θ_l ≠ ∅:
    (B.b.1) If it experiences a signal {neighbor, i} passing through it, an edge (i, j) with label L_l is added to the auxiliary graph for all j ∈ Θ_l, and then i is added to the set Θ_l. If there is more than one such signal during the same iteration, the sources are added sequentially, and the resulting nodes in the set Θ_l form a clique in the auxiliary graph.
Link l then stops the signal, i.e., it does not pass it to the next link or node.
    (B.b.2) Otherwise, link l simply carries the signal(s) passing through it, if any, to the next link or node.

• Termination: Terminate after S − 1 iterations.

The auxiliary graph construction process for the sample network in Figure 13 is illustrated in Figure 14, where the left column reflects the colors of the nodes in the original network and the elements of the sets Θ_l (labeled on each link l), while the right column shows the auxiliary graph constructed after each iteration.[23]

We next investigate some properties of the auxiliary graph, which will be used in proving that our distributed summation procedure yields the correct values.

Lemma A.1. Consider a network and its auxiliary graph with sets {Θ_l}_{l∈L}. The following statements hold:

(1) For each link l, Θ_l ⊂ S(l).
(2) Source nodes i, j are connected in the auxiliary graph if and only if there exists a link l such that {i, j} ⊂ Θ_l.
(3) The auxiliary graph does not contain multiple edges, i.e., there exists at most one edge between any pair of nodes.
(4) The auxiliary graph is connected.
(5) For each link l, Θ_l ≠ ∅.
(6) There is no simple cycle in the auxiliary graph other than those formed exclusively by edges with the same label.

[23] Note that, depending on the construction, a network may have different auxiliary graphs associated with it. Any of these graphs can be used in the distributed summation procedure.

Figure 13: A sample network with four sources and ten links. Each link shows the flows (or sources) using that link: L1: x1; L2: x2; L3: x3; L4: x1, x2, x3; L5: x2, x3, x4; L6: x1; L7: x4; L8: x2; L9: x3; L10: x4. This example will be used to illustrate different parts of the distributed stepsize computation in this section.

Proof.
We prove the statements in the order they appear.

(1) Part (1) follows immediately from the auxiliary graph construction, because each source only sends signals to links on its own route, and the links only update their sets Θ_l when they experience signals passing through them.

(2) In the auxiliary graph construction, an edge is added to the auxiliary graph only in step (B.b.1), where part (2) clearly holds.

(3) From the first two parts, there is an edge between source nodes i, j, i.e., {i, j} ⊂ Θ_l for some l, only if i and j share link l in the original network. From the auxiliary graph construction, if sources i and j share link l, then an edge with label L_l between i and j is formed at some iteration if and only if one of the following three cases holds:

Figure 14: Steps of the construction of the auxiliary graph corresponding to the network in Figure 13, shown for iterations t = 0 through t = 3. The elements of Θ_l are labeled on link l. A link is drawn bold in the original graph if Θ_l ≠ ∅.

I. At the beginning of the previous iteration, Θ_l = ∅ and sources i, j are both white. During the previous iteration, source i becomes grey and sends the signal {label, i} to link l, hence Θ_l = {i}. In the current iteration, source j with
Σ_{m∈L(j)} |Θ_m| ≥ |Θ_l| > 0 becomes grey and sends the signal {neighbor, j} to link l;

II. The symmetric case of I, where source j becomes grey first and one iteration later source i becomes grey;

III. At the beginning of the previous iteration, Θ_l = ∅ and sources i, j are both white. During the previous iteration, some other source t with l ∈ L(t) becomes grey and sends the signal {label, t} to link l, hence Θ_l = {t}. In the current iteration, both sources i and j, with Σ_{m∈L(i)} |Θ_m| ≥ |Θ_l| > 0 and Σ_{m∈L(j)} |Θ_m| ≥ |Θ_l| > 0, become grey and send the signals {neighbor, i} and {neighbor, j} to link l.

Hence if an edge connecting nodes i and j exists in the auxiliary graph, then at the beginning of the iteration when the edge is formed at least one of the nodes is white, and by the end of that iteration both nodes are colored grey and stay grey. Therefore an edge between i and j can only be formed during exactly one iteration. We next show that only one such edge can be formed in that iteration. The first two cases are symmetric, so without loss of generality we only consider cases I and III. In both of these cases, an edge between i and j is formed with label L_l only if link l receives the signal {neighbor, j} and Θ_l ≠ ∅. In step (B.b.1) of the auxiliary graph construction, the first link with Θ_l ≠ ∅ stops the signal from passing to the rest of its route, hence at most one edge between i and j can be generated. Hence part (3) holds.

(4) By a similar analysis, it is straightforward to see that if at one iteration source i in the original network becomes grey, then in the next iteration all the sources that share a link with i become grey and are connected to i in the auxiliary graph.
By induction, we conclude that all the nodes in the auxiliary graph corresponding to sources colored grey in the original network are connected to the source node marked grey in the initialization step, and hence these nodes form a connected component. We next show that all nodes are colored grey when the auxiliary graph construction procedure terminates. We first argue that at least one node is turned from white to grey at each iteration before all nodes are marked grey. Assume the contrary, that is, at some iteration no more nodes are marked grey and there exists a set of white nodes S*. This implies that the nodes in S* do not share any links with the nodes in S \ S*, and thus there is no path from any source in the set S \ S* to any source in S* using the links (including the feedback mechanisms) in the original network. However, this contradicts the fact that the links form a strongly connected graph. Therefore, after S − 1 iterations, all nodes in the original graph are colored grey, and the desired statement holds.

(5) The analysis for part (3) shows that all the connected nodes in the auxiliary graph are colored grey. In view of part (4), all the sources are colored grey when the auxiliary graph construction procedure terminates. Step (B.a.1) implies that a link has Θ_l = ∅ only if all sources i ∈ S(l) are white. Since each link is used by at least one source, and all sources are grey, part (5) holds.

(6) We prove part (6) by showing that the auxiliary graph, with the cycles formed by edges of the same label removed, is acyclic. For each link l, let i*_l denote the first element added to the set Θ_l in the auxiliary graph construction process, which is uniquely defined for each link l by step (B.a.1).
For each link l, we define an equivalence class on the set S by i ∼ j if {i, j} ⊂ Θ_l \ {i*_l}; equivalently, nodes i and j are connected in the auxiliary graph with i ∼ j if and only if the corresponding edge is formed under scenario III, as defined in the proof of part (3). The nodes in each equivalence class are connected by edges with the same label, which form the undesired cycles. We remove these cycles by merging each equivalence class into one representative node, which inherits all the edges going between the nodes of the equivalence class and S \ Θ_l in the auxiliary graph, and is connected to i*_l via one edge. Note that the resulting graph is connected, since the auxiliary graph is connected by part (4) and all the remaining edges are generated under scenarios I and II, as defined in the proof of part (3). We now show that the resulting graph contains no cycle. From cases I and II, it follows immediately that an edge is generated whenever one more source becomes grey. Therefore, if the number of nodes is N, we have N − 1 edges. For a connected graph, this implies we have a tree, i.e., an acyclic graph, and hence part (6) holds.

For notational convenience, we denote the set of links inducing edges in the auxiliary graph by L* = {l : |Θ_l| > 1}, and for each source i the set of links that induce edges in the auxiliary graph by L*(i) = {l : i ∈ Θ_l, l ∈ L*}. Each link can identify whether it is in L* from the cardinality of the set Θ_l. Each source i can obtain |L*(i)| along the links on its route. The auxiliary graph remains the same throughout the distributed Newton algorithm and depends only on the structure of the network (it is independent of the utility functions and link capacities); therefore, given a network, the above construction only needs to be performed once, prior to the execution of the distributed Newton algorithm. We next present a distributed procedure to compute the sum in Eq.
(81) and show that the sets Θ_l constructed using the above procedure avoid over-counting and enable computation of the correct values.[24]

Distributed Summation Procedure:

• Initialization: Each link l initializes z_l(0) = 0. Each source i computes y*_i = (Δx̃_i^k)² (H_k)_{ii}, and each link l computes z*_l = (1/|S(l)|) (Δx̃_{l+S}^k)² (H_k)_{(l+S)(l+S)}. Each source i aggregates the sum

y_i(0) = y*_i + Σ_{l∈L(i)} z*_l   (82)

along its route.

• Iteration for t = 1, 2, ..., S: the following 3 steps are completed in the order they are presented.

a. Each source i sends its current value y_i(t − 1) to its route.

b. Each link l uses the values y_i(t − 1) received and computes

z_l(t) = Σ_{i∈Θ_l} y_i(t − 1) − (|Θ_l| − 1) z_l(t − 1).   (83)

c. Each source i aggregates information along its route from the links l ∈ L*(i) and computes

y_i(t) = Σ_{l∈L*(i)} z_l(t) − (|L*(i)| − 1) y_i(t − 1).   (84)

• Termination: Terminate after S iterations.

By the diagonal structure of the Hessian matrix H_k, the scalars (Δx̃_i^k)² (H_k)_{ii} and (Δx̃_{l+S}^k)² (H_k)_{(l+S)(l+S)} are available to the corresponding source i and link l, respectively; hence z*_l and y*_i can be computed using local information. In the above process, each source only uses aggregate information along its route, and each link l only uses information from the sources i ∈ S(l). The evolution of the distributed summation procedure for the sample network in Figure 13 is shown in Figures 15 and 16. We next establish two lemmas, which quantify the expansion of the t-hop neighborhoods in the auxiliary graph for the links and sources.

[24] Note that the execution of the procedure only uses the sets Θ_l, L*, and L*(i). We use the structure of the auxiliary graph in proving the correctness of the procedure.
This will be key in showing that the summation procedure yields the correct values at the sources and the links. For each source i, we use the notation N_i(t) to denote the set of nodes connected to node i by a path of length at most t in the auxiliary graph. Note that N_i(0) = {i}. We say that node i is t hops away from node j if the length of the shortest path between nodes i and j is t.

Lemma A.2. Consider a network and its auxiliary graph with sets {Θ_l}_{l∈L}. For any link l and all t ≥ 1, we have

N_i(t) ∩ N_j(t) = ∪_{m∈Θ_l} N_m(t − 1)   for i, j ∈ Θ_l with i ≠ j.   (85)

Proof. Since the source nodes i, j ∈ Θ_l, by part (2) of Lemma A.1 they are 1 hop away from all other nodes in Θ_l. Hence if a source node n is in N_m(t − 1) for some m ∈ Θ_l, then n is at most t hops away from i and from j. This yields

∪_{m∈Θ_l} N_m(t − 1) ⊂ N_i(t) ∩ N_j(t).   (86)

On the other hand, if n ∈ N_i(t) ∩ N_j(t), then either n ∈ N_i(t − 1), and hence n ∈ ∪_{m∈Θ_l} N_m(t − 1), or n ∈ (N_i(t) \ N_i(t − 1)) ∩ N_j(t). For notational convenience, let P(a, b) denote an ordered set of nodes on the path between nodes a and b, including b but not a. Then the above relation implies there exists a path with |P(i, n)| = t and |P(j, n)| ≤ t. Let n* ∈ P(i, n) ∩ P(j, n) be such that P(i, n*) ∩ P(j, n*) = ∅. The node n* exists because the two paths both end at n. If n* ∉ Θ_l, then we have a cycle {P(i, n*), P(n*, j), P(j, i)}, which includes an edge with label L_l between i and j as well as other edges. In view of part (6) of Lemma A.1, this leads to a contradiction. Therefore n* ∈ Θ_l, implying P(i, n) = {P(i, n*), P(n*, n)}. Since i is connected to all nodes in Θ_l, we have |P(i, n*)| = 1 and hence |P(n*, n)| = t − 1, which implies n ∈ N_{n*}(t − 1) ⊂ ∪_{m∈Θ_l} N_m(t − 1).
Therefore the above analysis yields

N_i(t) ∩ N_j(t) ⊂ ∪_{m∈Θ_l} N_m(t − 1).

Together with relation (86), this establishes the desired equality.

Lemma A.3. Consider a network and its auxiliary graph with sets {Θ_l}_{l∈L}. For any source i and all t ≥ 1, we have

(∪_{j∈Θ_l} N_j(t)) ∩ (∪_{j∈Θ_m} N_j(t)) = N_i(t)   for l, m ∈ L*(i) with l ≠ m.   (87)

Figure 15: Evolution of the distributed summation process for t = 0, 1, 2, 3, where ρ_i = y_i(0) and each destination node is indicated using a dot with the same color as its corresponding source.

Figure 16: Evolution of the distributed summation process, continued (t = 4), where ρ_i = y_i(0) and each destination node is indicated using a dot with the same color as its corresponding source.

Proof. Since l, m ∈ L*(i), we have i ∈ Θ_l and i ∈ Θ_m, which yields

N_i(t) ⊂ (∪_{j∈Θ_l} N_j(t)) ∩ (∪_{j∈Θ_m} N_j(t)).

On the other hand, assume there exists a node n with n ∈ (∪_{j∈Θ_l} N_j(t)) ∩ (∪_{j∈Θ_m} N_j(t)) and n ∉ N_i(t). Then there exists a node p ∈ Θ_l with p ≠ i and n ∈ N_p(t). Similarly, there exists a node q ∈ Θ_m with q ≠ i and n ∈ N_q(t). For notational convenience, let P(a, b) denote an ordered set of nodes on the path between nodes a and b, including b but not a. Let n* ∈ P(p, n) ∩ P(q, n) be such that P(p, n*) ∩ P(q, n*) = ∅. The node n* exists because the two paths both end at n. Since nodes i, p are connected via an edge with label L_l and nodes i, q are connected via an edge with label L_m, we have a cycle {P(i, p), P(p, n), P(n, q), P(q, i)}, which contradicts part (6) of Lemma A.1. Hence

(∪_{j∈Θ_l} N_j(t)) ∩ (∪_{j∈Θ_m} N_j(t)) ⊂ N_i(t).

The preceding two relations establish the desired equality.
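Before turning to the correctness theorem, the Auxiliary Graph Construction described earlier can be checked against Figure 14 with a small centralized simulation. The sketch below (not part of the paper; a serialized stand-in for the distributed signaling) runs the construction on the network of Figure 13, assuming routes are given as ordered lists of links, since a {neighbor, i} signal is stopped by the first link on its route whose set Θ_l is nonempty.

```python
# Centralized sketch of the Auxiliary Graph Construction, run on the
# sample network of Figure 13 (4 sources, 10 links).  Route orderings
# are an assumption; the paper's signals travel along the routes.

routes = {1: [1, 4, 6], 2: [2, 4, 5, 8], 3: [3, 4, 5, 9], 4: [7, 5, 10]}
links = range(1, 11)

theta = {l: set() for l in links}   # the sets Theta_l
grey = {1}                          # source 1 arbitrarily marked grey
edges = set()                       # auxiliary-graph edges (i, j, link label)

# Initialization: the grey source sends {label, 1} along its route,
# and every (empty) link on the route adds 1 to its set.
for l in routes[1]:
    theta[l] = {1}

S = len(routes)
for _ in range(S - 1):
    # (A) Every white source whose route has seen a nonempty Theta_l
    # turns grey and emits {label, i} and {neighbor, i}.
    new_grey = sorted(i for i in routes if i not in grey
                      and any(theta[l] for l in routes[i]))
    was_nonempty = {l: bool(theta[l]) for l in links}  # state at iteration start
    grey.update(new_grey)
    # (B.a.1) {label, i} traverses the whole route and fills links that
    # were empty; when labels collide, only the smallest i is kept.
    for i in new_grey:
        for l in routes[i]:
            if not was_nonempty[l] and (not theta[l] or min(theta[l]) > i):
                theta[l] = {i}
    # (B.b.1) {neighbor, i} stops at the first link of the route that was
    # nonempty at the start of the iteration, forming a clique there.
    for i in new_grey:
        for l in routes[i]:
            if was_nonempty[l]:
                for j in theta[l]:
                    edges.add((min(i, j), max(i, j), l))
                theta[l].add(i)
                break

# Final state matches Figure 14: Theta_4 = {1,2,3}, Theta_5 = {2,4}.
assert theta[4] == {1, 2, 3} and theta[5] == {2, 4}
assert grey == {1, 2, 3, 4}
```

The resulting edge set, {(1,2), (1,3), (2,3)} with label L_4 plus (2,4) with label L_5, reproduces the auxiliary graph at t = 3 in Figure 14.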
Equipped with the preceding lemmas, we can now show that, upon termination of the summation procedure, each source i and link l have y_i(S) = z_l(S − 1) = (λ̃(x^k))² [cf. Eq. (81)].

Theorem A.4. Consider a network and its auxiliary graph with sets {Θ_l}_{l∈L}. Let Ω denote the set of all subsets of S and define the function σ : Ω → R as

σ(K) = Σ_{l ∈ ∪_{i∈K} L(i)} z*_l Σ_{i∈K} I{l ∈ L(i)} + Σ_{i∈K} y*_i,

where y*_i = (Δx̃_i^k)² (H_k)_{ii}, z*_l = (1/|S(l)|) (Δx̃_{l+S}^k)² (H_k)_{(l+S)(l+S)}, and I{l ∈ L(i)} is the indicator function of the event {l ∈ L(i)}. Let y_i(t) and z_l(t) be the iterates generated by the distributed summation procedure described above. Then for all t ∈ {1, ..., S}, the value z_l(t) at each link satisfies

z_l(t) = σ(∪_{i∈Θ_l} N_i(t − 1)),   (88)

and the value y_i(t) at each source node satisfies

y_i(t) = σ(N_i(t)).   (89)

Proof. We prove the theorem by induction.

Base case: t = 1. Since z_l(0) = 0 for all links, Eq. (83) for t = 1 gives

z_l(1) = Σ_{i∈Θ_l} y_i(0) = Σ_{i∈Θ_l} (y*_i + Σ_{m∈L(i)} z*_m) = σ(Θ_l),

where we used the definition of y_i(0) [cf. Eq. (82)] and of the function σ(·). Since N_i(0) = {i}, the above relation implies that Eq. (88) holds. For source i, from the update relation (84), we have

y_i(1) = Σ_{l∈L*(i)} σ(Θ_l) − (|L*(i)| − 1) y_i(0).

Lemma A.3 and the inclusion–exclusion principle imply

Σ_{l∈L*(i)} σ(Θ_l) = σ(∪_{l∈L*(i)} Θ_l) + (|L*(i)| − 1) σ({i}).

Since y_i(0) = σ({i}) by the definition of y_i(0) [cf. Eq. (82)], rearranging the preceding two relations yields

y_i(1) = σ(∪_{l∈L*(i)} Θ_l) = σ(N_i(1)),

which shows that Eq. (89) holds for t = 1.

Inductive step for t = T ≥ 2. Assume Eqs. (89) and (88) hold for t = T − 1; we first show that Eq. (88) holds for t = T.
When t = T, by the update equation (83), we obtain for link l

z_l(T) = Σ_{i∈Θ_l} y_i(T − 1) − (|Θ_l| − 1) z_l(T − 1) = Σ_{i∈Θ_l} σ(N_i(T − 1)) − (|Θ_l| − 1) z_l(T − 1),

where the second equality follows from Eq. (89) for t = T − 1. If |Θ_l| = 1, then z_l(T) = σ(N_i(T − 1)) for i ∈ Θ_l, and Eq. (88) is satisfied. For |Θ_l| > 1, using Lemma A.2 for t = T and the inclusion–exclusion principle, we obtain

Σ_{i∈Θ_l} σ(N_i(T − 1)) = σ(∪_{i∈Θ_l} N_i(T − 1)) + (|Θ_l| − 1) σ(∪_{m∈Θ_l} N_m(T − 2)).

Eq. (88) for t = T − 1 yields z_l(T − 1) = σ(∪_{m∈Θ_l} N_m(T − 2)). Using this fact and rearranging the preceding two relations shows that Eq. (88) holds for t = T, i.e.,

z_l(T) = σ(∪_{i∈Θ_l} N_i(T − 1)).

We next establish Eq. (89). From the update equation (84), using the preceding relation, we have

y_i(T) = Σ_{l∈L*(i)} z_l(T) − (|L*(i)| − 1) y_i(T − 1) = Σ_{l∈L*(i)} σ(∪_{j∈Θ_l} N_j(T − 1)) − (|L*(i)| − 1) y_i(T − 1).

Lemma A.3 and the inclusion–exclusion principle imply

Σ_{l∈L*(i)} σ(∪_{j∈Θ_l} N_j(T − 1)) = σ(∪_{l∈L*(i)} ∪_{j∈Θ_l} N_j(T − 1)) + (|L*(i)| − 1) σ(N_i(T − 1)).

By the definition of N_i(·), we have ∪_{l∈L*(i)} ∪_{j∈Θ_l} N_j(T − 1) = N_i(T). Using Eq. (89) for t = T − 1, i.e., y_i(T − 1) = σ(N_i(T − 1)), and rearranging the above two equations, we obtain

y_i(T) = σ(N_i(T)),

which completes the inductive step.

By the definition of the function σ(·), we have (λ̃(x^k))² = σ(S). By the above theorem, we conclude that after S iterations,

y_i(S) = σ(N_i(S)) = σ(S) = (λ̃(x^k))².

By observing that ∪_{i∈Θ_l} N_i(S − 1) = S, we also have

z_l(S − 1) = σ(∪_{i∈Θ_l} N_i(S − 1)) = σ(S) = (λ̃(x^k))²,

where we used part (5) of Lemma A.1. This shows that the value (λ̃(x^k))² is available to all sources and links upon termination of the procedure.
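As a sanity check on this correctness result, the Distributed Summation Procedure can be simulated centrally on the network of Figure 13. In the sketch below (not part of the paper), the sets Θ_l and L*(i) are taken from the final auxiliary graph state in Figure 14, and the per-source and per-link values y*_i and z*_l are arbitrary placeholders standing in for the (Δx̃^k)²(H_k) terms of Eq. (81).

```python
# Centralized simulation of the Distributed Summation Procedure on the
# Figure 13 network.  y_star / link_val are placeholder values, not data
# from the paper; Theta_l is the final state shown in Figure 14.

routes = {1: [1, 4, 6], 2: [2, 4, 5, 8], 3: [3, 4, 5, 9], 4: [7, 5, 10]}
theta = {1: {1}, 2: {2}, 3: {3}, 4: {1, 2, 3}, 5: {2, 4},
         6: {1}, 7: {4}, 8: {2}, 9: {3}, 10: {4}}
L_star = [l for l in theta if len(theta[l]) > 1]            # links {4, 5}
L_star_i = {i: [l for l in L_star if i in theta[l]] for i in routes}

y_star = {1: 1.0, 2: 2.0, 3: 3.0, 4: 4.0}                   # placeholders
link_val = {l: 0.5 * l for l in theta}                      # placeholders
n_users = {l: sum(l in r for r in routes.values()) for l in theta}
z_star = {l: link_val[l] / n_users[l] for l in theta}       # z_l* = value/|S(l)|

# Initialization, Eq. (82): y_i(0) = y_i* + sum of z_l* along the route.
y = {i: y_star[i] + sum(z_star[l] for l in routes[i]) for i in routes}
z = {l: 0.0 for l in L_star}

S = len(routes)
for _ in range(S):
    # Eq. (83): z_l(t) = sum_{i in Theta_l} y_i(t-1) - (|Theta_l|-1) z_l(t-1)
    z_new = {l: sum(y[i] for i in theta[l]) - (len(theta[l]) - 1) * z[l]
             for l in L_star}
    # Eq. (84): y_i(t) = sum_{l in L*(i)} z_l(t) - (|L*(i)|-1) y_i(t-1)
    y = {i: sum(z_new[l] for l in L_star_i[i])
            - (len(L_star_i[i]) - 1) * y[i] for i in routes}
    z = z_new

# Every source ends with the global sum, playing the role of (lambda~)^2.
total = sum(y_star.values()) + sum(link_val.values())
assert all(abs(y[i] - total) < 1e-9 for i in routes)
```

Note how the subtraction terms in Eqs. (83) and (84) implement the inclusion–exclusion corrections of Theorem A.4, so shared links (here L4 and L5) are not double counted.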
Note that the number S is an upper bound on the number of iterations required by the distributed summation process to obtain the correct values at the links and sources of the original graph. If the value of the diameter of the auxiliary graph (or an upper bound on it) is known, then the process terminates in a number of steps equal to this value plus 1. For instance, when all the sources share one common link, the auxiliary graph is a complete graph and only 2 iterations are required. On the other hand, when the auxiliary graph is a line, the summation procedure takes S iterations.

We finally contrast our distributed summation procedure with spanning tree computations, which were widely used in the 1970s and 1980s for performing information exchange among different processors in network flow problems. In spanning tree based approaches, information from all processors is passed along the edges of a spanning tree, and stored at and broadcast by a designated central root node (see [23] and [10]). In contrast, our summation procedure involves (scalar) information aggregated along the routes and fed back independently to the different sources, which is a more natural exchange mechanism in an environment with decentralized sources. Moreover, processors in the system (i.e., sources and links) do not need to maintain predecessor/successor information (as required by spanning tree methods). The only network-related information is the sets Θ_l for l ∈ L, kept at the individual links and obtained from the auxiliary graph, which is itself constructed using the feedback mechanism described above.

B Distributed Error Checking

In this section, we present a distributed error checking method to determine when to terminate the dual computation procedure so as to meet the error tolerance level of Assumption 2 at a primal iteration k.
The method involves two stages: in the first stage, the links and sources execute a predetermined number of dual iterations; in the second stage, if the error tolerance level was not satisfied in the first stage, the links and sources implement dual iterations until a distributed termination criterion is met. For the rest of this section we suppress the dependence of the dual vector on the primal iteration index $k$ for notational convenience, and we adopt the following assumption on the information available to each source and link.

Assumption 5. There exists a positive scalar $F < 1$ such that the spectral radius of the matrix $M = (D_k + \bar{B}_k)^{-1}(\bar{B}_k - B_k)$ satisfies $\rho(M) \le F$. Each source and link knows the scalar $F$ and the total number of sources and links in the graph, denoted by $S$ and $L$ respectively.

As noted in Section 4.2, the matrix $M = (D_k + \bar{B}_k)^{-1}(\bar{B}_k - B_k)$ is the weighted Laplacian matrix of the dual graph, and therefore the bound $F$ can be obtained once the structure of the dual graph is known [12], [7]. This assumption only requires the availability of some aggregate information, and hence the distributed nature of the algorithm is preserved.

Before we introduce the details of the algorithm, we establish a relation between $\|w^* - w(t)\|_\infty$ and $\|w(t+1) - w(t)\|_\infty$, which is the key relation in developing the distributed error checking method.

Lemma B.1. Let the matrix $M$ be $M = (D_k + \bar{B}_k)^{-1}(\bar{B}_k - B_k)$. Let $w(t)$ denote the dual variable generated by iteration (18), and let $w^*$ be the fixed point of the iteration. Let $F$ and $L$ be the positive scalars defined in Assumption 5. Then the following relation holds:
$$\|w^* - w(t)\|_\infty \le \frac{\sqrt{L}}{1-F}\, \|w(t+1) - w(t)\|_\infty. \qquad (90)$$
Proof.
Iteration (18) implies that the fixed point $w^*$ satisfies
$$w^* = Mw^* + (D_k + \bar{B}_k)^{-1}\big({-AH_k^{-1}\nabla f(x_k)}\big),$$
and the iterates $w(t)$ satisfy
$$w(t+1) = Mw(t) + (D_k + \bar{B}_k)^{-1}\big({-AH_k^{-1}\nabla f(x_k)}\big).$$
By combining the above two relations, we obtain
$$w(t+1) - w(t) = (I - M)(w^* - w(t)).$$
Hence, by the definition of the matrix infinity norm, we have
$$\|w^* - w(t)\|_\infty \le \big\|(I - M)^{-1}\big\|_\infty\, \|w(t+1) - w(t)\|_\infty.$$
Using norm equivalence in finite-dimensional Euclidean space and standard results from linear algebra, we obtain $\|(I - M)^{-1}\|_\infty \le \sqrt{L}\,\|(I - M)^{-1}\|_2 \le \frac{\sqrt{L}}{1 - \|M\|_2}$ [16], [2]. For the symmetric real matrix $M$ we have $\rho(M) = \|M\|_2$, and hence we obtain the desired relation.

We next use the above lemma to develop two theorems, each of which serves as the starting point for one of the stages of the distributed error checking method.

Theorem B.2. Let $\{x_k\}$ be the primal sequence generated by the inexact Newton method (29) and $H_k$ be the corresponding Hessian matrix at the $k$th iteration. Let $w(t)$ be the inexact dual variable obtained after $t$ dual iterations (18) and $w^*$ be the exact solution to (11), i.e., the limit of the sequence $\{w(t)\}$ as $t \to \infty$. Let the vectors $\Delta x_k$ and $\Delta\tilde{x}_k$ be the exact and inexact Newton directions obtained using $w^*$ and $w(t)$ [cf. Eqs. (27)-(28)], and let the vector $\gamma_k$ be the error in the Newton direction computation at $x_k$, defined by $\gamma_k = \Delta x_k - \Delta\tilde{x}_k$. For some positive scalar $p$, let
$$\rho_i = \left|\frac{\sqrt{L}\,(H_k^{-1})_{ii}\,|L(i)|\,\|w(t+1) - w(t)\|_\infty}{(1-F)\,(H_k^{-1})_{ii}\,[R']_i\, w(t)}\right| \qquad (91)$$
for each source $i$, and
$$\rho_l = \left|\frac{\sqrt{L}\,\sum_{i \in S(l)} (H_k^{-1})_{ii}\,|L(i)|\,\|w(t+1) - w(t)\|_\infty}{(1-F)\,\sum_{i \in S(l)} (H_k^{-1})_{ii}\,[R']_i\, w(t)}\right| \qquad (92)$$
for each link $l$.
Define a nonnegative scalar $\beta_k$ as
$$\beta_k = \left[\max\left(\max_{i \in \mathcal{S}} \frac{\rho_i}{p},\ \max_{l \in \mathcal{L}} \frac{\rho_l}{p}\right)\right]^{-2}. \qquad (93)$$
Then we have
$$\beta_k\, (\gamma_k)' H_k \gamma_k \le p^2\, (\Delta\tilde{x}_k)' H_k (\Delta\tilde{x}_k). \qquad (94)$$

Proof. For notational convenience, let the matrix $P_k$ denote the $S \times S$ principal submatrix of $H_k$, i.e., $(P_k)_{ii} = (H_k)_{ii}$ for $i \le S$; let the vector $\Delta\tilde{s}$ in $\mathbb{R}^S$ denote the first $S$ components of the vector $\Delta\tilde{x}_k$, and let the vector $\Delta\tilde{y}$ in $\mathbb{R}^L$ denote the last $L$ components of $\Delta\tilde{x}_k$. Similarly, we denote by $\Delta s$ and $\Delta y$ the first $S$ and last $L$ components of the exact Newton direction $\Delta x_k$ respectively. From Eq. (27), we have for each $i \in \mathcal{S}$,
$$\left|\frac{\Delta s_i - \Delta\tilde{s}_i}{\Delta\tilde{s}_i}\right| = \left|\frac{(H_k^{-1})_{ii}[R']_i(w^* - w(t))}{(H_k^{-1})_{ii}[R']_i\, w(t)}\right| \le \left|\frac{(H_k^{-1})_{ii}[R']_i e\, \|w^* - w(t)\|_\infty}{(H_k^{-1})_{ii}[R']_i\, w(t)}\right| \le \left|\frac{\sqrt{L}\,(H_k^{-1})_{ii}|L(i)|\,\|w(t+1) - w(t)\|_\infty}{(1-F)(H_k^{-1})_{ii}[R']_i\, w(t)}\right| = \rho_i,$$
where the first inequality follows from the element-wise nonnegativity of the matrices $H_k$ and $R$, and the second inequality follows from relation (90). Similarly, for each link $l \in \mathcal{L}$, by relations (27) and (28) we obtain
$$\left|\frac{\Delta y_l - \Delta\tilde{y}_l}{\Delta\tilde{y}_l}\right| = \left|\frac{[R]_l P_k^{-1} R'(w^* - w(t))}{[R]_l P_k^{-1} R'\, w(t)}\right| \le \left|\frac{\sum_{i \in S(l)} (P_k^{-1})_{ii}[R']_i e\,\|w^* - w(t)\|_\infty}{\sum_{i \in S(l)} (P_k^{-1})_{ii}[R']_i\, w(t)}\right| \le \left|\frac{\sqrt{L}\,\sum_{i \in S(l)} (H_k^{-1})_{ii}|L(i)|\,\|w(t+1) - w(t)\|_\infty}{(1-F)\sum_{i \in S(l)} (H_k^{-1})_{ii}[R']_i\, w(t)}\right| = \rho_l,$$
where the first inequality follows from the structure of the matrix $R$ and the element-wise nonnegativity of the matrices $H_k$ and $R$, and the second inequality follows from relation (90) and the definition of the matrix $P_k$. The definition of $\beta_k$ [cf. Eq. (93)] implies that
$$\frac{p}{\sqrt{\beta_k}} = \max\left(\max_{i \in \mathcal{S}} \rho_i,\ \max_{l \in \mathcal{L}} \rho_l\right).$$
Therefore the preceding relations imply that
$$\left|\frac{\Delta s_i - \Delta\tilde{s}_i}{\Delta\tilde{s}_i}\right| \le \frac{p}{\sqrt{\beta_k}} \quad\text{and}\quad \left|\frac{\Delta y_l - \Delta\tilde{y}_l}{\Delta\tilde{y}_l}\right| \le \frac{p}{\sqrt{\beta_k}},$$
i.e., $\sqrt{\beta_k}\,\big|\gamma_k^i\big| \le p\,\big|\Delta\tilde{x}_k^i\big|$ componentwise, which implies the desired relation.

Theorem B.3. Let $\{x_k\}$ be the primal sequence generated by the inexact Newton method (29) and $H_k$ be the corresponding Hessian matrix at the $k$th iteration. Let $w(t)$ be the inexact dual variable obtained after $t$ dual iterations (18) and $w^*$ be the exact solution to (11), i.e., the limit of the sequence $\{w(t)\}$ as $t \to \infty$. Let the vectors $\Delta x_k$ and $\Delta\tilde{x}_k$ be the exact and inexact Newton directions obtained using $w^*$ and $w(t)$ [cf. Eqs. (27)-(28)] respectively, and let the vector $\gamma_k$ be the error in the Newton direction computation at $x_k$, defined by $\gamma_k = \Delta x_k - \Delta\tilde{x}_k$. For scalars $\beta_k$ and $\epsilon$ with $0 < \beta_k < 1$ and $\epsilon > 0$, let
$$h_i = \sqrt{\frac{\epsilon}{(1-\beta_k)(L+S)}}\; \frac{1-F}{\sqrt{L}}\; \frac{1}{|L(i)|\,(H_k^{-1/2})_{ii}} \qquad (95)$$
for each source $i$, and
$$h_l = \sqrt{\frac{\epsilon}{(1-\beta_k)(L+S)}}\; \frac{1-F}{\sqrt{L}}\; \frac{1}{(H_k^{1/2})_{(S+l)(S+l)} \sum_{i \in S(l)} |L(i)|\,(H_k^{-1})_{ii}} \qquad (96)$$
for each link $l$. Define a nonnegative scalar $h$ as
$$h = \min\left(\min_{i \in \mathcal{S}} h_i,\ \min_{l \in \mathcal{L}} h_l\right). \qquad (97)$$
Then the condition
$$\|w(t+1) - w(t)\|_\infty \le h \qquad (98)$$
implies
$$(\gamma_k)' H_k \gamma_k \le \frac{\epsilon}{1-\beta_k}. \qquad (99)$$

Proof. For notational convenience, let the matrix $P_k$ denote the $S \times S$ principal submatrix of $H_k$, i.e., $(P_k)_{ii} = (H_k)_{ii}$ for $i \le S$. The definition of $h$ [cf. Eq. (97)] and relation (98) imply
$$\|w(t+1) - w(t)\|_\infty \le h_i \quad \text{for } i \in \mathcal{S}, \qquad\text{and}\qquad \|w(t+1) - w(t)\|_\infty \le h_l \quad \text{for } l \in \mathcal{L}.$$
Using relation (90) and the definitions of $h_i$ and $h_l$ [cf. Eqs. (95) and (96)], the above two relations imply respectively that
$$\|w^* - w(t)\|_\infty \le \sqrt{\frac{\epsilon}{(1-\beta_k)(L+S)}}\; \frac{1}{|L(i)|\,(H_k^{-1/2})_{ii}} \quad \text{for } i \in \mathcal{S}, \qquad (100)$$
and
$$\|w^* - w(t)\|_\infty \le \sqrt{\frac{\epsilon}{(1-\beta_k)(L+S)}}\; \frac{1}{(H_k^{1/2})_{(S+l)(S+l)} \sum_{i \in S(l)} |L(i)|\,(H_k^{-1})_{ii}} \quad \text{for } l \in \mathcal{L}.
\qquad (101)$$
By the element-wise nonnegativity of the matrices $H_k$ and $R$, we have for each source $i$,
$$\left|(H_k^{-1/2})_{ii}[R']_i(w^* - w(t))\right| \le (H_k^{-1/2})_{ii}[R']_i e\, \|w^* - w(t)\|_\infty = |L(i)|\,(H_k^{-1/2})_{ii}\,\|w^* - w(t)\|_\infty,$$
where the last equality follows from the fact that $[R']_i e = |L(i)|$ for each source $i$. The above inequality and relation (100) imply
$$\left|(H_k^{-1/2})_{ii}[R']_i(w^* - w(t))\right| \le \sqrt{\frac{\epsilon}{(1-\beta_k)(L+S)}}. \qquad (102)$$
By the definition of the matrices $P_k$ and $R$, we have for each link $l$,
$$\left|(H_k^{1/2})_{(S+l)(S+l)} \big(RP_k^{-1}R'(w^* - w(t))\big)_l\right| \le (H_k^{1/2})_{(S+l)(S+l)} [R]_l P_k^{-1} R' e\, \|w^* - w(t)\|_\infty = (H_k^{1/2})_{(S+l)(S+l)} \sum_{i \in S(l)} |L(i)|\,(H_k^{-1})_{ii}\, \|w^* - w(t)\|_\infty.$$
When combined with relation (101), the preceding relation yields
$$\left|(H_k^{1/2})_{(S+l)(S+l)}\big(RP_k^{-1}R'(w^* - w(t))\big)_l\right| \le \sqrt{\frac{\epsilon}{(1-\beta_k)(L+S)}}. \qquad (103)$$
From Eqs. (27)-(28) and the definition of $\gamma_k$, we have
$$\gamma_k = -\begin{pmatrix} P_k^{-1}R'(w^* - w(t)) \\ RP_k^{-1}R'(w^* - w(t)) \end{pmatrix},$$
which implies that
$$(\gamma_k)' H_k \gamma_k = \sum_{i \in \mathcal{S}} \Big((H_k^{-1/2})_{ii}[R']_i(w^* - w(t))\Big)^2 + \sum_{l \in \mathcal{L}} \Big((H_k^{1/2})_{(S+l)(S+l)}\big(RP_k^{-1}R'(w^* - w(t))\big)_l\Big)^2 \le \frac{\epsilon}{1-\beta_k},$$
where the inequality follows from relations (102) and (103). This establishes the desired relation.

We develop the distributed error checking method based on the preceding two theorems:

• Stage 1: The links and sources implement $T$ iterations of (18), where $T$ is a predetermined globally known constant. The links and sources then use Theorem B.2 with $t = T-1$ and $p$ equal to the desired relative error tolerance level defined in Assumption 2 to obtain a value $\beta_k$. If $\beta_k \ge 1$, then the dual iteration terminates.

• Stage 2: The links and sources use Theorem B.3 with the value $\beta_k$ obtained in the first stage and $\epsilon$ defined in Assumption 2 to obtain the value $h$.
Then they perform more iterations of the form (18) until the criterion (98) is satisfied.^25

^25 The criterion (98) will be met after a finite number of iterations for any $h > 0$, due to the convergence of the sequence $w(t)$ established in Section 5.1.

Stage 1 corresponds to checking the term $p^2(\Delta\tilde{x}_k)' H_k (\Delta\tilde{x}_k)$, while Stage 2 corresponds to the term $\epsilon$ in the error tolerance level. If the method terminates the dual iterations in Stage 1, then Assumption 2 is satisfied for any $\epsilon > 0$; otherwise, by combining relations (94) and (99), we have
$$(\gamma_k)' H_k \gamma_k = \big(\beta_k + (1-\beta_k)\big)(\gamma_k)' H_k \gamma_k \le p^2 (\Delta\tilde{x}_k)' H_k (\Delta\tilde{x}_k) + \epsilon,$$
which shows that the error tolerance level in Assumption 2 is satisfied.

To show that the above method can be implemented in a distributed way, we first rewrite the terms $\rho_i$, $\rho_l$, $h_i$ and $h_l$ and analyze the information required to compute them in a decentralized way. Using the definition of the weighted price of the route $\Pi_i(t)$, we obtain $\Pi_i(t) = (H_k^{-1})_{ii} \sum_{l \in L(i)} w_l(t) = (H_k^{-1})_{ii}[R']_i\, w(t)$ and $\Pi_i(0) = (H_k^{-1})_{ii}|L(i)|$, where $w_l(0) = 1$ for all links $l$. Therefore relations (91) and (92) can be rewritten as
$$\rho_i = \left|\frac{\sqrt{L}\,\Pi_i(0)\,\|w(t+1) - w(t)\|_\infty}{(1-F)\,\Pi_i(t)}\right|, \qquad \rho_l = \left|\frac{\sqrt{L}\,\sum_{i \in S(l)} \Pi_i(0)\,\|w(t+1) - w(t)\|_\infty}{(1-F)\,\sum_{i \in S(l)} \Pi_i(t)}\right|.$$
Similarly, relations (95) and (96) can be transformed into
$$h_i = \sqrt{\frac{\epsilon}{(1-\beta_k)(L+S)}}\;\frac{1-F}{\sqrt{L}}\;\frac{1}{\pi_i(0)\,(H_k^{-1/2})_{ii}}, \qquad h_l = \sqrt{\frac{\epsilon}{(1-\beta_k)(L+S)}}\;\frac{1-F}{\sqrt{L}}\;\frac{1}{(H_k^{1/2})_{(S+l)(S+l)} \sum_{i \in S(l)} \Pi_i(0)}.$$
In our dual variable computation procedure, the values $\pi_i(0)$, $\Pi_i(0)$ and $\Pi_i(t)$ are made available to all the links that source $i$ traverses through the feedback mechanism described in Section 4.2.
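The remaining global quantity in these expressions, $\|w(t+1) - w(t)\|_\infty$, is a maximum of link-local values and can be aggregated by maximum consensus. A minimal Python sketch of that primitive follows; the 6-node ring topology is purely illustrative.

```python
# A sketch of the maximum consensus primitive: each node repeatedly replaces
# its state with the largest state in its closed neighborhood, so after
# diam(G) rounds every node holds the global maximum (e.g. the network-wide
# value of ||w(t+1) - w(t)||_inf, assembled from the links' local values).

def max_consensus(states, neighbors, rounds):
    """states[v]: initial scalar at node v; neighbors[v]: adjacent nodes."""
    x = dict(states)
    for _ in range(rounds):
        x = {v: max([x[v]] + [x[u] for u in neighbors[v]]) for v in x}
    return x

# Illustrative 6-node ring: diameter 3, so 3 rounds suffice.
nbrs = {v: [(v - 1) % 6, (v + 1) % 6] for v in range(6)}
local = {0: 0.3, 1: 1.7, 2: 0.2, 3: 0.9, 4: 2.4, 5: 0.1}
out = max_consensus(local, nbrs, rounds=3)
print(out)   # every node holds 2.4
```

The same primitive yields $\beta_k$ and $h$, since each is a maximum (or, for $h$, a minimum, obtained by negating the states) over locally computable quantities.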
Each source and link knows its local Hessian, i.e., $(H_k)_{ii}$ for source $i$ and $(H_k)_{(S+l)(S+l)}$ for link $l$. The value $\beta_k$ is available from the previous stage. Therefore, in the above four expressions, the only information not immediately available is $\|w(t+1) - w(t)\|_\infty$, which can be obtained using a maximum consensus algorithm.^26 Based on these four terms, the values of $\beta_k$ and $h$ can be obtained using maximum consensus once again, and hence all the components necessary for the error checking method can be computed in a distributed way.

^26 In a maximum consensus algorithm, each node starts with some state and updates its current state with the maximum state value in its neighborhood (including itself). After one round of the algorithm, the neighborhood of the node with the maximal value holds the maximum value; after a number of rounds equal to the diameter of the graph, the entire graph reaches a consensus on the maximum state value and the algorithm terminates.

We observe that in the first $T$ iterations, i.e., Stage 1, only two executions of the maximum consensus algorithm are required: one to compute $\|w(t+1) - w(t)\|_\infty$ and the other for $\beta_k$. On the other hand, even though the computation of the value $h$ in Stage 2 needs only one execution of the maximum consensus algorithm, the term $\|w(t+1) - w(t)\|_\infty$ needs to be computed at each dual iteration $t$. Therefore the error checking in Stage 1 can be completed much more efficiently than in Stage 2. Hence, when we design the values $p$ and $\epsilon$ in Assumption 2, we should choose $p$ to be relatively large, which results in an error checking method that does not enter Stage 2 frequently, and is hence faster.

References

[1] S. Athuraliya and S. Low. Optimization flow control with Newton-like algorithm. Journal of Telecommunication Systems, 15:345–358, 2000.

[2] A. Berman and R. J. Plemmons.
Nonnegative Matrices in the Mathematical Sciences. Academic Press, New York, 1979.

[3] D. P. Bertsekas. Convex Optimization Theory. Athena Scientific, 2009.

[4] D. P. Bertsekas and E. M. Gafni. Projected Newton methods and optimization of multicommodity flows. IEEE Transactions on Automatic Control, 28(12), 1983.

[5] D. P. Bertsekas, A. Nedic, and A. E. Ozdaglar. Convex Analysis and Optimization. Athena Scientific, Cambridge, MA, 2003.

[6] D. Bickson, Y. Tock, A. Zymnis, S. Boyd, and D. Dolev. Distributed large scale network utility maximization. Proceedings of the 2009 IEEE International Conference on Symposium on Information Theory, 2, 2009.

[7] N. Biggs. Algebraic Graph Theory. Cambridge University Press, second edition, 1993.

[8] V. D. Blondel, J. M. Hendrickx, A. Olshevsky, and J. N. Tsitsiklis. Convergence in multiagent coordination, consensus, and flocking. Proceedings of IEEE CDC, 2005.

[9] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, 2004.

[10] G. H. Bradley, G. G. Brown, and G. W. Graves. Design and implementation of large scale primal transshipment algorithms. Management Science, 24(1):1–34, 1977.

[11] M. Chiang, S. H. Low, A. R. Calderbank, and J. C. Doyle. Layering as optimization decomposition: a mathematical theory of network architectures. Proceedings of the IEEE, 95(1):255–312, 2007.

[12] F. R. K. Chung. Spectral Graph Theory (CBMS Regional Conference Series in Mathematics). No. 92, American Mathematical Society, 1997.

[13] R. Cottle, J. Pang, and R. Stone. The Linear Complementarity Problem. Academic Press, 1992.

[14] R. Dembo, S. Eisenstat, and T. Steihaug. Inexact Newton methods. SIAM Journal on Numerical Analysis, 19, 1982.

[15] A. V. Fiacco and G. P. McCormick. Nonlinear Programming: Sequential Unconstrained Minimization Techniques. SIAM, 1990.

[16] R. Horn and C. R. Johnson.
Matrix Analysis. Cambridge University Press, New York, 1985.

[17] A. Jadbabaie, J. Lin, and S. Morse. Coordination of groups of mobile autonomous agents using nearest neighbor rules. IEEE Transactions on Automatic Control, 48(6):988–1001, 2003.

[18] A. Jadbabaie, A. Ozdaglar, and M. Zargham. A distributed Newton method for network optimization. Proc. of CDC, 2009.

[19] F. Jarre. Interior-point methods for convex programming. Applied Mathematics and Optimization, 26:287–311, 1992.

[20] C. T. Kelley. Iterative Methods for Linear and Nonlinear Equations. SIAM, Philadelphia, PA, 1995.

[21] F. Kelly. Charging and rate control for elastic traffic. European Transactions on Telecommunications, 8:33–37, 1997.

[22] F. P. Kelly, A. K. Maulloo, and D. K. Tan. Rate control for communication networks: shadow prices, proportional fairness, and stability. Journal of the Operational Research Society, 49:237–252, 1998.

[23] J. G. Klincewicz. A Newton method for convex separable network flow problems. Networks, 13:427–442, 1983.

[24] D. C. Lay. Linear Algebra and Its Applications. Pearson Education, third edition, 2006.

[25] S. H. Low and D. E. Lapsley. Optimization flow control, I: basic algorithm and convergence. IEEE/ACM Transactions on Networking, 7(6):861–874, 1999.

[26] B. Mohar. Some applications of Laplace eigenvalues of graphs. In: Hahn, G. and Sabidussi, G. (Eds.) Graph Symmetry: Algebraic Methods and Applications, NATO ASI Series C 497:227–275, 1997.

[27] A. Nedic and A. Ozdaglar. Convex Optimization in Signal Processing and Communications, chapter Cooperative distributed multi-agent optimization. Eds., Eldar, Y. and Palomar, D., Cambridge University Press, 2008.

[28] A. Nedic and A. Ozdaglar. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1):48–61, Jan 2009.

[29] A. Nedic and A. Ozdaglar.
Convergence rate for consensus with delays. Journal of Global Optimization, 47(3):437–456, 2010.

[30] Y. Nesterov and A. Nemirovskii. Interior-Point Polynomial Algorithms in Convex Programming. SIAM, 2001.

[31] R. Olfati-Saber and R. M. Murray. Consensus problems in networks of agents with switching topology and time-delays. IEEE Transactions on Automatic Control, 49(9):1520–1533, 2004.

[32] A. Olshevsky and J. Tsitsiklis. Convergence speed in distributed consensus and averaging. SIAM Journal on Control and Optimization, 48(1):33–55, 2009.

[33] R. Srikant. The Mathematics of Internet Congestion Control (Systems and Control: Foundations and Applications). Birkhäuser Boston, 2004.

[34] J. N. Tsitsiklis. Problems in Decentralized Decision Making and Computation. PhD thesis, Dept. of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, 1984.

[35] J. N. Tsitsiklis, D. P. Bertsekas, and M. Athans. Distributed asynchronous deterministic and stochastic gradient optimization algorithms. IEEE Transactions on Automatic Control, 31(9):803–812, 1986.

[36] R. Varga. Matrix Iterative Analysis. Prentice-Hall, Inc., Englewood Cliffs, NJ, 1965.

[37] R. Varga. Gershgorin and His Circles. Springer Series in Computational Mathematics, 2004.
