Sum-Rate Maximization via Convex Optimization Using Subgradient Projections Onto Nonlinear Spectral Radius Constraint Sets

Hiroki Kuroda, Member, IEEE, and Renato L. G. Cavalcante, Member, IEEE

Abstract—We solve the (weighted) sum-rate maximization problem over the set of achievable rates characterized by a nonlinear spectral radius function. This set has recently been shown to be convex in some practically relevant settings in modern wireless networks, including cell-less networks. However, even under convexity, sum-rate maximization is challenging because the nonlinear spectral radius characterization of the achievable rate region is difficult to handle directly. We overcome this difficulty by exploiting subgradient projections onto the level sets of suitably reformulated spectral radius functions. Notably, the derived subgradient projection algorithm provably converges to the global optimum of the sum-rate maximization problem under the convexity condition. The efficacy of the proposed algorithm is illustrated in simulations for cell-less networks.

Index Terms—Constrained convex optimization, subgradient projection, nonlinear spectral radius, sum-rate maximization.

I. INTRODUCTION

Weighted sum-rate maximization has a long-standing history in the wireless communications literature, with applications spanning cross-layer control, link scheduling, and power control, to name a few (see [1] for an overview). Early studies on sum-rate maximization [1]–[9] typically considered instantaneous rates defined by individual channel realizations. More recently, driven in large part by advances in the massive MIMO and cell-less literature, the focus has shifted toward maximizing tractable bounds on Shannon achievable (ergodic) rates, most notably the use-and-then-forget (UatF) bound [10]–[13].
In these formulations, the optimization problems are solved once per channel distribution [10]–[13], rather than once per channel realization as in the earlier approaches. This reduced dependence on instantaneous channel state information is especially valuable in distributed deployments (e.g., cell-less networks), where a per-realization resource-allocation strategy would entail acquiring channel state information from many geographically separated nodes and repeatedly solving an optimization problem on the channel-coherence time scale – an overhead often prohibitive in practice.

The work of the first author was supported in part by the Japan Society for the Promotion of Science under Grants-in-Aid 21K17827. The second author acknowledges the 6G-MIRAI-Harmony project, which has received funding from the Smart Networks and Services Joint Undertaking (SNS JU) under the European Union's Horizon Europe research and innovation programme, Grant Agreement No. 10119236. Views and opinions expressed are, however, those of the authors only and do not necessarily reflect those of the European Union or the SNS JU (granting authority). Neither the European Union nor the granting authority can be held responsible for them. (Corresponding author: Hiroki Kuroda.) Hiroki Kuroda is with Nagaoka University of Technology, Niigata 940-2188, Japan (e-mail: kuroda.h@ieee.org). He conducted part of this work as a guest researcher at Fraunhofer Heinrich Hertz Institute. Renato L. G. Cavalcante is with the Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany (e-mail: renato.cavalcante@hhi.fraunhofer.de).

From an abstract mathematical viewpoint, formulations based on instantaneous rates and those based on tractable bounds on ergodic rates are closely connected.
In either case, the resulting weighted sum-rate maximization problems are (strongly) NP-hard in general [2], which forces a tradeoff between provably optimal algorithms that are typically slow and fast heuristics that offer no optimality guarantees [1]. Representative methods in the former category include branch-and-bound schemes [14] and the polyblock algorithm [3]. Unfortunately, the complexity of global approaches becomes prohibitive even for modest problem sizes. For instance, the polyblock algorithm is reported to be impractical beyond roughly six users [3, Sect. VI]. For this reason, the literature largely relies on fast heuristics, with proposed techniques spanning convex relaxations [4, Sect. IV], [5, Sect. 4], [6], geometric programming formulations [7], [8], and the weighted minimum mean-square error (WMMSE) method [9], among others.

Despite the NP-hardness of sum-rate maximization in general, some special cases can be cast as convex optimization problems. Notable examples include the low transmit power regime [15, Sect. V], [4, Sect. III] and settings in which users experience equal interference-plus-noise [16, Sect. V.A]. More recently, the study in [17] has lifted some of these assumptions by focusing on interference models based on the UatF bound. By using the spectral radius of nonlinear mappings, that study identifies practical conditions in which the set of achievable rates becomes convex. However, even when convexity holds, designing efficient algorithms tailored to the resulting formulations has remained elusive, largely owing to the intricate structure of the achievable rate region, as detailed below.

A.
Potentially tractable classes of the sum-rate maximization problems and their associated challenges

Typically, weighted sum-rate maximization – whether based on the instantaneous signal-to-interference-plus-noise ratio (SINR) or the UatF bound – can be written in the following unified form (see Sect. II-A for mathematical definitions):

minimize_{p ∈ P}  − Σ_{n∈N} w_n log(1 + p_n / f_n(p)),   (P)

where P is the power constraint set given by

P := { p ∈ R^N_+ | ∥p∥ ≤ 1 }   (1)

for a given order-preserving norm ∥ · ∥, N := {1, ..., N} denotes the set of N ∈ ℕ users, w_n ∈ R_{++} is a positive weight (priority) assigned to user n ∈ N, and f_n : R^N_+ → R_{++} is a strictly-subhomogeneous order-preserving continuous (SSOC) function – called standard interference function in the wireless literature [18], [19] – for every n ∈ N. Problem (P) can be equivalently cast as

minimize_{s ∈ S}  − Σ_{n∈N} w_n log(1 + s_n),   (S)

where the achievable SINR region S, i.e., the set of achievable SINRs of users, is given by

S := { (p_n / f_n(p))_{n∈N} ∈ R^N_+ | p ∈ P }.

Under a fairly general assumption (see Sect. II-C for details), by constructing a positively-homogeneous order-preserving continuous (PHOC) mapping G : R^N_+ → R^N_+ from f_n (n ∈ N) and ∥ · ∥ in (1), we can rewrite the set S in the following equivalent form [17]:

S = { s ∈ R^N_+ | ϱ_G(s) ≤ 1 },   (2)

where, for each s ∈ R^N_+, ϱ_G(s) ∈ R_+ is the spectral radius (see Definition II.1) of the (possibly nonlinear) PHOC mapping defined by R^N_+ → R^N_+ : p ↦ diag(s)(G(p)). Furthermore, by the nonlinear bijection R^N_+ ∋ r = (log(1 + s_n))_{n∈N} ⇔ (e^{r_n} − 1)_{n∈N} = s ∈ R^N_+, Problem (S) is equivalent to

minimize_{r ∈ R}  − w^T r,   (R)

where the achievable rate region R is given by

R := { r ∈ R^N_+ | (ϱ_G ◦ E)(r) ≤ 1 },   (3)
E : R^N_+ → R^N_+ : r ↦ (e^{r_n} − 1)_{n∈N},   (4)

and w := (w_n)_{n∈N} ∈ R^N_{++}.
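As a concrete illustration of (2)–(4), the characterizations above can be checked numerically for the simplest linear PHOC mapping G : x ↦ Mx, in which case ϱ_G(s) is the ordinary spectral radius of diag(s)M. The 2×2 matrix below is hypothetical, chosen only for illustration:

```python
import numpy as np

# Hypothetical 2-user example with a *linear* PHOC mapping G: x -> M x,
# in which case rho_G(s) is the usual spectral radius of diag(s) M.
M = np.array([[0.0, 0.5],
              [0.5, 0.0]])

def rho_G(s):
    """Spectral radius of diag(s) M (linear-G special case)."""
    return np.max(np.abs(np.linalg.eigvals(np.diag(s) @ M)))

def E(r):
    """The mapping (4): r -> (e^{r_n} - 1)_n, taking rates to SINRs."""
    return np.exp(r) - 1.0

# An SINR vector s is achievable iff rho_G(s) <= 1, see (2).
s = np.array([1.0, 1.0])
print(rho_G(s) <= 1.0)            # 0.5 <= 1: feasible

# The rate vector r = log(1 + s) satisfies E(r) = s, so (S) and (R) describe
# the same operating points through the nonlinear bijection.
r = np.log1p(s)
print(np.allclose(E(r), s))
```

This linear special case is also the setting in which the difficulty of projecting onto R is discussed below.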
A practical sufficient condition for convexity of the nonlinear spectral radius function ϱ_G : R^N_+ → R_+ is revealed in [17], with a focus on PHOC mappings G constructed from UatF-based instances of f_n (n ∈ N) and polyhedral order-preserving norms ∥ · ∥. It is also shown that convexity of ϱ_G implies convexity of ϱ_G ◦ E : R^N_+ → R_+ [17, Proposition 6]. Both Problems (S) and (R) are guaranteed to be convex under the sufficient condition in [17]. However, in general, Problem (R) is more likely to be convex than Problem (S) because ϱ_G ◦ E can be convex even if ϱ_G is nonconvex. Hence, we focus on solving Problem (R), while Problem (S) is addressed in the Supplemental Material. Note that, once a solution to Problem (R) is obtained, an efficient scheme for computing a solution to Problem (P) is available (see Appendix A for details).

Even under the convexity condition, solving Problem (R) is challenging because the projection onto the constraint set R is difficult to compute. In fact, this difficulty already arises even for the simplest linear PHOC mapping G : R^N_+ → R^N_+ : x ↦ Mx, where M ∈ R^{N×N}_+ is a given nonnegative matrix. In this special case, for every s ∈ R^N_+, ϱ_G(s) coincides with the largest absolute eigenvalue of diag(s)M ∈ R^{N×N}_+, i.e., the usual notion of the spectral radius of diag(s)M in linear algebra. However, diag(s)M is typically nonsymmetric for any given s ∈ R^N_+ \ {0}, so computing the projection onto R is challenging. Indeed, projection formulas onto level sets of the spectral radius function are available only for symmetric matrices, and the derivations rely crucially on the fact that every symmetric matrix admits an orthogonal diagonalization [20, Corollary 24.65]. In contrast, a nonsymmetric matrix need not be diagonalizable, so explicit projection formulas are generally unavailable outside the symmetric setting.
A more challenging situation arises in nonlinear settings, such as when projecting onto level sets of the nonlinear spectral radius function ϱ_G. This challenge is hard to overcome even with sophisticated splitting schemes such as those in [21]–[27] (see also Sect. I-C).

B. Main contributions

The first objective of this study is to construct an efficient algorithm that provably converges to the global optimal value of Problem (R) – and hence that of the original problem (P) – under the assumption that ϱ_G ◦ E : R^N_+ → R_+ is convex. To this end, we exploit the subgradient projection as a key ingredient to solve Problem (R). More precisely, after introducing the necessary mathematical notions in Sect. II, in Sect. III-A we carefully reformulate Problem (R) to enable the use of algorithmic frameworks based on subgradient projections. One challenge to overcome is that existing frameworks rely on subgradient projection mappings relative to convex functions that are real-valued on vector spaces. As a result, these frameworks are not directly applicable to Problem (R) because the function ϱ_G is well defined only on the nonnegative orthant R^N_+ (see Definitions II.1 and II.2). We address this difficulty by carefully translating Problem (R) into an equivalent problem composed of functions from R^N to R. Then, in Sect. III-B, we derive the proposed algorithm by applying the hybrid steepest descent (HSD) method [28] to the reformulated problem, and we prove its convergence to the global optimum of Problem (R) under the assumption of convexity. In Sect. III-C, by focusing on typical models (Example II.1) that appear in cell-less networks, we show that the steps of the proposed algorithm can be efficiently computed at each iteration.

The second objective of this study is to illustrate the effectiveness of the proposed algorithm for UatF-based resource allocation in cell-less networks. In Sect.
IV-A, we validate the models used to derive the algorithm and the associated theoretical results. In Sect. IV-B, we verify conditions for convexity of Problem (R) and confirm our theoretical guarantee that the proposed algorithm converges to global optima. Moreover, the results indicate that the proposed algorithm achieves convergence speed comparable to that of the WMMSE method, a representative fast heuristic that may not converge to global optima even if Problem (R) is convex.

C. Relation to existing studies on optimization

Constrained convex optimization: For convex optimization problems where the projections onto constraint sets are difficult to compute, subgradient projections can be exploited as low-complexity alternatives. Since subgradient projection mappings are quasi-nonexpansive, a promising approach is to reformulate the problem of interest as minimizing a differentiable convex function over the fixed point set of a quasi-nonexpansive mapping (see Sect. II-B for details). This reformulation results in a problem that can be efficiently solved with the HSD method. Indeed, the HSD method has been successfully applied to problems where only the subgradient projections onto constraint sets are computable: minimal antenna-subset selection [30] and image recovery under non-Gaussian noise [31]. While the algorithm we propose is also based on the HSD method, the problems considered in [30], [31] differ substantially from Problem (R) (and also Problem (S)). As a result, significant new contributions are needed to construct the proposed algorithm and to prove its convergence to the global optimal value of Problem (R). Generalized Haugazeau's method [32], [33] is designed to minimize a strictly convex function over the fixed point set of a quasi-nonexpansive mapping.
Since the cost function in Problem (R) is not strictly convex, generalized Haugazeau's method is not applicable to Problem (R). Although the cost function in Problem (S) is strictly convex, the subproblem that needs to be solved at each iteration of generalized Haugazeau's method seems difficult to solve in our setting.

Proximal splitting methods [21]–[23] are widely used for constrained and/or nonsmooth convex optimization problems because of their ability to address complex problems by splitting them into tractable components [34]–[39]. However, these methods are not suitable for Problems (R) and (S) because computing the projections onto level sets of functions involving the nonlinear spectral radius function ϱ_G is challenging (see Sect. I-A), and it is also difficult to decompose ϱ_G into simpler functions that can be handled by these methods. Epigraphical projection techniques [24]–[27] are also used to deal with complicated constraint sets. However, these techniques solve relaxed problems that are not necessarily the same as the original problems. In addition, these reformulations of Problems (R) and (S) remain difficult to solve because computing the projection onto the epigraph of ϱ_G is challenging.

Matrix eigenvalue optimization: If we restrict our attention to linear PHOC mappings G : R^N_+ → R^N_+ : x ↦ Mx with M ∈ R^{N×N}_+, Problems (R) and (S) are related to matrix eigenvalue optimization but have not been addressed in the literature. While convex optimization involving eigenvalues of symmetric matrices is well studied [40]–[46], these results are difficult to exploit for Problems (R) and (S). Indeed, for the applications under consideration, the matrix M is nonsymmetric in general, so the matrix diag(s)M is also nonsymmetric for any given s ∈ R^N_+ \ {0}, except for specific combinations of M and s that are unlikely to arise.
Note that, although convex optimization involving singular values of general matrices is well studied, singular values and eigenvalues are distinct notions for nonsymmetric matrices. Since optimization problems involving the spectral radius of (possibly nonsymmetric) nonnegative matrices are notoriously challenging, only specific classes of problems have been addressed in the literature: minimization or maximization of only the spectral radius over the so-called product family of nonnegative matrices [47]–[50] and minimization of the max norm under the spectral radius constraint on nonnegative matrices [51]. The approaches in [47]–[51] are based on special properties of their problems that are substantially different from Problems (R) and (S) even for linear PHOC mappings. As a result, these approaches are difficult to apply to the problems considered here.

(The HSD method was originally proposed for variational inequality problems over the fixed point sets of nonexpansive mappings in [29] and was subsequently generalized for quasi-nonexpansive mappings in [28].)

II. MATHEMATICAL PRELIMINARIES

A. Notation and definitions

This subsection focuses on unifying terminology across various mathematical and wireless research domains. We use the convention that the set ℕ of natural numbers does not include zero. The sets of nonnegative and positive reals are denoted by R_+ := [0, ∞) and R_{++} := (0, ∞), respectively. We use the notation N := {1, ..., N} with N ∈ ℕ. For a given vector x ∈ R^N, its nth coordinate is denoted by x_n for every n ∈ N. Inequalities involving vectors should be understood coordinatewise. For example, given x ∈ R^N and y ∈ R^N, we write x ≤ y if and only if (∀n ∈ N) x_n ≤ y_n. We write the transpose and the Hermitian transpose of a matrix M as M^T and M^H, respectively.
We denote the diagonal matrix with the entries of x ∈ R^N on the main diagonal by diag(x) ∈ R^{N×N}. A sequence (x_n)_{n∈ℕ} ⊂ R^N is said to converge to x ∈ R^N if lim_{n→∞} ∥x_n − x∥ = 0 for some (and, hence, every) norm ∥ · ∥ on R^N. Let X ⊂ R^N and Y ⊂ R^M. A mapping f : X → Y is said to be continuous if, for every sequence (x_n)_{n∈ℕ} in X converging to x ∈ X, the sequence (f(x_n))_{n∈ℕ} in Y converges to f(x) ∈ Y. A set S ⊂ R^N is said to be closed if the limit of every convergent sequence (x_n)_{n∈ℕ} in S lies in S; bounded if (∃R > 0)(∀x ∈ S) ∥x∥ ≤ R for some (and, hence, every) norm ∥ · ∥ on R^N; and compact if and only if S is bounded and closed (Heine-Borel theorem). The (1-lower) level set of f : R^N → R is denoted by lev_{≤1}(f) := {x ∈ R^N | f(x) ≤ 1}. Let C ⊂ R^N be a nonempty convex set. A function f : C → R is convex if f(αx + (1 − α)y) ≤ αf(x) + (1 − α)f(y) for every x and y in C and α ∈ (0, 1); and strictly convex if f(αx + (1 − α)y) < αf(x) + (1 − α)f(y) for every x ≠ y in C and α ∈ (0, 1). The set of fixed points of a mapping T : C → C is denoted by Fix(T) := {x ∈ C | T(x) = x}. A norm ∥ · ∥ on R^N is said to be order-preserving if (∀x ∈ R^N_+)(∀y ∈ R^N_+) x ≤ y ⇒ ∥x∥ ≤ ∥y∥; and polyhedral if there exist L ∈ ℕ and a_1, ..., a_L in R^N_+ \ {0} such that (∀x ∈ R^N) ∥x∥ = max_{l∈{1,...,L}} a_l^T |x|, where |x| := (|x_n|)_{n∈N} ∈ R^N_+. A function f : R^N_+ → R_+ is said to be order-preserving if (∀x ∈ R^N_+)(∀y ∈ R^N_+) x ≤ y ⇒ f(x) ≤ f(y); positively homogeneous if (∀x ∈ R^N_+)(∀α > 0) f(αx) = αf(x); and strictly subhomogeneous if (∀x ∈ R^N_+)(∀α ∈ (0, 1)) f(αx) > αf(x).
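The polyhedral order-preserving norms defined above admit a direct implementation. A small sketch with hypothetical generators a_l (not taken from the paper) illustrates the definition and spot-checks the order-preserving property on sample points:

```python
import numpy as np

# Hypothetical polyhedral order-preserving norm ||x|| = max_l a_l^T |x|
# with nonnegative generators a_l in R^2_+ \ {0} (rows of A below).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.5, 0.5]])

def poly_norm(x):
    """Evaluate max_l a_l^T |x| over the rows a_l of A."""
    return np.max(A @ np.abs(x))

# Order preservation on the nonnegative orthant: 0 <= x <= y => ||x|| <= ||y||,
# which holds because every a_l is nonnegative.
x = np.array([0.2, 0.3])
y = np.array([0.4, 0.3])
print(poly_norm(x) <= poly_norm(y))   # True
```

With A equal to the identity rows alone, this construction recovers the sup-norm, the special case used later in Example II.1.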
Differently from [52], [53], we do not exclude x = 0 in our definition of strict subhomogeneity, which can be shown to be equivalent to the scalability condition (∀x ∈ R^N_+)(∀β > 1) f(βx) < βf(x) studied in the wireless literature [18], [19].

A mapping F : R^N_+ → R^N_+ : x ↦ (f_n(x))_{n∈N} is called a SSOC mapping (respectively, PHOC mapping) if f_n : R^N_+ → R_+ is strictly subhomogeneous, order-preserving, and continuous (respectively, positively homogeneous, order-preserving, and continuous) for every n ∈ N. If F : R^N_+ → R^N_+ is strictly subhomogeneous and order-preserving, then we have (∀x ∈ R^N_+) F(x) > 0, and hence we usually denote a SSOC mapping as F : R^N_+ → R^N_{++}. The spectral radius of PHOC mappings is defined as the supremum of all eigenvalues [54, Definition 3.2]:

Definition II.1. The spectral radius of a PHOC mapping G : R^N_+ → R^N_+ is defined to be

ρ(G) := sup{λ ∈ R_+ | (∃x ∈ R^N_+ \ {0}) G(x) = λx}.   (5)

We recall that there always exist a nonnegative scalar and a corresponding vector in R^N_+ \ {0} attaining the supremum in (5) [52, Proposition 5.3.2(ii) and Corollary 5.4.2]. For a linear PHOC mapping G : R^N_+ → R^N_+ : x ↦ Mx with M ∈ R^{N×N}_+, standard Perron-Frobenius theory shows that ρ(G) is identical to the usual notion of the spectral radius of M in linear algebra. The nonlinear spectral radius function ϱ_G used in (2) and (3) is formally defined as follows.

Definition II.2. For a given PHOC mapping G : R^N_+ → R^N_+, we define the function ϱ_G : R^N_+ → R_+ : x ↦ ρ(T_{G,x}), where, for each x ∈ R^N_+, we recall that ρ(T_{G,x}) is the spectral radius of the PHOC mapping defined by T_{G,x} : R^N_+ → R^N_+ : p ↦ diag(x)(G(p)). For a linear PHOC mapping G : R^N_+ → R^N_+ : x ↦ Mx with M ∈ R^{N×N}_+, by regarding M as this linear mapping (with the standard basis), we also write ϱ_M instead of ϱ_G.

B.
Convex analysis and fixed point theory in Euclidean spaces

This subsection presents necessary results of convex analysis and fixed point theory in the Euclidean space R^N equipped with the standard inner product ⟨x, y⟩ := x^T y for every (x, y) ∈ R^N × R^N and its induced norm ∥x∥_2 := √⟨x, x⟩ for every x ∈ R^N. A mapping T : R^N → R^N is called Lipschitz continuous on a set S ⊂ R^N with Lipschitz constant κ > 0 if (∀x ∈ S)(∀y ∈ S) ∥T(x) − T(y)∥_2 ≤ κ∥x − y∥_2. In particular, T : R^N → R^N is called nonexpansive if it is Lipschitz continuous on R^N with Lipschitz constant 1. A mapping T : R^N → R^N is called quasi-nonexpansive if (∀x ∈ R^N)(∀y ∈ Fix(T)) ∥T(x) − y∥_2 ≤ ∥x − y∥_2. Let C ⊂ R^N be a nonempty closed convex set. For every x ∈ R^N, there exists a unique point P_C(x) ∈ C satisfying

d_2(x, C) := min_{y∈C} ∥x − y∥_2 = ∥x − P_C(x)∥_2,   (6)

and P_C : R^N → C is called the projection mapping onto C. The subdifferential of a convex function h : R^N → R at x ∈ R^N is defined by ∂h(x) := {u ∈ R^N | (∀y ∈ R^N) ⟨y − x, u⟩ + h(x) ≤ h(y)}, and u ∈ ∂h(x) is called a subgradient of h at x. Convex functions from R^N to R have the following useful property.

Fact II.1 ([20, Proposition 16.20]). Let h : R^N → R be a convex function. Then h is continuous on R^N, and ∂h(x) ≠ ∅ for every x ∈ R^N. Moreover, for every bounded subset S of R^N, the set {u ∈ ∂h(x) | x ∈ S} is bounded.

By Fact II.1 and Fermat's rule x⋆ ∈ argmin_{x∈R^N} h(x) ⇔ 0 ∈ ∂h(x⋆), subgradient projection mappings can be defined as follows.

Definition II.3. Let h : R^N → R be a convex function with lev_{≤1}(h) ≠ ∅. Let g : R^N → R^N be a selection of the subdifferential of h, i.e., (∀x ∈ R^N) g(x) ∈ ∂h(x).
The subgradient projection mapping T_{sp(h)} : R^N → R^N, relative to h, onto lev_{≤1}(h) is defined by

T_{sp(h)}(x) := x − (h(x) − 1) g(x)/∥g(x)∥_2^2  if h(x) > 1;   T_{sp(h)}(x) := x  if h(x) ≤ 1.

The following result establishes the convergence of the HSD method using subgradient projections for a class of constrained convex optimization problems.

Fact II.2 (HSD method using subgradient projections [28]). Suppose that the following conditions hold: (a1) h : R^N → R is convex, K ⊂ R^N is compact and convex, and K ∩ lev_{≤1}(h) ≠ ∅; and (a2) Θ : R^N → R is convex and (Gâteaux) differentiable (see Definition B.1 in Appendix B) on R^N, and its gradient ∇Θ : R^N → R^N is Lipschitz continuous on K. Then we have

S := argmin_{x ∈ K ∩ lev_{≤1}(h)} Θ(x) = argmin_{x ∈ Fix(P_K ◦ T_{sp(h)})} Θ(x) ≠ ∅.

Furthermore, let x_1 ∈ R^N and (μ_k)_{k∈ℕ} ⊂ R_+ be such that lim_{k→∞} μ_k = 0 and Σ_{k∈ℕ} μ_k = ∞, and generate the sequence (x_k)_{k∈ℕ} by

x_{k+1} := (P_K ◦ T_{sp(h)})(x_k) − μ_k ∇Θ((P_K ◦ T_{sp(h)})(x_k))

for every k ∈ ℕ. Then lim_{k→∞} d_2(x_k, S) = 0 holds.

(The mapping P_K ◦ T_{sp(h)} belongs to a class of quasi-shrinking mappings [28, Definition 1] – a special subclass of quasi-nonexpansive mappings.)

Remark II.1. In [28, Proposition 6], it is stated as an assumption that any selection g : R^N → R^N of ∂h (i.e., that satisfies (∀x ∈ R^N) g(x) ∈ ∂h(x)) is bounded on every bounded subset of R^N. Since we focus on the finite-dimensional space R^N, this condition automatically holds by Fact II.1.

C. Nonlinear spectral radius functions for characterizing achievable rate regions

This subsection collects the essential properties of the PHOC mapping G and the nonlinear spectral radius function ϱ_G used to characterize the achievable rate region R in Problem (R).
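Before specializing to spectral radius constraints, the HSD iteration of Fact II.2 with the subgradient projection of Definition II.3 can be sketched on a toy instance with illustrative data (not from the paper): Θ(x) = ∥x − c∥²/2, K = [−2, 2]², and h(x) = ∥x∥₂², so that lev_{≤1}(h) is the unit disk and the minimizer is the projection of c onto K ∩ disk:

```python
import numpy as np

# Toy instance of Fact II.2 (illustrative data only).
c = np.array([2.0, 2.0])
grad_Theta = lambda x: x - c             # gradient of Theta(x) = ||x - c||^2 / 2
h = lambda x: float(x @ x)               # h(x) = ||x||_2^2; lev_{<=1}(h) = unit disk
g = lambda x: 2.0 * x                    # gradient of h (a valid subgradient selection)
P_K = lambda x: np.clip(x, -2.0, 2.0)    # projection onto the box K = [-2, 2]^2

def T_sp(x):
    """Subgradient projection onto lev_{<=1}(h), as in Definition II.3."""
    hx = h(x)
    if hx <= 1.0:
        return x
    gx = g(x)
    return x - (hx - 1.0) * gx / (gx @ gx)

x = np.zeros(2)
for k in range(1, 20001):
    t = P_K(T_sp(x))                     # quasi-nonexpansive mapping P_K o T_sp
    x = t - (1.0 / k) * grad_Theta(t)    # mu_k = 1/k: vanishing, non-summable

print(np.round(x, 3))                    # ~ [0.707 0.707], the projection of c onto the disk
```

The iterates hover slightly outside the level set (the gradient step pushes out, the subgradient projection pulls back), and the offset vanishes with μ_k, matching the convergence statement of Fact II.2.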
In particular, we focus on the following class of PHOC mappings F_{∥·∥} : R^N_+ → R^N_+ constructed from SSOC mappings F and order-preserving norms ∥ · ∥ by the following procedure given in [17] (see Sect. IV for its relation to wireless applications).

Assumption II.1. Let F : R^N_+ → R^N_{++} : x ↦ (f_n(x))_{n∈N} be a continuous mapping. For each n ∈ N, we assume that there exist a nonempty set Y_n, positive scalars (u_{n,y})_{y∈Y_n} bounded away from zero, and PHOC functions (g_{n,y})_{y∈Y_n}, such that

(∀x ∈ R^N_+) f_n(x) = inf_{y∈Y_n} (g_{n,y}(x) + u_{n,y}).

In addition, given an order-preserving norm ∥ · ∥ on R^N, we assume that the function (f_n)_{∥·∥} : R^N_+ → R_+ defined by

(∀x ∈ R^N_+) (f_n)_{∥·∥}(x) := inf_{y∈Y_n} (g_{n,y}(x) + u_{n,y}∥x∥)

is continuous on R^N_+ for every n ∈ N. (NOTE: we assume everywhere continuity to avoid technical digressions.)

Fact II.3 ([17, Lemma 1]). Under Assumption II.1, the mapping F is a SSOC mapping, and the mapping F_{∥·∥} : R^N_+ → R^N_+ : x ↦ ((f_n)_{∥·∥}(x))_{n∈N} is a PHOC mapping.

Remark II.2. If f_n (n ∈ N) and ∥ · ∥ in Problem (P) satisfy Assumption II.1, then the global optimal value of Problem (P) is the same as that of Problem (R) with G = F_{∥·∥} (see Appendix A for the scheme for converting solutions).

The next fact shows properties of the nonlinear spectral radius function ϱ_G for the class of PHOC mappings G = F_{∥·∥}.

Fact II.4 ([17, Proposition 2]). Suppose that G = F_{∥·∥} holds for some PHOC mapping F_{∥·∥} : R^N_+ → R^N_+ constructed from F and ∥ · ∥ that satisfy Assumption II.1. Then, for the function ϱ_G in Definition II.2, each of the following holds: (i) (∀α > 0)(∀x ∈ R^N_+) ϱ_G(αx) = αϱ_G(x); (ii) (∀x ∈ R^N_+) ϱ_G(x) = 0 ⇔ x = 0; (iii) ϱ_G is order-preserving; and (iv) ϱ_G is continuous on R^N_+.
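For intuition, Assumption II.1 and Fact II.3 can be instantiated with singleton sets Y_n and linear g_{n,y}, which yields the affine model f_n(x) = (Mx)_n + u_n and hence F_{∥·∥}(x) = Mx + u∥x∥. The data below (M, u, sup-norm) are hypothetical, used only to spot-check the homogeneity properties numerically:

```python
import numpy as np

# Assumption II.1 with singleton Y_n: f_n(x) = (M x)_n + u_n, so that
# (f_n)_{||.||}(x) = (M x)_n + u_n ||x|| and F_{||.||}(x) = M x + u ||x||.
# Hypothetical data; sup-norm as the order-preserving norm.
M = np.array([[0.1, 0.2],
              [0.3, 0.1]])
u = np.array([0.05, 0.05])

F = lambda x: M @ x + u                               # SSOC mapping (Fact II.3)
F_norm = lambda x: M @ x + u * np.max(np.abs(x))      # PHOC mapping F_{||.||}

x = np.array([0.4, 0.9])
alpha = 3.0
# Positive homogeneity of F_{||.||} vs. strict subhomogeneity (scalability) of F:
print(np.allclose(F_norm(alpha * x), alpha * F_norm(x)))   # True
print(np.all(F(alpha * x) < alpha * F(x)))                 # True for alpha > 1
```

The second check is exactly the scalability condition recalled in Sect. II-A: scaling x by α > 1 scales the additive term u sublinearly.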
The following example provides typical instances of F_{∥·∥} constructed from F and ∥ · ∥ that appear in UatF-based instances of Problem (P) for cell-less networks (see Sect. IV for the application perspective).

Example II.1. Let F : R^N_+ → R^N_{++} : x ↦ Mx + u for given M ∈ R^{N×N}_{++} and u ∈ R^N_{++}, and let ∥ · ∥ be a polyhedral order-preserving norm, i.e., there exist L ∈ ℕ vectors a_1, ..., a_L in R^N_+ \ {0} satisfying (∀x ∈ R^N) ∥x∥ = max_{l∈{1,...,L}} a_l^T |x|, where |x| = (|x_n|)_{n∈N}. In this case, Assumption II.1 holds, and F_{∥·∥} is given by F_{∥·∥} : R^N_+ → R^N_+ : x ↦ Mx + u∥x∥. Let G := F_{∥·∥} and M_l := M + u a_l^T ∈ R^{N×N}_{++} for every l ∈ {1, ..., L}. Then, for the functions ϱ_G and ϱ_{M_l} (l = 1, ..., L) given in Definition II.2, we have

(∀x ∈ R^N_+) ϱ_G(x) = max_{l∈{1,...,L}} ϱ_{M_l}(x).   (7)

Note that ϱ_G also satisfies the properties stated in Fact II.4(i)–(iv). The mathematical claims in this example can be proved by following the argument of [17, Example 5].

Fact II.5 ([17, Example 4]). In the setting of Example II.1, for every l ∈ {1, ..., L}, G_l : R^N_+ → R^N_+ : x ↦ M_l x satisfies the assumptions in Fact II.4, and ϱ_{M_l} satisfies the properties shown in Fact II.4(i)–(iv).

III. PROPOSED METHOD

In this section, we present the proposed algorithm, which provably converges to the global optimal value of Problem (R) if the composite function ϱ_G ◦ E : R^N_+ → R_+ is convex. Based on the discussion in Sect. II-C, we focus on the following class of PHOC mappings G in Problem (R).

Assumption III.1. We assume that G = F_{∥·∥} holds for some PHOC mapping F_{∥·∥} : R^N_+ → R^N_+ constructed as shown in Fact II.3 from a SSOC mapping F and an order-preserving norm ∥ · ∥ that satisfy Assumption II.1.

In Sect.
III-A, we reformulate Problem (R), which involves functions well defined only on R^N_+, into an equivalent optimization problem composed of functions from R^N to R. Then, in Sect. III-B, we derive the proposed algorithm by applying the HSD method to the reformulated problem, and we prove its convergence to the global optimum of Problem (R) under the convexity condition. In Sect. III-C, by focusing on the models given in Example II.1, we present a low-complexity method for computing the subgradient projection used in the proposed algorithm.

A. Reformulation to a problem over Euclidean spaces

The nonlinear spectral radius function ϱ_G is well defined only on R^N_+ because of Definitions II.1 and II.2. As a result, most convex optimization frameworks, including the HSD method, are not directly applicable to Problem (R) since these frameworks are designed for functions defined on the entire vector space. Straightforward extensions of the functions in Problem (R) to functions taking the extended value +∞ on R^N \ R^N_+ are not suitable for addressing Problem (R) because the subgradient projection mappings are guaranteed to be well defined only for convex functions taking real values on the whole space R^N (see Definition II.3 and Fact II.1). Real-valued convex extensions are also necessary to satisfy the assumptions in Fact II.2, which establishes convergence of the HSD method with subgradient projections. Note that, to apply Fact II.2, we need to translate Problem (R) into an equivalent problem that satisfies all assumptions in Fact II.2.

We first show that the composite function ϱ_G ◦ E : R^N_+ → R_+ can be extended to a convex function from R^N to R under the following assumption (see Sect. IV for the validity of this assumption in wireless applications):

Assumption III.2. In addition to Assumption III.1, we assume that ϱ_G ◦ E : R^N_+ → R_+ is convex on R^N_{++}, where ϱ_G and E are given in Definition II.2 and (4), respectively.
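Under Assumption III.1 with the model of Example II.1, the identity (7) reduces evaluation of ϱ_G to ordinary matrix spectral radii. The sketch below uses hypothetical data (M, u, sup-norm, so a_l = e_l) and cross-checks (7) against Definition II.2 by power-iterating the mapping T_{G,x} : p ↦ diag(x)G(p); the iteration converges in this example, but no general guarantee is claimed here:

```python
import numpy as np

# Example II.1 with hypothetical data: F(x) = M x + u, ||.|| = sup-norm,
# so a_l = e_l and M_l = M + u e_l^T, and (7) gives
# varrho_G(x) = max_l rho(diag(x) M_l).
M = np.array([[0.1, 0.2],
              [0.3, 0.1]])
u = np.array([0.05, 0.05])
N = 2
M_l = [M + np.outer(u, np.eye(N)[l]) for l in range(N)]

def spectral_radius(A):
    return np.max(np.abs(np.linalg.eigvals(A)))

def varrho_G(x):
    """Right-hand side of (7): max over l of rho(diag(x) M_l)."""
    return max(spectral_radius(np.diag(x) @ Ml) for Ml in M_l)

def varrho_G_iter(x, iters=500):
    """Definition II.2 directly: normalized fixed-point (power) iteration on
    T_{G,x}(p) = diag(x)(M p + u ||p||_inf)."""
    T = lambda p: x * (M @ p + u * np.max(np.abs(p)))
    p = np.ones(N)
    for _ in range(iters):
        p = T(p) / np.max(T(p))
    return np.max(T(p))          # at a fixed point, T(p) = rho * p, ||p||_inf = 1

x = np.array([1.5, 0.8])
print(np.isclose(varrho_G(x), varrho_G_iter(x)))   # True
```

The agreement of the two routines is a numerical illustration of (7); in larger or less well-behaved models, evaluating ρ(T_{G,x}) may require more careful schemes.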
Lemma III.1. Define h_{G,E} : R^N → R_+ : r ↦ (ϱ_G ◦ E ◦ P_{R^N_+})(r). Then, under Assumption III.2, each of the following holds: (i) (∀r ∈ R^N_+) h_{G,E}(r) = (ϱ_G ◦ E)(r); and (ii) h_{G,E} is convex on R^N.

Proof. (i) Clear from the definition of h_{G,E}. (ii) We recall that Fact II.4 holds under Assumption III.1, which is required by Assumption III.2. Since ϱ_G ◦ E is continuous on R^N_+ by Fact II.4(iv) and the definition of E in (4), convexity of ϱ_G ◦ E on R^N_{++} in Assumption III.2 implies its convexity on R^N_+. Choose r_1 ∈ R^N, r_2 ∈ R^N, and α ∈ (0, 1) arbitrarily. By definition of P_{R^N_+} : R^N → R^N_+, we have P_{R^N_+}(r) = (max{0, r_n})_{n∈N} for every r ∈ R^N. Hence, convexity of the function R → R : x ↦ max{0, x} implies convexity of P_{R^N_+} in each coordinate:

P_{R^N_+}(αr_1 + (1 − α)r_2) ≤ αP_{R^N_+}(r_1) + (1 − α)P_{R^N_+}(r_2).

Meanwhile, since ϱ_G and E are order-preserving by Fact II.4(iii) and by definition, respectively, their composition ϱ_G ◦ E is also order-preserving, i.e., (∀x ∈ R^N_+)(∀y ∈ R^N_+) x ≤ y ⇒ (ϱ_G ◦ E)(x) ≤ (ϱ_G ◦ E)(y). Altogether, convexity of h_{G,E} on R^N is shown by

(ϱ_G ◦ E)(P_{R^N_+}(αr_1 + (1 − α)r_2)) ≤ (ϱ_G ◦ E)(αP_{R^N_+}(r_1) + (1 − α)P_{R^N_+}(r_2)) ≤ α(ϱ_G ◦ E)(P_{R^N_+}(r_1)) + (1 − α)(ϱ_G ◦ E)(P_{R^N_+}(r_2)),

where the first inequality follows from the order-preserving property of ϱ_G ◦ E and the last inequality follows from convexity of ϱ_G ◦ E on R^N_+.

Meanwhile, the cost function r ↦ −w^T r in Problem (R) can be regarded as a function from R^N to R. We state the following trivial lemma for later reference.

Lemma III.2. Let Ψ : R^N → R : r ↦ −w^T r. Then Ψ is convex and differentiable on R^N. Its gradient is given by (∀r ∈ R^N) ∇Ψ(r) = −w, and hence it is Lipschitz continuous on (R^N, ∥ · ∥_2).
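Lemma III.1's extension simply pre-composes with the projection onto the nonnegative orthant, which makes h_{G,E} real-valued on all of R^N. A sketch, assuming the linear special case G : x ↦ Mx with a hypothetical nonnegative M:

```python
import numpy as np

# h_{G,E} = varrho_G o E o P_{R^N_+}, sketched for the linear special case
# G: x -> M x with hypothetical nonnegative M (illustration of Lemma III.1).
M = np.array([[0.1, 0.2],
              [0.3, 0.1]])

varrho_G = lambda s: np.max(np.abs(np.linalg.eigvals(np.diag(s) @ M)))
E = lambda r: np.exp(r) - 1.0
P_plus = lambda r: np.maximum(r, 0.0)    # projection onto the nonnegative orthant

h = lambda r: varrho_G(E(P_plus(r)))     # real-valued on all of R^N

r_neg = np.array([-1.0, 0.5])
print(h(r_neg) == h(P_plus(r_neg)))      # agrees with varrho_G o E on R^N_+
print(h(np.array([-3.0, -2.0])))         # defined (and equal to 0) off the orthant
```

Note that convexity of h (Lemma III.1(ii)) is not automatic for arbitrary M: it rests on Assumption III.2, i.e., on the convexity condition for ϱ_G ◦ E identified in [17].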
Under Assumption III.2, using h_{G,E} given in Lemma III.1 and Ψ given in Lemma III.2, we reformulate Problem (R) as

  minimize Ψ(r) subject to r ∈ [0, b]^N ∩ lev_{≤1}(h_{G,E}),  (R')

where we also replace R^N_+ with [0, b]^N using a sufficiently large constant b > 0 in order to avoid technical digressions of little practical relevance. Under Assumption III.2, if b is large enough, Problems (R) and (R') are guaranteed to have the same nonempty solution set. To prove this result, we use the following lemma.

Lemma III.3. The set {x ∈ R^N_+ | ϱ_G(x) ≤ 1} is bounded under Assumption III.1.

Proof. Assume, for the sake of contradiction, that the set S_G := {x ∈ R^N_+ | ϱ_G(x) ≤ 1} is unbounded. In this case, there exists a sequence (x_n)_{n∈N} in S_G \ {0} such that lim_{n→∞} ∥x_n∥_2 = ∞. For every n ∈ N, define d_n := x_n / ∥x_n∥_2. Since (d_n)_{n∈N} is a bounded sequence, we can use the Bolzano-Weierstrass theorem to extract a convergent subsequence (d_n)_{n∈K⊂N} from (d_n)_{n∈N}. Denote by d := lim_{n∈K} d_n the limit of the (sub)sequence (d_n)_{n∈K}, and note that ∥d∥_2 = 1 because, for every n ∈ K, d_n belongs to the set {u ∈ R^N | ∥u∥_2 = 1}, which is closed by continuity of the norm. Therefore, in light of Fact II.4(ii) and (iv), we deduce lim_{n∈K} ϱ_G(d_n) = ϱ_G(d) > 0, which enables us to obtain the contradiction

  1 ≥(a) limsup_{n∈N} ϱ_G(x_n) ≥ limsup_{n∈K} ϱ_G(∥x_n∥_2 d_n) =(b) limsup_{n∈K} ∥x_n∥_2 ϱ_G(d_n) = ∞,

where (a) follows from (∀ n ∈ N) x_n ∈ S_G and (b) follows from the positive homogeneity of ϱ_G [see Fact II.4(i)].

Proposition III.1. Under Assumption III.2, the solution set of Problem (R') is a nonempty compact convex set for any b > 0. Moreover, if b is large enough, the solution sets of Problems (R) and (R') are the same.

Proof. Fix b > 0 arbitrarily.
By setting r = 0 in Lemma III.1(i), we have h_{G,E}(0) = (ϱ_G ∘ E)(0) = ϱ_G(0) = 0, implying that the set [0, b]^N ∩ lev_{≤1}(h_{G,E}) is nonempty. Since h_{G,E} : R^N → R is convex by Lemma III.1(ii), and hence continuous by Fact II.1, the level set lev_{≤1}(h_{G,E}) is closed and convex. Therefore, the set [0, b]^N ∩ lev_{≤1}(h_{G,E}) is compact and convex. Since Ψ : R^N → R given in Lemma III.2 is also convex, we deduce that the solution set of Problem (R') is a nonempty compact convex set.

Next, we show that the solution sets of Problems (R) and (R') are the same for sufficiently large b. By definition of E in (4), if r ∈ R^N_+ and ∥r∥_2 → ∞, we have E(r) ∈ R^N_+ and ∥E(r)∥_2 → ∞. This relation and Lemma III.3 imply that R in (3) is bounded. Thus, since R ⊂ R^N_+ holds by definition, there exists c > 0 such that (∀ b ≥ c) R = [0, b]^N ∩ R = [0, b]^N ∩ lev_{≤1}(h_{G,E}), where the last equality holds by Lemma III.1(i) and the definition of R. Hence, for every b ≥ c, the desired result follows from the definition of Ψ.

B. Optimization algorithm and its convergence guarantee

The reformulated problem (R') is still challenging to solve because the projection onto the level set lev_{≤1}(h_{G,E}) is difficult to compute, owing to the nonlinear spectral radius function ϱ_G involved in h_{G,E}. It is also difficult to decompose ϱ_G into simpler functions that can be handled by the advanced splitting strategies reviewed in Sect. I-C. Fortunately, as shown in Sect. III-C, the subgradient projection mapping relative to h_{G,E} can be computed efficiently in (pseudo) closed form in the scenarios considered in Example II.1. Motivated by this result, we derive the proposed algorithm by applying the subgradient-projection-based HSD method shown in Fact II.2 to Problem (R').
Specifically, with an initial point r_1 ∈ R^N_{++} and a slowly decreasing step size^3 (μ_k)_{k∈N} ⊂ R_{++}, the proposed algorithm generates the sequence (r_k)_{k∈N} by

  For k = 1, 2, . . .
    Choose g_k ∈ ∂h_{G,E}(r_k),
    r̂_k := r_k − (h_{G,E}(r_k) − 1) g_k / ∥g_k∥_2^2  if h_{G,E}(r_k) > 1;  r̂_k := r_k  if h_{G,E}(r_k) ≤ 1,
    r̃_k := P_{[0,b]^N}(r̂_k),
    r_{k+1} := r̃_k − μ_k ∇Ψ(r̃_k).  (8)

Since h_{G,E} is a convex function from R^N to R by Lemma III.1, the existence of g_k is guaranteed by Fact II.1 at each iteration k ∈ N, while a specific method for finding g_k is presented in Sect. III-C. The projection mapping P_{[0,b]^N} : R^N → [0, b]^N can be computed by

  (∀ x ∈ R^N) P_{[0,b]^N}(x) = (min{max{0, x_n}, b})_{n∈N},  (9)

and the expression for ∇Ψ is shown in Lemma III.2. In the following theorem, we show that, under Assumption III.2, the distance between (r_k)_{k∈N} and the solution set of Problem (R') converges to zero, and the weighted sum-rate (w^T r_k)_{k∈N} converges to (the negative of) the global optimal value of Problem (R), and hence that of the original problem (P), if b is large enough.

Theorem III.1. Let w ∈ R^N_{++}, b > 0, r_1 ∈ R^N_{++}, and (μ_k)_{k∈N} ⊂ R_{++} satisfy lim_{k→∞} μ_k = 0 and Σ_{k∈N} μ_k = ∞. Then, under Assumption III.2, each of the following holds for (r_k)_{k∈N} generated by (8):
(i) Let Ω := argmin_{r ∈ [0,b]^N ∩ lev_{≤1}(h_{G,E})} Ψ(r). Then lim_{k→∞} d_2(r_k, Ω) = 0;
(ii) (∀ k ∈ N) r_k ∈ R^N_{++};
(iii) lim_{k→∞} d_2(r_k, R) = 0; and
(iv) if b in Problem (R') is large enough, then lim_{k→∞} w^T r_k = max_{r∈R} w^T r.

Proof. (i) To prove this claim via Fact II.2, we show that conditions (a1) and (a2) in Fact II.2 are satisfied for h = h_{G,E}, K = [0, b]^N, and Θ = Ψ, as follows. For any b > 0, [0, b]^N is a compact convex set.
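The iteration in (8) needs only three ingredients: a subgradient oracle for the constraint function, the box projection (9), and the gradient of Ψ. A minimal sketch of the generic scheme follows; it is not the paper's spectral-radius instance, but uses a deliberately simple stand-in constraint function h (sum of positive parts) so the three steps are easy to trace.

```python
import numpy as np

def hsd_iterations(h, subgrad, w, b, r1, num_iter, step):
    """Scheme (8): subgradient projection onto lev_{<=1}(h), projection onto
    [0, b]^N, then a gradient step on Psi(r) = -w^T r (so -grad Psi = w)."""
    r = r1.astype(float).copy()
    for k in range(1, num_iter + 1):
        if h(r) > 1.0:                     # subgradient projection onto lev_{<=1}(h)
            g = subgrad(r)
            r = r - (h(r) - 1.0) * g / (g @ g)
        r = np.clip(r, 0.0, b)             # P_{[0,b]^N}, cf. (9)
        r = r + step(k) * w                # r - mu_k * grad Psi(r)
    return r

# Toy stand-in constraint: h(r) = sum of positive parts, so the feasible set
# [0, b]^2 with h(r) <= 1 is a simplex-like set and max w^T r equals 1.
h = lambda r: np.maximum(r, 0.0).sum()
subgrad = lambda r: (r > 0).astype(float)
w = np.array([1.0, 1.0])
r = hsd_iterations(h, subgrad, w, b=10.0,
                   r1=np.array([0.5, 0.5]), num_iter=2000,
                   step=lambda k: 0.1 * k ** -0.9)  # mu_k = a k^{-q}, cf. Remark III.1
print(w @ r)   # approaches max{w^T r : r in [0,10]^2, h(r) <= 1} = 1
```

With the diminishing, non-summable step size, the iterates hover ever closer to the boundary of the level set, which is exactly the behavior Theorem III.1 formalizes for the spectral-radius constraint.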
By Lemma III.1, h_{G,E} : R^N → R is convex, and [0, b]^N ∩ lev_{≤1}(h_{G,E}) is nonempty (see the proof of Proposition III.1). Hence, condition (a1) is verified. Lemma III.2 ensures that condition (a2) is satisfied for Θ = Ψ.

(ii) For every k ∈ N, we have r̃_k = P_{[0,b]^N}(r̂_k) ∈ R^N_+ in (8). We also have (∀ r ∈ R^N) −∇Ψ(r) = w ∈ R^N_{++} and (μ_k)_{k∈N} ⊂ R_{++} by assumption. Hence, the last step in (8) ensures r_{k+1} ∈ R^N_{++} for every k ∈ N. Note that r_1 ∈ R^N_{++} holds by assumption.

(iii) Since R = R^N_+ ∩ lev_{≤1}(h_{G,E}) holds by the definition of R in (3) and Lemma III.1(i), we have Ω ⊂ [0, b]^N ∩ lev_{≤1}(h_{G,E}) ⊂ R. By definition of the distance d_2 in (6), Ω ⊂ R implies d_2(r, R) ≤ d_2(r, Ω) for every r ∈ R^N, and hence the desired result follows from the claim in (i).

(iv) Since Ω is a nonempty closed convex set by Proposition III.1, the projection mapping P_Ω : R^N → Ω is well defined. Since the solution sets of Problems (R) and (R') are the same for sufficiently large b > 0 by Proposition III.1, we have Ω = argmax_{r∈R} w^T r for such b. Altogether, the desired result follows from

  lim_{k→∞} |w^T r_k − max_{r∈R} w^T r| = lim_{k→∞} |w^T r_k − w^T P_Ω(r_k)| ≤(a) lim_{k→∞} ∥w∥_2 ∥r_k − P_Ω(r_k)∥_2 = ∥w∥_2 lim_{k→∞} d_2(r_k, Ω) =(b) 0,

where (a) and (b) follow from the Cauchy-Schwarz inequality and the claim in (i), respectively.

Footnote 3: We focus on a positive initial point and a positive step size in order to derive an efficient subgradient computation method in Sect. III-C.

Remark III.1. The condition on (μ_k)_{k∈N} in Theorem III.1 is satisfied by, e.g., (∀ k ∈ N) μ_k = a k^{−q} for any a > 0 and q ∈ (0, 1].

C. A low-complexity method for computing subgradients

To efficiently compute the steps of the proposed algorithm, we only need an efficient method for finding a subgradient at the first step in (8), since the other steps can be done in O(N) operations.
Since (∀ k ∈ N) r_k ∈ R^N_{++} is guaranteed in (8) by Theorem III.1(ii), it is sufficient to find a subgradient of h_{G,E} at a given point in R^N_{++}. To this end, we focus on the PHOC mappings G considered in Example II.1 and make the following additional assumption.

Assumption III.3. In addition to the setting of Example II.1, we assume that ϱ_{M_l} ∘ E is convex on R^N_{++} for every l ∈ {1, . . . , L}.

Remark III.2. By combining [55, Theorem 4.3] and [17, Proposition 6], we deduce that Assumption III.3 holds if M_l is an inverse Z-matrix [54], [56] (i.e., M_l has an inverse matrix whose off-diagonal components are nonpositive) for every l ∈ {1, . . . , L}. However, this condition is merely sufficient for Assumption III.3: ϱ_{M_l} ∘ E can be convex even if M_l is not an inverse Z-matrix. For the validity of Assumption III.3 in wireless applications, see Sect. IV.

Lemma III.4. Consider the setting of Example II.1. If Assumption III.3 holds, then G given in Example II.1 satisfies Assumption III.2.

Proof. Assumption III.1, which is required by Assumption III.2, holds by definition of G in Example II.1. By definition of E in (4), we have E(x) ∈ R^N_+ for every x ∈ R^N_+. Hence, by the equality in (7), we have

  (∀ x ∈ R^N_+) (ϱ_G ∘ E)(x) = max_{l∈{1,...,L}} (ϱ_{M_l} ∘ E)(x).  (10)

This shows that ϱ_G ∘ E is the pointwise maximum of convex functions on R^N_{++}, and hence ϱ_G ∘ E is convex on R^N_{++}.

Under Assumption III.3, the following proposition provides an efficient way to compute a subgradient of h_{G,E} at any point in the positive orthant.

Proposition III.2. Consider the setting of Example II.1. Fix r ∈ R^N_{++} arbitrarily, and let l⋆ ∈ argmax_{l∈{1,...,L}} (ϱ_{M_l} ∘ E)(r) = argmax_{l∈{1,...,L}} ρ(diag(E(r)) M_l).
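By (10) (equivalently (7)), evaluating ϱ_G ∘ E at a positive point reduces to L ordinary spectral radius computations. A hedged sketch, in which the matrices M_l are hypothetical placeholders and E(r) = (e^{r_n} − 1)_n is an assumed form consistent with the gradient expression derived below:

```python
import numpy as np

def rho(A):
    """Spectral radius in the standard linear-algebra sense."""
    return float(np.max(np.abs(np.linalg.eigvals(A))))

def varrho_G_of_E(r, Ms):
    """(varrho_G o E)(r) = max_l rho(diag(E(r)) M_l), cf. (10),
    assuming E(r) = exp(r) - 1 componentwise."""
    x = np.exp(r) - 1.0
    return max(rho(np.diag(x) @ M) for M in Ms)

Ms = [np.array([[1.0, 2.0], [2.0, 1.0]]),   # hypothetical positive matrices
      np.array([[0.5, 0.2], [0.2, 0.5]])]
r = np.log(1.0 + np.array([1.0, 1.0]))      # chosen so that E(r) = (1, 1)
print(varrho_G_of_E(r, Ms))                 # rho of the first matrix dominates: 3.0
```

The maximum over l is exactly the pointwise maximum in (10), which is what makes ϱ_G ∘ E convex on R^N_{++} under Assumption III.3.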
Then, under Assumption III.3, we have

  diag((e^{r_n})_{n∈N}) diag(η) M_{l⋆} ξ / (η^T ξ) ∈ ∂h_{G,E}(r),  (11)

where ξ ∈ R^N_{++} and η ∈ R^N_{++} are positive right and left eigenvectors of diag(E(r)) M_{l⋆} ∈ R^{N×N}_{++} corresponding to the spectral radius ρ(diag(E(r)) M_{l⋆}) > 0, i.e., positive vectors satisfying diag(E(r)) M_{l⋆} ξ = ρ(diag(E(r)) M_{l⋆}) ξ and η^T diag(E(r)) M_{l⋆} = ρ(diag(E(r)) M_{l⋆}) η^T, respectively.^4

Proof. Since Lemma III.4 shows that Assumption III.3 implies Assumption III.2, we can construct h_{G,E} as defined in Lemma III.1. Meanwhile, for every l ∈ {1, . . . , L}, G_l : R^N_+ → R^N_+ : x ↦ M_l x satisfies Assumption III.1 by Fact II.5. Owing to Assumption III.3, G_l satisfies the conditions in Assumption III.2. Hence, for every l ∈ {1, . . . , L}, applying Lemma III.1, we deduce that the function h_{G_l,E} : R^N → R_+ : x ↦ (ϱ_{M_l} ∘ E ∘ P_{R^N_+})(x) is convex on R^N, and

  (∀ x ∈ R^N_+) h_{G_l,E}(x) = (ϱ_{M_l} ∘ E)(x).  (12)

Since (∀ x ∈ R^N) P_{R^N_+}(x) ∈ R^N_+ holds, using the equality in (10), we further obtain

  (∀ x ∈ R^N) h_{G,E}(x) = max_{l∈{1,...,L}} h_{G_l,E}(x).  (13)

Now consider r and l⋆ given in the statement of this proposition. Since h_{G_l,E} is a convex function from R^N to R for every l ∈ {1, . . . , L}, g ∈ ∂h_{G_{l⋆},E}(r) implies g ∈ ∂h_{G,E}(r) by (13) and Fact B.1 in Appendix B. Hence it is sufficient to show that the left-hand side of (11) is a subgradient of h_{G_{l⋆},E} at r. By definition of E in (4), E is Fréchet differentiable at r ∈ R^N_{++}, and E(r) ∈ R^N_{++} holds. Since M_{l⋆} ∈ R^{N×N}_{++} holds in the setting of Example II.1, ϱ_{M_{l⋆}} is Fréchet differentiable at E(r) ∈ R^N_{++} by Fact B.2.
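The subgradient in (11) can be formed from a single eigen-decomposition of the positive matrix diag(E(r)) M_{l⋆}. The sketch below is illustrative only: it assumes E(r) = (e^{r_n} − 1)_n and a hypothetical positive matrix, and it cross-checks (11) against a finite-difference gradient of r ↦ ρ(diag(E(r)) M), which is legitimate here because that map is differentiable at positive points (Fact B.2 plus the chain rule).

```python
import numpy as np

def perron_vectors(B):
    """Positive right and left eigenvectors of a positive matrix B for rho(B)."""
    vals, vecs = np.linalg.eig(B)
    xi = np.real(vecs[:, np.argmax(np.real(vals))])
    vals_t, vecs_t = np.linalg.eig(B.T)          # left eigenvectors of B
    eta = np.real(vecs_t[:, np.argmax(np.real(vals_t))])
    # Perron vectors have entries of a single sign; normalize to positive.
    return np.abs(xi), np.abs(eta)

def subgradient_11(r, M):
    """g in (11) for a single matrix M: diag(e^{r_n}) diag(eta) M xi / (eta^T xi)."""
    B = np.diag(np.exp(r) - 1.0) @ M             # diag(E(r)) M, assuming E = exp - 1
    xi, eta = perron_vectors(B)
    return np.exp(r) * (eta * (M @ xi)) / (eta @ xi)

def rho_of_r(r, M):
    return float(np.max(np.abs(np.linalg.eigvals(np.diag(np.exp(r) - 1.0) @ M))))

M = np.array([[1.0, 0.3], [0.2, 1.0]])           # hypothetical positive matrix
r = np.array([0.7, 0.4])
g = subgradient_11(r, M)
g_fd = np.array([(rho_of_r(r + 1e-6 * e, M) - rho_of_r(r - 1e-6 * e, M)) / 2e-6
                 for e in np.eye(2)])
print(np.max(np.abs(g - g_fd)))                  # near 0: (11) matches the gradient
```

The agreement with the numerical gradient reflects the eigenvalue perturbation identity behind Fact B.2: the derivative of the Perron root with respect to the n-th entry of x is η_n (Mξ)_n / (η^T ξ).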
Hence, by the chain rule (see, e.g., [20, Fact 2.63]), ϱ_{M_{l⋆}} ∘ E is (Fréchet) differentiable at r ∈ R^N_{++}, and its gradient is given by

  ∇(ϱ_{M_{l⋆}} ∘ E)(r) = diag((e^{r_n})_{n∈N}) ∇ϱ_{M_{l⋆}}(E(r)) = the left-hand side of (11),

where the last equality follows from the expression for ∇ϱ_{M_{l⋆}} in Fact B.2. Since the equality in (12) holds in a neighborhood of r ∈ R^N_{++}, ∇(ϱ_{M_{l⋆}} ∘ E)(r) is the gradient of h_{G_{l⋆},E} at r, and hence is the (unique) subgradient of h_{G_{l⋆},E} at r by [20, Proposition 17.31].

Remark III.3 (Optimality of Algorithm 1). Algorithm 1 is a particular instance of the proposed algorithm (8), obtained by employing the subgradient characterized in Proposition III.2, which is valid under Assumption III.3.

Footnote 4: Since diag(E(r)) M_{l⋆} is a positive matrix, ρ(diag(E(r)) M_{l⋆}) is identical to the spectral radius of diag(E(r)) M_{l⋆} in standard linear algebra, and hence the existence of positive right and left eigenvectors is guaranteed by standard Perron-Frobenius theory (see, e.g., [57, Theorem 8.2.8]).

Algorithm 1: for Problem (R) using Example II.1
Input: M_l ∈ R^{N×N}_{++} (l = 1, . . . , L), w ∈ R^N_{++}, r_1 ∈ R^N_{++}, (μ_k)_{k∈N} ⊂ R_{++} satisfying lim_{k→∞} μ_k = 0 and Σ_{k∈N} μ_k = ∞, and large enough b > 0.
for k = 1, 2, . . . do
  γ_k = max_{l∈{1,...,L}} ρ(diag(E(r_k)) M_l);
  Choose l⋆_k ∈ argmax_{l∈{1,...,L}} ρ(diag(E(r_k)) M_l);
  if γ_k > 1 then
    Find ξ_k ∈ R^N_{++} s.t. diag(E(r_k)) M_{l⋆_k} ξ_k = γ_k ξ_k;
    Find η_k ∈ R^N_{++} s.t. η_k^T diag(E(r_k)) M_{l⋆_k} = γ_k η_k^T;
    g_k = (diag((e^{r_{k,n}})_{n∈N}) diag(η_k) M_{l⋆_k} ξ_k) / (η_k^T ξ_k);
    r̂_k = r_k − (γ_k − 1) g_k / ∥g_k∥_2^2;
  else
    r̂_k = r_k;
  r̃_k = P_{[0,b]^N}(r̂_k);   // see (9)
  r_{k+1} = r̃_k − μ_k ∇Ψ(r̃_k);   // see Lemma III.2
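Putting the pieces together, Algorithm 1 can be sketched end to end. The snippet below is an illustrative sketch under stated assumptions, not the paper's simulation code: E(r) = (e^{r_n} − 1)_n is assumed, the eigen-computations use plain power iteration (any largest-absolute-eigenvalue routine would do, per Remark III.4), and the test instance is a symmetric toy problem constructed so that the optimum is known in closed form.

```python
import numpy as np

def power_iteration(B, num_iter=50):
    """Spectral radius and a Perron vector of a positive matrix B; per Remark
    III.4, this largest-absolute-eigenvalue computation dominates the cost."""
    v = np.ones(B.shape[0])
    for _ in range(num_iter):
        v = B @ v
        v /= np.linalg.norm(v)
    return float(v @ (B @ v)), v                         # Rayleigh quotient, vector

def algorithm_1(Ms, w, r1, b, num_iter, a=0.2, q=0.9):
    """Sketch of Algorithm 1; E(r) = (exp(r_n) - 1)_n is an assumption here."""
    r = r1.astype(float).copy()
    for k in range(1, num_iter + 1):
        Bs = [np.diag(np.exp(r) - 1.0) @ M for M in Ms]  # diag(E(r_k)) M_l
        results = [power_iteration(B) for B in Bs]
        l_star = int(np.argmax([res[0] for res in results]))
        gamma = results[l_star][0]                       # gamma_k
        if gamma > 1.0:                                  # subgradient projection
            xi = results[l_star][1]                      # right Perron vector
            _, eta = power_iteration(Bs[l_star].T)       # left Perron vector
            g = np.exp(r) * (eta * (Ms[l_star] @ xi)) / (eta @ xi)
            r = r - (gamma - 1.0) * g / (g @ g)
        r = np.clip(r, 0.0, b)                           # P_{[0,b]^N}, cf. (9)
        r = r + (a * k ** -q) * w                        # mu_k = a k^{-q}
    return r

# Symmetric toy instance with a known optimum: for M = [[1, 1], [1, 1]],
# rho(diag(x) M) = x_1 + x_2, so the constraint reads e^{r_1} + e^{r_2} - 2 <= 1,
# and the symmetric optimizer for w = (1, 1) is r_1 = r_2 = log(3/2).
r = algorithm_1([np.array([[1.0, 1.0], [1.0, 1.0]])],
                w=np.array([1.0, 1.0]), r1=np.array([0.5, 0.5]),
                b=10.0, num_iter=1500)
print(r)   # both entries close to log(1.5) ≈ 0.4055
```

On this toy instance the subgradient projection acts like a Newton step toward the constraint boundary, so the iterates track the boundary to within roughly one step size μ_k.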
Since Assumption III.3 implies that G given in Example II.1 satisfies Assumption III.2 (see Lemma III.4), convergence of Algorithm 1 to the global optimal value of Problem (R), and hence that of the original problem (P), is guaranteed under Assumption III.3 by Theorem III.1. Meanwhile, convergence of Algorithm 2, developed in the Supplemental Material for Problem (S), to the global optimal value of Problem (P) is proved under the convexity of ϱ_{M_l} : R^N_+ → R_+ for every l ∈ {1, . . . , L}. This convexity implies Assumption III.3 by [17, Proposition 6], but the converse does not hold in general (see Sect. IV-B for examples). Thus, convergence of Algorithm 1 to the global optimal value of Problem (P) is guaranteed in substantially more general scenarios than that of Algorithm 2.

Remark III.4 (Computational cost of Algorithm 1). At each iteration k ∈ N, the computational cost of Algorithm 1 is dominated by the evaluation of ρ(diag(E(r_k)) M_l) for each l = 1, . . . , L. Since diag(E(r_k)) M_l is guaranteed to be a positive matrix by Theorem III.1(ii) and the setting of Example II.1, we only need to compute the spectral radius in the sense of standard linear algebra, i.e., the largest absolute eigenvalue of the positive matrix diag(E(r_k)) M_l for each l = 1, . . . , L. Hence, the computational cost of Algorithm 1 is fairly low, because we can exploit the rich literature on linear algebra (e.g., [58, Ch. 7]) for efficient computation of the largest absolute eigenvalues of positive matrices.

Remark III.5 (Applicability of Algorithm 1 to general scenarios in Example II.1). Although Algorithm 1 is derived under Assumption III.3, even without this assumption, the steps of Algorithm 1 are computable as long as M_l ∈ R^{N×N}_{++} for each l ∈ {1, . . . , L}.
At the first iteration k = 1, since r_1 ∈ R^N_{++} by assumption, there exist positive right and left eigenvectors of diag(E(r_1)) M_l ∈ R^{N×N}_{++} corresponding to the spectral radius ρ(diag(E(r_1)) M_l) > 0 (see Footnote 4). Hence, Algorithm 1 can compute r̂_1, and then r̃_1 and r_2. Similarly to the proof of Theorem III.1(ii), we can ensure r_2 ∈ R^N_{++}, and hence r̂_2 is computable. Repeating this process, we deduce that the whole sequence (r_k)_{k∈N} can be generated by Algorithm 1 even if Assumption III.3 does not hold. This fact is important in practice because it shows that there is no need to verify Assumption III.3 before applying Algorithm 1 to, e.g., resource allocation in cell-less networks (see also Sect. IV below).

IV. APPLICATIONS IN CELL-LESS NETWORKS

A. System model

To illustrate the preceding results in a concrete application, we consider the uplink of a cell-less network model similar to [17], [59]. Specifically, N ∈ N users, represented by the set N = {1, . . . , N}, transmit data to K_1 ∈ N access points, each equipped with K_2 ∈ N antennas. For each user n ∈ N, the aggregated channel to all access points is modeled by a random vector h_n with samples taking values in C^{K_1 K_2}. (For brevity, the underlying probability space is omitted, and all subsequent operations on random variables, such as expectations, are assumed to be well defined.) Likewise, the joint beamforming vector applied by all access points to the signal of user n ∈ N is modeled as a C^{K_1 K_2}-valued random vector denoted by v_n. We assume that the beamformer v_n is not a function of the transmit power vector p = [p_1, . . . , p_N]^T ∈ R^N_+, where p_n denotes the transmit power of user n ∈ N. A classical beamforming scheme satisfying this assumption, which is also used in our numerical simulations, is conjugate beamforming.
In this scheme, the beamformers (v_n)_{n∈N} are chosen as the corresponding channel estimates (h_n)_{n∈N}, with entries set to zero at the coordinates of access points that do not participate in detecting the symbols of user n. In this setting, given a power vector p ∈ R^N_{++}, the achievable rate r_n of user n ∈ N based on the UatF bound is given by r_n := log(1 + s_n) [nats per symbol], where s_n is the effective SINR [59]:

  s_n := p_n |E[h_n^H v_n]|^2 / ( p_n V(h_n^H v_n) + Σ_{l∈N\{n}} p_l E[|h_l^H v_n|^2] + E[∥v_n∥_2^2] ),  (14)

and E(·) and V(·) denote, respectively, the expectation and the variance of a random variable. With the above definitions, weighted sum-rate maximization based on the UatF bound can be expressed as in Problem (P) by setting

  f_n : R^N_+ → R_{++} : p ↦ m_n^T p + u_n  (15)

for every n ∈ N, where m_n := |E[h_n^H v_n]|^{−2} c_n ∈ R^N_{++}; c_n ∈ R^N_{++} is the vector with lth entry given by V(h_n^H v_n) if l = n, or E[|h_l^H v_n|^2] otherwise; and u_n := |E[h_n^H v_n]|^{−2} E[∥v_n∥_2^2] > 0. In particular, for conjugate beamforming, the vectors m_n (n ∈ N) and u are typically positive, which is the assumption we adopt here. If we impose a per-user power constraint p_max > 0, the set of feasible powers is given by P = {p ∈ R^N_+ | ∥p∥ ≤ 1}, where ∥·∥ is the scaled ℓ_∞-norm defined by (∀ p ∈ R^N) ∥p∥ := (1/p_max) ∥p∥_∞ = max_{l∈{1,...,N}} a_l^T |p|, and a_l ∈ R^N_+ is the vector of zeros, except for its lth entry, which is set to 1/p_max. In this setting, as shown in [17] and reviewed in Sect. I-A, Problem (P) can be written with the achievable rates as the optimization variables, i.e., as Problem (R), by using the following mapping G in the definition of the achievable rate region R in (3):

  G : R^N_+ → R^N_+ : p ↦ M p + u ∥p∥,

where M := [m_1, . . . , m_N]^T and u := [u_1, . . . , u_N]^T.
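The quantities in (14) are straightforward to estimate from joint samples of the channels and beamformers. The sketch below is illustrative only: the array shapes, the sample-average estimator, and the toy orthogonal channels are all assumptions, not the paper's simulation code.

```python
import numpy as np

def uatf_sinr(h_samples, v_samples, p):
    """Effective SINR (14) for each user, with expectations replaced by
    empirical averages. h_samples, v_samples: complex arrays of shape
    (N, S, K): one row of S channel/beamformer samples per user."""
    N = h_samples.shape[0]
    # inner products h_l^H v_n for all pairs (l, n) and samples: shape (S, N, N)
    ip = np.einsum('lsk,nsk->sln', h_samples.conj(), v_samples)
    mean_ip = ip.mean(axis=0)                                     # E[h_l^H v_n]
    mean_sq = (np.abs(ip) ** 2).mean(axis=0)                      # E[|h_l^H v_n|^2]
    var_nn = mean_sq.diagonal() - np.abs(mean_ip.diagonal()) ** 2 # V(h_n^H v_n)
    noise = (np.abs(v_samples) ** 2).sum(axis=2).mean(axis=1)     # E[||v_n||_2^2]
    s = np.empty(N)
    for n in range(N):
        interf = sum(p[l] * mean_sq[l, n] for l in range(N) if l != n)
        s[n] = p[n] * np.abs(mean_ip[n, n]) ** 2 / (p[n] * var_nn[n] + interf + noise[n])
    return s

# Deterministic toy check: orthogonal channels, conjugate beamformers v_n = h_n,
# so the variance and interference terms in (14) vanish and s_n = p_n ||h_n||^2.
h = np.array([[[1.0 + 0j, 0.0]], [[0.0, 1.0 + 0j]]])   # shape (N=2, S=1, K=2)
p = np.array([2.0, 3.0])
print(uatf_sinr(h, h, p))    # [2. 3.]
```

From the same empirical moments, the vectors m_n and u_n in (15) are obtained by dividing the variance/interference terms and E[∥v_n∥_2^2] by |E[h_n^H v_n]|^2.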
The matrix M and the vector u are typically positive for conjugate beamforming, implying that the setting of Example II.1 is verified, so ϱ_G in Problem (R) can be expressed as in (7) with L = N. As a result, Algorithm 1 provably converges to the global optimal value of Problem (R) whenever Assumption III.3 holds (see Remark III.3). We recall that a sufficient (though not necessary) condition for Assumption III.3 is that the matrices (M_l)_{l∈{1,...,N}} introduced in Example II.1 are inverse Z-matrices.

B. Simulations

For the numerical simulations, we consider the same settings as in [17, Sect. V], [59]. Briefly, we consider the uplink of a small cell-less network with four access points, each equipped with two antennas, placed uniformly at random in a 100 m × 100 m square area. There are three single-antenna users, also dropped uniformly at random in the same area, and each user is associated with the two access points that provide the strongest channels. All other simulation parameters (transmit frequency, system bandwidth, channel models, etc.) are the same as described in [59, Sect. III-D], so we omit the details for brevity. In the simulations, f_n (n ∈ N) in Problem (P) is defined according to (15) using the quantities in the UatF bound shown in (14), where the expectations are approximated via empirical averages with 100 samples of the channels. For simplicity, we use the uniform weight (∀ n ∈ N) w_n = 1 in Problems (P), (R), and (S).

We first investigate the validity of Assumption III.3 in 1000 simulations of user drops. The matrices (M_l)_{l∈{1,...,N}} have been found to be inverse Z-matrices in 19.3% of the simulations. Since this condition is only sufficient for Assumption III.3, we also check the necessary condition for convexity of ϱ_{M_l} ∘ E : R^N_+ → R_+ (l = 1, . . .
, N) on discrete points:

  (ϱ_{M_l} ∘ E)(α r̄_1 + (1−α) r̄_2) ≤ α (ϱ_{M_l} ∘ E)(r̄_1) + (1−α)(ϱ_{M_l} ∘ E)(r̄_2),

where 10000 pairs of r̄_1 and r̄_2 are drawn uniformly from [0, 5], and α is selected from 99 uniformly spaced points between 0.01 and 0.99. This necessary condition has been satisfied in 99.7% of the simulations, suggesting that Assumption III.3 may hold in most simulations. Similarly, we check the necessary condition for convexity of ϱ_{M_l} : R^N_+ → R_+ (l = 1, . . . , N), which is assumed for Algorithm 2 developed in the Supplemental Material for Problem (S), on discrete points:

  ϱ_{M_l}(α s̄_1 + (1−α) s̄_2) ≤ α ϱ_{M_l}(s̄_1) + (1−α) ϱ_{M_l}(s̄_2) + ϵ,

where ϵ = 10^{−13} is introduced to avoid issues caused by numerical error. This condition has been satisfied in 77.7% of the simulations. Hence, Assumption III.3 is expected to hold in substantially more general scenarios than the convexity condition on ϱ_{M_l} (l = 1, . . . , N). Note that we have the mathematical fact that convexity of ϱ_{M_l} implies convexity of ϱ_{M_l} ∘ E [17, Proposition 6].

[Fig. 1. Behavior of algorithms for a simulation where M_l is an inverse Z-matrix for every l ∈ {1, . . . , L}. (a) Sum-rate [b/s/Hz] vs. iteration number; (b) nonlinear spectral radius vs. iteration number.]

Next, we examine the convergence of Algorithm 1 for Problem (R) and Algorithm 2 for Problem (S). As a benchmark, we also consider the WMMSE method from [9], applied to UatF-based instances of Problem (P). The WMMSE method is a well-known heuristic widely used in the cell-less literature [13], [60]. It is a block coordinate descent algorithm applied to a weighted MMSE reformulation of the sum-rate optimization problem.
Since this reformulation is nonconvex in general, the behavior of the WMMSE algorithm is strongly initialization dependent: different starting points may lead to different suboptimal points, even in cases where Problems (R) and (S) are convex. For example, following the recommendation in [60, Alg. 7.2] to initialize within the feasible set P, we verify that the WMMSE algorithm tends to converge to different (typically suboptimal) points when started from (a) p_1 = p_max [1, 1, 1]^T ∈ P and (b) p_1 = p_max [1, 0, 1]^T ∈ P.^5

Footnote 5: The authors gratefully acknowledge Isabel von Stebut for the suggestion on the initialization of the WMMSE method.

Fig. 1 shows the results of a user placement scenario where (M_l)_{l∈{1,...,N}} are inverse Z-matrices. In this scenario, convergence of Algorithm 1 to the global optimum of Problem (R) is ensured by Theorem III.1 if the initial point r_1 is positive and the positive step size (μ_k)_{k∈N} satisfies lim_{k→∞} μ_k = 0 and Σ_{k∈N} μ_k = ∞. Likewise, Problem (S) is also guaranteed to be convex in this setting, and convergence of Algorithm 2 to the global optimum of Problem (S) is guaranteed by Theorem D.1 in the Supplemental Material if the initial point s_1 is positive and the positive step size (μ_k)_{k∈N} satisfies the above condition. In particular, we use r_1 = [0.5, 0.5, 0.5]^T and (∀ k ∈ N) μ_k = 0.4 k^{−0.999} for Algorithm 1, and s_1 = [0.5, 0.5, 0.5]^T and (∀ k ∈ N) μ_k = 1.6 k^{−0.999} for Algorithm 2. Fig. 1(a) depicts the sum-rate at iteration k ∈ {1, . . . , 200}: w^T r_k with r_k generated by Algorithm 1, Σ_{n∈N} w_n log(1 + s_{k,n}) with [s_{k,1}, . . . , s_{k,N}]^T = s_k generated by Algorithm 2, and Σ_{n∈N} w_n log(1 + p_{k,n}/f_n(p_k)) with
[p_{k,1}, . . . , p_{k,N}]^T = p_k generated by the WMMSE method using the initial points (a) and (b), respectively. Fig. 1(b) depicts the value of the nonlinear spectral radius function at iteration k ∈ {1, . . . , 200}: (ϱ_G ∘ E)(r_k) for Algorithm 1 and ϱ_G(s_k) for Algorithm 2. Our theoretical results are validated in Fig. 1(a) and (b), because Algorithms 1 and 2 both converge to the same sum-rate value, and they eventually satisfy the constraints in Problems (R) and (S). Note that the nonnegativity constraints in Problems (R) and (S) are satisfied at every iteration k ∈ N (see Theorems III.1(ii) and D.1(ii)). In Fig. 1, Algorithms 1 and 2 exhibit convergence speed comparable to the fast WMMSE heuristic. Using the solutions produced by Algorithms 1 and 2 as globally optimal references, we observe that the WMMSE algorithm initialized at point (a) also appears to converge to the global optimal value in this experiment. However, unlike the proposed methods, it seems difficult, beyond the specific setting considered here, to characterize which starting points of the WMMSE algorithm lead to the global optimal value.

[Fig. 2. Behavior of algorithms for a simulation where M_l is not an inverse Z-matrix for some l ∈ {1, . . . , L}. (a) Sum-rate [b/s/Hz] vs. iteration number; (b) nonlinear spectral radius vs. iteration number.]

In Fig. 2, we use the same settings used to produce [17, Fig. 3]. It corresponds to a user placement scenario where M_l is not an inverse Z-matrix for any l ∈ {1, . . . , N}. While the condition that M_l be an inverse Z-matrix is only sufficient for convexity of ϱ_{M_l}, we have numerically confirmed that ϱ_{M_l} is not convex for any l ∈ {1, . . . , N}. On the other hand, numerical evaluations of ϱ_{M_l} ∘ E (l = 1, . . .
, N) suggest that Assumption III.3 may indeed hold. Hence, we conjecture that the convergence guarantee in Theorem III.1 also applies in this scenario. The initializations and step sizes for Algorithms 1 and 2 are chosen exactly as in the experiment of Fig. 1. While the guarantee in Theorem D.1 for Algorithm 2 no longer applies here, since ϱ_{M_l} is not convex, both Algorithms 1 and 2 empirically converge to the same sum-rate value in this setting. Nevertheless, we recommend using Algorithm 1 in general, because its convergence to the global optimum of Problem (R) is guaranteed under substantially more general conditions than those available for Algorithm 2.

V. SUMMARY

We have addressed the (weighted) sum-rate maximization problem over the achievable rate region expressed as a nonlinear spectral radius constraint set. This set tends to be convex in UatF-based resource allocation in cell-less networks. Nevertheless, even under convexity, sum-rate maximization is challenging because projections onto nonlinear spectral radius constraint sets are difficult to compute. We have resolved this difficulty by exploiting subgradient projections onto the level sets of carefully reformulated spectral radius functions. In particular, we have demonstrated that the proposed algorithm provably converges to the global optimum of the sum-rate maximization problem under conditions typical of cell-less networks.

APPENDIX A
COMPUTING A SOLUTION TO PROBLEM (P) FROM A SOLUTION TO PROBLEM (R)

Once a solution to Problem (R) is obtained, we can compute a solution to the original sum-rate maximization problem (P) as follows [17, Sect. IV.E]:

Fact A.1. Suppose that F : R^N_+ → R^N_{++} : x ↦ (f_n(x))_{n∈N}, with f_n (n ∈ N) in Problem (P), is a SSOC mapping satisfying Assumption II.1 with the order-preserving norm ∥·∥ in (1).
Let F_{∥·∥} be the PHOC mapping constructed as shown in Fact II.3 from F and ∥·∥, r⋆ ∈ R^N_+ be a solution to Problem (R) using G = F_{∥·∥}, I := {n ∈ N | r⋆_n > 0}, D⋆ := diag((e^{r⋆_n} − 1)_{n∈I}) ∈ R^{|I|×|I|}_{++}, F_I : R^{|I|}_+ → R^{|I|}_{++} be the mapping obtained by extracting the coordinates of F in I,^6 and T_{F,I,D⋆} : R^{|I|}_+ → R^{|I|}_{++} : q ↦ D⋆(F_I(q)). Then p⋆ ∈ R^N_+ given by

  (p⋆_n)_{n∈I} ∈ Fix(T_{F,I,D⋆}) = {q ∈ R^{|I|}_{++} | D⋆(F_I(q)) = q},  p⋆_n = 0 (∀ n ∈ N \ I)

is a solution to Problem (P).

Since F is a SSOC mapping by assumption, so is the mapping T_{F,I,D⋆}. As a result, since (ϱ_G ∘ E)(r⋆) ≤ 1 holds for the solution r⋆ to Problem (R) using G = F_{∥·∥}, the fixed point of T_{F,I,D⋆} exists and is unique [17, Sect. III.C]. Moreover, this unique fixed point can be computed with standard fixed point iterations [18], [61] or with acceleration methods [62]. If F is also a positive concave mapping, convergence is guaranteed to be geometric in the standard Euclidean space (see [63] for details).

Footnote 6: For example, if N = 3 and I = {1, 3}, then F_I : R^2_+ → R^2_{++} is given by (∀ q ∈ R^2_+) F_I(q) := (f_n(q_1, 0, q_2))_{n∈I}.

APPENDIX B
SUPPLEMENTARY DEFINITIONS AND FACTS

For completeness, we clarify definitions of differentiability and collect known results that are mainly used in the proof of Proposition III.2.

Definition B.1. Let U be an open subset of (R^N, ⟨·,·⟩, ∥·∥_2). A mapping T : U → R^M is Gâteaux differentiable at x ∈ U if there exists a linear operator DT(x) : R^N → R^M such that

  (∀ d ∈ R^N) lim_{0≠α→0} (T(x + αd) − T(x)) / α = (DT(x))(d).

A mapping T : U → R^M is Fréchet differentiable at x ∈ U if there exists a linear operator DT(x) : R^N → R^M such that

  lim_{0≠∥d∥_2→0} ∥T(x + d) − T(x) − (DT(x))(d)∥_2 / ∥d∥_2 = 0.
If a function f : U → R is Gâteaux differentiable at x ∈ U, then there exists a unique vector ∇f(x) ∈ R^N such that (∀ d ∈ R^N) (Df(x))(d) = ⟨∇f(x), d⟩, and ∇f(x) is called the Gâteaux gradient of f at x. If a function f : U → R is Fréchet differentiable at x ∈ U, the Fréchet gradient of f at x ∈ U is defined similarly and is also denoted by ∇f(x) ∈ R^N. Note that Gâteaux and Fréchet differentiability are the same notion for convex functions from R^N to R (see, e.g., [20, Corollary 17.44]). The following fact is a simplification of [20, Theorem 18.5] to convex functions from R^N to R.

Fact B.1. Let h_l : R^N → R be a convex function for every l ∈ {1, . . . , L}, and h : R^N → R : x ↦ max_{l∈{1,...,L}} h_l(x). For every x ∈ R^N, define I(x) := argmax_{l∈{1,...,L}} h_l(x). Then we have (∀ x ∈ R^N) ∂h(x) = conv(∪_{l∈I(x)} ∂h_l(x)), where the convex hull of a set S ⊂ R^N is denoted by conv(S).

While the next fact appears (without a detailed derivation) in the proof of [55, Theorem 4.3], we include a proof in the Supplemental Material for completeness.

Fact B.2. Let M ∈ R^{N×N}_{++}. Then ϱ_M : R^N_+ → R_+ given in Definition II.2 is Fréchet differentiable at every x ∈ R^N_{++}, and its gradient is given by

  ∇ϱ_M(x) = diag(η) M ξ / (η^T ξ),  (16)

where ξ ∈ R^N_{++} and η ∈ R^N_{++} are positive right and left eigenvectors of diag(x) M ∈ R^{N×N}_{++} corresponding to the spectral radius ρ(diag(x) M) > 0.

REFERENCES

[1] P. C. Weeraddana, M. Codreanu, M. Latva-aho, A. Ephremides, and C. Fischione, "Weighted sum-rate maximization in wireless networks: A review," Found. Trends Netw., vol. 6, no. 1-2, pp. 1-163, 2012.
[2] Z.-Q. Luo and S. Zhang, "Dynamic spectrum management: Complexity and duality," IEEE J. Sel. Topics Signal Process., vol. 2, no. 1, pp. 57-73, 2008.
[3] L. Liu, R. Zhang, and K.-C.
Chua, "Achieving global optimality for weighted sum-rate maximization in the K-user Gaussian interference channel with multiple antennas," IEEE Trans. Wireless Commun., vol. 11, no. 5, pp. 1933–1945, 2012.
[4] L. Zheng and C. W. Tan, "Maximizing sum rates in cognitive radio networks: Convex relaxation and global optimization algorithms," IEEE J. Sel. Areas Commun., vol. 32, no. 3, pp. 667–680, 2014.
[5] C. W. Tan, S. Friedland, and S. Low, "Nonnegative matrix inequalities and their application to nonconvex power control optimization," SIAM J. Matrix Anal. Appl., vol. 32, no. 3, pp. 1030–1055, 2011.
[6] S. Friedland and C. W. Tan, "Maximizing sum rates in Gaussian interference-limited channels," preprint arXiv:0806.2860, 2008.
[7] M. Chiang, C. W. Tan, D. P. Palomar, D. O'Neill, and D. Julian, "Power control by geometric programming," IEEE Trans. Wireless Commun., vol. 6, no. 7, pp. 2640–2651, 2007.
[8] C. W. Tan, M. Chiang, and R. Srikant, "Fast algorithms and performance bounds for sum rate maximization in wireless networks," IEEE/ACM Trans. Netw., vol. 21, no. 3, pp. 706–719, 2013.
[9] Q. Shi, M. Razaviyayn, Z.-Q. Luo, and C. He, "An iteratively weighted MMSE approach to distributed sum-utility maximization for a MIMO interfering broadcast channel," IEEE Trans. Signal Process., vol. 59, no. 9, pp. 4331–4340, 2011.
[10] T. L. Marzetta, E. G. Larsson, H. Yang, and H. Q. Ngo, Fundamentals of Massive MIMO. Cambridge University Press, 2016.
[11] E. Björnson, J. Hoydis, and L. Sanguinetti, "Massive MIMO networks: Spectral, energy, and hardware efficiency," Found. Trends Signal Process., vol. 11, no. 3–4, pp. 154–655, 2017.
[12] L. Miretti, E. Björnson, and S. Stańczak, "Two-timescale weighted sum-rate maximization for large cellular and cell-free massive MIMO," in Proc. Int. Workshop Signal Process. Adv. Wireless Commun., 2024, pp. 656–660.
[13] L. Miretti, R. L. G. Cavalcante, and S.
Stańczak, "Two-timescale joint power control and beamforming design with applications to cell-free massive MIMO," IEEE Trans. Wireless Commun., vol. 24, no. 10, pp. 8129–8144, 2025.
[14] Y. Xu, T. Le-Ngoc, and S. Panigrahi, "Global concave minimization for optimal spectrum balancing in multi-user DSL networks," IEEE Trans. Signal Process., vol. 56, no. 7, pp. 2875–2885, 2008.
[15] C. W. Tan, M. Chiang, and R. Srikant, "Maximizing sum rate and minimizing MSE on multiuser downlink: Optimality, fast algorithms and equivalence via max-min SINR," IEEE Trans. Signal Process., vol. 59, no. 12, pp. 6127–6143, 2011.
[16] H. V. Cheng, E. Björnson, and E. G. Larsson, "Optimal pilot and payload power control in single-cell massive MIMO systems," IEEE Trans. Signal Process., vol. 65, no. 9, pp. 2363–2378, 2017.
[17] R. L. G. Cavalcante, T. Piotrowski, and S. Stańczak, "Cellular, cell-less, and everything in between: A unified framework for utility region analysis in wireless networks," preprint arXiv:2507.23707, 2025.
[18] R. D. Yates, "A framework for uplink power control in cellular radio systems," IEEE J. Sel. Areas Commun., vol. 13, no. 7, pp. 1341–1348, Sept. 1995.
[19] M. Schubert and H. Boche, Interference Calculus - A General Framework for Interference Management and Network Utility Optimization. Springer, 2011.
[20] H. H. Bauschke and P. L. Combettes, Convex Analysis and Monotone Operator Theory in Hilbert Spaces, 2nd ed. Springer, 2017.
[21] P. L. Combettes and J.-C. Pesquet, "Proximal splitting methods in signal processing," in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, H. H. Bauschke, R. S. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Eds. Springer, 2011, pp. 185–212.
[22] ——, "Fixed point strategies in data science," IEEE Trans. Signal Process., vol. 69, pp. 3878–3905, 2021.
[23] L. Condat, D. Kitahara, A. Contreras, and A.
Hirabayashi, "Proximal splitting algorithms for convex optimization: A tour of recent advances, with new twists," SIAM Rev., vol. 65, no. 2, pp. 375–435, 2023.
[24] G. Chierchia, N. Pustelnik, B. Pesquet-Popescu, and J.-C. Pesquet, "A nonlocal structure tensor-based approach for multicomponent image recovery problems," IEEE Trans. Image Process., vol. 23, no. 12, pp. 5531–5544, 2014.
[25] G. Chierchia, N. Pustelnik, J.-C. Pesquet, and B. Pesquet-Popescu, "Epigraphical projection and proximal tools for solving constrained convex optimization problems," Signal, Image Video Process., vol. 9, no. 8, pp. 1737–1749, 2015.
[26] P.-W. Wang, M. Wytock, and Z. Kolter, "Epigraph projections for fast general convex programming," in Proc. Int. Conf. Mach. Learn., vol. 48, 2016, pp. 2868–2877.
[27] S. Kyochi, S. Ono, and I. Selesnick, "Epigraphical relaxation for minimizing layered mixed norms," IEEE Trans. Signal Process., vol. 69, pp. 2923–2938, 2021.
[28] I. Yamada and N. Ogura, "Hybrid steepest descent method for variational inequality problem over the fixed point set of certain quasi-nonexpansive mappings," Numer. Funct. Anal. Optim., vol. 27, no. 7/8, pp. 619–655, 2004.
[29] I. Yamada, "The hybrid steepest descent method for the variational inequality problem over the intersection of fixed point sets of nonexpansive mappings," in Inherently Parallel Algorithms in Feasibility and Optimization and Their Applications, D. Butnariu, Y. Censor, and S. Reich, Eds. Elsevier, 2001, pp. 473–504.
[30] I. Yamada, M. Yukawa, and M. Yamagishi, "Minimizing the Moreau envelope of nonsmooth convex functions over the fixed point set of certain quasi-nonexpansive mappings," in Fixed-Point Algorithms for Inverse Problems in Science and Engineering, H. H. Bauschke, R. S. Burachik, P. L. Combettes, V. Elser, D. R. Luke, and H. Wolkowicz, Eds. Springer, 2011, pp. 345–390.
[31] S. Ono and I.
Yamada, "Signal recovery with certain involved convex data-fidelity constraints," IEEE Trans. Signal Process., vol. 63, no. 22, pp. 6149–6163, 2015.
[32] P. L. Combettes, "Strong convergence of block-iterative outer approximation methods for convex optimization," SIAM J. Control Optim., vol. 38, no. 2, pp. 538–565, 2000.
[33] ——, "A block-iterative surrogate constraint splitting method for quadratic signal recovery," IEEE Trans. Signal Process., vol. 51, no. 7, pp. 1771–1782, 2003.
[34] S. Gandy, B. Recht, and I. Yamada, "Tensor completion and low-n-rank tensor recovery via convex optimization," Inverse Probl., vol. 27, no. 2, Jan. 2011, Art. no. 025010.
[35] N. Pustelnik, C. Chaux, and J.-C. Pesquet, "Parallel proximal algorithm for image restoration using hybrid regularization," IEEE Trans. Image Process., vol. 20, no. 9, pp. 2450–2462, 2011.
[36] P. L. Combettes and C. L. Müller, "Perspective maximum likelihood-type estimation via proximal decomposition," Electron. J. Statist., vol. 14, no. 1, pp. 207–238, 2020.
[37] H. Kuroda and D. Kitahara, "Block-sparse recovery with optimal block partition," IEEE Trans. Signal Process., vol. 70, pp. 1506–1520, 2022.
[38] L. Condat, "Tikhonov regularization of circle-valued signals," IEEE Trans. Signal Process., vol. 70, pp. 2775–2782, 2022.
[39] H. Kuroda, "A convex-nonconvex framework for enhancing minimization induced penalties," J. Franklin Inst., vol. 362, no. 15, 2025, Art. no. 107969.
[40] M. L. Overton, "On minimizing the maximum eigenvalue of a symmetric matrix," SIAM J. Matrix Anal. Appl., vol. 9, no. 2, pp. 256–268, 1988.
[41] L. Vandenberghe and S. Boyd, "Semidefinite programming," SIAM Rev., vol. 38, no. 1, pp. 49–95, 1996.
[42] A. S. Lewis, "The mathematics of eigenvalue optimization," Math. Programm., vol. 97, no. 1, pp. 155–176, 2003.
[43] M. Journee, F. Bach, P.-A. Absil, and R.
Sepulchre, "Low-rank optimization on the cone of positive semidefinite matrices," SIAM J. Optim., vol. 20, no. 5, pp. 2327–2351, 2010.
[44] J. M. Borwein and Q. J. Zhu, "Variational methods in the presence of symmetry," Adv. Nonlinear Anal., vol. 2, no. 3, pp. 271–307, 2013.
[45] A. Douik and B. Hassibi, "Manifold optimization over the set of doubly stochastic matrices: A second-order geometry," IEEE Trans. Signal Process., vol. 67, no. 22, pp. 5761–5774, 2019.
[46] A. Benfenati, E. Chouzenoux, and J.-C. Pesquet, "Proximal approaches for matrix optimization problems: Application to robust precision matrix estimation," Signal Process., vol. 169, 2020, Art. no. 107417.
[47] V. D. Blondel and Y. Nesterov, "Polynomial-time computation of the joint spectral radius for some sets of nonnegative matrices," SIAM J. Matrix Anal. Appl., vol. 31, no. 3, pp. 865–876, 2010.
[48] Y. Nesterov and V. Y. Protasov, "Optimizing the spectral radius," SIAM J. Matrix Anal. Appl., vol. 34, no. 3, pp. 999–1013, 2013.
[49] V. Y. Protasov, "Spectral simplex method," Math. Programm., vol. 156, no. 1, pp. 485–511, 2016.
[50] A. Cvetkovic and V. Y. Protasov, "The greedy strategy for optimizing the Perron eigenvalue," Math. Programm., vol. 193, no. 1, pp. 1–31, 2022.
[51] Y. Nesterov and V. Y. Protasov, "Computing closest stable nonnegative matrix," SIAM J. Matrix Anal. Appl., vol. 41, no. 1, pp. 1–28, 2020.
[52] B. Lemmens and R. Nussbaum, Nonlinear Perron-Frobenius Theory. Cambridge University Press, 2012.
[53] U. Krause, Positive Dynamical Systems in Discrete Time: Theory, Models, and Applications. Walter de Gruyter GmbH & Co KG, 2015.
[54] R. D. Nussbaum, "Convexity and log convexity for the spectral radius," Linear Algebra Appl., vol. 73, pp. 59–122, 1986.
[55] S. Friedland, "Convex spectral functions," Linear Multilinear Algebra, vol. 9, no. 4, pp. 299–316, 1981.
[56] C. R. Johnson and R. L.
Smith, "Inverse M-matrices, II," Linear Algebra Appl., vol. 435, no. 5, pp. 953–983, 2011.
[57] R. A. Horn and C. R. Johnson, Matrix Analysis, 2nd ed. Cambridge University Press, 2012.
[58] G. H. Golub and C. F. Van Loan, Matrix Computations, 3rd ed. The Johns Hopkins University Press, 1996.
[59] L. Miretti, R. L. G. Cavalcante, S. Stańczak, M. Schubert, R. Böhnke, and W. Xu, "Closed-form max-min power control for some cellular and cell-free massive MIMO networks," in Proc. IEEE 95th Veh. Technol. Conf., 2022, pp. 1–7.
[60] Ö. T. Demir, E. Björnson, and L. Sanguinetti, "Foundations of user-centric cell-free massive MIMO," Found. Trends Signal Process., vol. 14, pp. 162–472, 2021.
[61] C. J. Nuzman, "Contraction approach to power control, with non-monotonic applications," in Proc. IEEE Global Telecommun. Conf., 2007, pp. 5283–5287.
[62] R. L. G. Cavalcante, Y. Shen, and S. Stańczak, "Elementary properties of positive concave mappings with applications to network planning and optimization," IEEE Trans. Signal Process., vol. 64, no. 7, pp. 1774–1783, April 2016.
[63] T. Piotrowski and R. L. G. Cavalcante, "The fixed point iteration of positive concave mappings converges geometrically if a fixed point exists: Implications to wireless systems," IEEE Trans. Signal Process., vol. 70, pp. 4697–4710, 2022.

Supplemental Material

APPENDIX C
PROOF OF FACT B.2

Our proof follows steps similar to the proof of [57, Theorem 6.3.12], which essentially shows Gâteaux differentiability of a simple eigenvalue as a function of a matrix. We modify it to prove Fréchet differentiability of ϱ_M : R^N_+ → R_+ (see Definitions II.1 and II.2) at x ∈ R^N_{++}, provided that M is a positive matrix.

Fix x ∈ R^N_{++} arbitrarily. By assumption, we have A := diag(x) M ∈ R^{N×N}_{++}.
By standard Perron-Frobenius theory [57, Theorem 8.2.8], λ := ρ(A) > 0 is an algebraically simple eigenvalue of A, and there exist positive right and left eigenvectors ξ and η corresponding to λ = ρ(A). Hence, by [57, Lemma 6.3.10], there exist S_1 ∈ C^{N×(N−1)} and Z_1 ∈ C^{N×(N−1)} such that S := [ξ S_1] ∈ C^{N×N} is nonsingular, its inverse is given by S^{−1} = [η̃ Z_1]^H with η̃ := η/(η^T ξ), and

S^{−1} A S = [ λ   0^T ;
               0   A_1 ].

Since λ is an algebraically simple eigenvalue of A, it is not an eigenvalue of A_1 ∈ C^{(N−1)×(N−1)}. Hence, by [57, Theorem 8.2.8(e)], there exists µ > 0 such that

λ ≥ |λ̂| + µ if λ̂ ∈ C is an eigenvalue of A_1.   (17)

Fix ε ∈ (0, µ/7) arbitrarily. Then, by [57, Theorem 2.4.72], there exists a nonsingular matrix S_ε ∈ C^{(N−1)×(N−1)} such that T_ε := S_ε^{−1} A_1 S_ε ∈ C^{(N−1)×(N−1)} is upper triangular, the eigenvalues of A_1 are on the main diagonal of T_ε, and

(∀i ∈ {1, ..., N−2})  Σ_{j=i+1}^{N−1} |t^{(ε)}_{i,j}| ≤ ε,   (18)

where t^{(ε)}_{i,j} denotes the (i, j) entry of T_ε.

Let d ∈ R^N. Since ϱ_M(x + d) = ρ(A + diag(d) M), we consider the eigenvalues of A + diag(d) M. Performing the similarity transformation with S and S_ε, we obtain

[ 1   0^T ;      S^{−1} (A + diag(d) M) S  [ 1   0^T ;
  0   S_ε^{−1} ]                             0   S_ε ]
  = [ λ + d^T diag(η̃) M ξ      η̃^T D M S_1 S_ε ;
      S_ε^{−1} Z_1^H D M ξ      T_ε + S_ε^{−1} Z_1^H D M S_1 S_ε ],

where we let D := diag(d) ∈ R^{N×N}. Let ∥·∥_∞ denote the ℓ_∞ operator norm, i.e., for every X = [x_{i,j}] ∈ C^{I×J},

∥X∥_∞ := max_{u ∈ C^J \ {0}} ∥Xu∥_∞ / ∥u∥_∞ = max_{i ∈ {1,...,I}} Σ_{j=1}^{J} |x_{i,j}|.

By the equivalence of norms on R^N, there exists C > 0 such that

(∀u ∈ R^N)  ∥u∥_∞ ≤ C ∥u∥_2.   (19)

For the arbitrarily fixed ε ∈ (0, µ/7), there exists r > 0 such that

r C ∥η̃^T∥_∞ ∥M S_1 S_ε∥_∞ < ε.   (20)

Note that we can choose such r independently of d because η̃ and M S_1 S_ε are independent of d.
We perform the final similarity transformation:

[ 1   0^T ;           S^{−1} (A + diag(d) M) S  [ 1   0^T ;
  0   r^{−1} S_ε^{−1} ]                           0   r S_ε ]
  = [ λ + d^T diag(η̃) M ξ            r η̃^T D M S_1 S_ε ;
      r^{−1} S_ε^{−1} Z_1^H D M ξ     T_ε + S_ε^{−1} Z_1^H D M S_1 S_ε ] =: B.

Now we consider the Geršgorin discs associated with the matrix B. Independently of d, we can choose δ_1 > 0, δ_2 > 0, and δ_3 > 0 such that

δ_1 ∥diag(η̃) M ξ∥_2 < ε,   (21)
δ_2 C ∥r^{−1} S_ε^{−1} Z_1^H∥_∞ ∥M ξ∥_∞ < ε,   (22)
δ_3 C ∥S_ε^{−1} Z_1^H∥_∞ ∥M S_1 S_ε∥_∞ < ε.   (23)

In addition, since A is positive, we can choose δ_4 > 0 such that, if d ∈ R^N satisfies ∥d∥_2 < δ_4, then A + diag(d) M is also positive. Let δ := min{δ_1, δ_2, δ_3, δ_4, 1}, and suppose that d ∈ R^N satisfies ∥d∥_2 < δ. Then the Geršgorin disc G_1 associated with the first row of the matrix B = [b_{i,j}] satisfies

G_1 := { z ∈ C | |z − b_{1,1}| ≤ ∥r η̃^T D M S_1 S_ε∥_∞ } ⊂ { z ∈ C | |z − b_{1,1}| < ε ∥d∥_2 }   (24)

because

∥r η̃^T D M S_1 S_ε∥_∞ ≤(a) r ∥η̃^T∥_∞ ∥D∥_∞ ∥M S_1 S_ε∥_∞ ≤(b) r C ∥η̃^T∥_∞ ∥M S_1 S_ε∥_∞ ∥d∥_2 <(c) ε ∥d∥_2,   (25)

where (a) follows from the submultiplicativity of the ℓ_∞ operator norm, (b) from ∥D∥_∞ = ∥d∥_∞ and (19), and (c) from (20). Furthermore, since b_{1,1} = λ + d^T diag(η̃) M ξ by construction, we have

|b_{1,1} − λ| = |d^T diag(η̃) M ξ| ≤(a) ∥d∥_2 ∥diag(η̃) M ξ∥_2 <(b) ε,

where (a) follows from the Cauchy-Schwarz inequality and (b) from ∥d∥_2 < δ_1 and (21). Combining this inequality with (24), since ∥d∥_2 < 1 holds by assumption, we obtain

G_1 ⊂ { z ∈ C | |z − λ| < 2ε }.   (26)

Next, for each i ∈ {2, 3, ..., N}, we consider the Geršgorin disc G_i associated with the i-th row of B. Similarly to the derivation in (25), using (22) and ∥d∥_2 < δ_2, we obtain

∥r^{−1} S_ε^{−1} Z_1^H D M ξ∥_∞ ≤ ∥d∥_2 C ∥r^{−1} S_ε^{−1} Z_1^H∥_∞ ∥M ξ∥_∞ < ε.
Similarly, by (23) and ∥d∥_2 < δ_3, we have

∥S_ε^{−1} Z_1^H D M S_1 S_ε∥_∞ ≤ ∥d∥_2 C ∥S_ε^{−1} Z_1^H∥_∞ ∥M S_1 S_ε∥_∞ < ε.

In particular, this inequality implies that the modulus of any entry of S_ε^{−1} Z_1^H D M S_1 S_ε is less than ε, and hence

(∀i ∈ {2, 3, ..., N})  |b_{i,i} − t^{(ε)}_{i−1,i−1}| < ε.

We recall that T_ε is upper triangular and that the inequality shown in (18) holds. Altogether, for every i ∈ {2, 3, ..., N}, we obtain

G_i := { z ∈ C | |z − b_{i,i}| ≤ Σ_{j ∈ {1,...,N}\{i}} |b_{i,j}| } ⊂ { z ∈ C | |z − b_{i,i}| < 3ε } ⊂ { z ∈ C | |z − t^{(ε)}_{i−1,i−1}| < 4ε }.   (27)

For every i ∈ {2, 3, ..., N}, since t^{(ε)}_{i−1,i−1} is an eigenvalue of A_1 by construction, using the inequality shown in (17), we derive

|λ − t^{(ε)}_{i−1,i−1}| ≥ λ − |t^{(ε)}_{i−1,i−1}| ≥ µ.

Combining this inequality with (26) and (27), since 6ε < µ holds by assumption, we deduce that the Geršgorin disc G_1 is disjoint from the other discs G_2, ..., G_N. Hence, for every d ∈ R^N satisfying ∥d∥_2 < δ, by (24) and the Geršgorin theorem [57, Theorem 6.1.1], there exists a unique eigenvalue, say λ(d), of A + diag(d) M such that λ(d) ∈ G_1 and

|λ(d) − λ − d^T diag(η̃) M ξ| < ε ∥d∥_2.   (28)

Furthermore, under the assumed conditions, we show that λ(d) is the spectral radius of A + diag(d) M as follows. Let λ̂(d) be an eigenvalue of A + diag(d) M such that λ̂(d) ≠ λ(d). Then, since λ̂(d) must be contained in one of G_2, ..., G_N, the relation in (27) implies

(∃i ∈ {2, 3, ..., N})  |λ̂(d) − t^{(ε)}_{i−1,i−1}| < 4ε.

Hence, we have

|λ̂(d)| < |t^{(ε)}_{i−1,i−1}| + 4ε <(a) λ − 2ε <(b) |λ(d)|,

where (a) holds by (17) and 6ε < µ, and (b) by λ(d) ∈ G_1 and (26). We recall that A + diag(d) M becomes positive if ∥d∥_2 < δ_4.
Altogether, for every d ∈ R^N satisfying ∥d∥_2 < δ, we have λ(d) = ρ(A + diag(d) M) > 0 by standard Perron-Frobenius theory. Finally, the desired result is shown by

lim_{0≠∥d∥_2→0} [ϱ_M(x + d) − ϱ_M(x) − ⟨d, diag(η̃) M ξ⟩] / ∥d∥_2
  = lim_{0≠∥d∥_2→0} [ρ(A + diag(d) M) − ρ(A) − d^T diag(η̃) M ξ] / ∥d∥_2
  = lim_{0≠∥d∥_2→0} [λ(d) − λ − d^T diag(η̃) M ξ] / ∥d∥_2 = 0,

where the last equality follows from (28) since ε can be chosen from (0, µ/7) arbitrarily.

APPENDIX D
OPTIMIZATION ALGORITHM FOR PROBLEM (S)

We present an efficient algorithm that provably converges to the global optimal value of Problem (S) under the following assumption.

Assumption D.1. In addition to Assumption III.1, we assume that the function ϱ_G : R^N_+ → R_+ given in Definition II.2 is convex on R^N_{++}.

Note that this assumption is stronger than Assumption III.2 because convexity of ϱ_G on R^N_{++} implies convexity of ϱ_G ∘ E : R^N_+ → R_+ by [17, Proposition 6].

A. Known results used in algorithm derivation

In this subsection, we present the known results used to derive the proposed algorithm for Problem (S).

Fact D.1 ([17, Proposition 3]). Let a PHOC mapping G : R^N_+ → R^N_+ satisfy Assumption D.1. Then, for C_G := { x ∈ R^N_+ | ϱ_G(x) ≤ 1 } and S_G := conv(C_G ∪ −C_G), where −C_G := { x ∈ R^N | −x ∈ C_G }, the Minkowski (or gauge) functional of S_G, defined by

(∀x ∈ R^N)  ∥x∥_G := inf{ γ > 0 | (1/γ) x ∈ S_G },   (29)

is an order-preserving norm satisfying

(∀x ∈ R^N_+)  ∥x∥_G = ϱ_G(x).   (30)

Fact D.2 ([17, Example 2]). Consider the setting of Example II.1. For every l ∈ {1, ..., L}, let G_l : R^N_+ → R^N_+ : x ↦ M_l x, and suppose that the function ϱ_{M_l} : R^N_+ → R_+ is convex on R^N_{++}. Then G_l satisfies the assumptions in Fact D.1, and hence

(∀x ∈ R^N_+)  ∥x∥_{G_l} = ϱ_{M_l}(x)   (31)

holds for every l ∈ {1, ...
, L}, where ∥·∥_{G_l} is constructed as in (29). Moreover, G given in Example II.1 satisfies Assumption D.1, and we have

(∀x ∈ R^N)  ∥x∥_G = max_{l ∈ {1,...,L}} ∥x∥_{G_l}.   (32)

B. Reformulation to a problem over a Euclidean space

To apply the HSD method, under Assumption D.1, we translate Problem (S) into an equivalent convex optimization problem composed of convex functions from R^N to R. If G satisfies Assumption D.1, then by Fact D.1, the function ϱ_G : R^N_+ → R_+ can be extended to the norm ∥·∥_G given by (29), which coincides with ϱ_G on R^N_+. Meanwhile, the value of the cost function −Σ_{n∈N} w_n log(1 + s_n) in Problem (S) is well defined if 1 + s_n > 0 for every n ∈ N. Note that, since the constraint set S is a subset of R^N_+ (see (2)), it suffices that the extended function coincides with −Σ_{n∈N} w_n log(1 + s_n) on R^N_+. Specifically, we use the following simple extension.

Lemma D.1. With w ∈ R^N_{++}, we define

Φ : R^N → R : s ↦ Σ_{n∈N} w_n ϕ(s_n),
ϕ : R → R : s ↦ −log(1 + s) if s ≥ 0; −s if s < 0.

Then Φ satisfies the following properties.
(i) (∀s ∈ R^N_+) Φ(s) = −Σ_{n∈N} w_n log(1 + s_n);
(ii) Φ is differentiable and convex on R^N, and strictly convex on R^N_+; and
(iii) The gradient of Φ on R^N_+ is given by

(∀s ∈ R^N_+)  ∇Φ(s) = (−w_n / (1 + s_n))_{n∈N},   (33)

and it is Lipschitz continuous on (R^N_+, ∥·∥_2).

Proof. (i) Clear from the definition of Φ.
(ii) It is clear that ϕ is continuously differentiable at every s ∈ R \ {0}. Since the left and right derivatives of ϕ at 0 coincide, we deduce that ϕ is continuously differentiable on R, implying that Φ is Fréchet differentiable on R^N. The derivative of ϕ is given by

ϕ′(s) = −1/(1 + s) if s ≥ 0; −1 if s < 0.   (34)

Since ϕ′ is increasing on R, ϕ is convex on R, and so is Φ on R^N because w ∈ R^N_{++}. Strict convexity of Φ on R^N_+ is clear from the definition.
(iii) The expression in (33) is immediate from (34). For every x and y in R^N_+, we have

∥∇Φ(x) − ∇Φ(y)∥_2^2 = Σ_{n∈N} (−w_n/(1 + x_n) + w_n/(1 + y_n))^2 = Σ_{n∈N} (w_n (x_n − y_n) / ((1 + x_n)(1 + y_n)))^2 ≤ Σ_{n∈N} w_n^2 |x_n − y_n|^2,

showing that ∇Φ is Lipschitz continuous on (R^N_+, ∥·∥_2) with Lipschitz constant κ = max_{n∈N} w_n.

Under Assumption D.1, using ∥·∥_G given in Fact D.1 and Φ given in Lemma D.1, we reformulate Problem (S) to

minimize Φ(s) subject to s ∈ [0, b]^N ∩ lev_{≤1}(∥·∥_G),   (S')

where we also introduce [0, b]^N similarly to the design of Problem (R'). In the following proposition, under Assumption D.1, we show that Problems (S) and (S') are guaranteed to have the same unique solution if b is large enough.

Proposition D.1. Under Assumption D.1, Problem (S') has a unique solution for any b > 0. Moreover, if b is large enough, the unique solution to Problem (S') is the same as the unique solution to Problem (S).

Proof. By Fact D.1, for any b > 0, [0, b]^N ∩ lev_{≤1}(∥·∥_G) is a nonempty compact convex set. Hence, by Lemma D.1(ii), Problem (S') is the minimization of a strictly convex function over a nonempty compact convex set, implying that a solution to Problem (S') uniquely exists. Next, we show that the solution sets of Problems (S) and (S') are the same if b is sufficiently large. Since the constraint set S defined by (2) is bounded by Lemma III.3 and is a subset of R^N_+, there exists c > 0 such that

(∀b ≥ c)  S = [0, b]^N ∩ S = [0, b]^N ∩ lev_{≤1}(∥·∥_G),

where the last equality holds by (30) in Fact D.1. By this equality and Lemma D.1(i), we obtain the desired result.

C. Optimization algorithm

We derive the proposed algorithm for Problem (S) by applying the subgradient-projection-based HSD method shown in Fact II.2 to Problem (S').
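As a quick numerical sanity check of Lemma D.1, the following sketch implements the extension Φ and verifies the gradient expression (33) by central finite differences; the weights w and the test point s are arbitrary illustrative values.

```python
import numpy as np

# arbitrary positive weights (illustrative only)
w = np.array([1.0, 2.0, 0.5])

def phi(s):
    # piecewise extension of Lemma D.1: -log(1+s) for s >= 0, -s for s < 0
    return np.where(s >= 0.0, -np.log1p(np.maximum(s, 0.0)), -s)

def Phi(s):
    return np.sum(w * phi(s))

def grad_Phi(s):
    # gradient on R^N_+: (-w_n / (1 + s_n))_n, cf. (33)
    return -w / (1.0 + s)

# central finite-difference check of the gradient at a point in R^N_{++}
s = np.array([0.3, 1.2, 0.7])
eps = 1e-6
fd = np.array([(Phi(s + eps * e) - Phi(s - eps * e)) / (2 * eps)
               for e in np.eye(3)])
assert np.allclose(fd, grad_Phi(s), atol=1e-5)
```

The same check can be repeated at points with negative coordinates to confirm that Φ is differentiable across the boundary s_n = 0, where the two branches of ϕ are glued with matching one-sided derivatives.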
With an initial point s_1 ∈ R^N_{++} and a slowly decreasing step size (µ_k)_{k∈N} ⊂ R_{++}, the proposed algorithm for Problem (S') generates the sequence (s_k)_{k∈N} by

For k = 1, 2, ...
  Choose g_k ∈ ∂∥·∥_G(s_k),
  ŝ_k := s_k − (∥s_k∥_G − 1) g_k / ∥g_k∥_2^2 if ∥s_k∥_G > 1; ŝ_k := s_k if ∥s_k∥_G ≤ 1,
  s̃_k := P_{[0,b]^N}(ŝ_k),
  s_{k+1} := s̃_k − µ_k ∇Φ(s̃_k).   (35)

An efficient method for finding g_k in the scenarios considered in Example II.1 is given in the next subsection. The projection mapping P_{[0,b]^N} : R^N → [0, b]^N can be computed by (9). For every k ∈ N, since s̃_k = P_{[0,b]^N}(ŝ_k) ∈ R^N_+, the gradient ∇Φ(s̃_k) can be computed by the expression in (33).

Under Assumption D.1, we establish the convergence of (s_k)_{k∈N} to the unique solution s⋆ of Problem (S') in the following theorem, where we also show that the sum-rate computed for the SINR values (s_k)_{k∈N} converges to (the negative of) the global optimal value of Problem (S), and hence that of the original Problem (P), if b is large enough.

Theorem D.1. Let w ∈ R^N_{++}, b > 0, s_1 ∈ R^N_{++}, and let (µ_k)_{k∈N} ⊂ R_{++} satisfy lim_{k→∞} µ_k = 0 and Σ_{k∈N} µ_k = ∞. Then, under Assumption D.1, each of the following holds for (s_k)_{k∈N} generated by (35).
(i) lim_{k→∞} s_k = s⋆ ∈ argmin_{s ∈ [0,b]^N ∩ lev_{≤1}(∥·∥_G)} Φ(s);
(ii) (∀k ∈ N) s_k ∈ R^N_{++};
(iii) lim_{k→∞} d_2(s_k, S) = 0; and
(iv) If b is large enough, then

lim_{k→∞} Σ_{n∈N} w_n log(1 + s_{k,n}) = max_{s∈S} Σ_{n∈N} w_n log(1 + s_n),

where we denote s_k = (s_{k,n})_{n∈N} for every k ∈ N.

Proof. (i) To prove this claim via an application of Fact II.2, we first show that conditions (a1) and (a2) in Fact II.2 are satisfied for h = ∥·∥_G, K = [0, b]^N, and Θ = Φ. Since ∥·∥_G is a norm by Fact D.1, it is a convex function from R^N to R. For any b > 0, [0, b]^N is a compact convex set, and [0, b]^N ∩ lev_{≤1}(∥·∥_G) ≠ ∅.
Thus, condition (a1) holds. By Lemma D.1, Φ satisfies condition (a2). Since the solution set of Problem (S') is a singleton by Proposition D.1, Fact II.2 implies lim_{k→∞} s_k = s⋆.
(ii) For every k ∈ N, we have s̃_k = P_{[0,b]^N}(ŝ_k) ∈ R^N_+. By assumption, we have

(∀s ∈ R^N_+)  −∇Φ(s) = (w_n / (1 + s_n))_{n∈N} ∈ R^N_{++}

and (µ_k)_{k∈N} ⊂ R_{++}. Hence, the last step in (35) ensures s_{k+1} ∈ R^N_{++} for every k ∈ N. Note that s_1 ∈ R^N_{++} holds by assumption.
(iii) Since S = R^N_+ ∩ lev_{≤1}(∥·∥_G) by the definition of S in (2) and (30) in Fact D.1, this claim follows from claim (i) and s⋆ ∈ [0, b]^N ∩ lev_{≤1}(∥·∥_G) ⊂ S.

Algorithm 2: for Problem (S) using Example II.1
Input: M_l ∈ R^{N×N}_{++} (l = 1, ..., L), w ∈ R^N_{++}, s_1 ∈ R^N_{++}, (µ_k)_{k∈N} ⊂ R_{++} satisfying lim_{k→∞} µ_k = 0 and Σ_{k∈N} µ_k = ∞, and large enough b > 0.
for k = 1, 2, ... do
  γ_k = max_{l ∈ {1,...,L}} ρ(diag(s_k) M_l);
  Choose l⋆_k ∈ argmax_{l ∈ {1,...,L}} ρ(diag(s_k) M_l);
  if γ_k > 1 then
    Find ξ_k ∈ R^N_{++} s.t. diag(s_k) M_{l⋆_k} ξ_k = γ_k ξ_k;
    Find η_k ∈ R^N_{++} s.t. η_k^T diag(s_k) M_{l⋆_k} = γ_k η_k^T;
    g_k = (diag(η_k) M_{l⋆_k} ξ_k) / (η_k^T ξ_k);
    ŝ_k = s_k − (γ_k − 1) g_k / ∥g_k∥_2^2;
  else
    ŝ_k = s_k;
  s̃_k = P_{[0,b]^N}(ŝ_k);   // See (9)
  s_{k+1} = s̃_k − µ_k ∇Φ(s̃_k);   // See (33)

(iv) If b is large enough, the solutions of Problems (S) and (S') are the same by Proposition D.1. Thus, this claim follows from claim (i), the continuity of Φ, and Lemma D.1(i).

D. Efficient subgradient computation method

In this subsection, by focusing on the mappings G considered in Example II.1, we present a low-complexity method for finding a subgradient of ∥·∥_G used in Algorithm (35) for Problem (S'), where we additionally assume the following condition, which is stronger than Assumption III.3 (see Remark III.3).

Assumption D.2.
In addition to the setting of Example II.1, we assume that ϱ_{M_l} : R^N_+ → R_+ is convex on R^N_{++} for every l ∈ {1, ..., L}.

Remark D.1. If Assumption D.2 holds, then G given in Example II.1 satisfies Assumption D.1 by Fact D.2.

Since s_k ∈ R^N_{++} holds at every iteration k ∈ N by Theorem D.1(ii), it is sufficient to find a subgradient of ∥·∥_G at a point in R^N_{++}. To this end, we present the following proposition.

Proposition D.2. Consider the setting of Example II.1. Fix s ∈ R^N_{++} arbitrarily, and let

l⋆ ∈ argmax_{l ∈ {1,...,L}} ϱ_{M_l}(s) = argmax_{l ∈ {1,...,L}} ρ(diag(s) M_l).

Then, under Assumption D.2, we have

diag(η) M_{l⋆} ξ / (η^T ξ) ∈ ∂∥·∥_G(s),   (36)

where ξ ∈ R^N_{++} and η ∈ R^N_{++} are positive right and left eigenvectors of diag(s) M_{l⋆} ∈ R^{N×N}_{++} corresponding to the spectral radius ρ(diag(s) M_{l⋆}) > 0.

Proof. Owing to the equality shown in (32), g ∈ ∂∥·∥_{G_{l⋆}}(s) implies g ∈ ∂∥·∥_G(s) by Fact B.1 in Appendix B. Hence it is sufficient to show that the left-hand side of (36) is a subgradient of ∥·∥_{G_{l⋆}} at s. Since M_{l⋆} ∈ R^{N×N}_{++} holds in the setting of Example II.1, by Fact B.2, ϱ_{M_{l⋆}} is differentiable on R^N_{++}, and its gradient at s ∈ R^N_{++} is given by the left-hand side of (36). Since the equality in (31) holds in a neighborhood of s ∈ R^N_{++}, the left-hand side of (36) is also the gradient of ∥·∥_{G_{l⋆}} at s, and hence is the (unique) subgradient of ∥·∥_{G_{l⋆}} at s.

Algorithm 2 is a particular instance of (35) using the subgradient characterized in Proposition D.2 under Assumption D.2. Since Assumption D.2 implies Assumption D.1, Theorem D.1 applies to Algorithm 2 under Assumption D.2.
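To make the mechanics of Algorithm 2 concrete, here is a minimal numpy sketch for L = 1. The positive matrix M, the weights, the box bound b, and the step sizes are illustrative assumptions, not values from the paper; in particular, whether ϱ_M is convex on R^N_{++} (Assumption D.2) depends on M, so this is a mechanical illustration of the iteration only.

```python
import numpy as np

# Hypothetical problem data (illustrative only)
M = np.array([[0.2, 0.1],
              [0.1, 0.3]])      # stands in for M_1 in Example II.1
w = np.array([1.0, 1.0])        # sum-rate weights
b = 10.0                        # box bound [0, b]^N

def perron(A):
    # spectral radius and a positive eigenvector of a positive matrix A
    vals, vecs = np.linalg.eig(A)
    i = np.argmax(vals.real)
    return vals.real[i], np.abs(vecs[:, i].real)

def spectral_data(s):
    A = np.diag(s) @ M
    rho, xi = perron(A)         # right Perron eigenvector
    _, eta = perron(A.T)        # left Perron eigenvector
    return rho, xi, eta

s = np.array([2.0, 2.0])        # s_1 in R^N_{++}
for k in range(1, 2001):
    rho, xi, eta = spectral_data(s)
    if rho > 1.0:               # subgradient projection onto lev_{<=1}
        g = np.diag(eta) @ M @ xi / (eta @ xi)   # cf. (36) and Fact B.2
        s_hat = s - (rho - 1.0) * g / (g @ g)
    else:
        s_hat = s
    s_tilde = np.clip(s_hat, 0.0, b)             # projection onto [0, b]^N
    mu = 1.0 / (k + 1)                           # slowly decreasing steps
    s = s_tilde + mu * w / (1.0 + s_tilde)       # s_tilde - mu * grad Phi

# at convergence, the spectral radius constraint is active:
# rho(diag(s) M) is close to 1, and s approximates the optimal SINRs
```

The sketch uses a dense eigendecomposition for the spectral radius and the eigenvectors; for large N, power iterations on diag(s_k) M_{l⋆_k} and its transpose would serve the same purpose.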