MM Algorithms for Geometric and Signomial Programming


Authors: Kenneth Lange, Hua Zhou

Kenneth Lange: Departments of Biomathematics, Human Genetics, and Statistics, University of California, Los Angeles, CA 90095-1766, USA. E-mail: klange@ucla.edu

Hua Zhou: Department of Statistics, North Carolina State University, 2311 Stinson Drive, Campus Box 8203, Raleigh, NC 27695-8203, USA. E-mail: huazhou@ucla.edu

Research was supported by United States Public Health Service grants GM53275 and MH59490.

Abstract

This paper derives new algorithms for signomial programming, a generalization of geometric programming. The algorithms are based on a generic principle for optimization called the MM algorithm. In this setting, one can apply the geometric-arithmetic mean inequality and a supporting hyperplane inequality to create a surrogate function with parameters separated. Thus, unconstrained signomial programming reduces to a sequence of one-dimensional minimization problems. Simple examples demonstrate that the MM algorithm derived can converge to a boundary point or to one point of a continuum of minimum points. Conditions under which the minimum point is unique or occurs in the interior of parameter space are proved for geometric programming. Convergence to an interior point occurs at a linear rate. Finally, the MM framework easily accommodates equality and inequality constraints of signomial type. For the most important special case, constrained quadratic programming, the MM algorithm involves very simple updates.

Keywords: arithmetic-geometric mean inequality · global convergence · MM algorithm · parameter separation · penalty method

Mathematics Subject Classification (2000): 90C25 · 26D07

1 Introduction

As a branch of convex optimization theory, geometric programming is next in line to linear and quadratic programming in importance [3, 4, 14, 15]. It has applications in chemical equilibrium problems [13], structural mechanics [4], integrated circuit design [6], maximum likelihood estimation [11], stochastic processes [5], and a host of other subjects [4]. Geometric programming deals with posynomials, which are functions of the form

$$f(x) = \sum_{\alpha \in S} c_\alpha \prod_{i=1}^{n} x_i^{\alpha_i}. \tag{1}$$

Here the index set $S \subset \mathbb{R}^n$ is finite, and all coefficients $c_\alpha$ and all components $x_1, \ldots, x_n$ of the argument $x$ of $f(x)$ are positive. The possibly fractional powers $\alpha_i$ corresponding to a particular $\alpha$ may be positive, negative, or zero. For instance, $x_1^{-1} + 2x_1^3 x_2^{-2}$ is a posynomial on $\mathbb{R}^2$. In geometric programming we minimize a posynomial $f(x)$ subject to posynomial inequality constraints of the form $u_j(x) \le 1$ for $1 \le j \le q$, where the $u_j(x)$ are again posynomials. In some versions of geometric programming, equality constraints of posynomial type are permitted [2].

A signomial function has the same form as the posynomial (1), but the coefficients $c_\alpha$ are allowed to be negative. A signomial program is a generalization of a geometric program, where the objective and constraint functions can be signomials. From a computational point of view, signomial programming problems are significantly harder to solve than geometric programming problems.
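To fix notation computationally, a signomial in the form (1) is just a list of coefficient and exponent-vector pairs. The following minimal Python sketch (our own illustration, not part of the paper; the function name `signomial` is hypothetical) evaluates such a function:

```python
import numpy as np

def signomial(c, alpha, x):
    """Evaluate f(x) = sum_k c[k] * prod_j x[j]**alpha[k][j] at x > 0.
    All c[k] > 0 gives a posynomial; mixed signs give a signomial."""
    c = np.asarray(c, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    x = np.asarray(x, dtype=float)
    return float(np.sum(c * np.prod(x ** alpha, axis=1)))

# the posynomial x1**-1 + 2 * x1**3 * x2**-2 from the text, evaluated at (1, 1)
print(signomial([1.0, 2.0], [(-1, 0), (3, -2)], [1.0, 1.0]))  # -> 3.0
```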
After a suitable change of variables, a geometric program can be transformed into a convex optimization problem and globally solved by standard methods. In contrast, signomials may have many local minima. Wang et al. [19] recently derived a path algorithm for solving unconstrained signomial programs.

The theory and practice of geometric programming has been stable for a generation, so it is hard to imagine saying anything novel about either. The attractions of geometric programming include its beautiful duality theory and its connections with the arithmetic-geometric mean inequality. The present paper derives new algorithms for both geometric and signomial programming based on a generic device for iterative optimization called the MM algorithm [8, 10]. The MM perspective possesses several advantages. First, it provides a unified framework for solving both geometric and signomial programs. The algorithms derived here operate by separating parameters and reducing minimization of the objective function to a sequence of one-dimensional minimization problems. Separation of parameters is apt to be an advantage in high-dimensional problems. Another advantage is ease of implementation compared to competing methods of unconstrained geometric and signomial programming [19]. Finally, straightforward generalizations of our MM algorithms extend beyond signomial programming.

We conclude this introduction by sketching a roadmap to the rest of the paper. Section 2 reviews the MM algorithm. Section 3 derives an MM algorithm for unconstrained signomial programming from two simple inequalities. The behavior of the MM algorithm is illustrated on a few numerical examples in Section 4. Section 5 extends the MM algorithm for unconstrained problems to the constrained case using the penalty method. Section 6 specializes to linearly constrained quadratic programming on the positive orthant. Convergence results are discussed in Section 7.

2 Background on the MM Algorithm

The MM principle involves majorizing the objective function $f(x)$ by a surrogate function $g(x \mid x_m)$ around the current iterate $x_m$ (with $i$th component $x_{mi}$) of a search. Majorization is defined by the two conditions

$$f(x_m) = g(x_m \mid x_m), \qquad f(x) \le g(x \mid x_m), \quad x \ne x_m. \tag{2}$$

In other words, the surface $x \mapsto g(x \mid x_m)$ lies above the surface $x \mapsto f(x)$ and is tangent to it at the point $x = x_m$. Construction of the majorizing function $g(x \mid x_m)$ constitutes the first M of the MM algorithm. The second M of the algorithm minimizes the surrogate $g(x \mid x_m)$ rather than $f(x)$. If $x_{m+1}$ denotes the minimizer of $g(x \mid x_m)$, then this action forces the descent property $f(x_{m+1}) \le f(x_m)$. This fact follows from the inequalities

$$f(x_{m+1}) \le g(x_{m+1} \mid x_m) \le g(x_m \mid x_m) = f(x_m),$$

reflecting the definition of $x_{m+1}$ and the tangency conditions (2). The descent property lends the MM algorithm remarkable numerical stability. Strictly speaking, it depends only on decreasing $g(x \mid x_m)$, not on minimizing $g(x \mid x_m)$.
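The MM principle translates directly into a generic driver loop. The sketch below is illustrative rather than the authors' code; it assumes the caller supplies `minimize_surrogate`, a routine returning the minimizer of $g(\cdot \mid x_m)$, and it adopts the relative-change stopping rule used later in Section 4.

```python
def mm(f, minimize_surrogate, x0, eps=1e-9, max_iter=100000):
    """Generic MM loop. minimize_surrogate(x_m) returns the minimizer of
    g(. | x_m); the tangency conditions (2) then force descent in f."""
    x, fx = x0, f(x0)
    for m in range(1, max_iter + 1):
        x_new = minimize_surrogate(x)          # second M: minimize g(. | x_m)
        fx_new = f(x_new)
        assert fx_new <= fx + 1e-9 * (abs(fx) + 1.0)   # descent property
        if fx - fx_new <= eps * (abs(fx) + 1.0):       # relative-change rule
            return x_new, fx_new, m
        x, fx = x_new, fx_new
    return x, fx, max_iter
```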
3 Unconstrained Signomial Programming

The art in devising an MM algorithm revolves around intelligent choice of the majorizing function. For signomial programming problems, fortunately one can invoke two simple inequalities. For terms with positive coefficients $c_\alpha$, we use the arithmetic-geometric mean inequality

$$\prod_{i=1}^{n} z_i^{\alpha_i} \le \sum_{i=1}^{n} \frac{\alpha_i}{\|\alpha\|_1} z_i^{\|\alpha\|_1} \tag{3}$$

for nonnegative numbers $z_i$ and $\alpha_i$ and $\ell_1$ norm $\|\alpha\|_1 = \sum_{i=1}^{n} |\alpha_i|$ [18]. If we make the choice $z_i = x_i/x_{mi}$ in inequality (3), then the majorization

$$\prod_{i=1}^{n} x_i^{\alpha_i} \le \left(\prod_{i=1}^{n} x_{mi}^{\alpha_i}\right) \sum_{i=1}^{n} \frac{\alpha_i}{\|\alpha\|_1} \left(\frac{x_i}{x_{mi}}\right)^{\|\alpha\|_1} \tag{4}$$

emerges, with equality when $x = x_m$. We can broaden the scope of the majorization (4) to cases with $\alpha_i < 0$ by replacing $z_i$ by the reciprocal ratio $x_{mi}/x_i$ whenever $\alpha_i < 0$. Thus, for terms $c_\alpha \prod_{i=1}^{n} x_i^{\alpha_i}$ with $c_\alpha > 0$, we have the majorization

$$c_\alpha \prod_{i=1}^{n} x_i^{\alpha_i} \le c_\alpha \left(\prod_{j=1}^{n} x_{mj}^{\alpha_j}\right) \sum_{i=1}^{n} \frac{|\alpha_i|}{\|\alpha\|_1} \left(\frac{x_i}{x_{mi}}\right)^{\|\alpha\|_1 \operatorname{sgn}(\alpha_i)},$$

where $\operatorname{sgn}(\alpha_i)$ is the sign function.

The terms $c_\alpha \prod_{i=1}^{n} x_i^{\alpha_i}$ with $c_\alpha < 0$ are handled by a different majorization. Our point of departure is the supporting hyperplane minorization $z \ge 1 + \ln z$ at the point $z = 1$. If we let $z = \prod_{i=1}^{n} (x_i/x_{mi})^{\alpha_i}$, then it follows that

$$\prod_{i=1}^{n} x_i^{\alpha_i} \ge \left(\prod_{j=1}^{n} x_{mj}^{\alpha_j}\right)\left(1 + \sum_{i=1}^{n} \alpha_i \ln x_i - \sum_{i=1}^{n} \alpha_i \ln x_{mi}\right) \tag{5}$$

is a valid minorization in $x$ around the point $x_m$. Multiplication by the negative coefficient $c_\alpha$ now gives the desired majorization. The surrogate function separates parameters and is convex when all of the $\alpha_i$ are positive. In summary, the objective function (1) is majorized, up to an irrelevant additive constant, by the sum $g(x \mid x_m) = \sum_{i=1}^{n} g_i(x_i \mid x_m)$ with

$$g_i(x_i \mid x_m) = \sum_{\alpha \in S_+} c_\alpha \left(\prod_{j=1}^{n} x_{mj}^{\alpha_j}\right) \frac{|\alpha_i|}{\|\alpha\|_1} \left(\frac{x_i}{x_{mi}}\right)^{\|\alpha\|_1 \operatorname{sgn}(\alpha_i)} + \sum_{\alpha \in S_-} c_\alpha \left(\prod_{j=1}^{n} x_{mj}^{\alpha_j}\right) \alpha_i \ln x_i, \tag{6}$$

where $S_+ = \{\alpha : c_\alpha > 0\}$ and $S_- = \{\alpha : c_\alpha < 0\}$. To guarantee that the next iterate is well defined and occurs in the interior of the parameter domain, it is helpful to assume for each $i$ that at least one $\alpha \in S_+$ has $\alpha_i$ positive and at least one $\alpha \in S_+$ has $\alpha_i$ negative. Under these conditions each $g_i(x_i \mid x_m)$ is coercive and attains its minimum on the open interval $(0, \infty)$.

Minimization of the majorizing function is straightforward because the surrogate functions $g_i(x_i \mid x_m)$ are univariate functions. The derivative of $g_i(x_i \mid x_m)$ with respect to its left argument equals

$$g_i'(x_i \mid x_m) = \sum_{\alpha \in S_+} c_\alpha \left(\prod_{j=1}^{n} x_{mj}^{\alpha_j}\right) \alpha_i x_i^{-1} \left(\frac{x_i}{x_{mi}}\right)^{\|\alpha\|_1 \operatorname{sgn}(\alpha_i)} + \sum_{\alpha \in S_-} c_\alpha \left(\prod_{j=1}^{n} x_{mj}^{\alpha_j}\right) \alpha_i x_i^{-1}.$$

Assuming that the exponents $\alpha_i$ are integers, this is a rational function of $x_i$, and once we equate it to 0, we are faced with solving a polynomial equation. This task can be accomplished by bisection or by Newton's method.

In a geometric program, the function $g_i'(x_i \mid x_m)$ has a single root on the interval $(0, \infty)$. For a proof of this fact, note that making the standard change of variables $x_i = e^{y_i}$ eliminates the positivity constraint $x_i > 0$ and renders the transformed function $h_i(y_i \mid x_m) = g_i(x_i \mid x_m)$ strictly convex. Because $|\alpha_i| \operatorname{sgn}(\alpha_i)^2 = |\alpha_i|$, the second derivative

$$h_i''(y_i \mid x_m) = \sum_{\alpha \in S_+} c_\alpha \left(\prod_{j=1}^{n} x_{mj}^{\alpha_j}\right) |\alpha_i| \, \|\alpha\|_1 \, x_{mi}^{-\|\alpha\|_1 \operatorname{sgn}(\alpha_i)} \, e^{\|\alpha\|_1 \operatorname{sgn}(\alpha_i) y_i}$$

is positive. Hence, $h_i(y_i \mid x_m)$ is strictly convex and possesses a unique minimum point. These arguments yield the even sweeter dividend that the MM iteration map is continuously differentiable.
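One full MM update under the separated surrogate (6) is straightforward to code. The sketch below is our own illustration, not the authors' implementation: it minimizes each $g_i$ by bisection on the increasing function $\varphi(t) = t\,g_i'(t \mid x_m)$, and it presumes, as in the text, that for each $i$ some $\alpha \in S_+$ has $\alpha_i > 0$ and some $\alpha \in S_+$ has $\alpha_i < 0$, so that a sign change can be bracketed.

```python
import numpy as np

def mm_step(c, alpha, xm, tol=1e-12):
    """One MM update for f(x) = sum_k c[k] * prod_j x[j]**alpha[k][j],
    minimizing the separated surrogate (6) coordinate by coordinate."""
    c = np.asarray(c, float); alpha = np.asarray(alpha, float)
    xm = np.asarray(xm, float)
    P = np.prod(xm ** alpha, axis=1)       # prod_j x_{mj}^{alpha_j}, per term
    l1 = np.abs(alpha).sum(axis=1)         # ||alpha||_1, per term
    pos = c > 0                            # S_+ versus S_-
    x_new = np.empty_like(xm)
    for i in range(len(xm)):
        a = alpha[:, i]
        const = np.sum(c[~pos] * P[~pos] * a[~pos])   # from minorization (5)
        coef = c[pos] * P[pos] * a[pos]
        expo = l1[pos] * np.sign(a[pos])
        phi = lambda t: np.sum(coef * (t / xm[i]) ** expo) + const  # t*g_i'(t)
        lo = hi = xm[i]                    # bracket the unique root of phi
        while phi(lo) > 0.0:
            lo /= 2.0
        while phi(hi) < 0.0:
            hi *= 2.0
        while hi - lo > tol * hi:          # bisection on the positive axis
            mid = np.sqrt(lo * hi)
            lo, hi = (lo, mid) if phi(mid) > 0.0 else (mid, hi)
        x_new[i] = np.sqrt(lo * hi)
    return x_new
```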
From the vantage point of the implicit function theorem [7], the stationarity condition $h_i'(y_{m+1,i} \mid x_m) = 0$ determines $y_{m+1,i}$, and consequently $x_{m+1,i}$, in terms of $x_m$. Observe here that $h_i''(y_{mi} \mid x_m) \ne 0$, as required by the implicit function theorem.

It is also worth pointing out that even more functions can be brought under the umbrella of signomial programming. For instance, majorization of the functions $-\ln f(x)$ and $\ln f(x)$ is possible for any posynomial $f(x) = \sum_\alpha c_\alpha \prod_{i=1}^{n} x_i^{\alpha_i}$. In the first case,

$$-\ln f(x) \le -\sum_\alpha \frac{a_{m\alpha}}{b_m} \left[\sum_{i=1}^{n} \alpha_i \ln x_i + \ln\left(\frac{c_\alpha b_m}{a_{m\alpha}}\right)\right] \tag{7}$$

holds for $a_{m\alpha} = c_\alpha \prod_{i=1}^{n} x_{mi}^{\alpha_i}$ and $b_m = \sum_\alpha a_{m\alpha}$ because Jensen's inequality applies to the convex function $-\ln t$. In the second case, the supporting hyperplane inequality applied to the convex function $-\ln t$ implies

$$\ln f(x) \le \ln f(x_m) + \frac{1}{f(x_m)}\left[f(x) - f(x_m)\right].$$

This puts us back in the position of needing to majorize a posynomial, a problem we have already discussed in detail. By our previous remarks, the coefficients $c_\alpha$ can be negative as well as positive in this case. Similar majorizations apply to any composition $\phi \circ f(x)$ of a posynomial $f(x)$ with an arbitrary concave function $\phi(y)$.

4 Examples of Unconstrained Minimization

Our first examples demonstrate the robustness of the MM algorithms in minimization and illustrate some of the complications that occur. In each case we can explicitly calculate the MM updates. To start, consider the posynomial

$$f_1(x) = \frac{1}{x_1^3} + \frac{3}{x_1 x_2^2} + x_1 x_2$$

with the implied constraints $x_1 > 0$ and $x_2 > 0$. The majorization (4) applied to the third term of $f_1(x)$ yields

$$x_1 x_2 \le x_{m1} x_{m2} \left[\frac{1}{2}\left(\frac{x_1}{x_{m1}}\right)^2 + \frac{1}{2}\left(\frac{x_2}{x_{m2}}\right)^2\right] = \frac{x_{m2}}{2x_{m1}} x_1^2 + \frac{x_{m1}}{2x_{m2}} x_2^2.$$

Applied to the second term of $f_1(x)$ using the reciprocal ratios, it gives

$$\frac{3}{x_1 x_2^2} \le \frac{3}{x_{m1} x_{m2}^2} \left[\frac{1}{3}\left(\frac{x_{m1}}{x_1}\right)^3 + \frac{2}{3}\left(\frac{x_{m2}}{x_2}\right)^3\right] = \frac{x_{m1}^2}{x_{m2}^2}\frac{1}{x_1^3} + \frac{2x_{m2}}{x_{m1}}\frac{1}{x_2^3}.$$

The sum $g(x \mid x_m)$ of the two surrogate functions

$$g_1(x_1 \mid x_m) = \frac{1}{x_1^3} + \frac{x_{m1}^2}{x_{m2}^2}\frac{1}{x_1^3} + \frac{x_{m2}}{2x_{m1}} x_1^2, \qquad g_2(x_2 \mid x_m) = \frac{2x_{m2}}{x_{m1}}\frac{1}{x_2^3} + \frac{x_{m1}}{2x_{m2}} x_2^2$$

majorizes $f_1(x)$. If we set the derivatives

$$g_1'(x_1 \mid x_m) = -\frac{3}{x_1^4} - \frac{x_{m1}^2}{x_{m2}^2}\frac{3}{x_1^4} + \frac{x_{m2}}{x_{m1}} x_1, \qquad g_2'(x_2 \mid x_m) = -\frac{6x_{m2}}{x_{m1}}\frac{1}{x_2^4} + \frac{x_{m1}}{x_{m2}} x_2$$

of each of these equal to 0, then the updates

$$x_{m+1,1} = \sqrt[5]{3\left(\frac{x_{m1}^2}{x_{m2}^2} + 1\right)\frac{x_{m1}}{x_{m2}}}, \qquad x_{m+1,2} = \sqrt[5]{\frac{6x_{m2}^2}{x_{m1}^2}}$$

solve the minimization step of the MM algorithm. It is also obvious that the point $x = (\sqrt[5]{6}, \sqrt[5]{6})^t$ is a fixed point of the updates, and the reader can check that it minimizes $f_1(x)$.

It is instructive to consider the slight variations

$$f_2(x) = \frac{1}{x_1 x_2^2} + x_1 x_2^2, \qquad f_3(x) = \frac{1}{x_1 x_2^2} + x_1 x_2$$

of this objective function. In the first case, the reader can check that the MM algorithm iterates according to

$$x_{m+1,1} = \sqrt[3]{\frac{x_{m1}^2}{x_{m2}^2}}, \qquad x_{m+1,2} = \sqrt[3]{\frac{x_{m2}}{x_{m1}}}.$$

In the second case, it iterates according to

$$x_{m+1,1} = \sqrt[5]{\frac{x_{m1}^3}{x_{m2}^3}}, \qquad x_{m+1,2} = \sqrt[5]{\frac{2x_{m2}^2}{x_{m1}^2}}.$$

The objective function $f_2(x)$ attains its minimum value whenever $x_1 x_2^2 = 1$. The MM algorithm for $f_2(x)$ converges after a single iteration to the value 2, but the converged point depends on the initial point $x_0$.
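As a quick sanity check (ours, not from the paper), the closed-form updates for $f_1(x)$ can be iterated directly. Starting from $(1, 2)$ they converge to the fixed point $(\sqrt[5]{6}, \sqrt[5]{6}) \approx (1.4310, 1.4310)$ with objective value approximately 3.4128, matching the $f_1$ row of Table 1 below.

```python
def f1(x1, x2):
    return x1 ** -3 + 3.0 / (x1 * x2 ** 2) + x1 * x2

def mm_update_f1(x1, x2):
    """Closed-form MM updates for f_1 derived above."""
    x1_new = (3.0 * (x1 ** 2 / x2 ** 2 + 1.0) * x1 / x2) ** 0.2
    x2_new = (6.0 * x2 ** 2 / x1 ** 2) ** 0.2
    return x1_new, x2_new

x1, x2 = 1.0, 2.0                  # initial point used in Table 1
for _ in range(100):
    x1, x2 = mm_update_f1(x1, x2)
print(x1, x2, f1(x1, x2))          # -> 1.4310, 1.4310, 3.4128 (fixed point 6**0.2)
```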
The infimum of $f_3(x)$ is 0. This value is attained asymptotically by the MM algorithm, which satisfies the identities $x_{m1} x_{m2}^{3/2} = 2^{3/10}$ and $x_{m+1,2} = 2^{2/25} x_{m2}$ for all $m \ge 1$. These results imply that $x_{m1}$ tends to 0 and $x_{m2}$ to $\infty$ in such a manner that $f_3(x_m)$ tends to 0. One could not hope for much better behavior of the MM algorithm in these two examples.

The function

$$f_4(x) = x_1^2 x_2^2 - 2x_1 x_2 x_3 x_4 + x_3^2 x_4^2 = (x_1 x_2 - x_3 x_4)^2$$

is a signomial but not a posynomial. The surrogate function (6) reduces to

$$g(x \mid x_m) = \frac{x_{m2}^2}{2x_{m1}^2} x_1^4 + \frac{x_{m1}^2}{2x_{m2}^2} x_2^4 + \frac{x_{m4}^2}{2x_{m3}^2} x_3^4 + \frac{x_{m3}^2}{2x_{m4}^2} x_4^4 - 2x_{m1}x_{m2}x_{m3}x_{m4}\left(\ln x_1 + \ln x_2 + \ln x_3 + \ln x_4\right)$$

with all variables separated. The MM updates

$$x_{m+1,1} = \sqrt[4]{\frac{x_{m1}^3 x_{m3} x_{m4}}{x_{m2}}}, \quad x_{m+1,2} = \sqrt[4]{\frac{x_{m2}^3 x_{m3} x_{m4}}{x_{m1}}}, \quad x_{m+1,3} = \sqrt[4]{\frac{x_{m3}^3 x_{m1} x_{m2}}{x_{m4}}}, \quad x_{m+1,4} = \sqrt[4]{\frac{x_{m4}^3 x_{m1} x_{m2}}{x_{m3}}}$$

converge in a single iteration to a solution of $f_4(x) = 0$. Again the limit depends on the initial point.

The function

$$f_5(x) = x_1 x_2 + x_1 x_3 + x_2 x_3 - \ln(x_1 + x_2 + x_3)$$

is more complicated than a signomial. It also is unbounded below because the point $x$ with components $x_1 = m$ and $x_2 = x_3 = 1/m$ satisfies $f_5(x) = 2 + m^{-2} - \ln(m + 2/m)$. According to the majorization (7), an appropriate surrogate is

$$g(x \mid x_m) = \left(\frac{x_{m2}}{2x_{m1}} + \frac{x_{m3}}{2x_{m1}}\right)x_1^2 + \left(\frac{x_{m1}}{2x_{m2}} + \frac{x_{m3}}{2x_{m2}}\right)x_2^2 + \left(\frac{x_{m1}}{2x_{m3}} + \frac{x_{m2}}{2x_{m3}}\right)x_3^2 - \frac{x_{m1}\ln x_1 + x_{m2}\ln x_2 + x_{m3}\ln x_3}{x_{m1} + x_{m2} + x_{m3}}$$

up to an irrelevant constant. The MM updates are

$$x_{m+1,i} = \sqrt{\frac{x_{mi}^2}{\left(\sum_{j \ne i} x_{mj}\right)\left(x_{m1} + x_{m2} + x_{m3}\right)}}.$$

If the components of the initial point coincide, then the iterates converge in a single iteration to the saddle point with all components equal to $1/\sqrt{6}$. Otherwise, it appears that $f_5(x_m)$ tends to $-\infty$.

The following objective functions

$$f_6(x) = x_1^2 x_2^6 + x_1^2 x_2^4 - 2x_1^2 x_2^3 - x_1^2 x_2^2 + 5.25\,x_1 x_2^3 - 2x_1^2 x_2 + 4.5\,x_1 x_2^2 + 3x_1^2 + 3x_1 x_2 - 12.75\,x_1$$

$$f_7(x) = \sum_{i=1}^{10} x_i^4 + 2\sum_{i=1}^{9} x_i^2 \sum_{j=i+1}^{10} x_j^2 + (10^{-5} - 0.5)\sum_{i=1}^{10} x_i^2 - (2 \times 10^{-5})\sum_{i=7}^{10} x_i + \frac{1}{16}$$

$$f_8(x) = x_1 x_3^2 x_6^{-1} x_7^{-1} + x_1^2 x_3^{-1} x_5^{-2} x_6^{-1} x_7 + x_1^3 x_2^2 x_5^{-2} x_6^2 + x_2^{-1} x_4^{-1} x_6^2 + x_3 x_5^3 x_6^{-3}$$

$$f_9(x) = x_1 x_4^2 + x_2 x_3 + x_1 x_2 x_3 x_4^2 + x_1^{-1} x_4^{-2}$$

are taken from reference [19] and are intended for numerical illustration. Table 1 lists initial conditions, minimum points, minimum values, and the number of iterations until convergence under the MM algorithm.

| Fun | Type | Initial point $x_0$ | Min point | Min value | Iters ($\epsilon = 10^{-9}$) |
|---|---|---|---|---|---|
| $f_1$ | P | (1, 2) | (1.4310, 1.4310) | 3.4128 | 38 |
| $f_2$ | P | (1, 2) | (0.6300, 1.2599) | 2.0000 | 2 |
| $f_3$ | P | (1, 1) | diverges | 0.0000 | |
| $f_4$ | S | (0.1, 0.2, 0.3, 0.4) | (0.1596, 0.3191, 0.1954, 0.2606) | 0.0000 | 3 |
| $f_5$ | G | (1, 1, 1) | (0.4082, 0.4082, 0.4082) | 0.2973 | 2 |
| | | (1, 2, 3) | diverges | $-\infty$ | |
| $f_6$ | S | (1, 1) | (2.9978, 0.4994) | $-14.2031$ | 558 |
| $f_7$ | S | $(1, \ldots, 10)$ | $0.0255\,x_0$ | 0.0000 | 18 |
| $f_8$ | P | $(1, \ldots, 7)$ | diverges | 0.0000 | |
| $f_9$ | P | (1, 2, 3, 4) | (0.3969, 0.0000, 0.0000, 1.5874) | 2.0000 | 7 |

Table 1: Numerical examples of unconstrained signomial programming. Test functions $f_4(x)$, $f_6(x)$, $f_7(x)$, $f_8(x)$, and $f_9(x)$ are taken from [19]. P: posynomial; S: signomial; G: general function.
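The one-step behavior of $f_4(x)$ is easy to confirm numerically. In the sketch below (our own illustration), a single application of the closed-form updates from the initial point $(0.1, 0.2, 0.3, 0.4)$ already yields $x_1 x_2 = x_3 x_4$ and hence $f_4(x) = 0$ up to roundoff; since the minima form a continuum, the exact limit point depends on the starting value and on how the surrogate is minimized, so it need not match Table 1 to all digits.

```python
def f4(x1, x2, x3, x4):
    return (x1 * x2 - x3 * x4) ** 2

def mm_update_f4(x1, x2, x3, x4):
    """Closed-form MM updates for f_4 derived above."""
    return ((x1 ** 3 * x3 * x4 / x2) ** 0.25,
            (x2 ** 3 * x3 * x4 / x1) ** 0.25,
            (x3 ** 3 * x1 * x2 / x4) ** 0.25,
            (x4 ** 3 * x1 * x2 / x3) ** 0.25)

x = (0.1, 0.2, 0.3, 0.4)           # initial point used in Table 1
x = mm_update_f4(*x)
print(x, f4(*x))                   # f_4 vanishes (up to roundoff) after one step
```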
Convergence is declared when the relative change in the objective function is less than a pre-specified value $\epsilon$, in other words, when

$$\frac{f(x_m) - f(x_{m+1})}{|f(x_m)| + 1} \le \epsilon.$$

Optimization of the univariate surrogate functions easily succumbs to Newton's method. The MM algorithm takes fewer iterations to converge than the path algorithm for all of the test functions mentioned in [19] except $f_6(x)$. Furthermore, the MM algorithm avoids calculation of the gradient and Hessian and requires no matrix decompositions or selection of tuning constants.

As Section 7 observes, MM algorithms typically converge at a linear rate. Although slow convergence can occur for functions such as the test function $f_6(x)$, there are several ways to accelerate an MM algorithm. For example, our published quasi-Newton acceleration [20] often reduces the necessary number of iterations by one or two orders of magnitude. Figure 1 shows the progress of the MM iterates for the test function $f_6(x)$ with and without quasi-Newton acceleration. Under a convergence criterion of $\epsilon = 10^{-9}$ and $q = 1$ secant condition, the required number of iterations falls to 30; under the same convergence criterion and $q = 2$ secant conditions, the required number of iterations falls to 12.

[Figure 1 (plots omitted in this version). Upper left: the test function $f_6(x)$. Upper right: 558 MM iterates. Lower left: 30 accelerated MM iterates ($q = 1$ secant condition). Lower right: 12 accelerated MM iterates ($q = 2$ secant conditions).]

It is also worth emphasizing that separation of parameters enables parallel processing in high-dimensional problems. We have recently argued [21] that the best approach to parallel processing is through graphics processing units (GPUs). These cheap hardware devices offer one to two orders of magnitude of acceleration in many MM algorithms with parameters separated.

5 Constrained Signomial Programming

Extending the MM algorithm to constrained geometric and signomial programming is challenging. Box constraints $a_i \le x_i \le b_i$ are consistent with parameter separation as just developed, but more complicated posynomial constraints that couple parameters are not. Posynomial inequality constraints take the form

$$h(x) = \sum_\beta d_\beta \prod_{i=1}^{n} x_i^{\beta_i} \le 1.$$

The corresponding equality constraints set $h(x) = 1$. We propose handling both kinds of constraints by penalty methods. Before we treat these matters in more depth, let us relax the positivity restrictions on the $d_\beta$ but enforce the restriction $\beta_i \ge 0$. The latter objective can be achieved by multiplying $h(x)$ by $x_i^{\max_\beta\{-\beta_i, 0\}}$ for all $i$. If we subtract the two sides of the resulting equality, then the equality constraint $h(x) = 1$ can be rephrased as $r(x) = \sum_\gamma e_\gamma \prod_{i=1}^{n} x_i^{\gamma_i} = 0$, with no restriction on the signs of the $e_\gamma$ but with the requirement $\gamma_i \ge 0$ in effect. For example, the equality constraint

$$\frac{1}{x_1} + \frac{x_1}{x_2^2} = 1 \qquad \text{becomes} \qquad x_1^2 + x_2^2 - x_1 x_2^2 = 0.$$
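The exponent-shifting step just described is mechanical and can be scripted. The helper below (our own, with hypothetical names) converts a posynomial equality constraint $h(x) = 1$ into the signomial form $r(x) = 0$ with nonnegative exponents, reproducing the toy example above.

```python
import numpy as np

def equality_to_signomial(d, beta):
    """Rephrase sum_k d[k] * prod_j x[j]**beta[k][j] = 1 as r(x) = 0 with
    nonnegative exponents: multiply both sides by x_j**max_k(-beta[k][j], 0)
    and subtract the right-hand side."""
    d = np.asarray(d, float)
    beta = np.asarray(beta, float)
    shift = np.maximum(-beta, 0.0).max(axis=0)          # powers multiplied in
    gamma = np.vstack([beta + shift, shift[None, :]])   # all exponents now >= 0
    e = np.concatenate([d, [-1.0]])                     # subtracted right side
    return e, gamma

# 1/x1 + x1/x2**2 = 1  becomes  x2**2 + x1**2 - x1*x2**2 = 0
e, gamma = equality_to_signomial([1.0, 1.0], [(-1, 0), (1, -2)])
print(e)      # [ 1.  1. -1.]
print(gamma)  # [[0. 2.], [2. 0.], [1. 2.]]
```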
In the quadratic penalty method [12, 16] with objective function $f(x)$, a single equality constraint $r(x) = 0$, and a single inequality constraint $s(x) \le 0$, one minimizes the sum

$$f_\lambda(x) = f(x) + \lambda r(x)^2 + \lambda s(x)_+^2,$$

where $s(x)_+ = \max\{s(x), 0\}$. As the penalty constant $\lambda$ tends to $\infty$, the solution vector $x_\lambda$ typically converges to the constrained minimum. In the revised objective function, the term $r(x)^2$ is a signomial whenever $r(x)$ is a signomial. For example, in our toy problem the choice $r(x) = x_1^2 + x_2^2 - x_1 x_2^2$ has square

$$r(x)^2 = x_1^4 + x_2^4 + x_1^2 x_2^4 + 2x_1^2 x_2^2 - 2x_1 x_2^4 - 2x_1^3 x_2^2.$$

Of course, the powers in $r(x)$ can be fractional here as well as integer. The term $s(x)_+^2$ is not a signomial and must be subjected to the majorization

$$s(x)_+^2 \le \begin{cases} [s(x) - s(x_m)]^2 & s(x_m) < 0 \\ s(x)^2 & s(x_m) \ge 0 \end{cases}$$

to achieve this status. In practice, one does not need to fully minimize $f_\lambda(x)$ for any fixed $\lambda$. If one increases $\lambda$ slowly enough, then it usually suffices to merely decrease $f_\lambda(x)$ at each iteration. The MM algorithm is designed to achieve precisely this goal. Our exposition so far suggests that we majorize $r(x)^2$, $s(x)^2$, and $[s(x) - s(x_m)]^2$ in exactly the same manner that we majorize $f(x)$. Separation of parameters generalizes, and the resulting MM algorithm keeps all parameters positive while permitting pertinent parameters to converge to 0. Section 7 summarizes some of the convergence properties of this hybrid procedure.

The quadratic penalty method traditionally relies on Newton's method to minimize the unconstrained functions $f_\lambda(x)$. Unfortunately, this tactic suffers from round-off errors and numerical instability. Some of these problems disappear with the MM algorithm. No matrix inversions are involved, and iterates enjoy the descent property. Ill-conditioning does cause harm in the form of slow convergence, but the previously mentioned quasi-Newton acceleration largely remedies the situation [20]. As an alternative to quadratic penalties, exact penalties take the form $\lambda|r(x)| + \lambda s(x)_+$. Remarkably, the exact penalty method produces the constrained minimum, not just in the limit, but for all finite $\lambda$ beyond a certain point. Although this desirable property avoids the numerical instability encountered in the quadratic penalty method, the kinks in the objective functions $f(x) + \lambda|r(x)| + \lambda s(x)_+$ are a nuisance. We will demonstrate in a future paper how to harness the MM algorithm to exact penalization.

6 Nonnegative Quadratic Programming

As an illustration of constrained signomial programming, consider quadratic programming over the positive orthant. Let $f(x) = \frac{1}{2}x^t Q x + c^t x$ be the objective function, $Ex = d$ the linear equality constraints, and $Ax \le b$ the linear inequality constraints. The symmetric matrix $Q$ can be negative definite, indefinite, or positive definite. The quadratic penalty method involves minimizing the sequence of penalized objective functions

$$f_\lambda(x) = \frac{1}{2}x^t Q x + c^t x + \frac{\lambda}{2}\|(Ax - b)_+\|_2^2 + \frac{\lambda}{2}\|Ex - d\|_2^2$$

as $\lambda$ tends to $\infty$. Based on the obvious majorization

$$x_+^2 \le \begin{cases} (x - x_m)^2 & x_m < 0 \\ x^2 & x_m \ge 0, \end{cases}$$

the term $\|(Ax - b)_+\|_2^2$ is majorized by $\|Ax - b - r_m\|_2^2$, where $r_m = \min\{Ax_m - b, 0\}$.
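The piecewise majorization of $s(x)_+^2$ is easy to verify pointwise before proceeding. This spot check (ours, not the paper's) confirms domination and tangency on a grid:

```python
import numpy as np

def plus_sq_majorizer(s, sm):
    """Majorizer of s_+^2 anchored at sm: [s - sm]^2 if sm < 0, else s^2."""
    return (s - sm) ** 2 if sm < 0 else s ** 2

sm = -0.7
grid = np.linspace(-2.0, 2.0, 401)
g = np.array([plus_sq_majorizer(s, sm) for s in grid])
assert np.all(g >= np.maximum(grid, 0.0) ** 2 - 1e-12)             # dominates s_+^2
assert abs(plus_sq_majorizer(sm, sm) - max(sm, 0.0) ** 2) < 1e-12  # tangent at sm
```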
A brief calculation shows that $f_\lambda(x)$ is majorized by the surrogate function

$$g_\lambda(x \mid x_m) = \frac{1}{2}x^t H_\lambda x + v_{\lambda m}^t x$$

up to an irrelevant constant, where $H_\lambda$ and $v_{\lambda m}$ are defined by

$$H_\lambda = Q + \lambda\left(A^t A + E^t E\right), \qquad v_{\lambda m} = c - \lambda A^t(b + r_m) - \lambda E^t d.$$

It is convenient to assume that the diagonal coefficients $\frac{1}{2}h_{\lambda ii}$ appearing in the quadratic form $\frac{1}{2}x^t H_\lambda x$ are positive. This is generally the case for large $\lambda$. One can handle the off-diagonal term $h_{\lambda ij} x_i x_j$ by either the majorization (4) or the majorization (5) according to the sign of $h_{\lambda ij}$. The reader can check that the MM updates reduce to

$$x_{m+1,i} = \frac{x_{mi}}{2}\left[-\frac{v_{\lambda mi}}{h_{\lambda mi}^+} + \sqrt{\left(\frac{v_{\lambda mi}}{h_{\lambda mi}^+}\right)^2 - \frac{4h_{\lambda mi}^-}{h_{\lambda mi}^+}}\,\right], \tag{8}$$

where

$$h_{\lambda mi}^+ = \sum_{j:\, h_{\lambda ij} > 0} h_{\lambda ij}\, x_{mj}, \qquad h_{\lambda mi}^- = \sum_{j:\, h_{\lambda ij} < 0} h_{\lambda ij}\, x_{mj}.$$

When $h_{\lambda mi}^- = 0$, the update (8) collapses to

$$x_{m+1,i} = x_{mi} \max\left\{-\frac{v_{\lambda mi}}{h_{\lambda mi}^+},\, 0\right\}. \tag{9}$$

To avoid sticky boundaries, we replace 0 in equation (9) by a small positive constant $\epsilon$ such as $10^{-9}$. Sha et al. [17] derived the update (8) for $\lambda = 0$, ignoring the constraints $Ex = d$ and $Ax \le b$.

For a numerical example without equality constraints, take

$$f_{10}(x) = \frac{1}{2}x_1^2 + x_2^2 - x_1 x_2 - 2x_1 - 6x_2, \qquad A = \begin{pmatrix} 1 & 1 \\ -1 & 2 \\ 2 & 1 \end{pmatrix}, \qquad b = \begin{pmatrix} 2 \\ 2 \\ 3 \end{pmatrix}.$$

The minimum occurs at the point $(2/3, 4/3)^t$. Table 2 lists the number of iterations until convergence and the converged point $x_\lambda$ for the sequence of penalty constants $\lambda = 2^k$.

The quadratic program

$$f_{11}(x) = -8x_1 - 16x_2 + x_1^2 + 4x_2^2, \qquad A = \begin{pmatrix} 1 & 1 \\ 1 & 0 \end{pmatrix}, \qquad b = \begin{pmatrix} 4 \\ 3 \end{pmatrix}$$

converges much more slowly. Its minimum occurs at the point $(2.4, 1.6)^t$. Table 3 lists the numbers of iterations until convergence with ($q = 1$) and without ($q = 0$) acceleration and the converged point $x_\lambda$ for the same sequence of penalty constants $\lambda = 2^k$. Fortunately, quasi-Newton acceleration compensates for ill-conditioning in this test problem.

| $\log_2 \lambda$ | Iters | $x_\lambda$ |
|---|---|---|
| 0 | 8 | (0.9503, 1.6464) |
| 1 | 6 | (0.8580, 1.5164) |
| 2 | 5 | (0.8138, 1.4461) |
| 3 | 23 | (0.7853, 1.4067) |
| 4 | 32 | (0.7264, 1.3702) |
| 5 | 31 | (0.6967, 1.3518) |
| 6 | 30 | (0.6817, 1.3426) |
| 7 | 29 | (0.6742, 1.3380) |
| 8 | 28 | (0.6704, 1.3356) |
| 9 | 26 | (0.6686, 1.3345) |
| 10 | 25 | (0.6676, 1.3339) |
| 11 | 23 | (0.6671, 1.3336) |
| 12 | 22 | (0.6669, 1.3335) |
| 13 | 21 | (0.6668, 1.3334) |
| 14 | 19 | (0.6667, 1.3334) |
| 15 | 18 | (0.6667, 1.3334) |
| 16 | 16 | (0.6667, 1.3333) |
| 17 | 15 | (0.6667, 1.3333) |

Table 2: Iterates from the quadratic penalty method for the test function $f_{10}(x)$. The convergence criterion for the inner loops is $10^{-9}$.

| $\log_2 \lambda$ | Iters ($q = 0$) | Iters ($q = 1$) | $x_\lambda$ |
|---|---|---|---|
| 0 | 18 | 5 | (3.0000, 1.8000) |
| 1 | 2 | 2 | (2.8571, 1.7143) |
| 2 | 56 | 6 | (2.6667, 1.6667) |
| 3 | 97 | 5 | (2.5455, 1.6364) |
| 4 | 167 | 5 | (2.4762, 1.6190) |
| 5 | 312 | 5 | (2.4390, 1.6098) |
| 6 | 541 | 6 | (2.4198, 1.6049) |
| 7 | 955 | 5 | (2.4099, 1.6025) |
| 8 | 1674 | 4 | (2.4050, 1.6012) |
| 9 | 2924 | 3 | (2.4025, 1.6006) |
| 10 | 4839 | 3 | (2.4013, 1.6003) |
| 11 | 7959 | 4 | (2.4006, 1.6002) |
| 12 | 12220 | 4 | (2.4003, 1.6001) |
| 13 | 17674 | 4 | (2.4002, 1.6000) |
| 14 | 21739 | 3 | (2.4001, 1.6000) |
| 15 | 20736 | 3 | (2.4000, 1.6000) |
| 16 | 8073 | 3 | (2.4000, 1.6000) |
| 17 | 111 | 3 | (2.4000, 1.6000) |
| 18 | 6 | 4 | (2.4000, 1.6000) |
| 19 | 5 | 2 | (2.4000, 1.6000) |
| 20 | 3 | 2 | (2.4000, 1.6000) |
| 21 | 2 | 2 | (2.4000, 1.6000) |

Table 3: Iterates from the quadratic penalty method for the test function $f_{11}(x)$. The convergence criterion for the inner loops is $10^{-16}$.
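Updates (8) and (9) assemble into a short solver. The sketch below implements the updates as stated (it is our own code, not the authors'), running the inner MM loop for fixed $\lambda$ and doubling $\lambda$ in an outer loop with warm starts; it assumes the diagonal of $H_\lambda$ is positive, which holds for large $\lambda$. On $f_{10}(x)$ the iterates approach the constrained minimum $(2/3, 4/3)$, consistent with Table 2.

```python
import numpy as np

def nqp_mm(Q, c, A, b, E, d, lam, x0, eps=1e-9, max_iter=100000):
    """Inner MM loop for f_lam on the positive orthant via update (8)."""
    H = Q + lam * (A.T @ A + E.T @ E)
    Hpos, Hneg = np.where(H > 0, H, 0.0), np.where(H < 0, H, 0.0)
    f_lam = lambda x: (0.5 * x @ Q @ x + c @ x
                       + 0.5 * lam * np.sum(np.maximum(A @ x - b, 0.0) ** 2)
                       + 0.5 * lam * np.sum((E @ x - d) ** 2))
    x, fx = x0.copy(), f_lam(x0)
    for _ in range(max_iter):
        r = np.minimum(A @ x - b, 0.0)    # majorize ||(Ax-b)_+||^2 by ||Ax-b-r||^2
        v = c - lam * A.T @ (b + r) - lam * E.T @ d
        hp, hn = Hpos @ x, Hneg @ x       # h^+ and h^- of update (8)
        x = 0.5 * x * (-v / hp + np.sqrt((v / hp) ** 2 - 4.0 * hn / hp))
        x = np.maximum(x, 1e-9)           # keep iterates off the sticky boundary
        fx_new = f_lam(x)
        if fx - fx_new <= eps * (abs(fx) + 1.0):
            break
        fx = fx_new
    return x

Q = np.array([[1.0, -1.0], [-1.0, 2.0]]); c = np.array([-2.0, -6.0])
A = np.array([[1.0, 1.0], [-1.0, 2.0], [2.0, 1.0]]); b = np.array([2.0, 2.0, 3.0])
E = np.zeros((0, 2)); d = np.zeros(0)     # f_10 has no equality constraints
x = np.array([0.5, 0.5])
for k in range(18):                       # lambda = 2**k, as in Table 2
    x = nqp_mm(Q, c, A, b, E, d, 2.0 ** k, x)
print(x)                                  # -> approximately (0.6667, 1.3333)
```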
7 Convergence

As we have seen, the behavior of the MM algorithm is intimately tied to the behavior of the objective function $f(x)$. For the sake of simplicity, we now restrict attention to unconstrained minimization of posynomials and investigate conditions guaranteeing that $f(x)$ possesses a unique minimum on its domain. Uniqueness is related to the strict convexity of the reparameterization

$$h(y) = \sum_{\alpha \in S} c_\alpha e^{\alpha^t y}$$

of $f(x)$, where $\alpha^t y = \sum_{i=1}^{n} \alpha_i y_i$ is the inner product of $\alpha$ and $y$ and $x_i = e^{y_i}$ for each $i$. The Hessian matrix

$$d^2 h(y) = \sum_{\alpha \in S} c_\alpha e^{\alpha^t y} \alpha \alpha^t$$

of $h(y)$ is positive semidefinite, so $h(y)$ is convex. If we let $T$ be the subspace of $\mathbb{R}^n$ spanned by $\{\alpha\}_{\alpha \in S}$, then $h(y)$ is strictly convex if and only if $T = \mathbb{R}^n$. Indeed, suppose the condition holds. For any $v \ne \mathbf{0}$, we then must have $\alpha^t v \ne 0$ for some $\alpha \in S$. It follows that

$$v^t d^2 h(y) v = \sum_{\alpha \in S} c_\alpha e^{\alpha^t y} (\alpha^t v)^2 > 0,$$

and $d^2 h(y)$ is positive definite. Conversely, suppose $T \ne \mathbb{R}^n$, and take $v \ne \mathbf{0}$ with $\alpha^t v = 0$ for every $\alpha \in S$. Then $h(y + tv) = h(y)$ for every scalar $t$, which is incompatible with $h(y)$ being strictly convex.

Strict convexity guarantees uniqueness, not existence, of a minimum point. Coerciveness ensures existence. The objective function $f(x)$ is coercive if $f(x)$ tends to $\infty$ whenever any component of $x$ tends to 0 or $\infty$. Under the reparameterization $x_i = e^{y_i}$, this is equivalent to $h(y) = f(x)$ tending to $\infty$ as $\|y\|_2$ tends to $\infty$. A necessary and sufficient condition for this to occur is that $\max_{\alpha \in S} \alpha^t v > 0$ for every $v \ne \mathbf{0}$. For a proof, suppose the contrary condition holds for some $v \ne \mathbf{0}$. Then it is clear that $h(tv)$ remains bounded above by $h(\mathbf{0})$ as the scalar $t$ tends to $\infty$. Conversely, if the stated condition is true, then the function $q(y) = \max_{\alpha \in S} \alpha^t y$ is continuous and achieves its minimum $d > 0$ on the sphere $\{y \in \mathbb{R}^n : \|y\|_2 = 1\}$. It follows that $q(y) \ge d\|y\|_2$ and that

$$h(y) \ge \max_{\alpha \in S}\left\{c_\alpha e^{\alpha^t y}\right\} \ge \left(\min_{\alpha \in S} c_\alpha\right) e^{d\|y\|_2}.$$

This lower bound shows that $h(y)$ is coercive.

The coerciveness condition is hard to apply in practice. An equivalent condition is that the origin $\mathbf{0}$ belongs to the interior of the convex hull of the set $\{\alpha\}_{\alpha \in S}$. It is straightforward to show that the negations of these two conditions are logically equivalent. Thus, suppose $q(v) = \max_{\alpha \in S} \alpha^t v \le 0$ for some $v \ne \mathbf{0}$. Every convex combination $\sum_\alpha p_\alpha \alpha$ then satisfies $\left(\sum_\alpha p_\alpha \alpha\right)^t v \le 0$. If the origin is in the interior of the convex hull, then $\epsilon v$ is also for every sufficiently small $\epsilon > 0$. But this leads to the contradiction $\epsilon v^t v = \epsilon\|v\|_2^2 \le 0$. Conversely, suppose $\mathbf{0}$ is not in the interior of the convex hull. According to the separating hyperplane theorem for convex sets, there exists a unit vector $v$ with $v^t \alpha \le 0 = v^t \mathbf{0}$ for every $\alpha \in S$. In other words, $q(v) \le 0$.

The convex hull criterion is easier to check, but it is not constructive. In simple cases such as the objective function $f_1(x)$, where the power vectors are $\alpha = (-3, 0)^t$, $\alpha = (-1, -2)^t$, and $\alpha = (1, 1)^t$, it is visually obvious that the origin is in the interior of their convex hull. One can also check the criterion $q(v) > 0$ for all $v \ne \mathbf{0}$ by solving a related geometric programming problem. This problem consists in minimizing the scalar $t$ subject to the inequality constraints $\alpha^t y \le t$ for all $\alpha \in S$ and the nonlinear equality constraint $\|y\|_2^2 = 1$. If $t_{\min} \le 0$, then the original criterion fails.
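The convex hull criterion can also be tested numerically. One practical check (our own, assuming SciPy is available): maximize the smallest weight $\delta$ of a convex combination of the power vectors equal to $\mathbf{0}$, and verify that the power vectors span $\mathbb{R}^n$. A strictly positive $\delta$ together with full span certifies that the origin is interior to the hull, by the argument just given.

```python
import numpy as np
from scipy.optimize import linprog

def origin_in_interior(alphas):
    """Test whether 0 is interior to conv{alpha}: maximize the minimum weight
    delta of a convex combination equal to 0, and check full span."""
    A = np.asarray(alphas, dtype=float)       # k x n matrix of power vectors
    k, n = A.shape
    c = np.zeros(k + 1); c[-1] = -1.0         # variables (p_1..p_k, delta)
    A_eq = np.zeros((n + 1, k + 1))           # A^T p = 0 and sum(p) = 1
    A_eq[:n, :k] = A.T
    A_eq[n, :k] = 1.0
    b_eq = np.zeros(n + 1); b_eq[n] = 1.0
    A_ub = np.hstack([-np.eye(k), np.ones((k, 1))])   # delta <= p_j for all j
    res = linprog(c, A_ub=A_ub, b_ub=np.zeros(k), A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, 1)] * (k + 1))
    full_span = np.linalg.matrix_rank(A) == n
    return res.status == 0 and res.x[-1] > 1e-12 and full_span

print(origin_in_interior([(-3, 0), (-1, -2), (1, 1)]))  # f_1: True, coercive
print(origin_in_interior([(-1, -2), (1, 1)]))           # f_3: False
```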
In some cases, the objective function $f(x)$ does not attain its minimum on the open domain $\mathbb{R}^n_{>0} = \{x : x_i > 0, 1 \le i \le n\}$. This condition is equivalent to the corresponding function $\ln h(y)$ being unbounded below on $\mathbb{R}^n$. According to Gordan's theorem [1, 9], this can happen if and only if $\mathbf{0}$ is not in the convex hull of the set $\{\alpha\}_{\alpha \in S}$. Alternatively, both conditions are equivalent to the existence of a vector $v$ with $\alpha^t v < 0$ for all $\alpha \in S$. For the objective function $f_3(x)$, the power vectors are $\alpha = (-1, -2)^t$ and $\alpha = (1, 1)^t$. The origin $(0, 0)^t$ does not lie on the line segment between them, and the vector $(-3/2, 1)^t$ makes a strictly obtuse angle with each. As predicted, $f_3(x)$ does not attain its infimum on $\mathbb{R}^n_{>0}$.

The theoretical development in reference [9] demonstrates that the MM algorithm converges at a linear rate to the unique minimum point of the objective function $f(x)$ when $f(x)$ is coercive and its convex reparameterization $h(y)$ is strictly convex. The theory does not cover other cases, and it would be interesting to investigate them. The general convergence theory of MM algorithms [9] states that five properties of the objective function $f(x)$ and the MM algorithmic map $x \mapsto M(x)$ guarantee convergence to a stationary point of $f(x)$: (a) $f(x)$ is coercive on its open domain; (b) $f(x)$ has only isolated stationary points; (c) $M(x)$ is continuous; (d) $x^*$ is a fixed point of $M(x)$ if and only if $x^*$ is a stationary point of $f(x)$; and (e) $f[M(x)] \le f(x)$, with equality if and only if $x$ is a fixed point of $M(x)$. For a general signomial program, items (a) and (b) are the hardest to check. Our examples provide some clues.

The standard convergence results for the quadratic penalty method are covered in the references [9, 12, 16]. To summarize the principal finding, suppose that the objective function $f(x)$ and the constraint functions $r_i(x)$ and $s_j(x)$ are continuous and that $f(x)$ is coercive on $\mathbb{R}^n_{>0}$. If $x_\lambda$ minimizes the penalized objective function

$$f_\lambda(x) = f(x) + \lambda \sum_i r_i(x)^2 + \lambda \sum_j s_j(x)_+^2,$$

and $x_\infty$ is a cluster point of $x_\lambda$ as $\lambda$ tends to $\infty$, then $x_\infty$ minimizes $f(x)$ subject to the constraints. In this regard, observe that the coerciveness assumption on $f(x)$ implies that the solution set $\{x_\lambda\}_\lambda$ is bounded and possesses at least one cluster point. Of course, if the solution set consists of a single point, then $x_\lambda$ tends to that point.

8 Discussion

The current paper presents novel algorithms for both geometric and signomial programming. Although our examples are low dimensional, the previous experience of Sha et al. [17] offers convincing evidence that the MM algorithm works well for high-dimensional quadratic programming with nonnegativity constraints. The ideas pursued here (the MM principle, separation of variables, quasi-Newton acceleration, and penalized optimization) are surprisingly potent in large-scale optimization. The MM algorithm deals with the objective function directly and reduces multivariate minimization to a sequence of one-dimensional minimizations. The MM updates are simple to code and enjoy the crucial descent property. Treating constrained signomial programming by the penalty method extends the MM algorithm even further.
Quadratic programming with linear equality and inequality constraints is the most important special case of constrained signomial programming. Our new MM algorithm for constrained quadratic programming deserves consideration in high-dimensional problems. Even though MM algorithms can be notoriously slow to converge, quasi-Newton acceleration can dramatically improve matters. Acceleration involves no matrix inversion, only matrix-times-vector multiplication. Finally, it is worth keeping in mind that parameter-separated algorithms are ideal candidates for parallel processing.

Because geometric programs are ultimately convex, it is relatively easy to pose and check sufficient conditions for global convergence of the MM algorithm. In contrast, it is far more difficult to analyze the behavior of the MM algorithm for signomial programs. Theoretical progress will probably be piecemeal and require problem-specific information. A major difficulty is understanding the asymptotic nature of the objective function as parameters approach 0 or $\infty$. Even in the absence of theoretical guarantees, the descent property of the MM algorithm makes it an attractive solution technique and a diagnostic tool for finding counterexamples. Some of our test problems expose the behavior of the MM algorithm in nonstandard situations. We welcome the help of the optimization community in unraveling the mysteries of the MM algorithm in signomial programming.

References

1. J.M. Borwein and A.S. Lewis, Convex Analysis and Nonlinear Optimization: Theory and Examples, Springer-Verlag, New York, 2000.
2. S. Boyd, S.-J. Kim, L. Vandenberghe, and A. Hassibi, A tutorial on geometric programming, Optimization and Engineering 8 (2007) 67–127.
3. S. Boyd and L. Vandenberghe, Convex Optimization, Cambridge University Press, Cambridge, 2004.
4. J.G. Ecker, Geometric programming: methods, computations and applications, SIAM Review 22 (1980) 338–362.
5. P.D. Feigin and U. Passy, The geometric programming dual to the extinction probability problem in simple branching processes, Annals of Probability 9 (1981) 498–503.
6. M. del Mar Hershenson, S.P. Boyd, and T.H. Lee, Optimal design of a CMOS op-amp via geometric programming, IEEE Trans Computer-Aided Design 20 (2001) 1–21.
7. K. Hoffman, Analysis in Euclidean Space, Prentice-Hall, Englewood Cliffs, NJ, 1975.
8. D.R. Hunter and K. Lange, A tutorial on MM algorithms, Amer Statistician 58 (2004) 30–37.
9. K. Lange, Optimization, Springer-Verlag, New York, 2004.
10. K. Lange, D.R. Hunter, and I. Yang, Optimization transfer using surrogate objective functions (with discussion), J Comput Graphical Stat 9 (2000) 1–59.
11. M. Mazumdar and T.R. Jefferson, Maximum likelihood estimates for multinomial probabilities via geometric programming, Biometrika 70 (1983) 257–261.
12. J. Nocedal and S.J. Wright, Numerical Optimization, Springer, New York, 1999.
13. U. Passy and D.J. Wilde, A geometric programming algorithm for solving chemical equilibrium problems, SIAM J Appl Math 16 (1968) 363–373.
14. A.L. Peressini, F.E. Sullivan, and J.J. Uhl Jr., The Mathematics of Nonlinear Programming, Springer-Verlag, New York, 1988.
15. E.L. Peterson, Geometric programming, SIAM Review 18 (1976) 338–362.
16. A. Ruszczynski, Nonlinear Optimization, Princeton University Press, Princeton, 2006.
17. F. Sha, L.K. Saul, and D.D. Lee, Multiplicative updates for nonnegative quadratic programming in support vector machines, in S. Becker, S. Thrun, and K. Obermayer (eds.), Advances in Neural Information Processing Systems 15, pages 1065–1073, MIT Press, Cambridge, MA.
18. J.M. Steele, The Cauchy-Schwarz Master Class: An Introduction to the Art of Inequalities, Cambridge University Press and the Mathematical Association of America, Cambridge, 2004.
19. Y. Wang, K. Zhang, and P. Shen, A new type of condensation curvilinear path algorithm for unconstrained generalized geometric programming, Math. Comput. Modelling 35 (2002) 1209–1219.
20. H. Zhou, D. Alexander, and K.L. Lange, A quasi-Newton acceleration method for high-dimensional optimization algorithms, Statistics and Computing (2009), DOI 10.1007/s11222-009-9166-3.
21. H. Zhou, K.L. Lange, and M.A. Suchard, Graphical processing units and high-dimensional optimization (2009).
