A subdifferential characterization via Busemann functions and applications to DC optimization on Hadamard manifolds

O. P. Ferreira∗, D. S. Gonçalves†, M. S. Louzeiro∗, S. Z. Németh‡, J. Zhu‡

February 25, 2026

Abstract

This paper investigates the properties of Busemann functions on Hadamard manifolds and their use in optimization algorithms in Riemannian settings. We present a new Busemann-based characterization of the subdifferential, which is particularly well suited to Riemannian optimization. In the classical Hadamard manifold framework, a subgradient provides a global lower model of a convex function expressed through the inverse exponential map. However, this model may fail to exhibit a useful convexity or concavity structure. By contrast, our characterization yields a concave bounding function by exploiting key properties of Busemann functions. We use this concavity to design and analyze difference-of-convex (DC) optimization methods on Hadamard manifolds. In particular, we reformulate the classical DC algorithm (DCA) for Riemannian contexts and study its convergence properties. We also report preliminary numerical experiments comparing the proposed Busemann DCA, which leads to geodesically convex subproblems, with the classical Riemannian DCA.

Keywords: Hadamard manifolds, Busemann functions, subgradients, difference of convex algorithm.

AMS subject classification: 90A30 · 90A26

1 Introduction

Busemann functions play a crucial role in geometric topology, particularly in the analysis of manifolds such as Hadamard manifolds. Introduced by H. Busemann in [21], this concept captures the essence of the parallel axiom, facilitating the examination of geodesic geometry and offering insights into the behavior of geodesics as they extend towards infinity. Its significance lies in its ability to reveal fundamental aspects of the overall structure of the space.
As an essential element in the study of Hadamard manifolds, Busemann functions are expected to provide deeper insights as research progresses. To corroborate this, the discussion presented here demonstrates how these functions can aid in the design and analysis of optimization algorithms for solving problems in Riemannian settings. For further exploration of the role of Busemann functions in geometry, refer to works such as [4, 20, 43]. Given the well-established connections between Riemannian geometry and optimization, as evidenced by books such as [1, 16, 41, 46], and acknowledging the fundamental role of Busemann functions in Riemannian geometry, it is natural to anticipate their significance extending into the field of optimization. Indeed, Busemann functions have begun to attract attention from the optimization research community operating within the Hadamard manifold context. This attention has sparked efforts to delve deeper into Busemann functions, exploring their potential applications and theoretical implications across continuous optimization and machine learning domains. For instance, in [11, 12], the concept of a resolvent for equilibrium problems is introduced in terms of Busemann functions, alongside an investigation into elements of convex analysis in Hadamard manifold settings.

∗ Institute of Mathematics and Statistics, Federal University of Goias, Avenida Esperança, s/n, Campus II, Goiânia, GO - 74690-900, Brazil (E-mail: orizon@ufg.br, mauriciolouzeiro@ufg.br).
† Departamento de Matemática, Universidade Federal de Santa Catarina, Florianópolis, SC 88040-900, Brazil (E-mail: douglas.goncalves@ufsc.br).
‡ School of Mathematics, University of Birmingham, Watson Building, Edgbaston, Birmingham - B15 2TT, United Kingdom (E-mails: s.nemeth@bham.ac.uk, jxz755@bham.ac.uk).
Furthermore, [10] presents a Fenchel-type conjugate utilizing Busemann functions, accompanied by an exploration of several of its properties. Additionally, [33] discusses further developments in a convex analysis foundation for convex optimization on Hadamard spaces, incorporating the concept of the Legendre-Fenchel conjugate. In [36], a subgradient-type algorithm is proposed for convex optimization on Hadamard spaces, with iterations based on the convexity of the sublevel sets of Busemann functions. Although immediate practical applications are not our primary focus, we note that [14] introduces hyperbolic sliced-Wasserstein discrepancies using machine-learning tools. These discrepancies are constructed by projecting onto geodesics, for instance through projections defined by level sets of Busemann functions. This context also encourages further exploration of Busemann functions, as discussed in the related work [15]. Additionally, [31] proposes a method termed hyperbolic Busemann learning for classification problems, which applies hierarchical relations among class labels to strategically position hyperbolic prototypes.

In this paper, we introduce a novel characterization of the subdifferential of a convex function, a fundamental concept in nonsmooth optimization theory, based on Busemann functions. This characterization yields structural properties that are more suitable for optimization on Hadamard manifolds than those provided by the classical definition. Traditionally, in the Riemannian setting, the classical subgradient inequality guarantees that a geodesically convex function admits a global lower bound expressed through a subgradient and the inverse exponential map. In the Euclidean case, the corresponding lower model is affine and therefore both convex and concave.
However, on a general Hadamard manifold, the lower model induced by this classical definition may fail to be geodesically convex and may also fail to be geodesically concave. This notion, introduced in [46, Definition 4.3, p. 73] and widely used in works such as [29, 37, 47], motivates the search for support constructions with a more favorable geometric structure. In contrast, our Busemann-based characterization yields a concave bounding function obtained from the negative of a Busemann function. This choice provides a convenient geometric structure for algorithmic design on Hadamard manifolds and suggests further directions for investigation beyond the scope of the present work. In particular, we use these Busemann-based bounds to address difference-of-convex (DC) optimization problems on Hadamard manifolds and to develop optimization methods suited to this framework. Specifically, we revisit the classical difference of convex algorithm (DCA) for Hadamard manifolds, initially introduced and analyzed in [13] as the Hadamard manifold counterpart to the celebrated Euclidean difference of convex algorithm (EDCA) introduced in [44] and recently studied in [2]. However, it is important to note that the function involved in the subproblem of the classical DCA on Hadamard manifolds is generally not convex, presenting significant solution-seeking challenges. To overcome this limitation, we use the geometric structure provided by Busemann functions.
Consequently, we propose a reformulation of the classical DCA in which the subproblem becomes geodesically convex, enabling a more effective treatment within the Riemannian setting.

The paper is structured as follows. In Section 1.1, we establish some notation. Section 2 serves as a review of essential concepts, notations, and foundational results concerning Hadamard manifolds. In Section 3, we delve into Busemann functions on Hadamard manifolds, elucidating crucial properties, providing necessary notation, and offering illustrative examples. Following this, in Section 4, we introduce a novel characterization of the subdifferential based on Busemann functions. In Section 5, we present a Busemann-based algorithm for DC optimization on Hadamard manifolds, motivated by the use of Busemann supports as geometric models for linearization, and we begin by revisiting the classical DCA in this setting. In Section 6, we present numerical experiments comparing the practical performance of the Busemann DCA with the classical Riemannian DCA. Finally, Section 7 presents our conclusions.

1.1 Notation

Let $\mathbb{R}^m$ be the $m$-dimensional Euclidean space. The set of all $m \times n$ matrices with real entries is denoted by $\mathbb{R}^{m\times n}$, and $\mathbb{R}^m \equiv \mathbb{R}^{m\times 1}$. For $M \in \mathbb{R}^{m\times n}$, the matrix $M^T \in \mathbb{R}^{n\times m}$ denotes the transpose of $M$. The matrix $I$ denotes the $n \times n$ identity matrix. Given $v \in \mathbb{R}^n$, $\mathrm{Diag}(v)$ denotes the $n \times n$ diagonal matrix with the entries of $v$ on its diagonal. Denote by $\mathbb{R}^n_{++}$ the positive orthant. Let $\overline{\mathbb{R}} := \mathbb{R} \cup \{+\infty\}$ be the extended real line. In line with [2], we adopt the following conventions:

$(+\infty) - (+\infty) = +\infty$, $\quad (+\infty) - \lambda = +\infty$, $\quad$ and $\quad \lambda - (+\infty) = -\infty$,  (1)

for all $\lambda \in \mathbb{R}$. The domain of $f : M \to \overline{\mathbb{R}}$ is denoted by $\mathrm{dom}\, f := \{p \in M : f(p) < +\infty\}$. Throughout this paper, we assume that $\mathrm{dom}\, f \neq \emptyset$, i.e., that $f$ is proper.
2 Basic results about Hadamard manifolds

In this section, we recall some concepts, notations, and basic results about Hadamard manifolds. For more details see, for example, [28, 43]. Throughout this paper, $M$ represents a finite-dimensional Hadamard manifold and $T_pM$ the tangent space of $M$ at $p$. The norm associated with the Riemannian metric $\langle \cdot,\cdot\rangle$ is denoted by $\|\cdot\|$. We use $\ell(\gamma)$ to express the length of a piecewise smooth curve $\gamma : [a,b] \to M$. The Riemannian distance between $p$ and $q$ in $M$ is denoted by $d(p,q)$; it induces the original topology on $M$, and $(M,d)$ is a complete metric space. The exponential mapping $\exp_p : T_pM \to M$ is defined by $\exp_p v = \gamma_{p,v}(1)$, where $\gamma_{p,v}$ is the geodesic with initial position $p$ and velocity $v$ at $p$. Hence, $\gamma_{p,v}(t) = \exp_p(tv)$, and we will also use the expression $\exp_p(tv)$ to denote the geodesic $\gamma_{p,v}$ starting at $p \in M$ with velocity $v \in T_pM$. For each $p \in M$, the exponential map $\exp_p$ is a diffeomorphism, and $\log_p : M \to T_pM$ denotes its inverse. In this case, $d(p,q) = \|\log_p q\|$ holds, the function $d(\cdot,q) : M \setminus \{q\} \to \mathbb{R}$ is $C^\infty$ for all $q \in M$, and its gradient is given by $\mathrm{grad}_1\, d(p,q) = -\log_p q / d(q,p)$ for all $q \neq p$, where $\mathrm{grad}_1$ denotes the gradient with respect to the first argument. In addition, $d^2(\cdot,q) : M \to \mathbb{R}$ is $C^\infty$ for all $q \in M$, and $\mathrm{grad}_1\, d^2(p,q) = -2\log_p q$. Let $\bar p, \bar q \in M$ and let $(p^k)_{k\in\mathbb{N}}, (q^k)_{k\in\mathbb{N}} \subset M$ be sequences such that $\lim_{k\to+\infty} p^k = \bar p$ and $\lim_{k\to+\infty} q^k = \bar q$. Then, for any $q \in M$, $\lim_{k\to+\infty} \log_{p^k} q = \log_{\bar p} q$, $\lim_{k\to+\infty} \log_q p^k = \log_q \bar p$, and $\lim_{k\to+\infty} \log_{p^k} q^k = \log_{\bar p}\bar q$. Given $p, q \in M$, the symbol $\gamma_{pq}$ denotes the geodesic segment joining $p$ to $q$, i.e., $\gamma_{pq} : [0,1] \to M$ with $\gamma_{pq}(0) = p$ and $\gamma_{pq}(1) = q$.
In the following, we recall the well-known comparison theorem for triangles in Hadamard manifolds, as stated in [43, Proposition 4.5].

Lemma 1. Let $M$ be a Hadamard manifold. The following inequality holds:

$d^2(x,y) + d^2(x,z) - 2\langle \log_x y, \log_x z\rangle \le d^2(y,z)$, $\quad \forall x,y,z \in M$.  (2)

As a consequence,

$\|\log_x y - \log_x z\| \le d(y,z)$, $\quad \forall x,y,z \in M$.  (3)

Moreover, if the sectional curvature of $M$ is identically zero, then both inequalities (2) and (3) hold as equalities.

Definition 2. Let $M$ be a Hadamard manifold and let $f : M \to \overline{\mathbb{R}}$ be a function. We say that $f$ is $L$-Lipschitz on a subset $\Omega \subset M$, for some constant $L \ge 0$, if $|f(p) - f(q)| \le L\, d(p,q)$ for all $p, q \in \Omega \cap \mathrm{dom}\, f$, where $d$ denotes the Riemannian distance on $M$.

The following definition plays an important role in the paper (see [18, p. 363]).

Definition 3. A function $f : M \to \overline{\mathbb{R}}$ is said to be lower semicontinuous (lsc) at a point $p \in M$ if $\liminf_{q\to p} f(q) \ge f(p)$. If $f$ is lower semicontinuous at every point of $M$, we simply say that $f$ is lower semicontinuous.

We conclude this section with a useful property of lower semicontinuous functions on Hadamard manifolds, stated in the next proposition. Its proof is similar to that of the Euclidean case and is omitted here.

Proposition 4. Let $M$ be a Hadamard manifold and $f : M \to \overline{\mathbb{R}}$ a lower semicontinuous function. If $f$ is coercive, i.e., $\lim_{d(p,\bar p)\to+\infty} f(p) = +\infty$ for some fixed $\bar p \in M$, then $f$ has a global minimizer in $M$.

Perhaps the two most important examples of Hadamard manifolds in optimization applications, apart from the Euclidean space $\mathbb{R}^n$, are the $\kappa$-hyperbolic space form $\mathbb{H}^n_\kappa$ and the space of symmetric positive definite matrices $\mathcal{P}(n)$. In the next two sections, we provide a brief review of their key properties, which will serve as the foundation for the examples and numerical experiments developed in this work.
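The comparison inequalities (2) and (3) can be checked numerically on a concrete Hadamard manifold. The sketch below does this on the hyperbolic plane $\mathbb{H}^2$ in the Lorentz model with $\kappa = 1$, whose distance and inverse exponential map are reviewed in Section 2.1; the helper names (`lorentz`, `log_map`, `rand_pt`) are ours, not the paper's.

```python
import numpy as np

def lorentz(x, y):
    # Lorentzian inner product <x,y> = x_1 y_1 + x_2 y_2 - x_3 y_3
    return x[:-1] @ y[:-1] - x[-1] * y[-1]

def dist(p, q):
    # Riemannian distance on H^2 (kappa = 1); clamp guards rounding below 1
    return np.arccosh(max(-lorentz(p, q), 1.0))

def log_map(p, q):
    # inverse exponential map log_p q
    if dist(p, q) < 1e-12:
        return np.zeros_like(p)
    u = q + lorentz(p, q) * p          # Lorentzian projection of q onto T_p H^2
    return dist(p, q) * u / np.sqrt(lorentz(u, u))

rng = np.random.default_rng(0)
def rand_pt():
    # random point (x, sqrt(1 + |x|^2)) on the upper hyperboloid
    x = rng.normal(size=2)
    return np.append(x, np.sqrt(1.0 + x @ x))

# check (2) and (3) on random geodesic triangles
for _ in range(200):
    x, y, z = rand_pt(), rand_pt(), rand_pt()
    ly, lz = log_map(x, y), log_map(x, z)
    lhs2 = dist(x, y) ** 2 + dist(x, z) ** 2 - 2.0 * lorentz(ly, lz)
    assert lhs2 <= dist(y, z) ** 2 + 1e-9              # inequality (2)
    w = ly - lz
    assert np.sqrt(lorentz(w, w)) <= dist(y, z) + 1e-9  # inequality (3)
```

Since the sectional curvature here is $-1$ rather than $0$, the inequalities are typically strict, in line with the equality statement of Lemma 1.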
2.1 Basic results on the $\kappa$-hyperbolic space form

In this section, we provide a review of the basic results related to the geometry of the $\kappa$-hyperbolic space forms. References for this section include [7, 16, 42]. For a given $\kappa > 0$, the $n$-dimensional $\kappa$-hyperbolic space form and its tangent hyperplane at a point $p$ are denoted by

$\mathbb{H}^n_\kappa := \{p \in \mathbb{R}^{n+1} : \langle p,p\rangle = -\tfrac{1}{\kappa},\ p_{n+1} > 0\}$, $\qquad T_p\mathbb{H}^n_\kappa := \{v \in \mathbb{R}^{n+1} : \langle p,v\rangle = 0\}$,

where $\langle \cdot,\cdot\rangle$ is the Lorentzian inner product $\langle x,y\rangle := x^T J y$ and $J := \mathrm{Diag}(1,\dots,1,-1) \in \mathbb{R}^{(n+1)\times(n+1)}$. The Lorentzian projection onto $T_p\mathbb{H}^n_\kappa$ is the linear mapping $\mathrm{Proj}^\kappa_p : \mathbb{R}^{n+1} \to T_p\mathbb{H}^n_\kappa$ defined by

$\mathrm{Proj}^\kappa_p x := x + \kappa\langle p,x\rangle p$, $\quad$ i.e., $\quad \mathrm{Proj}^\kappa_p := I + \kappa p p^T J$,

where $I \in \mathbb{R}^{(n+1)\times(n+1)}$ is the identity matrix. The intrinsic distance on the $\kappa$-hyperbolic space form between two points $p, q \in \mathbb{H}^n_\kappa$ is given by

$d_\kappa(p,q) := \frac{1}{\sqrt{\kappa}} \operatorname{arcosh}(-\kappa\langle p,q\rangle)$.  (4)

The exponential mapping $\exp^\kappa_q : T_q\mathbb{H}^n_\kappa \to \mathbb{H}^n_\kappa$ is given by $\exp^\kappa_q v = q$ for $v = 0$, and

$\exp^\kappa_q v := \cosh(\sqrt{\kappa}\|v\|)\, q + \sinh(\sqrt{\kappa}\|v\|)\, \frac{v}{\sqrt{\kappa}\|v\|}$, $\quad \forall v \in T_q\mathbb{H}^n_\kappa \setminus \{0\}$.

The inverse of the exponential mapping, $\log^\kappa_q : \mathbb{H}^n_\kappa \to T_q\mathbb{H}^n_\kappa$ at $q \in \mathbb{H}^n_\kappa$, is given by $\log^\kappa_q p = 0$ for $p = q$, and

$\log^\kappa_q p := \frac{\sqrt{\kappa}\, d_\kappa(q,p)}{\sqrt{\kappa^2\langle q,p\rangle^2 - 1}}\, \mathrm{Proj}^\kappa_q p = d_\kappa(q,p)\, \frac{\mathrm{Proj}^\kappa_q p}{\|\mathrm{Proj}^\kappa_q p\|}$, $\quad p \neq q$.

Let $\Omega \subset \mathbb{H}^n_\kappa$ be an open set. The Riemannian gradient on the $\kappa$-hyperbolic space form of a differentiable function $f : \Omega \to \mathbb{R}$ is the unique vector field $\Omega \ni p \mapsto \mathrm{grad}\, f(p) \in T_p\mathbb{H}^n_\kappa$ such that $df(p)v = \langle \mathrm{grad}\, f(p), v\rangle$; see [16, Proposition 7-5, p. 162]. Therefore, we have

$\mathrm{grad}\, f(p) := \mathrm{Proj}^\kappa_p J f'(p) = J f'(p) + \kappa\langle J f'(p), p\rangle p$,  (5)

where $f'(p) \in \mathbb{R}^{n+1}$ is the usual gradient of $f$ at $p$.
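The maps of this subsection can be sketched directly from the formulas above. The snippet below is a minimal implementation for general $\kappa > 0$, with a round-trip check that $\log^\kappa_q$ inverts $\exp^\kappa_q$ and that $d_\kappa(q, \exp^\kappa_q v) = \|v\|$; the function names are ours.

```python
import numpy as np

def lorentz(x, y):
    # <x,y> = x^T J y with J = Diag(1,...,1,-1)
    return x[:-1] @ y[:-1] - x[-1] * y[-1]

def proj(kappa, p, x):
    # Lorentzian projection onto T_p H^n_kappa
    return x + kappa * lorentz(p, x) * p

def dist(kappa, p, q):
    # intrinsic distance (4)
    return np.arccosh(max(-kappa * lorentz(p, q), 1.0)) / np.sqrt(kappa)

def exp_map(kappa, q, v):
    nv = np.sqrt(max(lorentz(v, v), 0.0))   # Riemannian norm of v in T_q
    if nv < 1e-15:
        return q
    s = np.sqrt(kappa) * nv
    return np.cosh(s) * q + np.sinh(s) * v / (np.sqrt(kappa) * nv)

def log_map(kappa, q, p):
    u = proj(kappa, q, p)
    nu = np.sqrt(lorentz(u, u))
    if nu < 1e-15:
        return np.zeros_like(q)
    return dist(kappa, q, p) * u / nu

# sanity check: log_q inverts exp_q and d(q, exp_q(v)) = ||v||
kappa = 0.5
rng = np.random.default_rng(1)
x = rng.normal(size=3)
q = np.append(x, np.sqrt(1.0 / kappa + x @ x))   # point on H^3_kappa
v = proj(kappa, q, rng.normal(size=4))           # tangent vector at q
p = exp_map(kappa, q, v)
assert abs(dist(kappa, q, p) - np.sqrt(lorentz(v, v))) < 1e-8
assert np.allclose(log_map(kappa, q, p), v, atol=1e-8)
```

The check confirms, in particular, that $d_\kappa(q, \exp^\kappa_q(tv)) = t\|v\|$, a fact used repeatedly in the Busemann computations of Section 3.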
2.2 Basic results on the manifold of symmetric positive definite matrices

In this section, we provide a review of the basic results related to the geometry of the manifold of symmetric positive definite matrices. References for this section include [16, 20]. Let $\mathbb{R}^{n\times m}$ denote the space of real matrices of size $n \times m$, $\mathcal{S}(n) \subset \mathbb{R}^{n\times n}$ the set of symmetric matrices, and $\mathcal{P}(n) \subset \mathbb{R}^{n\times n}$ the set of symmetric positive definite matrices. We equip $\mathcal{P}(n)$ with the affine-invariant Riemannian metric, which is defined by

$\langle U, V\rangle := \mathrm{tr}\big(Y^{-1}UY^{-1}V\big)$, $\quad \forall Y \in \mathcal{P}(n)$, $U, V \in T_Y\mathcal{P}(n) \equiv \mathcal{S}(n)$,  (6)

where $\mathrm{tr}(\cdot)$ denotes the trace operator, and $T_Y\mathcal{P}(n)$ is the tangent space at $Y$, identified with $\mathcal{S}(n)$. This metric endows $\mathcal{P}(n)$ with the structure of a Hadamard manifold. The affine-invariant Riemannian distance between two points $X, Y \in \mathcal{P}(n)$ is defined by

$d(X,Y) = \mathrm{tr}^{1/2}\big[\mathrm{Log}^2\big(Y^{-1/2}XY^{-1/2}\big)\big] = \mathrm{tr}^{1/2}\big[\mathrm{Log}^2\big(Y^{-1}X\big)\big]$,  (7)

where $\mathrm{Log}$ denotes the usual matrix logarithm. The exponential map at $Y \in \mathcal{P}(n)$ with respect to the affine-invariant metric is given by

$\exp_Y(V) := Y^{1/2}\,\mathrm{Exp}\big(Y^{-1/2}VY^{-1/2}\big)\,Y^{1/2}$,  (8)

for all $V \in T_Y\mathcal{P}(n)$, where $\mathrm{Exp}$ denotes the usual matrix exponential. The inverse of the exponential map is given by

$\log_X Y := X^{1/2}\,\mathrm{Log}\big(X^{-1/2}YX^{-1/2}\big)\,X^{1/2}$.  (9)

The gradient on the manifold of symmetric positive definite matrices $\mathcal{P}(n)$ of a differentiable function $f : \mathcal{P}(n) \to \mathbb{R}$ is the unique vector field $\mathcal{P}(n) \ni X \mapsto \mathrm{grad}\, f(X) \in \mathcal{S}(n)$ given by

$\mathrm{grad}\, f(X) = X f'(X) X$,  (10)

where $f'(X) \in \mathcal{S}(n)$ denotes the Euclidean gradient of $f$ at $X$. We conclude this section with a useful property of the Riemannian distance that is instrumental in computing Busemann functions on the manifold of symmetric positive definite matrices. Its proof follows directly from (7) and is therefore omitted.

Lemma 5.
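Formulas (7)–(9) translate directly into code. The sketch below implements them with `scipy.linalg` and checks that $\exp_Y(\log_Y X) = X$ and that the two expressions for the distance in (7) agree (the eigenvalues of $Y^{-1}X$ and of $Y^{-1/2}XY^{-1/2}$ coincide); the helper names are ours.

```python
import numpy as np
from scipy.linalg import expm, logm, eigh

def spd_sqrt(Y):
    # symmetric square root of an SPD matrix via eigendecomposition
    w, Q = np.linalg.eigh(Y)
    return (Q * np.sqrt(w)) @ Q.T

def spd_dist(X, Y):
    # affine-invariant distance (7), first expression
    Yh = spd_sqrt(Y)
    Yi = np.linalg.inv(Yh)
    return np.linalg.norm(np.real(logm(Yi @ X @ Yi)), 'fro')

def spd_exp(Y, V):
    # exponential map (8)
    Yh = spd_sqrt(Y)
    Yi = np.linalg.inv(Yh)
    return Yh @ expm(Yi @ V @ Yi) @ Yh

def spd_log(X, Y):
    # inverse exponential map log_X Y, cf. (9)
    Xh = spd_sqrt(X)
    Xi = np.linalg.inv(Xh)
    return Xh @ np.real(logm(Xi @ Y @ Xi)) @ Xh

rng = np.random.default_rng(2)
def rand_spd(n):
    A = rng.normal(size=(n, n))
    return A @ A.T + n * np.eye(n)

X, Y = rand_spd(3), rand_spd(3)
# round trip: exp_Y(log_Y X) = X
assert np.allclose(spd_exp(Y, spd_log(Y, X)), X, atol=1e-8)
# second expression in (7), via the generalized eigenvalues of Y^{-1} X
lam = eigh(X, Y, eigvals_only=True)
assert abs(spd_dist(X, Y) - np.sqrt(np.sum(np.log(lam) ** 2))) < 1e-8
```

Note that $\mathrm{tr}^{1/2}[\mathrm{Log}^2(\cdot)]$ of a symmetric argument is just the Frobenius norm of the matrix logarithm, which is what `spd_dist` computes.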
If $X \in \mathcal{P}(n)$ and $V$ is a non-singular matrix, then $\mathrm{Log}(VXV^{-1}) = V(\mathrm{Log}\, X)V^{-1}$. As a consequence, for given $X, Y \in \mathcal{P}(n)$ and non-singular matrices $Z$ and $V$ such that $ZXV$ and $Z^{-1}YV^{-1}$ both belong to $\mathcal{P}(n)$, there holds $d(ZXV, Y) = d\big(X, Z^{-1}YV^{-1}\big)$.

2.3 Subdifferential of a convex function on Hadamard manifolds

In this section, we recall the subdifferential of a convex function on a Hadamard manifold and its main properties, as presented in [46] and further developed in [30, 37]. These classical notions are reviewed to prepare the discussion on the characterization of the subdifferential via Busemann functions in the subsequent sections. We begin with the definitions of convex sets and convex functions. A set $\Omega \subset M$ is said to be convex if, for all $p, q \in \Omega$, we have $\gamma_{pq}(t) \in \Omega$ for all $t \in [0,1]$. A function $f : M \to \overline{\mathbb{R}}$ is $\sigma$-strongly convex for $\sigma \ge 0$ if

$(f\circ\gamma_{pq})(t) \le (1-t)f(p) + t f(q) - \frac{\sigma}{2}\,t(1-t)\,d^2(p,q)$,

for all $p, q \in M$ and $t \in [0,1]$. In particular, $f$ is convex when $\sigma = 0$. For $\sigma = 0$, $f$ is strictly convex if the inequality is strict for all $p \neq q$ in $\mathrm{dom}\, f$ and all $t \in (0,1)$.

Definition 6. Let $f : M \to \overline{\mathbb{R}}$ be a convex function and $q \in \mathrm{dom}\, f$. A vector $s \in T_qM$ is said to be a subgradient of $f$ at $q$ if

$f(p) \ge f(q) + \langle s, \log_q p\rangle$, $\quad \forall p \in M$.  (11)

The set of all subgradients of $f$ at the point $q$ is called the subdifferential and is denoted by $\partial f(q)$.

It is a well-established fact that the subdifferential set $\partial f(p)$ is nonempty for every $p \in \mathrm{int}\,\mathrm{dom}\, f$. For an analytic proof, refer to [46, Theorem 4.5], and for a geometric proof, see [30]. Moreover, $\partial f(p)$ is recognized as a convex and compact set, as demonstrated in [46, Theorem 4.6]. To explore further useful properties of $\partial f(p)$, we denote by $f'(p,v)$ the directional derivative of $f$ at $p$ in the direction of $v \in T_pM$, as defined in [46, Definition 4.1].
Recall that, for a given $p \in M$, we have $\mathrm{dom}\, f'(p,\cdot) := \{v \in T_pM : \exists\, \hat t > 0 \text{ such that } \exp_p(tv) \in \mathrm{dom}\, f \text{ for all } t \in [0,\hat t)\}$. The proof of the first part of the next result can be found in [46, Theorem 4.8]; for additional details, see [24] and [37, Proposition 3.8(ii)]. The second part is addressed in [37, Proposition 4.3].

Proposition 7. Let $f : M \to \overline{\mathbb{R}}$ be a convex function. Then, for each fixed $p \in \mathrm{dom}\, f$, there holds $\partial f(p) = \{s \in T_pM : f'(p,v) \ge \langle s,v\rangle,\ \forall v \in T_pM\}$. In addition, if $g : M \to \overline{\mathbb{R}}$ is a convex function such that $\mathrm{dom}\, f \cap \mathrm{dom}\, g$ is convex, then $\partial(f+g)(p) = \partial f(p) + \partial g(p)$ for each $p \in (\mathrm{int}\,\mathrm{dom}\, f) \cap \mathrm{dom}\, g$.

The proof of the first claim in the theorem below can be found in [46, Theorem 4.10, p. 76], with the proof of the second claim following a similar approach.

Theorem 8. Let $f : M \to \overline{\mathbb{R}}$ be a function. Then $f$ is convex (resp. $\sigma$-strongly convex) if and only if $\mathrm{dom}\, f$ is convex and, for every $p \in \mathrm{dom}\, f$, there exists $v \in T_pM$ such that $f(q) \ge f(p) + \langle v, \log_p q\rangle$ (resp. $f(q) \ge f(p) + \langle v, \log_p q\rangle + \frac{\sigma}{2}d^2(p,q)$) for all $q \in M$. In either case, $\partial f(p) \neq \emptyset$ for all $p \in \mathrm{dom}\, f$, and the inequality holds for every $v \in \partial f(p)$.

The proof of the following result follows immediately from [47, Proposition 2.5].

Proposition 9. Let $f : M \to \overline{\mathbb{R}}$ be a convex and lower semicontinuous function. Consider a sequence $(p^k)_{k\in\mathbb{N}} \subset \mathrm{int}\,\mathrm{dom}\, f$ such that $\lim_{k\to\infty} p^k = \bar p \in \mathrm{int}\,\mathrm{dom}\, f$. If $(v^k)_{k\in\mathbb{N}}$ is a sequence such that $v^k \in \partial f(p^k)$ for every $k \in \mathbb{N}$, then $(v^k)_{k\in\mathbb{N}}$ is bounded and its cluster points belong to $\partial f(\bar p)$.

3 The Busemann functions on Hadamard manifolds

In this section, we review Busemann functions on Hadamard manifolds, introducing the notation and collecting the properties needed in the sequel. Since our definition is slightly more general than the standard one, we include brief proofs of selected results.
For further background, see, for instance, [43]. Let $M$ be a Hadamard manifold with Riemannian distance $d$. Given a base point $q \in M$ and a vector $v \in T_qM$, the associated Busemann function is defined by

$B_{q,v}(p) := \lim_{t\to+\infty}\big[d\big(p, \exp_q(tv)\big) - \|v\|t\big]$, $\quad \forall p \in M$.  (12)

For $v = 0$, this reduces to $B_{q,0}(p) = d(q,p)$. Moreover, by the triangle inequality,

$|B_{q,v}(p)| \le d(q,p)$, $\quad \forall q, p \in M$, $\forall v \in T_qM$.  (13)

Remark 1. Classically, Busemann functions are defined for unit vectors $v \neq 0$; see, e.g., [20, Definition 8.17, p. 268] and [43, p. 174]. Since $B_{q,v} = B_{q,v/\|v\|}$ for $v \neq 0$, we extend the definition to arbitrary $v$, including $v = 0$, which is convenient for our purposes.

The following lemma summarizes the main regularity properties of Busemann functions and provides two equivalent expressions for their gradients.

Lemma 10. Let $q \in M$ and $v \in T_qM$ with $v \neq 0$. Then $B_{q,v}$ is convex and continuously differentiable on $M$, with

$\mathrm{grad}\, B_{q,v}(p) = -\lim_{t\to\infty} \frac{\log_p(\exp_q(tv))}{d(p, \exp_q(tv))}$  (14)

$\qquad\qquad\quad\ \ = -\frac{1}{\|v\|}\lim_{t\to\infty} \frac{\log_p(\exp_q(tv))}{t}$, $\quad \forall p \in M$.  (15)

Moreover, $\|\mathrm{grad}\, B_{q,v}(p)\| = 1$ for all $p \in M$, and $\mathrm{grad}\, B_{q,v}(q) = -v/\|v\|$.

Proof. The identity in (14) is established in the proof of [43, Lemma 4.12, p. 231]. To prove (15), note that by the triangle inequality we have $t\|v\| - d(p,q) \le d(p, \exp_q(tv)) \le t\|v\| + d(p,q)$ for all $t > 0$. Thus, dividing by $t$ and then taking the limit as $t \to +\infty$, we conclude that $\lim_{t\to\infty} d(p,\exp_q(tv))/t = \|v\|$. Combining this limit with (14), we obtain

$\mathrm{grad}\, B_{q,v}(p) = -\frac{1}{\|v\|}\lim_{t\to\infty}\frac{\log_p(\exp_q(tv))}{d(p,\exp_q(tv))}\,\lim_{t\to\infty}\frac{d(p,\exp_q(tv))}{t} = -\frac{1}{\|v\|}\lim_{t\to\infty}\frac{\log_p(\exp_q(tv))}{t}$,

which completes the proof.

Note that, for $v = 0$, we have $B_{q,0}(p) = d(q,p)$.
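The defining limit (12) can be observed numerically in the simplest setting, $M = \mathbb{R}^n$, where it converges at rate $O(1/t)$ to the flat-case closed form of Example 1 below. The snippet is a minimal sketch; `busemann_limit` is our name for the truncated limit.

```python
import numpy as np

def busemann_limit(p, q, v, t=1e7):
    # Busemann value via the defining limit (12), truncated at a large t;
    # in R^n the geodesic exp_q(tv) is simply q + t v
    return np.linalg.norm(p - (q + t * v)) - np.linalg.norm(v) * t

rng = np.random.default_rng(3)
p, q, v = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)

# flat-case closed form: B_{q,v}(p) = -<v/||v||, p - q>, cf. Example 1
closed_form = -(v @ (p - q)) / np.linalg.norm(v)
assert abs(busemann_limit(p, q, v) - closed_form) < 1e-4
# bound (13): |B_{q,v}(p)| <= d(q, p)
assert abs(busemann_limit(p, q, v)) <= np.linalg.norm(p - q) + 1e-6
```

Truncating at smaller values of $t$ shows the $O(1/t)$ error directly, e.g. comparing `t=1e3` with `t=1e6`.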
Therefore, we also obtain that $B_{q,0}$ is convex and continuously differentiable, with the gradient vector field $\mathrm{grad}\, B_{q,0}$ satisfying $\|\mathrm{grad}\, B_{q,0}(p)\| = 1$ for all $p \neq q$.

The following lemma, whose proof is straightforward and thus omitted, in particular implies that the Busemann function $B_{q,v}$ is linear along the geodesic $t \mapsto \exp_q(tv)$ that defines it.

Lemma 11. Let $q \in M$ be a base point and $v \in T_qM$. Then $B_{q,v}(\exp_q(\tau v)) = -\tau\|v\|$ for all $\tau \in \mathbb{R}$. Consequently, the Busemann function $B_{q,v}$ is unbounded both above and below.

The following lemma records a continuity property of Busemann functions that will be particularly useful in Section 5. A proof can be obtained by adapting the argument of [23, Lemma 5]; see also [3, Chapter II.1].

Lemma 12. Let $\bar q \in M$ and $\bar w \in T_{\bar q}M$. Consider sequences $(q^k)_{k\in\mathbb{N}} \subset M$ and $(v^k)_{k\in\mathbb{N}}$ with $v^k \in T_{q^k}M$, satisfying $\lim_{k\to+\infty} q^k = \bar q$ and $\lim_{k\to+\infty} v^k = \bar w$. Then, $\lim_{k\to+\infty} B_{q^k,v^k}(p) = B_{\bar q,\bar w}(p)$ for all $p \in M$.

We conclude this section with a useful identity, obtained directly from (12), which provides a practical means for computing Busemann functions.

Proposition 13. Let $q \in M$ be a base point and $v \in T_qM$ with $v \neq 0$. Then, there holds

$B_{q,v}(p) = \lim_{t\to+\infty} \frac{d^2\big(p, \exp_q(tv)\big) - (\|v\|t)^2}{2\|v\|t}$, $\quad \forall p \in M$.

3.1 Examples of Busemann functions

In this section, we provide examples of Busemann functions. We begin by introducing a fundamental property that establishes a stronger inequality than (13) in the case where the Busemann function is positive. This property not only serves as motivation for the examples presented on Hadamard manifolds with identically zero sectional curvature, but also plays an essential role in subsequent sections.

Lemma 14. Let $M$ be a Hadamard manifold.
Then the Busemann function $B_{q,v}$ defined in (12), associated with a base point $q \in M$ and a direction $v \in T_qM$, satisfies the inequality

$-\langle v, \log_q p\rangle \le \|v\|\, B_{q,v}(p)$, $\quad \forall p \in M$.  (16)

Moreover, if the sectional curvature satisfies $K \equiv 0$ on the whole of $M$, then inequality (16) holds as an equality: $\|v\|B_{q,v}(p) = -\langle v, \log_q p\rangle$ for all $p \in M$.

Proof. It is immediate that (16) holds for $v = 0$. We now assume $v \neq 0$. By applying Lemma 1 to the geodesic triangle with vertices $x = q$, $y = p$, and $z = \exp_q(tv)$, it follows that

$d^2(q,p) + d^2(q, \exp_q(tv)) - 2\big\langle \log_q p, \log_q \exp_q(tv)\big\rangle \le d^2(p, \exp_q(tv))$, $\quad \forall t > 0$.  (17)

Since $\log_q \exp_q(tv) = tv$ and $d(q, \exp_q(tv)) = t\|v\|$, it follows from the last inequality that

$d^2(q,p) - 2t\big\langle \log_q p, v\big\rangle \le d^2(p, \exp_q(tv)) - (\|v\|t)^2$, $\quad \forall t > 0$.

After performing some algebraic manipulations, the last inequality can be expressed as follows:

$\frac{d^2(q,p)}{2t} - \big\langle \log_q p, v\big\rangle \le \|v\|\, \frac{d^2\big(p, \exp_q(tv)\big) - (\|v\|t)^2}{2\|v\|t}$, $\quad \forall t > 0$.

Taking the limit in the previous inequality as $t \to +\infty$ and using Proposition 13, we obtain (16). Moreover, by Lemma 1, if the sectional curvature satisfies $K \equiv 0$ on $M$, then (17) holds with equality. Consequently, all subsequent inequalities are equalities, which proves the desired equality.

In the following example, we present an explicit formula for Busemann functions on a Hadamard manifold with identically zero sectional curvature.

Example 1. Let $M$ be a Hadamard manifold, $q \in M$ a base point, and $v \in T_qM$. If the sectional curvature of $M$ is identically zero, denoted by $K \equiv 0$, then the Busemann function $B_{q,v}$ is given by

$B_{q,v}(p) := \begin{cases} \big\langle -\frac{v}{\|v\|},\, \log_q p\big\rangle, & v \neq 0,\\[2pt] d(q,p), & v = 0.\end{cases}$  (18)

Indeed, when $K \equiv 0$ on $M$, Lemma 14 yields $\|v\|B_{q,v}(p) = -\langle v, \log_q p\rangle$ for all $p \in M$, which, together with $B_{q,0}(p) = d(q,p)$, implies (18).
In particular, for $M = \mathbb{R}^n$, we obtain

$B_{q,v}(p) := \begin{cases} -\big\langle \frac{v}{\|v\|},\, p - q\big\rangle, & v \neq 0,\\[2pt] \|p - q\|, & v = 0.\end{cases}$  (19)

Since $\log_q p = p - q$ and $d(q,p) = \|p - q\|$ in $\mathbb{R}^n$, (19) follows directly from (18). Next, we use (18) to derive an explicit expression for Busemann functions on the positive orthant endowed with the Dikin metric. A detailed study of the positive orthant with the Dikin metric is given in [38].

Example 2. Let $M := (\mathbb{R}^n_{++}, G)$ be the positive orthant $\mathbb{R}^n_{++}$ endowed with the Dikin metric $\langle u, v\rangle := u^T G(q) v$, where $u, v \in T_qM$ and $G(q) \in \mathbb{R}^{n\times n}$ is the diagonal matrix $G(q) := \mathrm{diag}\big(q_1^{-2}, \dots, q_n^{-2}\big)$, with $q_i$ denoting the $i$-th coordinate of the point $q$. The exponential map $\exp_q : T_qM \to M$ is given by

$\exp_q(v) = \big(q_1 e^{v_1/q_1}, \dots, q_n e^{v_n/q_n}\big)$, $\quad v := (v_1, \dots, v_n) \in T_qM \equiv \mathbb{R}^n$.

In addition, direct calculations show that the inverse of the exponential, $\log_q : M \to T_qM$, is given by

$\log_q p = \big(q_1 \ln(p_1/q_1), \dots, q_n \ln(p_n/q_n)\big)$, $\quad p := (p_1, \dots, p_n) \in M$.

Therefore, it follows from Example 1 that the Busemann function $B_{q,v}$ is given by

$B_{q,v}(p) := \begin{cases} -\frac{1}{\|v\|} \sum_{i=1}^n (v_i/q_i)\ln(p_i/q_i), & v \neq 0,\\[2pt] \big[\sum_{i=1}^n \big(\ln(p_i/q_i)\big)^2\big]^{1/2}, & v = 0.\end{cases}$

In the following example, we provide an explicit formula for Busemann functions on the $\kappa$-hyperbolic space form $\mathbb{H}^n_\kappa$. For the detailed computations, see Appendix 1.

Example 3. Let $\mathbb{H}^n_\kappa$ be the $\kappa$-hyperbolic space form introduced in Section 2.1. Let $q \in \mathbb{H}^n_\kappa$ be a base point and $v \in T_q\mathbb{H}^n_\kappa$. Then, the Busemann function $B_{q,v} : \mathbb{H}^n_\kappa \to \mathbb{R}$ is given by

$B_{q,v}(p) := \begin{cases} \frac{1}{\sqrt{\kappa}}\ln\Big(-\big\langle p,\, \kappa q + \frac{\sqrt{\kappa}}{\|v\|}v\big\rangle\Big), & v \neq 0,\\[2pt] d(q,p), & v = 0,\end{cases}$  (20)

and its gradient, for $v \in T_q\mathbb{H}^n_\kappa$ with $v \neq 0$, is given by

$\mathrm{grad}\, B_{q,v}(p) = \frac{1}{\sqrt{\kappa}}\, \frac{1}{\big\langle p,\, \kappa q + \frac{\sqrt{\kappa}}{\|v\|}v\big\rangle}\Big[\kappa q + \frac{\sqrt{\kappa}}{\|v\|}v + \kappa\Big\langle \kappa q + \frac{\sqrt{\kappa}}{\|v\|}v,\, p\Big\rangle p\Big]$.  (21)
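On $\mathbb{H}^n_\kappa$ the defining limit (12) converges exponentially fast, so the closed form (20) can be verified numerically against a truncated limit. The sketch below does this for a unit direction on $\mathbb{H}^2_\kappa$; all function names are ours, and the hyperbolic maps follow Section 2.1.

```python
import numpy as np

def lorentz(x, y):
    return x[:-1] @ y[:-1] - x[-1] * y[-1]

def dist(kappa, p, q):
    # intrinsic distance (4)
    return np.arccosh(max(-kappa * lorentz(p, q), 1.0)) / np.sqrt(kappa)

def exp_map(kappa, q, v):
    nv = np.sqrt(lorentz(v, v))
    s = np.sqrt(kappa) * nv
    return np.cosh(s) * q + np.sinh(s) * v / (np.sqrt(kappa) * nv)

def busemann_closed(kappa, q, v, p):
    # formula (20), case v != 0
    u = kappa * q + np.sqrt(kappa) * v / np.sqrt(lorentz(v, v))
    return np.log(-lorentz(p, u)) / np.sqrt(kappa)

def busemann_limit(kappa, q, v, p, t=30.0):
    # definition (12), truncated at t; error decays exponentially in t
    return dist(kappa, p, exp_map(kappa, q, t * v)) - np.sqrt(lorentz(v, v)) * t

kappa = 2.0
rng = np.random.default_rng(4)
def rand_pt():
    x = rng.normal(size=2)
    return np.append(x, np.sqrt(1.0 / kappa + x @ x))

q, p = rand_pt(), rand_pt()
z = rng.normal(size=3)
v = z + kappa * lorentz(q, z) * q          # tangent vector at q
v = v / np.sqrt(lorentz(v, v))             # normalize to a unit direction
assert abs(busemann_closed(kappa, q, v, p) - busemann_limit(kappa, q, v, p)) < 1e-8
```

The exponential convergence here contrasts with the $O(1/t)$ rate in the flat case, reflecting the absence of flat directions in $\mathbb{H}^n_\kappa$.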
Particularly, if $p = q$, then due to $\langle q,q\rangle = -\frac{1}{\kappa}$ and $\langle q,v\rangle = 0$, the final equation simplifies to

$\mathrm{grad}\, B_{q,v}(q) = -\frac{1}{\sqrt{\kappa}}\Big(\kappa q + \frac{\sqrt{\kappa}}{\|v\|}v - \kappa q\Big) = -\frac{v}{\|v\|}$.

In the next example, we derive explicit expressions for Busemann functions and their Riemannian gradients on the manifold of symmetric positive definite matrices described in Section 2.2. Although alternative formulas are available in the literature [20, Proposition 10.69], [33, Lemma 2.32], we introduce a new representation that is computationally cheaper and better suited for numerical applications. For completeness, Appendix 2 provides a direct derivation that avoids the general theory of symmetric spaces used in previous approaches.

Example 4. Let $\mathcal{P}(n)$ be endowed with the structure of a Hadamard manifold as introduced in Section 2.2. Given $X, Y \in \mathcal{P}(n)$ and $V \in \mathcal{S}(n)\setminus\{0\}$, consider the spectral decomposition

$Y^{-1/2}VY^{-1/2} = UDU^T$, $\qquad D := \begin{pmatrix} \lambda_1 I_{n_1} & 0 & \cdots & 0\\ 0 & \lambda_2 I_{n_2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_k I_{n_k}\end{pmatrix}$.

Here, $\lambda_1, \dots, \lambda_k$ are the distinct eigenvalues of the matrix $Y^{-1/2}VY^{-1/2}$, ordered such that $\lambda_1 < \cdots < \lambda_k$; $n_i$ is the multiplicity of $\lambda_i$; $I_{n_i} \in \mathbb{R}^{n_i\times n_i}$ is the identity matrix; and $U \in \mathbb{R}^{n\times n}$ is an orthogonal matrix. Let $U^T Y^{-1/2} X Y^{-1/2} U = LL^T$ be the Cholesky decomposition. Then, the Busemann function $B_{Y,V}$ evaluated at $X$ is given by

$B_{Y,V}(X) = -2\big(n_1\lambda_1^2 + \cdots + n_k\lambda_k^2\big)^{-1/2} \sum_{i=1}^{k}\ \sum_{j=\alpha_{i-1}+1}^{\alpha_i} \lambda_i \ln(L_{jj})$,  (22)

where $L_{jj} > 0$ denotes the $j$-th diagonal entry of $L$, $\alpha_0 = 0$, and $\alpha_i = \sum_{j=1}^{i} n_j$ for $i = 1, \dots, k$. Moreover, the Riemannian gradient of $B_{Y,V}$ at $X$ is given by

$\mathrm{grad}\, B_{Y,V}(X) = -\big(n_1\lambda_1^2 + \cdots + n_k\lambda_k^2\big)^{-1/2}\, Y^{1/2} U L D L^T U^T Y^{1/2}$.  (23)

For more explicit examples of Busemann functions, see, for instance, [10, 20, 33].
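Formulas (22) and (23) can be implemented with one eigendecomposition and one Cholesky factorization, and then checked against properties established earlier: $B_{Y,V}(Y) = 0$, the linearity of Lemma 11 along the defining geodesic, the bound (13), and the unit gradient norm of Lemma 10. The sketch below assumes distinct eigenvalues in the random instance (so the blocks in (22) are singletons); all helper names are ours.

```python
import numpy as np
from scipy.linalg import expm, eigh

def spd_sqrt(Y):
    w, Q = np.linalg.eigh(Y)
    return (Q * np.sqrt(w)) @ Q.T

def spd_dist(X, Y):
    # affine-invariant distance (7), via the eigenvalues of Y^{-1} X
    lam = eigh(X, Y, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def spd_exp(Y, V):
    Yh = spd_sqrt(Y); Yi = np.linalg.inv(Yh)
    return Yh @ expm(Yi @ V @ Yi) @ Yh

def busemann_spd(X, Y, V):
    # Busemann value (22) and Riemannian gradient (23)
    Yh = spd_sqrt(Y); Yi = np.linalg.inv(Yh)
    w, U = np.linalg.eigh(Yi @ V @ Yi)        # eigenvalues in ascending order
    nu = np.linalg.norm(w)                    # (n_1 l_1^2 + ... + n_k l_k^2)^{1/2}
    L = np.linalg.cholesky(U.T @ Yi @ X @ Yi @ U)
    val = -2.0 / nu * np.sum(w * np.log(np.diag(L)))
    grad = -(1.0 / nu) * Yh @ U @ L @ np.diag(w) @ L.T @ U.T @ Yh
    return val, grad

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3)); Y = A @ A.T + 3 * np.eye(3)
Bm = rng.normal(size=(3, 3)); X = Bm @ Bm.T + 3 * np.eye(3)
V = rng.normal(size=(3, 3)); V = V + V.T
Yi = np.linalg.inv(spd_sqrt(Y))
nu = np.linalg.norm(np.linalg.eigvalsh(Yi @ V @ Yi))  # ||V||_Y

val, grad = busemann_spd(X, Y, V)
assert abs(busemann_spd(Y, Y, V)[0]) < 1e-10          # B_{Y,V}(Y) = 0
tau = 0.7                                             # Lemma 11: B = -tau ||V||_Y
assert abs(busemann_spd(spd_exp(Y, tau * V), Y, V)[0] + tau * nu) < 1e-8
assert abs(val) <= spd_dist(X, Y) + 1e-10             # bound (13)
Xi = np.linalg.inv(spd_sqrt(X))                       # Lemma 10: unit gradient
assert abs(np.linalg.norm(Xi @ grad @ Xi, 'fro') - 1.0) < 1e-8
```

Compared with evaluating the limit (12) directly, which requires geodesic points $\exp_Y(tV)$ with exponentially spread eigenvalues, this representation needs only well-conditioned factorizations, which is the computational advantage alluded to above.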
As noted in Remark 1, the examples in these references need to be adapted to fit our definition.

4 Characterization of the subdifferential via Busemann functions

In this section, we provide a Busemann-function characterization of the classical subdifferential on Hadamard manifolds. More precisely, we show that the usual subgradient inequality can be equivalently expressed in terms of support functions built from Busemann functions. This viewpoint yields an intrinsic geometric representation of subgradients and equips the resulting support models with a more suitable structure for subsequent developments. In particular, the Busemann-based support function enjoys concavity properties under our convention, a feature that will be crucial in the algorithmic analysis carried out in the next sections. It is well known that when the Hadamard manifold $M$ has identically zero sectional curvature, the function appearing on the right-hand side of (11) in Definition 6, namely,

$M \ni p \mapsto f(q) + \big\langle s, \log_q p\big\rangle$,  (24)

is affine in the sense that its Riemannian Hessian vanishes identically. This affine support model is central to many developments in Euclidean nonsmooth optimization. However, when the curvature of $M$ is nonzero, the function (24) is no longer affine and, in general, it is neither geodesically convex nor geodesically concave; see [35]. This motivates the search for an alternative support representation that is more suitable for optimization on Hadamard manifolds. On the other hand, by Example 1, when the sectional curvature of $M$ is identically zero, i.e., $K \equiv 0$, we have the equality

$\big\langle s, \log_q p\big\rangle = -\|s\|\, B_{q,s}(p)$.
Although in general ⟨s, log_q p⟩ ≠ −∥s∥ B_{q,s}(p), the flat-case identity suggests replacing the model (24) by the Busemann-based support function
\[
p \mapsto f(q) - \|s\| B_{q,s}(p), \qquad p \in M, \tag{25}
\]
with the aim of obtaining an intrinsic characterization of the classical subdifferential. In this context, Lemma 10 shows that taking (25) as a support function yields the concavity property required in our subsequent analysis. The next theorem establishes this characterization for σ-strongly convex functions; in particular, taking σ = 0 recovers the convex case.

Theorem 15. Let M be a Hadamard manifold and let f : M → R be a σ-strongly convex function with σ ≥ 0. Then, for every q ∈ dom f, the subdifferential of f at q admits the characterization
\[
\partial f(q) = \Big\{ s \in T_qM : f(p) \geq f(q) - \|s\| B_{q,s}(p) + \frac{\sigma}{2}\, d^2(p,q), \ \forall p \in M \Big\}.
\]

Proof. We begin by proving that ∂f(q) is contained in the set on the right-hand side. Let s ∈ ∂f(q). Since f is σ-strongly convex, by Theorem 8 we have f(p) ≥ f(q) + ⟨s, log_q p⟩ + (σ/2)d²(p,q) for all p ∈ M. On the other hand, Lemma 14 guarantees that −∥s∥B_{q,s}(p) ≤ ⟨s, log_q p⟩ for all p ∈ M. Combining the last two inequalities gives f(p) ≥ f(q) − ∥s∥B_{q,s}(p) + (σ/2)d²(p,q) for all p ∈ M, which shows that
\[
s \in \Big\{ s \in T_qM : f(p) \geq f(q) - \|s\| B_{q,s}(p) + \frac{\sigma}{2}\, d^2(p,q), \ \forall p \in M \Big\}, \tag{26}
\]
and the first inclusion follows. We now prove the reverse inclusion. To that end, take s in the set in (26). If s = 0, then f(p) ≥ f(q) + (σ/2)d²(p,q) ≥ f(q) for all p ∈ M, which implies that 0 ∈ ∂f(q). Assume now that s ≠ 0 and let v ∈ T_qM. If v ∉ dom f′(q; ·), then f′(q; v) = +∞ and thus ⟨s, v⟩ ≤ f′(q; v). Otherwise, take v ∈ dom f′(q; ·).
Then there exists t̂ > 0 such that exp_q(tv) ∈ dom f for all t ∈ [0, t̂), and
\[
f(\exp_q(tv)) \geq f(q) - \|s\|\, B_{q,s}(\exp_q(tv)) + \frac{\sigma}{2}\, d^2(\exp_q(tv), q), \qquad \forall t \in [0, \hat t\,).
\]
Since d(exp_q(tv), q) = t∥v∥ and B_{q,s}(q) = 0, dividing both sides by t > 0 yields
\[
\frac{f(\exp_q(tv)) - f(q)}{t} \geq -\|s\|\, \frac{B_{q,s}(\exp_q(tv)) - B_{q,s}(q)}{t} + \frac{\sigma}{2}\, t \|v\|^2, \qquad \forall t \in (0, \hat t\,).
\]
Taking the limit as t → 0⁺ and using Lemma 10, namely grad B_{q,s}(q) = −s/∥s∥, we obtain f′(q; v) ≥ −∥s∥⟨grad B_{q,s}(q), v⟩ = ⟨s, v⟩. Therefore, ⟨s, v⟩ ≤ f′(q; v) for all v ∈ T_qM. Hence, by Proposition 7, we conclude that s ∈ ∂f(q), which together with (26) completes the proof. □

As a direct consequence of Theorem 15, we obtain a variational characterization of subgradients in terms of global minimizers of a Busemann-based support model, with an additional quadratic term accounting for σ-strong convexity.

Corollary 16. Let M be a Hadamard manifold, let f : M → R be a σ-strongly convex function with σ ≥ 0, and let q ∈ dom f. Then, s ∈ ∂f(q) if and only if
\[
q \in \operatorname*{arg\,min}_{p \in M} \Big\{ f(p) + \|s\| B_{q,s}(p) - \frac{\sigma}{2}\, d^2(p,q) \Big\}.
\]
As a consequence, if s ∈ ∂f(q), then
\[
f(q) = \min_{p \in M} \Big\{ f(p) + \|s\| B_{q,s}(p) - \frac{\sigma}{2}\, d^2(p,q) \Big\}.
\]

Proof. Let q ∈ dom f and s ∈ T_qM, and define the auxiliary function ψ : M → R by
\[
\psi(p) := f(p) + \|s\| B_{q,s}(p) - \frac{\sigma}{2}\, d^2(p,q), \qquad p \in M.
\]
Since B_{q,s}(q) = 0 and d(q,q) = 0, we have ψ(q) = f(q). Assume first that s ∈ ∂f(q). Then, by Theorem 15,
\[
f(p) \geq f(q) - \|s\| B_{q,s}(p) + \frac{\sigma}{2}\, d^2(p,q), \qquad \forall p \in M.
\]
Rearranging and using the definition of ψ, we obtain ψ(p) ≥ ψ(q) for all p ∈ M, which implies that q ∈ argmin_{p∈M} ψ(p). For the converse, assume that q ∈ argmin_{p∈M} ψ(p).
Then ψ(p) ≥ ψ(q) for all p ∈ M, that is, f(p) + ∥s∥B_{q,s}(p) − (σ/2)d²(p,q) ≥ f(q) for all p ∈ M, which is equivalent to
\[
f(p) \geq f(q) - \|s\| B_{q,s}(p) + \frac{\sigma}{2}\, d^2(p,q), \qquad \forall p \in M.
\]
By Theorem 15, the latter implies s ∈ ∂f(q), and the equivalence is proved. For the last statement, if s ∈ ∂f(q), then by the first part we have q ∈ argmin_{p∈M} ψ(p), hence inf_{p∈M} ψ(p) = ψ(q). Since ψ(q) = f(q), the conclusion follows. □

We next put Corollary 16 in perspective by comparing its variational characterization with the Busemann-subgradient notion of [32], emphasizing the flat case, where they coincide, and the nonflat case, where they may differ.

Remark 2. In the context of Hadamard manifolds, Corollary 16 is closely related in nature to the notion of Busemann subgradient in [32, Definition 3.1], although the resulting supporting objects differ in general. In the convex case σ = 0, Corollary 16 states that s ∈ ∂f(q) if and only if
\[
q \in \operatorname*{arg\,min}_{p \in M} \big\{ f(p) + \|s\| B_{q,s}(p) \big\}. \tag{27}
\]
On the other hand, in the Hadamard manifold setting one may express [32, Definition 3.1] in our framework as follows: a Busemann subgradient at a point x ∈ dom f can be represented by a vector ξ ∈ T_xM such that
\[
x \in \operatorname*{arg\,min}_{y \in M} \big\{ f(y) - \|\xi\| B_{x,-\xi}(y) \big\}; \tag{28}
\]
see also [23, Definition 1]. In the flat case (sectional curvature identically zero), Example 1 yields ∥s∥B_{q,s}(p) = −⟨s, log_q p⟩, ∥ξ∥B_{x,−ξ}(y) = ⟨ξ, log_x y⟩, and B_{x,−ξ} = −B_{x,ξ}. Hence (28) is equivalent to x ∈ arg min_{y∈M} {f(y) + ∥ξ∥B_{x,ξ}(y)}, which coincides with (27) under the identification q = x and s = ξ. In particular, both notions recover the classical Euclidean supporting-hyperplane condition, and the corresponding supports coincide. In contrast, on nonflat Hadamard manifolds, (27) and (28) are not equivalent in general.
A key point is that [32] requires a global support parameterized by an ideal direction and a speed, and such a support may fail to exist even for geodesically convex functions; see [32, Example 3.2]. Consequently, if one defines the b-subdifferential at x by
\[
\partial_b f(x) := \big\{ \xi \in T_xM : \text{(28) holds} \big\},
\]
or, equivalently, by
\[
\partial_b f(x) := \big\{ \xi \in T_xM : f(y) \geq f(x) + \|\xi\| B_{x,-\xi}(y), \ \forall y \in M \big\},
\]
then ∂_b f(x) ⊂ ∂f(x), and this inclusion can be strict beyond the flat case.

As a further consequence of Theorem 15, we recover the following classical bound linking Lipschitz continuity and the norm of subgradients.

Corollary 17. Let f : M → R be convex and L-Lipschitz on M, i.e., |f(x) − f(y)| ≤ L d(x,y) for all x, y ∈ M. Then, for every q ∈ dom f and every s ∈ ∂f(q), there holds ∥s∥ ≤ L.

Proof. Fix q ∈ dom f and s ∈ ∂f(q). If s = 0, the conclusion is trivial. Assume s ≠ 0, and for τ > 0 set p_τ := exp_q(τs). By Theorem 15 and Lemma 11, we have
\[
f(p_\tau) \geq f(q) - \|s\|\, B_{q,s}(p_\tau) = f(q) + \tau \|s\|^2.
\]
On the other hand, since f is L-Lipschitz on M, we conclude that f(p_τ) ≤ f(q) + L d(q, p_τ) = f(q) + Lτ∥s∥. Combining the two inequalities and dividing by τ > 0 yields ∥s∥² ≤ L∥s∥, hence ∥s∥ ≤ L. □

As a further application of Theorem 15, we obtain an intrinsic characterization of σ-strong convexity in terms of Busemann-based support inequalities.

Proposition 18. Let f : M → R be a function. Then f is σ-strongly convex with σ ≥ 0 if and only if dom f is convex and, for every q ∈ dom f, there exists v ∈ T_qM such that
\[
f(p) \geq f(q) - \|v\| B_{q,v}(p) + \frac{\sigma}{2}\, d^2(p,q), \qquad \forall p \in M.
\]
In this case, ∂f(q) ≠ ∅ for all q ∈ dom f, and the inequality holds for every v ∈ ∂f(q).

Proof. First assume that f is σ-strongly convex. Then, by Theorem 8, dom f is convex and ∂f(q) ≠ ∅ for all q ∈ dom f.
Fix q ∈ dom f and take v ∈ ∂f(q). Applying Theorem 15, we obtain
\[
f(p) \geq f(q) - \|v\| B_{q,v}(p) + \frac{\sigma}{2}\, d^2(p,q), \qquad \forall p \in M,
\]
which in particular shows the existence of such a vector at each q. Conversely, assume that dom f is convex and that for every q ∈ dom f there exists v ∈ T_qM such that
\[
f(p) \geq f(q) - \|v\| B_{q,v}(p) + \frac{\sigma}{2}\, d^2(p,q), \qquad \forall p \in M.
\]
By Theorem 15, it follows that v ∈ ∂f(q) for every q ∈ dom f; in particular, ∂f(q) ≠ ∅ for all q ∈ dom f. Hence, by Definition 6, for each q ∈ dom f there exists u ∈ T_qM (namely, u := v) such that
\[
f(p) \geq f(q) + \langle u, \log_q p \rangle + \frac{\sigma}{2}\, d^2(p,q), \qquad \forall p \in M.
\]
Applying Theorem 8, we conclude that f is σ-strongly convex. Finally, once σ-strong convexity is established, Theorem 15 ensures that the Busemann support inequality holds for every v ∈ ∂f(q). □

We conclude this section by comparing our Busemann support characterization of geodesic convexity with horospherical convexity [23], noting that they coincide in the flat case but may differ in the nonflat case.

Remark 3. Proposition 18 is conceptually different from the horospherical support inequality in [23, Definition 1], although it plays an analogous supporting role; it provides a characterization of geodesic convexity via Busemann-based supports. Indeed, when σ = 0, Proposition 18 asserts that for every q ∈ dom f there exists v ∈ T_qM such that f(p) − f(q) ≥ −∥v∥B_{q,v}(p) for all p ∈ M. For comparison, the defining inequality of h-convexity in [23, Eq. (6)] can be written as f(p) − f(q) ≥ ∥v∥B_{q,−v}(p). Moreover, by Example 1, in the flat case (sectional curvature identically zero) one has ∥v∥B_{q,v}(p) = −⟨v, log_q p⟩ and B_{q,−v} = −B_{q,v}, so the two support inequalities coincide and both reduce to the classical affine support characterization of convexity in R^n.
In contrast, on nonflat Hadamard manifolds the antisymmetry B_{q,−v} = −B_{q,v} generally fails, so horospherical supports built from B_{q,−v} and the Busemann supports in Proposition 18, which are expressed in terms of −B_{q,v}, need not coincide; see [23, Sec. 3.5] for a conceptual discussion of the genuinely global nature of h-convexity. Finally, Proposition 18 guarantees that, in the convex case, the support inequality holds for every Riemannian subgradient v ∈ ∂f(q), whereas the horospherical subdifferential ∂_h f(q) in [23] consists only of those directions producing a global horospherical support. In fact, one can show that ∂_h f(q) ⊂ ∂f(q), and this inclusion can be strict beyond the flat case.

5 The Busemann DC algorithm for DC optimization

In this section, we develop a Busemann-based DC algorithm for DC optimization on Hadamard manifolds. The main idea is to replace the standard linearization step in classical DC schemes by a Busemann-type support term, thereby producing geodesically convex subproblems that better reflect the ambient geometry. This yields a geometric DCA framework and a practical algorithmic tool for DC programs on Hadamard manifolds. We consider the difference-of-convex (DC) optimization problem on a Hadamard manifold M, defined by
\[
\operatorname*{arg\,min}_{p \in M} \phi(p), \qquad \phi(p) := g(p) - h(p), \tag{29}
\]
where g, h : M → R are proper, lower semicontinuous, and geodesically convex. Although ϕ is the difference of two convex functions, it is in general not convex. However, ϕ can be shown to be locally Lipschitz on the interior of its domain. Consequently, ϕ possesses a subdifferential in the Clarke sense, denoted by ∂°ϕ, as detailed in [8, 9]. Moreover, it can be shown that ∂°ϕ(p) ⊂ ∂g(p) − ∂h(p).
Therefore, a necessary condition for a point p* ∈ M to be a local minimum of ϕ = g − h is that 0 ∈ ∂°ϕ(p*) ⊂ ∂g(p*) − ∂h(p*). This leads us to define a critical point of problem (29) as follows:

Definition 19. A point p* ∈ M is called a critical point of problem (29) if ∂g(p*) ∩ ∂h(p*) ≠ ∅.

For further discussion of the definition of a critical point of problem (29), see, for example, [26]. In the following two sections, we employ Busemann functions to introduce and analyze two variants of the EDCA for solving problem (29). In addition to properness, lower semicontinuity, and convexity, we work under the following assumptions:

(A1) M is a Hadamard manifold;
(A2) ϕ_inf := inf_{x∈M} ϕ(x) > −∞;
(A3) dom g and dom h are convex and dom g ⊆ int dom h.

We first comment on the assumptions above. Assumption (A1) is fundamental for the analysis of the algorithms proposed in the following sections, since it is repeatedly used through Theorem 15. Under assumption (A2), the domain of ϕ equals the domain of g, that is, dom ϕ = dom g ⊆ dom h. Indeed, suppose by contradiction that dom g ⊄ dom h. Then there exists a point p ∈ dom g such that p ∉ dom h, and (1) gives ϕ(p) = g(p) − h(p) = g(p) − (+∞) = −∞, contradicting assumption (A2). Consequently, dom g ⊆ dom h, and thus dom g ⊆ dom ϕ. Conversely, suppose by contradiction that dom ϕ ⊄ dom g. Then there is some p ∈ dom ϕ for which g(p) = +∞, and (1) yields ϕ(p) = g(p) − h(p) = (+∞) − h(p) = +∞, which contradicts p ∈ dom ϕ. Thus, dom ϕ ⊆ dom g. Therefore, dom ϕ = dom g and, since dom g ⊆ dom h under assumption (A2), assumption (A3) is only marginally more restrictive than (A2).
Additionally, if dom h = M, then assumption (A3) is automatically satisfied.

5.1 The Busemann DC algorithm

In this subsection, we begin by revisiting the classical difference-of-convex algorithm on Hadamard manifolds, introduced and analyzed in [13] as a manifold counterpart of the EDCA. Since the function involved in the subproblem of the classical DCA on Hadamard manifolds is not convex in general, solving the subproblems is challenging. Here, by using Busemann functions, we propose a new version of this method that overcomes this limitation. Specifically, the function in the subproblem is now convex, enabling more effective optimization within the Riemannian context. To propose and analyze the method, we assume that the functions in problem (29) satisfy the following hypothesis:

(H1) g : M → R and h : M → R are σ-strongly convex and lsc, where σ > 0.

We begin by showing that (H1) is not restrictive. Indeed, let q ∈ M and σ > 0, and consider the function M ∋ p ↦ (σ/2)d²(q,p), which is σ-strongly convex, as shown in [25, Corollary 3.1]. If g̃ : M → R and h̃ : M → R are convex, then by choosing q ∈ M and defining g(p) = g̃(p) + (σ/2)d²(q,p) and h(p) = h̃(p) + (σ/2)d²(q,p), we obtain two σ-strongly convex functions g and h on M. Furthermore, ϕ = g̃ − h̃ = g − h. However, users of the Riemannian difference-of-convex algorithm proposed below should be aware that the choice of the parameter σ > 0 significantly influences the convergence rate of the methods employed to solve the subproblem, as well as of the overall method. Next, we revisit a Riemannian version of the EDCA which, keeping the same terminology, we refer to as the classical Riemannian difference of convex algorithm (CR-DCA).
This algorithm, used to solve the DC problem (29), is formally stated as follows:

Algorithm 1 Classical Riemannian difference of convex algorithm (CR-DCA)
Step 0. Choose p_0 ∈ dom g and set k ← 0.
Step 1. Take s_k ∈ ∂h(p_k) and define the next iterate p_{k+1} by
\[
p_{k+1} = \operatorname*{arg\,min}_{p \in M} \big\{ g(p) - \langle s_k, \log_{p_k} p \rangle \big\}. \tag{30}
\]
Step 2. If p_{k+1} = p_k, then stop and return the point p_k. Otherwise, set k ← k + 1 and go to Step 1.

In the Euclidean setting, the inverse of the exponential map is given by M ∋ p ↦ log_{p_k} p = p − p_k. Therefore, Algorithm 1 is a Hadamard-manifold version of the EDCA. The motivation behind the EDCA lies in the fact that, in the Euclidean setting, the function M ∋ p ↦ g(p) − ⟨s_k, p − p_k⟩ in the subproblem (30) is convex. In this way, one replaces the solution of the nonconvex problem (29) with the solution of a sequence of convex subproblems. However, as discussed in Section 4, the function in the subproblem (30) is not convex on general Hadamard manifolds, posing challenges for its solution. To address this limitation, we redefine the CR-DCA by employing Busemann functions, so that the function in the counterpart of subproblem (30) becomes convex. The resulting Busemann difference of convex algorithm (B-DCA) for solving the DC problem (29) is stated below:

Algorithm 2 Busemann difference of convex algorithm (B-DCA)
Step 0. Choose p_0 ∈ dom g and set k ← 0.
Step 1. Take s_k ∈ ∂h(p_k) and define the next iterate p_{k+1} by
\[
p_{k+1} := \operatorname*{arg\,min}_{p \in M} \big\{ g(p) + \|s_k\| B_{p_k,s_k}(p) \big\}. \tag{31}
\]
Step 2. If p_{k+1} = p_k, then stop and return the point p_k. Otherwise, set k ← k + 1 and go to Step 1.

It is noteworthy that when the curvature of the Hadamard manifold is K ≡ 0, Example 1 establishes the equivalence between Algorithm 1 and Algorithm 2.
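In this flat regime the two subproblems coincide, and the common outer loop can be sketched in a few lines of Python. The sketch below is our own illustration (not the paper's Matlab implementation): `sub_h` and `solve_sub` are hypothetical helper names, and the one-dimensional toy instance ϕ(x) = x² − |x| with g(x) = x² and h(x) = |x| is chosen only because its subproblem argmin_x {x² − s·x} = s/2 has a closed form:

```python
import numpy as np

def dca(sub_h, solve_sub, p0, tol=1e-10, max_iter=100):
    # Flat-case (K = 0) outer loop shared by Algorithms 1 and 2:
    # pick s_k in the subdifferential of h at p_k, then solve the convex
    # subproblem  argmin_p { g(p) - <s_k, p - p_k> }  exactly.
    p = p0
    for _ in range(max_iter):
        s = sub_h(p)              # s_k in dh(p_k)
        p_next = solve_sub(s)     # exact minimizer of the subproblem
        if abs(p_next - p) <= tol:   # stopping rule of Step 2
            return p_next
        p = p_next
    return p

# Toy DC instance: phi(x) = x**2 - |x|, g(x) = x**2, h(x) = |x|;
# the subproblem argmin_x { x**2 - s*x } has the closed-form solution s/2.
p_star = dca(lambda x: np.sign(x) if x != 0.0 else 1.0,
             lambda s: s / 2.0, p0=3.0)
```

Starting from p_0 = 3, the first iterate is s_0/2 = 1/2, which is then a fixed point; indeed, 1 ∈ ∂g(1/2) ∩ ∂h(1/2), so x = 1/2 is a critical point in the sense of Definition 19 (here it is also a global minimizer, with ϕ(1/2) = −1/4).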
In particular, Example 1 serves to illustrate that B-DCA aligns with the EDCA. Furthermore, comparing the objective in subproblem (30) with that in subproblem (31), we introduce the function ϕ_k : M → R defined by
\[
\phi_k(p) := g(p) + \|s_k\| B_{p_k,s_k}(p). \tag{32}
\]
Then, in view of assumption (H1), the function ϕ_k is σ-strongly convex. Before proceeding with the analysis of Algorithm 2, observe that the point p_{k+1}, as a solution of (31), satisfies
\[
g(p) + \|s_k\| B_{p_k,s_k}(p) \geq g(p_{k+1}) + \|s_k\| B_{p_k,s_k}(p_{k+1}), \qquad \forall p \in M. \tag{33}
\]
To further advance the analysis of Algorithm 2, we first establish that the algorithm is well defined. This fundamental property is addressed in the following proposition.

Proposition 20. Algorithm 2 is well defined, i.e., p_k ∈ dom g for all k = 0, 1, …, and the subproblem in (31) has a unique solution. Moreover, if p_{k+1} = p_k, then p_k is a critical point of ϕ.

Proof. Assume that p_k ∈ dom g. Using assumption (A3), we conclude that p_k ∈ int dom h. Thus, by [30, Theorem 3.3], we have ∂h(p_k) ≠ ∅. Let s_k ∈ ∂h(p_k). We now show that the subproblem in (31) has a unique solution. Since, by (H1), g : M → R is σ-strongly convex, it follows from Theorem 8 that
\[
g(p) \geq g(p_k) + \langle v, \log_{p_k} p \rangle + \frac{\sigma}{2}\, d^2(p_k, p), \qquad \forall p \in M, \ \forall v \in \partial g(p_k).
\]
Thus, considering ϕ_k(p) = g(p) + ∥s_k∥B_{p_k,s_k}(p) and employing the last inequality, we deduce, for all p ∈ M with p ≠ p_k and all v ∈ ∂g(p_k),
\[
\frac{\phi_k(p)}{d(p_k,p)} \geq \frac{g(p_k)}{d(p_k,p)} + \Big\langle v, \frac{\log_{p_k} p}{d(p_k,p)} \Big\rangle + \frac{\sigma}{2}\, d(p_k,p) + \|s_k\| \frac{B_{p_k,s_k}(p)}{d(p_k,p)}. \tag{34}
\]
Observe that (13) yields |B_{p_k,s_k}(p)| ≤ d(p_k,p) for all p ∈ M, and d(p_k,p) = ∥log_{p_k} p∥. Thus, taking the limit in (34), we obtain lim_{d(p_k,p)→+∞} ϕ_k(p)/d(p_k,p) = +∞.
In particular, we conclude that lim_{d(p_k,p)→+∞} ϕ_k(p) = +∞. Hence, using Proposition 4, we conclude that ϕ_k has a global minimizer. Therefore, the subproblem in (31) has a global solution and, since ϕ_k is σ-strongly convex, this solution is unique. Consequently, there exists a unique p_{k+1} ∈ dom g = dom ϕ satisfying (31), which implies that Algorithm 2 is well defined. To prove the last statement, assume that p_{k+1} = p_k. Then (33) implies that g(p) ≥ g(p_k) − ∥s_k∥B_{p_k,s_k}(p) for all p ∈ M, which, by Theorem 15, shows that s_k ∈ ∂g(p_k). Hence, taking into account that s_k ∈ ∂h(p_k), we conclude that s_k ∈ ∂g(p_k) ∩ ∂h(p_k) ≠ ∅. Therefore, it follows from Definition 19 that p_k is a critical point of ϕ in problem (29). □

By Proposition 20, Algorithm 2 is well defined and generates a sequence (p_k)_{k∈N}. Since the termination condition implies that a critical point of ϕ has been attained at the final iteration, from now on we assume that this sequence is infinite.

5.2 Convergence analysis

In this section, we conduct a detailed analysis of the B-DCA, stated as Algorithm 2, under assumptions (A1), (A2), (A3), and (H1). The theoretical results obtained herein correspond to those obtained for the EDCA and CR-DCA. Nonetheless, in the B-DCA variant the use of Busemann functions renders the subproblem convex, a notable departure from the CR-DCA. This fundamental advancement simplifies the process of solving the subproblems, given that convex problems have significantly lower computational complexity than nonconvex ones. We begin by establishing a descent property of the algorithm and showing that the distance between consecutive iterates converges to zero as k tends to infinity.

Proposition 21.
Let (p_k)_{k∈N} be a sequence generated by Algorithm 2. Then it satisfies the inequality
\[
\phi(p_{k+1}) \leq \phi(p_k) - \frac{\sigma}{2}\, d^2(p_k, p_{k+1}), \qquad \forall k \in \mathbb{N}. \tag{35}
\]
As a consequence, the sequence (ϕ(p_k))_{k∈N} is strictly decreasing and converges. Moreover, lim_{k→+∞} d(p_k, p_{k+1}) = 0.

Proof. Since d(p_k, p_k) = 0, it follows from (13) that B_{p_k,s_k}(p_k) = 0. Thus, using inequality (33) with p = p_k, we conclude that g(p_k) ≥ g(p_{k+1}) + ∥s_k∥B_{p_k,s_k}(p_{k+1}). Moreover, taking into account that h is σ-strongly convex and s_k ∈ ∂h(p_k), it follows from Theorem 15 that h(p_{k+1}) ≥ h(p_k) − ∥s_k∥B_{p_k,s_k}(p_{k+1}) + (σ/2)d²(p_{k+1}, p_k). Combining the two previous inequalities and using ϕ = g − h, we obtain (35). To verify the second statement, first observe that (35) entails
\[
0 \leq \frac{\sigma}{2}\, d^2(p_k, p_{k+1}) \leq \phi(p_k) - \phi(p_{k+1}), \qquad \forall k \in \mathbb{N}. \tag{36}
\]
Since we are assuming that (p_k)_{k∈N} is infinite, it follows from Proposition 20 that p_{k+1} ≠ p_k. Hence, since σ > 0, (36) gives ϕ(p_{k+1}) < ϕ(p_k) for all k ∈ N; therefore, (ϕ(p_k))_{k∈N} is strictly decreasing. Furthermore, since (A2) implies that (ϕ(p_k))_{k∈N} is bounded from below, it converges. Finally, since (ϕ(p_k))_{k∈N} converges, taking the limit in (36) yields lim_{k→+∞} d(p_k, p_{k+1}) = 0, which concludes the proof. □

The following theorem is our main result on the convergence behavior of Algorithm 2, detailing the limiting behavior of the generated sequences and establishing their relationship with critical points of the objective function.

Theorem 22. Let (p_k)_{k∈N} and (s_k)_{k∈N} be generated by Algorithm 2.
If p̄ is a cluster point of (p_k)_{k∈N}, then p̄ ∈ dom g and there exists a cluster point s̄ of (s_k)_{k∈N} such that s̄ ∈ ∂g(p̄) ∩ ∂h(p̄). Consequently, every cluster point of (p_k)_{k∈N}, if any, is a critical point of ϕ.

Proof. Let p̄ ∈ M be a cluster point of the sequence (p_k)_{k∈N}. We may assume, without loss of generality, that lim_{k→+∞} p_k = p̄. It follows from Proposition 21 that (ϕ(p_k))_{k∈N} is strictly decreasing and convergent. Moreover, since ϕ(p_0) ≥ ϕ(p_k) = g(p_k) − h(p_k), we have ϕ(p_0) + h(p_k) ≥ g(p_k). Thus, since g and h are lsc, we have
\[
\phi(p_0) + h(\bar p) = \liminf_{k\to+\infty} \big( \phi(p_0) + h(p_k) \big) \geq \liminf_{k\to+\infty} g(p_k) \geq g(\bar p),
\]
which implies that ϕ(p_0) ≥ ϕ(p̄). Thus, since p_0 ∈ dom g = dom ϕ, we conclude that p̄ ∈ dom ϕ = dom g. Hence, using p̄ ∈ dom g and (A3), we also have p̄ ∈ int dom h. By Proposition 20 together with (A3), we know that (p_k)_{k∈N} ⊂ int dom h, and Step 1 implies that s_k ∈ ∂h(p_k) for all k ∈ N. Therefore, passing to a subsequence if necessary, we may assume, by invoking Proposition 9, that lim_{k→+∞} s_k = s̄ ∈ ∂h(p̄). On the other hand, since the point p_{k+1} is a solution of (31), it satisfies (33), which together with (13) implies
\[
g(p) \geq g(p_{k+1}) - \|s_k\| B_{p_k,s_k}(p) - \|s_k\|\, d(p_k, p_{k+1}), \qquad \forall p \in M. \tag{37}
\]
Hence, taking the inferior limit in (37) as k → +∞, using the facts that lim_{k→+∞} p_k = p̄, lim_{k→+∞} s_k = s̄, and lim_{k→+∞} d(p_k, p_{k+1}) = 0, that g is lower semicontinuous, and invoking Lemma 12, we conclude that
\[
g(p) \geq g(\bar p) - \|\bar s\| B_{\bar p, \bar s}(p), \qquad \forall p \in M.
\]
Hence, it follows from Theorem 15 that s̄ ∈ ∂g(p̄). Therefore, since we already know that s̄ ∈ ∂h(p̄), it follows that s̄ ∈ ∂g(p̄) ∩ ∂h(p̄).
This confirms that p̄ is a critical point of problem (29), completing the proof. □

In light of Proposition 20, the quantity d(p_k, p_{k+1}) can be interpreted as a measure of the criticality of the point p_k. The following proposition provides an iteration-complexity bound for this measure.

Proposition 23. Let (p_k)_{k∈N} be generated by Algorithm 2. Then, for all N ∈ N, there holds
\[
\min_{k=0,1,\ldots,N} d(p_k, p_{k+1}) \leq \left( \frac{2\big(\phi(p_0) - \phi_{\inf}\big)}{\sigma (N+1)} \right)^{1/2}.
\]
Proof. From (35), we have d²(p_k, p_{k+1}) ≤ (2/σ)(ϕ(p_k) − ϕ(p_{k+1})) for all k ∈ N. Consequently,
\[
(N+1) \min_{k=0,1,\ldots,N} d^2(p_k, p_{k+1}) \leq \sum_{k=0}^{N} \frac{2}{\sigma}\big( \phi(p_k) - \phi(p_{k+1}) \big) \leq \frac{2}{\sigma}\big( \phi(p_0) - \phi_{\inf} \big),
\]
for all N ∈ N, where ϕ_inf > −∞ by (A2). Hence, the desired inequality follows. □

6 Numerical experiments

In this section we investigate the numerical performance of Algorithms 1 (CR-DCA) and 2 (B-DCA) on a collection of DC optimization problems posed on the κ-hyperbolic space H^n_κ and on the manifold of symmetric positive definite matrices P(n). In all experiments, both g and h are differentiable, and the solver uses closed-form expressions for the Riemannian gradients of the subproblem objectives (30) and (31). The empirical comparisons should be read in light of the analytical guarantees established for B-DCA in this work, in particular the geodesic convexity of the inner subproblem. Algorithms 1 and 2 were implemented in Matlab R2019a, and the numerical experiments were carried out on an Intel Core i5 1.8 GHz with 8 GB RAM, running macOS 10.13.6. We used Manopt version 8 [17], and the subproblems (30) and (31) were solved by the trustregions solver with its default options; among the solvers available in Manopt, this choice consistently exhibited the most stable behavior for these subproblems.
The objectives were scaled by a factor γ = 1/(∥grad ϕ(p_0)∥ + 1), where p_0 ∈ M denotes the starting point. As stopping criteria, we used ∥grad ϕ(p_k)∥ ≤ ε or d(p_{k+1}, p_k) ≤ ε, with ε = γ × 10⁻⁴. Both algorithms were run with the same starting point p_0, scaling γ, stopping thresholds, and Manopt trustregions options. For each instance, we report the number of outer iterations k, the number of inner iterations inn for solving the subproblems, the ratio inn/k, the final function value fval, the final scaled norm of the Riemannian gradient grad, and, when available, the running time (time) in seconds. The pair (fval, grad) certifies solution quality; (k, inn, inn/k) measures solver workload; time summarizes the net computational cost. Differences in k should be interpreted together with inn/k. As will be seen throughout this section, across all reported tests CR-DCA and B-DCA attain comparable solution quality, as certified by the terminal objective values fval and gradient norms grad. The workload measures k, inn, and inn/k are of the same order for both methods within each problem family; when differences occur, they are mainly in the outer count k, while inn/k remains similar. When the total run time time is reported, it aligns with these workload indicators and shows no systematic dominance. Overall, the tables indicate that B-DCA is consistently competitive, matching the accuracy of CR-DCA with comparable solver effort, which underscores its relevance within the proposed DC framework. The codes and data for the numerical experiments are available at http://mtm.ufsc.br/~douglas/downloads/BusemannDCA/.

6.1 Test problems in hyperbolic space

Here we consider a Rosenbrock-type function on the κ-hyperbolic space form as the DC test problem.
We recall that basic results on H^n_κ were presented in Section 2.1, and properties of the Busemann functions in this Hadamard manifold were discussed in Example 3. Before stating the test problems, we also record a useful result. Consider the function p ↦ d_κ(p,q) := (1/√κ) arcosh(−κ⟨p,q⟩). Then, using (5), we have
\[
\operatorname{grad} d^2_\kappa(p,q) = -\frac{2\sqrt{\kappa}\, d_\kappa(p,q)}{\sqrt{\kappa^2 \langle p,q \rangle^2 - 1}}\, \big( q + \kappa \langle p,q \rangle\, p \big), \qquad p \neq q. \tag{38}
\]

6.1.1 DC problem with a θ-order hyperbolic Rosenbrock-type objective

The Rosenbrock function is a classical benchmark in nonlinear optimization, known for its nonconvex structure and its narrow, curved valley, which makes it particularly challenging for iterative algorithms. Motivated by its role in the Euclidean setting, we introduce a family of intrinsic analogues defined on hyperbolic space, which preserve the essential geometric and analytical features of the original function. These hyperbolic Rosenbrock-type functions are formulated as DC functions, making them well suited for evaluating the performance of DC optimization methods in negatively curved geometries. We provide closed-form DC decompositions, establish the geodesic convexity of the components, and explicitly characterize the global minimizers. These constructions provide a principled extension of classical benchmarks to non-Euclidean settings, with applications to evaluating optimization algorithms on manifolds. With this in mind, we now present a detailed discussion. For clarity, our analysis focuses on the hyperbolic space H^n_1 =: H^n. Let θ ≥ 1 be a fixed real exponent.
Extending the classical Rosenbrock function, we introduce the (a, b, θ)-family of θ-order hyperbolic Rosenbrock-type functions defined on H^n by
\[
f(p) := \big( a - d(p,\bar p)^{\theta} \big)^2 + b \big( d(p,\bar q)^{\theta} - d(p,\bar p)^{2\theta} \big)^2, \qquad p \in H^n, \tag{39}
\]
where a, b > 0 with b ≫ a ≥ 1, and p̄, q̄ ∈ H^n are two fixed points satisfying
\[
a^{2/\theta} - a^{1/\theta} \leq d(\bar p, \bar q) \leq a^{2/\theta} + a^{1/\theta}. \tag{40}
\]
Observe that f(p) = g(p) − h(p) for all p ∈ M, where the components g and h are given by
\[
g(p) := a^2 + d(p,\bar p)^{2\theta} + 2b\, d(p,\bar q)^{2\theta} + 2b\, d(p,\bar p)^{4\theta},
\qquad
h(p) := 2a\, d(p,\bar p)^{\theta} + b \big( d(p,\bar p)^{2\theta} + d(p,\bar q)^{\theta} \big)^2.
\]
By [46, Thm. 2.1, p. 111], for any z ∈ H^n and α ≥ 1 the map p ↦ d(p,z)^α is geodesically convex. Hence the distance-power terms d(p,p̄)^θ, d(p,p̄)^{2θ}, d(p,p̄)^{4θ}, d(p,q̄)^θ, and d(p,q̄)^{2θ} are geodesically convex. Moreover, since t ↦ t² is convex and nondecreasing on [0,∞), the square of any nonnegative convex function is convex; in particular, (d(p,p̄)^{2θ} + d(p,q̄)^θ)² is geodesically convex. By closure of geodesic convexity under nonnegative scaling and addition, both g and h are geodesically convex, and therefore f = g − h is a DC function. Since f(p) ≥ 0 for all p ∈ H^n and f in (39) is a sum of squares, any point p* ∈ H^n satisfying d(p*,p̄)^θ = a and d(p*,q̄)^θ = d(p*,p̄)^{2θ} = a² is a global minimizer, with f(p*) = 0. Equivalently, any global minimizer p* must satisfy
\[
d(p^*, \bar p) = a^{1/\theta}, \qquad d(p^*, \bar q) = a^{2/\theta}.
\]
Conversely, any p satisfying these equalities attains f(p) = 0 and is a global minimizer. We now define the hyperbolic spheres centered at p̄ and q̄ by
\[
S(\bar p, a^{1/\theta}) := \{ p \in H^n : d(p,\bar p) = a^{1/\theta} \}, \qquad
S(\bar q, a^{2/\theta}) := \{ p \in H^n : d(p,\bar q) = a^{2/\theta} \}.
\]
Hence, every point $p^*$ in the intersection $S(\bar p, a^{1/\theta}) \cap S(\bar q, a^{2/\theta})$ is a global minimizer of $f$, with $f(p^*) = 0$. Condition (40) guarantees that this intersection is nonempty. Furthermore:
• If $a^{2/\theta} - a^{1/\theta} < d(\bar p,\bar q) < a^{2/\theta} + a^{1/\theta}$, then the intersection contains infinitely many points; note that for $n = 2$ it consists of exactly two points.
• If $d(\bar p,\bar q) = a^{2/\theta} - a^{1/\theta}$ (internal tangency) or $d(\bar p,\bar q) = a^{2/\theta} + a^{1/\theta}$ (external tangency), then the intersection consists of a single point.
In fact, for internal tangency, choose
\[
\bar p_{\mathrm{int}} = (\sinh a^{1/\theta}, 0, \cosh a^{1/\theta}), \qquad \bar q_{\mathrm{int}} = (\sinh a^{2/\theta}, 0, \cosh a^{2/\theta}). \tag{41}
\]
Using the hyperbolic identity $\sinh u \sinh v - \cosh u \cosh v = -\cosh(u - v)$, we obtain $\langle \bar p_{\mathrm{int}}, \bar q_{\mathrm{int}}\rangle = -\cosh(a^{2/\theta} - a^{1/\theta})$, so $d(\bar p_{\mathrm{int}}, \bar q_{\mathrm{int}}) = \operatorname{arcosh}(\cosh(a^{2/\theta} - a^{1/\theta})) = a^{2/\theta} - a^{1/\theta}$. Thus $S(\bar p_{\mathrm{int}}, a^{1/\theta}) \cap S(\bar q_{\mathrm{int}}, a^{2/\theta}) = \{p^*\}$. For external tangency, define
\[
\bar p_{\mathrm{ext}} = (\sinh a^{1/\theta}, 0, \cosh a^{1/\theta}), \qquad \bar q_{\mathrm{ext}} = (-\sinh a^{2/\theta}, 0, \cosh a^{2/\theta}). \tag{42}
\]
Then, using the hyperbolic identity $\sinh u \sinh v + \cosh u \cosh v = \cosh(u+v)$, the inner product is
\[
\langle \bar p_{\mathrm{ext}}, \bar q_{\mathrm{ext}}\rangle = -\sinh a^{1/\theta}\sinh a^{2/\theta} - \cosh a^{1/\theta}\cosh a^{2/\theta} = -\cosh\big(a^{1/\theta} + a^{2/\theta}\big),
\]
and the hyperbolic distance becomes $d(\bar p_{\mathrm{ext}}, \bar q_{\mathrm{ext}}) = \operatorname{arcosh}(-\langle \bar p_{\mathrm{ext}}, \bar q_{\mathrm{ext}}\rangle) = \operatorname{arcosh}(\cosh(a^{1/\theta} + a^{2/\theta})) = a^{1/\theta} + a^{2/\theta}$. Thus the intersection again reduces to a single point, i.e., $S(\bar p_{\mathrm{ext}}, a^{1/\theta}) \cap S(\bar q_{\mathrm{ext}}, a^{2/\theta}) = \{p^*\}$. Note that for $\theta > 1$ the function $h$ is continuously differentiable on $\mathbb{H}^n$, whereas for $\theta = 1$ it fails to be differentiable at the points $p = \bar p$ and $p = \bar q$.
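The two tangency distances can be verified directly from the point choices (41) and (42) in the hyperboloid model; the values $\theta = 2$, $a = 3$ below are arbitrary illustrative parameters.

```python
import numpy as np

def lorentz(x, y):
    return x[0]*y[0] + x[1]*y[1] - x[2]*y[2]

def dist(p, q):
    return np.arccosh(-lorentz(p, q))

theta, a = 2.0, 3.0
r1, r2 = a**(1/theta), a**(2/theta)   # sphere radii a^{1/theta}, a^{2/theta}

# Internal tangency, as in (41): both centers on the same side
p_int = np.array([np.sinh(r1), 0.0, np.cosh(r1)])
q_int = np.array([np.sinh(r2), 0.0, np.cosh(r2)])
# External tangency, as in (42): centers on opposite sides
p_ext = np.array([np.sinh(r1), 0.0, np.cosh(r1)])
q_ext = np.array([-np.sinh(r2), 0.0, np.cosh(r2)])

print(np.isclose(dist(p_int, q_int), r2 - r1))   # True: d = a^{2/theta} - a^{1/theta}
print(np.isclose(dist(p_ext, q_ext), r2 + r1))   # True: d = a^{2/theta} + a^{1/theta}
```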
Gradient of $h$: For $p \neq \bar p$ and $p \neq \bar q$, using the chain rule and (38), the gradient of $h$ is given by
\[
\operatorname{grad} h(p) = -2a\theta\, d(p,\bar p)^{\theta-1}\, \frac{\langle p,\bar p\rangle p + \bar p}{\sqrt{\langle p,\bar p\rangle^2 - 1}}
- 4b\theta\, \big(d(p,\bar p)^{2\theta} + d(p,\bar q)^{\theta}\big)\, d(p,\bar p)^{2\theta-1}\, \frac{\langle p,\bar p\rangle p + \bar p}{\sqrt{\langle p,\bar p\rangle^2 - 1}}
- 2b\theta\, \big(d(p,\bar p)^{2\theta} + d(p,\bar q)^{\theta}\big)\, d(p,\bar q)^{\theta-1}\, \frac{\langle p,\bar q\rangle p + \bar q}{\sqrt{\langle p,\bar q\rangle^2 - 1}}. \tag{43}
\]
Objective functions for subproblems (30) and (31): Let $s_k := \operatorname{grad} h(p_k)$ for $p_k \neq \bar p, \bar q$. Then:
• The objective function of the classical subproblem (30) becomes
\[
\psi_k(p) = a^2 + d(p,\bar p)^{2\theta} + 2b\, d(p,\bar q)^{2\theta} + 2b\, d(p,\bar p)^{4\theta} - \frac{\operatorname{arcosh}(-\langle p_k, p\rangle)}{\sqrt{\langle p_k, p\rangle^2 - 1}}\, \langle s_k, p\rangle.
\]
• The objective function of the Busemann subproblem (31) becomes
\[
\phi_k(p) = a^2 + d(p,\bar p)^{2\theta} + 2b\, d(p,\bar q)^{2\theta} + 2b\, d(p,\bar p)^{4\theta} + \|s_k\| \ln\Big(-\Big\langle p,\; p_k + \frac{s_k}{\|s_k\|}\Big\rangle\Big).
\]
In both cases, the smooth convex component $g$ appears explicitly, and the second component $h$ is "linearized" either via the exponential map or via the Busemann approximation. In the test problems with the Rosenbrock-type function we used $\theta = 1$ and $n = 2$, and we consider two cases for choosing $\bar p, \bar q \in \mathbb{H}^2$: internal tangency and external tangency. For the internal tangency case, we set $a = 1$ and $b = 100$ and take $\bar p, \bar q$ as in (41). We repeated the test from five random starting points in $\mathbb{H}^2$. From Table 1, both CR–DCA and B–DCA display essentially identical behavior: their outer/inner iteration counts coincide up to small fluctuations, and the final objective values and gradient norms agree to numerical accuracy. This parity indicates that the Busemann modeling introduces no observable penalty in this regime, and B–DCA is competitive in both accuracy and per-outer effort.
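Formula (43) can be validated against finite differences along geodesics. The sketch below picks a smooth case ($\theta = 2$) with illustrative points and parameters of our choosing, builds a tangent vector $v$ at $p$ by Lorentz projection, and compares $\langle \operatorname{grad} h(p), v\rangle$ with a central-difference approximation of $t \mapsto h(\exp_p(tv))$ at $t = 0$; the exponential map $\exp_p(v) = \cosh(\|v\|)p + \sinh(\|v\|)v/\|v\|$ is the standard one on the hyperboloid.

```python
import numpy as np

def lorentz(x, y):
    return x[0]*y[0] + x[1]*y[1] - x[2]*y[2]

def dist(p, q):
    return np.arccosh(-lorentz(p, q))

def exp_map(p, v):
    nv = np.sqrt(lorentz(v, v))
    return np.cosh(nv)*p + np.sinh(nv)*v/nv

theta, a, b = 2.0, 1.0, 3.0                       # smooth case theta > 1
pbar = np.array([0.0, 0.0, 1.0])
qbar = np.array([np.sinh(1.0), 0.0, np.cosh(1.0)])

def h(p):
    dp, dq = dist(p, pbar), dist(p, qbar)
    return 2*a*dp**theta + b*(dp**(2*theta) + dq**theta)**2

def grad_h(p):
    # Formula (43); each ratio below is -grad d(p, z) from (38) with kappa = 1
    dp, dq = dist(p, pbar), dist(p, qbar)
    ip, iq = lorentz(p, pbar), lorentz(p, qbar)
    up = (ip*p + pbar)/np.sqrt(ip**2 - 1)
    uq = (iq*p + qbar)/np.sqrt(iq**2 - 1)
    s = dp**(2*theta) + dq**theta
    return (-2*a*theta*dp**(theta-1)*up
            - 4*b*theta*s*dp**(2*theta-1)*up
            - 2*b*theta*s*dq**(theta-1)*uq)

x = np.array([0.3, -0.2])
p = np.append(x, np.sqrt(1 + x @ x))              # a point on H^2
w = np.array([1.0, 0.5, 0.0])
v = w + lorentz(w, p)*p                           # Lorentz projection: <v, p> = 0

eps = 1e-6
fd = (h(exp_map(p, eps*v)) - h(exp_map(p, -eps*v))) / (2*eps)
print(abs(lorentz(grad_h(p), v) - fd))            # directional derivatives agree
```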
While this problem class does not separate the methods, B–DCA retains structural advantages that can yield steadier progress on more demanding instances.

          CR-DCA                                    B-DCA
 #    k    inn   inn/k   fval     grad        k    inn   inn/k   fval     grad
 1   158   223   1.41   1.4E-12  9.8E-07    158   223   1.41   1.4E-12  9.8E-07
 2    25    55   2.20   8.6E-02  2.8E-04     25    55   2.20   8.6E-02  2.8E-04
 3   153   217   1.42   2.2E-12  9.4E-07    153   217   1.42   2.2E-12  9.4E-07
 4   148   211   1.43   3.6E-12  9.3E-07    148   211   1.43   3.6E-12  9.3E-07
 5   113   175   1.55   1.6E-09  1.2E-04    113   175   1.55   1.6E-09  1.2E-04

Table 1: Results for the Rosenbrock-type objective: internal tangency case.

For the external tangency case, we set $a = 1$ and $b = 2$ and take $\bar p, \bar q$ as in (42). Using five random starting points in $\mathbb{H}^2$, Table 2 shows that B–DCA uses more outer iterations than CR–DCA, which translates into a higher total inner effort on average (mean $k$: $7.42 \times 10^3$ vs. $6.25 \times 10^3$; mean inn: $7.67 \times 10^3$ vs. $6.43 \times 10^3$). Even so, the per-outer cost inn/$k$ is essentially the same in four of the five runs (all between 1.00 and 1.02 for both methods), with B–DCA slightly lower in runs 1 and 4; only run 5 is unfavorable to B–DCA (1.81 vs. 1.49), which raises its average (mean inn/$k$: 1.17 vs. 1.11). The final objective values and gradient norms coincide to numerical precision in runs 3–4 and remain very close elsewhere, indicating indistinguishable solution quality. Overall, this table shows that B–DCA is competitive in accuracy and per-outer effort, while CR–DCA is modestly cheaper in total iteration counts on this test.
          CR-DCA                                     B-DCA
 #     k     inn   inn/k   fval     grad         k      inn    inn/k   fval     grad
 1   7853   8025   1.02   6.4E-08  4.6E-06     8454    8555   1.01   9.0E-08  5.9E-06
 2   4174   4238   1.02   5.3E-07  2.8E-05     4560    4659   1.02   7.4E-07  3.6E-05
 3   8989   9012   1.00   2.5E-08  2.0E-06    11300   11317   1.00   2.5E-08  2.0E-06
 4   9053   9125   1.01   2.1E-08  1.6E-06    11464   11486   1.00   2.1E-08  1.6E-06
 5   1193   1773   1.49   6.5E-05  1.7E-03     1299    2347   1.81   9.1E-05  2.2E-03

Table 2: Results for the Rosenbrock-type objective: external tangency case.

To close this subsection, we note that across the $\theta$-order hyperbolic Rosenbrock tests (both internal and external tangency, five random starts each), CR–DCA and B–DCA attain essentially the same solution quality, as reflected by the final scaled objective values and gradient norms. B–DCA tends to use more outer iterations on some instances, which can raise the total inner effort, yet its per-outer cost (inn/$k$) remains comparable and is occasionally smaller, consistent with the geodesically convex inner models (with unique minimizers) that it solves at each step. In scenarios where the inner solve dominates the computational cost, this structural feature can be beneficial. Overall, on this class of problems B–DCA is competitive and reliable in accuracy and per-outer effort, while CR–DCA is modestly cheaper in total iteration counts on certain runs.

6.2 Test problems in the space of positive definite matrices

We now turn to the manifold of symmetric positive definite matrices $P(n)$. Preliminaries on $P(n)$ and explicit expressions for the Busemann functions and their Riemannian gradients are recalled in Section 2.2 and Example 4. We consider two sets of problems on $P(n)$: the first is a synthetic, academically oriented example designed to isolate geometric effects; the second is a practically motivated instance with direct application appeal. The experimental setup follows the same protocol adopted earlier.
6.2.1 An academic example

This academic test, also examined in the numerical experiments of [13, Sec. 7.1], is designed to validate the implementations of CR–DCA and B–DCA in a controlled setting with known global minimizers and to compare their behavior under the affine-invariant geometry of $P(n)$. Because the objective depends only on $\log\det X$, the geometric effects appear solely through the subproblems via $\log_{X_k}$ and $d(\cdot,\cdot)$, yielding a clean benchmark. As the results in Table 3 indicate, both methods reach the global minimum with matching iteration counts, and B–DCA typically achieves lower runtime as $n$ increases, though the trend is not strictly monotonic. It is worth emphasizing that, in this setup, both subproblems are convex; the convexity of the classical subproblem is established in [13, Example 6.1(i)]. The observed advantage of B–DCA is therefore due to computational effects rather than to any convexity gap: its Busemann model yields a well-conditioned tangent-space solve, reducing costly evaluations of $\log_{X_k}(\cdot)$ and $d(\cdot,\cdot)$ and enabling reuse of eigendecompositions across iterations. We first recall some notation. Throughout, $\ln$ denotes the scalar natural logarithm and $e$ the base of the scalar exponential. We consider the difference-of-convex (DC) objective $f = g - h$ on $P(n)$ with
\[
g(X) = \big(\ln\det X\big)^4, \qquad h(X) = \big(\ln\det X\big)^2.
\]
The global minimum value of $f$ is $f^\star = -\tfrac14$, attained whenever $\ln\det X = \pm 1/\sqrt{2}$ (e.g., at $X^\star = e^{\pm 1/(\sqrt{2}\,n)} I_n$). For the initialization we take
\[
X_0 = \ln(n)\, I_n + e_1 e_n^\top + e_n e_1^\top,
\]
where $e_1 := (1, 0, \ldots, 0)^\top \in \mathbb{R}^n$ and $e_n := (0, \ldots, 0, 1)^\top \in \mathbb{R}^n$, and consider dimensions $n \in \{4, 10, 20, 50, 100\}$. The stopping criteria and all shared hyperparameters follow our general experimental protocol.
Gradient of $h$: On $P(n)$ the Riemannian gradient of $h$ and its norm are given by
\[
\operatorname{grad} h(X) = 2\ln\det(X)\, X, \qquad \|\operatorname{grad} h(X)\| = 2\sqrt{n}\, |\ln\det(X)|.
\]
Objective functions for subproblems (30) and (31): Let $X_k$ be the current iterate. The objective function of the classical subproblem (30) reads
\[
\psi_k(X) = \big(\ln\det X\big)^4 - 2\ln\det(X_k)\, \ln\frac{\det X}{\det X_k},
\]
and the objective function of subproblem (31) is given by
\[
\phi_k(X) := \big(\ln\det X\big)^4 + \|\operatorname{grad} h(X_k)\|\, B_{X_k,\, \operatorname{grad} h(X_k)}(X),
\]
where we use the explicit formula (22) in Example 4 to compute the Busemann function $B_{X_k,\, \operatorname{grad} h(X_k)}$.

          CR-DCA                                  B-DCA
  n    k   inn   time   fval    grad         k   inn   time   fval    grad
  4   13    25   0.49   -0.25  7.52E-07     13    25   0.43   -0.25  7.52E-07
 10   16    43   0.53   -0.25  5.09E-07     16    43   0.34   -0.25  5.09E-07
 20   16    49   0.11   -0.25  1.01E-06     16    49   0.08   -0.25  1.01E-06
 50   16    55   0.26   -0.25  2.13E-06     16    55   0.14   -0.25  2.13E-06
100   16    59   1.03   -0.25  3.55E-06     16    59   0.46   -0.25  3.55E-06

Table 3: Results in the academic problem for CR–DCA and B–DCA.

Table 3 reports the outcomes for CR–DCA and B–DCA. We observe that both algorithms display identical outer/inner iteration counts across all tested dimensions and attain the same objective value $f^\star = -0.25$ to the reported precision, with final gradient norms $\le 3.6 \times 10^{-6}$. As $n$ increases from 4 to 100, the number of outer iterations stabilizes at $k = 16$, while the inner iterations grow moderately (from 25 to 59). In terms of runtime, B–DCA is consistently faster, with approximate reductions of 12%, 36%, 27%, 46%, and 55% for $n = 4, 10, 20, 50, 100$, respectively. The advantage tends to increase with dimension (though not strictly monotonically), indicating lower per-iteration overhead in the Busemann-subgradient subproblem on $P(n)$.

6.2.2 Contrastive learning via DC optimization in $P(n)$

In this section we consider a contrastive DC optimization model on the manifold of symmetric positive definite matrices $P(n)$.
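The closed forms above are easy to double-check. The sketch below verifies that $f(X^\star) = -1/4$ at the claimed minimizer and that the affine-invariant norm $\|X^{-1/2}\operatorname{grad} h(X)\,X^{-1/2}\|_F$ equals $2\sqrt{n}\,|\ln\det X|$; the diagonal test matrix is an arbitrary choice of ours.

```python
import numpy as np

n = 4
ln_det = lambda X: np.linalg.slogdet(X)[1]
f = lambda X: ln_det(X)**4 - ln_det(X)**2          # f = g - h

# Claimed minimizer: X* = e^{1/(sqrt(2) n)} I_n, i.e. ln det X* = 1/sqrt(2)
X_star = np.exp(1/(np.sqrt(2)*n)) * np.eye(n)
print(f(X_star))                                   # -0.25

# grad h(X) = 2 ln det(X) X; affine-invariant norm = 2 sqrt(n) |ln det X|
X = np.diag([1.0, 2.0, 0.5, 3.0])
G = 2*ln_det(X)*X
Xh = np.diag(np.diag(X)**-0.5)                     # X^{-1/2} for diagonal X
print(np.allclose(np.linalg.norm(Xh @ G @ Xh),
                  2*np.sqrt(n)*abs(ln_det(X))))    # True
```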
Given disjoint sets of reference points $\mathcal{P} = \{\bar X_1, \ldots, \bar X_m\} \subset P(n)$ and $\mathcal{N} = \{\bar Y_1, \ldots, \bar Y_r\} \subset P(n)$, the goal is to select $X \in P(n)$ that is close to $\mathcal{P}$ and far from $\mathcal{N}$, where proximity is measured by the affine-invariant geodesic distance $d(\cdot,\cdot)$ defined in (7). The SPD contrastive objective is
\[
f(X) := \sum_{i=1}^m \lambda_i^+ d^2\big(X, \bar X_i\big) - \sum_{j=1}^r \lambda_j^- d^2\big(X, \bar Y_j\big), \qquad X \in P(n), \tag{44}
\]
with fixed weights $\lambda_i^+, \lambda_j^- > 0$. Since $X \mapsto d^2(X, \bar Z)$ is geodesically strongly convex on $P(n)$ for each fixed $\bar Z$, $f$ admits the DC splitting
\[
g(X) := \sum_{i=1}^m \lambda_i^+ d^2\big(X, \bar X_i\big), \qquad h(X) := \sum_{j=1}^r \lambda_j^- d^2\big(X, \bar Y_j\big), \tag{45}
\]
so that $f = g - h$. This objective encodes the contrastive principle: $g$ promotes proximity to positives, while $h$ penalizes proximity to negatives. Such formulations are natural when data are represented by covariance or kernel matrices, e.g., in signal processing, computer vision, and Riemannian manifold learning; see [5, 45]. We use this example to contrast the inner models underlying CR–DCA and B–DCA on $P(n)$. The first-order model of $h$ used by CR–DCA does not, in general, preserve geodesic convexity under the affine-invariant metric. In B–DCA, $h$ is replaced by a Busemann-based surrogate; under our convention that the Busemann function is concave (hence $-B$ is convex), the inner objective is geodesically convex. Accordingly, the reported performance differences should be read through this convex B–DCA versus nonconvex CR–DCA contrast and its computational implications (conditioning and reuse of eigendecompositions). Our objectives are twofold: (i) to verify that CR–DCA and B–DCA compute meaningful contrastive representatives $X^\star \in P(n)$ with comparable solution quality; and (ii) to compare their numerical behavior under the affine-invariant geometry.
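The contrastive objective (44)–(45) rests on properties of the affine-invariant distance that can be verified numerically. The sketch below computes $d(X,Y) = \|\operatorname{Log}(X^{-1/2}YX^{-1/2})\|_F$ via the eigenvalues of $X^{-1}Y$ (which are real and positive for SPD arguments) and checks symmetry and congruence invariance $d(AXA^\top, AYA^\top) = d(X,Y)$; the random SPD generator and all names are illustrative assumptions of ours.

```python
import numpy as np

def ai_dist(X, Y):
    # Affine-invariant distance on P(n): d(X,Y) = ||Log(X^{-1/2} Y X^{-1/2})||_F,
    # computed from the (real, positive) eigenvalues of X^{-1} Y.
    w = np.linalg.eigvals(np.linalg.solve(X, Y))
    return np.sqrt(np.sum(np.log(w.real)**2))

rng = np.random.default_rng(0)
def rand_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n*np.eye(n)      # well-conditioned SPD matrix

n, m, r = 3, 2, 1
Xs = [rand_spd(n) for _ in range(m)]  # positives
Ys = [rand_spd(n) for _ in range(r)]  # negatives

def f(X):   # objective (44) with unit weights, via the DC split (45)
    g = sum(ai_dist(X, Xb)**2 for Xb in Xs)
    h = sum(ai_dist(X, Yb)**2 for Yb in Ys)
    return g - h

print(f(np.eye(n)))                   # a finite contrastive score at the identity

X1, X2 = rand_spd(n), rand_spd(n)
A = rng.standard_normal((n, n))       # invertible with probability one
print(np.isclose(ai_dist(np.eye(2), np.exp(2.0)*np.eye(2)), 2*np.sqrt(2)))  # True
print(np.isclose(ai_dist(X1, X2), ai_dist(X2, X1)))                          # True
print(np.isclose(ai_dist(A @ X1 @ A.T, A @ X2 @ A.T), ai_dist(X1, X2)))      # True
```

The congruence invariance is the geometric reason the metric is called affine-invariant, and it is what makes the contrastive score independent of the chosen coordinate frame.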
We report final objective values, Riemannian gradient norms, iteration counts, and runtime, emphasizing the impact of the inner-subproblem model on efficiency. The experiments use $n = 10$ with unit weights $\lambda_i^+ = \lambda_j^- = 1$ and follow the stopping criteria and shared hyperparameters of our general experimental protocol.

Objective functions for subproblems (30) and (31): Let $S_k := \operatorname{grad} h(X_k)$, where $X_k \in P(n)$ is the current iterate. Then the subproblem objectives are:
• For the classical DC subproblem (30),
\[
\Psi_k(X) := \sum_{i \in \mathcal{P}} \lambda_i^+ d(X, \bar X_i)^2 - \langle S_k, \log_{X_k} X\rangle, \tag{46}
\]
where the inner product is as in (6) and $\log_{X_k} X$ is given by (9).
• For the Busemann-regularized DC subproblem (31),
\[
\Phi_k(X) := \sum_{i \in \mathcal{P}} \lambda_i^+ d(X, \bar X_i)^2 + \|S_k\|\, B_{X_k, S_k}(X), \tag{47}
\]
where we use the explicit formula (22) in Example 4 to compute the Busemann function $B_{X_k, S_k}(X)$.

          CR-DCA                               B-DCA
 #    k   inn   inn/k   fval    grad       k   inn   inn/k   fval    grad
 1    7    13   1.86   0.1663  1.14E-05    8    16   2.00   0.1663  6.74E-06
 2    7    14   2.00   0.1981  1.08E-05    8    15   1.88   0.1981  4.88E-06
 3    7    14   2.00   0.1520  1.13E-05    8    15   1.88   0.1520  4.98E-06
 4    7    14   2.00   0.1825  1.12E-05    8    16   2.00   0.1825  6.62E-06
 5    7    13   1.86   0.2071  1.07E-05    8    15   1.88   0.2071  4.98E-06
 6    7    14   2.00   0.2252  1.06E-05    8    15   1.88   0.2252  4.95E-06
 7    7    13   1.86   0.1960  1.09E-05    8    15   1.88   0.1960  5.81E-06
 8    7    14   2.00   0.1762  1.13E-05    8    16   2.00   0.1762  6.10E-06
 9    7    14   2.00   0.1686  1.11E-05    8    15   1.88   0.1686  4.85E-06
10    7    13   1.86   0.1918  1.10E-05    8    15   1.88   0.1918  5.78E-06

Table 4: Results for contrastive learning in $P(5)$, with $m = 5$ and $r = 1$.

Table 4 summarizes ten runs with random initializations in $P(5)$ with $m = 5$ and $r = 1$. Both methods attain identical objective values across all restarts, confirming comparable solution quality. A clear difference appears in stationarity: B–DCA yields strictly smaller final Riemannian gradient norms in every trial (mean $5.569 \times 10^{-6}$) than CR–DCA (mean $1.103 \times 10^{-5}$), i.e., an almost twofold reduction. This tighter stationarity can naturally entail a slightly larger overall inner effort: B–DCA uses one additional outer iteration ($k = 8$ versus $k = 7$) and, consequently, a higher average total of inner iterations (means 15.3 versus 13.6). At the same time, the inner burden per outer step is comparable and, if anything, marginally lower for B–DCA, as reflected by the average inn/$k$ ratios (1.9125 for B–DCA versus 1.9429 for CR–DCA) and medians (15 vs. 14). Overall, this experiment highlights B–DCA's advantage of achieving stronger first-order stationarity while keeping the inner workload per outer step essentially on par, consistent with the convexity of its inner model.

          CR-DCA                                B-DCA
 #    k    inn   inn/k   fval     grad      k    inn   inn/k   fval     grad
 1   51    91   1.78   -0.6747  2.03E-05   83   129   1.55   -0.6747  2.12E-05
 2   50    91   1.82   -0.6587  2.14E-05   81   125   1.54   -0.6587  1.99E-05
 3   49    87   1.78   -0.7142  2.27E-05   80   124   1.55   -0.7142  2.27E-05
 4   50    90   1.80   -0.6460  1.82E-05   78   122   1.56   -0.6460  2.06E-05
 5   53    96   1.81   -0.6116  1.72E-05   85   136   1.60   -0.6116  1.97E-05
 6   52    94   1.81   -0.6051  1.73E-05   83   130   1.57   -0.6051  1.85E-05
 7   49    85   1.73   -0.6570  1.80E-05   71   112   1.58   -0.6570  1.99E-05
 8   53    97   1.83   -0.5740  1.79E-05   88   140   1.59   -0.5740  1.87E-05
 9   53    96   1.81   -0.5892  1.73E-05   87   139   1.60   -0.5892  1.93E-05
10   49    88   1.80   -0.6763  2.21E-05   75   118   1.57   -0.6763  2.13E-05

Table 5: Results for contrastive learning in $P(5)$, with $m = 5$ and $r = 4$.

Table 5 summarizes ten runs with random initializations in $P(5)$ with $m = 5$ and $r = 4$. Both methods deliver indistinguishable solution quality: objective values match entrywise, and the final Riemannian gradient norms are of the same order ($\approx 2 \times 10^{-5}$) for both algorithms. The principal contrast lies in efficiency. B–DCA requires more outer iterations (mean 81.1 vs. 50.9 for CR–DCA), but its inner subproblems are consistently easier: the inner-per-outer ratio inn/$k$ is strictly smaller for B–DCA in all ten trials (average 1.572 vs. 1.797, about 12.5% lower). The pattern is precisely what the modeling predicts: B–DCA's convex Busemann-based inner model yields cheaper inner solves (and facilitates eigendecomposition reuse), so the extra outer iterations are offset by a reduced per-iteration burden.

To close this section, we synthesize the evidence across the three groups of tests: (i) the academic benchmark on $P(n)$, (ii) the contrastive learning tasks on $P(n)$, and (iii) the $\theta$-order hyperbolic Rosenbrock problems on $\mathbb{H}^2$. Although the surrogate objectives in (24) and (25) are analytically distinct, they track each other closely along the observed iterates, which plausibly explains the near-identical solution quality achieved by CR–DCA and B–DCA in all experiments (final objective values and first-order stationarity). The main practical difference lies in the structure of the inner problems: the Busemann model (31) is geodesically convex, and this often translates into steadier progress and a comparable or lower per-outer effort (inn/$k$), even in cases where B–DCA uses more outer iterations. On $P(n)$, the academic test shows matching iteration counts and solution quality, with B–DCA frequently exhibiting favorable runtime as $n$ increases, consistent with concentrating the geometry in well-conditioned tangent-space solves (fewer expensive evaluations of $\log_{X_k}(\cdot)$ and $d(\cdot,\cdot)$ and better reuse of eigendecompositions). In the contrastive learning experiments, the methods again deliver essentially the same accuracy; B–DCA often requires more outer steps but displays lower inn/$k$ and more regular progress across restarts, an advantage that becomes relevant when inner solves dominate the computational budget.
For the hyperbolic Rosenbrock family, the internal tangency tests show virtually identical behavior, and the external tangency tests are mixed: CR–DCA is sometimes cheaper in total counts, whereas the per-outer cost remains comparable and the attained solutions are indistinguishable in quality, so B–DCA remains competitive. Taken together, these results present CR–DCA as a strong baseline and B–DCA as a competitive alternative that can be particularly attractive when (i) the inner subproblem cost dominates runtime, (ii) numerical stability of the inner model matters, or (iii) one wishes to exploit the geodesic convexity in (31) for steadier progress. Beyond these experiments, the Busemann modeling and the side-by-side comparison of (24) and (25) introduce new ideas into DC optimization on manifolds and suggest avenues to refine convergence theory for DCA-type schemes.

7 Conclusions

In summary, this paper investigates fundamental properties of Busemann functions on Hadamard manifolds and highlights their role in the design and analysis of optimization algorithms on Riemannian manifolds. This is achieved through the introduction of a novel Busemann-based characterization of the classical subdifferential in Riemannian optimization. The proposed approach addresses challenges arising from nonconvex subproblem functions on Hadamard manifolds, which commonly occur in classical difference-of-convex algorithms. By replacing the inner product with Busemann functions, the resulting reformulation guarantees the strong convexity of the subproblem functions, thereby improving both optimization performance and algorithmic efficiency in the Hadamard manifold setting. We are confident that the characterization developed here for DC problems will also be useful in broader areas of continuous optimization. For instance, in an upcoming paper, we intend to employ it to develop algorithms utilizing the Bregman distance concept.
To illustrate, let us explore how the Bregman distance concept can be established using Busemann functions, which forms the basis of our discussion. Let $M$ be a Hadamard manifold, let $S \subseteq M$ be an open and convex set, and let $\bar S$ denote its closure. Consider a proper convex real function $\psi : \bar S \to \mathbb{R} \cup \{+\infty\}$ that is differentiable on $S$, and let $D_\psi(\cdot,\cdot) : \bar S \times S \to \mathbb{R} \cup \{+\infty\}$ be the function associated to $\psi$ defined by
\[
D_\psi(p,q) := \psi(p) - \psi(q) + \|\operatorname{grad}\psi(q)\|\, B_{q,\, \operatorname{grad}\psi(q)}(p), \tag{48}
\]
where $B_{q,\operatorname{grad}\psi(q)}$ is the Busemann function with base point $q \in M$ and associated direction $\operatorname{grad}\psi(q) \in T_qM$. To introduce the subsequent definition, define the partial level sets of $D_\psi$ as follows: for any $\alpha \in \mathbb{R}$, consider
\[
L_1(\alpha, q) := \{p \in \bar S : D_\psi(p,q) \le \alpha\}, \qquad L_2(p, \alpha) := \{q \in S : D_\psi(p,q) \le \alpha\}.
\]
The extension of the Bregman distance concept to Hadamard manifolds via a Busemann function is as follows.

Definition 24. The function $\psi$ is called a Bregman–Busemann function and $D_\psi$ a Bregman–Busemann distance induced by $\psi$ if the following conditions hold:
(i) $\psi$ is continuously differentiable on $S$;
(ii) $\psi$ is strictly convex and continuous on $\bar S$;
(iii) for all $\alpha \in \mathbb{R}$, the partial level sets $L_1(\alpha, q)$ and $L_2(p, \alpha)$ are bounded for every $q \in S$ and $p \in \bar S$, respectively;
(iv) if $(q_k)_{k\in\mathbb{N}} \subset S$ converges to $\bar q$, then $D_\psi(\bar q, q_k)$ converges to 0;
(v) if $(p_k)_{k\in\mathbb{N}} \subset \bar S$ and $(q_k)_{k\in\mathbb{N}} \subset S$ are sequences such that $(p_k)_{k\in\mathbb{N}}$ is bounded, $\lim q_k = \bar q$, and $\lim_{k\to\infty} D_\psi(p_k, q_k) = 0$, then $\lim p_k = \bar q$.

We proceed with some comments regarding Definition 24. The set $S$ is referred to as the zone of $\psi$. Using (13), we deduce that (iv) and (v) hold when $\bar q \in S$, as a consequence of (i), (ii), and (iii); thus they need to be verified only at points on the boundary $\partial S$ of $S$.
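In the Euclidean case $M = \mathbb{R}^n$, the Busemann function with base point $q$ and direction $v$ is $B_{q,v}(p) = -\langle p - q,\, v/\|v\|\rangle$, so definition (48) recovers the classical Bregman distance $\psi(p) - \psi(q) - \langle \operatorname{grad}\psi(q),\, p - q\rangle$. A minimal sketch of this reduction, assuming $\psi(x) = \|x\|^2$ (for which the Bregman distance is exactly $\|p - q\|^2$) and assuming $\operatorname{grad}\psi(q) \neq 0$ so that (48) is well defined:

```python
import numpy as np

# Euclidean Busemann function: B_{q,v}(p) = -<p - q, v/||v||>
def busemann(p, q, v):
    return -np.dot(p - q, v/np.linalg.norm(v))

psi      = lambda x: np.dot(x, x)      # psi(x) = ||x||^2
grad_psi = lambda x: 2*x

def D_psi(p, q):                       # formula (48), Euclidean instance
    g = grad_psi(q)
    return psi(p) - psi(q) + np.linalg.norm(g)*busemann(p, q, g)

rng = np.random.default_rng(1)
p, q = rng.standard_normal(5), rng.standard_normal(5)
print(np.isclose(D_psi(p, q), np.dot(p - q, p - q)))   # True: D_psi(p,q) = ||p-q||^2
```

This is the sense in which Definition 24 extends the classical Euclidean notion: the term $\|\operatorname{grad}\psi(q)\|\,B_{q,\operatorname{grad}\psi(q)}(p)$ plays the role of the linearization $-\langle \operatorname{grad}\psi(q), p - q\rangle$.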
In the following proposition, we show that the Bregman–Busemann distance is indeed nonnegative and convex in its first argument.

Proposition 25. Let $\psi$ be a Bregman–Busemann function with zone $S$ and let $D_\psi$ be the Bregman–Busemann distance induced by $\psi$. Then the following statements hold:
(i) $D_\psi(p,q) \ge 0$ for all $p \in \bar S$ and $q \in S$;
(ii) $D_\psi(\cdot, q) : \bar S \to \mathbb{R} \cup \{+\infty\}$ is strictly convex for all $q \in S$.

Proof. If $\psi$ and $D_\psi$ satisfy Definition 24, then it follows from Theorem 15 that the function (48) satisfies item (i). It follows from Lemma 10 that $B_{q,\operatorname{grad}\psi(q)}$ is convex; thus, using item (ii) of Definition 24, the proof of item (ii) follows.

The Bregman distance, introduced as a fundamental concept in [19], has been extensively studied in the Euclidean context, as evidenced by various references including [6, 22, 27, 34]. The idea of developing the Bregman–Busemann distance is inspired by Example 1, which illustrates how Definition 24 serves as a natural extension of the established concept of the Bregman distance in Euclidean spaces. Furthermore, it is noteworthy that the concept of Bregman functions has been previously introduced in the context of Hadamard manifolds, using the function (24) in its definition, as mentioned in [40, 39]. However, that definition results in a Bregman function that lacks convexity in the first argument. This limitation has been acknowledged in [40, 39], where it led to erroneous outcomes, thereby restricting its applicability. It is important to highlight that the Bregman–Busemann distance of Definition 24 addresses this limitation, as demonstrated in Proposition 25.

8 Appendix

Appendix 1. Busemann functions on hyperbolic space in Example 3: To simplify notation, we set $\alpha(t) := -\kappa\langle p, \exp_q(tv)\rangle$ with $v \neq 0$. Since $\langle p, q\rangle \le -1/\kappa$ for all $p, q \in \mathbb{H}^n_\kappa$, we have $\alpha(t) \ge 1$.
Thus,
\[
\operatorname{arcosh}(\alpha(t)) = \ln\big(\alpha(t) + \sqrt{\alpha(t)^2 - 1}\big),
\]
or, equivalently,
\[
\operatorname{arcosh}(\alpha(t)) = \ln\big(\alpha(t)(1 + \beta(t))\big), \qquad \text{where } \beta(t) := \big(1 - (1/\alpha(t))^2\big)^{1/2}. \tag{49}
\]
Using (4) we obtain $d_\kappa(p, \exp_q(tv)) = (1/\sqrt{\kappa})\operatorname{arcosh}(\alpha(t))$. Hence, taking into account that $(1/\sqrt{\kappa})\ln e^{-\sqrt{\kappa}\|v\|t} = -\|v\|t$, it follows from (49) that
\[
d_\kappa(p, \exp_q(tv)) - \|v\|t = \frac{1}{\sqrt{\kappa}} \ln\Big(e^{-\sqrt{\kappa}\|v\|t}\alpha(t)\big(1 + \beta(t)\big)\Big). \tag{50}
\]
Since $\alpha(t) := -\kappa\langle p, \exp_q(tv)\rangle$, the definitions of $\cosh$ and $\sinh$ give
\[
\alpha(t) = -\kappa\,\tfrac12\big(e^{\sqrt{\kappa}\|v\|t} + e^{-\sqrt{\kappa}\|v\|t}\big)\langle p, q\rangle - \sqrt{\kappa}\,\tfrac12\big(e^{\sqrt{\kappa}\|v\|t} - e^{-\sqrt{\kappa}\|v\|t}\big)\frac{1}{\|v\|}\langle p, v\rangle
= -\tfrac12 e^{\sqrt{\kappa}\|v\|t}\Big(\kappa\big(1 + e^{-2\sqrt{\kappa}\|v\|t}\big)\langle p, q\rangle + \sqrt{\kappa}\big(1 - e^{-2\sqrt{\kappa}\|v\|t}\big)\frac{1}{\|v\|}\langle p, v\rangle\Big).
\]
Hence, we have
\[
\lim_{t\to+\infty} \alpha(t)^2 = +\infty, \qquad \lim_{t\to\infty} e^{-\sqrt{\kappa}\|v\|t}\alpha(t) = -\frac12\Big(\kappa\langle p, q\rangle + \sqrt{\kappa}\frac{1}{\|v\|}\langle p, v\rangle\Big),
\]
and $\lim_{t\to\infty}\beta(t) = 1$. Thus, we conclude from (50) that
\[
\lim_{t\to+\infty}\Big(d\big(p, \exp_q(tv)\big) - \|v\|t\Big) = \frac{1}{\sqrt{\kappa}}\ln\Big(-\Big\langle p,\; \kappa q + \sqrt{\kappa}\frac{1}{\|v\|}v\Big\rangle\Big).
\]
Therefore, the last equality together with (12) gives the desired equality (20). Now we compute the gradient of the Busemann function. It follows from (5) that
\[
\operatorname{grad} B_{q,v}(p) := \operatorname{Proj}^\kappa_p J B'_{q,v}(p) = J B'_{q,v}(p) + \kappa\big\langle J B'_{q,v}(p), p\big\rangle p. \tag{51}
\]
To simplify notation, write $B_{q,v}(p) := (1/\sqrt{\kappa})\ln(\eta(p))$, where $\eta(p) := -\big\langle p,\; \kappa q + \sqrt{\kappa}\frac{1}{\|v\|}v\big\rangle$. Thus, taking the Euclidean derivative, we have
\[
B'_{q,v}(p) = \frac{1}{\sqrt{\kappa}}\frac{\eta'(p)}{\eta(p)}, \qquad \eta'(p) = -J\Big(\kappa q + \sqrt{\kappa}\frac{1}{\|v\|}v\Big). \tag{52}
\]
Substituting the last equalities (52) into (51), after some algebraic manipulations, we obtain
\[
\operatorname{grad} B_{q,v}(p) = \frac{1}{\sqrt{\kappa}}\frac{1}{\eta(p)}\Big(J\eta'(p) + \kappa\big\langle J\eta'(p), p\big\rangle p\Big).
\]
Substituting (52) into the last equality and using that $JJ = I$, we obtain (21). Note that some further calculations show that (21) implies $\|\operatorname{grad} B_{q,v}(p)\| = 1$, as stated in Lemma 10.
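The closed form (20) can be checked numerically for $\kappa = 1$, where it reads $B_{q,v}(p) = \ln\big(-\langle p,\, q + v/\|v\|\rangle\big)$. The sketch below verifies that along its own ray the Busemann function decreases with unit speed, $B_{q,v}(\exp_q(tv)) = -t$, and that the gradient at the base point is $-v/\|v\|$; the gradient expression used is our transcription of the formula derived above, with $u := q + v/\|v\|$.

```python
import numpy as np

def lorentz(x, y):
    return x[0]*y[0] + x[1]*y[1] - x[2]*y[2]

# Closed form (20) with kappa = 1: B_{q,v}(p) = ln( -<p, q + v/||v||> )
def busemann(p, q, v):
    nv = np.sqrt(lorentz(v, v))
    return np.log(-lorentz(p, q + v/nv))

q = np.array([0.0, 0.0, 1.0])       # base point of H^2
v = np.array([1.0, 0.0, 0.0])       # unit tangent direction at q
ray = lambda t: np.array([np.sinh(t), 0.0, np.cosh(t)])   # exp_q(t v)

# Along its own ray: B(exp_q(tv)) = -t
for t in [0.0, 0.5, 2.0]:
    print(busemann(ray(t), q, v))   # -t

# Gradient at the base point: grad B_{q,v}(q) = -v/||v|| (last statement of Lemma 10);
# here grad B(p) = -(u + <u,p> p)/eta(p) with u = q + v/||v||, eta(p) = -<p,u>
u = q + v
gB = -(u + lorentz(u, q)*q) / (-lorentz(q, u))
print(np.allclose(gB, -v))          # True
```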
In particular, if $p = q$, then, since $\langle q, q\rangle = -1/\kappa$ and $\langle p, v\rangle = 0$, formula (21) simplifies to
\[
\operatorname{grad} B_{q,v}(q) = -\frac{1}{\sqrt{\kappa}}\Big(\kappa q + \sqrt{\kappa}\frac{1}{\|v\|}v - \kappa q\Big) = -\frac{1}{\|v\|}v,
\]
which is in accordance with the last statement of Lemma 10.

Appendix 2. We now present the detailed computation of the explicit formula given in Example 4 for the Busemann functions and their gradients on the manifold of symmetric positive definite matrices. It is worth noting that this computation does not require any prior knowledge of the theory of symmetric spaces. To this end, we first prove two auxiliary lemmas.

Lemma 26. Take $X \in P(n)$ and $V \in S(n)$. If $X$ commutes with $V$, then
\[
B_{I,V}(X) = -\frac{1}{\|V\|}\operatorname{tr}\big(V \operatorname{Log}(X)\big).
\]
Proof. Since $X$ commutes with $V$, applying Proposition 13 and equation (7), and after some algebraic manipulations, we obtain
\[
B_{I,V}(X) = \frac{1}{2\|V\|}\lim_{t\to+\infty}\frac{d^2(\operatorname{Exp}(tV), X) - (t\|V\|)^2}{t}
= \frac{1}{2\|V\|}\lim_{t\to+\infty}\frac{1}{t}\operatorname{tr}\Big(\big(\operatorname{Log}(X^{-1/2}\operatorname{Exp}(tV)X^{-1/2})\big)^2 - (tV)^2\Big)
\]
\[
= \frac{1}{2\|V\|}\lim_{t\to+\infty}\frac{1}{t}\operatorname{tr}\Big(\big(\operatorname{Log}(X^{-1/2}\operatorname{Exp}(tV)X^{-1/2}) - tV\big)\big(\operatorname{Log}(X^{-1/2}\operatorname{Exp}(tV)X^{-1/2}) + tV\big)\Big).
\]
Since $X$ commutes with $V$, we have $\operatorname{Log}(X^{-1/2}\operatorname{Exp}(tV)X^{-1/2}) = \operatorname{Log}(\operatorname{Exp}(tV)) - \operatorname{Log}(X)$. Thus, the last equality can be written as
\[
B_{I,V}(X) = \frac{1}{2\|V\|}\lim_{t\to+\infty}\frac{1}{t}\operatorname{tr}\big([\operatorname{Log}(\operatorname{Exp}(tV)) - \operatorname{Log}(X) - tV]\,[\operatorname{Log}(\operatorname{Exp}(tV)) - \operatorname{Log}(X) + tV]\big)
= \frac{1}{2\|V\|}\lim_{t\to+\infty}\frac{1}{t}\operatorname{tr}\big([-\operatorname{Log}(X)]\,[2tV - \operatorname{Log}(X)]\big)
= -\frac{1}{\|V\|}\operatorname{tr}\big(V\operatorname{Log}(X)\big),
\]
which is the desired equality.

Let $L$ be the lower triangular matrix given in Example 4. For our analysis it is convenient to use a decomposition of the form $L = WZ$, with
\[
W := \begin{pmatrix} I_{n_1} & 0 & \cdots & 0 \\ W_{21} & I_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ W_{k1} & W_{k2} & \cdots & I_{n_k} \end{pmatrix}, \qquad
Z := \begin{pmatrix} L_{n_1} & 0 & \cdots & 0 \\ 0 & L_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & L_{n_k} \end{pmatrix}. \tag{53}
\]

Lemma 27.
Let $D$ and $L$ be the matrices defined in Example 4, and let $W$ and $Z$ be the matrices defined in (53). Then the following equalities hold:
(i) $\lim_{t\to+\infty} d\big(W^{-1}\operatorname{Exp}(tD)[W^\top]^{-1}, \operatorname{Exp}(tD)\big) = 0$;
(ii) $\lim_{t\to+\infty} d\big(L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}, \operatorname{Exp}(tD)\big) = d(I, ZZ^\top)$.

Proof. We prove item (ii) only, as item (i) is analogous. The second part of Lemma 5 implies that
\[
d\big(L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}, \operatorname{Exp}(tD)\big) = d\big(I,\; \operatorname{Exp}^{-1/2}(tD)\, L\, \operatorname{Exp}(tD)\, L^\top \operatorname{Exp}^{-1/2}(tD)\big). \tag{54}
\]
To simplify notation, set $A(t) := \operatorname{Exp}^{-1/2}(tD)\, L\, \operatorname{Exp}^{1/2}(tD)$. Note that $A(t)$ can be written as
\[
A(t) := \begin{pmatrix} L_{n_1} & 0 & \cdots & 0 \\ e^{\frac{t(\lambda_1-\lambda_2)}{2}} L_{21} & L_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ e^{\frac{t(\lambda_1-\lambda_k)}{2}} L_{k1} & e^{\frac{t(\lambda_2-\lambda_k)}{2}} L_{k2} & \cdots & L_{n_k} \end{pmatrix}.
\]
Thus, equality (54) is equivalent to
\[
d\big(L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}, \operatorname{Exp}(tD)\big) = d\big(I, A(t)A(t)^\top\big).
\]
Since $\lambda_1 < \cdots < \lambda_k$, we conclude that $\lim_{t\to+\infty} A(t) = Z$, which completes the proof.

Busemann functions on $P(n)$ in Example 4: Let the spectral and Cholesky factorizations be
\[
Y^{-1/2}VY^{-1/2} = UDU^\top, \qquad U^\top Y^{-1/2}XY^{-1/2}U = LL^\top = WZZ^\top W^\top, \tag{55}
\]
with $L = WZ$. Since $\exp_Y(tV) = Y^{1/2}\operatorname{Exp}\big(tY^{-1/2}VY^{-1/2}\big)Y^{1/2} = Y^{1/2}U\operatorname{Exp}(tD)U^\top Y^{1/2}$, Lemma 5 and (55) imply that
\[
d\big(\exp_Y(tV), X\big) = d\big(\operatorname{Exp}(tD),\, U^\top Y^{-1/2}XY^{-1/2}U\big) = d\big(\operatorname{Exp}(tD),\, WZZ^\top W^\top\big) = d\big(W^{-1}\operatorname{Exp}(tD)[W^\top]^{-1},\, ZZ^\top\big). \tag{56}
\]
On the other hand, applying the triangle inequality, we obtain
\[
d\big(\operatorname{Exp}(tD), ZZ^\top\big) - d\big(W^{-1}\operatorname{Exp}(tD)[W^\top]^{-1}, \operatorname{Exp}(tD)\big) \le d\big(W^{-1}\operatorname{Exp}(tD)[W^\top]^{-1}, ZZ^\top\big) \le d\big(\operatorname{Exp}(tD), ZZ^\top\big) + d\big(W^{-1}\operatorname{Exp}(tD)[W^\top]^{-1}, \operatorname{Exp}(tD)\big). \tag{57}
\]
Adding $-\|tV\|$ to every term in (57) and using item (i) of Lemma 27, we obtain
\[
\lim_{t\to\infty}\Big(d\big(W^{-1}\operatorname{Exp}(tD)[W^\top]^{-1}, ZZ^\top\big) - \|tV\|\Big) = \lim_{t\to\infty}\Big(d\big(\operatorname{Exp}(tD), ZZ^\top\big) - \|tV\|\Big).
\]
Hence, by (12) and (56), and using the fact that $\|tV\| = \|tD\|_I$, we have
\[
B_{Y,V}(X) = \lim_{t\to\infty}\big(d(\exp_Y(tV), X) - \|tV\|\big) = \lim_{t\to\infty}\big(d(\operatorname{Exp}(tD), ZZ^\top) - \|tD\|_I\big) = B_{I,D}(ZZ^\top).
\]
Thus, since $ZZ^\top$ commutes with $D$, Lemma 26 gives
\[
B_{Y,V}(X) = -\frac{1}{\|D\|_I}\operatorname{tr}\big(D\operatorname{Log}(ZZ^\top)\big). \tag{58}
\]
To conclude the proof, it remains to show that the right-hand side above coincides with the right-hand side of (22). For that, first observe that
\[
D\operatorname{Log}\big(ZZ^\top\big) = \begin{pmatrix} \lambda_1\operatorname{Log}(L_{n_1}L_{n_1}^\top) & 0 & \cdots & 0 \\ 0 & \lambda_2\operatorname{Log}(L_{n_2}L_{n_2}^\top) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k\operatorname{Log}(L_{n_k}L_{n_k}^\top) \end{pmatrix},
\]
and therefore
\[
\operatorname{tr}\big(D\operatorname{Log}(ZZ^\top)\big) = \sum_{i=1}^k \lambda_i\operatorname{tr}\big(\operatorname{Log}(L_{n_i}L_{n_i}^\top)\big) = \sum_{i=1}^k \lambda_i\ln\big(\det(L_{n_i}L_{n_i}^\top)\big) = 2\sum_{i=1}^k \lambda_i \sum_{j=\alpha_{i-1}+1}^{\alpha_i} \ln\big((L)_{jj}\big).
\]
Combining the last equality with (58) and taking into account that $\|D\|_I = \big(n_1\lambda_1^2 + \cdots + n_k\lambda_k^2\big)^{1/2}$, we obtain the desired equality.

Gradient of Busemann functions on $P(n)$ in Example 4: Since $\bar U := L^\top[X^{-1/2}Y^{1/2}U]^\top$ is an orthogonal matrix, it follows from (6), (8), and (9) that
\[
\big\langle \log_X(\exp_Y(tV)), K\big\rangle_X = \operatorname{tr}\Big(X^{-1/2}\operatorname{Log}\big(X^{-1/2}Y^{1/2}\operatorname{Exp}(tUDU^\top)Y^{1/2}X^{-1/2}\big)X^{-1/2}K\Big)
= \operatorname{tr}\Big(\operatorname{Log}\big([X^{-1/2}Y^{1/2}U]\operatorname{Exp}(tD)[X^{-1/2}Y^{1/2}U]^\top\big)X^{-1/2}KX^{-1/2}\Big)
\]
\[
= \operatorname{tr}\Big(\operatorname{Log}\big(\bar U^\top L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}\bar U\big)X^{-1/2}KX^{-1/2}\Big)
= \operatorname{tr}\Big(\bar U^\top\operatorname{Log}\big(L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}\big)\bar U\, X^{-1/2}KX^{-1/2}\Big)
= \operatorname{tr}\Big(\operatorname{Log}\big(L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}\big)\bar U\, X^{-1/2}KX^{-1/2}\bar U^\top\Big)
\]
for all $K \in S(n)$. Substituting $K := -(1/\|D\|_I)\big(X^{1/2}\bar U^\top D\bar U X^{1/2}\big)$ into the above equality and performing some algebraic simplifications, we obtain
\[
\big\langle \log_X(\exp_Y(tV)), K\big\rangle_X = -\frac{1}{\|D\|_I}\operatorname{tr}\Big(\operatorname{Log}\big(L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}\big)D\Big).
\]
Applying this identity, together with further algebraic manipulations and the Cauchy-Schwarz inequality, we conclude that
\[
\begin{aligned}
\lim_{t\to\infty}\frac{1}{t}\langle\log_X(\exp_Y(tV)),K\rangle_X
&=\frac{1}{\|D\|_I}\lim_{t\to\infty}\frac{1}{t}\operatorname{tr}\Big(\big[tD-\operatorname{Log}\big(L^{-1}\operatorname{Exp}(tD)[L^{T}]^{-1}\big)\big]D\Big)-\|D\|_I\\
&\le\frac{1}{\|D\|_I}\lim_{t\to\infty}\frac{1}{t}\big\|tD-\operatorname{Log}\big(L^{-1}\operatorname{Exp}(tD)[L^{T}]^{-1}\big)\big\|_I\,\|D\|_I-\|D\|_I\\
&=\lim_{t\to\infty}\frac{1}{t}\big\|\log_I(\operatorname{Exp}(tD))-\log_I\big(L^{-1}\operatorname{Exp}(tD)[L^{T}]^{-1}\big)\big\|_I-\|D\|_I.
\end{aligned}
\]
Using (3) and (9) together with item (ii) of Lemma 27, we obtain
\[
\lim_{t\to\infty}\frac{1}{t}\big\|\log_I(\operatorname{Exp}(tD))-\log_I\big(L^{-1}\operatorname{Exp}(tD)[L^{T}]^{-1}\big)\big\|_I\le\lim_{t\to\infty}\frac{1}{t}\,d\big(\operatorname{Exp}(tD),L^{-1}\operatorname{Exp}(tD)[L^{T}]^{-1}\big)=0.
\]
Hence, it follows from the last two inequalities that $\lim_{t\to\infty}\frac{1}{t}\langle\log_X(\exp_Y(tV)),K\rangle_X\le-\|D\|_I$. Therefore, applying Lemma 10 and taking into account that $\|V\|=\|D\|_I$ and $\|\operatorname{grad}B_{Y,V}(X)\|_X=1=\|K\|_X$, we obtain
\[
\|\operatorname{grad}B_{Y,V}(X)-K\|^{2}=\|\operatorname{grad}B_{Y,V}(X)\|_X^{2}-2\langle\operatorname{grad}B_{Y,V}(X),K\rangle_X+\|K\|_X^{2}=2-2\langle\operatorname{grad}B_{Y,V}(X),K\rangle_X=2+\frac{2}{\|D\|_I}\lim_{t\to\infty}\frac{1}{t}\langle\log_X(\exp_Y(tV)),K\rangle_X\le 0,
\]
which completes the proof of (23), since $\|D\|_I=\big(n_1\lambda_1^{2}+\cdots+n_k\lambda_k^{2}\big)^{1/2}$.

References

[1] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, Princeton, NJ, 2008. With a foreword by Paul Van Dooren.
[2] F. J. Aragón Artacho and P. T. Vuong. The boosted difference of convex functions algorithm for nonsmooth functions. SIAM J. Optim., 30(1):980-1006, 2020.
[3] W. Ballmann. Lectures on Spaces of Nonpositive Curvature, volume 25 of DMV Seminar. Birkhäuser, Basel, 1995.
[4] W. Ballmann, M. Gromov, and V. Schroeder. Manifolds of nonpositive curvature, volume 61 of Prog. Math. Birkhäuser, Cham, 1985.
[5] Y. Bao, R. Wang, T. Xu, X. Wu, and J. Kittler.
SymCL: Riemannian contrastive learning on the symmetric positive definite manifold for visual classification, 2024.
[6] H. H. Bauschke and J. M. Borwein. Legendre functions and the method of random Bregman projections. J. Convex Anal., 4(1):27-67, 1997.
[7] R. Benedetti and C. Petronio. Lectures on hyperbolic geometry. Universitext. Springer-Verlag, Berlin, 1992.
[8] G. C. Bento, O. P. Ferreira, and P. R. Oliveira. Local convergence of the proximal point method for a special class of nonconvex functions on Hadamard manifolds. Nonlinear Anal., Theory Methods Appl., Ser. A, 73(2):564-572, 2010.
[9] G. C. Bento, O. P. Ferreira, and P. R. Oliveira. Proximal point method for a special class of nonconvex functions on Hadamard manifolds. Optimization, 64(2):289-319, 2015.
[10] G. d. C. Bento, J. Cruz Neto, and Í. D. L. Melo. Fenchel conjugate via Busemann function on Hadamard manifolds. Appl. Math. Optim., 88(3):Paper No. 83, 29 pp., 2023.
[11] G. d. C. Bento, J. X. Cruz Neto, and Í. D. L. Melo. Combinatorial convexity in Hadamard manifolds: existence for equilibrium problems. J. Optim. Theory Appl., 195(3):1087-1105, 2022.
[12] G. d. C. Bento, J. X. C. Neto, J. O. Lopes, Í. D. L. Melo, and P. da Silva Filho. A new approach about equilibrium problems via Busemann functions. J. Optim. Theory Appl., 200(1):428-436, 2024.
[13] R. Bergmann, O. Ferreira, E. M. Santos, and J. C. O. Souza. The difference of convex algorithm on Hadamard manifolds. J. Optim. Theory Appl., 201(1):221-251, 2024.
[14] C. Bonet, L. Chapel, L. Drumetz, and N. Courty. Hyperbolic sliced-Wasserstein via geodesic and horospherical projections. In Topological, Algebraic and Geometric Learning Workshops 2023, pages 334-370. PMLR, 2023.
[15] C. Bonet, B. Malézieux, A. Rakotomamonjy, L. Drumetz, T. Moreau, M. Kowalski, and N. Courty.
Sliced-Wasserstein on symmetric positive definite matrices for M/EEG signals. In International Conference on Machine Learning, pages 2777-2805. PMLR, 2023.
[16] N. Boumal. An introduction to optimization on smooth manifolds. Cambridge University Press, Cambridge, 2023.
[17] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15(42):1455-1459, 2014.
[18] N. Bourbaki. General Topology. Springer Berlin Heidelberg, 1955.
[19] L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. Zh. Vychisl. Mat. Mat. Fiz., 7:620-631, 1967.
[20] M. R. Bridson and A. Haefliger. Metric spaces of non-positive curvature, volume 319 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag, Berlin, 1999.
[21] H. Busemann. The geometry of geodesics, volume 6 of Pure Appl. Math. Academic Press, New York, NY, 1955.
[22] G. Chen and M. Teboulle. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim., 3(3):538-543, 1993.
[23] C. Criscitiello and J. Kim. Horospherically convex optimization on Hadamard manifolds. Part I: Analysis and algorithms. arXiv:2505.16970, 2025. Available at https://arxiv.org/abs/2505.16970.
[24] J. X. da Cruz Neto, O. P. Ferreira, and L. R. Lucambio Pérez. Monotone point-to-set vector fields. Balkan J. Geom. Appl., 5(1):69-79, 2000.
[25] J. X. da Cruz Neto, O. P. Ferreira, and L. R. Lucambio Pérez. Contributions to the study of monotone vector fields. Acta Math. Hung., 94(4):307-320, 2002.
[26] W. de Oliveira. Sequential difference-of-convex programming. J. Optim. Theory Appl., 186(3):936-959, 2020.
[27] A. R. de Pierro and A. N. Iusem.
A relaxed version of Bregman's method for convex programming. J. Optim. Theory Appl., 51:421-440, 1986.
[28] M. P. do Carmo. Riemannian geometry. Mathematics: Theory & Applications. Birkhäuser Boston Inc., Boston, MA, 1992. Translated from the second Portuguese edition by Francis Flaherty.
[29] O. P. Ferreira, M. S. Louzeiro, and L. F. Prudente. Iteration-complexity of the subgradient method on Riemannian manifolds with lower bounded curvature. Optimization, 68(4):713-729, 2019.
[30] O. P. Ferreira and P. R. Oliveira. Proximal point algorithm on Riemannian manifolds. Optimization, 51(2):257-270, 2002.
[31] M. Ghadimi Atigh, M. Keller-Ressel, and P. Mettes. Hyperbolic Busemann learning with ideal prototypes. Advances in Neural Information Processing Systems, 34:103-115, 2021.
[32] A. Goodwin, A. S. Lewis, G. López-Acedo, and A. Nicolae. A subgradient splitting algorithm for optimization on nonpositively curved metric spaces. arXiv:2412.06730, 2024.
[33] H. Hirai. Convex analysis on Hadamard spaces and scaling problems. Foundations of Computational Mathematics, pages 1-38, 2023.
[34] K. C. Kiwiel. Proximal minimization methods with generalized Bregman functions. SIAM J. Control Optim., 35(4):1142-1168, 1997.
[35] A. Kristály, C. Li, G. López-Acedo, and A. Nicolae. What do 'convexities' imply on Hadamard manifolds? J. Optim. Theory Appl., 170(3):1068-1074, 2016.
[36] A. S. Lewis, G. Lopez-Acedo, and A. Nicolae. Horoballs and the subgradient method. arXiv preprint arXiv:2403.15749, 2024.
[37] C. Li, B. S. Mordukhovich, J. Wang, and J.-C. Yao. Weak sharp minima on Riemannian manifolds. SIAM J. Optim., 21(4):1523-1560, 2011.
[38] Y. E. Nesterov and M. J. Todd. On the Riemannian geometry defined by self-concordant barriers and interior-point methods. Found. Comput. Math., 2(4):333-361, 2002.
[39] E. A. Papa Quiroz.
An extension of the proximal point algorithm with Bregman distances on Hadamard manifolds. J. Glob. Optim., 56(1):43-59, 2013.
[40] E. A. Papa Quiroz and P. R. Oliveira. Proximal point methods for quasiconvex and convex functions with Bregman distances on Hadamard manifolds. J. Convex Anal., 16(1):49-69, 2009.
[41] T. Rapcsák. Smooth nonlinear optimization in $R^n$, volume 19 of Nonconvex Optimization and its Applications. Kluwer Academic Publishers, Dordrecht, 1997.
[42] J. G. Ratcliffe. Foundations of hyperbolic manifolds, volume 149 of Graduate Texts in Mathematics. Springer, Cham, third edition, 2019.
[43] T. Sakai. Riemannian geometry, volume 149 of Translations of Mathematical Monographs. American Mathematical Society, Providence, RI, 1996. Translated from the 1992 Japanese original by the author.
[44] P. D. Tao and S. El Bernoussi. Algorithms for solving a class of nonconvex optimization problems. Methods of subgradients. Fermat days 85: Mathematics for optimization, Toulouse/France 1985, North-Holland Math. Stud. 129, pages 249-271, 1986.
[45] A. Tibermacine, I. E. Tibermacine, M. Zouai, and A. Rabehi. EEG classification using contrastive learning and Riemannian tangent space representations. In 2024 International Conference on Telecommunications and Intelligent Systems (ICTIS), pages 1-7, 2024.
[46] C. Udrişte. Convex functions and optimization methods on Riemannian manifolds, volume 297 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, 1994.
[47] X. Wang, C. Li, J. Wang, and J.-C. Yao. Linear convergence of subgradient algorithm for convex feasibility on Riemannian manifolds. SIAM J. Optim., 25(4):2334-2358, 2015.
