A subdifferential characterization via Busemann functions and applications to DC optimization on Hadamard manifolds

O. P. Ferreira∗, D. S. Gonçalves†, M. S. Louzeiro∗, S. Z. Németh‡, J. Zhu‡

February 25, 2026

Abstract

This paper investigates the properties of Busemann functions on Hadamard manifolds and their use in optimization algorithms in Riemannian settings. We present a new Busemann-based characterization of the subdifferential, which is particularly well suited to Riemannian optimization. In the classical Hadamard manifold framework, a subgradient provides a global lower model of a convex function expressed through the inverse exponential map. However, this model may fail to exhibit a useful convexity or concavity structure. By contrast, our characterization yields a concave bounding function by exploiting key properties of Busemann functions. We use this concavity to design and analyze difference-of-convex (DC) optimization methods on Hadamard manifolds. In particular, we reformulate the classical DC algorithm (DCA) for Riemannian contexts and study its convergence properties. We also report preliminary numerical experiments comparing the proposed Busemann DCA, which leads to geodesically convex subproblems, with the classical Riemannian DCA.

Keywords: Hadamard manifolds, Busemann functions, subgradients, difference of convex algorithm.

AMS subject classification: 90A30 · 90A26

1 Introduction

Busemann functions play a crucial role in geometric topology, particularly in the analysis of manifolds such as Hadamard manifolds. Introduced by H. Busemann in [21], this concept captures the essence of the parallel axiom, facilitating the examination of geodesic geometry and offering insights into the behavior of geodesics as they extend towards infinity. Its significance lies in its ability to reveal fundamental aspects of the overall structure of the space.
As an essential element in the study of Hadamard manifolds, Busemann functions are expected to provide deeper insights as research progresses. To corroborate this, the discussion presented here demonstrates how these functions can aid in the design and analysis of optimization algorithms for solving problems in Riemannian settings. For further exploration of the role of Busemann functions in geometry, refer to works such as [4, 20, 43]. Given the well-established connections between Riemannian geometry and optimization, as evidenced by books such as [1, 16, 41, 46], and acknowledging the fundamental role of Busemann functions in Riemannian geometry, it is natural to anticipate their significance extending into the field of optimization. Indeed, Busemann functions have begun to attract attention from the optimization research community operating within the Hadamard manifold context. This attention has sparked efforts to delve deeper into Busemann functions, exploring their potential applications and theoretical implications across continuous optimization and machine learning domains. For instance, in [11, 12], the concept of a resolvent for equilibrium problems is introduced in terms of Busemann functions, alongside an investigation into elements of convex analysis in Hadamard manifold settings.

∗ Institute of Mathematics and Statistics, Federal University of Goias, Avenida Esperança, s/n, Campus II, Goiânia, GO - 74690-900, Brazil (E-mail: orizon@ufg.br, mauriciolouzeiro@ufg.br).
† Departamento de Matemática, Universidade Federal de Santa Catarina, Florianópolis, SC 88040-900, Brazil (E-mail: douglas.goncalves@ufsc.br).
‡ School of Mathematics, University of Birmingham, Watson Building, Edgbaston, Birmingham - B15 2TT, United Kingdom (E-mails: s.nemeth@bham.ac.uk, jxz755@bham.ac.uk).
Furthermore, [10] presents a Fenchel-type conjugate utilizing Busemann functions, accompanied by an exploration of several of its properties. Additionally, [33] discusses further developments in a convex analysis foundation for convex optimization on Hadamard spaces, incorporating the concept of the Legendre-Fenchel conjugate. In [36], a subgradient-type algorithm is proposed for convex optimization on Hadamard spaces, with iterations based on the convexity of the sublevel sets of Busemann functions. Although immediate practical applications are not our primary focus, we note that [14] introduces hyperbolic sliced-Wasserstein discrepancies using machine-learning tools. These discrepancies are constructed by projecting onto geodesics, for instance through projections defined by level sets of Busemann functions. This context also encourages further exploration of Busemann functions, as discussed in the related work [15]. Additionally, [31] proposes a method termed hyperbolic Busemann learning for classification problems, which applies hierarchical relations among class labels to strategically position hyperbolic prototypes.

In this paper, we introduce a novel characterization of the subdifferential of a convex function, a fundamental concept in nonsmooth optimization theory, based on Busemann functions. This characterization yields structural properties that are more suitable for optimization on Hadamard manifolds than those provided by the classical definition. Traditionally, in the Riemannian setting, the classical subgradient inequality guarantees that a geodesically convex function admits a global lower bound expressed through a subgradient and the inverse exponential map. In the Euclidean case, the corresponding lower model is affine and therefore both convex and concave.
However, on a general Hadamard manifold, the lower model induced by this classical definition may fail to be geodesically convex and may also fail to be geodesically concave. This notion, introduced in [46, Definition 4.3, p. 73] and widely used in works such as [29, 37, 47], motivates the search for support constructions with a more favorable geometric structure. In contrast, our Busemann-based characterization yields a concave bounding function obtained from the negative of a Busemann function. This choice provides a convenient geometric structure for algorithmic design on Hadamard manifolds and suggests further directions for investigation beyond the scope of the present work. In particular, we use these Busemann-based bounds to address difference-of-convex (DC) optimization problems on Hadamard manifolds and to develop optimization methods suited to this framework. Specifically, we revisit the classical difference of convex algorithm (DCA) for Hadamard manifolds, initially introduced and analyzed in [13] as the Hadamard manifold counterpart to the celebrated Euclidean difference of convex algorithm (EDCA) introduced in [44] and recently studied in [2]. However, it is important to note that the function involved in the subproblem of the classical DCA on Hadamard manifolds is generally not convex, presenting significant solution-seeking challenges. To overcome this limitation, we use the geometric structure provided by Busemann functions.
Consequently, we propose a reformulation of the classical DCA in which the subproblem becomes geodesically convex, enabling a more effective treatment within the Riemannian setting.

The paper is structured as follows. In Section 1.1, we establish some notation. Section 2 serves as a review of essential concepts, notations, and foundational results concerning Hadamard manifolds. In Section 3, we delve into Busemann functions on Hadamard manifolds, elucidating crucial properties, providing necessary notation, and offering illustrative examples. Following this, in Section 4, we introduce a novel characterization of the subdifferential based on Busemann functions. In Section 5, we present a Busemann-based algorithm for DC optimization on Hadamard manifolds, motivated by the use of Busemann supports as geometric models for linearization, and we begin by revisiting the classical DCA in this setting. In Section 6, we present numerical experiments comparing the practical performance of the Busemann DCA with the classical Riemannian DCA. Finally, Section 7 presents our conclusions.

1.1 Notation

Let $\mathbb{R}^m$ be the $m$-dimensional Euclidean space. The set of all $m \times n$ matrices with real entries is denoted by $\mathbb{R}^{m\times n}$, and $\mathbb{R}^m \equiv \mathbb{R}^{m\times 1}$. For $M \in \mathbb{R}^{m\times n}$, the matrix $M^T \in \mathbb{R}^{n\times m}$ denotes the transpose of $M$. The matrix $I$ denotes the $n \times n$ identity matrix. Given $v \in \mathbb{R}^n$, $\mathrm{Diag}(v)$ denotes the $n \times n$ diagonal matrix with the entries of $v$ on its diagonal. Denote by $\mathbb{R}^n_{++}$ the positive orthant. Let $\overline{\mathbb{R}} := \mathbb{R} \cup \{+\infty\}$ be the extended real line. In line with [2], we adopt the following conventions:

$(+\infty) - (+\infty) = +\infty$, $\quad (+\infty) - \lambda = +\infty$, $\quad$ and $\quad \lambda - (+\infty) = -\infty$,  (1)

for all $\lambda \in \mathbb{R}$. The domain of $f : M \to \overline{\mathbb{R}}$ is denoted by $\mathrm{dom}\, f := \{p \in M : f(p) < +\infty\}$. Throughout this paper, we assume that $\mathrm{dom}\, f \neq \emptyset$, i.e., that $f$ is proper.
2 Basic results about Hadamard manifolds

In this section, we recall some concepts, notations, and basic results about Hadamard manifolds. For more details see, for example, [28, 43]. Throughout this paper, $M$ represents a finite-dimensional Hadamard manifold and $T_pM$ the tangent space of $M$ at $p$. The norm associated with the Riemannian metric $\langle \cdot,\cdot\rangle$ is denoted by $\|\cdot\|$. We use $\ell(\gamma)$ to express the length of a piecewise smooth curve $\gamma : [a,b] \to M$. The Riemannian distance between $p$ and $q$ in $M$ is denoted by $d(p,q)$; it induces the original topology on $M$, and $(M,d)$ is a complete metric space. The exponential mapping $\exp_p : T_pM \to M$ is defined by $\exp_p v = \gamma_{p,v}(1)$, where $\gamma_{p,v}$ is the geodesic with initial position $p$ and velocity $v$ at $p$. Hence, $\gamma_{p,v}(t) = \exp_p(tv)$, and we will also use the expression $\exp_p(tv)$ to denote the geodesic $\gamma_{p,v}$ starting at $p \in M$ with velocity $v \in T_pM$. For each $p \in M$, the exponential map $\exp_p$ is a diffeomorphism, and $\log_p : M \to T_pM$ denotes its inverse. In this case, $d(p,q) = \|\log_p q\|$ holds, the function $d(\cdot,q) : M \setminus \{q\} \to \mathbb{R}$ is $C^\infty$ for all $q \in M$, and its gradient is given by $\mathrm{grad}_1\, d(p,q) = -\log_p q / d(q,p)$ for all $q \neq p$, where $\mathrm{grad}_1$ denotes the gradient with respect to the first argument. In addition, $d^2(\cdot,q) : M \to \mathbb{R}$ is $C^\infty$ for all $q \in M$, and $\mathrm{grad}_1\, d^2(p,q) = -2\log_p q$. Let $\bar p, \bar q \in M$ and let $(p^k)_{k\in\mathbb{N}}, (q^k)_{k\in\mathbb{N}} \subset M$ be sequences such that $\lim_{k\to+\infty} p^k = \bar p$ and $\lim_{k\to+\infty} q^k = \bar q$. Then, for any $q \in M$, $\lim_{k\to+\infty} \log_{p^k} q = \log_{\bar p} q$, $\lim_{k\to+\infty} \log_q p^k = \log_q \bar p$, and $\lim_{k\to+\infty} \log_{p^k} q^k = \log_{\bar p}\bar q$. Given $p, q \in M$, the symbol $\gamma_{pq}$ denotes the geodesic segment joining $p$ to $q$, i.e., $\gamma_{pq} : [0,1] \to M$ with $\gamma_{pq}(0) = p$ and $\gamma_{pq}(1) = q$.
In the following, we recall the well-known comparison theorem for triangles in Hadamard manifolds, as stated in [43, Proposition 4.5].

Lemma 1. Let $M$ be a Hadamard manifold. The following inequality holds:

$d^2(x,y) + d^2(x,z) - 2\langle \log_x y, \log_x z\rangle \le d^2(y,z)$, $\quad \forall x,y,z \in M$.  (2)

As a consequence,

$\|\log_x y - \log_x z\| \le d(y,z)$, $\quad \forall x,y,z \in M$.  (3)

Moreover, if the sectional curvature of $M$ is identically zero, then both inequalities (2) and (3) hold as equalities.

Definition 2. Let $M$ be a Hadamard manifold and let $f : M \to \overline{\mathbb{R}}$ be a function. We say that $f$ is $L$-Lipschitz on a subset $\Omega \subset M$, for some constant $L \ge 0$, if $|f(p) - f(q)| \le L\, d(p,q)$ for all $p, q \in \Omega \cap \mathrm{dom}\, f$, where $d$ denotes the Riemannian distance on $M$.

The following definition plays an important role in the paper (see [18, p. 363]).

Definition 3. A function $f : M \to \overline{\mathbb{R}}$ is said to be lower semicontinuous (lsc) at a point $p \in M$ if $\liminf_{q\to p} f(q) \ge f(p)$. If $f$ is lower semicontinuous at every point of $M$, we simply say that $f$ is lower semicontinuous.

We conclude this section with a useful property of lower semicontinuous functions on Hadamard manifolds, stated in the next proposition. Its proof is similar to that of the Euclidean case and is omitted here.

Proposition 4. Let $M$ be a Hadamard manifold and $f : M \to \overline{\mathbb{R}}$ a lower semicontinuous function. If $f$ is coercive, i.e., $\lim_{d(p,\bar p)\to+\infty} f(p) = +\infty$ for some fixed $\bar p \in M$, then $f$ has a global minimizer in $M$.

Perhaps the two most important examples of Hadamard manifolds in optimization applications, apart from the Euclidean space $\mathbb{R}^n$, are the $\kappa$-hyperbolic space form $\mathbb{H}^n_\kappa$ and the space of symmetric positive definite matrices $\mathcal{P}(n)$. In the next two sections, we provide a brief review of their key properties, which will serve as the foundation for the examples and numerical experiments developed in this work.
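The comparison inequalities (2) and (3) can be checked numerically on a concrete Hadamard manifold. The sketch below does this on the hyperbolic plane $\mathbb{H}^2$ in the Lorentz model with $\kappa = 1$, whose distance and inverse exponential map are reviewed in Section 2.1; the helper names (`lorentz`, `log_map`, `rand_pt`) are ours, not the paper's.

```python
import numpy as np

def lorentz(x, y):
    # Lorentzian inner product <x,y> = x_1 y_1 + x_2 y_2 - x_3 y_3
    return x[:-1] @ y[:-1] - x[-1] * y[-1]

def dist(p, q):
    # Riemannian distance on H^2 (kappa = 1); clamp guards rounding below 1
    return np.arccosh(max(-lorentz(p, q), 1.0))

def log_map(p, q):
    # inverse exponential map log_p q
    if dist(p, q) < 1e-12:
        return np.zeros_like(p)
    u = q + lorentz(p, q) * p          # Lorentzian projection of q onto T_p H^2
    return dist(p, q) * u / np.sqrt(lorentz(u, u))

rng = np.random.default_rng(0)
def rand_pt():
    # random point (x, sqrt(1 + |x|^2)) on the upper hyperboloid
    x = rng.normal(size=2)
    return np.append(x, np.sqrt(1.0 + x @ x))

# check (2) and (3) on random geodesic triangles
for _ in range(200):
    x, y, z = rand_pt(), rand_pt(), rand_pt()
    ly, lz = log_map(x, y), log_map(x, z)
    lhs2 = dist(x, y) ** 2 + dist(x, z) ** 2 - 2.0 * lorentz(ly, lz)
    assert lhs2 <= dist(y, z) ** 2 + 1e-9              # inequality (2)
    w = ly - lz
    assert np.sqrt(lorentz(w, w)) <= dist(y, z) + 1e-9  # inequality (3)
```

Since the sectional curvature here is $-1$ rather than $0$, the inequalities are typically strict, in line with the equality statement of Lemma 1.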
2.1 Basic results on the $\kappa$-hyperbolic space form

In this section, we provide a review of the basic results related to the geometry of the $\kappa$-hyperbolic space forms. References for this section include [7, 16, 42]. For a given $\kappa > 0$, the $n$-dimensional $\kappa$-hyperbolic space form and its tangent hyperplane at a point $p$ are denoted by

$\mathbb{H}^n_\kappa := \{p \in \mathbb{R}^{n+1} : \langle p,p\rangle = -\tfrac{1}{\kappa},\ p_{n+1} > 0\}$, $\qquad T_p\mathbb{H}^n_\kappa := \{v \in \mathbb{R}^{n+1} : \langle p,v\rangle = 0\}$,

where $\langle \cdot,\cdot\rangle$ is the Lorentzian inner product $\langle x,y\rangle := x^T J y$ and $J := \mathrm{Diag}(1,\dots,1,-1) \in \mathbb{R}^{(n+1)\times(n+1)}$. The Lorentzian projection onto $T_p\mathbb{H}^n_\kappa$ is the linear mapping $\mathrm{Proj}^\kappa_p : \mathbb{R}^{n+1} \to T_p\mathbb{H}^n_\kappa$ defined by

$\mathrm{Proj}^\kappa_p x := x + \kappa\langle p,x\rangle p$, $\quad$ i.e., $\quad \mathrm{Proj}^\kappa_p := I + \kappa p p^T J$,

where $I \in \mathbb{R}^{(n+1)\times(n+1)}$ is the identity matrix. The intrinsic distance on the $\kappa$-hyperbolic space form between two points $p, q \in \mathbb{H}^n_\kappa$ is given by

$d_\kappa(p,q) := \frac{1}{\sqrt{\kappa}} \operatorname{arcosh}(-\kappa\langle p,q\rangle)$.  (4)

The exponential mapping $\exp^\kappa_q : T_q\mathbb{H}^n_\kappa \to \mathbb{H}^n_\kappa$ is given by $\exp^\kappa_q v = q$ for $v = 0$, and

$\exp^\kappa_q v := \cosh(\sqrt{\kappa}\|v\|)\, q + \sinh(\sqrt{\kappa}\|v\|)\, \frac{v}{\sqrt{\kappa}\|v\|}$, $\quad \forall v \in T_q\mathbb{H}^n_\kappa \setminus \{0\}$.

The inverse of the exponential mapping, $\log^\kappa_q : \mathbb{H}^n_\kappa \to T_q\mathbb{H}^n_\kappa$ at $q \in \mathbb{H}^n_\kappa$, is given by $\log^\kappa_q p = 0$ for $p = q$, and

$\log^\kappa_q p := \frac{\sqrt{\kappa}\, d_\kappa(q,p)}{\sqrt{\kappa^2\langle q,p\rangle^2 - 1}}\, \mathrm{Proj}^\kappa_q p = d_\kappa(q,p)\, \frac{\mathrm{Proj}^\kappa_q p}{\|\mathrm{Proj}^\kappa_q p\|}$, $\quad p \neq q$.

Let $\Omega \subset \mathbb{H}^n_\kappa$ be an open set. The Riemannian gradient on the $\kappa$-hyperbolic space form of a differentiable function $f : \Omega \to \mathbb{R}$ is the unique vector field $\Omega \ni p \mapsto \mathrm{grad}\, f(p) \in T_p\mathbb{H}^n_\kappa$ such that $df(p)v = \langle \mathrm{grad}\, f(p), v\rangle$; see [16, Proposition 7-5, p. 162]. Therefore, we have

$\mathrm{grad}\, f(p) := \mathrm{Proj}^\kappa_p J f'(p) = J f'(p) + \kappa\langle J f'(p), p\rangle p$,  (5)

where $f'(p) \in \mathbb{R}^{n+1}$ is the usual gradient of $f$ at $p$.
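The maps of this subsection can be sketched directly from the formulas above. The snippet below is a minimal implementation for general $\kappa > 0$, with a round-trip check that $\log^\kappa_q$ inverts $\exp^\kappa_q$ and that $d_\kappa(q, \exp^\kappa_q v) = \|v\|$; the function names are ours.

```python
import numpy as np

def lorentz(x, y):
    # <x,y> = x^T J y with J = Diag(1,...,1,-1)
    return x[:-1] @ y[:-1] - x[-1] * y[-1]

def proj(kappa, p, x):
    # Lorentzian projection onto T_p H^n_kappa
    return x + kappa * lorentz(p, x) * p

def dist(kappa, p, q):
    # intrinsic distance (4)
    return np.arccosh(max(-kappa * lorentz(p, q), 1.0)) / np.sqrt(kappa)

def exp_map(kappa, q, v):
    nv = np.sqrt(max(lorentz(v, v), 0.0))   # Riemannian norm of v in T_q
    if nv < 1e-15:
        return q
    s = np.sqrt(kappa) * nv
    return np.cosh(s) * q + np.sinh(s) * v / (np.sqrt(kappa) * nv)

def log_map(kappa, q, p):
    u = proj(kappa, q, p)
    nu = np.sqrt(lorentz(u, u))
    if nu < 1e-15:
        return np.zeros_like(q)
    return dist(kappa, q, p) * u / nu

# sanity check: log_q inverts exp_q and d(q, exp_q(v)) = ||v||
kappa = 0.5
rng = np.random.default_rng(1)
x = rng.normal(size=3)
q = np.append(x, np.sqrt(1.0 / kappa + x @ x))   # point on H^3_kappa
v = proj(kappa, q, rng.normal(size=4))           # tangent vector at q
p = exp_map(kappa, q, v)
assert abs(dist(kappa, q, p) - np.sqrt(lorentz(v, v))) < 1e-8
assert np.allclose(log_map(kappa, q, p), v, atol=1e-8)
```

The check confirms, in particular, that $d_\kappa(q, \exp^\kappa_q(tv)) = t\|v\|$, a fact used repeatedly in the Busemann computations of Section 3.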
2.2 Basic results on the manifold of symmetric positive definite matrices

In this section, we provide a review of the basic results related to the geometry of the manifold of symmetric positive definite matrices. References for this section include [16, 20]. Let $\mathbb{R}^{n\times m}$ denote the space of real matrices of size $n \times m$, $\mathcal{S}(n) \subset \mathbb{R}^{n\times n}$ the set of symmetric matrices, and $\mathcal{P}(n) \subset \mathbb{R}^{n\times n}$ the set of symmetric positive definite matrices. We equip $\mathcal{P}(n)$ with the affine-invariant Riemannian metric, which is defined by

$\langle U, V\rangle := \mathrm{tr}\big(Y^{-1}UY^{-1}V\big)$, $\quad \forall Y \in \mathcal{P}(n)$, $U, V \in T_Y\mathcal{P}(n) \equiv \mathcal{S}(n)$,  (6)

where $\mathrm{tr}(\cdot)$ denotes the trace operator, and $T_Y\mathcal{P}(n)$ is the tangent space at $Y$, identified with $\mathcal{S}(n)$. This metric endows $\mathcal{P}(n)$ with the structure of a Hadamard manifold. The affine-invariant Riemannian distance between two points $X, Y \in \mathcal{P}(n)$ is defined by

$d(X,Y) = \mathrm{tr}^{1/2}\big[\mathrm{Log}^2\big(Y^{-1/2}XY^{-1/2}\big)\big] = \mathrm{tr}^{1/2}\big[\mathrm{Log}^2\big(Y^{-1}X\big)\big]$,  (7)

where $\mathrm{Log}$ denotes the usual matrix logarithm. The exponential map at $Y \in \mathcal{P}(n)$ with respect to the affine-invariant metric is given by

$\exp_Y(V) := Y^{1/2}\,\mathrm{Exp}\big(Y^{-1/2}VY^{-1/2}\big)\,Y^{1/2}$,  (8)

for all $V \in T_Y\mathcal{P}(n)$, where $\mathrm{Exp}$ denotes the usual matrix exponential. The inverse of the exponential map is given by

$\log_X Y := X^{1/2}\,\mathrm{Log}\big(X^{-1/2}YX^{-1/2}\big)\,X^{1/2}$.  (9)

The gradient on the manifold of symmetric positive definite matrices $\mathcal{P}(n)$ of a differentiable function $f : \mathcal{P}(n) \to \mathbb{R}$ is the unique vector field $\mathcal{P}(n) \ni X \mapsto \mathrm{grad}\, f(X) \in \mathcal{S}(n)$ given by

$\mathrm{grad}\, f(X) = X f'(X) X$,  (10)

where $f'(X) \in \mathcal{S}(n)$ denotes the Euclidean gradient of $f$ at $X$. We conclude this section with a useful property of the Riemannian distance that is instrumental in computing Busemann functions on the manifold of symmetric positive definite matrices. Its proof follows directly from (7) and is therefore omitted.

Lemma 5.
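Formulas (7)–(9) translate directly into code. The sketch below implements them with `scipy.linalg` and checks that $\exp_Y(\log_Y X) = X$ and that the two expressions for the distance in (7) agree (the eigenvalues of $Y^{-1}X$ and of $Y^{-1/2}XY^{-1/2}$ coincide); the helper names are ours.

```python
import numpy as np
from scipy.linalg import expm, logm, eigh

def spd_sqrt(Y):
    # symmetric square root of an SPD matrix via eigendecomposition
    w, Q = np.linalg.eigh(Y)
    return (Q * np.sqrt(w)) @ Q.T

def spd_dist(X, Y):
    # affine-invariant distance (7), first expression
    Yh = spd_sqrt(Y)
    Yi = np.linalg.inv(Yh)
    return np.linalg.norm(np.real(logm(Yi @ X @ Yi)), 'fro')

def spd_exp(Y, V):
    # exponential map (8)
    Yh = spd_sqrt(Y)
    Yi = np.linalg.inv(Yh)
    return Yh @ expm(Yi @ V @ Yi) @ Yh

def spd_log(X, Y):
    # inverse exponential map log_X Y, cf. (9)
    Xh = spd_sqrt(X)
    Xi = np.linalg.inv(Xh)
    return Xh @ np.real(logm(Xi @ Y @ Xi)) @ Xh

rng = np.random.default_rng(2)
def rand_spd(n):
    A = rng.normal(size=(n, n))
    return A @ A.T + n * np.eye(n)

X, Y = rand_spd(3), rand_spd(3)
# round trip: exp_Y(log_Y X) = X
assert np.allclose(spd_exp(Y, spd_log(Y, X)), X, atol=1e-8)
# second expression in (7), via the generalized eigenvalues of Y^{-1} X
lam = eigh(X, Y, eigvals_only=True)
assert abs(spd_dist(X, Y) - np.sqrt(np.sum(np.log(lam) ** 2))) < 1e-8
```

Note that $\mathrm{tr}^{1/2}[\mathrm{Log}^2(\cdot)]$ of a symmetric argument is just the Frobenius norm of the matrix logarithm, which is what `spd_dist` computes.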
If $X \in \mathcal{P}(n)$ and $V$ is a non-singular matrix, then $\mathrm{Log}(VXV^{-1}) = V(\mathrm{Log}\, X)V^{-1}$. As a consequence, for given $X, Y \in \mathcal{P}(n)$ and non-singular matrices $Z$ and $V$ such that $ZXV$ and $Z^{-1}YV^{-1}$ both belong to $\mathcal{P}(n)$, there holds $d(ZXV, Y) = d\big(X, Z^{-1}YV^{-1}\big)$.

2.3 Subdifferential of a convex function on Hadamard manifolds

In this section, we recall the subdifferential of a convex function on a Hadamard manifold and its main properties, as presented in [46] and further developed in [30, 37]. These classical notions are reviewed to prepare the discussion on the characterization of the subdifferential via Busemann functions in the subsequent sections. We begin with the definitions of convex sets and convex functions. A set $\Omega \subset M$ is said to be convex if, for all $p, q \in \Omega$, we have $\gamma_{pq}(t) \in \Omega$ for all $t \in [0,1]$. A function $f : M \to \overline{\mathbb{R}}$ is $\sigma$-strongly convex for $\sigma \ge 0$ if

$(f\circ\gamma_{pq})(t) \le (1-t)f(p) + t f(q) - \frac{\sigma}{2}\,t(1-t)\,d^2(p,q)$,

for all $p, q \in M$ and $t \in [0,1]$. In particular, $f$ is convex when $\sigma = 0$. For $\sigma = 0$, $f$ is strictly convex if the inequality is strict for all $p \neq q$ in $\mathrm{dom}\, f$ and all $t \in (0,1)$.

Definition 6. Let $f : M \to \overline{\mathbb{R}}$ be a convex function and $q \in \mathrm{dom}\, f$. A vector $s \in T_qM$ is said to be a subgradient of $f$ at $q$ if

$f(p) \ge f(q) + \langle s, \log_q p\rangle$, $\quad \forall p \in M$.  (11)

The set of all subgradients of $f$ at the point $q$ is called the subdifferential and is denoted by $\partial f(q)$.

It is a well-established fact that the subdifferential set $\partial f(p)$ is nonempty for every $p \in \mathrm{int}\,\mathrm{dom}\, f$. For an analytic proof, refer to [46, Theorem 4.5], and for a geometric proof, see [30]. Moreover, $\partial f(p)$ is recognized as a convex and compact set, as demonstrated in [46, Theorem 4.6]. To explore further useful properties of $\partial f(p)$, we denote by $f'(p,v)$ the directional derivative of $f$ at $p$ in the direction of $v \in T_pM$, as defined in [46, Definition 4.1].
Recall that, for a given $p \in M$, we have $\mathrm{dom}\, f'(p,\cdot) := \{v \in T_pM : \exists\, \hat t > 0 \text{ such that } \exp_p(tv) \in \mathrm{dom}\, f \text{ for all } t \in [0,\hat t)\}$. The proof of the first part of the next result can be found in [46, Theorem 4.8]; for additional details, see [24] and [37, Proposition 3.8(ii)]. The second part is addressed in [37, Proposition 4.3].

Proposition 7. Let $f : M \to \overline{\mathbb{R}}$ be a convex function. Then, for each fixed $p \in \mathrm{dom}\, f$, there holds $\partial f(p) = \{s \in T_pM : f'(p,v) \ge \langle s,v\rangle,\ \forall v \in T_pM\}$. In addition, if $g : M \to \overline{\mathbb{R}}$ is a convex function such that $\mathrm{dom}\, f \cap \mathrm{dom}\, g$ is convex, then $\partial(f+g)(p) = \partial f(p) + \partial g(p)$ for each $p \in (\mathrm{int}\,\mathrm{dom}\, f) \cap \mathrm{dom}\, g$.

The proof of the first claim in the theorem below can be found in [46, Theorem 4.10, p. 76], with the proof of the second claim following a similar approach.

Theorem 8. Let $f : M \to \overline{\mathbb{R}}$ be a function. Then $f$ is convex (resp. $\sigma$-strongly convex) if and only if $\mathrm{dom}\, f$ is convex and, for every $p \in \mathrm{dom}\, f$, there exists $v \in T_pM$ such that $f(q) \ge f(p) + \langle v, \log_p q\rangle$ (resp. $f(q) \ge f(p) + \langle v, \log_p q\rangle + \frac{\sigma}{2}d^2(p,q)$) for all $q \in M$. In either case, $\partial f(p) \neq \emptyset$ for all $p \in \mathrm{dom}\, f$, and the inequality holds for every $v \in \partial f(p)$.

The proof of the following result follows immediately from [47, Proposition 2.5].

Proposition 9. Let $f : M \to \overline{\mathbb{R}}$ be a convex and lower semicontinuous function. Consider a sequence $(p^k)_{k\in\mathbb{N}} \subset \mathrm{int}\,\mathrm{dom}\, f$ such that $\lim_{k\to\infty} p^k = \bar p \in \mathrm{int}\,\mathrm{dom}\, f$. If $(v^k)_{k\in\mathbb{N}}$ is a sequence such that $v^k \in \partial f(p^k)$ for every $k \in \mathbb{N}$, then $(v^k)_{k\in\mathbb{N}}$ is bounded and its cluster points belong to $\partial f(\bar p)$.

3 The Busemann functions on Hadamard manifolds

In this section, we review Busemann functions on Hadamard manifolds, introducing the notation and collecting the properties needed in the sequel. Since our definition is slightly more general than the standard one, we include brief proofs of selected results.
For further background, see, for instance, [43]. Let $M$ be a Hadamard manifold with Riemannian distance $d$. Given a base point $q \in M$ and a vector $v \in T_qM$, the associated Busemann function is defined by

$B_{q,v}(p) := \lim_{t\to+\infty}\big[d\big(p, \exp_q(tv)\big) - \|v\|t\big]$, $\quad \forall p \in M$.  (12)

For $v = 0$, this reduces to $B_{q,0}(p) = d(q,p)$. Moreover, by the triangle inequality,

$|B_{q,v}(p)| \le d(q,p)$, $\quad \forall q, p \in M$, $\forall v \in T_qM$.  (13)

Remark 1. Classically, Busemann functions are defined for unit vectors $v \neq 0$; see, e.g., [20, Definition 8.17, p. 268] and [43, p. 174]. Since $B_{q,v} = B_{q,v/\|v\|}$ for $v \neq 0$, we extend the definition to arbitrary $v$, including $v = 0$, which is convenient for our purposes.

The following lemma summarizes the main regularity properties of Busemann functions and provides two equivalent expressions for their gradients.

Lemma 10. Let $q \in M$ and $v \in T_qM$ with $v \neq 0$. Then $B_{q,v}$ is convex and continuously differentiable on $M$, with

$\mathrm{grad}\, B_{q,v}(p) = -\lim_{t\to\infty} \frac{\log_p(\exp_q(tv))}{d(p, \exp_q(tv))}$  (14)

$\qquad\qquad\quad\ \ = -\frac{1}{\|v\|}\lim_{t\to\infty} \frac{\log_p(\exp_q(tv))}{t}$, $\quad \forall p \in M$.  (15)

Moreover, $\|\mathrm{grad}\, B_{q,v}(p)\| = 1$ for all $p \in M$, and $\mathrm{grad}\, B_{q,v}(q) = -v/\|v\|$.

Proof. The identity in (14) is established in the proof of [43, Lemma 4.12, p. 231]. To prove (15), note that by the triangle inequality we have $t\|v\| - d(p,q) \le d(p, \exp_q(tv)) \le t\|v\| + d(p,q)$ for all $t > 0$. Thus, dividing by $t$ and then taking the limit as $t \to +\infty$, we conclude that $\lim_{t\to\infty} d(p,\exp_q(tv))/t = \|v\|$. Combining this limit with (14), we obtain

$\mathrm{grad}\, B_{q,v}(p) = -\frac{1}{\|v\|}\lim_{t\to\infty}\frac{\log_p(\exp_q(tv))}{d(p,\exp_q(tv))}\,\lim_{t\to\infty}\frac{d(p,\exp_q(tv))}{t} = -\frac{1}{\|v\|}\lim_{t\to\infty}\frac{\log_p(\exp_q(tv))}{t}$,

which completes the proof.

Note that, for $v = 0$, we have $B_{q,0}(p) = d(q,p)$.
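The defining limit (12) can be observed numerically in the simplest setting, $M = \mathbb{R}^n$, where it converges at rate $O(1/t)$ to the flat-case closed form of Example 1 below. The snippet is a minimal sketch; `busemann_limit` is our name for the truncated limit.

```python
import numpy as np

def busemann_limit(p, q, v, t=1e7):
    # Busemann value via the defining limit (12), truncated at a large t;
    # in R^n the geodesic exp_q(tv) is simply q + t v
    return np.linalg.norm(p - (q + t * v)) - np.linalg.norm(v) * t

rng = np.random.default_rng(3)
p, q, v = rng.normal(size=3), rng.normal(size=3), rng.normal(size=3)

# flat-case closed form: B_{q,v}(p) = -<v/||v||, p - q>, cf. Example 1
closed_form = -(v @ (p - q)) / np.linalg.norm(v)
assert abs(busemann_limit(p, q, v) - closed_form) < 1e-4
# bound (13): |B_{q,v}(p)| <= d(q, p)
assert abs(busemann_limit(p, q, v)) <= np.linalg.norm(p - q) + 1e-6
```

Truncating at smaller values of $t$ shows the $O(1/t)$ error directly, e.g. comparing `t=1e3` with `t=1e6`.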
Therefore, we also obtain that $B_{q,0}$ is convex and continuously differentiable, with the gradient vector field $\mathrm{grad}\, B_{q,0}$ satisfying $\|\mathrm{grad}\, B_{q,0}(p)\| = 1$ for all $p \neq q$.

The following lemma, whose proof is straightforward and thus omitted, in particular implies that the Busemann function $B_{q,v}$ is linear along the geodesic $t \mapsto \exp_q(tv)$ that defines it.

Lemma 11. Let $q \in M$ be a base point and $v \in T_qM$. Then $B_{q,v}(\exp_q(\tau v)) = -\tau\|v\|$ for all $\tau \in \mathbb{R}$. Consequently, the Busemann function $B_{q,v}$ is unbounded both above and below.

The following lemma records a continuity property of Busemann functions that will be particularly useful in Section 5. A proof can be obtained by adapting the argument of [23, Lemma 5]; see also [3, Chapter II.1].

Lemma 12. Let $\bar q \in M$ and $\bar w \in T_{\bar q}M$. Consider sequences $(q^k)_{k\in\mathbb{N}} \subset M$ and $(v^k)_{k\in\mathbb{N}}$ with $v^k \in T_{q^k}M$, satisfying $\lim_{k\to+\infty} q^k = \bar q$ and $\lim_{k\to+\infty} v^k = \bar w$. Then, $\lim_{k\to+\infty} B_{q^k,v^k}(p) = B_{\bar q,\bar w}(p)$ for all $p \in M$.

We conclude this section with a useful identity, obtained directly from (12), which provides a practical means for computing Busemann functions.

Proposition 13. Let $q \in M$ be a base point and $v \in T_qM$ with $v \neq 0$. Then, there holds

$B_{q,v}(p) = \lim_{t\to+\infty} \frac{d^2\big(p, \exp_q(tv)\big) - (\|v\|t)^2}{2\|v\|t}$, $\quad \forall p \in M$.

3.1 Examples of Busemann functions

In this section, we provide examples of Busemann functions. We begin by introducing a fundamental property that establishes a stronger inequality than (13) in the case where the Busemann function is positive. This property not only serves as motivation for the examples presented on Hadamard manifolds with identically zero sectional curvature, but also plays an essential role in subsequent sections.

Lemma 14. Let $M$ be a Hadamard manifold.
Then the Busemann function $B_{q,v}$ defined in (12), associated with a base point $q \in M$ and a direction $v \in T_qM$, satisfies the inequality

$-\langle v, \log_q p\rangle \le \|v\|\, B_{q,v}(p)$, $\quad \forall p \in M$.  (16)

Moreover, if the sectional curvature satisfies $K \equiv 0$ on the whole of $M$, then inequality (16) holds as an equality: $\|v\|B_{q,v}(p) = -\langle v, \log_q p\rangle$ for all $p \in M$.

Proof. It is immediate that (16) holds for $v = 0$. We now assume $v \neq 0$. By applying Lemma 1 to the geodesic triangle with vertices $x = q$, $y = p$, and $z = \exp_q(tv)$, it follows that

$d^2(q,p) + d^2(q, \exp_q(tv)) - 2\big\langle \log_q p, \log_q \exp_q(tv)\big\rangle \le d^2(p, \exp_q(tv))$, $\quad \forall t > 0$.  (17)

Since $\log_q \exp_q(tv) = tv$ and $d(q, \exp_q(tv)) = t\|v\|$, it follows from the last inequality that

$d^2(q,p) - 2t\big\langle \log_q p, v\big\rangle \le d^2(p, \exp_q(tv)) - (\|v\|t)^2$, $\quad \forall t > 0$.

After performing some algebraic manipulations, the last inequality can be expressed as follows:

$\frac{d^2(q,p)}{2t} - \big\langle \log_q p, v\big\rangle \le \|v\|\, \frac{d^2\big(p, \exp_q(tv)\big) - (\|v\|t)^2}{2\|v\|t}$, $\quad \forall t > 0$.

Taking the limit in the previous inequality as $t \to +\infty$ and using Proposition 13, we obtain (16). Moreover, by Lemma 1, if the sectional curvature satisfies $K \equiv 0$ on $M$, then (17) holds with equality. Consequently, all subsequent inequalities are equalities, which proves the desired equality.

In the following example, we present an explicit formula for Busemann functions on a Hadamard manifold with identically zero sectional curvature.

Example 1. Let $M$ be a Hadamard manifold, $q \in M$ a base point, and $v \in T_qM$. If the sectional curvature of $M$ is identically zero, denoted by $K \equiv 0$, then the Busemann function $B_{q,v}$ is given by

$B_{q,v}(p) := \begin{cases} \big\langle -\frac{v}{\|v\|},\, \log_q p\big\rangle, & v \neq 0,\\[2pt] d(q,p), & v = 0.\end{cases}$  (18)

Indeed, when $K \equiv 0$ on $M$, Lemma 14 yields $\|v\|B_{q,v}(p) = -\langle v, \log_q p\rangle$ for all $p \in M$, which, together with $B_{q,0}(p) = d(q,p)$, implies (18).
In particular, for $M = \mathbb{R}^n$, we obtain

$B_{q,v}(p) := \begin{cases} -\big\langle \frac{v}{\|v\|},\, p - q\big\rangle, & v \neq 0,\\[2pt] \|p - q\|, & v = 0.\end{cases}$  (19)

Since $\log_q p = p - q$ and $d(q,p) = \|p - q\|$ in $\mathbb{R}^n$, (19) follows directly from (18). Next, we use (18) to derive an explicit expression for Busemann functions on the positive orthant endowed with the Dikin metric. A detailed study of the positive orthant with the Dikin metric is given in [38].

Example 2. Let $M := (\mathbb{R}^n_{++}, G)$ be the positive orthant $\mathbb{R}^n_{++}$ endowed with the Dikin metric $\langle u, v\rangle := u^T G(q) v$, where $u, v \in T_qM$ and $G(q) \in \mathbb{R}^{n\times n}$ is the diagonal matrix $G(q) := \mathrm{diag}\big(q_1^{-2}, \dots, q_n^{-2}\big)$, with $q_i$ denoting the $i$-th coordinate of the point $q$. The exponential map $\exp_q : T_qM \to M$ is given by

$\exp_q(v) = \big(q_1 e^{v_1/q_1}, \dots, q_n e^{v_n/q_n}\big)$, $\quad v := (v_1, \dots, v_n) \in T_qM \equiv \mathbb{R}^n$.

In addition, direct calculations show that the inverse of the exponential, $\log_q : M \to T_qM$, is given by

$\log_q p = \big(q_1 \ln(p_1/q_1), \dots, q_n \ln(p_n/q_n)\big)$, $\quad p := (p_1, \dots, p_n) \in M$.

Therefore, it follows from Example 1 that the Busemann function $B_{q,v}$ is given by

$B_{q,v}(p) := \begin{cases} -\frac{1}{\|v\|} \sum_{i=1}^n (v_i/q_i)\ln(p_i/q_i), & v \neq 0,\\[2pt] \big[\sum_{i=1}^n \big(\ln(p_i/q_i)\big)^2\big]^{1/2}, & v = 0.\end{cases}$

In the following example, we provide an explicit formula for Busemann functions on the $\kappa$-hyperbolic space form $\mathbb{H}^n_\kappa$. For the detailed computations, see Appendix 1.

Example 3. Let $\mathbb{H}^n_\kappa$ be the $\kappa$-hyperbolic space form introduced in Section 2.1. Let $q \in \mathbb{H}^n_\kappa$ be a base point and $v \in T_q\mathbb{H}^n_\kappa$. Then, the Busemann function $B_{q,v} : \mathbb{H}^n_\kappa \to \mathbb{R}$ is given by

$B_{q,v}(p) := \begin{cases} \frac{1}{\sqrt{\kappa}}\ln\Big(-\big\langle p,\, \kappa q + \frac{\sqrt{\kappa}}{\|v\|}v\big\rangle\Big), & v \neq 0,\\[2pt] d(q,p), & v = 0,\end{cases}$  (20)

and its gradient, for $v \in T_q\mathbb{H}^n_\kappa$ with $v \neq 0$, is given by

$\mathrm{grad}\, B_{q,v}(p) = \frac{1}{\sqrt{\kappa}}\, \frac{1}{\big\langle p,\, \kappa q + \frac{\sqrt{\kappa}}{\|v\|}v\big\rangle}\Big[\kappa q + \frac{\sqrt{\kappa}}{\|v\|}v + \kappa\Big\langle \kappa q + \frac{\sqrt{\kappa}}{\|v\|}v,\, p\Big\rangle p\Big]$.  (21)
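On $\mathbb{H}^n_\kappa$ the defining limit (12) converges exponentially fast, so the closed form (20) can be verified numerically against a truncated limit. The sketch below does this for a unit direction on $\mathbb{H}^2_\kappa$; all function names are ours, and the hyperbolic maps follow Section 2.1.

```python
import numpy as np

def lorentz(x, y):
    return x[:-1] @ y[:-1] - x[-1] * y[-1]

def dist(kappa, p, q):
    # intrinsic distance (4)
    return np.arccosh(max(-kappa * lorentz(p, q), 1.0)) / np.sqrt(kappa)

def exp_map(kappa, q, v):
    nv = np.sqrt(lorentz(v, v))
    s = np.sqrt(kappa) * nv
    return np.cosh(s) * q + np.sinh(s) * v / (np.sqrt(kappa) * nv)

def busemann_closed(kappa, q, v, p):
    # formula (20), case v != 0
    u = kappa * q + np.sqrt(kappa) * v / np.sqrt(lorentz(v, v))
    return np.log(-lorentz(p, u)) / np.sqrt(kappa)

def busemann_limit(kappa, q, v, p, t=30.0):
    # definition (12), truncated at t; error decays exponentially in t
    return dist(kappa, p, exp_map(kappa, q, t * v)) - np.sqrt(lorentz(v, v)) * t

kappa = 2.0
rng = np.random.default_rng(4)
def rand_pt():
    x = rng.normal(size=2)
    return np.append(x, np.sqrt(1.0 / kappa + x @ x))

q, p = rand_pt(), rand_pt()
z = rng.normal(size=3)
v = z + kappa * lorentz(q, z) * q          # tangent vector at q
v = v / np.sqrt(lorentz(v, v))             # normalize to a unit direction
assert abs(busemann_closed(kappa, q, v, p) - busemann_limit(kappa, q, v, p)) < 1e-8
```

The exponential convergence here contrasts with the $O(1/t)$ rate in the flat case, reflecting the absence of flat directions in $\mathbb{H}^n_\kappa$.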
Particularly, if $p = q$, then due to $\langle q,q\rangle = -\frac{1}{\kappa}$ and $\langle q,v\rangle = 0$, the final equation simplifies to

$\mathrm{grad}\, B_{q,v}(q) = -\frac{1}{\sqrt{\kappa}}\Big(\kappa q + \frac{\sqrt{\kappa}}{\|v\|}v - \kappa q\Big) = -\frac{v}{\|v\|}$.

In the next example, we derive explicit expressions for Busemann functions and their Riemannian gradients on the manifold of symmetric positive definite matrices described in Section 2.2. Although alternative formulas are available in the literature [20, Proposition 10.69], [33, Lemma 2.32], we introduce a new representation that is computationally cheaper and better suited for numerical applications. For completeness, Appendix 2 provides a direct derivation that avoids the general theory of symmetric spaces used in previous approaches.

Example 4. Let $\mathcal{P}(n)$ be endowed with the structure of a Hadamard manifold as introduced in Section 2.2. Given $X, Y \in \mathcal{P}(n)$ and $V \in \mathcal{S}(n)\setminus\{0\}$, consider the spectral decomposition

$Y^{-1/2}VY^{-1/2} = UDU^T$, $\qquad D := \begin{pmatrix} \lambda_1 I_{n_1} & 0 & \cdots & 0\\ 0 & \lambda_2 I_{n_2} & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & \lambda_k I_{n_k}\end{pmatrix}$.

Here, $\lambda_1, \dots, \lambda_k$ are the distinct eigenvalues of the matrix $Y^{-1/2}VY^{-1/2}$, ordered such that $\lambda_1 < \cdots < \lambda_k$; $n_i$ is the multiplicity of $\lambda_i$; $I_{n_i} \in \mathbb{R}^{n_i\times n_i}$ is the identity matrix; and $U \in \mathbb{R}^{n\times n}$ is an orthogonal matrix. Let $U^T Y^{-1/2} X Y^{-1/2} U = LL^T$ be the Cholesky decomposition. Then, the Busemann function $B_{Y,V}$ evaluated at $X$ is given by

$B_{Y,V}(X) = -2\big(n_1\lambda_1^2 + \cdots + n_k\lambda_k^2\big)^{-1/2} \sum_{i=1}^{k}\ \sum_{j=\alpha_{i-1}+1}^{\alpha_i} \lambda_i \ln(L_{jj})$,  (22)

where $L_{jj} > 0$ denotes the $j$-th diagonal entry of $L$, $\alpha_0 = 0$, and $\alpha_i = \sum_{j=1}^{i} n_j$ for $i = 1, \dots, k$. Moreover, the Riemannian gradient of $B_{Y,V}$ at $X$ is given by

$\mathrm{grad}\, B_{Y,V}(X) = -\big(n_1\lambda_1^2 + \cdots + n_k\lambda_k^2\big)^{-1/2}\, Y^{1/2} U L D L^T U^T Y^{1/2}$.  (23)

For more explicit examples of Busemann functions, see, for instance, [10, 20, 33].
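Formulas (22) and (23) can be implemented with one eigendecomposition and one Cholesky factorization, and then checked against properties established earlier: $B_{Y,V}(Y) = 0$, the linearity of Lemma 11 along the defining geodesic, the bound (13), and the unit gradient norm of Lemma 10. The sketch below assumes distinct eigenvalues in the random instance (so the blocks in (22) are singletons); all helper names are ours.

```python
import numpy as np
from scipy.linalg import expm, eigh

def spd_sqrt(Y):
    w, Q = np.linalg.eigh(Y)
    return (Q * np.sqrt(w)) @ Q.T

def spd_dist(X, Y):
    # affine-invariant distance (7), via the eigenvalues of Y^{-1} X
    lam = eigh(X, Y, eigvals_only=True)
    return np.sqrt(np.sum(np.log(lam) ** 2))

def spd_exp(Y, V):
    Yh = spd_sqrt(Y); Yi = np.linalg.inv(Yh)
    return Yh @ expm(Yi @ V @ Yi) @ Yh

def busemann_spd(X, Y, V):
    # Busemann value (22) and Riemannian gradient (23)
    Yh = spd_sqrt(Y); Yi = np.linalg.inv(Yh)
    w, U = np.linalg.eigh(Yi @ V @ Yi)        # eigenvalues in ascending order
    nu = np.linalg.norm(w)                    # (n_1 l_1^2 + ... + n_k l_k^2)^{1/2}
    L = np.linalg.cholesky(U.T @ Yi @ X @ Yi @ U)
    val = -2.0 / nu * np.sum(w * np.log(np.diag(L)))
    grad = -(1.0 / nu) * Yh @ U @ L @ np.diag(w) @ L.T @ U.T @ Yh
    return val, grad

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3)); Y = A @ A.T + 3 * np.eye(3)
Bm = rng.normal(size=(3, 3)); X = Bm @ Bm.T + 3 * np.eye(3)
V = rng.normal(size=(3, 3)); V = V + V.T
Yi = np.linalg.inv(spd_sqrt(Y))
nu = np.linalg.norm(np.linalg.eigvalsh(Yi @ V @ Yi))  # ||V||_Y

val, grad = busemann_spd(X, Y, V)
assert abs(busemann_spd(Y, Y, V)[0]) < 1e-10          # B_{Y,V}(Y) = 0
tau = 0.7                                             # Lemma 11: B = -tau ||V||_Y
assert abs(busemann_spd(spd_exp(Y, tau * V), Y, V)[0] + tau * nu) < 1e-8
assert abs(val) <= spd_dist(X, Y) + 1e-10             # bound (13)
Xi = np.linalg.inv(spd_sqrt(X))                       # Lemma 10: unit gradient
assert abs(np.linalg.norm(Xi @ grad @ Xi, 'fro') - 1.0) < 1e-8
```

Compared with evaluating the limit (12) directly, which requires geodesic points $\exp_Y(tV)$ with exponentially spread eigenvalues, this representation needs only well-conditioned factorizations, which is the computational advantage alluded to above.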
As noted in Remark 1, the examples in these references need to be adapted to fit our definition.

4 Characterization of the subdifferential via Busemann functions

In this section, we provide a Busemann-function characterization of the classical subdifferential on Hadamard manifolds. More precisely, we show that the usual subgradient inequality can be equivalently expressed in terms of support functions built from Busemann functions. This viewpoint yields an intrinsic geometric representation of subgradients and equips the resulting support models with a more suitable structure for subsequent developments. In particular, the Busemann-based support function enjoys concavity properties under our convention, a feature that will be crucial in the algorithmic analysis carried out in the next sections. It is well known that when the Hadamard manifold $M$ has identically zero sectional curvature, the function appearing on the right-hand side of (11) in Definition 6, namely,

$M \ni p \mapsto f(q) + \big\langle s, \log_q p\big\rangle$,  (24)

is affine in the sense that its Riemannian Hessian vanishes identically. This affine support model is central to many developments in Euclidean nonsmooth optimization. However, when the curvature of $M$ is nonzero, the function (24) is no longer affine and, in general, it is neither geodesically convex nor geodesically concave; see [35]. This motivates the search for an alternative support representation that is more suitable for optimization on Hadamard manifolds. On the other hand, by Example 1, when the sectional curvature of $M$ is identically zero, i.e., $K \equiv 0$, we have the equality

$\big\langle s, \log_q p\big\rangle = -\|s\|\, B_{q,s}(p)$.
Although in general ⟨s, log_q p⟩ ≠ −∥s∥ B_{q,s}(p), the flat-case identity suggests replacing the model (24) by the Busemann-based support function
\[
p \mapsto f(q) - \|s\| B_{q,s}(p), \qquad p \in M, \tag{25}
\]
with the aim of obtaining an intrinsic characterization of the classical subdifferential. In this context, Lemma 10 shows that taking (25) as a support function yields the concavity property required in our subsequent analysis. The next theorem establishes this characterization for σ-strongly convex functions; in particular, taking σ = 0 recovers the convex case.

Theorem 15. Let M be a Hadamard manifold and let f : M → R be a σ-strongly convex function with σ ≥ 0. Then, for every q ∈ dom f, the subdifferential of f at q admits the characterization
\[
\partial f(q) = \Big\{ s \in T_qM : f(p) \geq f(q) - \|s\| B_{q,s}(p) + \frac{\sigma}{2}\, d^2(p,q), \ \forall p \in M \Big\}.
\]

Proof. We begin by proving that ∂f(q) is contained in the set on the right-hand side. Let s ∈ ∂f(q). Since f is σ-strongly convex, by Theorem 8 we have f(p) ≥ f(q) + ⟨s, log_q p⟩ + (σ/2)d²(p,q) for all p ∈ M. On the other hand, Lemma 14 guarantees that −∥s∥B_{q,s}(p) ≤ ⟨s, log_q p⟩ for all p ∈ M. Combining the last two inequalities gives f(p) ≥ f(q) − ∥s∥B_{q,s}(p) + (σ/2)d²(p,q) for all p ∈ M, which shows that
\[
s \in \Big\{ s \in T_qM : f(p) \geq f(q) - \|s\| B_{q,s}(p) + \frac{\sigma}{2}\, d^2(p,q), \ \forall p \in M \Big\}, \tag{26}
\]
and the first inclusion follows. We now prove the reverse inclusion. To that end, take s in the set in (26). If s = 0, then f(p) ≥ f(q) + (σ/2)d²(p,q) ≥ f(q) for all p ∈ M, which implies that 0 ∈ ∂f(q). Assume now that s ≠ 0 and let v ∈ T_qM. If v ∉ dom f′(q; ·), then f′(q; v) = +∞ and thus ⟨s, v⟩ ≤ f′(q; v). Otherwise, take v ∈ dom f′(q; ·).
Then there exists t̂ > 0 such that exp_q(tv) ∈ dom f for all t ∈ [0, t̂), and
\[
f(\exp_q(tv)) \geq f(q) - \|s\|\, B_{q,s}(\exp_q(tv)) + \frac{\sigma}{2}\, d^2(\exp_q(tv), q), \qquad \forall t \in [0, \hat t\,).
\]
Since d(exp_q(tv), q) = t∥v∥ and B_{q,s}(q) = 0, dividing both sides by t > 0 yields
\[
\frac{f(\exp_q(tv)) - f(q)}{t} \geq -\|s\|\, \frac{B_{q,s}(\exp_q(tv)) - B_{q,s}(q)}{t} + \frac{\sigma}{2}\, t \|v\|^2, \qquad \forall t \in (0, \hat t\,).
\]
Taking the limit as t → 0⁺ and using Lemma 10, namely grad B_{q,s}(q) = −s/∥s∥, we obtain f′(q; v) ≥ −∥s∥⟨grad B_{q,s}(q), v⟩ = ⟨s, v⟩. Therefore, ⟨s, v⟩ ≤ f′(q; v) for all v ∈ T_qM. Hence, by Proposition 7, we conclude that s ∈ ∂f(q), which together with (26) completes the proof. □

As a direct consequence of Theorem 15, we obtain a variational characterization of subgradients in terms of global minimizers of a Busemann-based support model, with an additional quadratic term accounting for σ-strong convexity.

Corollary 16. Let M be a Hadamard manifold, let f : M → R be a σ-strongly convex function with σ ≥ 0, and let q ∈ dom f. Then, s ∈ ∂f(q) if and only if
\[
q \in \operatorname*{arg\,min}_{p \in M} \Big\{ f(p) + \|s\| B_{q,s}(p) - \frac{\sigma}{2}\, d^2(p,q) \Big\}.
\]
As a consequence, if s ∈ ∂f(q), then
\[
f(q) = \min_{p \in M} \Big\{ f(p) + \|s\| B_{q,s}(p) - \frac{\sigma}{2}\, d^2(p,q) \Big\}.
\]

Proof. Let q ∈ dom f and s ∈ T_qM, and define the auxiliary function ψ : M → R by
\[
\psi(p) := f(p) + \|s\| B_{q,s}(p) - \frac{\sigma}{2}\, d^2(p,q), \qquad p \in M.
\]
Since B_{q,s}(q) = 0 and d(q,q) = 0, we have ψ(q) = f(q). Assume first that s ∈ ∂f(q). Then, by Theorem 15,
\[
f(p) \geq f(q) - \|s\| B_{q,s}(p) + \frac{\sigma}{2}\, d^2(p,q), \qquad \forall p \in M.
\]
Rearranging and using the definition of ψ, we obtain ψ(p) ≥ ψ(q) for all p ∈ M, which implies that q ∈ argmin_{p∈M} ψ(p). For the converse, assume that q ∈ argmin_{p∈M} ψ(p).
Then ψ(p) ≥ ψ(q) for all p ∈ M, that is, f(p) + ∥s∥B_{q,s}(p) − (σ/2)d²(p,q) ≥ f(q) for all p ∈ M, which is equivalent to
\[
f(p) \geq f(q) - \|s\| B_{q,s}(p) + \frac{\sigma}{2}\, d^2(p,q), \qquad \forall p \in M.
\]
By Theorem 15, the latter implies s ∈ ∂f(q), and the equivalence is proved. For the last statement, if s ∈ ∂f(q), then by the first part we have q ∈ argmin_{p∈M} ψ(p), hence inf_{p∈M} ψ(p) = ψ(q). Since ψ(q) = f(q), the conclusion follows. □

We next put Corollary 16 in perspective by comparing its variational characterization with the Busemann-subgradient notion of [32], emphasizing the flat case, where they coincide, and the nonflat case, where they may differ.

Remark 2. In the context of Hadamard manifolds, Corollary 16 is closely related in nature to the notion of Busemann subgradient in [32, Definition 3.1], although the resulting supporting objects differ in general. In the convex case σ = 0, Corollary 16 states that s ∈ ∂f(q) if and only if
\[
q \in \operatorname*{arg\,min}_{p \in M} \big\{ f(p) + \|s\| B_{q,s}(p) \big\}. \tag{27}
\]
On the other hand, in the Hadamard manifold setting one may express [32, Definition 3.1] in our framework as follows: a Busemann subgradient at a point x ∈ dom f can be represented by a vector ξ ∈ T_xM such that
\[
x \in \operatorname*{arg\,min}_{y \in M} \big\{ f(y) - \|\xi\| B_{x,-\xi}(y) \big\}; \tag{28}
\]
see also [23, Definition 1]. In the flat case (sectional curvature identically zero), Example 1 yields ∥s∥B_{q,s}(p) = −⟨s, log_q p⟩, ∥ξ∥B_{x,−ξ}(y) = ⟨ξ, log_x y⟩, and B_{x,−ξ} = −B_{x,ξ}. Hence (28) is equivalent to x ∈ arg min_{y∈M} {f(y) + ∥ξ∥B_{x,ξ}(y)}, which coincides with (27) under the identification q = x and s = ξ. In particular, both notions recover the classical Euclidean supporting-hyperplane condition, and the corresponding supports coincide. In contrast, on nonflat Hadamard manifolds, (27) and (28) are not equivalent in general.
A key point is that [32] requires a global support parameterized by an ideal direction and a speed, and such a support may fail to exist even for geodesically convex functions; see [32, Example 3.2]. Consequently, if one defines the b-subdifferential at x by
\[
\partial_b f(x) := \big\{ \xi \in T_xM : \text{(28) holds} \big\},
\]
or, equivalently, by
\[
\partial_b f(x) := \big\{ \xi \in T_xM : f(y) \geq f(x) + \|\xi\| B_{x,-\xi}(y), \ \forall y \in M \big\},
\]
then ∂_b f(x) ⊂ ∂f(x), and this inclusion can be strict beyond the flat case.

As a further consequence of Theorem 15, we recover the following classical bound linking Lipschitz continuity and the norm of subgradients.

Corollary 17. Let f : M → R be convex and L-Lipschitz on M, i.e., |f(x) − f(y)| ≤ L d(x,y) for all x, y ∈ M. Then, for every q ∈ dom f and every s ∈ ∂f(q), there holds ∥s∥ ≤ L.

Proof. Fix q ∈ dom f and s ∈ ∂f(q). If s = 0, the conclusion is trivial. Assume s ≠ 0, and for τ > 0 set p_τ := exp_q(τs). By Theorem 15 and Lemma 11, we have
\[
f(p_\tau) \geq f(q) - \|s\|\, B_{q,s}(p_\tau) = f(q) + \tau \|s\|^2.
\]
On the other hand, since f is L-Lipschitz on M, we conclude that f(p_τ) ≤ f(q) + L d(q, p_τ) = f(q) + Lτ∥s∥. Combining the two inequalities and dividing by τ > 0 yields ∥s∥² ≤ L∥s∥, hence ∥s∥ ≤ L. □

As a further application of Theorem 15, we obtain an intrinsic characterization of σ-strong convexity in terms of Busemann-based support inequalities.

Proposition 18. Let f : M → R be a function. Then f is σ-strongly convex with σ ≥ 0 if and only if dom f is convex and, for every q ∈ dom f, there exists v ∈ T_qM such that
\[
f(p) \geq f(q) - \|v\| B_{q,v}(p) + \frac{\sigma}{2}\, d^2(p,q), \qquad \forall p \in M.
\]
In this case, ∂f(q) ≠ ∅ for all q ∈ dom f, and the inequality holds for every v ∈ ∂f(q).

Proof. First assume that f is σ-strongly convex. Then, by Theorem 8, dom f is convex and ∂f(q) ≠ ∅ for all q ∈ dom f.
Fix q ∈ dom f and take v ∈ ∂f(q). Applying Theorem 15, we obtain
\[
f(p) \geq f(q) - \|v\| B_{q,v}(p) + \frac{\sigma}{2}\, d^2(p,q), \qquad \forall p \in M,
\]
which in particular shows the existence of such a vector at each q. Conversely, assume that dom f is convex and that for every q ∈ dom f there exists v ∈ T_qM such that
\[
f(p) \geq f(q) - \|v\| B_{q,v}(p) + \frac{\sigma}{2}\, d^2(p,q), \qquad \forall p \in M.
\]
By Theorem 15, it follows that v ∈ ∂f(q) for every q ∈ dom f; in particular, ∂f(q) ≠ ∅ for all q ∈ dom f. Hence, by Definition 6, for each q ∈ dom f there exists u ∈ T_qM (namely, u := v) such that
\[
f(p) \geq f(q) + \langle u, \log_q p \rangle + \frac{\sigma}{2}\, d^2(p,q), \qquad \forall p \in M.
\]
Applying Theorem 8, we conclude that f is σ-strongly convex. Finally, once σ-strong convexity is established, Theorem 15 ensures that the Busemann support inequality holds for every v ∈ ∂f(q). □

We conclude this section by comparing our Busemann support characterization of geodesic convexity with horospherical convexity [23], noting that they coincide in the flat case but may differ in the nonflat case.

Remark 3. Proposition 18 is conceptually different from the horospherical support inequality in [23, Definition 1], although it plays an analogous supporting role; it provides a characterization of geodesic convexity via Busemann-based supports. Indeed, when σ = 0, Proposition 18 asserts that for every q ∈ dom f there exists v ∈ T_qM such that f(p) − f(q) ≥ −∥v∥B_{q,v}(p) for all p ∈ M. For comparison, the defining inequality of h-convexity in [23, Eq. (6)] can be written as f(p) − f(q) ≥ ∥v∥B_{q,−v}(p). Moreover, by Example 1, in the flat case (sectional curvature identically zero) one has ∥v∥B_{q,v}(p) = −⟨v, log_q p⟩ and B_{q,−v} = −B_{q,v}, so the two support inequalities coincide and both reduce to the classical affine support characterization of convexity in R^n.
In contrast, on nonflat Hadamard manifolds the antisymmetry B_{q,−v} = −B_{q,v} generally fails, so horospherical supports built from B_{q,−v} and the Busemann supports in Proposition 18, which are expressed in terms of −B_{q,v}, need not coincide; see [23, Sec. 3.5] for a conceptual discussion of the genuinely global nature of h-convexity. Finally, Proposition 18 guarantees that, in the convex case, the support inequality holds for every Riemannian subgradient v ∈ ∂f(q), whereas the horospherical subdifferential ∂_h f(q) in [23] consists only of those directions producing a global horospherical support. In fact, one can show that ∂_h f(q) ⊂ ∂f(q), and this inclusion can be strict beyond the flat case.

5 The Busemann DC algorithm for DC optimization

In this section, we develop a Busemann-based DC algorithm for DC optimization on Hadamard manifolds. The main idea is to replace the standard linearization step in classical DC schemes by a Busemann-type support term, thereby producing geodesically convex subproblems that better reflect the ambient geometry. This yields a geometric DCA framework and a practical algorithmic tool for DC programs on Hadamard manifolds. We consider the difference-of-convex (DC) optimization problem on a Hadamard manifold M, defined by
\[
\operatorname*{arg\,min}_{p \in M} \phi(p), \qquad \phi(p) := g(p) - h(p), \tag{29}
\]
where g, h : M → R are proper, lower semicontinuous, and geodesically convex. Although ϕ is the difference of two convex functions, it is in general not convex. However, ϕ can be shown to be locally Lipschitz on the interior of its domain. Consequently, ϕ possesses a subdifferential in the Clarke sense, denoted by ∂°ϕ, as detailed in [8, 9]. Moreover, it can be shown that ∂°ϕ(p) ⊂ ∂g(p) − ∂h(p).
Therefore, a necessary condition for a point p* ∈ M to be a local minimum of ϕ = g − h is that 0 ∈ ∂°ϕ(p*) ⊂ ∂g(p*) − ∂h(p*). This leads us to define a critical point of problem (29) as follows:

Definition 19. A point p* ∈ M is called a critical point of problem (29) if ∂g(p*) ∩ ∂h(p*) ≠ ∅.

For further discussion of the definition of a critical point of problem (29), see, for example, [26]. In the following two sections, we employ Busemann functions to introduce and analyze two variants of the EDCA for solving problem (29). In addition to properness, lower semicontinuity, and convexity, we work under the following assumptions:

(A1) M is a Hadamard manifold;
(A2) ϕ_inf := inf_{x∈M} ϕ(x) > −∞;
(A3) dom g and dom h are convex and dom g ⊆ int dom h.

We first comment on the assumptions above. Assumption (A1) is fundamental for the analysis of the algorithms proposed in the following sections, since it is repeatedly used through Theorem 15. Under assumption (A2), the domain of ϕ equals the domain of g, that is, dom ϕ = dom g ⊆ dom h. Indeed, suppose by contradiction that dom g ⊄ dom h. Then there exists a point p ∈ dom g such that p ∉ dom h, and (1) gives ϕ(p) = g(p) − h(p) = g(p) − (+∞) = −∞, contradicting assumption (A2). Consequently, dom g ⊆ dom h, and thus dom g ⊆ dom ϕ. Conversely, suppose by contradiction that dom ϕ ⊄ dom g. Then there is some p ∈ dom ϕ for which g(p) = +∞, and (1) yields ϕ(p) = g(p) − h(p) = (+∞) − h(p) = +∞, which contradicts p ∈ dom ϕ. Thus, dom ϕ ⊆ dom g. Therefore, dom ϕ = dom g and, since dom g ⊆ dom h under assumption (A2), assumption (A3) is only marginally more restrictive than (A2).
Additionally, if dom h = M, then assumption (A3) is automatically satisfied.

5.1 The Busemann DC algorithm

In this subsection, we begin by revisiting the classical difference-of-convex algorithm on Hadamard manifolds, introduced and analyzed in [13] as a manifold counterpart of the EDCA. Since the function involved in the subproblem of the classical DCA on Hadamard manifolds is not convex in general, solving the subproblems is challenging. Here, by using Busemann functions, we propose a new version of this method that overcomes this limitation. Specifically, the function in the subproblem is now convex, enabling more effective optimization within the Riemannian context. To propose and analyze the method, we assume that the functions in problem (29) satisfy the following hypothesis:

(H1) g : M → R and h : M → R are σ-strongly convex and lsc, where σ > 0.

We begin by showing that (H1) is not restrictive. Indeed, let q ∈ M and σ > 0, and consider the function M ∋ p ↦ (σ/2)d²(q,p), which is σ-strongly convex, as shown in [25, Corollary 3.1]. If g̃ : M → R and h̃ : M → R are convex, then by choosing q ∈ M and defining g(p) = g̃(p) + (σ/2)d²(q,p) and h(p) = h̃(p) + (σ/2)d²(q,p), we obtain two σ-strongly convex functions g and h on M. Furthermore, ϕ = g̃ − h̃ = g − h. However, users of the Riemannian difference-of-convex algorithm proposed below should be aware that the choice of the parameter σ > 0 significantly influences the convergence rate of the methods employed to solve the subproblem, as well as of the overall method. Next, we revisit a Riemannian version of the EDCA which, keeping the same terminology, we refer to as the classical Riemannian difference of convex algorithm (CR-DCA).
This algorithm, used to solve the DC problem (29), is formally stated as follows:

Algorithm 1 Classical Riemannian difference of convex algorithm (CR-DCA)
Step 0. Choose p_0 ∈ dom g and set k ← 0.
Step 1. Take s_k ∈ ∂h(p_k) and define the next iterate p_{k+1} by
\[
p_{k+1} = \operatorname*{arg\,min}_{p \in M} \big\{ g(p) - \langle s_k, \log_{p_k} p \rangle \big\}. \tag{30}
\]
Step 2. If p_{k+1} = p_k, then stop and return the point p_k. Otherwise, set k ← k + 1 and go to Step 1.

In the Euclidean setting, the inverse of the exponential map is given by M ∋ p ↦ log_{p_k} p = p − p_k. Therefore, Algorithm 1 is a Hadamard-manifold version of the EDCA. The motivation behind the EDCA lies in the fact that, in the Euclidean setting, the function M ∋ p ↦ g(p) − ⟨s_k, p − p_k⟩ in the subproblem (30) is convex. In this way, one replaces the solution of the nonconvex problem (29) with the solution of a sequence of convex subproblems. However, as discussed in Section 4, the function in the subproblem (30) is not convex on general Hadamard manifolds, posing challenges for its solution. To address this limitation, we redefine the CR-DCA by employing Busemann functions, so that the function in the counterpart of subproblem (30) becomes convex. The resulting Busemann difference of convex algorithm (B-DCA) for solving the DC problem (29) is stated below:

Algorithm 2 Busemann difference of convex algorithm (B-DCA)
Step 0. Choose p_0 ∈ dom g and set k ← 0.
Step 1. Take s_k ∈ ∂h(p_k) and define the next iterate p_{k+1} by
\[
p_{k+1} := \operatorname*{arg\,min}_{p \in M} \big\{ g(p) + \|s_k\| B_{p_k,s_k}(p) \big\}. \tag{31}
\]
Step 2. If p_{k+1} = p_k, then stop and return the point p_k. Otherwise, set k ← k + 1 and go to Step 1.

It is noteworthy that when the curvature of the Hadamard manifold is K ≡ 0, Example 1 establishes the equivalence between Algorithm 1 and Algorithm 2.
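In this flat regime the two subproblems coincide, and the common outer loop can be sketched in a few lines of Python. The sketch below is our own illustration (not the paper's Matlab implementation): `sub_h` and `solve_sub` are hypothetical helper names, and the one-dimensional toy instance ϕ(x) = x² − |x| with g(x) = x² and h(x) = |x| is chosen only because its subproblem argmin_x {x² − s·x} = s/2 has a closed form:

```python
import numpy as np

def dca(sub_h, solve_sub, p0, tol=1e-10, max_iter=100):
    # Flat-case (K = 0) outer loop shared by Algorithms 1 and 2:
    # pick s_k in the subdifferential of h at p_k, then solve the convex
    # subproblem  argmin_p { g(p) - <s_k, p - p_k> }  exactly.
    p = p0
    for _ in range(max_iter):
        s = sub_h(p)              # s_k in dh(p_k)
        p_next = solve_sub(s)     # exact minimizer of the subproblem
        if abs(p_next - p) <= tol:   # stopping rule of Step 2
            return p_next
        p = p_next
    return p

# Toy DC instance: phi(x) = x**2 - |x|, g(x) = x**2, h(x) = |x|;
# the subproblem argmin_x { x**2 - s*x } has the closed-form solution s/2.
p_star = dca(lambda x: np.sign(x) if x != 0.0 else 1.0,
             lambda s: s / 2.0, p0=3.0)
```

Starting from p_0 = 3, the first iterate is s_0/2 = 1/2, which is then a fixed point; indeed, 1 ∈ ∂g(1/2) ∩ ∂h(1/2), so x = 1/2 is a critical point in the sense of Definition 19 (here it is also a global minimizer, with ϕ(1/2) = −1/4).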
In particular, Example 1 serves to illustrate that B-DCA aligns with the EDCA. Furthermore, comparing the objective in subproblem (30) with that in subproblem (31), we introduce the function ϕ_k : M → R defined by
\[
\phi_k(p) := g(p) + \|s_k\| B_{p_k,s_k}(p). \tag{32}
\]
Then, in view of assumption (H1), the function ϕ_k is σ-strongly convex. Before proceeding with the analysis of Algorithm 2, observe that the point p_{k+1}, as a solution of (31), satisfies
\[
g(p) + \|s_k\| B_{p_k,s_k}(p) \geq g(p_{k+1}) + \|s_k\| B_{p_k,s_k}(p_{k+1}), \qquad \forall p \in M. \tag{33}
\]
To further advance the analysis of Algorithm 2, we first establish that the algorithm is well defined. This fundamental property is addressed in the following proposition.

Proposition 20. Algorithm 2 is well defined, i.e., p_k ∈ dom g for all k = 0, 1, …, and the subproblem in (31) has a unique solution. Moreover, if p_{k+1} = p_k, then p_k is a critical point of ϕ.

Proof. Assume that p_k ∈ dom g. Using assumption (A3), we conclude that p_k ∈ int dom h. Thus, by [30, Theorem 3.3], we have ∂h(p_k) ≠ ∅. Let s_k ∈ ∂h(p_k). We now show that the subproblem in (31) has a unique solution. Since, by (H1), g : M → R is σ-strongly convex, it follows from Theorem 8 that
\[
g(p) \geq g(p_k) + \langle v, \log_{p_k} p \rangle + \frac{\sigma}{2}\, d^2(p_k, p), \qquad \forall p \in M, \ \forall v \in \partial g(p_k).
\]
Thus, considering ϕ_k(p) = g(p) + ∥s_k∥B_{p_k,s_k}(p) and employing the last inequality, we deduce, for all p ∈ M with p ≠ p_k and all v ∈ ∂g(p_k),
\[
\frac{\phi_k(p)}{d(p_k,p)} \geq \frac{g(p_k)}{d(p_k,p)} + \Big\langle v, \frac{\log_{p_k} p}{d(p_k,p)} \Big\rangle + \frac{\sigma}{2}\, d(p_k,p) + \|s_k\| \frac{B_{p_k,s_k}(p)}{d(p_k,p)}. \tag{34}
\]
Observe that (13) yields |B_{p_k,s_k}(p)| ≤ d(p_k,p) for all p ∈ M, and d(p_k,p) = ∥log_{p_k} p∥. Thus, taking the limit in (34), we obtain lim_{d(p_k,p)→+∞} ϕ_k(p)/d(p_k,p) = +∞.
In particular, we conclude that lim_{d(p_k,p)→+∞} ϕ_k(p) = +∞. Hence, using Proposition 4, we conclude that ϕ_k has a global minimizer. Therefore, the subproblem in (31) has a global solution and, since ϕ_k is σ-strongly convex, this solution is unique. Consequently, there exists a unique p_{k+1} ∈ dom g = dom ϕ satisfying (31), which implies that Algorithm 2 is well defined. To prove the last statement, assume that p_{k+1} = p_k. Then (33) implies that g(p) ≥ g(p_k) − ∥s_k∥B_{p_k,s_k}(p) for all p ∈ M, which, by Theorem 15, shows that s_k ∈ ∂g(p_k). Hence, taking into account that s_k ∈ ∂h(p_k), we conclude that s_k ∈ ∂g(p_k) ∩ ∂h(p_k) ≠ ∅. Therefore, it follows from Definition 19 that p_k is a critical point of ϕ in problem (29). □

By Proposition 20, Algorithm 2 is well defined and generates a sequence (p_k)_{k∈N}. Since the termination condition implies that a critical point of ϕ has been attained at the final iteration, from now on we assume that this sequence is infinite.

5.2 Convergence analysis

In this section, we conduct a detailed analysis of the B-DCA, stated as Algorithm 2, under assumptions (A1), (A2), (A3), and (H1). The theoretical results obtained herein correspond to those obtained for the EDCA and CR-DCA. Nonetheless, in the B-DCA variant the use of Busemann functions renders the subproblem convex, a notable departure from the CR-DCA. This fundamental advancement simplifies the process of solving the subproblems, given that convex problems have significantly lower computational complexity than nonconvex ones. We begin by establishing a descent property of the algorithm and showing that the distance between consecutive iterates converges to zero as k tends to infinity.

Proposition 21.
Let (p_k)_{k∈N} be a sequence generated by Algorithm 2. Then it satisfies the inequality
\[
\phi(p_{k+1}) \leq \phi(p_k) - \frac{\sigma}{2}\, d^2(p_k, p_{k+1}), \qquad \forall k \in \mathbb{N}. \tag{35}
\]
As a consequence, the sequence (ϕ(p_k))_{k∈N} is strictly decreasing and converges. Moreover, lim_{k→+∞} d(p_k, p_{k+1}) = 0.

Proof. Since d(p_k, p_k) = 0, it follows from (13) that B_{p_k,s_k}(p_k) = 0. Thus, using inequality (33) with p = p_k, we conclude that g(p_k) ≥ g(p_{k+1}) + ∥s_k∥B_{p_k,s_k}(p_{k+1}). Moreover, taking into account that h is σ-strongly convex and s_k ∈ ∂h(p_k), it follows from Theorem 15 that h(p_{k+1}) ≥ h(p_k) − ∥s_k∥B_{p_k,s_k}(p_{k+1}) + (σ/2)d²(p_{k+1}, p_k). Combining the two previous inequalities and using ϕ = g − h, we obtain (35). To verify the second statement, first observe that (35) entails
\[
0 \leq \frac{\sigma}{2}\, d^2(p_k, p_{k+1}) \leq \phi(p_k) - \phi(p_{k+1}), \qquad \forall k \in \mathbb{N}. \tag{36}
\]
Since we are assuming that (p_k)_{k∈N} is infinite, it follows from Proposition 20 that p_{k+1} ≠ p_k. Hence, since σ > 0, (36) gives ϕ(p_{k+1}) < ϕ(p_k) for all k ∈ N; therefore, (ϕ(p_k))_{k∈N} is strictly decreasing. Furthermore, since (A2) implies that (ϕ(p_k))_{k∈N} is bounded from below, it converges. Finally, since (ϕ(p_k))_{k∈N} converges, taking the limit in (36) yields lim_{k→+∞} d(p_k, p_{k+1}) = 0, which concludes the proof. □

The following theorem is our main result on the convergence behavior of Algorithm 2, detailing the limiting behavior of the generated sequences and establishing their relationship with critical points of the objective function.

Theorem 22. Let (p_k)_{k∈N} and (s_k)_{k∈N} be generated by Algorithm 2.
If p̄ is a cluster point of (p_k)_{k∈N}, then p̄ ∈ dom g and there exists a cluster point s̄ of (s_k)_{k∈N} such that s̄ ∈ ∂g(p̄) ∩ ∂h(p̄). Consequently, every cluster point of (p_k)_{k∈N}, if any, is a critical point of ϕ.

Proof. Let p̄ ∈ M be a cluster point of the sequence (p_k)_{k∈N}. We may assume, without loss of generality, that lim_{k→+∞} p_k = p̄. It follows from Proposition 21 that (ϕ(p_k))_{k∈N} is strictly decreasing and convergent. Moreover, since ϕ(p_0) ≥ ϕ(p_k) = g(p_k) − h(p_k), we have ϕ(p_0) + h(p_k) ≥ g(p_k). Thus, since g and h are lsc, we have
\[
\phi(p_0) + h(\bar p) = \liminf_{k\to+\infty} \big( \phi(p_0) + h(p_k) \big) \geq \liminf_{k\to+\infty} g(p_k) \geq g(\bar p),
\]
which implies that ϕ(p_0) ≥ ϕ(p̄). Thus, since p_0 ∈ dom g = dom ϕ, we conclude that p̄ ∈ dom ϕ = dom g. Hence, using p̄ ∈ dom g and (A3), we also have p̄ ∈ int dom h. By Proposition 20 together with (A3), we know that (p_k)_{k∈N} ⊂ int dom h, and Step 1 implies that s_k ∈ ∂h(p_k) for all k ∈ N. Therefore, passing to a subsequence if necessary, we may assume, by invoking Proposition 9, that lim_{k→+∞} s_k = s̄ ∈ ∂h(p̄). On the other hand, since the point p_{k+1} is a solution of (31), it satisfies (33), which together with (13) implies
\[
g(p) \geq g(p_{k+1}) - \|s_k\| B_{p_k,s_k}(p) - \|s_k\|\, d(p_k, p_{k+1}), \qquad \forall p \in M. \tag{37}
\]
Hence, taking the inferior limit in (37) as k → +∞, using the facts that lim_{k→+∞} p_k = p̄, lim_{k→+∞} s_k = s̄, and lim_{k→+∞} d(p_k, p_{k+1}) = 0, that g is lower semicontinuous, and invoking Lemma 12, we conclude that
\[
g(p) \geq g(\bar p) - \|\bar s\| B_{\bar p, \bar s}(p), \qquad \forall p \in M.
\]
Hence, it follows from Theorem 15 that s̄ ∈ ∂g(p̄). Therefore, since we already know that s̄ ∈ ∂h(p̄), it follows that s̄ ∈ ∂g(p̄) ∩ ∂h(p̄).
This confirms that p̄ is a critical point of problem (29), completing the proof. □

In light of Proposition 20, the quantity d(p_k, p_{k+1}) can be interpreted as a measure of the criticality of the point p_k. The following proposition provides an iteration-complexity bound for this measure.

Proposition 23. Let (p_k)_{k∈N} be generated by Algorithm 2. Then, for all N ∈ N, there holds
\[
\min_{k=0,1,\ldots,N} d(p_k, p_{k+1}) \leq \left( \frac{2\big(\phi(p_0) - \phi_{\inf}\big)}{\sigma (N+1)} \right)^{1/2}.
\]
Proof. From (35), we have d²(p_k, p_{k+1}) ≤ (2/σ)(ϕ(p_k) − ϕ(p_{k+1})) for all k ∈ N. Consequently,
\[
(N+1) \min_{k=0,1,\ldots,N} d^2(p_k, p_{k+1}) \leq \sum_{k=0}^{N} \frac{2}{\sigma}\big( \phi(p_k) - \phi(p_{k+1}) \big) \leq \frac{2}{\sigma}\big( \phi(p_0) - \phi_{\inf} \big),
\]
for all N ∈ N, where ϕ_inf > −∞ by (A2). Hence, the desired inequality follows. □

6 Numerical experiments

In this section we investigate the numerical performance of Algorithms 1 (CR-DCA) and 2 (B-DCA) on a collection of DC optimization problems posed on the κ-hyperbolic space H^n_κ and on the manifold of symmetric positive definite matrices P(n). In all experiments, both g and h are differentiable, and the solver uses closed-form expressions for the Riemannian gradients of the subproblem objectives (30) and (31). The empirical comparisons should be read in light of the analytical guarantees established for B-DCA in this work, in particular the geodesic convexity of the inner subproblem. Algorithms 1 and 2 were implemented in Matlab R2019a, and the numerical experiments were carried out on an Intel Core i5 1.8 GHz with 8 GB RAM, running macOS 10.13.6. We used Manopt version 8 [17], and the subproblems (30) and (31) were solved by the trustregions solver with its default options; among the solvers available in Manopt, this choice consistently exhibited the most stable behavior for these subproblems.
The objectives were scaled by a factor γ = 1/(∥grad ϕ(p_0)∥ + 1), where p_0 ∈ M denotes the starting point. As stopping criteria, we used ∥grad ϕ(p_k)∥ ≤ ε or d(p_{k+1}, p_k) ≤ ε, with ε = γ × 10⁻⁴. Both algorithms were run with the same starting point p_0, scaling γ, stopping thresholds, and Manopt trustregions options. For each instance, we report the number of outer iterations k, the number of inner iterations inn for solving the subproblems, the ratio inn/k, the final function value fval, the final scaled norm of the Riemannian gradient grad, and, when available, the running time (time) in seconds. The pair (fval, grad) certifies solution quality; (k, inn, inn/k) measures solver workload; time summarizes the net computational cost. Differences in k should be interpreted together with inn/k. As will be seen throughout this section, across all reported tests CR-DCA and B-DCA attain comparable solution quality, as certified by the terminal objective values fval and gradient norms grad. The workload measures k, inn, and inn/k are of the same order for both methods within each problem family; when differences occur, they are mainly in the outer count k, while inn/k remains similar. When the total run time time is reported, it aligns with these workload indicators and shows no systematic dominance. Overall, the tables indicate that B-DCA is consistently competitive, matching the accuracy of CR-DCA with comparable solver effort, which underscores its relevance within the proposed DC framework. The codes and data for the numerical experiments are available at http://mtm.ufsc.br/~douglas/downloads/BusemannDCA/.

6.1 Test problems in hyperbolic space

Here we consider a Rosenbrock-type function on the κ-hyperbolic space form as the DC test problem.
We recall that basic results on H^n_κ were presented in Section 2.1, and properties of the Busemann functions in this Hadamard manifold were discussed in Example 3. Before stating the test problems, we also record a useful result. Consider the function p ↦ d_κ(p,q) := (1/√κ) arcosh(−κ⟨p,q⟩). Then, using (5), we have
\[
\operatorname{grad} d^2_\kappa(p,q) = -\frac{2\sqrt{\kappa}\, d_\kappa(p,q)}{\sqrt{\kappa^2 \langle p,q \rangle^2 - 1}}\, \big( q + \kappa \langle p,q \rangle\, p \big), \qquad p \neq q. \tag{38}
\]

6.1.1 DC problem with a θ-order hyperbolic Rosenbrock-type objective

The Rosenbrock function is a classical benchmark in nonlinear optimization, known for its nonconvex structure and its narrow, curved valley, which makes it particularly challenging for iterative algorithms. Motivated by its role in the Euclidean setting, we introduce a family of intrinsic analogues defined on hyperbolic space, which preserve the essential geometric and analytical features of the original function. These hyperbolic Rosenbrock-type functions are formulated as DC functions, making them well suited for evaluating the performance of DC optimization methods in negatively curved geometries. We provide closed-form DC decompositions, establish the geodesic convexity of the components, and explicitly characterize the global minimizers. These constructions provide a principled extension of classical benchmarks to non-Euclidean settings, with applications to evaluating optimization algorithms on manifolds. With this in mind, we now present a detailed discussion. For clarity, our analysis focuses on the hyperbolic space H^n_1 =: H^n. Let θ ≥ 1 be a fixed real exponent.
Extending the classical Rosenbrock function, we introduce the (a, b, θ)-family of θ-order hyperbolic Rosenbrock-type functions defined on H^n by
\[
f(p) := \big( a - d(p,\bar p)^{\theta} \big)^2 + b \big( d(p,\bar q)^{\theta} - d(p,\bar p)^{2\theta} \big)^2, \qquad p \in H^n, \tag{39}
\]
where a, b > 0 with b ≫ a ≥ 1, and p̄, q̄ ∈ H^n are two fixed points satisfying
\[
a^{2/\theta} - a^{1/\theta} \leq d(\bar p, \bar q) \leq a^{2/\theta} + a^{1/\theta}. \tag{40}
\]
Observe that f(p) = g(p) − h(p) for all p ∈ M, where the components g and h are given by
\[
g(p) := a^2 + d(p,\bar p)^{2\theta} + 2b\, d(p,\bar q)^{2\theta} + 2b\, d(p,\bar p)^{4\theta},
\qquad
h(p) := 2a\, d(p,\bar p)^{\theta} + b \big( d(p,\bar p)^{2\theta} + d(p,\bar q)^{\theta} \big)^2.
\]
By [46, Thm. 2.1, p. 111], for any z ∈ H^n and α ≥ 1 the map p ↦ d(p,z)^α is geodesically convex. Hence the distance-power terms d(p,p̄)^θ, d(p,p̄)^{2θ}, d(p,p̄)^{4θ}, d(p,q̄)^θ, and d(p,q̄)^{2θ} are geodesically convex. Moreover, since t ↦ t² is convex and nondecreasing on [0,∞), the square of any nonnegative convex function is convex; in particular, (d(p,p̄)^{2θ} + d(p,q̄)^θ)² is geodesically convex. By closure of geodesic convexity under nonnegative scaling and addition, both g and h are geodesically convex, and therefore f = g − h is a DC function. Since f(p) ≥ 0 for all p ∈ H^n and f in (39) is a sum of squares, any point p* ∈ H^n satisfying d(p*,p̄)^θ = a and d(p*,q̄)^θ = d(p*,p̄)^{2θ} = a² is a global minimizer, with f(p*) = 0. Equivalently, any global minimizer p* must satisfy
\[
d(p^*, \bar p) = a^{1/\theta}, \qquad d(p^*, \bar q) = a^{2/\theta}.
\]
Conversely, any p satisfying these equalities attains f(p) = 0 and is a global minimizer. We now define the hyperbolic spheres centered at p̄ and q̄ by
\[
S(\bar p, a^{1/\theta}) := \{ p \in H^n : d(p,\bar p) = a^{1/\theta} \}, \qquad
S(\bar q, a^{2/\theta}) := \{ p \in H^n : d(p,\bar q) = a^{2/\theta} \}.
\]
Hence, every point $p^*$ in the intersection $S(\bar p, a^{1/\theta}) \cap S(\bar q, a^{2/\theta})$ is a global minimizer of $f$, with $f(p^*) = 0$. Condition (40) guarantees that this intersection is nonempty. Furthermore:
• If $a^{2/\theta} - a^{1/\theta} < d(\bar p,\bar q) < a^{2/\theta} + a^{1/\theta}$, then the intersection contains infinitely many points; note that for $n = 2$ it consists of exactly two points.
• If $d(\bar p,\bar q) = a^{2/\theta} - a^{1/\theta}$ (internal tangency) or $d(\bar p,\bar q) = a^{2/\theta} + a^{1/\theta}$ (external tangency), then the intersection consists of a single point.
In fact, for internal tangency, choose
\[
\bar p_{\mathrm{int}} = (\sinh a^{1/\theta}, 0, \cosh a^{1/\theta}), \qquad \bar q_{\mathrm{int}} = (\sinh a^{2/\theta}, 0, \cosh a^{2/\theta}). \tag{41}
\]
Using the hyperbolic identity $\sinh u \sinh v - \cosh u \cosh v = -\cosh(u - v)$, we obtain $\langle \bar p_{\mathrm{int}}, \bar q_{\mathrm{int}}\rangle = -\cosh(a^{2/\theta} - a^{1/\theta})$, so $d(\bar p_{\mathrm{int}}, \bar q_{\mathrm{int}}) = \operatorname{arcosh}(\cosh(a^{2/\theta} - a^{1/\theta})) = a^{2/\theta} - a^{1/\theta}$. Thus $S(\bar p_{\mathrm{int}}, a^{1/\theta}) \cap S(\bar q_{\mathrm{int}}, a^{2/\theta}) = \{p^*\}$. For external tangency, define
\[
\bar p_{\mathrm{ext}} = (\sinh a^{1/\theta}, 0, \cosh a^{1/\theta}), \qquad \bar q_{\mathrm{ext}} = (-\sinh a^{2/\theta}, 0, \cosh a^{2/\theta}). \tag{42}
\]
Then, using the hyperbolic identity $\sinh u \sinh v + \cosh u \cosh v = \cosh(u+v)$, the inner product is
\[
\langle \bar p_{\mathrm{ext}}, \bar q_{\mathrm{ext}}\rangle = -\sinh a^{1/\theta}\sinh a^{2/\theta} - \cosh a^{1/\theta}\cosh a^{2/\theta} = -\cosh\big(a^{1/\theta} + a^{2/\theta}\big),
\]
and the hyperbolic distance becomes $d(\bar p_{\mathrm{ext}}, \bar q_{\mathrm{ext}}) = \operatorname{arcosh}(-\langle \bar p_{\mathrm{ext}}, \bar q_{\mathrm{ext}}\rangle) = \operatorname{arcosh}(\cosh(a^{1/\theta} + a^{2/\theta})) = a^{1/\theta} + a^{2/\theta}$. Thus the intersection again reduces to a single point, i.e., $S(\bar p_{\mathrm{ext}}, a^{1/\theta}) \cap S(\bar q_{\mathrm{ext}}, a^{2/\theta}) = \{p^*\}$. Note that for $\theta > 1$ the function $h$ is continuously differentiable on $\mathbb{H}^n$, whereas for $\theta = 1$ it fails to be differentiable at the points $p = \bar p$ and $p = \bar q$.
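The two tangency distances can be verified directly from the point choices (41) and (42) in the hyperboloid model; the values $\theta = 2$, $a = 3$ below are arbitrary illustrative parameters.

```python
import numpy as np

def lorentz(x, y):
    return x[0]*y[0] + x[1]*y[1] - x[2]*y[2]

def dist(p, q):
    return np.arccosh(-lorentz(p, q))

theta, a = 2.0, 3.0
r1, r2 = a**(1/theta), a**(2/theta)   # sphere radii a^{1/theta}, a^{2/theta}

# Internal tangency, as in (41): both centers on the same side
p_int = np.array([np.sinh(r1), 0.0, np.cosh(r1)])
q_int = np.array([np.sinh(r2), 0.0, np.cosh(r2)])
# External tangency, as in (42): centers on opposite sides
p_ext = np.array([np.sinh(r1), 0.0, np.cosh(r1)])
q_ext = np.array([-np.sinh(r2), 0.0, np.cosh(r2)])

print(np.isclose(dist(p_int, q_int), r2 - r1))   # True: d = a^{2/theta} - a^{1/theta}
print(np.isclose(dist(p_ext, q_ext), r2 + r1))   # True: d = a^{2/theta} + a^{1/theta}
```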
Gradient of $h$: For $p \neq \bar p$ and $p \neq \bar q$, using the chain rule and (38), the gradient of $h$ is given by
\[
\operatorname{grad} h(p) = -2a\theta\, d(p,\bar p)^{\theta-1}\, \frac{\langle p,\bar p\rangle p + \bar p}{\sqrt{\langle p,\bar p\rangle^2 - 1}}
- 4b\theta\, \big(d(p,\bar p)^{2\theta} + d(p,\bar q)^{\theta}\big)\, d(p,\bar p)^{2\theta-1}\, \frac{\langle p,\bar p\rangle p + \bar p}{\sqrt{\langle p,\bar p\rangle^2 - 1}}
- 2b\theta\, \big(d(p,\bar p)^{2\theta} + d(p,\bar q)^{\theta}\big)\, d(p,\bar q)^{\theta-1}\, \frac{\langle p,\bar q\rangle p + \bar q}{\sqrt{\langle p,\bar q\rangle^2 - 1}}. \tag{43}
\]
Objective functions for subproblems (30) and (31): Let $s_k := \operatorname{grad} h(p_k)$ for $p_k \neq \bar p, \bar q$. Then:
• The objective function of the classical subproblem (30) becomes
\[
\psi_k(p) = a^2 + d(p,\bar p)^{2\theta} + 2b\, d(p,\bar q)^{2\theta} + 2b\, d(p,\bar p)^{4\theta} - \frac{\operatorname{arcosh}(-\langle p_k, p\rangle)}{\sqrt{\langle p_k, p\rangle^2 - 1}}\, \langle s_k, p\rangle.
\]
• The objective function of the Busemann subproblem (31) becomes
\[
\phi_k(p) = a^2 + d(p,\bar p)^{2\theta} + 2b\, d(p,\bar q)^{2\theta} + 2b\, d(p,\bar p)^{4\theta} + \|s_k\| \ln\Big(-\Big\langle p,\; p_k + \frac{s_k}{\|s_k\|}\Big\rangle\Big).
\]
In both cases, the smooth convex component $g$ appears explicitly, and the second component $h$ is "linearized" either via the exponential map or via the Busemann approximation. In the test problems with the Rosenbrock-type function we used $\theta = 1$ and $n = 2$, and we consider two cases for choosing $\bar p, \bar q \in \mathbb{H}^2$: internal tangency and external tangency. For the internal tangency case, we set $a = 1$ and $b = 100$ and take $\bar p, \bar q$ as in (41). We repeated the test from five random starting points in $\mathbb{H}^2$. From Table 1, both CR–DCA and B–DCA display essentially identical behavior: their outer/inner iteration counts coincide up to small fluctuations, and the final objective values and gradient norms agree to numerical accuracy. This parity indicates that the Busemann modeling introduces no observable penalty in this regime, and B–DCA is competitive in both accuracy and per-outer effort.
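Formula (43) can be validated against finite differences along geodesics. The sketch below picks a smooth case ($\theta = 2$) with illustrative points and parameters of our choosing, builds a tangent vector $v$ at $p$ by Lorentz projection, and compares $\langle \operatorname{grad} h(p), v\rangle$ with a central-difference approximation of $t \mapsto h(\exp_p(tv))$ at $t = 0$; the exponential map $\exp_p(v) = \cosh(\|v\|)p + \sinh(\|v\|)v/\|v\|$ is the standard one on the hyperboloid.

```python
import numpy as np

def lorentz(x, y):
    return x[0]*y[0] + x[1]*y[1] - x[2]*y[2]

def dist(p, q):
    return np.arccosh(-lorentz(p, q))

def exp_map(p, v):
    nv = np.sqrt(lorentz(v, v))
    return np.cosh(nv)*p + np.sinh(nv)*v/nv

theta, a, b = 2.0, 1.0, 3.0                       # smooth case theta > 1
pbar = np.array([0.0, 0.0, 1.0])
qbar = np.array([np.sinh(1.0), 0.0, np.cosh(1.0)])

def h(p):
    dp, dq = dist(p, pbar), dist(p, qbar)
    return 2*a*dp**theta + b*(dp**(2*theta) + dq**theta)**2

def grad_h(p):
    # Formula (43); each ratio below is -grad d(p, z) from (38) with kappa = 1
    dp, dq = dist(p, pbar), dist(p, qbar)
    ip, iq = lorentz(p, pbar), lorentz(p, qbar)
    up = (ip*p + pbar)/np.sqrt(ip**2 - 1)
    uq = (iq*p + qbar)/np.sqrt(iq**2 - 1)
    s = dp**(2*theta) + dq**theta
    return (-2*a*theta*dp**(theta-1)*up
            - 4*b*theta*s*dp**(2*theta-1)*up
            - 2*b*theta*s*dq**(theta-1)*uq)

x = np.array([0.3, -0.2])
p = np.append(x, np.sqrt(1 + x @ x))              # a point on H^2
w = np.array([1.0, 0.5, 0.0])
v = w + lorentz(w, p)*p                           # Lorentz projection: <v, p> = 0

eps = 1e-6
fd = (h(exp_map(p, eps*v)) - h(exp_map(p, -eps*v))) / (2*eps)
print(abs(lorentz(grad_h(p), v) - fd))            # directional derivatives agree
```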
While this problem class does not separate the methods, B–DCA retains structural advantages that can yield steadier progress on more demanding instances.

          CR-DCA                                    B-DCA
 #    k    inn   inn/k   fval     grad        k    inn   inn/k   fval     grad
 1   158   223   1.41   1.4E-12  9.8E-07    158   223   1.41   1.4E-12  9.8E-07
 2    25    55   2.20   8.6E-02  2.8E-04     25    55   2.20   8.6E-02  2.8E-04
 3   153   217   1.42   2.2E-12  9.4E-07    153   217   1.42   2.2E-12  9.4E-07
 4   148   211   1.43   3.6E-12  9.3E-07    148   211   1.43   3.6E-12  9.3E-07
 5   113   175   1.55   1.6E-09  1.2E-04    113   175   1.55   1.6E-09  1.2E-04

Table 1: Results for the Rosenbrock-type objective: internal tangency case.

For the external tangency case, we set $a = 1$ and $b = 2$ and take $\bar p, \bar q$ as in (42). Using five random starting points in $\mathbb{H}^2$, Table 2 shows that B–DCA uses more outer iterations than CR–DCA, which translates into a higher total inner effort on average (mean $k$: $7.42 \times 10^3$ vs. $6.25 \times 10^3$; mean inn: $7.67 \times 10^3$ vs. $6.43 \times 10^3$). Even so, the per-outer cost inn/$k$ is essentially the same in four of the five runs (all between 1.00 and 1.02 for both methods), with B–DCA slightly lower in runs 1 and 4; only run 5 is unfavorable to B–DCA (1.81 vs. 1.49), which raises its average (mean inn/$k$: 1.17 vs. 1.11). The final objective values and gradient norms coincide to numerical precision in runs 3–4 and remain very close elsewhere, indicating indistinguishable solution quality. Overall, this table shows that B–DCA is competitive in accuracy and per-outer effort, while CR–DCA is modestly cheaper in total iteration counts on this test.
          CR-DCA                                     B-DCA
 #     k     inn   inn/k   fval     grad         k      inn    inn/k   fval     grad
 1   7853   8025   1.02   6.4E-08  4.6E-06     8454    8555   1.01   9.0E-08  5.9E-06
 2   4174   4238   1.02   5.3E-07  2.8E-05     4560    4659   1.02   7.4E-07  3.6E-05
 3   8989   9012   1.00   2.5E-08  2.0E-06    11300   11317   1.00   2.5E-08  2.0E-06
 4   9053   9125   1.01   2.1E-08  1.6E-06    11464   11486   1.00   2.1E-08  1.6E-06
 5   1193   1773   1.49   6.5E-05  1.7E-03     1299    2347   1.81   9.1E-05  2.2E-03

Table 2: Results for the Rosenbrock-type objective: external tangency case.

To close this subsection, we note that across the $\theta$-order hyperbolic Rosenbrock tests (both internal and external tangency, five random starts each), CR–DCA and B–DCA attain essentially the same solution quality, as reflected by the final scaled objective values and gradient norms. B–DCA tends to use more outer iterations on some instances, which can raise the total inner effort, yet its per-outer cost (inn/$k$) remains comparable and is occasionally smaller, consistent with the geodesically convex inner models (with unique minimizers) that it solves at each step. In scenarios where the inner solve dominates the computational cost, this structural feature can be beneficial. Overall, on this class of problems B–DCA is competitive and reliable in accuracy and per-outer effort, while CR–DCA is modestly cheaper in total iteration counts on certain runs.

6.2 Test problems in the space of positive definite matrices

We now turn to the manifold of symmetric positive definite matrices $P(n)$. Preliminaries on $P(n)$ and explicit expressions for the Busemann functions and their Riemannian gradients are recalled in Section 2.2 and Example 4. We consider two sets of problems on $P(n)$: the first is a synthetic, academically oriented example designed to isolate geometric effects; the second is a practically motivated instance with direct application appeal. The experimental setup follows the same protocol adopted earlier.
6.2.1 An academic example

This academic test, also examined in the numerical experiments of [13, Sec. 7.1], is designed to validate the implementations of CR–DCA and B–DCA in a controlled setting with known global minimizers and to compare their behavior under the affine-invariant geometry of $P(n)$. Because the objective depends only on $\log\det X$, the geometric effects appear solely through the subproblems via $\log_{X_k}$ and $d(\cdot,\cdot)$, yielding a clean benchmark. As the results in Table 3 indicate, both methods reach the global minimum with matching iteration counts, and B–DCA typically achieves lower runtime as $n$ increases, though the trend is not strictly monotonic. It is worth emphasizing that, in this setup, both subproblems are convex; the convexity of the classical subproblem is established in [13, Example 6.1(i)]. The observed advantage of B–DCA is therefore due to computational effects rather than to any convexity gap: its Busemann model yields a well-conditioned tangent-space solve, reducing costly evaluations of $\log_{X_k}(\cdot)$ and $d(\cdot,\cdot)$ and enabling reuse of eigendecompositions across iterations. We first recall some notation. Throughout, $\ln$ denotes the scalar natural logarithm and $e$ the base of the scalar exponential. We consider the difference-of-convex (DC) objective $f = g - h$ on $P(n)$ with
\[
g(X) = \big(\ln\det X\big)^4, \qquad h(X) = \big(\ln\det X\big)^2.
\]
The global minimum value of $f$ is $f^\star = -\tfrac14$, attained whenever $\ln\det X = \pm 1/\sqrt{2}$ (e.g., at $X^\star = e^{\pm 1/(\sqrt{2}\,n)} I_n$). For the initialization we take
\[
X_0 = \ln(n)\, I_n + e_1 e_n^\top + e_n e_1^\top,
\]
where $e_1 := (1, 0, \ldots, 0)^\top \in \mathbb{R}^n$ and $e_n := (0, \ldots, 0, 1)^\top \in \mathbb{R}^n$, and consider dimensions $n \in \{4, 10, 20, 50, 100\}$. The stopping criteria and all shared hyperparameters follow our general experimental protocol.
Gradient of $h$: On $P(n)$ the Riemannian gradient of $h$ and its norm are given by
\[
\operatorname{grad} h(X) = 2\ln\det(X)\, X, \qquad \|\operatorname{grad} h(X)\| = 2\sqrt{n}\, |\ln\det(X)|.
\]
Objective functions for subproblems (30) and (31): Let $X_k$ be the current iterate. The objective function of the classical subproblem (30) reads
\[
\psi_k(X) = \big(\ln\det X\big)^4 - 2\ln\det(X_k)\, \ln\frac{\det X}{\det X_k},
\]
and the objective function of subproblem (31) is given by
\[
\phi_k(X) := \big(\ln\det X\big)^4 + \|\operatorname{grad} h(X_k)\|\, B_{X_k,\, \operatorname{grad} h(X_k)}(X),
\]
where we use the explicit formula (22) in Example 4 to compute the Busemann function $B_{X_k,\, \operatorname{grad} h(X_k)}$.

          CR-DCA                                  B-DCA
  n    k   inn   time   fval    grad         k   inn   time   fval    grad
  4   13    25   0.49   -0.25  7.52E-07     13    25   0.43   -0.25  7.52E-07
 10   16    43   0.53   -0.25  5.09E-07     16    43   0.34   -0.25  5.09E-07
 20   16    49   0.11   -0.25  1.01E-06     16    49   0.08   -0.25  1.01E-06
 50   16    55   0.26   -0.25  2.13E-06     16    55   0.14   -0.25  2.13E-06
100   16    59   1.03   -0.25  3.55E-06     16    59   0.46   -0.25  3.55E-06

Table 3: Results in the academic problem for CR–DCA and B–DCA.

Table 3 reports the outcomes for CR–DCA and B–DCA. We observe that both algorithms display identical outer/inner iteration counts across all tested dimensions and attain the same objective value $f^\star = -0.25$ to the reported precision, with final gradient norms $\le 3.6 \times 10^{-6}$. As $n$ increases from 4 to 100, the number of outer iterations stabilizes at $k = 16$, while the inner iterations grow moderately (from 25 to 59). In terms of runtime, B–DCA is consistently faster, with approximate reductions of 12%, 36%, 27%, 46%, and 55% for $n = 4, 10, 20, 50, 100$, respectively. The advantage tends to increase with dimension (though not strictly monotonically), indicating lower per-iteration overhead in the Busemann-subgradient subproblem on $P(n)$.

6.2.2 Contrastive learning via DC optimization in $P(n)$

In this section we consider a contrastive DC optimization model on the manifold of symmetric positive definite matrices $P(n)$.
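The closed forms above are easy to double-check. The sketch below verifies that $f(X^\star) = -1/4$ at the claimed minimizer and that the affine-invariant norm $\|X^{-1/2}\operatorname{grad} h(X)\,X^{-1/2}\|_F$ equals $2\sqrt{n}\,|\ln\det X|$; the diagonal test matrix is an arbitrary choice of ours.

```python
import numpy as np

n = 4
ln_det = lambda X: np.linalg.slogdet(X)[1]
f = lambda X: ln_det(X)**4 - ln_det(X)**2          # f = g - h

# Claimed minimizer: X* = e^{1/(sqrt(2) n)} I_n, i.e. ln det X* = 1/sqrt(2)
X_star = np.exp(1/(np.sqrt(2)*n)) * np.eye(n)
print(f(X_star))                                   # -0.25

# grad h(X) = 2 ln det(X) X; affine-invariant norm = 2 sqrt(n) |ln det X|
X = np.diag([1.0, 2.0, 0.5, 3.0])
G = 2*ln_det(X)*X
Xh = np.diag(np.diag(X)**-0.5)                     # X^{-1/2} for diagonal X
print(np.allclose(np.linalg.norm(Xh @ G @ Xh),
                  2*np.sqrt(n)*abs(ln_det(X))))    # True
```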
Given disjoint sets of reference points $\mathcal{P} = \{\bar X_1, \ldots, \bar X_m\} \subset P(n)$ and $\mathcal{N} = \{\bar Y_1, \ldots, \bar Y_r\} \subset P(n)$, the goal is to select $X \in P(n)$ that is close to $\mathcal{P}$ and far from $\mathcal{N}$, where proximity is measured by the affine-invariant geodesic distance $d(\cdot,\cdot)$ defined in (7). The SPD contrastive objective is
\[
f(X) := \sum_{i=1}^m \lambda_i^+ d^2\big(X, \bar X_i\big) - \sum_{j=1}^r \lambda_j^- d^2\big(X, \bar Y_j\big), \qquad X \in P(n), \tag{44}
\]
with fixed weights $\lambda_i^+, \lambda_j^- > 0$. Since $X \mapsto d^2(X, \bar Z)$ is geodesically strongly convex on $P(n)$ for each fixed $\bar Z$, $f$ admits the DC splitting
\[
g(X) := \sum_{i=1}^m \lambda_i^+ d^2\big(X, \bar X_i\big), \qquad h(X) := \sum_{j=1}^r \lambda_j^- d^2\big(X, \bar Y_j\big), \tag{45}
\]
so that $f = g - h$. This objective encodes the contrastive principle: $g$ promotes proximity to positives, while $h$ penalizes proximity to negatives. Such formulations are natural when data are represented by covariance or kernel matrices, e.g., in signal processing, computer vision, and Riemannian manifold learning; see [5, 45]. We use this example to contrast the inner models underlying CR–DCA and B–DCA on $P(n)$. The first-order model of $h$ used by CR–DCA does not, in general, preserve geodesic convexity under the affine-invariant metric. In B–DCA, $h$ is replaced by a Busemann-based surrogate; under our convention that the Busemann function is concave (hence $-B$ is convex), the inner objective is geodesically convex. Accordingly, the reported performance differences should be read through this convex B–DCA versus nonconvex CR–DCA contrast and its computational implications (conditioning and reuse of eigendecompositions). Our objectives are twofold: (i) to verify that CR–DCA and B–DCA compute meaningful contrastive representatives $X^\star \in P(n)$ with comparable solution quality; and (ii) to compare their numerical behavior under the affine-invariant geometry.
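The contrastive objective (44)–(45) rests on properties of the affine-invariant distance that can be verified numerically. The sketch below computes $d(X,Y) = \|\operatorname{Log}(X^{-1/2}YX^{-1/2})\|_F$ via the eigenvalues of $X^{-1}Y$ (which are real and positive for SPD arguments) and checks symmetry and congruence invariance $d(AXA^\top, AYA^\top) = d(X,Y)$; the random SPD generator and all names are illustrative assumptions of ours.

```python
import numpy as np

def ai_dist(X, Y):
    # Affine-invariant distance on P(n): d(X,Y) = ||Log(X^{-1/2} Y X^{-1/2})||_F,
    # computed from the (real, positive) eigenvalues of X^{-1} Y.
    w = np.linalg.eigvals(np.linalg.solve(X, Y))
    return np.sqrt(np.sum(np.log(w.real)**2))

rng = np.random.default_rng(0)
def rand_spd(n):
    A = rng.standard_normal((n, n))
    return A @ A.T + n*np.eye(n)      # well-conditioned SPD matrix

n, m, r = 3, 2, 1
Xs = [rand_spd(n) for _ in range(m)]  # positives
Ys = [rand_spd(n) for _ in range(r)]  # negatives

def f(X):   # objective (44) with unit weights, via the DC split (45)
    g = sum(ai_dist(X, Xb)**2 for Xb in Xs)
    h = sum(ai_dist(X, Yb)**2 for Yb in Ys)
    return g - h

print(f(np.eye(n)))                   # a finite contrastive score at the identity

X1, X2 = rand_spd(n), rand_spd(n)
A = rng.standard_normal((n, n))       # invertible with probability one
print(np.isclose(ai_dist(np.eye(2), np.exp(2.0)*np.eye(2)), 2*np.sqrt(2)))  # True
print(np.isclose(ai_dist(X1, X2), ai_dist(X2, X1)))                          # True
print(np.isclose(ai_dist(A @ X1 @ A.T, A @ X2 @ A.T), ai_dist(X1, X2)))      # True
```

The congruence invariance is the geometric reason the metric is called affine-invariant, and it is what makes the contrastive score independent of the chosen coordinate frame.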
We report final objective values, Riemannian gradient norms, iteration counts, and runtime, emphasizing the impact of the inner-subproblem model on efficiency. The experiments use $n = 10$ with unit weights $\lambda_i^+ = \lambda_j^- = 1$ and follow the stopping criteria and shared hyperparameters of our general experimental protocol.

Objective functions for subproblems (30) and (31): Let $S_k := \operatorname{grad} h(X_k)$, where $X_k \in P(n)$ is the current iterate. Then the subproblem objectives are:
• For the classical DC subproblem (30),
\[
\Psi_k(X) := \sum_{i \in \mathcal{P}} \lambda_i^+ d(X, \bar X_i)^2 - \langle S_k, \log_{X_k} X\rangle, \tag{46}
\]
where the inner product is as in (6) and $\log_{X_k} X$ is given by (9).
• For the Busemann-regularized DC subproblem (31),
\[
\Phi_k(X) := \sum_{i \in \mathcal{P}} \lambda_i^+ d(X, \bar X_i)^2 + \|S_k\|\, B_{X_k, S_k}(X), \tag{47}
\]
where we use the explicit formula (22) in Example 4 to compute the Busemann function $B_{X_k, S_k}(X)$.

          CR-DCA                               B-DCA
 #    k   inn   inn/k   fval    grad       k   inn   inn/k   fval    grad
 1    7    13   1.86   0.1663  1.14E-05    8    16   2.00   0.1663  6.74E-06
 2    7    14   2.00   0.1981  1.08E-05    8    15   1.88   0.1981  4.88E-06
 3    7    14   2.00   0.1520  1.13E-05    8    15   1.88   0.1520  4.98E-06
 4    7    14   2.00   0.1825  1.12E-05    8    16   2.00   0.1825  6.62E-06
 5    7    13   1.86   0.2071  1.07E-05    8    15   1.88   0.2071  4.98E-06
 6    7    14   2.00   0.2252  1.06E-05    8    15   1.88   0.2252  4.95E-06
 7    7    13   1.86   0.1960  1.09E-05    8    15   1.88   0.1960  5.81E-06
 8    7    14   2.00   0.1762  1.13E-05    8    16   2.00   0.1762  6.10E-06
 9    7    14   2.00   0.1686  1.11E-05    8    15   1.88   0.1686  4.85E-06
10    7    13   1.86   0.1918  1.10E-05    8    15   1.88   0.1918  5.78E-06

Table 4: Results for contrastive learning in $P(5)$, with $m = 5$ and $r = 1$.

Table 4 summarizes ten runs with random initializations in $P(5)$ with $m = 5$ and $r = 1$. Both methods attain identical objective values across all restarts, confirming comparable solution quality. A clear difference appears in stationarity: B–DCA yields strictly smaller final Riemannian gradient norms in every trial (mean $5.569 \times 10^{-6}$) than CR–DCA (mean $1.103 \times 10^{-5}$), i.e., an almost twofold reduction. This tighter stationarity can naturally entail a slightly larger overall inner effort: B–DCA uses one additional outer iteration ($k = 8$ versus $k = 7$) and, consequently, a higher average total of inner iterations (means 15.3 versus 13.6). At the same time, the inner burden per outer step is comparable and, if anything, marginally lower for B–DCA, as reflected by the average inn/$k$ ratios (1.9125 for B–DCA versus 1.9429 for CR–DCA) and medians (15 vs. 14). Overall, this experiment highlights B–DCA's advantage of achieving stronger first-order stationarity while keeping the inner workload per outer step essentially on par, consistent with the convexity of its inner model.

          CR-DCA                                B-DCA
 #    k    inn   inn/k   fval     grad      k    inn   inn/k   fval     grad
 1   51    91   1.78   -0.6747  2.03E-05   83   129   1.55   -0.6747  2.12E-05
 2   50    91   1.82   -0.6587  2.14E-05   81   125   1.54   -0.6587  1.99E-05
 3   49    87   1.78   -0.7142  2.27E-05   80   124   1.55   -0.7142  2.27E-05
 4   50    90   1.80   -0.6460  1.82E-05   78   122   1.56   -0.6460  2.06E-05
 5   53    96   1.81   -0.6116  1.72E-05   85   136   1.60   -0.6116  1.97E-05
 6   52    94   1.81   -0.6051  1.73E-05   83   130   1.57   -0.6051  1.85E-05
 7   49    85   1.73   -0.6570  1.80E-05   71   112   1.58   -0.6570  1.99E-05
 8   53    97   1.83   -0.5740  1.79E-05   88   140   1.59   -0.5740  1.87E-05
 9   53    96   1.81   -0.5892  1.73E-05   87   139   1.60   -0.5892  1.93E-05
10   49    88   1.80   -0.6763  2.21E-05   75   118   1.57   -0.6763  2.13E-05

Table 5: Results for contrastive learning in $P(5)$, with $m = 5$ and $r = 4$.

Table 5 summarizes ten runs with random initializations in $P(5)$ with $m = 5$ and $r = 4$. Both methods deliver indistinguishable solution quality: objective values match entrywise, and the final Riemannian gradient norms are of the same order ($\approx 2 \times 10^{-5}$) for both algorithms. The principal contrast lies in efficiency. B–DCA requires more outer iterations (mean 81.1 vs. 50.9 for CR–DCA), but its inner subproblems are consistently easier: the inner-per-outer ratio inn/$k$ is strictly smaller for B–DCA in all ten trials (average 1.572 vs. 1.797, about 12.5% lower). The pattern is precisely what the modeling predicts: B–DCA's convex Busemann-based inner model yields cheaper inner solves (and facilitates eigendecomposition reuse), so the extra outer iterations are offset by a reduced per-iteration burden.

To close this section, we synthesize the evidence across the three groups of tests: (i) the academic benchmark on $P(n)$, (ii) the contrastive learning tasks on $P(n)$, and (iii) the $\theta$-order hyperbolic Rosenbrock problems on $\mathbb{H}^2$. Although the surrogate objectives in (24) and (25) are analytically distinct, they track each other closely along the observed iterates, which plausibly explains the near-identical solution quality achieved by CR–DCA and B–DCA in all experiments (final objective values and first-order stationarity). The main practical difference lies in the structure of the inner problems: the Busemann model (31) is geodesically convex, and this often translates into steadier progress and a comparable or lower per-outer effort (inn/$k$), even in cases where B–DCA uses more outer iterations. On $P(n)$, the academic test shows matching iteration counts and solution quality, with B–DCA frequently exhibiting favorable runtime as $n$ increases, consistent with concentrating the geometry in well-conditioned tangent-space solves (fewer expensive evaluations of $\log_{X_k}(\cdot)$ and $d(\cdot,\cdot)$ and better reuse of eigendecompositions). In the contrastive learning experiments, the methods again deliver essentially the same accuracy; B–DCA often requires more outer steps but displays lower inn/$k$ and more regular progress across restarts, an advantage that becomes relevant when inner solves dominate the computational budget.
For the hyperbolic Rosenbrock family, the internal tangency tests show virtually identical behavior, and the external tangency tests are mixed: CR–DCA is sometimes cheaper in total counts, whereas the per-outer cost remains comparable and the attained solutions are indistinguishable in quality, so B–DCA remains competitive. Taken together, these results present CR–DCA as a strong baseline and B–DCA as a competitive alternative that can be particularly attractive when (i) the inner subproblem cost dominates runtime, (ii) numerical stability of the inner model matters, or (iii) one wishes to exploit the geodesic convexity in (31) for steadier progress. Beyond these experiments, the Busemann modeling and the side-by-side comparison of (24) and (25) introduce new ideas into DC optimization on manifolds and suggest avenues to refine convergence theory for DCA-type schemes.

7 Conclusions

In summary, this paper investigates fundamental properties of Busemann functions on Hadamard manifolds and highlights their role in the design and analysis of optimization algorithms on Riemannian manifolds. This is achieved through the introduction of a novel Busemann-based characterization of the classical subdifferential in Riemannian optimization. The proposed approach addresses challenges arising from nonconvex subproblem functions on Hadamard manifolds, which commonly occur in classical difference-of-convex algorithms. By replacing the inner product with Busemann functions, the resulting reformulation guarantees the strong convexity of the subproblem functions, thereby improving both optimization performance and algorithmic efficiency in the Hadamard manifold setting. We are confident that the characterization developed here for DC problems will also be useful in broader areas of continuous optimization. For instance, in an upcoming paper, we intend to employ it to develop algorithms utilizing the Bregman distance concept.
To illustrate, let us explore how the Bregman distance concept can be established using Busemann functions, which forms the basis of our discussion. Let $M$ be a Hadamard manifold, let $S \subseteq M$ be an open and convex set, and let $\bar S$ denote its closure. Consider a proper convex real function $\psi : \bar S \to \mathbb{R} \cup \{+\infty\}$ that is differentiable on $S$, and let $D_\psi(\cdot,\cdot) : \bar S \times S \to \mathbb{R} \cup \{+\infty\}$ be the function associated to $\psi$ defined by
\[
D_\psi(p,q) := \psi(p) - \psi(q) + \|\operatorname{grad}\psi(q)\|\, B_{q,\, \operatorname{grad}\psi(q)}(p), \tag{48}
\]
where $B_{q,\operatorname{grad}\psi(q)}$ is the Busemann function with base point $q \in M$ and associated direction $\operatorname{grad}\psi(q) \in T_qM$. To introduce the subsequent definition, define the partial level sets of $D_\psi$ as follows: for any $\alpha \in \mathbb{R}$, consider
\[
L_1(\alpha, q) := \{p \in \bar S : D_\psi(p,q) \le \alpha\}, \qquad L_2(p, \alpha) := \{q \in S : D_\psi(p,q) \le \alpha\}.
\]
The extension of the Bregman distance concept to Hadamard manifolds via a Busemann function is as follows.

Definition 24. The function $\psi$ is called a Bregman–Busemann function and $D_\psi$ a Bregman–Busemann distance induced by $\psi$ if the following conditions hold:
(i) $\psi$ is continuously differentiable on $S$;
(ii) $\psi$ is strictly convex and continuous on $\bar S$;
(iii) for all $\alpha \in \mathbb{R}$, the partial level sets $L_1(\alpha, q)$ and $L_2(p, \alpha)$ are bounded for every $q \in S$ and $p \in \bar S$, respectively;
(iv) if $(q_k)_{k\in\mathbb{N}} \subset S$ converges to $\bar q$, then $D_\psi(\bar q, q_k)$ converges to 0;
(v) if $(p_k)_{k\in\mathbb{N}} \subset \bar S$ and $(q_k)_{k\in\mathbb{N}} \subset S$ are sequences such that $(p_k)_{k\in\mathbb{N}}$ is bounded, $\lim q_k = \bar q$, and $\lim_{k\to\infty} D_\psi(p_k, q_k) = 0$, then $\lim p_k = \bar q$.

We proceed with some comments regarding Definition 24. The set $S$ is referred to as the zone of $\psi$. Using (13), we deduce that (iv) and (v) hold when $\bar q \in S$, as a consequence of (i), (ii), and (iii); thus they need to be verified only at points on the boundary $\partial S$ of $S$.
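In the Euclidean case $M = \mathbb{R}^n$, the Busemann function with base point $q$ and direction $v$ is $B_{q,v}(p) = -\langle p - q,\, v/\|v\|\rangle$, so definition (48) recovers the classical Bregman distance $\psi(p) - \psi(q) - \langle \operatorname{grad}\psi(q),\, p - q\rangle$. A minimal sketch of this reduction, assuming $\psi(x) = \|x\|^2$ (for which the Bregman distance is exactly $\|p - q\|^2$) and assuming $\operatorname{grad}\psi(q) \neq 0$ so that (48) is well defined:

```python
import numpy as np

# Euclidean Busemann function: B_{q,v}(p) = -<p - q, v/||v||>
def busemann(p, q, v):
    return -np.dot(p - q, v/np.linalg.norm(v))

psi      = lambda x: np.dot(x, x)      # psi(x) = ||x||^2
grad_psi = lambda x: 2*x

def D_psi(p, q):                       # formula (48), Euclidean instance
    g = grad_psi(q)
    return psi(p) - psi(q) + np.linalg.norm(g)*busemann(p, q, g)

rng = np.random.default_rng(1)
p, q = rng.standard_normal(5), rng.standard_normal(5)
print(np.isclose(D_psi(p, q), np.dot(p - q, p - q)))   # True: D_psi(p,q) = ||p-q||^2
```

This is the sense in which Definition 24 extends the classical Euclidean notion: the term $\|\operatorname{grad}\psi(q)\|\,B_{q,\operatorname{grad}\psi(q)}(p)$ plays the role of the linearization $-\langle \operatorname{grad}\psi(q), p - q\rangle$.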
In the following proposition, we show that the Bregman–Busemann distance is indeed nonnegative and convex in its first argument.

Proposition 25. Let $\psi$ be a Bregman–Busemann function with zone $S$ and let $D_\psi$ be the Bregman–Busemann distance induced by $\psi$. Then the following statements hold:
(i) $D_\psi(p,q) \ge 0$ for all $p \in \bar S$ and $q \in S$;
(ii) $D_\psi(\cdot, q) : \bar S \to \mathbb{R} \cup \{+\infty\}$ is strictly convex for all $q \in S$.

Proof. If $\psi$ and $D_\psi$ satisfy Definition 24, then it follows from Theorem 15 that the function (48) satisfies item (i). It follows from Lemma 10 that $B_{q,\operatorname{grad}\psi(q)}$ is convex; thus, using item (ii) of Definition 24, the proof of item (ii) follows.

The Bregman distance, introduced as a fundamental concept in [19], has been extensively studied in the Euclidean context, as evidenced by various references including [6, 22, 27, 34]. The idea of developing the Bregman–Busemann distance is inspired by Example 1, which illustrates how Definition 24 serves as a natural extension of the established concept of the Bregman distance in Euclidean spaces. Furthermore, it is noteworthy that the concept of Bregman functions has been previously introduced in the context of Hadamard manifolds, using the function (24) in its definition, as mentioned in [40, 39]. However, that definition results in a Bregman function that lacks convexity in the first argument. This limitation has been acknowledged in [40, 39], where it led to erroneous outcomes, thereby restricting its applicability. It is important to highlight that the Bregman–Busemann distance of Definition 24 addresses this limitation, as demonstrated in Proposition 25.

8 Appendix

Appendix 1. Busemann functions on hyperbolic space in Example 3: To simplify notation, we set $\alpha(t) := -\kappa\langle p, \exp_q(tv)\rangle$ with $v \neq 0$. Since $\langle p, q\rangle \le -1/\kappa$ for all $p, q \in \mathbb{H}^n_\kappa$, we have $\alpha(t) \ge 1$.
Thus,
\[
\operatorname{arcosh}(\alpha(t)) = \ln\big(\alpha(t) + \sqrt{\alpha(t)^2 - 1}\big),
\]
or, equivalently,
\[
\operatorname{arcosh}(\alpha(t)) = \ln\big(\alpha(t)(1 + \beta(t))\big), \qquad \text{where } \beta(t) := \big(1 - (1/\alpha(t))^2\big)^{1/2}. \tag{49}
\]
Using (4) we obtain $d_\kappa(p, \exp_q(tv)) = (1/\sqrt{\kappa})\operatorname{arcosh}(\alpha(t))$. Hence, taking into account that $(1/\sqrt{\kappa})\ln e^{-\sqrt{\kappa}\|v\|t} = -\|v\|t$, it follows from (49) that
\[
d_\kappa(p, \exp_q(tv)) - \|v\|t = \frac{1}{\sqrt{\kappa}} \ln\Big(e^{-\sqrt{\kappa}\|v\|t}\alpha(t)\big(1 + \beta(t)\big)\Big). \tag{50}
\]
Since $\alpha(t) := -\kappa\langle p, \exp_q(tv)\rangle$, the definitions of $\cosh$ and $\sinh$ give
\[
\alpha(t) = -\kappa\,\tfrac12\big(e^{\sqrt{\kappa}\|v\|t} + e^{-\sqrt{\kappa}\|v\|t}\big)\langle p, q\rangle - \sqrt{\kappa}\,\tfrac12\big(e^{\sqrt{\kappa}\|v\|t} - e^{-\sqrt{\kappa}\|v\|t}\big)\frac{1}{\|v\|}\langle p, v\rangle
= -\tfrac12 e^{\sqrt{\kappa}\|v\|t}\Big(\kappa\big(1 + e^{-2\sqrt{\kappa}\|v\|t}\big)\langle p, q\rangle + \sqrt{\kappa}\big(1 - e^{-2\sqrt{\kappa}\|v\|t}\big)\frac{1}{\|v\|}\langle p, v\rangle\Big).
\]
Hence, we have
\[
\lim_{t\to+\infty} \alpha(t)^2 = +\infty, \qquad \lim_{t\to\infty} e^{-\sqrt{\kappa}\|v\|t}\alpha(t) = -\frac12\Big(\kappa\langle p, q\rangle + \sqrt{\kappa}\frac{1}{\|v\|}\langle p, v\rangle\Big),
\]
and $\lim_{t\to\infty}\beta(t) = 1$. Thus, we conclude from (50) that
\[
\lim_{t\to+\infty}\Big(d\big(p, \exp_q(tv)\big) - \|v\|t\Big) = \frac{1}{\sqrt{\kappa}}\ln\Big(-\Big\langle p,\; \kappa q + \sqrt{\kappa}\frac{1}{\|v\|}v\Big\rangle\Big).
\]
Therefore, the last equality together with (12) gives the desired equality (20). Now we compute the gradient of the Busemann function. It follows from (5) that
\[
\operatorname{grad} B_{q,v}(p) := \operatorname{Proj}^\kappa_p J B'_{q,v}(p) = J B'_{q,v}(p) + \kappa\big\langle J B'_{q,v}(p), p\big\rangle p. \tag{51}
\]
To simplify notation, write $B_{q,v}(p) := (1/\sqrt{\kappa})\ln(\eta(p))$, where $\eta(p) := -\big\langle p,\; \kappa q + \sqrt{\kappa}\frac{1}{\|v\|}v\big\rangle$. Thus, taking the Euclidean derivative, we have
\[
B'_{q,v}(p) = \frac{1}{\sqrt{\kappa}}\frac{\eta'(p)}{\eta(p)}, \qquad \eta'(p) = -J\Big(\kappa q + \sqrt{\kappa}\frac{1}{\|v\|}v\Big). \tag{52}
\]
Substituting the last equalities (52) into (51), after some algebraic manipulations, we obtain
\[
\operatorname{grad} B_{q,v}(p) = \frac{1}{\sqrt{\kappa}}\frac{1}{\eta(p)}\Big(J\eta'(p) + \kappa\big\langle J\eta'(p), p\big\rangle p\Big).
\]
Substituting (52) into the last equality and using that $JJ = I$, we obtain (21). Note that some further calculations show that (21) implies $\|\operatorname{grad} B_{q,v}(p)\| = 1$, as stated in Lemma 10.
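The closed form (20) can be checked numerically for $\kappa = 1$, where it reads $B_{q,v}(p) = \ln\big(-\langle p,\, q + v/\|v\|\rangle\big)$. The sketch below verifies that along its own ray the Busemann function decreases with unit speed, $B_{q,v}(\exp_q(tv)) = -t$, and that the gradient at the base point is $-v/\|v\|$; the gradient expression used is our transcription of the formula derived above, with $u := q + v/\|v\|$.

```python
import numpy as np

def lorentz(x, y):
    return x[0]*y[0] + x[1]*y[1] - x[2]*y[2]

# Closed form (20) with kappa = 1: B_{q,v}(p) = ln( -<p, q + v/||v||> )
def busemann(p, q, v):
    nv = np.sqrt(lorentz(v, v))
    return np.log(-lorentz(p, q + v/nv))

q = np.array([0.0, 0.0, 1.0])       # base point of H^2
v = np.array([1.0, 0.0, 0.0])       # unit tangent direction at q
ray = lambda t: np.array([np.sinh(t), 0.0, np.cosh(t)])   # exp_q(t v)

# Along its own ray: B(exp_q(tv)) = -t
for t in [0.0, 0.5, 2.0]:
    print(busemann(ray(t), q, v))   # -t

# Gradient at the base point: grad B_{q,v}(q) = -v/||v|| (last statement of Lemma 10);
# here grad B(p) = -(u + <u,p> p)/eta(p) with u = q + v/||v||, eta(p) = -<p,u>
u = q + v
gB = -(u + lorentz(u, q)*q) / (-lorentz(q, u))
print(np.allclose(gB, -v))          # True
```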
In particular, if $p = q$, then, since $\langle q, q\rangle = -1/\kappa$ and $\langle p, v\rangle = 0$, formula (21) simplifies to
\[
\operatorname{grad} B_{q,v}(q) = -\frac{1}{\sqrt{\kappa}}\Big(\kappa q + \sqrt{\kappa}\frac{1}{\|v\|}v - \kappa q\Big) = -\frac{1}{\|v\|}v,
\]
which is in accordance with the last statement of Lemma 10.

Appendix 2. We now present the detailed computation of the explicit formula given in Example 4 for the Busemann functions and their gradients on the manifold of symmetric positive definite matrices. It is worth noting that this computation does not require any prior knowledge of the theory of symmetric spaces. To this end, we first prove two auxiliary lemmas.

Lemma 26. Take $X \in P(n)$ and $V \in S(n)$. If $X$ commutes with $V$, then
\[
B_{I,V}(X) = -\frac{1}{\|V\|}\operatorname{tr}\big(V \operatorname{Log}(X)\big).
\]
Proof. Since $X$ commutes with $V$, applying Proposition 13 and equation (7), and after some algebraic manipulations, we obtain
\[
B_{I,V}(X) = \frac{1}{2\|V\|}\lim_{t\to+\infty}\frac{d^2(\operatorname{Exp}(tV), X) - (t\|V\|)^2}{t}
= \frac{1}{2\|V\|}\lim_{t\to+\infty}\frac{1}{t}\operatorname{tr}\Big(\big(\operatorname{Log}(X^{-1/2}\operatorname{Exp}(tV)X^{-1/2})\big)^2 - (tV)^2\Big)
\]
\[
= \frac{1}{2\|V\|}\lim_{t\to+\infty}\frac{1}{t}\operatorname{tr}\Big(\big(\operatorname{Log}(X^{-1/2}\operatorname{Exp}(tV)X^{-1/2}) - tV\big)\big(\operatorname{Log}(X^{-1/2}\operatorname{Exp}(tV)X^{-1/2}) + tV\big)\Big).
\]
Since $X$ commutes with $V$, we have $\operatorname{Log}(X^{-1/2}\operatorname{Exp}(tV)X^{-1/2}) = \operatorname{Log}(\operatorname{Exp}(tV)) - \operatorname{Log}(X)$. Thus, the last equality can be written as
\[
B_{I,V}(X) = \frac{1}{2\|V\|}\lim_{t\to+\infty}\frac{1}{t}\operatorname{tr}\big([\operatorname{Log}(\operatorname{Exp}(tV)) - \operatorname{Log}(X) - tV]\,[\operatorname{Log}(\operatorname{Exp}(tV)) - \operatorname{Log}(X) + tV]\big)
= \frac{1}{2\|V\|}\lim_{t\to+\infty}\frac{1}{t}\operatorname{tr}\big([-\operatorname{Log}(X)]\,[2tV - \operatorname{Log}(X)]\big)
= -\frac{1}{\|V\|}\operatorname{tr}\big(V\operatorname{Log}(X)\big),
\]
which is the desired equality.

Let $L$ be the lower triangular matrix given in Example 4. For our analysis it is convenient to use a decomposition of the form $L = WZ$, with
\[
W := \begin{pmatrix} I_{n_1} & 0 & \cdots & 0 \\ W_{21} & I_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ W_{k1} & W_{k2} & \cdots & I_{n_k} \end{pmatrix}, \qquad
Z := \begin{pmatrix} L_{n_1} & 0 & \cdots & 0 \\ 0 & L_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & L_{n_k} \end{pmatrix}. \tag{53}
\]

Lemma 27.
Let $D$ and $L$ be the matrices defined in Example 4, and let $W$ and $Z$ be the matrices defined in (53). Then the following equalities hold:
(i) $\lim_{t\to+\infty} d\big(W^{-1}\operatorname{Exp}(tD)[W^\top]^{-1}, \operatorname{Exp}(tD)\big) = 0$;
(ii) $\lim_{t\to+\infty} d\big(L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}, \operatorname{Exp}(tD)\big) = d(I, ZZ^\top)$.

Proof. We prove item (ii) only, as item (i) is analogous. The second part of Lemma 5 implies that
\[
d\big(L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}, \operatorname{Exp}(tD)\big) = d\big(I,\; \operatorname{Exp}^{-1/2}(tD)\, L\, \operatorname{Exp}(tD)\, L^\top \operatorname{Exp}^{-1/2}(tD)\big). \tag{54}
\]
To simplify notation, set $A(t) := \operatorname{Exp}^{-1/2}(tD)\, L\, \operatorname{Exp}^{1/2}(tD)$. Note that $A(t)$ can be written as
\[
A(t) := \begin{pmatrix} L_{n_1} & 0 & \cdots & 0 \\ e^{\frac{t(\lambda_1-\lambda_2)}{2}} L_{21} & L_{n_2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ e^{\frac{t(\lambda_1-\lambda_k)}{2}} L_{k1} & e^{\frac{t(\lambda_2-\lambda_k)}{2}} L_{k2} & \cdots & L_{n_k} \end{pmatrix}.
\]
Thus, equality (54) is equivalent to
\[
d\big(L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}, \operatorname{Exp}(tD)\big) = d\big(I, A(t)A(t)^\top\big).
\]
Since $\lambda_1 < \cdots < \lambda_k$, we conclude that $\lim_{t\to+\infty} A(t) = Z$, which completes the proof.

Busemann functions on $P(n)$ in Example 4: Let the spectral and Cholesky factorizations be
\[
Y^{-1/2}VY^{-1/2} = UDU^\top, \qquad U^\top Y^{-1/2}XY^{-1/2}U = LL^\top = WZZ^\top W^\top, \tag{55}
\]
with $L = WZ$. Since $\exp_Y(tV) = Y^{1/2}\operatorname{Exp}\big(tY^{-1/2}VY^{-1/2}\big)Y^{1/2} = Y^{1/2}U\operatorname{Exp}(tD)U^\top Y^{1/2}$, Lemma 5 and (55) imply that
\[
d\big(\exp_Y(tV), X\big) = d\big(\operatorname{Exp}(tD),\, U^\top Y^{-1/2}XY^{-1/2}U\big) = d\big(\operatorname{Exp}(tD),\, WZZ^\top W^\top\big) = d\big(W^{-1}\operatorname{Exp}(tD)[W^\top]^{-1},\, ZZ^\top\big). \tag{56}
\]
On the other hand, applying the triangle inequality, we obtain
\[
d\big(\operatorname{Exp}(tD), ZZ^\top\big) - d\big(W^{-1}\operatorname{Exp}(tD)[W^\top]^{-1}, \operatorname{Exp}(tD)\big) \le d\big(W^{-1}\operatorname{Exp}(tD)[W^\top]^{-1}, ZZ^\top\big) \le d\big(\operatorname{Exp}(tD), ZZ^\top\big) + d\big(W^{-1}\operatorname{Exp}(tD)[W^\top]^{-1}, \operatorname{Exp}(tD)\big). \tag{57}
\]
Adding $-\|tV\|$ to every term in (57) and using item (i) of Lemma 27, we obtain
\[
\lim_{t\to\infty}\Big(d\big(W^{-1}\operatorname{Exp}(tD)[W^\top]^{-1}, ZZ^\top\big) - \|tV\|\Big) = \lim_{t\to\infty}\Big(d\big(\operatorname{Exp}(tD), ZZ^\top\big) - \|tV\|\Big).
\]
Hence, by (12) and (56), and using the fact that $\|tV\| = \|tD\|_I$, we have
\[
B_{Y,V}(X) = \lim_{t\to\infty}\big(d(\exp_Y(tV), X) - \|tV\|\big) = \lim_{t\to\infty}\big(d(\operatorname{Exp}(tD), ZZ^\top) - \|tD\|_I\big) = B_{I,D}(ZZ^\top).
\]
Thus, since $ZZ^\top$ commutes with $D$, Lemma 26 gives
\[
B_{Y,V}(X) = -\frac{1}{\|D\|_I}\operatorname{tr}\big(D\operatorname{Log}(ZZ^\top)\big). \tag{58}
\]
To conclude the proof, it remains to show that the right-hand side above coincides with the right-hand side of (22). For that, first observe that
\[
D\operatorname{Log}\big(ZZ^\top\big) = \begin{pmatrix} \lambda_1\operatorname{Log}(L_{n_1}L_{n_1}^\top) & 0 & \cdots & 0 \\ 0 & \lambda_2\operatorname{Log}(L_{n_2}L_{n_2}^\top) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & \lambda_k\operatorname{Log}(L_{n_k}L_{n_k}^\top) \end{pmatrix},
\]
and therefore
\[
\operatorname{tr}\big(D\operatorname{Log}(ZZ^\top)\big) = \sum_{i=1}^k \lambda_i\operatorname{tr}\big(\operatorname{Log}(L_{n_i}L_{n_i}^\top)\big) = \sum_{i=1}^k \lambda_i\ln\big(\det(L_{n_i}L_{n_i}^\top)\big) = 2\sum_{i=1}^k \lambda_i \sum_{j=\alpha_{i-1}+1}^{\alpha_i} \ln\big((L)_{jj}\big).
\]
Combining the last equality with (58) and taking into account that $\|D\|_I = \big(n_1\lambda_1^2 + \cdots + n_k\lambda_k^2\big)^{1/2}$, we obtain the desired equality.

Gradient of Busemann functions on $P(n)$ in Example 4: Since $\bar U := L^\top[X^{-1/2}Y^{1/2}U]^\top$ is an orthogonal matrix, it follows from (6), (8), and (9) that
\[
\big\langle \log_X(\exp_Y(tV)), K\big\rangle_X = \operatorname{tr}\Big(X^{-1/2}\operatorname{Log}\big(X^{-1/2}Y^{1/2}\operatorname{Exp}(tUDU^\top)Y^{1/2}X^{-1/2}\big)X^{-1/2}K\Big)
= \operatorname{tr}\Big(\operatorname{Log}\big([X^{-1/2}Y^{1/2}U]\operatorname{Exp}(tD)[X^{-1/2}Y^{1/2}U]^\top\big)X^{-1/2}KX^{-1/2}\Big)
\]
\[
= \operatorname{tr}\Big(\operatorname{Log}\big(\bar U^\top L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}\bar U\big)X^{-1/2}KX^{-1/2}\Big)
= \operatorname{tr}\Big(\bar U^\top\operatorname{Log}\big(L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}\big)\bar U\, X^{-1/2}KX^{-1/2}\Big)
= \operatorname{tr}\Big(\operatorname{Log}\big(L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}\big)\bar U\, X^{-1/2}KX^{-1/2}\bar U^\top\Big)
\]
for all $K \in S(n)$. Substituting $K := -(1/\|D\|_I)\big(X^{1/2}\bar U^\top D\bar U X^{1/2}\big)$ into the above equality and performing some algebraic simplifications, we obtain
\[
\big\langle \log_X(\exp_Y(tV)), K\big\rangle_X = -\frac{1}{\|D\|_I}\operatorname{tr}\Big(\operatorname{Log}\big(L^{-1}\operatorname{Exp}(tD)[L^\top]^{-1}\big)D\Big).
\]
Applying this identity, together with further algebraic manipulations and the Cauchy-Schwarz inequality, we conclude that
\[
\begin{aligned}
\lim_{t\to\infty}\frac{1}{t}\langle\log_X(\exp_Y(tV)),K\rangle_X
&=\frac{1}{\|D\|_I}\lim_{t\to\infty}\frac{1}{t}\operatorname{tr}\Big(\big[tD-\operatorname{Log}\big(L^{-1}\operatorname{Exp}(tD)[L^{T}]^{-1}\big)\big]D\Big)-\|D\|_I\\
&\le\frac{1}{\|D\|_I}\lim_{t\to\infty}\frac{1}{t}\big\|tD-\operatorname{Log}\big(L^{-1}\operatorname{Exp}(tD)[L^{T}]^{-1}\big)\big\|_I\,\|D\|_I-\|D\|_I\\
&=\lim_{t\to\infty}\frac{1}{t}\big\|\log_I(\operatorname{Exp}(tD))-\log_I\big(L^{-1}\operatorname{Exp}(tD)[L^{T}]^{-1}\big)\big\|_I-\|D\|_I.
\end{aligned}
\]
Using (3) and (9) together with item (ii) of Lemma 27, we obtain
\[
\lim_{t\to\infty}\frac{1}{t}\big\|\log_I(\operatorname{Exp}(tD))-\log_I\big(L^{-1}\operatorname{Exp}(tD)[L^{T}]^{-1}\big)\big\|_I\le\lim_{t\to\infty}\frac{1}{t}\,d\big(\operatorname{Exp}(tD),L^{-1}\operatorname{Exp}(tD)[L^{T}]^{-1}\big)=0.
\]
Hence, it follows from the last two inequalities that $\lim_{t\to\infty}\frac{1}{t}\langle\log_X(\exp_Y(tV)),K\rangle_X\le-\|D\|_I$. Therefore, applying Lemma 10 and taking into account that $\|V\|=\|D\|_I$ and $\|\operatorname{grad}B_{Y,V}(X)\|_X=1=\|K\|_X$, we obtain
\[
\|\operatorname{grad}B_{Y,V}(X)-K\|^{2}=\|\operatorname{grad}B_{Y,V}(X)\|_X^{2}-2\langle\operatorname{grad}B_{Y,V}(X),K\rangle_X+\|K\|_X^{2}=2-2\langle\operatorname{grad}B_{Y,V}(X),K\rangle_X=2+\frac{2}{\|D\|_I}\lim_{t\to\infty}\frac{1}{t}\langle\log_X(\exp_Y(tV)),K\rangle_X\le 0,
\]
which completes the proof of (23), since $\|D\|_I=\big(n_1\lambda_1^{2}+\cdots+n_k\lambda_k^{2}\big)^{1/2}$.

References

[1] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization algorithms on matrix manifolds. Princeton University Press, Princeton, NJ, 2008. With a foreword by Paul Van Dooren.
[2] F. J. Aragón Artacho and P. T. Vuong. The boosted difference of convex functions algorithm for nonsmooth functions. SIAM J. Optim., 30(1):980-1006, 2020.
[3] W. Ballmann. Lectures on Spaces of Nonpositive Curvature, volume 25 of DMV Seminar. Birkhäuser, Basel, 1995.
[4] W. Ballmann, M. Gromov, and V. Schroeder. Manifolds of nonpositive curvature, volume 61 of Prog. Math. Birkhäuser, Cham, 1985.
[5] Y. Bao, R. Wang, T. Xu, X. Wu, and J. Kittler.
SymCL: Riemannian contrastive learning on the symmetric positive definite manifold for visual classification, 2024.
[6] H. H. Bauschke and J. M. Borwein. Legendre functions and the method of random Bregman projections. J. Convex Anal., 4(1):27-67, 1997.
[7] R. Benedetti and C. Petronio. Lectures on hyperbolic geometry. Universitext. Springer-Verlag, Berlin, 1992.
[8] G. C. Bento, O. P. Ferreira, and P. R. Oliveira. Local convergence of the proximal point method for a special class of nonconvex functions on Hadamard manifolds. Nonlinear Anal., Theory Methods Appl., Ser. A, 73(2):564-572, 2010.
[9] G. C. Bento, O. P. Ferreira, and P. R. Oliveira. Proximal point method for a special class of nonconvex functions on Hadamard manifolds. Optimization, 64(2):289-319, 2015.
[10] G. d. C. Bento, J. Cruz Neto, and Í. D. L. Melo. Fenchel conjugate via Busemann function on Hadamard manifolds. Appl. Math. Optim., 88(3):Paper No. 83, 29 pp., 2023.
[11] G. d. C. Bento, J. X. Cruz Neto, and Í. D. L. Melo. Combinatorial convexity in Hadamard manifolds: existence for equilibrium problems. J. Optim. Theory Appl., 195(3):1087-1105, 2022.
[12] G. d. C. Bento, J. X. C. Neto, J. O. Lopes, Í. D. L. Melo, and P. da Silva Filho. A new approach about equilibrium problems via Busemann functions. J. Optim. Theory Appl., 200(1):428-436, 2024.
[13] R. Bergmann, O. Ferreira, E. M. Santos, and J. C. O. Souza. The difference of convex algorithm on Hadamard manifolds. J. Optim. Theory Appl., 201(1):221-251, 2024.
[14] C. Bonet, L. Chapel, L. Drumetz, and N. Courty. Hyperbolic sliced-Wasserstein via geodesic and horospherical projections. In Topological, Algebraic and Geometric Learning Workshops 2023, pages 334-370. PMLR, 2023.
[15] C. Bonet, B. Malézieux, A. Rakotomamonjy, L. Drumetz, T. Moreau, M. Kowalski, and N. Courty.
Sliced-Wasserstein on symmetric positive definite matrices for M/EEG signals. In International Conference on Machine Learning, pages 2777-2805. PMLR, 2023.
[16] N. Boumal. An introduction to optimization on smooth manifolds. Cambridge University Press, Cambridge, 2023.
[17] N. Boumal, B. Mishra, P.-A. Absil, and R. Sepulchre. Manopt, a Matlab toolbox for optimization on manifolds. Journal of Machine Learning Research, 15(42):1455-1459, 2014.
[18] N. Bourbaki. General Topology. Springer Berlin Heidelberg, 1955.
[19] L. M. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming. Zh. Vychisl. Mat. Mat. Fiz., 7:620-631, 1967.
[20] M. R. Bridson and A. Haefliger. Metric spaces of non-positive curvature, volume 319 of Grundlehren der mathematischen Wissenschaften. Springer-Verlag, Berlin, 1999.
[21] H. Busemann. The geometry of geodesics, volume 6 of Pure Appl. Math. Academic Press, New York, NY, 1955.
[22] G. Chen and M. Teboulle. Convergence analysis of a proximal-like minimization algorithm using Bregman functions. SIAM J. Optim., 3(3):538-543, 1993.
[23] C. Criscitiello and J. Kim. Horospherically convex optimization on Hadamard manifolds. Part I: Analysis and algorithms. arXiv:2505.16970, 2025. Available at https://arxiv.org/abs/2505.16970.
[24] J. X. da Cruz Neto, O. P. Ferreira, and L. R. Lucambio Pérez. Monotone point-to-set vector fields. Balkan J. Geom. Appl., 5(1):69-79, 2000.
[25] J. X. da Cruz Neto, O. P. Ferreira, and L. R. Lucambio Pérez. Contributions to the study of monotone vector fields. Acta Math. Hung., 94(4):307-320, 2002.
[26] W. de Oliveira. Sequential difference-of-convex programming. J. Optim. Theory Appl., 186(3):936-959, 2020.
[27] A. R. de Pierro and A. N. Iusem.
A relaxed version of Bregman's method for convex programming. J. Optim. Theory Appl., 51:421-440, 1986.
[28] M. P. do Carmo. Riemannian geometry. Mathematics: Theory & Applications. Birkhäuser Boston Inc., Boston, MA, 1992. Translated from the second Portuguese edition by Francis Flaherty.
[29] O. P. Ferreira, M. S. Louzeiro, and L. F. Prudente. Iteration-complexity of the subgradient method on Riemannian manifolds with lower bounded curvature. Optimization, 68(4):713-729, 2019.
[30] O. P. Ferreira and P. R. Oliveira. Proximal point algorithm on Riemannian manifolds. Optimization, 51(2):257-270, 2002.
[31] M. Ghadimi Atigh, M. Keller-Ressel, and P. Mettes. Hyperbolic Busemann learning with ideal prototypes. Advances in Neural Information Processing Systems, 34:103-115, 2021.
[32] A. Goodwin, A. S. Lewis, G. López-Acedo, and A. Nicolae. A subgradient splitting algorithm for optimization on nonpositively curved metric spaces. arXiv:2412.06730, 2024.
[33] H. Hirai. Convex analysis on Hadamard spaces and scaling problems. Foundations of Computational Mathematics, pages 1-38, 2023.
[34] K. C. Kiwiel. Proximal minimization methods with generalized Bregman functions. SIAM J. Control Optim., 35(4):1142-1168, 1997.
[35] A. Kristály, C. Li, G. López-Acedo, and A. Nicolae. What do 'convexities' imply on Hadamard manifolds? J. Optim. Theory Appl., 170(3):1068-1074, 2016.
[36] A. S. Lewis, G. Lopez-Acedo, and A. Nicolae. Horoballs and the subgradient method. arXiv preprint arXiv:2403.15749, 2024.
[37] C. Li, B. S. Mordukhovich, J. Wang, and J.-C. Yao. Weak sharp minima on Riemannian manifolds. SIAM J. Optim., 21(4):1523-1560, 2011.
[38] Y. E. Nesterov and M. J. Todd. On the Riemannian geometry defined by self-concordant barriers and interior-point methods. Found. Comput. Math., 2(4):333-361, 2002.
[39] E. A. Papa Quiroz.
An extension of the proximal point algorithm with Bregman distances on Hadamard manifolds. J. Glob. Optim., 56(1):43-59, 2013.
[40] E. A. Papa Quiroz and P. R. Oliveira. Proximal point methods for quasiconvex and convex functions with Bregman distances on Hadamard manifolds. J. Convex Anal., 16(1):49-69, 2009.
[41] T. Rapcsák. Smooth nonlinear optimization in $R^n$, volume 19 of Nonconvex Optimization and its Applications. Kluwer Academic Publishers, Dordrecht, 1997.
[42] J. G. Ratcliffe. Foundations of hyperbolic manifolds, volume 149 of Graduate Texts in Mathematics. Springer, Cham, third edition, 2019.
[43] T. Sakai. Riemannian geometry, volume 149 of Translations of Mathematical Monographs. American Mathematical Society, Providence, RI, 1996. Translated from the 1992 Japanese original by the author.
[44] P. D. Tao and S. El Bernoussi. Algorithms for solving a class of nonconvex optimization problems. Methods of subgradients. Fermat days 85: Mathematics for optimization, Toulouse/France 1985, North-Holland Math. Stud. 129, pages 249-271, 1986.
[45] A. Tibermacine, I. E. Tibermacine, M. Zouai, and A. Rabehi. EEG classification using contrastive learning and Riemannian tangent space representations. In 2024 International Conference on Telecommunications and Intelligent Systems (ICTIS), pages 1-7, 2024.
[46] C. Udrişte. Convex functions and optimization methods on Riemannian manifolds, volume 297 of Mathematics and its Applications. Kluwer Academic Publishers Group, Dordrecht, 1994.
[47] X. Wang, C. Li, J. Wang, and J.-C. Yao. Linear convergence of subgradient algorithm for convex feasibility on Riemannian manifolds. SIAM J. Optim., 25(4):2334-2358, 2015.
