The geometry of Stein's method of moments: A canonical decomposition via score matching
Authors: Mitsuki Nagai, Keisuke Yano
Abstract. In this paper, we elucidate the geometry of Stein's method of moments (SMoM). SMoM is a parameter estimation method based on the Stein operator, and yields a wide class of estimators that do not depend on the normalizing constant. We present a canonical decomposition of an SMoM estimator after centering the score matching estimator, which sheds light on the central role of score matching within the SMoM framework. Using this decomposition, we construct an SMoM estimator that improves upon the score matching estimator in the asymptotic variance. We also discuss the connection between SMoM and the Wasserstein geometry. Specifically, using the Wasserstein score function, we provide a geometrical interpretation of the gap in the asymptotic variance between the score matching estimator and the maximum likelihood estimator. Furthermore, it is shown that the score matching estimator is asymptotically efficient if and only if the Fisher score functions span the same space as the Wasserstein score functions.

1 Introduction

We consider estimation of statistical models whose probability density q_θ has the form

q_θ(x) = (1/Z(θ)) q̃_θ(x),

where q̃_θ is a non-normalized density and Z(θ) := ∫ q̃_θ(x) dx. Such models are known as non-normalized models or energy-based models, and appear in various fields (cf. LeCun et al., 2006; Song & Kingma, 2021). Although flexible, non-normalized models often involve a computationally intractable normalizing constant Z(θ), and thus the maximum likelihood estimator (MLE) is computationally infeasible for non-normalized models. Hyvärinen (2005) introduced score matching as an estimation method for non-normalized models.
Score matching estimates the parameter by minimizing the distance between the score function ∇_x log q_θ and the true one ∇_x log q_{θ⋆}. This procedure depends neither on the normalizing constant nor on the true score function ∇_x log q_{θ⋆}, since the minimization problem can be rewritten, using integration by parts, in a form that avoids ∇_x log q_{θ⋆}. Recently, Ebner et al. (2025) proposed yet another approach for non-normalized models, Stein's method of moments (SMoM). This approach utilizes a Stein operator A_θ, an operator satisfying the identity E_θ[A_θ f] = 0 for a class of test functions f on R^p. In this paper, we allow test functions f_θ that may depend on θ. Since the Stein operator is not unique, we need to choose an appropriate one, and employ the divergence-based Stein operator (Mijoule et al., 2023):

A_θ f_θ := ∇_x · (q_θ f_θ) / q_θ,   f_θ : R^p → R^p.

Under the boundary condition lim_{∥x∥→∞} q_θ(x) f_θ(x) = 0, we have E_θ[A_θ f_θ] = 0. This Stein operator does not depend on the normalizing constant, implying that SMoM yields a wide class of normalizing-constant-free estimators. In this paper, we provide a geometric framework for analyzing SMoM on the basis of score matching. We derive the asymptotic linear representation of SMoM estimators after centering the score matching estimator (Theorem 1). The representation reveals that, within the framework of SMoM, the relevant orthogonal structure arises from the space of test functions, rather than from the space of estimating functions (e.g., Bickel et al., 1993).

[∗ The Graduate University for Advanced Studies, SOKENDAI, e-mail: nagai.mitsuki@ism.ac.jp. † The Institute of Statistical Mathematics, e-mail: yano@ism.ac.jp.]
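The Stein identity E_θ[A_θ f_θ] = 0 above is straightforward to probe numerically. The following is a minimal sanity check of our own (not from the paper): in one dimension the divergence-based operator reduces to A_θ f(x) = f′(x) + f(x) d/dx log q_θ(x), and we evaluate its Monte Carlo mean under the standard normal with an arbitrarily chosen polynomial test function.

```python
import math
import random

# Monte Carlo check of the divergence-based Stein identity E_θ[A_θ f] = 0
# on R (p = 1), where A_θ f(x) = f'(x) + f(x) * (d/dx log q_θ)(x).
# We take q = N(0, 1), so d/dx log q(x) = -x, and the arbitrary test
# function f(x) = x^3, which satisfies the boundary condition q(x) f(x) -> 0.

def stein_term(x):
    # A f(x) = 3 x^2 + x^3 * (-x) = 3 x^2 - x^4 for f(x) = x^3
    return 3.0 * x**2 - x**4

random.seed(0)
n = 200_000
mean = sum(stein_term(random.gauss(0.0, 1.0)) for _ in range(n)) / n
print(f"Monte Carlo estimate of E[A f]: {mean:.4f}")  # close to 0
```

Note that the check uses only the non-normalized part of the density, illustrating why the resulting estimators are free of Z(θ).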
Using the difference in orthogonality between these spaces, we construct an SMoM estimator that improves upon the score matching estimator in the asymptotic variance (Theorem 2), and show the condition for the asymptotic efficiency of the score matching estimator (Theorem 3). One intriguing aspect of our construction is its connection to the Wasserstein geometry (e.g., Otto & Villani, 2000; Otto, 2001; Li & Zhao, 2023; Amari & Matsuda, 2024; Ay, 2025; Nishimori & Matsuda, 2025; Trillos et al., 2025). In fact, our construction of the SMoM estimator improving the score matching estimator admits a geometric interpretation as the approach to the space spanned by the Wasserstein score functions (Li & Zhao, 2023). Also, using the Wasserstein score function, we show that the gap in the asymptotic variance between the score matching estimator and the MLE also has a geometrical interpretation. Furthermore, we show that the score matching estimator is asymptotically efficient if and only if the Fisher score functions span the same space as the Wasserstein score functions.

1.1 Literature review and contributions

There is an extensive literature on estimation for non-normalized models, beginning with Hyvärinen's pioneering work (Hyvärinen, 2005). Regarding the denoising autoencoder as a non-normalized model, Vincent (2011) proposed denoising score matching. To improve the computational efficiency of score matching, Song et al. (2020) introduced sliced score matching. The framework of scoring rules has been extended to cover score matching (Parry et al., 2012; Kanamori & Fujisawa, 2015; Takasu et al., 2018). Sriperumbudur et al. (2017) proposed density estimation using infinite-dimensional exponential families with score matching. Gutmann & Hyvärinen (2010) introduced yet another estimation method called noise contrastive estimation.
Gutmann & Hirayama (2011) provided a unified framework based on the Bregman divergence that encompasses score matching and noise contrastive estimation. Matsuda et al. (2021) derived model selection criteria for non-normalized models. In settings with missing data, Uehara et al. (2020b) combined imputation techniques with estimators for non-normalized models, and Givens et al. (2025) utilized marginal score functions. Uehara et al. (2020a) developed asymptotically efficient estimators that do not depend on the normalizing constant by employing density-ratio matching with a nonparametric density estimator. Despite the broad literature, the statistical efficiency of the original score matching has not been well understood. Using the isoperimetric constant, Koehler et al. (2022) analyzed the condition under which the score matching estimator is comparable to the maximum likelihood estimator. Our work provides a further characterization within the SMoM framework, and constructs an estimator that improves upon score matching when the latter is not asymptotically efficient.

Non-normalized models also naturally arise on more general domains, such as truncated domains in R^p (e.g., the non-negative orthant R^p_+) or Riemannian manifolds (e.g., the unit sphere S^{p−1}). Score matching has been generalized to such domains by changing the metric ⟨·,·⟩ to a weighted metric w⟨·,·⟩ (e.g., Hyvärinen, 2007; Liu et al., 2022), or by replacing differential operators with their manifold counterparts (e.g., Dawid & Lauritzen, 2005; Mardia et al., 2016; Williams & Liu, 2022). SMoM has also been generalized in the same way (cf. Fischer et al., 2024, 2025). In line with these generalizations, our results extend to these settings; see Appendix C. For simplicity, we primarily focus on non-normalized models on R^p in the main text.
Our analysis is motivated by the fact that the score matching estimator corresponds to the SMoM estimator based on the test functions f_{θ,j} := ∇_x ∂_{θ_j} log q_θ; see Lemma 1. This lemma has been previously mentioned by Ebner et al. (2025), Eguchi (2025), and Kume & Walker (2026); it stands as a clue to investigate the relationship between score matching and the Stein operator, which has delivered various estimators. Eguchi (2025) extended Lemma 1 using the γ-divergence and developed the γ-score matching estimator, a robust estimator based on the γ-Stein operator. Barp et al. (2019) proposed an estimation method based on Stein discrepancy and derived its relationship with score matching. Closely related to the present paper is Kume & Walker (2026), where Lemma 1 together with the generalized method of moments (GMM) framework is employed to construct an estimator that improves upon score matching in exponential families. We emphasize that our geometric construction delivers a different perspective on score matching in general statistical models, and reveals an unexpected connection between score matching and the Wasserstein geometry.

1.2 Organization

The rest of this paper is organized as follows. In Section 2, we prepare notations and review score matching and SMoM. In Section 3, we give our main results concerning the canonical decomposition of SMoM estimators and the construction of an SMoM estimator improving the asymptotic variance of the score matching estimator. In Section 4, we investigate the connections between the Wasserstein geometry and SMoM. In Section 5, we present numerical experiments. In Section 6, we conclude the paper. In Appendix A, we provide the regularity conditions for the model and test functions. In Appendix B, we provide the proofs for the results in Section 4. In Appendix C, we give our results on more general domains.
In Appendix D, we present additional numerical experiments.

2 Preliminaries

In this section, we prepare notations and review score matching and SMoM. The standard inner product and the induced norm on R^p are denoted by ⟨·,·⟩ and ∥·∥, respectively. The gradient operator on R^p is denoted by ∇_x, the divergence of a vector field f : R^p → R^p is defined as ∇_x · f := Σ_{k=1}^p ∂f_k/∂x_k, and the Laplacian of a function g : R^p → R is defined as Δ_x g := ∇_x · (∇_x g) = Σ_{k=1}^p ∂²g/∂x_k². Let ∂_{θ_j} be the partial differential operator ∂/∂θ_j. Let the parameter space Θ ⊂ R^d be an open set, and let q_θ be a probability density on R^p such that q_θ > 0 almost everywhere. We assume that the model {q_θ | θ ∈ Θ} is identifiable, that is, q_θ = q_{θ′} if and only if θ = θ′. For an estimator θ̂ that is asymptotically normal, i.e. √n(θ̂ − θ⋆) ⇝ N(0, V), we define its asymptotic variance as AVar[θ̂] := V. The Fisher information matrix I(θ⋆) ∈ R^{d×d} is defined by I(θ⋆)_{jk} := E_{θ⋆}[(∂_{θ_j} log q_{θ⋆})(∂_{θ_k} log q_{θ⋆})]. We assume that q_θ and the test functions f_{θ,j} : R^p → R^p for j = 1, ..., d are smooth with respect to both θ and x. Let X_1, ..., X_n be i.i.d. random variables from q_{θ⋆}.

The score matching estimator θ̂_SM is given as a solution to the estimating equations:

(1/n) Σ_{i=1}^n ∂_{θ_j} H(X_i, θ) = 0,   j = 1, ..., d,

where the Hyvärinen score H is defined by

H(x, θ) := ∥∇_x log q_θ(x)∥²/2 + Δ_x log q_θ(x).

Under the boundary condition lim_{∥x∥→∞} q_{θ⋆}(x) ∇_x log q_θ(x) = 0, integration by parts yields the relation

E_{θ⋆}[H(X, θ)] = E_{θ⋆}[∥∇_x log q_{θ⋆} − ∇_x log q_θ∥²]/2 + c,

where X ∼ q_{θ⋆} and c is a constant with respect to θ. Thus, minimizing E_{θ⋆}[H(X, θ)] over Θ is equivalent to minimizing E_{θ⋆}[∥∇_x log q_{θ⋆} − ∇_x log q_θ∥²].
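As a concrete illustration of our own (not an example from the paper), consider the Gaussian location model q_θ = N(θ, 1), where the Hyvärinen score and its estimating equation are fully explicit:

```python
import random

# Minimal sketch: score matching for the Gaussian location model
# q_θ = N(θ, 1). Here ∇_x log q_θ(x) = θ - x, so the Hyvärinen score is
# H(x, θ) = (x - θ)^2 / 2 - 1, and the estimating equation
# (1/n) Σ ∂_θ H(X_i, θ) = (1/n) Σ (θ - X_i) = 0
# is solved by the sample mean; score matching coincides with the MLE here.

random.seed(1)
theta_star = 2.0
xs = [random.gauss(theta_star, 1.0) for _ in range(50_000)]

theta_sm = sum(xs) / len(xs)  # root of the estimating equation
print(f"score matching estimate: {theta_sm:.3f}")  # close to 2.0
```

In richer non-normalized models the root is not available in closed form, but the estimating equations never involve Z(θ), which is the point of the construction.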
The SMoM estimator θ̂_SMoM based on the test functions f_{θ,1}, ..., f_{θ,d} is given as a solution to the estimating equations:

(1/n) Σ_{i=1}^n A_θ f_{θ,j}(X_i) = 0,   j = 1, ..., d.

The following lemma states that the score matching estimator is an SMoM estimator.

Lemma 1 (Section 2.1 of Eguchi (2025); Lemma 3.2 of Kume & Walker (2026)). The SMoM estimator based on the test functions f_{θ,j} := ∇_x ∂_{θ_j} log q_θ, j = 1, ..., d is the score matching estimator.

Proof. Since f_{θ,j} = ∇_x ∂_{θ_j} log q_θ is a solution of the equation A_θ f_{θ,j}(x) = ∂_{θ_j} H(x, θ), the estimating equations of score matching coincide with those of SMoM based on these test functions.

Remark 1 (Test functions and gradient vector fields). By the Helmholtz decomposition (Ay, 2025), a test function f_θ : R^p → R^p can be decomposed into a gradient term and a divergence-free term; there exist h_θ : R^p → R and g_θ : R^p → R^p such that A_θ g_θ = 0 and f_θ = ∇h_θ + g_θ. However, we do not restrict test functions to gradients because modeling vector fields directly is computationally efficient.

3 Main results

This section presents our main results. To state the results rigorously, we impose regularity conditions for the model and test functions detailed in Appendix A. In particular, we assume the consistency and asymptotic linearity of SMoM estimators, which are verified by the standard theory of Z-estimators (e.g., Van der Vaart, 1998). Under these conditions and Lemma 1, the score matching estimator θ̂_SM and the SMoM estimator θ̂_SMoM based on the test functions f_{θ,1}, ..., f_{θ,d} have the following asymptotic linear representations:

θ̂_SM − θ⋆ = −G^{-1} (1/n) Σ_{i=1}^n ( A_{θ⋆}(∇_x ∂_{θ_1} log q_{θ⋆})(X_i), ..., A_{θ⋆}(∇_x ∂_{θ_d} log q_{θ⋆})(X_i) )^⊤ + o_p(n^{-1/2}),   (1)

θ̂_SMoM − θ⋆ = −G_SMoM^{-1} (1/n) Σ_{i=1}^n ( A_{θ⋆} f_{θ⋆,1}(X_i), ..., A_{θ⋆} f_{θ⋆,d}(X_i) )^⊤ + o_p(n^{-1/2}),   (2)

respectively, where G, G_SMoM ∈ R^{d×d} are the matrices whose (j,k)-th entries are defined by

G_{jk} := E_{θ⋆}[ ∂_{θ_k} A_θ(∇_x ∂_{θ_j} log q_θ) |_{θ=θ⋆} ],   (G_SMoM)_{jk} := E_{θ⋆}[ ∂_{θ_k} A_θ f_{θ,j} |_{θ=θ⋆} ],

respectively.

3.1 A canonical decomposition of SMoM estimators

We begin with presenting a canonical decomposition of SMoM estimators (Theorem 1). Before presenting the main theorem, we introduce two notions of orthogonality: W-orthogonality and A_{θ⋆}-orthogonality. The acronym W refers to Wasserstein; see Section 4 for details.

Definition 1. A test function f is called W-orthogonal to a test function g if we have E_{θ⋆}[⟨f, g⟩] = 0. A test function f is called A_{θ⋆}-orthogonal to a test function g if we have E_{θ⋆}[(A_{θ⋆} f)(A_{θ⋆} g)] = 0.

The following theorem provides the asymptotic linear representation of the SMoM estimator after centering the score matching estimator.

Theorem 1 (Canonical decomposition of SMoM estimators). Under the regularity conditions, there exist u⋆_j : R^p → R^p (j = 1, ..., d) such that the asymptotic linear representation of θ̂_SMoM can be decomposed as follows:

θ̂_SMoM − θ⋆ = θ̂_SM − θ⋆ − G^{-1} (1/n) Σ_{i=1}^n ( A_{θ⋆} u⋆_1(X_i), ..., A_{θ⋆} u⋆_d(X_i) )^⊤ + o_p(n^{-1/2}),

where each u⋆_j : R^p → R^p is W-orthogonal to the test functions corresponding to score matching:

E_{θ⋆}[⟨u⋆_j, ∇_x ∂_{θ_k} log q_{θ⋆}⟩] = 0,   k = 1, ..., d.

Through this canonical decomposition, the asymptotic linear representation of an SMoM estimator is characterized only by its W-orthogonal terms A_{θ⋆} u⋆_k. The score matching estimator is the SMoM estimator without W-orthogonal terms.

Remark 2 (W-orthogonality does not imply A_{θ⋆}-orthogonality).
Importantly, the Stein operator does not preserve orthogonality; that is, even if u⋆_k is W-orthogonal to ∇_x ∂_{θ_j} log q_{θ⋆}, the A_{θ⋆}-orthogonality condition E_{θ⋆}[A_{θ⋆}(∇_x ∂_{θ_j} log q_{θ⋆}) A_{θ⋆} u⋆_k] = 0 does not necessarily hold. This gives a clue to constructing an SMoM estimator that improves upon the score matching estimator; by finding appropriate elements that are W-orthogonal to the test functions corresponding to score matching, we can reduce the asymptotic variance. In Section 3.2, we show how to construct such orthogonal elements.

To prove Theorem 1, we need the following lemma, telling us that G_SMoM is an inner product matrix and, in particular, that G is a Gram matrix.

Lemma 2. Under the regularity conditions, for the test functions f_{θ,1}, ..., f_{θ,d}, we have

E_{θ⋆}[ ∂_{θ_k} A_θ f_{θ,j} |_{θ=θ⋆} ] = E_{θ⋆}[⟨f_{θ⋆,j}, ∇_x ∂_{θ_k} log q_{θ⋆}⟩],   j, k = 1, ..., d.

In particular, we have

G_{jk} = E_{θ⋆}[⟨∇_x ∂_{θ_j} log q_{θ⋆}, ∇_x ∂_{θ_k} log q_{θ⋆}⟩].

Proof. Observe the relation

∂_{θ_k} A_θ f_{θ,j} = A_θ(∂_{θ_k} f_{θ,j}) + ⟨f_{θ,j}, ∇_x ∂_{θ_k} log q_θ⟩.

Together with E_{θ⋆}[A_{θ⋆}(∂_{θ_k} f_{θ⋆,j})] = 0 under the regularity conditions, taking the expectation of this relation yields E_{θ⋆}[∂_{θ_k} A_θ f_{θ,j} |_{θ=θ⋆}] = E_{θ⋆}[⟨f_{θ⋆,j}, ∇_x ∂_{θ_k} log q_{θ⋆}⟩], which completes the proof.

Remark that this representation of G gives the Riemannian metric induced by score matching (Karakida et al., 2016).

Now, we prove Theorem 1 by combining the asymptotic linear representations (1)-(2) and Lemma 2.

Proof of Theorem 1. Fix f_{θ,1}, ..., f_{θ,d} arbitrarily. We begin with the W-orthogonal decomposition:

f_{θ⋆,j} = Σ_{k=1}^d B_{jk} (∇_x ∂_{θ_k} log q_{θ⋆} + u⋆_k),   (3)

where B ∈ R^{d×d} is a coefficient matrix and the u⋆_k : R^p → R^p are W-orthogonal elements such that E_{θ⋆}[⟨u⋆_k, ∇_x ∂_{θ_j} log q_{θ⋆}⟩] = 0 for j = 1, ..., d.
This decomposition, together with Lemma 2, gives the relation G_SMoM = BG. Importantly, the coefficient matrix B must be invertible for θ̂_SMoM to admit the asymptotic linear representation. We next substitute the relation G_SMoM = BG into Equation (2). Then, we have

θ̂_SMoM − θ⋆ = −(BG)^{-1} (1/n) Σ_{i=1}^n B ( A_{θ⋆}(∇_x ∂_{θ_1} log q_{θ⋆} + u⋆_1)(X_i), ..., A_{θ⋆}(∇_x ∂_{θ_d} log q_{θ⋆} + u⋆_d)(X_i) )^⊤ + o_p(n^{-1/2}).

From the linearity of A_{θ⋆}, the invertibility of B, and Equation (1), we further obtain

θ̂_SMoM − θ⋆ = −G^{-1} (1/n) Σ_{i=1}^n ( A_{θ⋆}(∇_x ∂_{θ_1} log q_{θ⋆})(X_i), ..., A_{θ⋆}(∇_x ∂_{θ_d} log q_{θ⋆})(X_i) )^⊤ − G^{-1} (1/n) Σ_{i=1}^n ( A_{θ⋆} u⋆_1(X_i), ..., A_{θ⋆} u⋆_d(X_i) )^⊤ + o_p(n^{-1/2})
= θ̂_SM − θ⋆ − G^{-1} (1/n) Σ_{i=1}^n ( A_{θ⋆} u⋆_1(X_i), ..., A_{θ⋆} u⋆_d(X_i) )^⊤ + o_p(n^{-1/2}),

which completes the proof.

Example 1 (Exponential families). Consider an exponential family whose density is defined by

q_θ(x) = (1/Z(θ)) exp( t(x)^⊤ θ + b(x) ),   Z(θ) = ∫_{R^p} exp( t(x)^⊤ θ + b(x) ) dx,   (4)

where t : R^p → R^d is a sufficient statistic and b : R^p → R is a base measure. The SMoM estimator θ̂_SMoM based on (parameter-independent) test functions f_1, ..., f_d has the closed-form solution:

θ̂_SMoM = −G_{SMoM,n}^{-1} (1/n) Σ_{i=1}^n ( ∇_x · f_1(X_i) + ⟨f_1(X_i), ∇_x b(X_i)⟩, ..., ∇_x · f_d(X_i) + ⟨f_d(X_i), ∇_x b(X_i)⟩ )^⊤,

where G_{SMoM,n} ∈ R^{d×d} is the empirical inner product matrix whose (j,k)-th entry is defined by (G_{SMoM,n})_{jk} := n^{-1} Σ_{i=1}^n ⟨f_j(X_i), ∇_x t_k(X_i)⟩. Observe the relation

E_{θ⋆}[∇_x · f_j + ⟨f_j, ∇_x b⟩] = E_{θ⋆}[ A_{θ⋆} f_j − Σ_{k=1}^d ⟨f_j, ∇_x t_k⟩ θ⋆_k ] = −(G_SMoM θ⋆)_j.

Then, if G_SMoM is invertible, θ̂_SMoM is consistent and asymptotically normal under some integrability conditions. The linear independence of A_{θ⋆} f_1, ..., A_{θ⋆} f_d follows from Lemma 3.
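The closed-form SMoM solution for exponential families can be sketched numerically. The following is an example of our own (not from the paper): the Gaussian natural exponential family with t_1(x) = x, t_2(x) = −x²/2, b = 0, for which the natural parameters are θ_1 = μ/σ² and θ_2 = 1/σ², estimated with the arbitrarily chosen parameter-independent test functions f_1(x) = 1 and f_2(x) = x.

```python
import random

# Closed-form SMoM for the Gaussian natural exponential family
# q_θ(x) ∝ exp(θ1 x - θ2 x^2 / 2), i.e. t1(x) = x, t2(x) = -x^2/2, b = 0,
# so θ1 = μ/σ^2 and θ2 = 1/σ^2 for N(μ, σ^2).
# Test functions: f1(x) = 1, f2(x) = x (parameter-independent).

random.seed(2)
mu, sigma = 1.0, 2.0          # true θ1 = θ2 = 0.25
xs = [random.gauss(mu, sigma) for _ in range(200_000)]
n = len(xs)

# ∇_x t1 = 1, ∇_x t2 = -x;  ∇_x · f1 = 0, ∇_x · f2 = 1;  ⟨f_j, ∇_x b⟩ = 0.
# Empirical inner product matrix (G_SMoM,n)_{jk} = (1/n) Σ ⟨f_j, ∇_x t_k⟩(X_i):
g11 = 1.0
g12 = -sum(xs) / n                    # ⟨f1, ∇t2⟩ = -x
g21 = sum(xs) / n                     # ⟨f2, ∇t1⟩ = x
g22 = -sum(x * x for x in xs) / n     # ⟨f2, ∇t2⟩ = -x^2

# Right-hand side (1/n) Σ (∇_x · f_j + ⟨f_j, ∇_x b⟩) = (0, 1):
b1, b2 = 0.0, 1.0

# θ̂_SMoM = -G_{SMoM,n}^{-1} (b1, b2)^⊤ via the 2x2 inverse formula:
det = g11 * g22 - g12 * g21
theta1 = -(g22 * b1 - g12 * b2) / det
theta2 = -(-g21 * b1 + g11 * b2) / det

print(f"theta1 ≈ {theta1:.3f} (true {mu / sigma**2})")
print(f"theta2 ≈ {theta2:.3f} (true {1 / sigma**2})")
```

Both estimates are computed without ever touching Z(θ), in line with the closed form displayed above.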
The score matching estimator θ̂_SM is the SMoM estimator based on the test functions f_j = ∇_x t_j, and also has the closed form (Hyvärinen, 2007):

θ̂_SM = −G_n^{-1} (1/n) Σ_{i=1}^n ( Δ_x t_1(X_i) + ⟨∇_x t_1(X_i), ∇_x b(X_i)⟩, ..., Δ_x t_d(X_i) + ⟨∇_x t_d(X_i), ∇_x b(X_i)⟩ )^⊤,

where G_n ∈ R^{d×d} is the empirical inner product matrix whose (j,k)-th entry is defined by (G_n)_{jk} := n^{-1} Σ_{i=1}^n ⟨∇_x t_j(X_i), ∇_x t_k(X_i)⟩. Consider the test functions f_j = ∇_x t_j + u⋆_j, where u⋆_j is W-orthogonal to ∇_x t_k for k = 1, ..., d. The SMoM estimator based on these test functions can be decomposed as follows:

θ̂_SMoM − θ⋆ = −( G_n + (G_{SMoM,n} − G_n) )^{-1} (1/n) Σ_{i=1}^n ( A_{θ⋆}(∇_x t_1)(X_i) + A_{θ⋆} u⋆_1(X_i), ..., A_{θ⋆}(∇_x t_d)(X_i) + A_{θ⋆} u⋆_d(X_i) )^⊤
= θ̂_SM − θ⋆ − G^{-1} (1/n) Σ_{i=1}^n ( A_{θ⋆} u⋆_1(X_i), ..., A_{θ⋆} u⋆_d(X_i) )^⊤ + o_p(n^{-1/2}),

which agrees with Theorem 1.

3.2 Improving the asymptotic variance of the score matching estimator

We next construct an SMoM estimator improving upon the score matching estimator in the asymptotic variance. Fix θ_0 ∈ Θ and ṽ_α : R^p → R^p, α = 1, ..., K arbitrarily. Using the test functions ṽ_α, we define v_{θ_0,α} : R^p → R^p by the following orthogonalization procedure:

v_{θ_0,α} := ṽ_α − Σ_{j=1}^d ( F_{θ_0} G_{θ_0}^{-1} )_{αj} ∇_x ∂_{θ_j} log q_{θ_0},   (5)

where F_{θ_0} ∈ R^{K×d} and G_{θ_0} ∈ R^{d×d} are inner product matrices whose (α,j)-th entry and (j,k)-th entry are defined by

(F_{θ_0})_{αj} := E_{θ_0}[⟨ṽ_α, ∇_x ∂_{θ_j} log q_{θ_0}⟩],   (G_{θ_0})_{jk} := E_{θ_0}[⟨∇_x ∂_{θ_j} log q_{θ_0}, ∇_x ∂_{θ_k} log q_{θ_0}⟩],

respectively. Observe the relation

E_{θ_0}[⟨v_{θ_0,α}, ∇_x ∂_{θ_j} log q_{θ_0}⟩] = 0,   j = 1, ..., d.

Denote by θ̂[θ_0] the SMoM estimator based on the following test functions:

f_{θ,j} := ∇_x ∂_{θ_j} log q_θ − Σ_{α=1}^K ( S_{θ_0} T_{θ_0}^{-1} )_{jα} v_{θ_0,α},   j = 1, ..., d,   (6)

where S_{θ_0} ∈ R^{d×K} and T_{θ_0} ∈ R^{K×K} are inner product matrices whose (j,α)-th entry and (α,β)-th entry are defined by

(S_{θ_0})_{jα} := E_{θ_0}[ A_{θ_0}(∇_x ∂_{θ_j} log q_{θ_0}) A_{θ_0} v_{θ_0,α} ],   (T_{θ_0})_{αβ} := E_{θ_0}[ (A_{θ_0} v_{θ_0,α})(A_{θ_0} v_{θ_0,β}) ],

respectively. For the matrices F_{θ_0}, G_{θ_0}, S_{θ_0}, T_{θ_0}, we make the following assumption.

Assumption 1. F_{θ_0}, G_{θ_0}, S_{θ_0}, T_{θ_0} are continuous at θ_0 = θ⋆.

The following theorem tells us that θ̂[θ⋆] improves the asymptotic variance of θ̂_SM. Furthermore, such improvement remains even when an estimate θ̂_0 is plugged in for θ⋆. The proof is given in Appendix C.

Theorem 2. Under the regularity conditions, we have

AVar[θ̂_SM] − AVar[θ̂[θ⋆]] = G^{-1} S_{θ⋆} T_{θ⋆}^{-1} S_{θ⋆}^⊤ G^{-1} ⪰ 0.

The equality AVar[θ̂_SM] = AVar[θ̂[θ⋆]] holds if and only if S_{θ⋆} = O with a zero matrix O. Furthermore, under Assumption 1, for an estimator θ̂_0 satisfying θ̂_0 − θ⋆ = O_p(n^{-1/2}), we have

θ̂[θ̂_0] − θ⋆ = θ̂[θ⋆] − θ⋆ + o_p(n^{-1/2}).

Figure 1 illustrates the improvement procedure. Consider the SMoM estimator θ̂_SMoM based on the following test functions:

f_{θ,j} := ∇_x ∂_{θ_j} log q_θ + Σ_{α=1}^K C_{jα} v_{θ⋆,α},   j = 1, ..., d,   (7)

where C ∈ R^{d×K} is a coefficient matrix. By Theorem 1, the asymptotic variance AVar[θ̂_SMoM] can be decomposed as follows:

AVar[θ̂_SMoM] = AVar[θ̂_SM] + G^{-1} ( C T_{θ⋆} C^⊤ + S_{θ⋆} C^⊤ + C S_{θ⋆}^⊤ ) G^{-1},

and it is minimized at C = −S_{θ⋆} T_{θ⋆}^{-1}. In Figure 1, the score matching estimator, which is the SMoM estimator without W-orthogonal terms, is located at c_1 = c_2 = 0. When K = 1, i.e. c_2 = 0, since AVar[θ̂_SMoM] is a quadratic function of c_1, θ̂_SMoM can be moved in the descent direction of AVar[θ̂_SMoM] starting from θ̂_SM, resulting in AVar[θ̂_SMoM] being minimized at c_1 = c⋆_1.
When K = 2, AVar[θ̂_SM] is further improved by the additional direction v⋆_2, and AVar[θ̂_SMoM] is minimized at (c_1, c_2) = c⋆.

Figure 1: Schematic diagram of the improvement of the asymptotic variance AVar[θ̂_SM] of the score matching estimator via SMoM. For K = 1, the asymptotic variance AVar[θ̂[θ⋆]] is minimized at c⋆_1, and it is lower than AVar[θ̂_SM]. For K = 2, AVar[θ̂_SM] is further improved at c⋆. AVar[θ̂[θ⋆]] approaches the efficiency bound as K increases.

Let U_{θ_0} ∈ R^{d×d} be the inner product matrix whose (j,k)-th entry is defined by

(U_{θ_0})_{jk} := E_{θ_0}[ A_{θ_0}(∇_x ∂_{θ_j} log q_{θ_0}) A_{θ_0}(∇_x ∂_{θ_k} log q_{θ_0}) ].

The asymptotic variance of the score matching estimator is given by AVar[θ̂_SM] = G^{-1} U_{θ⋆} G^{-1}. As the estimate of the asymptotic relative efficiency AVar[θ̂[θ̂_0]]_{jj} / AVar[θ̂_SM]_{jj}, we employ

1 − ( G_{θ̂_0}^{-1} S_{θ̂_0} T_{θ̂_0}^{-1} S_{θ̂_0}^⊤ G_{θ̂_0}^{-1} )_{jj} / ( G_{θ̂_SM}^{-1} U_{θ̂_SM} G_{θ̂_SM}^{-1} )_{jj},   j = 1, ..., d,   (8)

and this quantity can be calculated using the Monte Carlo approximation.

Remark 3 (The choice of θ̂_0, ṽ_α, and K). A reasonable choice of θ̂_0 and ṽ_α are the score matching estimator θ̂_SM and neural networks with random weights, which yield flexible and easily differentiable functions. Also, theoretically, increasing K allows us to search for optimal W-orthogonal terms in a wider space. In practice, we may need the Monte Carlo approximation of the expectation E_{θ̂_0}[·], and it can lead to instability for a large K. In Section 5, we investigate this instability through numerical simulations.

4 Connections between SMoM and the Wasserstein geometry

In this section, we relate SMoM to the Wasserstein geometry.
This unexpected connection presents a further characterization of SMoM and score matching. We first introduce the Wasserstein score function, a central concept in the Wasserstein geometry.

Definition 2 (The Wasserstein score function (Li & Zhao, 2023)). The Wasserstein score function Φ_{θ,j} : R^p → R is the solution of the following partial differential equation:

A_θ(∇_x Φ_{θ,j}) = −∂_{θ_j} log q_θ,   j = 1, ..., d,

with E_θ[Φ_{θ,j}] = 0.

The Wasserstein score function induces a second-order approximation of the 2-Wasserstein distance, and plays a key role in the Wasserstein geometry (e.g., Li & Zhao, 2023; Nishimori & Matsuda, 2025). Also, W-orthogonality corresponds to orthogonality with respect to the Wasserstein covariance. Within the SMoM framework, the maximum likelihood estimator is the SMoM estimator based on the test functions f_{θ,j} = −∇_x Φ_{θ,j}. Remark that since the Wasserstein score function often does not have a closed form and depends on the normalizing constant, this SMoM estimator (MLE) is computationally infeasible.

Yet, regarding the MLE as the SMoM estimator from the (minus) Wasserstein score functions offers the following characterization of the score matching estimator: the score matching estimator cannot be improved if and only if the Fisher score functions span the same space as the Wasserstein score functions. In this situation, the score matching estimator coincides with the MLE. The proof is given in Appendix B.

Theorem 3. Assume the regularity conditions. Then, the following are equivalent:

1. the score matching estimator is asymptotically efficient;

2. if u⋆ is W-orthogonal to ∇_x ∂_{θ_j} log q_{θ⋆} for j = 1, ..., d, then it is also A_{θ⋆}-orthogonal to ∇_x ∂_{θ_j} log q_{θ⋆} for j = 1, ..., d;

3. there exists a matrix Λ ∈ R^{d×d} for which we have

∂_{θ_j} log q_{θ⋆} = Σ_{k=1}^d Λ_{jk} Φ_{θ⋆,k},   j = 1, ..., d.
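The defining PDE of Definition 2 can be verified pointwise when a closed form is available. The following deterministic spot check is our own (not from the paper) for the Gaussian scale family q_θ = N(0, 1/(2θ)) on R, whose Wasserstein score works out to Φ_θ(x) = −x²/(4θ) + 1/(8θ²):

```python
# Spot check of A_θ(∇_x Φ_θ) = -∂_θ log q_θ for q_θ(x) ∝ exp(-θ x^2),
# i.e. the scale family N(0, 1/(2θ)) on R, with Φ_θ(x) = -x^2/(4θ) + 1/(8θ^2).
# (The constant makes E_θ[Φ_θ] = 0, since E_θ[X^2] = 1/(2θ).)

theta = 0.7

def dlogq_dx(x):        # ∇_x log q_θ(x) = -2 θ x
    return -2.0 * theta * x

def grad_phi(x):        # ∇_x Φ_θ(x) = -x / (2θ)
    return -x / (2.0 * theta)

def A_grad_phi(x):      # A_θ g = g' + g · ∇_x log q_θ, applied to g = ∇_x Φ_θ
    return -1.0 / (2.0 * theta) + grad_phi(x) * dlogq_dx(x)

def dlogq_dtheta(x):    # ∂_θ log q_θ(x) = 1/(2θ) - x^2
    return 1.0 / (2.0 * theta) - x**2

for x in [-2.0, -0.5, 0.0, 1.3, 3.1]:
    lhs, rhs = A_grad_phi(x), -dlogq_dtheta(x)
    print(f"x = {x:+.1f}: A(∇Φ) = {lhs:+.4f}, -∂θ log q = {rhs:+.4f}")
    assert abs(lhs - rhs) < 1e-12
```

Here the identity holds exactly, consistent with the fact that for this family the Fisher and Wasserstein scores span the same one-dimensional space.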
Let us provide two examples in which the score matching estimator is asymptotically efficient.

Example 2 (Normal distribution). For the normal distribution N(μ, Σ), the Wasserstein score functions are given as follows (Amari & Matsuda, 2024):

Φ_{μ_j}(x) = x_j − μ_j,   j = 1, ..., p,
Φ_{Σ_{jk}}(x) = −(1/2) tr(S_{jk} Σ) + (1/2) (x − μ)^⊤ S_{jk} (x − μ),   1 ≤ j ≤ k ≤ p.

Here, S_{jk} ∈ R^{p×p} is the symmetric matrix defined by the unique solution of the Sylvester equation

S_{jk} Σ + Σ S_{jk} = E_{jj} (j = k),   S_{jk} Σ + Σ S_{jk} = E_{jk} + E_{kj} (j ≠ k),

where E_{jk} ∈ R^{p×p} is the matrix whose (j,k)-th entry is 1 and all other entries are 0. The Fisher score functions are given as follows:

∂_{μ_j} log q_θ(x) = Σ_{k=1}^p (Σ^{-1})_{jk} (x_k − μ_k),
∂_{Σ_{jk}} log q_θ(x) = −tr S_{jk} + (1/2) (x − μ)^⊤ (S_{jk} Σ^{-1} + Σ^{-1} S_{jk}) (x − μ).

By the existence and uniqueness of the solution of the Sylvester equation, there exist unique constants c_{lm} ∈ R such that S_{jk} Σ^{-1} + Σ^{-1} S_{jk} = Σ_{l≤m} c_{lm} S_{lm}. In addition, we have

Σ_{l≤m} c_{lm} tr(S_{lm} Σ) = tr( (S_{jk} Σ^{-1} + Σ^{-1} S_{jk}) Σ ) = 2 tr S_{jk}.

Thus, we obtain

∂_{μ_j} log q_θ = Σ_{k=1}^p (Σ^{-1})_{jk} Φ_{μ_k},   ∂_{Σ_{jk}} log q_θ = Σ_{l≤m} c_{lm} Φ_{Σ_{lm}},

which, together with Theorem 3, concludes that score matching is asymptotically efficient; in this example, score matching coincides with the MLE and thus is efficient (Hyvärinen, 2005).

Example 3 (Generalized gamma-type distribution). Consider the model on R whose density is defined by

q_θ(x) = ( θ^{β+1/2} / Γ(β + 1/2) ) x^{2β} e^{−x²θ},

where β ∈ N is a fixed shape parameter and θ > 0 is a scale parameter. Since the Fisher score function is given by ∂_θ log q_θ(x) = −x² + (2β + 1)/(2θ), we obtain A_θ(∇_x ∂_θ log q_θ) = −4θ ∂_θ log q_θ. Thus, the Wasserstein score function is given by Φ_θ(x) = −x²/(4θ) + (2β + 1)/(8θ²), which, together with Theorem 3, concludes that score matching is asymptotically efficient.
This gives a counterexample to the latter part of Theorem 13.5 in Amari (2016).

Remark 4 (Iterative subspace construction). Our Theorem 2 shows that score matching is improved upon by incorporating appropriate W-orthogonal elements. If these W-orthogonal elements correspond to the Wasserstein score functions, the resulting SMoM estimator attains the Fisher efficiency. Even if we cannot identify such elements, the asymptotic variance approaches the efficiency bound as we incorporate more orthogonal elements. As illustrated in Figure 1, this procedure is interpreted as the iterative orthonormal subspace expansion of test functions to cover the Wasserstein score functions.

To prove Theorem 3, we need two lemmas. The following lemma connects the two inner products related to W- and A_{θ⋆}-orthogonality. The proof is given in Appendix B.

Lemma 3. Under the regularity conditions, for f_θ : R^p → R^p, we have

E_{θ⋆}[⟨f_{θ⋆}, ∇_x ∂_{θ_j} log q_{θ⋆}⟩] = E_{θ⋆}[ (A_{θ⋆} f_{θ⋆})(A_{θ⋆}(∇_x Φ_{θ⋆,j})) ],   j = 1, ..., d.

In particular, for j = 1, ..., d, if the test function u⋆ is W-orthogonal to ∇_x ∂_{θ_j} log q_{θ⋆}, then it is A_{θ⋆}-orthogonal to ∇_x Φ_{θ⋆,j}.

The following lemma shows that the gap between AVar[θ̂_MLE] and AVar[θ̂_SM] is characterized by the W-orthogonal elements u⋆_k. The proof is given in Appendix B.

Lemma 4. Under the regularity conditions, we have

AVar[θ̂_SM] − AVar[θ̂_MLE] = G^{-1} E_{θ⋆}[ (A_{θ⋆} u⋆_1, ..., A_{θ⋆} u⋆_d)^⊤ (A_{θ⋆} u⋆_1, ..., A_{θ⋆} u⋆_d) ] G^{-1},

where u⋆_j is the W-orthogonal element of ∇_x Φ_{θ⋆,j} given by the W-orthogonal decomposition (3).

Lemma 4 tells us that if the A_{θ⋆}-orthogonal terms A_{θ⋆} u⋆_j are dominant in A_{θ⋆}(∇_x Φ_{θ⋆,j}), the asymptotic variance AVar[θ̂_SM]_{jj} is much worse than AVar[θ̂_MLE]_{jj}.
Conversely, if A_{θ⋆} u⋆_j is negligible compared to A_{θ⋆}(∇_x ∂_{θ_j} log q_{θ⋆}), we have AVar[θ̂_SM]_{jj} ≈ AVar[θ̂_MLE]_{jj}.

Example 4 (Generalized normal distribution). Consider the special case of the generalized normal distribution whose density is defined by

q_θ(x) = ( β θ^{1/(2β)} / Γ(1/(2β)) ) exp( −θ x^{2β} ),   (9)

where x ∈ R, β ∈ N is a fixed shape parameter, and θ > 0 is the scale parameter. If β = 1, the distribution is N(0, 1/(2θ)). Both the score matching estimator and the maximum likelihood estimator have closed-form solutions:

θ̂_MLE = n / ( 2β Σ_{i=1}^n X_i^{2β} ),   θ̂_SM = ( (2β − 1)/(2β) ) Σ_{i=1}^n X_i^{2β−2} / Σ_{i=1}^n X_i^{4β−2}.

They coincide only in the case β = 1. The SMoM estimator based on a test function f : R → R also has the closed form:

θ̂_SMoM = Σ_{i=1}^n (d/dx) f(X_i) / ( 2β Σ_{i=1}^n X_i^{2β−1} f(X_i) ).

The MLE corresponds to the case f(x) = x, and the score matching estimator corresponds to the case f(x) = −2β x^{2β−1}. The Wasserstein score function Φ_θ has the explicit form:

Φ_θ(x) = −(1/(4βθ)) x² + Γ(3/(2β)) / ( 4β θ^{1+1/β} Γ(1/(2β)) ).

Observe that ∇_x Φ_θ is orthogonally decomposed as follows:

∇_x Φ_θ = ( Γ(1/(2β)) / ( 4β²(2β − 1) θ^{1/β} Γ(1 − 1/(2β)) ) ) ( −2β x^{2β−1} + u_θ ),

where u_θ is defined by

u_θ(x) := −( 2β(2β − 1) θ^{−1+1/β} Γ(1 − 1/(2β)) / Γ(1/(2β)) ) x + 2β x^{2β−1}

and satisfies E_θ[⟨−2β X^{2β−1}, u_θ⟩] = 0. Combining this with Lemma 4 yields the asymptotic relative efficiency:

AVar[θ̂_MLE] / AVar[θ̂_SM] = ( (2β − 1) / ( 2(3β − 2) ) ) Γ(1 − 1/(2β))² / ( Γ(1 + 1/(2β)) Γ(2 − 3/(2β)) ) → 1/3   (β → ∞).   (10)

Figure 2 shows the behavior of (10) with respect to β. It tells us that the gap in the asymptotic variance between the score matching estimator and the MLE widens as β increases.

Figure 2: The asymptotic relative efficiency AVar[θ̂_MLE]/AVar[θ̂_SM] with respect to β (blue), along with its limits (gray).
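Formula (10) is easy to evaluate directly. The following sketch of our own computes the asymptotic relative efficiency as a function of β; it should return exactly 1 at β = 1 (where score matching coincides with the MLE) and tend to 1/3 as β → ∞, matching the limits shown in Figure 2.

```python
import math

# Asymptotic relative efficiency AVar[MLE]/AVar[SM] from Equation (10)
# for the generalized normal scale family q_θ(x) ∝ exp(-θ x^{2β}).

def are(beta):
    g = math.gamma
    return ((2 * beta - 1) / (2 * (3 * beta - 2))
            * g(1 - 1 / (2 * beta)) ** 2
            / (g(1 + 1 / (2 * beta)) * g(2 - 3 / (2 * beta))))

print(f"beta = 1:    ARE = {are(1.0):.6f}")     # 1: SM coincides with MLE
print(f"beta = 2:    ARE = {are(2.0):.6f}")
print(f"beta = 1000: ARE = {are(1000.0):.6f}")  # near the limit 1/3
```

For β = 2 the value is roughly 0.685, in line with the MLE-to-SM MSE ratios reported for large n in Table 1.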
5 Numerical experiments

In this section, we provide numerical experiments illustrating the SMoM estimator constructed in Theorem 2; see also Appendix D for additional experiments. We focus on estimating the parameter $\theta$. The estimates $\hat\theta_{\mathrm{SM}}$, $\hat\theta[\theta^\star]$, and $\hat\theta[\hat\theta_{\mathrm{SM}}]$ are calculated using an i.i.d. sample of size $n$ drawn from the distribution with parameter $\theta^\star$. This procedure is repeated 1000 times, and the MSE of each estimate is calculated. The estimates of the asymptotic relative efficiency given by (8) or (21) are also calculated. The test functions $\tilde v_1, \dots, \tilde v_K$ are constructed using neural networks. Specifically, $\tilde v_\alpha$ is implemented as a neural network with five hidden layers, each consisting of three nodes, and $p$-dimensional input and output layers. We employ $\tanh$ as the activation function, and all parameters are randomly initialized with values drawn from $N(0, 1)$. Since the estimates $\hat\theta[\theta^\star]$ and $\hat\theta[\hat\theta_{\mathrm{SM}}]$ depend on the choice of $\tilde v_1, \dots, \tilde v_K$, we evaluate the performance over 10 different pairs of initializations. The expectations are approximated via Monte Carlo integration with 1000 samples.

5.1 Generalized normal distribution

We focus on estimating the scale parameter $\theta$ of the generalized normal distribution whose density is given by (9). Here, $\theta^\star$ is set to $\theta^\star = (\Gamma(3/(2\beta)) / \Gamma(1/(2\beta)))^{\beta}$, which ensures that the distribution has unit variance, and $\beta$ is set to 2. The sample size $n$ varies in $\{10, 100, 1000\}$, and the number of $W$-orthogonal elements $K$ varies in $\{1, 2, 4, 8\}$. We also calculate $\hat\theta_{\mathrm{MLE}}$ as a benchmark. Table 1 shows that both $\hat\theta[\theta^\star]$ and $\hat\theta[\hat\theta_{\mathrm{SM}}]$ typically have lower variance than the score matching estimator, whereas this improvement does not hold for small $K$ or small $n$. Both are comparable to the MLE under some choices of $\tilde v_\alpha$.
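As a concrete sketch of this construction (assuming NumPy; the architecture follows the description above, while the function name and interface are ours):

```python
import numpy as np

def random_mlp(p, n_hidden=5, width=3, rng=None):
    """A randomly initialized test function v~ : R^p -> R^p with
    n_hidden tanh layers of `width` nodes and N(0, 1)-distributed weights."""
    if rng is None:
        rng = np.random.default_rng()
    sizes = [p] + [width] * n_hidden + [p]
    params = [(rng.standard_normal((m, k)), rng.standard_normal(m))
              for k, m in zip(sizes[:-1], sizes[1:])]
    def v(x):
        h = np.asarray(x, dtype=float)
        for i, (W, b) in enumerate(params):
            h = W @ h + b
            if i < len(params) - 1:  # linear output layer, tanh elsewhere
                h = np.tanh(h)
        return h
    return v

v1 = random_mlp(p=1, rng=np.random.default_rng(0))
out = v1(np.array([0.5]))  # one test-function evaluation for p = 1
```

In the experiments, $K$ such networks give $\tilde v_1, \dots, \tilde v_K$, and the random initialization is what varies across the "pairs of $\tilde v_\alpha$" reported in the tables.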
Figure 4 shows that the test functions of $\hat\theta[\theta^\star]$ and $\hat\theta[\hat\theta_{\mathrm{SM}}]$ are almost the same as that of the score matching estimator with $K = 1$, and close to that of the MLE with $K = 8$.

Table 1: MSE ratio for $\hat\theta[\hat\theta_{\mathrm{SM}}]$, $\hat\theta[\theta^\star]$, and $\hat\theta_{\mathrm{MLE}}$ relative to $\hat\theta_{\mathrm{SM}}$. For $\hat\theta[\hat\theta_{\mathrm{SM}}]$ and $\hat\theta[\theta^\star]$, the results are summarized as median (min, max) over different pairs of $\tilde v_\alpha$. A value smaller than 1 indicates better performance. For each sample size $n$, the median corresponding to the best-performing $K$ is underlined. Values exceeding 100 are omitted and represented by a hyphen (-).

|          |       | $\hat\theta[\hat\theta_{\mathrm{SM}}]$ | $\hat\theta[\theta^\star]$ | $\hat\theta_{\mathrm{MLE}}$ |
|----------|-------|----------------------|----------------------|-------|
| $n=10$   | $K=1$ | 1.015 (0.835, 1.347) | 2.764 (0.683, -)     | 0.536 |
|          | $K=2$ | 0.858 (0.635, 1.075) | 1.355 (0.539, 5.118) |       |
|          | $K=4$ | 0.646 (0.408, 0.964) | 0.635 (0.347, 2.688) |       |
|          | $K=8$ | 0.740 (0.605, 0.862) | 0.565 (0.447, 1.022) |       |
| $n=100$  | $K=1$ | 1.020 (0.944, 1.054) | 1.022 (0.946, 1.083) | 0.660 |
|          | $K=2$ | 1.021 (0.735, 1.065) | 1.015 (0.706, 1.121) |       |
|          | $K=4$ | 0.748 (0.661, 0.851) | 0.742 (0.660, 0.825) |       |
|          | $K=8$ | 0.791 (0.695, 1.156) | 0.736 (0.689, 0.772) |       |
| $n=1000$ | $K=1$ | 1.031 (0.984, 1.042) | 1.006 (0.968, 1.041) | 0.685 |
|          | $K=2$ | 0.902 (0.665, 1.050) | 0.899 (0.702, 1.061) |       |
|          | $K=4$ | 0.750 (0.700, 1.019) | 0.750 (0.694, 1.023) |       |
|          | $K=8$ | 0.831 (0.698, 2.758) | 0.829 (0.692, 2.953) |       |

Figure 3: MSE ratio for $\hat\theta[\hat\theta_{\mathrm{SM}}]$ relative to $\hat\theta_{\mathrm{SM}}$ versus the geometric mean of the estimates given by (8). Points of the same color correspond to different pairs of $\tilde v_\alpha$. The horizontal and vertical lines (gray) represent the asymptotic relative efficiency of the MLE calculated by (10). The horizontal axis is truncated at 2. Values less than 1 on the horizontal axis indicate that the corresponding SMoM estimator improves upon the variance of the score matching estimator.
Points near the diagonal indicate that the estimate of the asymptotic relative efficiency is reliable.

Figure 4: Test functions for $\hat\theta_{\mathrm{SM}}$ (blue), $\hat\theta_{\mathrm{MLE}}$ (orange), and $\hat\theta[\hat\theta_{\mathrm{SM}}]$ (gray). For $\hat\theta[\hat\theta_{\mathrm{SM}}]$, the mean values over iterations are plotted, where each line corresponds to a different pair of $\tilde v_\alpha$. A test function close to that of the MLE implies that the corresponding SMoM estimator is also close to the MLE.

5.2 Polynomially tilted pairwise interaction model

Consider the $(p-1)$-dimensional unit sphere $S^{p-1} := \{x \in \mathbb{R}^p \mid \|x\| = 1\}$, and let $S^{p-1}_+$ be its non-negative orthant, i.e., $S^{p-1}_+ := S^{p-1} \cap [0, \infty)^p$. The polynomially tilted pairwise interaction (PPI) model (Scealy & Wood, 2023) has the following density:
\[
q_\theta(x) = \frac{1}{Z(A, \mu, \beta)} \prod_{j=1}^p x_j^{1 + 2\beta_j} \exp\bigl( (x^2)^\top A x^2 + \mu^\top x^2 \bigr), \qquad x \in S^{p-1}_+,
\]
where $\theta = (A, \mu)$, $A \in \mathbb{R}^{p \times p}$ is a symmetric matrix, $\mu \in \mathbb{R}^p$, $\beta \in (-1, \infty)^p$ is a fixed shape parameter, and $x^2$ denotes the entrywise square of $x$. For identifiability, we fix $A_{pj} = A_{jp} = 0$ for $j = 1, \dots, p$ and $\mu_p = 0$. This density is derived via the square-root transformation from the unit simplex to $S^{p-1}_+$; see Scealy & Wood (2023) for details. For the weighted score matching estimator $\hat\theta_{\mathrm{wSM}}$ and the corresponding SMoM estimators, see Appendix C.

The sample size $n$ is set to 100. We set $p = 3$, $A^\star = \mathrm{diag}(1, 1, 0)$, $\mu^\star = (0, 0, 0)^\top$, and $\beta = (-0.5, -0.5, -0.5)^\top$. We use the weight function $w(x) = \prod_{j=1}^p x_j$. The number of $W$-orthogonal elements $K$ varies in $\{3, 6, 12, 24\}$. In the calculation of $\tilde v_\alpha$, to ensure that the output is a valid vector field on $S^{p-1}_+$, the output vector is projected onto the tangent space $T_x S^{p-1}_+$ via the orthogonal projection $P_x = I_p - x x^\top$.

Table 2 shows that both $\hat\theta[\theta^\star]$ and $\hat\theta[\hat\theta_{\mathrm{wSM}}]$ improve upon the variance of the score matching estimator for $K \le 12$.
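Stepping back to the construction of $\tilde v_\alpha$ above: the projection $P_x = I_p - x x^\top$ can be sketched as follows (assuming NumPy; function names are ours). The projected output is tangent by construction, since $\langle x, P_x v \rangle = \langle x, v \rangle (1 - \|x\|^2) = 0$ for unit $x$:

```python
import numpy as np

def project_to_tangent(x, v):
    """Apply P_x = I - x x^T: project v onto the tangent space
    at a unit vector x on the sphere."""
    return v - np.dot(x, v) * x

rng = np.random.default_rng(0)
x = np.abs(rng.standard_normal(3))
x /= np.linalg.norm(x)            # a point on the nonnegative orthant of S^2
raw = rng.standard_normal(3)      # e.g., a raw neural-network output
tangent = project_to_tangent(x, raw)
# <x, tangent> = 0 up to floating-point error
```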
$\hat\theta[\hat\theta_{\mathrm{wSM}}]$ typically shows better performance than $\hat\theta[\theta^\star]$. While the efficiency initially improves as $K$ increases, it begins to deteriorate at $K = 24$, potentially due to numerical instability. Figure 5 shows that the estimate of the asymptotic relative efficiency is typically reliable for $K \le 12$. These results confirm that our theory works well in finite-sample settings, although an appropriate $K$ must be chosen.

Table 2: MSE ratio of the PPI model for $\hat\theta[\hat\theta_{\mathrm{wSM}}]$ and $\hat\theta[\theta^\star]$ relative to $\hat\theta_{\mathrm{wSM}}$. The results are reported as median (min, max) across different pairs of $\tilde v_\alpha$. A value smaller than 1 indicates better performance. The best-performing $K$ is underlined. Values exceeding 100 are omitted and represented by a hyphen (-).

|        |          | $\hat\theta[\hat\theta_{\mathrm{wSM}}]$ | $\hat\theta[\theta^\star]$ |
|--------|----------|-----------------------|-----------------------|
| $K=3$  | $A_{11}$ | 0.919 (0.883, 0.967)  | 0.937 (0.912, 0.961)  |
|        | $A_{22}$ | 0.913 (0.840, 0.952)  | 0.929 (0.853, 0.967)  |
|        | $A_{12}$ | 0.909 (0.866, 0.972)  | 0.923 (0.893, 0.973)  |
|        | $\mu_1$  | 0.932 (0.909, 0.982)  | 0.948 (0.929, 0.985)  |
|        | $\mu_2$  | 0.919 (0.880, 0.959)  | 0.933 (0.892, 0.968)  |
| $K=6$  | $A_{11}$ | 0.810 (0.695, 0.913)  | 0.829 (0.704, 0.907)  |
|        | $A_{22}$ | 0.792 (0.680, 0.884)  | 0.812 (0.710, 0.908)  |
|        | $A_{12}$ | 0.830 (0.652, 0.888)  | 0.846 (0.657, 0.899)  |
|        | $\mu_1$  | 0.841 (0.640, 0.907)  | 0.850 (0.654, 0.905)  |
|        | $\mu_2$  | 0.788 (0.670, 0.911)  | 0.805 (0.679, 0.917)  |
| $K=12$ | $A_{11}$ | 0.694 (0.648, 0.765)  | 0.732 (0.658, 0.878)  |
|        | $A_{22}$ | 0.664 (0.561, 0.806)  | 0.683 (0.595, 0.851)  |
|        | $A_{12}$ | 0.702 (0.622, 0.825)  | 0.709 (0.624, 0.882)  |
|        | $\mu_1$  | 0.706 (0.615, 0.774)  | 0.736 (0.613, 0.830)  |
|        | $\mu_2$  | 0.684 (0.559, 0.786)  | 0.694 (0.596, 0.836)  |
| $K=24$ | $A_{11}$ | 1.886 (0.638, -)      | 33.035 (0.636, -)     |
|        | $A_{22}$ | 2.054 (0.632, 66.790) | 81.787 (0.653, -)     |
|        | $A_{12}$ | 2.676 (0.613, -)      | 40.986 (0.618, -)     |
|        | $\mu_1$  | 1.985 (0.603, -)      | 24.774 (0.599, -)     |
|        | $\mu_2$  | 2.065 (0.622, -)      | 55.931 (0.624, -)     |

Figure 5: MSE ratio of the PPI model for $\hat\theta[\hat\theta_{\mathrm{wSM}}]$ relative to $\hat\theta_{\mathrm{wSM}}$ versus the geometric mean of the estimates given by (21). Points of the same color correspond to different pairs of $\tilde v_\alpha$. The horizontal axis is truncated at 2. Values less than 1 on the horizontal axis indicate that the corresponding SMoM estimator improves upon the variance of the score matching estimator. Points near the diagonal indicate that the estimate of the asymptotic relative efficiency is reliable.

6 Conclusion

In this paper, we studied the geometry of SMoM estimators, focusing on $W$-orthogonality and $\mathcal{A}_{\theta^\star}$-orthogonality. The canonical decomposition (Theorem 1) gives the asymptotic linear representation of SMoM estimators after centering the score matching estimator, and it implies that the asymptotic linear representation of an SMoM estimator is characterized by its $W$-orthogonal term. Using the fact that $W$-orthogonality does not imply $\mathcal{A}_{\theta^\star}$-orthogonality, we constructed an SMoM estimator that improves upon the asymptotic variance of the score matching estimator (Theorem 2). Through the numerical experiments, we confirmed that the improvement is effective in finite-sample settings under an appropriate choice of $K$.

The unexpected connection between the Wasserstein geometry and SMoM yields a further characterization of the asymptotic efficiency of the score matching estimator. We showed that the score matching estimator is asymptotically efficient if and only if the Fisher score functions span the same space as the Wasserstein score functions (Theorem 3). We also showed that the $W$-orthogonal term of the Wasserstein score function characterizes the gap in the asymptotic variance between the score matching estimator and the MLE (Lemma 4). The geometry of SMoM may provide further connections among three geometries: the geometry of score matching, the Fisher-Rao geometry, and the Wasserstein geometry.
7 Acknowledgement

The authors would like to thank Takeru Matsuda, Yoshikazu Terada, and Shotaro Yagishita for helpful comments. This work is supported by JSPS KAKENHI (21H05205, 23K11024) and MEXT (JPJ010217).

References

Amari, S. (2016). Information geometry and its applications. Springer.
Amari, S. & Matsuda, T. (2024). Information geometry of Wasserstein statistics on shapes and affine deformations. Information Geometry 7, 285-309.
Ay, N. (2025). Information geometry of the Otto metric. Information Geometry 8, 209-232.
Barp, A., Briol, F.-X., Duncan, A., Girolami, M. & Mackey, L. (2019). Minimum Stein discrepancy estimators. Advances in Neural Information Processing Systems 32.
Bickel, P. J., Klaassen, C., Ritov, Y. & Wellner, J. (1993). Efficient and adaptive estimation for semiparametric models, vol. 4. Springer.
Chikuse, Y. (2003). Statistics on Special Manifolds, vol. 174. Springer Science & Business Media.
Dawid, A. P. & Lauritzen, S. (2005). The geometry of decision theory. In The 2nd International Symposium on Information Geometry and Applications.
Ebner, B., Fischer, A., Gaunt, R. E., Picker, B. & Swan, Y. (2025). Stein's method of moments. Scandinavian Journal of Statistics 52, 1594-1624.
Eguchi, S. (2025). Robust inference using density-powered Stein operators. arXiv preprint arXiv:2511.03963.
Fischer, A., Gaunt, R. E. & Swan, Y. (2024). Stein's method of moments on the sphere. arXiv preprint arXiv:2407.02299.
Fischer, A., Gaunt, R. E. & Swan, Y. (2025). Stein's method of moments for truncated multivariate distributions. Electronic Journal of Statistics 19, 1784-1808.
Givens, J., Liu, S. & Reeve, H. (2025). Score matching with missing data. In Proceedings of the 42nd International Conference on Machine Learning (ICML 2025).
Gutmann, M. U. & Hirayama, J. (2011). Bregman divergence as general framework to estimate unnormalized statistical models. In Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 2011).
Gutmann, M. U. & Hyvärinen, A. (2010). Noise-contrastive estimation: A new estimation principle for unnormalized statistical models. In Proceedings of the 13th International Conference on Artificial Intelligence and Statistics (AISTATS 2010).
Hyvärinen, A. (2005). Estimation of non-normalized statistical models by score matching. Journal of Machine Learning Research 6, 695-709.
Hyvärinen, A. (2007). Some extensions of score matching. Computational Statistics & Data Analysis 51, 2499-2512.
Kanamori, T. & Fujisawa, H. (2015). Robust estimation under heavy contamination using unnormalized models. Biometrika 102, 559-572.
Karakida, R., Okada, M. & Amari, S. (2016). Adaptive natural gradient learning algorithms for unnormalized statistical models. In International Conference on Artificial Neural Networks. Springer.
Koehler, F., Heckett, A. & Risteski, A. (2022). Statistical efficiency of score matching: The view from isoperimetry. arXiv preprint arXiv:2210.00726.
Kume, A. & Walker, S. G. (2026). On Stein's method of moments and generalized score matching. arXiv preprint arXiv:2602.06482.
LeCun, Y., Chopra, S., Hadsell, R., Ranzato, M. & Huang, F. J. (2006). A tutorial on energy-based learning. Predicting structured data 1.
Li, W. & Zhao, J. (2023). Wasserstein information matrix. Information Geometry 6, 203-255.
Liu, S., Kanamori, T. & Williams, D. J. (2022). Estimating density models with truncation boundaries using score matching. Journal of Machine Learning Research 23, 1-38.
Mardia, K. V., Kent, J. T. & Laha, A. K. (2016). Score matching estimators for directional distributions. arXiv preprint arXiv:1604.08470.
Matsuda, T., Uehara, M. & Hyvärinen, A. (2021). Information criteria for non-normalized models. Journal of Machine Learning Research 22, 1-33.
Mijoule, G., Raič, M., Reinert, G. & Swan, Y. (2023). Stein's density method for multivariate continuous distributions. Electronic Journal of Probability 28, 1-40.
Nishimori, H. & Matsuda, T. (2025). On the attainment of the Wasserstein-Cramer-Rao lower bound. Information Geometry.
Otto, F. (2001). The geometry of dissipative evolution equations: the porous medium equation. Communications in Partial Differential Equations 26, 101-174.
Otto, F. & Villani, C. (2000). Generalization of an inequality by Talagrand and links with the logarithmic Sobolev inequality. Journal of Functional Analysis 173, 361-400.
Parry, M., Dawid, A. P. & Lauritzen, S. (2012). Proper local scoring rules. The Annals of Statistics 40, 561-592.
Scealy, J. L. & Wood, A. T. A. (2023). Score matching for compositional distributions. Journal of the American Statistical Association 118, 1811-1823.
Song, Y., Garg, S., Shi, J. & Ermon, S. (2020). Sliced score matching: A scalable approach to density and score estimation. In Proceedings of The 35th Uncertainty in Artificial Intelligence Conference (UAI 2020).
Song, Y. & Kingma, D. P. (2021). How to train your energy-based models. arXiv preprint arXiv:2101.03288.
Sriperumbudur, B., Fukumizu, K., Gretton, A., Hyvärinen, A. & Kumar, R. (2017). Density estimation in infinite dimensional exponential families. Journal of Machine Learning Research 18, 1-59.
Takasu, Y., Yano, K. & Komaki, F. (2018). Scoring rules for statistical models on spheres. Statistics & Probability Letters 138, 111-115.
Trillos, N. G., Jaffe, A. Q. & Sen, B. (2025). Wasserstein-Cramér-Rao theory of unbiased estimation. arXiv preprint arXiv:2511.07414.
Uehara, M., Kanamori, T., Takenouchi, T. & Matsuda, T. (2020a). A unified statistically efficient estimation framework for unnormalized models. In Proceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics, vol. 108 of Proceedings of Machine Learning Research. PMLR.
Uehara, M., Matsuda, T. & Kim, J. K. (2020b). Imputation estimators for unnormalized models with missing data. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS 2020).
Van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge University Press, 1st ed.
Vincent, P. (2011). A connection between score matching and denoising autoencoders. Neural Computation 23, 1661-1674.
Williams, D. J. & Liu, S. (2022). Score matching for truncated density estimation on a manifold. In Proceedings of Topological, Algebraic, and Geometric Learning Workshops 2022 (TAG-ML 2022).

Appendix A Regularity conditions

This appendix summarizes the regularity conditions used in this paper. We assume the following conditions for the model $\{q_\theta \mid \theta \in \Theta\}$ and test functions $f_\theta$:

A1. $\mathbb{E}_\theta[(\mathcal{A}_\theta f_\theta)^2] < \infty$ and $\mathbb{E}_\theta[\mathcal{A}_\theta f_\theta] = 0$.
A2. $\mathbb{E}_\theta[\mathcal{A}_\theta(\partial_{\theta_j} f_\theta)] = 0$, $j = 1, \dots, d$.
A3. $\mathbb{E}_\theta[|\langle f_\theta, \nabla_x \partial_{\theta_j} \log q_\theta \rangle|] < \infty$, $j = 1, \dots, d$.
A4. $\partial_{\theta_j} \mathbb{E}_\theta[\mathcal{A}_\theta f_\theta] = \int \partial_{\theta_j}(q_\theta\, \mathcal{A}_\theta f_\theta)\, dx$, $j = 1, \dots, d$.

The condition $\mathbb{E}_\theta[\mathcal{A}_\theta f_\theta] = 0$ is satisfied if the boundary condition $\lim_{\|x\| \to \infty} q_\theta(x) f_\theta(x) = 0$ holds. Condition A2 is used in the proofs of Lemma 2, the latter part of Theorem 2, and Lemma 3. Condition A4 is used in the proof of Lemma 3.

We also assume the following conditions for the Fisher score functions $\partial_{\theta_j} \log q_\theta$ and the Wasserstein score functions $\Phi_{\theta,j}$:

A5. $\mathbb{E}_\theta[\mathcal{A}_\theta(\Phi_{\theta,j} \nabla_x \partial_{\theta_k} \log q_\theta)] = 0$.
A6. $\mathbb{E}_\theta[\mathcal{A}_\theta(\partial_{\theta_j} \log q_\theta\, \nabla_x \Phi_{\theta,k})] = 0$.
A7. $\mathbb{E}_\theta[\mathcal{A}_\theta(\partial_{\theta_j} \log q_\theta\, \nabla_x \partial_{\theta_k} \log q_\theta)] = 0$.
A8. $\mathbb{E}_\theta[\mathcal{A}_\theta(\Phi_{\theta,j} \nabla_x \Phi_{\theta,k})] = 0$.

These conditions are used in the proof of Theorem 3.
For the SMoM estimator based on the test functions $f_{\theta,1}, \dots, f_{\theta,d}$, we assume the following conditions:

A9. $\mathcal{A}_{\theta^\star} f_{\theta^\star,1}, \dots, \mathcal{A}_{\theta^\star} f_{\theta^\star,d}$ are linearly independent.
A10. The matrix $G_{\mathrm{SMoM}} \in \mathbb{R}^{d \times d}$ whose $(j,k)$-th entry is defined by $(G_{\mathrm{SMoM}})_{jk} := \mathbb{E}_{\theta^\star}\bigl[\partial_{\theta_k} \mathcal{A}_\theta f_{\theta,j}\,\big|_{\theta=\theta^\star}\bigr]$ is invertible.
A11. $\hat\theta_{\mathrm{SMoM}}$ is consistent and asymptotically linear; that is, $\hat\theta_{\mathrm{SMoM}} - \theta^\star$ has the following representation:
\[
\hat\theta_{\mathrm{SMoM}} - \theta^\star = -G_{\mathrm{SMoM}}^{-1} \frac{1}{n} \sum_{i=1}^n
\begin{pmatrix} \mathcal{A}_{\theta^\star} f_{\theta^\star,1}(X_i) \\ \vdots \\ \mathcal{A}_{\theta^\star} f_{\theta^\star,d}(X_i) \end{pmatrix}
+ o_p(n^{-1/2}).
\]

Appendix B Proofs in Section 4

This appendix provides the proofs of the results in Section 4.

Proof of Lemma 3. By the Leibniz rule and the definition of the Wasserstein score function, we have
\[
0 = \partial_{\theta_j} \mathbb{E}_\theta[\mathcal{A}_\theta f_\theta]\big|_{\theta=\theta^\star}
= \mathbb{E}_{\theta^\star}\bigl[\partial_{\theta_j} \mathcal{A}_\theta f_\theta\big|_{\theta=\theta^\star}\bigr] + \mathbb{E}_{\theta^\star}\bigl[(\mathcal{A}_{\theta^\star} f_{\theta^\star})(\partial_{\theta_j} \log q_{\theta^\star})\bigr]
= \mathbb{E}_{\theta^\star}\bigl[\partial_{\theta_j} \mathcal{A}_\theta f_\theta\big|_{\theta=\theta^\star}\bigr] - \mathbb{E}_{\theta^\star}\bigl[(\mathcal{A}_{\theta^\star} f_{\theta^\star})\,\mathcal{A}_{\theta^\star}(\nabla_x \Phi_{\theta^\star,j})\bigr], \qquad j = 1, \dots, d.
\]
Combining this with the relation $\mathbb{E}_{\theta^\star}[\partial_{\theta_j} \mathcal{A}_\theta f_\theta|_{\theta=\theta^\star}] = \mathbb{E}_{\theta^\star}[\langle f_{\theta^\star}, \nabla_x \partial_{\theta_j} \log q_{\theta^\star} \rangle]$ from Lemma 2, we obtain
\[
\mathbb{E}_{\theta^\star}[\langle f_{\theta^\star}, \nabla_x \partial_{\theta_j} \log q_{\theta^\star} \rangle] = \mathbb{E}_{\theta^\star}[(\mathcal{A}_{\theta^\star} f_{\theta^\star})(\mathcal{A}_{\theta^\star} \nabla_x \Phi_{\theta^\star,j})],
\]
which completes the proof.

Proof of Lemma 4. Let $F \in \mathbb{R}^{d \times d}$ be the inner product matrix whose $(j,k)$-th entry is defined by $F_{jk} := \mathbb{E}_{\theta^\star}[\langle \nabla_x \Phi_{\theta^\star,j}, \nabla_x \partial_{\theta_k} \log q_{\theta^\star} \rangle]$, and let $u^\star_j$ be defined by
\[
u^\star_j := \sum_{k=1}^d (G F^{-1})_{jk} \nabla_x \Phi_{\theta^\star,k} - \nabla_x \partial_{\theta_j} \log q_{\theta^\star}. \tag{11}
\]
Note that this $u^\star_j$ is $W$-orthogonal to the test functions of score matching:
\[
\mathbb{E}_{\theta^\star}[\langle u^\star_j, \nabla_x \partial_{\theta_k} \log q_{\theta^\star} \rangle] = 0, \qquad k = 1, \dots, d.
\]
We will show the relation
\[
\mathrm{AVar}\bigl[\hat\theta_{\mathrm{SM}}\bigr] - \mathrm{AVar}\bigl[\hat\theta_{\mathrm{MLE}}\bigr] = G^{-1} T G^{-1}, \tag{12}
\]
where $T \in \mathbb{R}^{d \times d}$ is the inner product matrix whose $(j,k)$-th entry is defined by $T_{jk} := \mathbb{E}_{\theta^\star}[(\mathcal{A}_{\theta^\star} u^\star_j)(\mathcal{A}_{\theta^\star} u^\star_k)]$.

We first relate $F$ to the Fisher information matrix $I(\theta^\star)$.
Using $u^\star_j$, we decompose $\nabla_x \Phi_{\theta^\star,j}$ as follows:
\[
\nabla_x \Phi_{\theta^\star,j} = \sum_{k=1}^d (F G^{-1})_{jk} \bigl( \nabla_x \partial_{\theta_k} \log q_{\theta^\star} + u^\star_k \bigr).
\]
Observe the relation
\[
\partial_{\theta_k} \partial_{\theta_j} \log q_{\theta^\star} = -\partial_{\theta_k} \mathcal{A}_{\theta^\star}(\nabla_x \Phi_{\theta^\star,j})
= -\mathcal{A}_{\theta^\star}(\nabla_x \partial_{\theta_k} \Phi_{\theta^\star,j}) - \langle \nabla_x \Phi_{\theta^\star,j}, \nabla_x \partial_{\theta_k} \log q_{\theta^\star} \rangle.
\]
Together with $\mathbb{E}_{\theta^\star}[\mathcal{A}_{\theta^\star}(\nabla_x \partial_{\theta_k} \Phi_{\theta^\star,j})] = 0$ under the regularity conditions, taking the expectation of this yields
\[
I(\theta^\star) = F. \tag{13}
\]
We next show (12). Using Theorem 1, we can decompose the asymptotic variance $\mathrm{AVar}[\hat\theta_{\mathrm{MLE}}]$:
\[
\mathrm{AVar}\bigl[\hat\theta_{\mathrm{MLE}}\bigr] = \mathrm{AVar}\bigl[\hat\theta_{\mathrm{SM}}\bigr] + G^{-1}(T + S + S^\top) G^{-1},
\]
where $S \in \mathbb{R}^{d \times d}$ is the inner product matrix whose $(j,k)$-th entry is defined by $S_{jk} := \mathbb{E}_{\theta^\star}[\mathcal{A}_{\theta^\star} u^\star_j\, \mathcal{A}_{\theta^\star}(\nabla_x \partial_{\theta_k} \log q_{\theta^\star})]$. Note that we have $\mathbb{E}_{\theta^\star}[\mathcal{A}_{\theta^\star}(\nabla_x \Phi_{\theta^\star,j})\, \mathcal{A}_{\theta^\star}(\nabla_x \partial_{\theta_k} \log q_{\theta^\star})] = G_{jk}$ by Lemma 3. Note also that, substituting (11) into $S_{jk}$, we have
\[
S_{jk} = \sum_{l=1}^d (G F^{-1})_{jl}\, \mathbb{E}_{\theta^\star}[\mathcal{A}_{\theta^\star}(\nabla_x \Phi_{\theta^\star,l})\, \mathcal{A}_{\theta^\star}(\nabla_x \partial_{\theta_k} \log q_{\theta^\star})]
- \mathbb{E}_{\theta^\star}[\mathcal{A}_{\theta^\star}(\nabla_x \partial_{\theta_j} \log q_{\theta^\star})\, \mathcal{A}_{\theta^\star}(\nabla_x \partial_{\theta_k} \log q_{\theta^\star})]
= (G I(\theta^\star)^{-1} G)_{jk} - \mathbb{E}_{\theta^\star}[\mathcal{A}_{\theta^\star}(\nabla_x \partial_{\theta_j} \log q_{\theta^\star})\, \mathcal{A}_{\theta^\star}(\nabla_x \partial_{\theta_k} \log q_{\theta^\star})],
\]
where the identity (13) is also used. Thus, the asymptotic variance $\mathrm{AVar}[\hat\theta_{\mathrm{MLE}}]$ is rearranged as
\[
\mathrm{AVar}[\hat\theta_{\mathrm{MLE}}] = \mathrm{AVar}[\hat\theta_{\mathrm{SM}}] + G^{-1} T G^{-1} + 2 I(\theta^\star)^{-1} - 2\, \mathrm{AVar}[\hat\theta_{\mathrm{SM}}].
\]
Since $\mathrm{AVar}[\hat\theta_{\mathrm{MLE}}] = I(\theta^\star)^{-1}$, we finally get $\mathrm{AVar}[\hat\theta_{\mathrm{SM}}] - \mathrm{AVar}[\hat\theta_{\mathrm{MLE}}] = G^{-1} T G^{-1}$, which completes the proof.

For the proof of Theorem 3, we prepare the following lemma.

Lemma 5. For $h : \mathbb{R}^p \to \mathbb{R}$ satisfying $\mathbb{E}_{\theta^\star}[\mathcal{A}_{\theta^\star}(h \nabla_x h)] = 0$, we have $\mathcal{A}_{\theta^\star}(\nabla_x h) = 0$ if and only if the identity $\nabla_x h = 0$ holds with probability 1.

Proof. If $\nabla_x h = 0$, we have $\mathcal{A}_{\theta^\star}(\nabla_x h) = 0$ by the linearity of $\mathcal{A}_{\theta^\star}$. Suppose $\mathcal{A}_{\theta^\star}(\nabla_x h) = 0$. Observe the relation
\[
\mathcal{A}_{\theta^\star}(h \nabla_x h) = h\, \mathcal{A}_{\theta^\star}(\nabla_x h) + \|\nabla_x h\|^2 = \|\nabla_x h\|^2.
\]
Taking the expectation of this yields $\mathbb{E}_{\theta^\star}[\|\nabla_x h\|^2] = 0$, which implies $\nabla_x h = 0$ with probability 1.

Proof of Theorem 3. (1) ⇒ (3): Suppose the score matching estimator is asymptotically efficient: $\mathrm{AVar}[\hat\theta_{\mathrm{SM}}] = \mathrm{AVar}[\hat\theta_{\mathrm{MLE}}]$. By Lemma 4, this implies $\mathbb{E}_{\theta^\star}[(\mathcal{A}_{\theta^\star} u^\star_j)^2] = 0$, that is, $\mathcal{A}_{\theta^\star} u^\star_j = 0$ with probability 1, where $u^\star_j$ is defined by (11). Since there exists $h_j : \mathbb{R}^p \to \mathbb{R}$ such that the two identities $u^\star_j = \nabla_x h_j$ and $\mathbb{E}_{\theta^\star}[\mathcal{A}_{\theta^\star}(h_j \nabla_x h_j)] = 0$ hold under the regularity conditions, we have $u^\star_j = 0$ with probability 1 by Lemma 5. Substituting $u^\star_j = 0$ into (11), we get
\[
\nabla_x \partial_{\theta_j} \log q_{\theta^\star} = \sum_{k=1}^d (G F^{-1})_{jk} \nabla_x \Phi_{\theta^\star,k}.
\]
Thus, there exists a constant $c_j \in \mathbb{R}$ such that
\[
\partial_{\theta_j} \log q_{\theta^\star} = \sum_{k=1}^d (G F^{-1})_{jk} \Phi_{\theta^\star,k} + c_j.
\]
Since $\mathbb{E}_{\theta^\star}[\Phi_{\theta^\star,j}] = 0$ and $\mathbb{E}_{\theta^\star}[\partial_{\theta_j} \log q_{\theta^\star}] = 0$, taking the expectation of this gives $c_j = 0$, which proves (1) ⇒ (3).

(3) ⇒ (1): Suppose there exists a matrix $\Lambda \in \mathbb{R}^{d \times d}$ such that $\partial_{\theta_j} \log q_{\theta^\star} = \sum_{k=1}^d \Lambda_{jk} \Phi_{\theta^\star,k}$. Applying $h \mapsto \mathcal{A}_{\theta^\star}(\nabla_x h)$ to both sides, we have
\[
\mathcal{A}_{\theta^\star}(\nabla_x \partial_{\theta_j} \log q_{\theta^\star}) = \sum_{k=1}^d \Lambda_{jk}\, \mathcal{A}_{\theta^\star}(\nabla_x \Phi_{\theta^\star,k}) = -\sum_{k=1}^d \Lambda_{jk}\, \partial_{\theta_k} \log q_{\theta^\star},
\]
which implies that the estimating equation of the score matching estimator coincides with that of the MLE. Thus, the score matching estimator is asymptotically efficient, implying (3) ⇒ (1).

(3) ⇒ (2): Suppose there exists a matrix $\Lambda \in \mathbb{R}^{d \times d}$ such that $\partial_{\theta_j} \log q_{\theta^\star} = \sum_{k=1}^d \Lambda_{jk} \Phi_{\theta^\star,k}$. Fix an arbitrary $u^\star : \mathbb{R}^p \to \mathbb{R}^p$ that is $W$-orthogonal to $\nabla_x \partial_{\theta_j} \log q_{\theta^\star}$ for $j = 1, \dots, d$. Combining the assumption with Lemma 3, we have
\[
\mathbb{E}_{\theta^\star}[(\mathcal{A}_{\theta^\star} u^\star)\, \mathcal{A}_{\theta^\star}(\nabla_x \partial_{\theta_j} \log q_{\theta^\star})]
= \sum_{k=1}^d \Lambda_{jk}\, \mathbb{E}_{\theta^\star}[(\mathcal{A}_{\theta^\star} u^\star)\, \mathcal{A}_{\theta^\star}(\nabla_x \Phi_{\theta^\star,k})]
= \sum_{k=1}^d \Lambda_{jk}\, \mathbb{E}_{\theta^\star}[\langle u^\star, \nabla_x \partial_{\theta_k} \log q_{\theta^\star} \rangle] = 0,
\]
which shows that $u^\star$ is $\mathcal{A}_{\theta^\star}$-orthogonal to $\nabla_x \partial_{\theta_j} \log q_{\theta^\star}$ and completes (3) ⇒ (2).
(2) ⇒ (1): Suppose that any element $u^\star$ that is $W$-orthogonal to $\nabla_x \partial_{\theta_j} \log q_{\theta^\star}$ is also $\mathcal{A}_{\theta^\star}$-orthogonal to $\nabla_x \partial_{\theta_j} \log q_{\theta^\star}$ for $j = 1, \dots, d$. Let $u^\star_j$ be the $W$-orthogonal elements defined by (11). By Lemma 3, we have
\[
\mathbb{E}_{\theta^\star}[(\mathcal{A}_{\theta^\star} u^\star_j)\, \mathcal{A}_{\theta^\star}(\nabla_x \Phi_{\theta^\star,k})] = 0, \qquad k = 1, \dots, d.
\]
By Theorem 1, we can decompose the asymptotic variance of the MLE:
\[
\mathrm{AVar}\bigl[\hat\theta_{\mathrm{MLE}}\bigr] = \mathrm{AVar}\bigl[\hat\theta_{\mathrm{SM}}\bigr] + G^{-1}(T + S + S^\top) G^{-1},
\]
where $S \in \mathbb{R}^{d \times d}$ and $T \in \mathbb{R}^{d \times d}$ are the inner product matrices whose $(j,k)$-th entries are defined by
\[
S_{jk} := \mathbb{E}_{\theta^\star}[\mathcal{A}_{\theta^\star} u^\star_j\, \mathcal{A}_{\theta^\star}(\nabla_x \partial_{\theta_k} \log q_{\theta^\star})], \qquad
T_{jk} := \mathbb{E}_{\theta^\star}[(\mathcal{A}_{\theta^\star} u^\star_j)(\mathcal{A}_{\theta^\star} u^\star_k)],
\]
respectively. Since $S = O$, we have $\mathrm{AVar}[\hat\theta_{\mathrm{MLE}}] - \mathrm{AVar}[\hat\theta_{\mathrm{SM}}] = G^{-1} T G^{-1}$. However, we also have $\mathrm{AVar}[\hat\theta_{\mathrm{SM}}] - \mathrm{AVar}[\hat\theta_{\mathrm{MLE}}] = G^{-1} T G^{-1}$ by Lemma 4, so we finally obtain $\mathrm{AVar}[\hat\theta_{\mathrm{SM}}] = \mathrm{AVar}[\hat\theta_{\mathrm{MLE}}]$, which implies that the score matching estimator is asymptotically efficient and concludes (2) ⇒ (1).

Appendix C Main results on more general domains

In this appendix, we give an extension of our main results to Riemannian manifolds.

Appendix C.1 Preliminaries

We begin by preparing notation, reviewing a generalization of score matching (Williams & Liu, 2022), and introducing the Stein operator we employ in this appendix.

Let $M$ be an oriented and connected Riemannian manifold with corners, and let $dx$ be the volume form given by the Riemannian metric on $M$. Let $\partial M$ be the boundary of $M$, and let $N$ and $ds$ denote the unit outward normal vector field and the volume form on $\partial M$, respectively. The Riemannian metric on $M$ and the induced norm on the tangent space $T_x M$ are denoted by $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$, respectively. Let $C^\infty(M)$ be the space of smooth functions on $M$, and let $\mathfrak{X}^\infty(M)$ be the space of smooth vector fields on $M$.
The gradient operator on $M$ is denoted by $\nabla_M$, and the divergence of a vector field $f \in \mathfrak{X}^\infty(M)$ is denoted by $\nabla_M \cdot f$. Let the parameter space $\Theta \subset \mathbb{R}^d$ be an open set, and let $q_\theta$ be a probability density on $M$ with respect to $dx$ such that $q_\theta > 0$ almost everywhere. We assume that the model $\{q_\theta \mid \theta \in \Theta\}$ is identifiable; that is, $q_\theta = q_{\theta'}$ if and only if $\theta = \theta'$. We also assume that $q_\theta$ and the test functions $f_{\theta,j} \in \mathfrak{X}^\infty(M)$, $j = 1, \dots, d$, are smooth with respect to both $\theta$ and $x$. Let $X_1, \dots, X_n$ be i.i.d. random variables from $q_{\theta^\star}$.

Fix a weight function $w \in C^\infty(M)$ that is strictly positive almost everywhere. The case $w = 1$ and $M = \mathbb{R}^p$ corresponds to the main text. The weighted score matching estimator $\hat\theta_{\mathrm{wSM}}$ is given as a solution to the estimating equations
\[
\frac{1}{n} \sum_{i=1}^n \partial_{\theta_j} H_w(X_i, \theta) = 0, \qquad j = 1, \dots, d,
\]
where the weighted Hyvärinen score $H_w$ is defined by $H_w(x, \theta) := w(x) \|\nabla_M \log q_\theta(x)\|^2 / 2 + \nabla_M \cdot (w(x) \nabla_M \log q_\theta(x))$. Under the condition $\int_M \nabla_M \cdot (w\, q_{\theta^\star} \nabla_M \log q_\theta)\, dx = 0$, we have the relation
\[
\mathbb{E}_{\theta^\star}[H_w(X, \theta)] = \mathbb{E}_{\theta^\star}\bigl[ w \|\nabla_M \log q_{\theta^\star} - \nabla_M \log q_\theta\|^2 \bigr] / 2 + c,
\]
where $X \sim q_{\theta^\star}$ and $c$ is a constant independent of $\theta$. Thus, minimizing $\mathbb{E}_{\theta^\star}[H_w(X, \theta)]$ over $\Theta$ is equivalent to minimizing $\mathbb{E}_{\theta^\star}[w \|\nabla_M \log q_{\theta^\star} - \nabla_M \log q_\theta\|^2]$.

We employ the following weighted Stein operator:
\[
\mathcal{A}^w_\theta f := \frac{\nabla_M \cdot (q_\theta w f)}{q_\theta}, \qquad f \in \mathfrak{X}^\infty(M).
\]
For a test function $f$ satisfying the condition $\int_M \nabla_M \cdot (q_{\theta^\star} w f)\, dx = 0$, the identity $\mathbb{E}_{\theta^\star}[\mathcal{A}^w_{\theta^\star} f] = 0$ holds. The SMoM estimator based on the test functions $f_{\theta,1}, \dots, f_{\theta,d}$ is given as a solution to the estimating equations $n^{-1} \sum_{i=1}^n \mathcal{A}^w_\theta f_{\theta,j}(X_i) = 0$ for $j = 1, \dots, d$. The following lemma states that the weighted score matching estimator is an SMoM estimator.

Lemma 6 (Lemma 3.2 of Kume & Walker (2026)).
The SMoM estimator based on the test functions $f_{\theta,j} := \nabla_M \partial_{\theta_j} \log q_\theta$, $j = 1, \dots, d$, is the weighted score matching estimator.

Proof. Since $f_{\theta,j} = \nabla_M \partial_{\theta_j} \log q_\theta$ is a solution of the equation $\mathcal{A}^w_\theta f_{\theta,j}(x) = \partial_{\theta_j} H_w(x, \theta)$, the estimating equations of weighted score matching coincide with those of SMoM based on these test functions.

Remark 5 (The weight function $w$). The weight function $w$ assists in satisfying the boundary condition. For $M = \mathbb{R}^p$, the condition $\int_M \nabla_M \cdot (w\, q_{\theta^\star} f)\, dx = 0$ is satisfied for $f \in \mathfrak{X}^\infty(M)$ if we have $\lim_{\|x\| \to \infty} w(x) q_{\theta^\star}(x) f(x) = 0$. If $M$ is compact and $w|_{\partial M} = 0$, by Stokes' theorem, we have
\[
\int_M \nabla_M \cdot (w\, q_{\theta^\star} f)\, dx = \int_{\partial M} w\, q_{\theta^\star} \langle f, N \rangle\, ds,
\]
which automatically vanishes.

Appendix C.2 Regularity conditions for the generalization

We assume the following conditions for the model $\{q_\theta \mid \theta \in \Theta\}$ and test functions $f_\theta$:

C1. $\mathbb{E}_\theta[(\mathcal{A}^w_\theta f_\theta)^2] < \infty$ and $\mathbb{E}_\theta[\mathcal{A}^w_\theta f_\theta] = 0$.
C2. $\mathbb{E}_\theta[\mathcal{A}^w_\theta(\partial_{\theta_j} f_\theta)] = 0$, $j = 1, \dots, d$.
C3. $\mathbb{E}_\theta[|w \langle f_\theta, \nabla_M \partial_{\theta_j} \log q_\theta \rangle|] < \infty$, $j = 1, \dots, d$.

For the SMoM estimator based on the test functions $f_{\theta,1}, \dots, f_{\theta,d}$, we assume the following conditions:

C4. $\mathcal{A}^w_{\theta^\star} f_{\theta^\star,1}, \dots, \mathcal{A}^w_{\theta^\star} f_{\theta^\star,d}$ are linearly independent.
C5. The matrix $G_{\mathrm{SMoM}} \in \mathbb{R}^{d \times d}$ whose $(j,k)$-th entry is defined by $(G_{\mathrm{SMoM}})_{jk} := \mathbb{E}_{\theta^\star}\bigl[\partial_{\theta_k} \mathcal{A}^w_\theta f_{\theta,j}\,\big|_{\theta=\theta^\star}\bigr]$ is invertible.
C6. $\hat\theta_{\mathrm{SMoM}}$ is consistent and asymptotically linear; that is, $\hat\theta_{\mathrm{SMoM}} - \theta^\star$ has the following representation:
\[
\hat\theta_{\mathrm{SMoM}} - \theta^\star = -G_{\mathrm{SMoM}}^{-1} \frac{1}{n} \sum_{i=1}^n
\begin{pmatrix} \mathcal{A}^w_{\theta^\star} f_{\theta^\star,1}(X_i) \\ \vdots \\ \mathcal{A}^w_{\theta^\star} f_{\theta^\star,d}(X_i) \end{pmatrix}
+ o_p(n^{-1/2}). \tag{14}
\]

Appendix C.3 Generalized canonical decomposition of SMoM estimators

We now present extensions of our main results.
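Before turning to the generalized results, here is a small Monte Carlo sanity check of the identity $\mathbb{E}_{\theta^\star}[\mathcal{A}^w_{\theta^\star} f] = 0$ for the weighted Stein operator of Appendix C.1 (a sketch assuming NumPy, with $M = \mathbb{R}$, $w \equiv 1$, a standard normal model, and an illustrative test function $f = \sin$; all names are ours):

```python
import numpy as np

def stein_op_1d(x, f, df, dlogq, w=lambda t: 1.0, dw=lambda t: 0.0):
    """Weighted Stein operator on M = R:
    A_w f = div(q w f) / q = dw*f + w*df + w*f*dlogq."""
    return dw(x) * f(x) + w(x) * df(x) + w(x) * f(x) * dlogq(x)

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)            # q = N(0, 1), so dlogq(x) = -x
vals = stein_op_1d(x, f=np.sin, df=np.cos, dlogq=lambda t: -t)
print(np.mean(vals))  # close to 0 by the Stein identity
```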
Under the regularity conditions and Lemma 6, the weighted score matching estimator $\hat\theta_{\mathrm{wSM}}$ has the following asymptotic linear representation:
\[
\hat\theta_{\mathrm{wSM}} - \theta^\star = -G^{-1} \frac{1}{n} \sum_{i=1}^n
\begin{pmatrix} \mathcal{A}^w_{\theta^\star}(\nabla_M \partial_{\theta_1} \log q_{\theta^\star})(X_i) \\ \vdots \\ \mathcal{A}^w_{\theta^\star}(\nabla_M \partial_{\theta_d} \log q_{\theta^\star})(X_i) \end{pmatrix}
+ o_p(n^{-1/2}), \tag{15}
\]
where $G \in \mathbb{R}^{d \times d}$ is the matrix whose $(j,k)$-th entry is defined by $G_{jk} := \mathbb{E}_{\theta^\star}\bigl[\partial_{\theta_k} \mathcal{A}^w_\theta(\nabla_M \partial_{\theta_j} \log q_\theta)\,\big|_{\theta=\theta^\star}\bigr]$.

We first introduce weighted $W$-orthogonality and $\mathcal{A}^w_{\theta^\star}$-orthogonality.

Definition 3. A test function $f$ is called weighted $W$-orthogonal to a test function $g$ if $\mathbb{E}_{\theta^\star}[w \langle g, f \rangle] = 0$. A test function $f$ is called $\mathcal{A}^w_{\theta^\star}$-orthogonal to a test function $g$ if $\mathbb{E}_{\theta^\star}[(\mathcal{A}^w_{\theta^\star} g)(\mathcal{A}^w_{\theta^\star} f)] = 0$.

The following theorem, a generalization of Theorem 1, gives the asymptotic linear representation of an SMoM estimator after centering the weighted score matching estimator. It is proven by combining the asymptotic linear representations (14)-(15) with Lemma 7; the proof follows the same steps as that of Theorem 1 and is thus omitted.

Theorem 4 (Generalized canonical decomposition of SMoM estimators). Under the regularity conditions, the asymptotic linear representation of $\hat\theta_{\mathrm{SMoM}}$ is decomposed as follows:
\[
\hat\theta_{\mathrm{SMoM}} - \theta^\star = \hat\theta_{\mathrm{wSM}} - \theta^\star - G^{-1} \frac{1}{n} \sum_{i=1}^n
\begin{pmatrix} \mathcal{A}^w_{\theta^\star} u^\star_1(X_i) \\ \vdots \\ \mathcal{A}^w_{\theta^\star} u^\star_d(X_i) \end{pmatrix}
+ o_p(n^{-1/2}),
\]
where each $u^\star_j \in \mathfrak{X}^\infty(M)$ is weighted $W$-orthogonal to the test functions corresponding to weighted score matching:
\[
\mathbb{E}_{\theta^\star}[w \langle u^\star_j, \nabla_M \partial_{\theta_k} \log q_{\theta^\star} \rangle] = 0, \qquad k = 1, \dots, d.
\]

The following lemma tells us that $G_{\mathrm{SMoM}}$ is an inner product matrix and, in particular, that $G$ is a Gram matrix.

Lemma 7. Under the regularity conditions, for the test functions $f_{\theta,1}, \dots, f_{\theta,d}$, we have
\[
\mathbb{E}_{\theta^\star}\bigl[\partial_{\theta_k} \mathcal{A}^w_\theta f_{\theta,j}\,\big|_{\theta=\theta^\star}\bigr]
= \mathbb{E}_{\theta^\star}[w \langle f_{\theta^\star,j}, \nabla_M \partial_{\theta_k} \log q_{\theta^\star} \rangle], \qquad j, k = 1, \dots, d.
\]
In particular, we have $G_{jk} = \mathbb{E}_{\theta^\star}[w \langle \nabla_M \partial_{\theta_j} \log q_{\theta^\star}, \nabla_M \partial_{\theta_k} \log q_{\theta^\star} \rangle]$.

Proof. Observe the relation $\partial_{\theta_k} \mathcal{A}^w_\theta f_{\theta,j} = \mathcal{A}^w_\theta(\partial_{\theta_k} f_{\theta,j}) + w \langle f_{\theta,j}, \nabla_M \partial_{\theta_k} \log q_\theta \rangle$. Together with $\mathbb{E}_{\theta^\star}[\mathcal{A}^w_{\theta^\star}(\partial_{\theta_k} f_{\theta^\star,j})] = 0$ under the regularity conditions, taking the expectation of this yields $\mathbb{E}_{\theta^\star}[\partial_{\theta_k} \mathcal{A}^w_\theta f_{\theta,j}\,|_{\theta=\theta^\star}] = \mathbb{E}_{\theta^\star}[w \langle f_{\theta^\star,j}, \nabla_M \partial_{\theta_k} \log q_{\theta^\star} \rangle]$, which completes the proof.

Example 5 (Exponential families on Riemannian manifolds). Consider an exponential family whose density is defined by
\[
q_\theta(x) = \frac{1}{Z(\theta)} \exp\bigl( t(x)^\top \theta + b(x) \bigr), \qquad
Z(\theta) = \int_M \exp\bigl( t(x)^\top \theta + b(x) \bigr)\, dx, \tag{16}
\]
where $t_j \in C^\infty(M)$, $j = 1, \dots, d$, are sufficient statistics and $b \in C^\infty(M)$ is a base measure. The SMoM estimator $\hat\theta_{\mathrm{SMoM}}$ based on (parameter-independent) test functions $f_1, \dots, f_d$ has the closed-form solution
\[
\hat\theta_{\mathrm{SMoM}} = -G_{\mathrm{SMoM},n}^{-1} \frac{1}{n} \sum_{i=1}^n
\begin{pmatrix}
\nabla_M \cdot (w f_1)(X_i) + w(X_i) \langle f_1(X_i), \nabla_M b(X_i) \rangle \\
\vdots \\
\nabla_M \cdot (w f_d)(X_i) + w(X_i) \langle f_d(X_i), \nabla_M b(X_i) \rangle
\end{pmatrix},
\]
where $G_{\mathrm{SMoM},n} \in \mathbb{R}^{d \times d}$ is the empirical inner product matrix whose $(j,k)$-th entry is defined by $(G_{\mathrm{SMoM},n})_{jk} := n^{-1} \sum_{i=1}^n w(X_i) \langle f_j(X_i), \nabla_M t_k(X_i) \rangle$. The score matching estimator $\hat\theta_{\mathrm{wSM}}$ is the SMoM estimator based on the test functions $f_j = \nabla_M t_j$, which also has a closed-form solution (Mardia et al., 2016; Scealy & Wood, 2023):
\[
\hat\theta_{\mathrm{wSM}} = -G_n^{-1} \frac{1}{n} \sum_{i=1}^n
\begin{pmatrix}
\nabla_M \cdot (w \nabla_M t_1)(X_i) + w(X_i) \langle \nabla_M t_1(X_i), \nabla_M b(X_i) \rangle \\
\vdots \\
\nabla_M \cdot (w \nabla_M t_d)(X_i) + w(X_i) \langle \nabla_M t_d(X_i), \nabla_M b(X_i) \rangle
\end{pmatrix},
\]
where $G_n \in \mathbb{R}^{d \times d}$ is the empirical inner product matrix whose $(j,k)$-th entry is defined by $(G_n)_{jk} := n^{-1} \sum_{i=1}^n w(X_i) \langle \nabla_M t_j(X_i), \nabla_M t_k(X_i) \rangle$.

Consider the test functions $f_j = \nabla_M t_j + u^\star_j$ for $j = 1, \dots, d$, where $u^\star_j$ is weighted $W$-orthogonal to $\nabla_M t_k$ for $k = 1, \dots, d$.
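The closed-form solutions in Example 5 can be sketched numerically in the simplest case $M = \mathbb{R}$, $w \equiv 1$, and a single parameter (a hypothetical helper of ours, assuming NumPy; with $t(x) = -x^2/2$ and $b = 0$, the model (16) is $N(0, 1/\theta)$):

```python
import numpy as np

def sm_expfam_1d(x, dt, ddt, db=lambda t: 0.0):
    """Closed-form (w = 1) score matching estimate for a one-parameter
    exponential family q_theta(x) proportional to exp(t(x)*theta + b(x))
    on R, using the test function f = t' as in Example 5."""
    g_n = np.mean(dt(x)**2)                  # empirical Gram "matrix" G_n
    num = np.mean(ddt(x) + dt(x) * db(x))    # div(f) + <f, grad b>
    return -num / g_n

rng = np.random.default_rng(0)
theta_true = 2.0
x = rng.standard_normal(100_000) / np.sqrt(theta_true)  # N(0, 1/theta)
theta_hat = sm_expfam_1d(x, dt=lambda t: -t, ddt=lambda t: -np.ones_like(t))
print(theta_hat)  # close to 2.0
```

For this Gaussian case the estimate reduces to $1 / \bigl(n^{-1}\sum_i X_i^2\bigr)$, the usual score matching estimate of the precision.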
The SMoM estimator based on these test functions can be decomposed as follows:
$$\hat\theta_{\mathrm{SMoM}} - \theta^\star = -\bigl(G_n + (G_{\mathrm{SMoM},n} - G_n)\bigr)^{-1}\frac{1}{n}\sum_{i=1}^n \begin{pmatrix} A^w_{\theta^\star}(\nabla_M t_1)(X_i) + A^w_{\theta^\star} u^\star_1(X_i) \\ \vdots \\ A^w_{\theta^\star}(\nabla_M t_d)(X_i) + A^w_{\theta^\star} u^\star_d(X_i)\end{pmatrix} = \hat\theta_{\mathrm{wSM}} - \theta^\star - G^{-1}\frac{1}{n}\sum_{i=1}^n \begin{pmatrix} A^w_{\theta^\star} u^\star_1(X_i) \\ \vdots \\ A^w_{\theta^\star} u^\star_d(X_i)\end{pmatrix} + o_p\bigl(n^{-1/2}\bigr).$$

Appendix C.4 Improving the asymptotic variance of the weighted score matching estimator

We next construct an SMoM estimator that improves upon the weighted score matching estimator in the asymptotic variance. Fix $\theta_0 \in \Theta$ and $\tilde v_\alpha \in \mathfrak{X}^\infty(M)$, $\alpha = 1,\dots,K$, arbitrarily. Using the test functions $\tilde v_\alpha$, we define $v_{\theta_0,\alpha}$ by the following orthogonalization procedure:
$$v_{\theta_0,\alpha} := \tilde v_\alpha - \sum_{j=1}^d \bigl(F_{\theta_0} G_{\theta_0}^{-1}\bigr)_{\alpha j}\, \nabla_M \partial_{\theta_j}\log q_{\theta_0}, \qquad (17)$$
where $F_{\theta_0}\in\mathbb{R}^{K\times d}$ and $G_{\theta_0}\in\mathbb{R}^{d\times d}$ are inner product matrices whose $(\alpha,j)$-th and $(j,k)$-th entries are defined by
$$(F_{\theta_0})_{\alpha j} := E_{\theta_0}\bigl[w\langle \tilde v_\alpha, \nabla_M\partial_{\theta_j}\log q_{\theta_0}\rangle\bigr], \qquad (G_{\theta_0})_{jk} := E_{\theta_0}\bigl[w\langle \nabla_M\partial_{\theta_j}\log q_{\theta_0}, \nabla_M\partial_{\theta_k}\log q_{\theta_0}\rangle\bigr],$$
respectively. Observe the relation
$$E_{\theta_0}\bigl[w\langle \nabla_M\partial_{\theta_j}\log q_{\theta_0}, v_{\theta_0,\alpha}\rangle\bigr] = 0, \quad j = 1,\dots,d.$$
Denote by $\hat\theta[\theta_0]$ the SMoM estimator based on the following test functions:
$$f_{\theta,j} := \nabla_M\partial_{\theta_j}\log q_\theta - \sum_{\alpha=1}^K \bigl(S_{\theta_0} T_{\theta_0}^{-1}\bigr)_{j\alpha}\, v_{\theta_0,\alpha}, \quad j = 1,\dots,d, \qquad (18)$$
where $S_{\theta_0}\in\mathbb{R}^{d\times K}$ and $T_{\theta_0}\in\mathbb{R}^{K\times K}$ are inner product matrices whose $(j,\alpha)$-th and $(\alpha,\beta)$-th entries are defined by
$$(S_{\theta_0})_{j\alpha} := E_{\theta_0}\bigl[A^w_{\theta_0}(\nabla_M\partial_{\theta_j}\log q_{\theta_0})\, A^w_{\theta_0} v_{\theta_0,\alpha}\bigr], \qquad (T_{\theta_0})_{\alpha\beta} := E_{\theta_0}\bigl[A^w_{\theta_0} v_{\theta_0,\alpha}\, A^w_{\theta_0} v_{\theta_0,\beta}\bigr].$$
For the matrices $F_{\theta_0}, G_{\theta_0}, S_{\theta_0}, T_{\theta_0}$, we make the following assumption.

Assumption 2. $F_{\theta_0}, G_{\theta_0}, S_{\theta_0}, T_{\theta_0}$ are continuous at $\theta_0 = \theta^\star$.

The following theorem, which generalizes Theorem 2, tells us that $\hat\theta[\theta^\star]$ improves the asymptotic variance of $\hat\theta_{\mathrm{wSM}}$.
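The orthogonalization (17) is an ordinary Gram-matrix projection, and at the level of empirical inner products it forces exact weighted $W$-orthogonality. A minimal sketch, in which the field evaluations are random placeholders rather than genuine score gradients:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p, d, K = 200, 3, 2, 4

# Hypothetical field evaluations at n sample points (illustrative only):
# S[j] plays the role of  nabla_M d/dtheta_j log q  at each point,
# V[a] plays the role of the trial field  v_tilde_alpha.
S = rng.normal(size=(d, n, p))
V = rng.normal(size=(K, n, p))
w = np.ones(n)  # weight function w(x) = 1

def inner(A, B):
    """Empirical weighted inner products: (1/n) sum_i w_i <A_j(x_i), B_k(x_i)>."""
    return np.mean(w[None, None, :] * np.einsum('inp,jnp->ijn', A, B), axis=2)

G = inner(S, S)              # d x d Gram matrix
F = inner(V, S)              # K x d cross inner products

# Orthogonalization (17): subtract the projection onto span{S_1, ..., S_d}
coef = F @ np.linalg.inv(G)  # K x d coefficient matrix F G^{-1}
V_orth = V - np.einsum('kj,jnp->knp', coef, S)

residual = inner(V_orth, S)  # vanishes: empirical weighted W-orthogonality
```

The residual vanishes identically (up to floating point) because $F - FG^{-1}G = 0$, mirroring the population identity displayed after (17).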
Furthermore, this improvement is retained when an estimator $\hat\theta_0$ is plugged in for $\theta^\star$.

Theorem 5. Under the regularity conditions, we have
$$\mathrm{AVar}\bigl[\hat\theta_{\mathrm{wSM}}\bigr] - \mathrm{AVar}\bigl[\hat\theta[\theta^\star]\bigr] = G^{-1} S_{\theta^\star} T_{\theta^\star}^{-1} S_{\theta^\star}^\top G^{-1} \succeq 0.$$
The equality $\mathrm{AVar}[\hat\theta_{\mathrm{wSM}}] = \mathrm{AVar}[\hat\theta[\theta^\star]]$ holds if and only if $S_{\theta^\star} = O$. Furthermore, under Assumption 1, for an estimator $\hat\theta_0$ satisfying $\hat\theta_0 - \theta^\star = O_p(n^{-1/2})$, we have $\hat\theta[\hat\theta_0] - \theta^\star = \hat\theta[\theta^\star] - \theta^\star + o_p(n^{-1/2})$.

Proof. We first show that $\mathrm{AVar}[\hat\theta_{\mathrm{wSM}}] - \mathrm{AVar}[\hat\theta[\theta^\star]] = G^{-1} S_{\theta^\star} T_{\theta^\star}^{-1} S_{\theta^\star}^\top G^{-1}$. Combining Theorem 4 and (18), we can decompose the asymptotic linear representation of $\hat\theta[\theta^\star]$:
$$\hat\theta[\theta^\star] - \theta^\star = \hat\theta_{\mathrm{wSM}} - \theta^\star + G^{-1} S_{\theta^\star} T_{\theta^\star}^{-1}\frac{1}{n}\sum_{i=1}^n \bigl(A^w_{\theta^\star} v_{\theta^\star,1}(X_i), \dots, A^w_{\theta^\star} v_{\theta^\star,K}(X_i)\bigr)^\top + o_p\bigl(n^{-1/2}\bigr).$$
Thus, we can also decompose the asymptotic variance of $\hat\theta[\theta^\star]$ as follows:
$$\mathrm{AVar}\bigl[\hat\theta[\theta^\star]\bigr] = \mathrm{AVar}\bigl[\hat\theta_{\mathrm{wSM}}\bigr] + G^{-1}\bigl(-S_{\theta^\star} T_{\theta^\star}^{-1} S_{\theta^\star}^\top\bigr) G^{-1}.$$
Since $T_{\theta^\star}$ and $G$ are positive definite, we have $\mathrm{AVar}[\hat\theta[\theta^\star]] = \mathrm{AVar}[\hat\theta_{\mathrm{wSM}}]$ if and only if $S_{\theta^\star} = O$.

Next, we show $\hat\theta[\hat\theta_0] - \theta^\star = \hat\theta[\theta^\star] - \theta^\star + o_p(n^{-1/2})$. Using Theorem 4, (17), and (18), we can decompose the asymptotic linear representation of $\hat\theta[\hat\theta_0]$:
$$\hat\theta[\hat\theta_0] - \theta^\star = \hat\theta_{\mathrm{wSM}} - \theta^\star + G^{-1} S_{\hat\theta_0} T_{\hat\theta_0}^{-1}\frac{1}{n}\sum_{i=1}^n \begin{pmatrix} A^w_{\theta^\star} v_{\hat\theta_0,1}(X_i) \\ \vdots \\ A^w_{\theta^\star} v_{\hat\theta_0,K}(X_i)\end{pmatrix} + o_p\bigl(n^{-1/2}\bigr)$$
$$= \hat\theta_{\mathrm{wSM}} - \theta^\star + G^{-1} S_{\hat\theta_0} T_{\hat\theta_0}^{-1}\frac{1}{n}\sum_{i=1}^n \begin{pmatrix} A^w_{\theta^\star} \tilde v_1(X_i) \\ \vdots \\ A^w_{\theta^\star} \tilde v_K(X_i)\end{pmatrix} - G^{-1} S_{\hat\theta_0} T_{\hat\theta_0}^{-1} F_{\hat\theta_0} G_{\hat\theta_0}^{-1}\frac{1}{n}\sum_{i=1}^n \begin{pmatrix} A^w_{\theta^\star}(\nabla_M\partial_{\theta_1}\log q_{\hat\theta_0})(X_i) \\ \vdots \\ A^w_{\theta^\star}(\nabla_M\partial_{\theta_d}\log q_{\hat\theta_0})(X_i)\end{pmatrix} + o_p\bigl(n^{-1/2}\bigr). \qquad (19)$$
Consider the Taylor expansion of $\partial_{\theta_j}\log q_{\hat\theta_0}$.
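The positive semidefiniteness asserted in Theorem 5 is purely algebraic: for any $S$ and any positive definite $T$ and $G$, the matrix $G^{-1} S T^{-1} S^\top G^{-1}$ is PSD. A quick numerical sanity check with random matrices (not the paper's fitted quantities):

```python
import numpy as np

rng = np.random.default_rng(3)
d, K = 4, 6

# Random matrices with the shapes and definiteness required in Theorem 5:
S = rng.normal(size=(d, K))
A = rng.normal(size=(K, K))
T = A @ A.T + np.eye(K)  # positive definite by construction
B = rng.normal(size=(d, d))
G = B @ B.T + np.eye(d)  # positive definite by construction

Ginv = np.linalg.inv(G)
Delta = Ginv @ S @ np.linalg.inv(T) @ S.T @ Ginv  # the variance-improvement matrix

eigs = np.linalg.eigvalsh((Delta + Delta.T) / 2)
# All eigenvalues are nonnegative: Delta is PSD, so the improved estimator
# never has larger asymptotic variance in the Loewner order
```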
Since the identity $E_{\theta^\star}[A^w_{\theta^\star}(\nabla_M\partial_{\theta_j}\partial_{\theta_k}\log q_{\theta^\star})] = 0$ holds under the regularity conditions, we have
$$\frac{1}{n}\sum_{i=1}^n A^w_{\theta^\star}(\nabla_M\partial_{\theta_j}\log q_{\hat\theta_0})(X_i) = \frac{1}{n}\sum_{i=1}^n \Bigl(A^w_{\theta^\star}(\nabla_M\partial_{\theta_j}\log q_{\theta^\star})(X_i) + \sum_{k=1}^d A^w_{\theta^\star}\bigl(\nabla_M\partial_{\theta_j}\partial_{\theta_k}\log q_{\theta^\star}\bigr)(X_i)\,(\hat\theta_0 - \theta^\star)_k\Bigr) + o_p\bigl(\|\hat\theta_0 - \theta^\star\|\bigr)$$
$$= \frac{1}{n}\sum_{i=1}^n A^w_{\theta^\star}(\nabla_M\partial_{\theta_j}\log q_{\theta^\star})(X_i) + o_p(1)\,O_p\bigl(n^{-1/2}\bigr) + o_p\bigl(n^{-1/2}\bigr). \qquad (20)$$
Substituting (20) into (19) and using the continuous mapping theorem, we finally obtain
$$\hat\theta[\hat\theta_0] - \theta^\star = \hat\theta_{\mathrm{wSM}} - \theta^\star + G^{-1} S_{\theta^\star} T_{\theta^\star}^{-1}\frac{1}{n}\sum_{i=1}^n \begin{pmatrix} A^w_{\theta^\star} v_{\theta^\star,1}(X_i) \\ \vdots \\ A^w_{\theta^\star} v_{\theta^\star,K}(X_i)\end{pmatrix} + o_p\bigl(n^{-1/2}\bigr) = \hat\theta[\theta^\star] - \theta^\star + o_p\bigl(n^{-1/2}\bigr),$$
which completes the proof.

Let $U_{\theta_0}\in\mathbb{R}^{d\times d}$ be the inner product matrix whose $(j,k)$-th entry is defined by
$$(U_{\theta_0})_{jk} := E_{\theta_0}\bigl[A^w_{\theta_0}(\nabla_M\partial_{\theta_j}\log q_{\theta_0})\, A^w_{\theta_0}(\nabla_M\partial_{\theta_k}\log q_{\theta_0})\bigr].$$
The asymptotic variance of the weighted score matching estimator is given by $\mathrm{AVar}[\hat\theta_{\mathrm{wSM}}] = G^{-1} U_{\theta^\star} G^{-1}$. As an estimate of the asymptotic relative efficiency $\mathrm{AVar}[\hat\theta[\hat\theta_0]]_{jj} / \mathrm{AVar}[\hat\theta_{\mathrm{wSM}}]_{jj}$, we employ
$$1 - \frac{\bigl(G_{\hat\theta_0}^{-1} S_{\hat\theta_0} T_{\hat\theta_0}^{-1} S_{\hat\theta_0}^\top G_{\hat\theta_0}^{-1}\bigr)_{jj}}{\bigl(G_{\hat\theta_{\mathrm{wSM}}}^{-1} U_{\hat\theta_{\mathrm{wSM}}} G_{\hat\theta_{\mathrm{wSM}}}^{-1}\bigr)_{jj}}, \quad j = 1,\dots,d. \qquad (21)$$

Appendix D Additional numerical experiments

In this appendix, we provide additional numerical experiments illustrating the SMoM estimator constructed in Theorem 5. We focus on estimating the parameters $\theta$. The estimates $\hat\theta_{\mathrm{wSM}}$, $\hat\theta[\theta^\star]$, and $\hat\theta[\hat\theta_{\mathrm{wSM}}]$ are calculated from an i.i.d. sample of size $n$ drawn from the distribution with parameter $\theta^\star$. This procedure is iterated 1000 times, and the MSE of each estimate is calculated. The estimates of asymptotic relative efficiency given by (21) are also calculated. The test functions $\tilde v_1,\dots,\tilde v_K$ are constructed using neural networks. Specifically, each $\tilde v_\alpha$ is implemented as a neural network with five hidden layers, each consisting of three nodes, and $p$-dimensional input and output layers.
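Once the plug-in matrices are available, the estimate (21) is a simple coordinatewise diagonal ratio. A sketch, in which the matrices are random placeholders with the required definiteness rather than fitted quantities:

```python
import numpy as np

rng = np.random.default_rng(4)
d, K = 3, 5

def random_pd(m):
    """A random positive definite m x m matrix (illustrative placeholder)."""
    A = rng.normal(size=(m, m))
    return A @ A.T + np.eye(m)

S = rng.normal(size=(d, K))          # plays the role of S evaluated at the plug-in
T, G, U = random_pd(K), random_pd(d), random_pd(d)

Ginv = np.linalg.inv(G)
improvement = Ginv @ S @ np.linalg.inv(T) @ S.T @ Ginv  # numerator matrix of (21)
avar_wSM = Ginv @ U @ Ginv                              # G^{-1} U G^{-1}

# Estimate (21): one value per coordinate j; since the numerator diagonal is
# nonnegative, every entry is at most 1, and values below 1 indicate that the
# constructed SMoM estimator reduces the asymptotic variance in that coordinate.
are = 1.0 - np.diag(improvement) / np.diag(avar_wSM)
```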
We employ tanh as the activation function, and all parameters are randomly initialized with values drawn from $N(0,1)$. To ensure the output is a valid vector field on $M$, the output vector is projected onto the tangent space $T_x M$ via the orthogonal projection $P_x$. Since the estimates $\hat\theta[\theta^\star]$ and $\hat\theta[\hat\theta_{\mathrm{wSM}}]$ depend on the choice of $\tilde v_1,\dots,\tilde v_K$, we evaluate the performance over 10 different initializations. The sample size $n$ is set to 100. The expectations are approximated via Monte Carlo integration with 1000 samples.

Appendix D.1 Matrix Bingham distribution

Consider the Stiefel manifold $V_{k,p} := \{X \in \mathbb{R}^{p\times k} \mid X^\top X = I_k\}$. The orthogonal projection is given by $P_X(Z) = Z - X(X^\top Z + Z^\top X)/2$. The matrix Bingham distribution (Chikuse, 2003) has the following density:
$$q_\theta(X) = \frac{1}{Z(A)}\exp\bigl(\mathrm{tr}(X^\top A X)\bigr),$$
where $\theta = A$ and $A \in \mathbb{R}^{p\times p}$ is a symmetric matrix. For identifiability, we fix $A_{pp} = 0$. We set $(p,k) = (3,2)$ and $A^\star = \mathrm{diag}(1,1,0)$. Since $V_{3,2}$ is a compact manifold without boundary, we use the weight function $w(X) = 1$. The number $K$ of orthogonal elements varies in $\{3, 6, 12, 24\}$.

Table 3 shows that neither $\hat\theta[\theta^\star]$ nor $\hat\theta[\hat\theta_{\mathrm{SM}}]$ improves the variance of the score matching estimator for any $K$. Figure 6 shows that the estimates of asymptotic relative efficiency concentrate near 1; that is, the score matching estimator has little room for improvement.

Table 3: MSE ratio of the matrix Bingham distribution for $\hat\theta[\hat\theta_{\mathrm{SM}}]$ and $\hat\theta[\theta^\star]$ relative to $\hat\theta_{\mathrm{SM}}$. The results are reported as median (min, max) across the different initializations of $\tilde v_\alpha$. A value smaller than 1 indicates better performance.

                  $\hat\theta[\hat\theta_{\mathrm{SM}}]$     $\hat\theta[\theta^\star]$
K = 3    A11      1.006 (0.990, 1.016)    1.006 (1.000, 1.016)
         A22      1.007 (0.997, 1.018)    1.007 (1.005, 1.022)
         A12      1.008 (1.000, 1.016)    1.008 (0.995, 1.015)
         A13      1.006 (1.000, 1.011)    1.004 (0.993, 1.015)
         A23      1.007 (1.000, 1.014)    1.009 (0.994, 1.017)
K = 6    A11      1.010 (1.001, 1.015)    1.012 (0.997, 1.017)
         A22      1.014 (0.995, 1.023)    1.011 (1.005, 1.018)
         A12      1.011 (0.996, 1.016)    1.011 (1.006, 1.023)
         A13      1.011 (0.995, 1.027)    1.011 (0.993, 1.018)
         A23      1.014 (0.999, 1.032)    1.017 (1.005, 1.029)
K = 12   A11      1.016 (1.003, 1.025)    1.023 (1.006, 1.040)
         A22      1.020 (1.005, 1.031)    1.020 (0.998, 1.026)
         A12      1.021 (1.005, 1.047)    1.026 (1.010, 1.041)
         A13      1.021 (1.002, 1.032)    1.012 (1.004, 1.038)
         A23      1.025 (1.014, 1.032)    1.033 (1.001, 1.045)
K = 24   A11      1.037 (1.011, 1.051)    1.046 (1.018, 1.065)
         A22      1.039 (1.024, 1.056)    1.035 (1.016, 1.055)
         A12      1.045 (1.023, 1.065)    1.038 (1.023, 1.050)
         A13      1.041 (1.032, 1.052)    1.048 (1.027, 1.095)
         A23      1.039 (1.026, 1.072)    1.051 (1.022, 1.060)

Figure 6: MSE ratio of the matrix Bingham distribution for $\hat\theta[\hat\theta_{\mathrm{SM}}]$ relative to $\hat\theta_{\mathrm{SM}}$ versus the geometric mean of the estimates given by (21). Values less than 1 on the horizontal axis indicate that the corresponding SMoM estimator improves the variance of the score matching estimator. Points near the diagonal indicate that the estimate of the asymptotic relative efficiency is reliable.
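The tangent-space projection $P_X(Z) = Z - X(X^\top Z + Z^\top X)/2$ used for the neural-network outputs above can be checked directly: tangent vectors at $X \in V_{k,p}$ satisfy $X^\top P + P^\top X = 0$. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
p, k = 3, 2

# A point on the Stiefel manifold V_{k,p}: QR of a random matrix gives
# orthonormal columns, so X satisfies X^T X = I_k.
X, _ = np.linalg.qr(rng.normal(size=(p, k)))

def proj_stiefel(X, Z):
    """Orthogonal projection of an ambient matrix Z onto T_X V_{k,p}:
    P_X(Z) = Z - X (X^T Z + Z^T X) / 2."""
    return Z - X @ (X.T @ Z + Z.T @ X) / 2.0

Z = rng.normal(size=(p, k))
P = proj_stiefel(X, Z)

# Tangency check: tangent vectors at X satisfy X^T P + P^T X = 0
sym_part = X.T @ P + P.T @ X
```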