Functional Estimation of Manifold-Valued Diffusion Processes


Authors: Jacob McErlean, Hau-Tieng Wu

Abstract. Nonstationary high-dimensional time series are increasingly encountered in biomedical research as measurement technologies advance. Owing to the homeostatic nature of physiological systems, such datasets are often located on, or can be well approximated by, a low-dimensional manifold. Modeling such datasets by manifold-valued Itô diffusion processes has been shown to provide valuable insights and to guide the design of algorithms for clinical applications. In this paper, we propose Nadaraya-Watson type nonparametric estimators for the drift vector field and diffusion matrix of the process from one trajectory. Assuming a time-homogeneous stochastic differential equation on a smooth complete manifold without boundary, we show that as the sampling interval and kernel bandwidth vanish with increasing trajectory length, recurrence of the process yields asymptotic consistency and normality of the drift and diffusion estimators, as well as of the associated occupation density. Analysis of the diffusion estimator further produces a tangent space estimator for dependent data, which has its own interest and is essential for drift estimation. Numerical experiments across a range of manifold configurations support the theoretical results.

1. Introduction

High-dimensional time series data have become increasingly pervasive across a range of quantitative disciplines, including but not exclusively economics, finance, geoscience, and medicine [11]. In such settings, multiple interrelated variables are recorded simultaneously over time, often at high sampling frequencies and over long horizons, resulting in complex, high-dimensional temporal datasets.
A central inferential objective is to characterize and quantify the underlying system dynamics, either for scientific understanding, real-time monitoring and control, or improved prediction. This requires the development of statistical methods capable of adequately capturing both temporal dependence and cross-sectional interactions in the data.

A motivating example from biomedicine is the repurposing of underutilized intraoperative signals, such as arterial blood pressure (ABP) [15, 55] and photoplethysmogram (PPG) [26], to predict clinical outcomes. By leveraging physiological knowledge, these signals can be transformed into high-dimensional time series and interpreted as sensors probing the dynamics of the cardiovascular system. In particular, each cardiac cycle in an ABP signal can be viewed as the system's response to a heartbeat stroke, analogous to the sound produced when a drum is struck. This perspective evokes the classical inverse problem of whether one can "hear the shape of a drum" [30]. However, the cardiovascular system is far more complex: it is adaptive rather than static, and its state evolves continuously over time. Consequently, variability across cardiac cycles [12, 37], rather than any single waveform, encodes critical information about physiological status. This observation motivates representing ABP as a sequence of functions, each corresponding to a cardiac cycle, thereby forming a high-dimensional time series that captures temporal evolution. Despite this variability, homeostatic regulation [41] ensures that these dynamics are not arbitrary but constrained by underlying physiological principles. These considerations naturally motivate a geometric perspective for modeling such data.
High-dimensional biomedical time series are typically nonstationary and arise from deterministic dynamics modulated by stochastic perturbations. The underlying dynamics often evolve on a low-dimensional, nonlinear manifold that is not directly observable. Empirically, these trajectories exhibit diffusion-like behavior over time, echoing reaction-diffusion models [33] used in cardiac electrophysiology and stochastic descriptions of heart rate variability [53]. While the underlying physiological mechanisms are complex, only partially accessible, and usually qualitative, we adopt a phenomenological approach and model the data as evolving on a manifold endowed with a diffusion process. This approach provides a principled framework for capturing intrinsic structure and enables the development of algorithms for quantifying the evolving physiological dynamics.

To capture the above-mentioned characteristics, we model a high-dimensional time series X := {x_k}_{k=1}^n ⊂ R^p, where x_k is sampled at time t_k, using a stochastic differential equation (SDE). Let (M, g) be a d-dimensional Riemannian manifold isometrically embedded in R^p through ι with d ≤ p. Adopting the Einstein summation convention, consider an Itô semimartingale X_t satisfying the time-homogeneous SDE in the Itô differential form:

(1)  dX_t = ν(X_t) dt + σ_α(X_t) ∘ dW_t^α = µ(X_t) dt + σ_α(X_t) dW_t^α ∈ M,

where ν ∈ Γ^∞(TM) and µ ∈ Γ^∞(TM) denote the drift in the Stratonovich and Itô forms, respectively; σ_α ∈ Γ^∞(TM), α = 1, …, r, r ∈ N, denote the diffusion vector fields; ∘ indicates the Stratonovich stochastic integral; and W_t is a standard r-dimensional Brownian motion. We model X as discrete samples x_k = ι(X_{t_k}) ∈ R^p at t_k = k∆. Here, M reflects the geometric structure that constrains X_t.
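A minimal sketch of what model (1) produces as data may help fix ideas. The example below simulates a diffusion on the unit circle M = S¹ embedded in R² and records the discrete embedded samples x_k = ι(X_{k∆}); the drift and diffusion coefficients b and s are hypothetical choices of ours, not quantities from the paper (with constant coefficients on S¹ the Itô and Stratonovich forms coincide for the angle process).

```python
import numpy as np

# Toy instance of model (1): a diffusion on M = S^1, embedded in R^2 via
# iota(theta) = (cos theta, sin theta). We simulate the intrinsic angle
# theta_t and record the embedded discrete samples x_k = iota(X_{k*Delta}).

def simulate_circle_sde(n, delta, b=0.5, s=0.3, seed=0):
    """Euler steps for d(theta) = b dt + s dW; returns (n, 2) samples in R^2."""
    rng = np.random.default_rng(seed)
    theta = np.empty(n)
    theta[0] = 0.0
    for k in range(n - 1):
        dW = rng.normal(0.0, np.sqrt(delta))
        theta[k + 1] = theta[k] + b * delta + s * dW
    return np.column_stack([np.cos(theta), np.sin(theta)])

X = simulate_circle_sde(n=2000, delta=1e-3)
# every discrete sample lies exactly on the manifold: ||x_k||_{R^2} = 1
print(np.allclose(np.linalg.norm(X, axis=1), 1.0))
```

The point of the sketch is that the observer only sees the ambient samples X ⊂ R^p; the angle parametrization, like M itself, is hidden.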
Although the model provides a structured representation of the underlying system, in practice neither the parameters of the SDE nor the parametrization of the manifold M are known. Instead, we only have access to the observed time series data X. This challenge has been widely studied in the literature, though often considering M = R^p; that is, without incorporating a nonlinear manifold in the model. The univariate case, M = R^1, is by now relatively well understood; we refer the reader to [2, 10] and references therein for a broad, though not exhaustive, overview. In the multivariate setting, M = R^p, substantial progress has been made from various perspectives, addressing a wide range of related challenges; non-exhaustive examples include nonparametric kernel-based drift and diffusion estimators [8, 14, 21], the direct covariance approach [25], the maximum likelihood approach [1], the sparsity approach [9], the Malliavin-Fourier approach [5, 38], the principal component analysis approach [4, 13, 16], the spectral approach [18, 22], multiscale-structure-based estimators [18, 43], the Gaussian process approach [19], the parametric approach [48], etc.

Among the challenges posed by this framework, in this paper we focus on estimating the drift and diffusion from X under the manifold model (1). Estimation of the occupation densities and tangent spaces arises as a byproduct. Our primary motivation is [8] and [34] (in the continuous-time setting), which focus on M = R^p. There, Nadaraya-Watson-type nonparametric estimators for drift and diffusion are proposed, and their asymptotic properties are established using the Harris recurrence structure.
To recall the intuition behind the kernel regression approach in [8], note that the increments (1/∆)(x_{k+1} − x_k) and outer products (1/∆)(x_{k+1} − x_k)(x_{k+1} − x_k)^⊤ act as noisy one-step estimators of the drift ι_∗µ(x_k) and the diffusion matrix Σ_{l=1}^r ι_∗σ_l(x_k)[ι_∗σ_l(x_k)]^⊤. Individually, these quantities are highly variable. Kernel regression stabilizes them by aggregating local estimators, assigning larger weights to observations near a target point x via a kernel function applied to a distance scaled by a bandwidth h. The resulting weighted averages yield consistent estimators of the drift and diffusion.

Our main contribution is multifold, centering on the generalization of kernel regression to the manifold setup with a dependent dataset and on establishing theoretical guarantees. First, although one might attempt to directly apply the Euclidean estimator of [8], arguing that manifolds are locally well approximated by affine subspaces, this intuition fails. Even under continuous observation, curvature generates a leading bias term in the Euclidean drift estimation. Specifically, by Itô's formula, the Euclidean-embedded process Z_t := ι(X_t) satisfies

(2)  dZ_t = ( ι_∗ν + ½ D_{ι_∗σ_α}(ι_∗σ_α) )(Z_t) dt + ι_∗σ_α(Z_t) dW_t^α,

where ι_∗ν + ½ D_{ι_∗σ_α}(ι_∗σ_α) ≠ ι_∗µ = ι_∗ν + ½ ι_∗|_x ∇_{σ_α}σ_α, and D_{ι_∗σ_α}(ι_∗σ_α) is the covariant derivative of the vector field ι_∗σ_α along itself in R^p, satisfying D_{ι_∗σ_α}(ι_∗σ_α) = ι_∗|_x ∇_{σ_α}σ_α + II_x(σ_α, σ_α), with II_x the second fundamental form at x. This computation shows that if we estimate the drift term using the Euclidean estimator, the normal component, II_x(σ_α, σ_α), biases the estimator. See Section S.1.4 for more details. Therefore, the kernel regression procedure must be carefully designed.
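As a concrete reference point for the discussion above, the Euclidean one-step kernel estimators (the [8]-style construction, before any manifold correction) can be sketched as follows. This is a minimal illustration under our own assumptions: the kernel is an Epanechnikov-type choice, all names are ours, and the toy data are a plain 2-d Brownian motion.

```python
import numpy as np

# Euclidean Nadaraya-Watson one-step estimators: kernel-weighted averages
# of the increment proxies (x_{k+1}-x_k)/Delta (drift) and of the outer
# products (x_{k+1}-x_k)(x_{k+1}-x_k)^T / Delta (diffusion matrix).

def kernel(u):
    return np.maximum(1.0 - u**2, 0.0)  # compactly supported on [0, 1]

def nw_drift_diffusion(x, delta, target, h):
    """x: (n, p) trajectory; returns NW estimates of the drift (p,) and
    diffusion matrix (p, p) at the point `target`."""
    inc = x[1:] - x[:-1]
    w = kernel(np.linalg.norm(x[:-1] - target, axis=1) / h)
    w = w / w.sum()                                  # weights near target
    drift = np.einsum('k,ki->i', w, inc) / delta
    diff = np.einsum('k,ki,kj->ij', w, inc, inc) / delta
    return drift, diff

# toy data: 2-d Brownian motion (true drift 0, true diffusion matrix I)
rng = np.random.default_rng(1)
delta = 1e-3
x = np.cumsum(rng.normal(0.0, np.sqrt(delta), size=(20000, 2)), axis=0)
mu_hat, pi_hat = nw_drift_diffusion(x, delta, target=x[0], h=0.5)
# pi_hat is symmetric positive semi-definite by construction
```

On flat data like this, the estimators are unbiased; the manifold discussion above explains why, on a curved M, the same `mu_hat` picks up a normal component of order II_x(σ_α, σ_α) and must be corrected.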
By contrast, curvature does not asymptotically affect diffusion estimation, and the diffusion matrix spans the tangent space under mild assumptions. This insight motivates our construction: we first estimate the tangent space using the diffusion estimator, and then recover the drift using the estimated tangent structure. The resulting tangent space estimator, derived from dependent data, appears to be new in the manifold learning literature and is of independent interest.

Second, the sampling scheme poses substantial challenges. Kernel regression requires sufficiently dense local sampling. In contrast to the i.i.d. framework commonly assumed in manifold learning [47, 50, 56], our data consist of a single realization of the SDE, and its empirical distribution does not automatically provide uniform local coverage. To address this issue, we exploit Harris recurrence on manifolds, which guarantees the existence of an invariant measure serving as a surrogate sampling density. To analyze the proposed estimators, we need a generalized Nummelin splitting scheme for Harris recurrent processes, which may or may not have life cycles [34]. Combined with the Darling-Kac result that quantifies how the time X_t spends in a region scales over long periods, we convert additive functionals associated with the estimators into integrals on the manifold with respect to X_t's invariant measure. Under suitable conditions on the bandwidth and sampling rates, together with a careful analysis of the resulting Gaussian mixtures, we establish a central limit theorem for all proposed estimators as the observation horizon tends to infinity.

This paper is organized as follows. Section 2 presents the mathematical model. Section 3 details the proposed algorithm. Theoretical results are provided in Section 4, followed by numerical simulations in Section 5.
Section 6 discusses the findings and outlines future research directions. All proofs and technical details are given in the Appendix.

We adopt the following notation in what follows. The arrows →_d and →_p stand for convergence in distribution and convergence in probability, respectively. For two sequences of random variables, A_n = o_p(B_n) means A_n/B_n →_p 0, and A_n = O_p(B_n) means A_n/B_n is stochastically bounded as n → ∞. The superscript ⊤ means matrix transpose. The symbol ⊗ denotes the Kronecker product and 1_A denotes the indicator function of the set A. Other commonly used symbols are listed in Table S.5.

2. Mathematical Model

We now present our manifold-valued diffusion model (1) for high-dimensional time series data, viewed as observations over a finite time period from a stochastic dynamical system. Specifically, we model the data as samples from a solution to an SDE evolving as a diffusion process on a manifold.

Assumption 1. Let (M, g) be a d-dimensional, complete, connected, smooth Riemannian manifold without boundary, isometrically embedded into R^p via an embedding ι. When the manifold is non-compact, assume the Ricci curvature is lower bounded, that is, Ric ≥ Kg for some K ∈ R, and inj_x ≥ δ > 0 for all x ∈ M, where inj_x is the injectivity radius at x.

The Ricci curvature and injectivity assumptions hold automatically when M is compact. In the noncompact case, the lower bound assumption on the Ricci curvature helps control the behavior of the diffusion kernel, and the uniform positive lower bound on the injectivity radius is imposed to preclude local geometric collapse so that the volume doubling property holds. Let d_g(x, y) denote the geodesic distance between x, y ∈ M and dV_g denote the Riemannian volume measure.
Denote by B_b(M), C(M), C_0(M), and C_c^∞(M) the spaces of bounded measurable functions on M, continuous functions on M, continuous functions vanishing at infinity, and smooth functions with compact support, respectively. Let Γ(TM) and Γ^k(TM), k ∈ N ∪ {∞}, denote the spaces of continuous and C^k vector fields on M. When M is noncompact, denote by M̂ := M ∪ {∂M} the one-point compactification of M; otherwise M̂ := M. Denote by B(M) the Borel σ-algebra on M̂. For a sequence x_n ∈ M, we have x_n → ∂M if and only if ∥x_n∥_{R^p} → ∞ [28]. Denote W_0^r := {ω ∈ C([0, ∞), R^r)} and {F_t}_{t>0} its topological σ-field generated by Borel cylinders. Consider the probability space (Ω, F, P) and an r-dimensional standard Brownian motion W_s = [W_s^1, …, W_s^r]^⊤ with initial distribution µ_0. We then have a filtered probability space (Ω, F_∗, P), where F_∗ := {F_t | t ≥ 0} is a right-continuous filtration such that F := lim_{t→∞} F_t and F_t = ∩_{ϵ>0} σ{W_u | u ≤ t + ϵ}. Clearly, W_t is adapted to F_∗. Let W(M) := {w ∈ C([0, ∞), M̂) | w(0) ∈ M, and if w(t) = ∂M then w(t′) = ∂M for all t′ ≥ t}, and let B_t(W(M)) be the topological σ-field generated by Borel cylinders. The explosion time is defined on W(M) by e(w) := inf{t | w(t) = ∂M}; that is, ∂M is the "trap" of w. When M is compact, e = ∞ [49].

2.1. Quick review of SDEs on manifolds and assumptions. Adopt the notation from (1).
It is well known [28, Proposition 1.2.9] that there exists a unique strong solution, which is an R^p-valued adapted semimartingale X_t supported on M ⊂ R^p, up to its explosion time e(X), to a time-homogeneous SDE on the manifold in the Stratonovich form:

(3)  X_t = X_0 + ∫_0^t ν(X_s) ds + ∫_0^t σ_α(X_s) ∘ dW_s^α,

where we use the Einstein summation convention, r ∈ N, W_t is the standard Brownian motion with µ_0 = δ_0, the initial value X_0 ∈ M follows the distribution λ supported on M and is independent of W_t, ∘ denotes the Stratonovich stochastic integral (e.g., [28, 1.1.13]), and ν, σ_1, …, σ_r ∈ Γ^∞(TM). Recall that a strong solution to (3) up to a stopping time τ is an F_∗-adapted random process X_t ∈ W(M) defined on Ω such that for any f ∈ C^∞(M) [28, Definition 1.2.3],

(4)  f(X_t) = f(X_0) + ∫_0^t (νf)(X_s) ds + ∫_0^t (σ_α f)(X_s) ∘ dW_s^α,  0 ≤ t < τ.

The SDE (3) is often written in the Itô differential form as in (1). We present the Stratonovich form of the integral here because it respects the nonlinear geometry; that is, the Stratonovich SDE is intrinsic to the manifold and its driving vector fields transform naturally under diffeomorphisms via push-forward [28, Prop. 1.2.4].

Let µ_X denote the probability law on W(M) of the solution X_t to (3) and {P_t}_{t≥0} the associated transition semigroup. Recall that for S ∈ B(M) and λ = δ_x, where x ∈ M, for t ≥ 0, P_t(x, S) := µ_X{w ∈ W(M) | w(t) ∈ S} and P_t f(x) := E_x f(X_t) = ∫_{W(M)} f(ω_t) dµ_X(ω), where f ∈ B_b(M) and E_x is the expectation associated with P_x. The solution to (3), with semigroup {P_t}, is a strong Markov process [20, IX. §3-§5].
With a coordinate system over a neighborhood U ⊂ M of x ∈ M, locally there is a symmetric, non-negative definite matrix-valued function π(x) := (π^{ij}(x)) ∈ R^{d×d}, where

π^{ij}(x) = Σ_α σ_α^i(x) σ_α^j(x).

We call π the diffusion matrix. X_t is a diffusion process whose infinitesimal generator L satisfies

Lf(x) := lim_{t→0} ( E_x[f(X_t)] − f(x) ) / t = ½ Σ_{α=1}^r σ_α(σ_α f)(x) + νf(x),

where f ∈ C_c^∞(M); that is, X_t is an L-diffusion process [28, Section 1.3]. In local coordinates, the generator admits the representation

Lf(x) = ½ π^{ij}(x) ∂²f/∂x^i∂x^j (x) + b^i(x) ∂f/∂x^i (x),

where b^i := ν^i + ½ Σ_α σ_α^k ∂_k σ_α^i. The additional term ½ Σ_α σ_α^k ∂_k σ_α^i in b^i arises from the conversion between the Stratonovich and Itô formulations.

Harris recurrence is the key tool for our algorithm analysis. Recall that [27, Definition 1.1] X = (X_t)_{t≥0} with X_0 = x ∈ M a.s. is called Harris recurrent if there exists a non-trivial σ-finite measure η on (M, B(M)) such that for any A ∈ B(M) with η(A) > 0, we have P_x( ∫_0^∞ 1_A(X_t) dt = ∞ ) = 1. It is well known that Harris recurrence implies the existence of a unique invariant measure ϕ_X of {P_t}, up to a constant multiple [27, Theorem 1.2]. When ϕ_X(M) = ∞, we call X_t null Harris recurrent. When ϕ_X(M) < ∞, we call X_t positive Harris recurrent. Geometrically, the process X_t is Harris recurrent if, for every measurable subset of the manifold with positive volume, the process almost surely visits that set infinitely often (in time). For more details on Harris recurrence, we refer readers to [27]. We impose the following assumption.

Assumption 2.
Assume the initial measure λ of X_t is compactly supported on C ⊂ M with dV_g(C) > 0 and has a bounded density function, and the spectrum of π(x) is uniformly bounded from above and away from 0; that is, L is uniformly elliptic. When the manifold is non-compact, we impose further assumptions.

(1) (conservation) The explosion time is infinite; that is, e(X) = ∞.
(2) (C_0-property) {P_t} is strong Feller.
(3) (diffusion kernel) The diffusion kernel p_t(x, y) on (0, ∞) × M × M associated with the transition semigroup is smooth and strictly positive.
(4) (Harris recurrence) X_t is Harris recurrent with an invariant measure ϕ_X.

Note that in [8], rather than uniform ellipticity, the Hörmander condition [49, Chapter 7] is imposed. Instead, we adopt uniform ellipticity since we need a non-degenerate diffusion in order to design our drift estimator. The compact support assumption on the initial measure is motivated by biomedical applications, where it reflects the relatively homogeneous initial state of the dynamics.

Remark 1. Note that not every choice of r is suitable for modeling and analyzing high-dimensional time series data if we need a non-degenerate diffusion, which is essential for recovering the drift term. Recall that the well-known Hairy Ball theorem prohibits the existence of a smooth, non-vanishing vector field on S². This topological constraint guarantees that choosing r = 2 cannot yield a non-degenerate diffusion on S². Therefore, depending on the manifold, we may need r > d. From a statistical perspective, taking r > d may seem to introduce identifiability issues. In particular, it is possible to find r, s ≥ d with r ≠ s such that the diffusion terms Σ_{α=1}^r σ_α(X_t) ∘ dW_t^α and Σ_{β=1}^s σ̃_β(X_t) ∘ dW̃_t^β have the same law on M for some σ̃_β and W̃_t^β, β = 1, …, s, with W̃_t independent of W_t.
This is, however, not a serious issue in our analysis, since our goal is to estimate the diffusion matrix, not to recover each σ_α.

Assumption 2 (1) is necessary for our analysis to be asymptotic. Assumption 2 (2)-(4) are needed for the construction of a recurrent atom [27, Definition 1.9.A] for the Nummelin-like splitting argument via an embedding technique used in the analysis. Note that not all Harris recurrent processes have a recurrent atom. The Harris recurrence assumption effectively amounts to requiring sufficiently dense local sampling on the manifold over a finite recording time, analogous to the "lower bounded density assumption" commonly imposed in the analysis of manifold learning algorithms. See [27, Chapter 1] for a summary of Harris recurrence. When M is compact, Assumption 2 (1)-(4) hold automatically. Specifically, the existence of smooth kernel functions p_t(x, y) on (0, ∞) × M × M is ensured by the Hörmander theorem, and p_t(x, y) has Gaussian upper and lower bounds [49, Theorem 6.4.1], which leads to Assumption 2 (3). Moreover, {P_t} is strong Feller [20, VIII. §6] and X_t is positive Harris recurrent [39]. When M is non-compact, various sufficient conditions ensure Assumption 2 (1)-(4). To avoid distraction, we postpone these details to Section S.1.5. We shall mention that a non-compact manifold is necessary to host an Itô diffusion with null Harris recurrence. As our primary focus is on estimating the dynamics and quantifying the asymptotic behavior of the estimators, in the non-compact case we are content to work under Assumption 2 (1)-(4) in this paper.

2.2. Manifold-valued diffusion model and sampling for the high-dimensional time series. We now detail the proposed manifold-valued diffusion model for the high-dimensional time series. We start with a definition and some nomenclature.

Definition 1 (Manifold-valued diffusion model).
Suppose Assumption 1 holds for a manifold M. We call X_t on M satisfying Assumption 2 a manifold-valued diffusion model.

The high-dimensional time series X = {x_k}_{k=1}^n ⊂ R^p is modeled as the discretization of a realization of the solution ι(X_t) on the interval [0, T] at uniform times; that is, x_k = ι(X_{k∆}), where ∆ > 0 denotes the sampling interval, k = 1, …, n, n = ⌊T/∆⌋, and ⌊·⌋ is the floor operator.

3. Proposed Algorithm

Under the manifold-valued diffusion model, we develop a class of manifold-adaptive, Nadaraya-Watson type kernel estimators to recover the invariant measure ϕ_X, or the occupation density of X_t, and the "observed" drift vector and diffusion matrix, denoted as

(5)  µ^{(o)} := ι_∗µ ∈ R^p and π^{(o)} := Σ_{l=1}^r (ι_∗σ_l)(ι_∗σ_l)^⊤ ∈ R^{p×p},

from the high-dimensional time series X := {x_k}_{k=1}^n. The idea is straightforward: we first obtain noisy preliminary estimates of the drift and diffusion from process increments, then average them using a nonparametric kernel. This "plug-in" approach exploits variance reduction through averaging, yielding more stable and accurate estimates. In all cases, its success hinges on establishing a meaningful quantitative relationship between successive observations x_{k+1} and x_k by taking care of the curvature.

In practice, the geodesic distance and tangent space of M are not available and must be estimated from the data. For each state x ∈ M, prepare an appropriate "distance-like" function D_x : M → R_+ defined by

(6)  D_x(x′) := ∥ι(x′) − ι(x)∥_{R^p}, where x′ ∈ M,

which accurately estimates the geodesic distance between x and x′ on M when they are sufficiently close. Choose a kernel K : R_+ → R, which is smooth and compactly supported. Define

L̂^{(o)}(x) := (∆/h^d) Σ_{k=0}^{n−1} K( D_x(X_{k∆})/h ),

which estimates the invariant measure ϕ_X at x.
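The occupation density estimator just defined, together with the diffusion, tangent space, and drift estimators (7)-(10) introduced in the remainder of this section, can be sketched in a few lines. This is a minimal illustration under our own assumptions: the intrinsic dimension d is taken as known, D_x is the Euclidean distance (6), the kernel is an Epanechnikov-type choice of ours, and the toy trajectory lives on the unit circle (d = 1, p = 2).

```python
import numpy as np

def kernel(u):
    return np.maximum(1.0 - u**2, 0.0)  # compactly supported proxy for K

def manifold_drift_diffusion(x, delta, target, h, d):
    """x: (n, p) trajectory samples; returns (pi_hat, P_hat, mu_hat)."""
    w = kernel(np.linalg.norm(x[:-1] - target, axis=1) / h)  # K(D_x(x_k)/h)
    inc = x[1:] - x[:-1]
    denom = w.sum()
    # (7): kernel-weighted outer products of one-step increments
    pi_hat = np.einsum('k,ki,kj->ij', w, inc, inc) / (delta * denom)
    # (8): projection onto the span of the top-d eigenvectors of pi_hat
    eigval, eigvec = np.linalg.eigh(pi_hat)    # ascending eigenvalue order
    U_d = eigvec[:, ::-1][:, :d]               # top-d columns
    P_hat = U_d @ U_d.T
    # (10) then (9): Euclidean drift estimate projected onto tangent space
    mu_E = np.einsum('k,ki->i', w, inc) / (delta * denom)
    mu_hat = P_hat @ mu_E
    return pi_hat, P_hat, mu_hat

# usage on a toy trajectory along the unit circle
rng = np.random.default_rng(0)
delta = 1e-3
theta = np.cumsum(rng.normal(0.0, np.sqrt(delta), 5000))
x = np.column_stack([np.cos(theta), np.sin(theta)])
pi_hat, P_hat, mu_hat = manifold_drift_diffusion(x, delta, target=x[0],
                                                 h=0.3, d=1)
# P_hat is a rank-d orthogonal projection and mu_hat lies in its range
```

The design choice mirrors the construction described above: the tangent projection P̂_x is read off from the eigendecomposition of the diffusion estimate, and only then is the Euclidean drift estimate projected, removing the normal (curvature) component.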
With the chosen D_x, estimate the diffusion matrix at x ∈ M from X by

(7)  π̂^{(o)}(x) = (1/∆) · [ Σ_{k=0}^{n−1} K( D_x(x_k)/h ) (x_{k+1} − x_k)(x_{k+1} − x_k)^⊤ ] / [ Σ_{k=0}^{n−1} K( D_x(x_k)/h ) ] ∈ R^{p×p}.

Note that the denominator equals (h^d/∆) L̂^{(o)}(x).

To estimate the drift term, we need to estimate the tangent space and a projection onto ι_∗T_xM, denoted P̂_x. Denote the eigenvalue decomposition of π̂^{(o)}(x) as Û_x Σ̂_x Û_x^⊤, where the eigenvalues are ordered decreasingly, and define

(8)  P̂_x := Û_d Û_d^⊤,

where Û_d ∈ R^{p×d} is formed from the first d columns of Û_x. Clearly, Û_d^⊤ Û_d = I_{d×d} by construction. As we show below, the column space of Û_d provides a good approximation of the embedded tangent space ι_∗T_xM. With the chosen projection matrix P̂_x, estimate the drift term by

(9)  µ̂^{(o)}(x) = P̂_x µ̂_E(x) ∈ R^p,

where

(10)  µ̂_E(x) := (1/∆) · [ Σ_{k=0}^{n−1} K( D_x(x_k)/h ) (x_{k+1} − x_k) ] / [ Σ_{k=0}^{n−1} K( D_x(x_k)/h ) ] ∈ R^p.

When M = R^p, our problem reduces to that studied in [8]. Indeed, choosing P̂_x = I_{p×p} makes the estimators in (7) and (9) coincide with those of [8]. In the manifold setting, particularly when d < p and M is nonlinear, the situation differs. Although the diffusion process can be embedded in Euclidean space, the estimator of [8], denoted µ̂_E, targets the drift of the embedded process Z_t rather than that of X_t; see (2). A simple illustration is the curve M = {(x, y) : y = x²}. Driftless Brownian motion on R lifts naturally to M, yet a naive drift estimate at (0, 0) acquires a positive vertical component normal to M, even when geodesic distances on M are used. To correct the curvature-induced bias, the projection matrix P̂_x is necessary.

Remark 2.
Recall that when x and x′ are sufficiently close, the Euclidean distance ∥ι(x′) − ι(x)∥_{R^p} provides a good approximation to the geodesic distance [36]; that is, when x′ = exp_x(tθ), ∥θ∥ = 1, and t is sufficiently small, we have

∥ι(x) − ι(x′)∥_{R^p} = t + (1/6) t³ ι_∗θ^⊤ ∇_θ II_x(θ, θ) + O(t⁴),

where II_x is the second fundamental form at x. More accurate approximations can be obtained through higher-order corrections involving estimates of the second fundamental form [36], and in some settings geodesic information may even be available. However, the main difficulty arises not from estimating geodesic distances but from analyzing the increment x_{k+1} − x_k. In general, x_{k+1} − x_k does not lie in ι_∗T_xM and contains a normal component, which is the primary source of bias. For this reason, we do not pursue higher-order local approximations of the geodesic distance here.

4. Asymptotic Analysis

In this section, we study the asymptotic behavior of the estimators introduced in Section 3. We begin with intuition and relevant existing results before presenting the main results.

Intuitively, since X_t is continuous, our proposed estimators can be viewed as Riemann sum approximations of their target quantities. To be more specific, by Itô's formula, we have

x_{k+1} − x_k = ι(X_{(k+1)∆}) − ι(X_{k∆}) = ∫_{k∆}^{(k+1)∆} µι(X_s) ds + ∫_{k∆}^{(k+1)∆} σ_αι dW_s^α,

and hence the numerator of µ̂_E(x) becomes

(1/h^d) Σ_{k=0}^{n−1} K( D_x(x_k)/h ) ( ∫_{k∆}^{(k+1)∆} µι(X_s) ds + ∫_{k∆}^{(k+1)∆} σ_αι dW_s^α ).
By approximating ∫_{k∆}^{(k+1)∆} µι(X_s) ds by ∆ µι(X_{k∆}), we have

µ̂^{(o)}(x) ≈ P_x ( [ (∆/h^d) Σ_{k=0}^{n−1} K( D_x(x_k)/h ) µι(X_{k∆}) + (1/h^d) Σ_{k=0}^{n−1} K( D_x(x_k)/h ) ∫_{k∆}^{(k+1)∆} σ_αι dW_s^α ] / [ (∆/h^d) Σ_{k=0}^{n−1} K( D_x(x_k)/h ) ] ),

where the summation is the Riemann sum approximation; that is,

(11)  [ (∆/h^d) Σ_{k=0}^{n−1} K( D_x(x_k)/h ) µι(X_{k∆}) ] / [ (∆/h^d) Σ_{k=0}^{n−1} K( D_x(x_k)/h ) ] ≈ [ (1/h^d) ∫_0^T K( D_x(X_s)/h ) µι(X_s) ds ] / [ (1/h^d) ∫_0^T K( D_x(X_s)/h ) ds ].

The first term in (11) is the ratio of two integrals involving X_t. Recall that for bounded measurable f ≥ 0, an (R_+ ∪ {∞})-valued, F_t-adapted process A_t := ∫_0^t f(X_s) ds, t ≥ 0, with A_0 = 0, is called an additive functional. See Section S.1.1 for more details. By definition, the denominator (1/h^d) ∫_0^T K( D_x(X_s)/h ) ds is an additive functional, and the numerator can be handled similarly using additive functional properties after proper manipulation. In the proof, we quantify these approximations and show that this term, combined with P_x, approximates the desired drift. The second term in (11) involves the stochastic integral (1/h^d) Σ_{k=0}^{n−1} K( D_x(x_k)/h ) ∫_{k∆}^{(k+1)∆} σ_αι dW_s^α, which we control via its quadratic variation using its martingale property. In the end, we show that it converges to a normal distribution. Since all proposed estimators share a similar structure, though some are more complicated than others, the analysis reduces to controlling the Riemann sum approximation as ∆ → 0, quantifying the behavior of the integrals as h → 0 and T → ∞, and analyzing the asymptotics of the stochastic integral terms.

The ratio of two integrals involving X_t reminds us of the ratio-limit theorem.

Theorem 4.1 (Ratio Limit Theorem, [27, Theorem 1.7]). Suppose Assumptions 1 and 2 hold.
For any Borel-measurable, positive, and ϕ_X-integrable f, g : M → R such that 0 < ⟨ϕ_X, g⟩_M := ∫_M g(x) ϕ_X(dx) < ∞, we have

(12)  P_x ( lim_{T→∞} [ ∫_0^T f(X_s) ds ] / [ ∫_0^T g(X_s) ds ] = ⟨ϕ_X, f⟩_M / ⟨ϕ_X, g⟩_M ) = 1

for all x ∈ M. Moreover,

(13)  lim_{T→∞} [ E_x ∫_0^T f(X_s) ds ] / [ E_x ∫_0^T g(X_s) ds ] = ⟨ϕ_X, f⟩_M / ⟨ϕ_X, g⟩_M

ϕ_X-a.s., where the exceptional set depends on f and g.

This theorem shows that, in the long-time limit, an additive functional behaves like an integral over the manifold, and it links our analysis with that in manifold learning via (11). In other words, the effect of the initial condition is effectively "washed out" asymptotically, linking the result to the ergodic theorem, with ∫_0^T g(X_s) ds acting as a random clock counting effective time. See Section S.1.3 for further details.

To quantify the asymptotic behavior of additive functionals, we need to know how often the system X_t returns to a given state and how the time spent in a region scales over long periods. When X_t is Harris recurrent, these quantities follow a well-defined probabilistic pattern that typically resembles a stable distribution rather than the usual bell-curve behavior. This fact is quantified in the following general Darling-Kac theorem. To state this theorem, we need the notion of a function that, while not necessarily bounded, behaves asymptotically like a constant.

Definition 2. Let ℓ : R_+ → R_+. We say that ℓ is regularly varying at infinity with index α ∈ R if for all ζ > 0, lim_{x→∞} ℓ(ζx)/ℓ(x) = ζ^α. If α = 0, ℓ is called slowly varying at infinity. We similarly say that ℓ is regularly varying at zero with index α if for all ζ > 0, lim_{x→0} ℓ(ζx)/ℓ(x) = ζ^α.

Denote by D(R_+, R) the Skorokhod space with its Borel σ-algebra and canonical filtration.
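Definition 2 can be checked numerically on a simple example of our own choosing: ℓ(x) = √x · log x is regularly varying at infinity with index α = ½, since the logarithm is slowly varying, so ℓ(ζx)/ℓ(x) → ζ^{1/2}.

```python
import numpy as np

# Numerical illustration of Definition 2 for the (hypothetical) choice
# l(x) = sqrt(x) * log(x), which is regularly varying with index 1/2.

def ell(x):
    return np.sqrt(x) * np.log(x)

zeta = 3.0
ratios = [ell(zeta * x) / ell(x) for x in [1e2, 1e4, 1e8, 1e16]]
# the ratios decrease toward zeta**0.5 = sqrt(3) ~ 1.732 as x grows
```

The slow convergence visible here (the log factor decays only logarithmically) is exactly why slowly varying corrections such as U(T) in the scaling factor below cannot simply be dropped at finite horizons.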
Note that the classical Darling-Kac theorem is the necessity direction of the following theorem, so we call it the general Darling-Kac theorem.

Theorem 4.2 (General Darling-Kac Theorem, [27, Theorem 3.15]). Suppose Assumptions 1 and 2 hold.

(a) The following two statements are equivalent.

(1) For every non-negative Borel-measurable function g with 0 < ⟨ϕ_X, g⟩_M < ∞, one has regular variation at 0 of resolvents in X_t:

(14)  (R_{1/T} g)(x) := E_x [ ∫_0^∞ exp(−t/T) g(X_t) dt ] ∼ ⟨ϕ_X, g⟩_M Υ(T)

as T → ∞ for p_X-almost all x ∈ M (the exceptional set depends on g), where Υ(T) := U(T) T^α, α ∈ (0, 1], and U : R_+ → R_+ is slowly varying at infinity.

(2) For any additive functional A_t = ∫_0^t f(X_s) ds of X_t with 0 < E_{ϕ_X}(A_1) < ∞, one has the weak convergence

(A_{sT})_{s≥0} / Υ(T) →_d E_{ϕ_X}(A_1) g_α(s)

in D(R_+, R) as T → ∞ under P_x for all x ∈ M, where g_α is the Mittag-Leffler process of index α.

(b) The cases in (a) are the only ones for which the weak convergence of (A_{sT})_{s≥0}/υ(T) to a continuous nondecreasing limit process ϖ, with ϖ_0 = 0 and the law of ϖ_1 not degenerate at 0, is available for some norming function υ.

See Section S.1.2 for a quick review of the Mittag-Leffler process. Call Υ : R_+ → R_+ the scaling factor of X_t. When the manifold is compact, X_t is positive Harris recurrent and we have Υ(T) = T; that is, α = 1 and U(T) = 1.

4.1. Further model assumptions. Denote the densities of X_t and X_s and the corresponding joint density by p̃_t(·), p̃_s(·), and p̃_{s,t}(·, ·), respectively, and denote

(15)  g̃_{s,t}(a, b) = p̃_{s,t}(a, b) − p̃_s(a) p̃_t(b), where a, b ∈ M,

which quantifies the level of path-dependence of the process X_t.
We make the following assumption, which allows us to apply Theorem 4.2 and controls the regularity of the path-dependence of the process via the scaling factor $\Upsilon$.

Assumption 3. The scaling factor $\Upsilon$ of $X_t$ is regularly varying at $0$ with index $\alpha \in (0,1]$. Moreover,

$$\lim_{T\to\infty} \Upsilon(T)^{-1} \iint_{[0,T]^2} \|\tilde g_{s,t}\|_r \,ds\,dt < \infty$$

for some $r \in [1, \infty]$.

In other words, the first part of Assumption 3 says that Theorem 4.2(a)(1) holds. We impose the following assumption on $\phi_X$, which allows us to carry out asymptotic analysis on $M$.

Assumption 4. The invariant measure $\phi_X$ associated with $X_t$ is absolutely continuous with respect to the Riemannian measure of the manifold $M$ and, by the Radon-Nikodym theorem, admits a strictly positive density function $p_X \in C^3(M)$ so that $\phi_X(dx) = p_X(x)\,dV_g(x)$.

4.2. Sampling and algorithm assumptions. Our final set of assumptions concerns the data sampling scheme and the algorithm. Regarding the kernel function $K$, we make the following assumption.

Assumption 5. The kernel function $K \in C^3(\mathbb{R})$ is nonnegative and compactly supported on $[0, L]$, where $L > 0$. Denote $\kappa_{p,q} = \int_{\mathbb{R}^d} K(\|u\|)^p \|u\|^q\,du$, where $p \in \mathbb{N}$ and $q \in \{0\} \cup \mathbb{N}$, and assume $\kappa_{1,0} = 1$.

The assumption $\kappa_{1,0} = 1$ can be easily achieved by a direct normalization. Finally, we make an assumption regarding the sampling period $\Delta$ and the bandwidth $h$, which depends on the scaling factor $\Upsilon$ of $X_t$.

Assumption 6. Assume $\Delta = \Delta(T) \to 0$, $h = h(T) \to 0$, and $\Upsilon(T)\, h^{2d/r} \to \infty$, where $r$ is from Assumption 3, as $T \to \infty$.

This assumption plays a crucial role in our analysis. Letting $T \to \infty$ allows us to leverage the equilibrium behavior of the dynamics to estimate both drift and diffusion terms, while $\Delta \to 0$ provides sufficiently fine temporal resolution to capture the dynamics accurately, often referred to as infilling asymptotics.

4.3. Asymptotic results for the proposed estimators. Recall the notation in Section 3.
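The normalization $\kappa_{1,0} = 1$ in Assumption 5 can be enforced numerically by one radial quadrature, since $\int_{\mathbb{R}^d} K(\|u\|)\,du = |S^{d-1}| \int_0^L K(t)\,t^{d-1}\,dt$. A minimal sketch with $d = 2$, using the bump kernel from the simulation section (the function names are ours):

```python
import numpy as np

def K_raw(s):
    """Smooth bump kernel from the simulations: exp(-(1-(s/3)^2)^{-1}) on [0, 3)."""
    s = np.asarray(s, dtype=float)
    out = np.zeros_like(s)
    inside = s < 3.0
    out[inside] = np.exp(-1.0 / (1.0 - (s[inside] / 3.0) ** 2))
    return out

d = 2                                   # intrinsic dimension
m = 200_000
dt = 3.0 / m
t = (np.arange(m) + 0.5) * dt           # midpoint quadrature grid on [0, 3]
surface = 2.0 * np.pi                   # |S^{d-1}| for d = 2
kappa_10 = surface * np.sum(K_raw(t) * t ** (d - 1)) * dt

def K(s):
    """Normalized kernel satisfying kappa_{1,0} = 1, as in Assumption 5."""
    return K_raw(s) / kappa_10

# sanity check: the normalized kernel integrates to 1 over R^d
check = surface * np.sum(K(t) * t ** (d - 1)) * dt
```

Note that the normalizing constant depends on $d$, so the same raw kernel must be renormalized when the intrinsic dimension changes.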
We start with the asymptotic behavior of $\hat L^{(o)}(x)$, which is stated in the following theorem; its proof is postponed to Section S.3.

Theorem 4.3 (Occupation density estimator). Assume Assumptions 1-6 hold. If $\Delta/h^2 = o(1)$ as $T \to \infty$, we have

$$\frac{\hat L^{(o)}(x)}{\Upsilon(T)} \xrightarrow{d} g_\alpha(1)\, p_X(x) \quad \text{and} \quad E_{M_\lambda}\big(\hat L^{(o)}(x)\big) = \Theta(\Upsilon(T)). \quad (16)$$

Geometrically, $\hat L^{(o)}(x)$ is an estimate of the invariant measure associated with $X_t$, which can be viewed as a kernel density estimate of $p_X(x)$ when the samples are dependent and modeled by a diffusion process.

Next, we discuss the diffusion estimator. To describe the bias of the estimator $\hat\pi^{(o)}(x)$, define

$$b^{(o)}_\pi(x) := \kappa_{1,2}\left( \nabla \pi^{(o)}(x) \cdot \nabla \log p_X(x) + \frac{1}{2} \Delta \pi^{(o)}(x) \right). \quad (17)$$

We have the following theorem describing the asymptotic behavior of $\hat\pi^{(o)}(x)$.

Theorem 4.4 (Diffusion estimator). Suppose Assumptions 1-6 hold. Further suppose that $\frac{h^d \Upsilon(T)}{\Delta} \to \infty$, $\frac{h^{d+4}\Upsilon(T)}{\Delta} \to C$ for a constant $C > 0$, and $h^{d-4}\Upsilon(T)\Delta \to 0$. When $T \to \infty$, we have

$$\sqrt{\frac{h^d \hat L^{(o)}(x)}{\Delta}} \left( \hat\pi^{(o)}(x) - \pi^{(o)}(x) - h^2 b^{(o)}_\pi(x) \right) \xrightarrow{d} N\!\left( 0,\, 2\kappa_{2,0}\, \pi^{(o)}(x) \otimes \pi^{(o)}(x) \right), \quad (18)$$

where $\otimes$ is the Kronecker product.

When $M = \mathbb{R}^d$, our framework and results reduce to those of [8, Theorem 5], where the result is stated using half-vectorization. Denote vech and vec to be the half-vectorization and vectorization operators that convert any symmetric matrix $A \in \mathbb{R}^{m\times m}$ into $\mathrm{vech}(A) \in \mathbb{R}^{m(m+1)/2}$ and $\mathrm{vec}(A) \in \mathbb{R}^{m^2}$. Denote $D \in \mathbb{R}^{m^2 \times \frac{m(m+1)}{2}}$ to be the standard duplication matrix that converts $\mathrm{vech}(A)$ into $\mathrm{vec}(A)$ for any symmetric matrix $A \in \mathbb{R}^{m\times m}$. Then, (18) can be rewritten as

$$\sqrt{\frac{h^d \hat L^{(o)}(x)}{\Delta}}\, \mathrm{vech}\!\left( \hat\pi^{(o)}(x) - \pi^{(o)}(x) - h^2 b^{(o)}_\pi(x) \right) \xrightarrow{d} N\!\left( 0,\, 2\kappa_{2,0}\, P_D \left( \pi^{(o)}(x) \otimes \pi^{(o)}(x) \right) P_D^\top \right),$$

where $P_D = (D^\top D)^{-1} D^\top$.
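The duplication matrix and the operator $P_D$ are purely combinatorial objects and can be built directly. A minimal sketch (helper names are ours), using the column-major convention for vec:

```python
import numpy as np

def vech(A):
    """Half-vectorization: stack the lower-triangular part column by column."""
    m = A.shape[0]
    return np.concatenate([A[j:, j] for j in range(m)])

def duplication_matrix(m):
    """Standard duplication matrix D with D @ vech(A) = vec(A) for symmetric A."""
    D = np.zeros((m * m, m * (m + 1) // 2))
    k = 0
    for j in range(m):
        for i in range(j, m):
            # vech entry k corresponds to A[i, j] = A[j, i]
            D[j * m + i, k] = 1.0   # position of A[i, j] in column-major vec(A)
            D[i * m + j, k] = 1.0   # position of A[j, i]
            k += 1
    return D

m = 3
A = np.arange(1.0, 10.0).reshape(m, m)
A = (A + A.T) / 2                      # symmetrize
D = duplication_matrix(m)
vecA = A.flatten(order="F")            # column-major vec
P_D = np.linalg.inv(D.T @ D) @ D.T     # left inverse used in the displayed covariance
```

By construction $D\,\mathrm{vech}(A) = \mathrm{vec}(A)$ and $P_D\,\mathrm{vec}(A) = \mathrm{vech}(A)$ for any symmetric $A$, which is exactly the relation used to pass between the two forms of (18).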
In the manifold setting, the deviation of $(x_{i+1} - x_i)$ from $\iota_* T_{x_i} M$ comes into play, which complicates the bias terms in (18) through curvature effects. Notably, both the bias and the variance depend on the intrinsic manifold dimension $d$, rather than the ambient dimension $p$.

Remark 3. Note that [8, Theorem 5] imposes the conditions $\frac{h^d \hat L^{(o)}(x)}{\Delta} \xrightarrow{p} \infty$, $\frac{h^{d+4} \hat L^{(o)}(x)}{\Delta} \xrightarrow{p} C$, and $h^{d-4} \hat L^{(o)}(x)\, \Delta \xrightarrow{p} 0$. We choose to replace $\hat L^{(o)}(x)$ by $\Upsilon(T)$ in our assumption to avoid a potential contradiction. Indeed, if we impose $\frac{h^{d+4} \hat L^{(o)}(x)}{\Delta} \xrightarrow{p} C$, then, combined with Theorem 4.3, which gives $\frac{\hat L^{(o)}(x)}{\Upsilon(T)} \xrightarrow{d} g_\alpha(1)\, p_X(x)$, Slutsky's theorem and the continuous mapping theorem yield

$$\frac{h^{d+4}\, \Upsilon(T)}{\Delta} \xrightarrow{p} \frac{C}{g_\alpha(1)\, p_X(x)}.$$

When $\alpha \in (0,1)$, since $\frac{h^{d+4}\Upsilon(T)}{\Delta}$ is a deterministic sequence and $g_\alpha(1)$ is a nondegenerate random variable, we must have $C = 0$, and hence lose control of $\frac{h^{d+4} \hat L^{(o)}(x)}{\Delta}$. As discussed in [8, Remark 12], it is not a problem to replace $\hat L^{(o)}(x)$ by $\Upsilon(T)$ in [8, Theorem 5] (and [8, Theorem 4] as well), which avoids the contradiction. The same comment holds for Theorem 4.6.

Estimating the drift is more delicate, as it requires estimating the tangent-space projection of the embedded manifold from the data. The following theorem provides a key result for tangent space estimation.

Theorem 4.5 (Tangent space estimator). Suppose Assumptions 1-6 hold. Denote $P_x$ to be the orthogonal projection onto $\iota_* T_x M$. Suppose moreover that $h^d \Upsilon(T) \to \infty$, $h^{d+4}\Upsilon(T) \to C$ for a constant $C > 0$, and $\Delta^2 h^{d-4}\Upsilon(T) \to 0$. Denote the eigenvalue decomposition $\hat\pi^{(o)}(x) = \hat U \hat D \hat U^\top$, where the eigenvalues are ordered decreasingly. Let $\hat U_d$ be the $p \times d$ matrix formed from the first $d$ columns of $\hat U$, associated with the largest $d$ eigenvalues.
Then,

$$\hat P_x := \hat U_d \hat U_d^\top = P_x + h^2 b^{(t)}(x), \quad (19)$$

where $b^{(t)}(x) = \bar b^{(t)}(x) + \epsilon^{(t)}(x) \in \mathbb{R}^{p\times p}$, $\bar b^{(t)}(x) = O(1)$, and $\epsilon^{(t)}(x) = o_p\!\left( \frac{1}{\sqrt{h^{d+4}\Upsilon(T)}} \right)$.

Note that the central limit theorem for the diffusion estimator depends on the scale $\sqrt{\frac{\Delta}{h^d \hat L^{(o)}(x)}}$, whereas tangent space recovery only requires the coarser scale $\frac{1}{\sqrt{h^d \hat L^{(o)}(x)}}$. This difference arises because the latter scale is already sufficient for establishing the central limit theorem for drift estimation. We do not claim optimality of the proposed diffusion-based tangent space estimator. We also note that several methods for tangent space estimation have been proposed in the literature (e.g., [47, 50]), primarily based on local principal component analysis (PCA). In this approach, one constructs a local covariance matrix

$$C_x := \sum_{k=0}^{n-1} K\!\left( \frac{D_x(x_k)}{h} \right) (x_k - x)(x_k - x)^\top,$$

and estimates $\iota_* T_x M$ using its top $d$ eigenvectors. However, these methods typically assume i.i.d. samples from the manifold and thus do not directly apply in our setting. Additional analysis is required to understand the behavior of local PCA under the manifold-valued diffusion model.

To state the asymptotic behavior of $\hat\mu^{(o)}$, define the bias term

$$b^{(o)}_\mu(x) = \kappa_{1,2}\, P_x\!\left( \nabla \mu^{(o)}(x) \cdot \nabla \log p_X(x) + \frac{1}{2}\Delta \mu^{(o)}(x) \right) - \bar b^{(t)}(x)\, \mu^{(o)}(x) \in \mathbb{R}^p, \quad (20)$$

where $\bar b^{(t)}(x)$ is defined in (19).

Theorem 4.6 (Drift estimator). Suppose Assumptions 1-6 hold. Suppose moreover that $h^d\Upsilon(T) \to \infty$, $h^{d+4}\Upsilon(T) \to C$ for a constant $C > 0$, and $\Delta^2 h^{d-4}\Upsilon(T) \to 0$. Suppose $P_x$ is the orthogonal projection onto $\iota_* T_x M$. When $T \to \infty$,

$$\sqrt{h^d \hat L^{(o)}(x)}\, \left( \hat\mu^{(o)}(x) - \mu^{(o)}(x) - h^2 b^{(o)}_\mu(x) \right) \xrightarrow{d} N\!\left( 0,\, \kappa_{2,0}\, P_x\, \pi^{(o)}(x)\, P_x^\top \right). \quad (21)$$

In the case where $M = \mathbb{R}^d$, the drift estimator reduces to that considered in [8].
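The recipe of Theorem 4.5 is easy to sketch: eigendecompose an estimated diffusion matrix, keep the top $d$ eigenvectors, and form the projection. The following toy example (not from the paper; the synthetic perturbation stands in for the estimation error in $\hat\pi^{(o)}$) also computes the principal-angle error $\|\sin\Theta\|_F$ used in the numerical section:

```python
import numpy as np

rng = np.random.default_rng(1)
p, d = 4, 2

# Ground truth: a random d-dimensional tangent space in R^p, with the
# rank-d projection standing in for the true diffusion matrix pi^{(o)}(x).
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
U_true = Q[:, :d]
P_true = U_true @ U_true.T

# A noisy symmetric surrogate for hat{pi}^{(o)}(x).
E = 1e-3 * rng.standard_normal((p, p))
pi_hat = P_true + (E + E.T) / 2

# Theorem 4.5 recipe: eigendecompose and keep the d leading eigenvectors.
evals, evecs = np.linalg.eigh(pi_hat)   # eigh returns ascending eigenvalues
U_d = evecs[:, -d:]
P_hat = U_d @ U_d.T

# Subspace error ||sin Theta||_F from the singular values of U_true^T U_d.
cos_theta = np.linalg.svd(U_true.T @ U_d, compute_uv=False)
sin_theta_F = np.sqrt(np.sum(1.0 - np.clip(cos_theta, 0.0, 1.0) ** 2))
```

Because the spectral gap between the top $d$ eigenvalues and the rest is of order one, a small perturbation of the diffusion matrix produces a comparably small projection error, which is the mechanism behind (19).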
Since the SDE can be constructed via an embedding procedure, it is tempting to directly apply the estimator from [8]. However, this naive approach introduces an additional bias of order $1$ in the normal direction due to curvature. More specifically, if we omit $\hat P_x$ and simply use $\hat\mu_E(x)$ in (9), we have

$$\sqrt{h^d \hat L^{(o)}(x)}\, \left( \hat\mu_E(x) - \big(\mu^{(o)}(x) + \mu^{(o)}_\eta(x)\big) - h^2 b^{(o)}_\mu(x) \right) \xrightarrow{d} N\!\left(0,\, \kappa_{2,0}\, \pi^{(o)}(x)\right),$$

which differs from (21) by a nontrivial term

$$\mu^{(o)}_\eta(x) = \frac{1}{2} \begin{pmatrix} \sum_{a=1}^d \sum_{b=1}^d \langle \sigma^{(o)}_a(y), \sigma^{(o)}_b(y)\rangle\, e_1^\top \mathrm{II}_y(\partial_a, \partial_b) \\ \vdots \\ \sum_{a=1}^d \sum_{b=1}^d \langle \sigma^{(o)}_a(y), \sigma^{(o)}_b(y)\rangle\, e_p^\top \mathrm{II}_y(\partial_a, \partial_b) \end{pmatrix} \in \mathbb{R}^p$$

living in the normal space at $x$. The deviation $\mu^{(o)}_\eta(x)$ cannot be eliminated even if we can access the geodesic distance of the manifold and set $D_x(x_k) = d_g(x, x_k)$ in $\hat\mu_E(x)$. See Figure 2 in the numerical section for examples.

We emphasize a key technical distinction between the analyses of the diffusion and drift estimators. For the drift, four error terms must be controlled in the expansion of the exponential map. This arises because the drift of a diffusion process acquires second-order contributions under a coordinate transformation, a relationship between geometry and stochastics made clear by Itô's lemma. As a result, curvature, the second-order structure of the manifold, induces a non-negligible distortion in the drift that persists asymptotically. Note that even in the Euclidean setting, the bias contains a second-order term originating from the drift itself [8, Theorem 4].

We compare the bandwidth-sample size pair $(h, n)$ for the drift and diffusion estimators with the choices commonly used in the manifold learning literature. In manifold learning, observations are typically assumed to be independent.
For pointwise convergence, the bandwidth $h = h(n)$ is usually chosen such that $nh^d \to \infty$ and $h \to 0$ as $n \to \infty$. Geometrically, if the sampling density is bounded away from $0$, the condition $nh^d \to \infty$ ensures that asymptotically there are infinitely many samples in a shrinking ball of radius $h$, providing sufficient local data for estimation. While the relationship is more complicated, a parallel condition for the drift estimator is $h^d \Upsilon(T) \to \infty$ as $T \to \infty$. When $X_t$ is positive Harris recurrent, for example when $M$ is compact, $\Upsilon(T) = T$. Using the sampling relation $T = n\Delta$ with $\Delta \to 0$ as $T \to \infty$, the assumption can be rewritten as $h^d n \Delta \to \infty$ as $n \to \infty$. The interpretation is similar but not identical to the independent-sampling case. Since $\Delta \to 0$, the condition implies $h^d n \to \infty$, so that asymptotically infinitely many observations fall in a local neighborhood. The additional factor $\Delta$ slows the effective rate and requires more observations when the sampling interval is small, reflecting the stronger temporal dependence between closely spaced samples. Under the same positive Harris recurrence condition, the condition $\frac{h^d \Upsilon(T)}{\Delta} \to \infty$ as $T \to \infty$ for the diffusion estimator is equivalent to $h^d n \to \infty$ as $n \to \infty$. Thus, when $T$ and $\Delta$ are fixed across procedures, and hence the sample size is the same, $\hat\pi^{(o)}$ converges faster than $\hat\mu^{(o)}$ by a factor of $\sqrt{\Delta}$.

This theoretical finding has an intuitive interpretation. From (2),

$$\iota(X_{(k+1)\Delta}) - \iota(X_{k\Delta}) = \left( \iota_* \nu + \frac{1}{2} D_{\iota_*\sigma_\alpha}(\iota_*\sigma_\alpha) \right)\!\big(\iota(X_{k\Delta})\big)\,\Delta + \iota_*\sigma_\alpha(X_{k\Delta})\,\sqrt{\Delta}\, Z_\alpha + o_p(\Delta^{3/2}),$$

where $Z_\alpha \sim N(0,1)$. In this approximation, the drift term is of order $\Delta$, while the diffusion term, or noise, is of order $\sqrt{\Delta}$. As $\Delta$ gets smaller, the signal-to-noise ratio for the drift decreases at rate $\sqrt{\Delta}$; the stochastic fluctuations dominate the deterministic drift contribution in each increment, making its recovery more difficult.
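A one-dimensional Euclidean toy example, not from the paper, makes the $\sqrt{\Delta}$ signal-to-noise phenomenon concrete. For an Ornstein-Uhlenbeck process $dX = -\theta X\,dt + \sigma\,dW$ (the parameters $\theta$, $\sigma$ and the estimators below are our illustrative choices), the diffusion coefficient can be read off the quadratic variation of the increments, whereas the drift must be regressed out of increments dominated by noise:

```python
import numpy as np

rng = np.random.default_rng(2)
theta, sigma = 1.0, 1.0
dt, n = 0.01, 100_000          # physical time T = n * dt = 1000

# Euler-Maruyama simulation of dX = -theta * X dt + sigma dW.
X = np.empty(n + 1)
X[0] = 0.0
noise = rng.standard_normal(n)
for k in range(n):
    X[k + 1] = X[k] - theta * X[k] * dt + sigma * np.sqrt(dt) * noise[k]

dX = np.diff(X)
sigma2_hat = np.sum(dX ** 2) / (n * dt)                        # quadratic variation
theta_hat = -np.sum(X[:-1] * dX) / (np.sum(X[:-1] ** 2) * dt)  # least squares on increments

err_sigma = abs(sigma2_hat - sigma ** 2)
err_theta = abs(theta_hat - theta)
```

At this sample size the relative error of $\hat\sigma^2$ is typically an order of magnitude smaller than that of $\hat\theta$, mirroring the $\sqrt{\Delta}$ gap between Theorems 4.4 and 4.6.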
In contrast,

$$\big(\iota(X_{(k+1)\Delta}) - \iota(X_{k\Delta})\big)\big(\iota(X_{(k+1)\Delta}) - \iota(X_{k\Delta})\big)^\top = \big(\iota_*\sigma_\alpha(X_{k\Delta})\, Z_\alpha\big)\big(\iota_*\sigma_\alpha(X_{k\Delta})\, Z_\alpha\big)^\top \Delta + O_p(\Delta^{3/2}),$$

so the diffusion coefficient appears in the leading term of the increment. Consequently, the diffusion component is statistically easier to estimate than the drift.

5. Simulated Experiments

In this section, we examine our proposed estimators using two 2-dimensional manifolds: the standard 2-sphere $S^2 = \{x \in \mathbb{R}^3 : \|x\| = 1\} \subset \mathbb{R}^3$ and the Klein bottle embedded in $\mathbb{R}^4$. Throughout this section, we use a smooth, compactly supported kernel function defined by $K(s) = \exp\!\big(-(1 - (s/3)^2)^{-1}\big)$ when $s \in [0, 3)$ and $0$ otherwise. Bandwidth selection is delicate in practice, particularly when the sampling density is nonuniform. Since developing a bandwidth selection algorithm is beyond the scope of this work, we follow the empirical practice in manifold learning and select $h$ so that the kernel is supported on a neighborhood whose size is $\sim 1\%$ of the total trajectory length. The problem of identifying an optimal bandwidth will be investigated in future studies (see Discussion). The Python code producing the figures and results is available at https://github.com/jacobmcerlean/Functional-Estimation-Manifold-SDEs.

5.1. 2-sphere. Consider $M$ as a 2-dimensional ellipsoid with eccentricity $(a, b, c)$ embedded in $\mathbb{R}^3$. To simulate SDE trajectories on $M$, consider $S^2 \subset \mathbb{R}^3$ and the diffeomorphism from $S^2$ to $M$ given by $\varphi : (x, y, z) \mapsto (ax, by, cz)$. Consider the SDE with drift $\mu^{(l)}(x, y, z) = (y, -x, 0)$ and Riemannian Brownian motion on $S^2$. We simulate the process using a retraction-based Euler scheme [45], which is detailed below. Given a number of observations $n \in \mathbb{N}$, a time-step size $\Delta$, and an initial condition $Y_0 = y_0 \in S^2$, we generate a discrete trajectory $\{Y_k\}_{k=0}^n$ on $S^2$ as follows. For $k = 0, \ldots$
, $n - 1$, we first sample a random unit vector $w_k \in \mathbb{R}^3$ and an independent $\chi^2(2)$-distributed radius $r_k$. Then, we set the random tangent vector

$$v_k := r_k \left( \frac{w_k - (w_k^\top Y_k)\, Y_k}{\big| w_k - (w_k^\top Y_k)\, Y_k \big|} \right) \in T_{Y_k} S^2.$$

This produces an isotropic tangent increment consistent with Riemannian Brownian motion on $S^2$. The Euler increment is then defined by

$$\delta Y_k = \sqrt{\Delta}\, v_k + \Delta\, \mu(Y_k). \quad (22)$$

The next state of the SDE trajectory is obtained via a retraction map given by radial projection; that is,

$$Y_{k+1} := \frac{Y_k + \delta Y_k}{\|Y_k + \delta Y_k\|}.$$

We then iterate over $k$. As shown in [45], this iteration approximates the intrinsic SDE on $S^2$. The drift of $X_t = \varphi(Y_t)$ is

$$\mu^{(o)}(x, y, z) = \left( \frac{ay}{b},\, -\frac{bx}{a},\, 0 \right) + P_{(x,y,z)}\big( -(x, y, z) \big),$$

where

$$P_{(x,y,z)}\, w = w - \frac{\langle w, n(x,y,z)\rangle}{\langle n(x,y,z), n(x,y,z)\rangle}\, n(x,y,z), \qquad n(x,y,z) = \left( \frac{x}{a^2}, \frac{y}{b^2}, \frac{z}{c^2} \right),$$

and the diffusion is $\pi^{(o)}(x,y,z) = D\varphi\, D\varphi^\top - (x,y,z)(x,y,z)^\top$, where $D\varphi \in \mathbb{R}^{3\times 3}$ is the matrix $\mathrm{diag}(a, b, c)$.

Now we report results on an ellipsoid $M$ with eccentricity $(a, b, c) = (2, 1.5, 1)$, normalized by applying a global scaling $\sqrt{\frac{3}{a^2 + b^2 + c^2}}$, and postpone results for other ellipsoids with eccentricities $(1, 1, 1)$ and $(1.5, 1, 1)$ with the same normalization to Section S.7.

We start by demonstrating the rate of convergence of the empirical density of an SDE trajectory to the invariant measure of $X_t$. We simulate a long trajectory of length $n_{\max} = 10^8$ with $\Delta = 10^{-2}$, for physical time $T = 10^6$, on $S^2$, and map this trajectory to $M$. In Figure 1, we report the log-log plot of $\|\hat L^{(o)} - \phi_X\|_{L^2}$ to demonstrate the rate of convergence of the empirical density. The decay rate of around $n^{-1/2}$ agrees with the positive Harris recurrence of the dynamics.
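The retraction-based Euler iteration (22) can be sketched in a few lines. In this sketch (our simplification, not the paper's exact sampler) the isotropic tangent Gaussian is drawn by projecting an ambient standard normal vector onto the tangent plane, a standard equivalent way to obtain an isotropic tangent increment:

```python
import numpy as np

def mu(Y):
    """Drift mu^{(l)}(x, y, z) = (y, -x, 0) from the 2-sphere experiment."""
    return np.array([Y[1], -Y[0], 0.0])

def sde_on_sphere(n, dt, Y0, rng):
    """Retraction-based Euler scheme as in (22): tangent step, then radial projection."""
    Y = np.empty((n + 1, 3))
    Y[0] = Y0 / np.linalg.norm(Y0)
    for k in range(n):
        g = rng.standard_normal(3)
        v = g - (g @ Y[k]) * Y[k]                # isotropic Gaussian in T_{Y_k} S^2
        step = Y[k] + np.sqrt(dt) * v + dt * mu(Y[k])
        Y[k + 1] = step / np.linalg.norm(step)   # retraction: radial projection back to S^2
    return Y

rng = np.random.default_rng(3)
traj = sde_on_sphere(n=5_000, dt=1e-3, Y0=np.array([0.0, 0.0, 1.0]), rng=rng)
```

Mapping each $Y_k$ through $\varphi(x,y,z) = (ax, by, cz)$ then yields the observed trajectory on the ellipsoid, exactly as in the experiment above.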
For a visualization, we plot the empirical density $\hat L^{(o)}(x)$ of the trajectory at different lengths $n_i$, where $\log_{10}(n_i) \in \{4, 5, 6, 7, 8\}$, in Figure S.7.

Figure 1. Using the occupation density for trajectory length $n = 10^8$, time step $\Delta = 10^{-2}$, and physical time $T = 10^6$ as an accurate estimate of the invariant density $\phi_X$, we compare the invariant density to the estimate $\hat L^{(o)}$ based on the first $n$ data points of the trajectory on each observed ellipsoid, to measure the convergence rate of the empirical density.

Next, we simulate an SDE trajectory of length $n = 10^8$, with the initial point sampled uniformly from $S^2$, time step $\Delta = 10^{-4}$, and physical time $T = 10^4$. The dataset is obtained by downsampling the trajectory, keeping every 100th observation. This preserves the physical time while reducing the sample size to $n = 10^6$ with uniform time step $\Delta = 10^{-2}$.

Figure 2 illustrates $\hat\mu_E(x)$ and $\hat\mu^{(o)}(x)$ computed from the dataset on $M$ with eccentricity $(2, 1.5, 1)$, shown on a spherical cap near $(2, 0, 0)$. Additional examples with different eccentricities are provided in Figures S.8 and S.9. The curvature-induced bias of $\hat\mu_E(x)$ is clearly visible in Figure 2. For comparison, we also plot $P\hat\mu_E$, defined by $(P\hat\mu_E)(x) := P_x \hat\mu_E(x)$, where $P_x$ is the true projection operator onto $\iota_* T_x M$. As expected, when the tangent space is known, $P\hat\mu_E$ visually performs better than $\hat\mu^{(o)}$.

To quantify these observations, we sample $10^4$ points uniformly on $S^2$, map them to $M$, and evaluate drift and diffusion estimation errors at these base points. Due to topological constraints, any continuous drift on $S^2$ must vanish somewhere. To avoid numerical instability, we stratify points by $\|\mu^{(o)}(x)\| / \|\mu^{(o)}\|_\infty$ with threshold $c = 0.05$. For the subset of points with $\|\mu^{(o)}(x)\| / \|\mu^{(o)}\|_\infty > 0.05$ ($99.9\%$ of the $10^4$ points), we report: (i) the relative vector error, or normalized root mean square error (NRMSE), $\frac{\|\hat\mu - \mu^{(o)}\|}{\|\mu^{(o)}\|}$, where $\hat\mu$ can be any drift estimator, (ii) the relative norm error $\frac{\big| \|\hat\mu\| - \|\mu^{(o)}\| \big|}{\|\mu^{(o)}\|}$, and (iii) the angle error $\Theta(\hat\mu, \mu^{(o)})$ in radians. For the remaining points, where the drift is near zero, we report only the absolute drift error $\|\hat\mu - \mu^{(o)}\|$. For the diffusion, we report the relative Frobenius error, or NRMSE, $\frac{\|\hat\pi - \pi\|_F}{\|\pi\|_F}$, and a subspace metric based on the principal angles between the leading two-dimensional eigenspaces $U_2$ (resp. $\hat U_2$) of $\pi$ (resp. $\hat\pi$), namely $\|\sin\Theta_2\|_F$, where $\sin\Theta_2 := \sin\Theta(U_2, \hat U_2)$, with $\Theta(U_2, \hat U_2)$ the diagonal matrix whose diagonal entries have cosines given by the singular values of $U_2^\top \hat U_2$. Note that $\|\sin\Theta_2\|_F$ measures the error of tangent space estimation.

Summary statistics (means ± standard deviations) are reported in Table 1, with the associated histograms in Figures S.10 and S.11. Results for $M$ with other eccentricities are shown in Table S.3. The consistently smaller diffusion errors compared to drift errors reflect the greater difficulty of drift estimation discussed after Theorem 4.6. The drift estimators exhibit the predicted ordering: $\hat\mu_E$ has the largest error, $\hat\mu^{(o)}$ improves upon it, and $P\hat\mu_E$ performs best. This pattern holds across the relative vector, norm, and angle errors. Paired one-sided Wilcoxon signed-rank tests with Bonferroni correction confirm $P\hat\mu_E < \hat\mu^{(o)} < \hat\mu_E$ for all drift metrics (adjusted $p$-values $< 10^{-5}$).

Finally, a detailed study at $(0, 0, 1)^\top$, based on 1000 independent SDE simulations with $n = 10^6$ and $\Delta = 10^{-2}$, can be found in Figures 3 and 4. The approximately Gaussian error distributions align with the asymptotic normality result; QQ-plots are provided in Figures S.12 and S.13.

Figure 2.
From left to right: visualizations of $\hat\mu_E(x)$, $\hat\mu^{(o)}(x)$, and $P_x\hat\mu_E(x)$, where $P_x$ is the projection operator onto the tangent space $T_x M$, for base points $x$ drawn uniformly from a spherical cap centered at $(1, 0, 0)^\top$ and observed on an ellipsoid with eccentricity $(2, 1.5, 1)$, shown from two viewing angles. The ground-truth drift vector is superimposed as blue arrows.

Error type | $\hat\mu_E$ | $\hat\mu^{(o)}$ | $P_x\hat\mu_E$ | $\hat\pi^{(o)}$
$\|\hat\mu(x) - \mu^{(o)}(x)\| / \|\mu^{(o)}(x)\|$ (on $\|\mu^{(o)}(x)\|/\|\mu^{(o)}\|_\infty \ge 0.05$) | 1.060 ± 0.566 | 0.215 ± 0.215 | 0.208 ± 0.214 | —
$\big|\|\hat\mu(x)\| - \|\mu^{(o)}(x)\|\big| / \|\mu^{(o)}(x)\|$ (on $\|\mu^{(o)}(x)\|/\|\mu^{(o)}\|_\infty \ge 0.05$) | 0.474 ± 0.507 | 0.150 ± 0.169 | 0.147 ± 0.167 | —
$\Theta(\hat\mu(x), \mu^{(o)}(x))$ (on $\|\mu^{(o)}(x)\|/\|\mu^{(o)}\|_\infty \ge 0.05$) | 0.782 ± 0.190 | 0.129 ± 0.192 | 0.124 ± 0.193 | —
$\|\hat\mu(x) - \mu^{(o)}(x)\|$ (on $\|\mu^{(o)}(x)\|/\|\mu^{(o)}\|_\infty < 0.05$) | 0.644 ± 0.013 | 0.126 ± 0.060 | 0.125 ± 0.059 | —
$\|\hat\pi(x) - \pi^{(o)}(x)\|_F / \|\pi(x)\|_F$ | — | — | — | 0.038 ± 0.013
$\|\sin\Theta_2(x)\|_F$ | — | — | — | 0.017 ± 0.008

Table 1. Summary of the various evaluation metrics. $\hat\mu(x)$ is the estimator of $\mu^{(o)}(x)$, which can be $\hat\mu_E$, $\hat\mu^{(o)}$, or $P_x\hat\mu_E$, where $P_x$ is the projection onto $T_x M$, listed in the top row. $\hat\pi(x)$ is the estimator of $\pi^{(o)}(x)$, which is $\hat\pi^{(o)}(x)$. $\Theta(\hat\mu, \mu^{(o)})$ is the angle between $\mu^{(o)}$ and $\hat\mu$ in radians. $\|\sin\Theta_2\|_F$ is the subspace distance between the dominant 2D eigenspaces of $\hat\pi^{(o)}$ and $\pi^{(o)}$.

Figure 3. Histograms of drift estimation errors over 1000 independent SDE simulations, comparing $\hat\mu_E$ and $\hat\mu^{(o)}$ to the true drift vector field $\mu^{(o)}$ at $(0, 0, 1)^\top$.

Figure 4. Histograms of the errors of the estimated diffusion matrix entries $u_i^\top \hat\pi^{(o)} u_j$ over 1000 independent SDE simulations (labeled as $(i, j)$ in the subplots), compared to the true values $u_i^\top \pi^{(o)} u_j$, for ellipsoids of varying eccentricities.

5.2. The Klein bottle in $\mathbb{R}^4$. Set $N := [0, 2\pi) \times [0, 2\pi)$. Consider the manifold $M$ to be the Klein bottle $K \subset \mathbb{R}^4$, a compact, non-orientable, 2-dimensional smooth manifold without boundary, embedded in $\mathbb{R}^4$ via the smooth map

$$\varphi : (u, v) \in N \mapsto \big[ \cos(u)(a + r\cos v),\ \sin(u)(a + r\cos v),\ r\cos(u/2)\sin v,\ r\sin(u/2)\sin v \big]^\top,$$

where $a > r > 0$. Let $\Gamma$ be the fundamental group of the Klein bottle, which is a discrete group of diffeomorphisms of $\mathbb{R}^2$ generated by

$$g_1 : (u, v) \mapsto (u, v + 2\pi), \qquad g_2 : (u, v) \mapsto (u + 2\pi, -v), \quad (23)$$

where $(u, v) \in \mathbb{R}^2$, fulfilling the relation $g_2 g_1 g_2^{-1} = g_1^{-1}$. Note that $g_1$ leads to a cylinder, while $g_2$ twists and glues the cylinder's ends to generate a Klein bottle. Also note that $N$ is the fundamental region for the action of $\Gamma$ on $\mathbb{R}^2$. Let $\pi : \mathbb{R}^2 \to \mathbb{R}^2 / \Gamma$ be the quotient map.

By construction, $\varphi$ is $\Gamma$-compatible; that is, $\varphi(u, v) = \varphi(\gamma \cdot (u, v))$ for all $\gamma \in \Gamma$, with $\gamma\,\cdot$ denoting the group action. A smooth vector field $\mu$ on $\mathbb{R}^2$ is called $\Gamma$-invariant if for any $\gamma \in \Gamma$, $\gamma_* \mu(u, v) = \mu(\gamma \cdot (u, v))$. A $\Gamma$-invariant vector field $\mu$ therefore induces a smooth vector field on $N$, and a smooth vector field on $M$ lifts to a $\Gamma$-invariant vector field on $\mathbb{R}^2$. Thus, smooth vector fields on $M$ are in one-to-one correspondence with $\Gamma$-invariant smooth vector fields on $\mathbb{R}^2$. With this property, in this section we simulate the $M$-valued SDE by simulating a Euclidean SDE on $\mathbb{R}^2$.
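The $\Gamma$-compatibility of the embedding can be checked numerically. The sketch below uses one standard tube-style $\Gamma$-compatible Klein-bottle parameterization (our concrete choice, with $a + r\cos v$ in the first two coordinates so that the identifications in (23) are respected), with $a = 2$ and $r = 1$ as in the experiment:

```python
import numpy as np

a, r = 2.0, 1.0

def phi(u, v):
    """A standard Gamma-compatible Klein-bottle embedding into R^4 (a > r > 0)."""
    return np.array([
        np.cos(u) * (a + r * np.cos(v)),
        np.sin(u) * (a + r * np.cos(v)),
        r * np.cos(u / 2) * np.sin(v),
        r * np.sin(u / 2) * np.sin(v),
    ])

# Numerical check of Gamma-compatibility: phi is invariant under
# g1: (u, v) -> (u, v + 2*pi) and g2: (u, v) -> (u + 2*pi, -v).
rng = np.random.default_rng(4)
us = rng.uniform(0, 2 * np.pi, 100)
vs = rng.uniform(0, 2 * np.pi, 100)
ok_g1 = all(np.allclose(phi(u, v), phi(u, v + 2 * np.pi)) for u, v in zip(us, vs))
ok_g2 = all(np.allclose(phi(u, v), phi(u + 2 * np.pi, -v)) for u, v in zip(us, vs))
```

This invariance is exactly what makes it legitimate to simulate the SDE on the plane with a $\Gamma$-invariant drift and then push the trajectory forward by $\varphi$.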
For the SDE, we set the drift

$$\mu(u, v) = \left( 1 + \tfrac{1}{2}\cos\!\left(\tfrac{u}{2}\right)\sin(v),\ \tfrac{1}{2}\sin(2v) \right)^\top$$

(shown in Figure S.15) on $N$, and take the diffusion to be that of Riemannian Brownian motion, enforcing the $\Gamma$-invariance condition so that the SDE is valid on $N$ with drift $\mu^{(l)} := \pi_* \mu$, drift $\mu^{(o)} := \varphi_* \mu^{(l)} + \frac{1}{2}\iota_*\big( \nabla_{\varphi_*\sigma_\alpha}\, \varphi_*\sigma_\alpha \big)$ for $\sigma_\alpha$ vector fields generating Brownian motion on $N$, and diffusion $\pi^{(o)} = D\varphi\, D\varphi^\top$. We use a standard Euler scheme as in (22) on the Euclidean plane to generate the SDE, and then apply $\varphi$ to obtain the SDE on $M$ embedded in $\mathbb{R}^4$. Below, set $a = 2$ and $r = 1$ in the parameterization of $M$ by $\varphi$.

We simulate a long trajectory with $n = 10^8$ points and time step $\Delta = 10^{-2}$, and then view the estimated density as a surrogate of $\phi_X$. In Figure 5, we report the log-log plot of $\|\hat L^{(o)} - \phi_X\|_{L^2}$ to demonstrate the rate of convergence of the empirical density. The decay rate of around $n^{-0.5}$ is consistent with that of the ellipsoid, as expected. For a visualization, we plot the empirical density $\hat L^{(o)}(x)$ of the trajectory at different lengths $n_i$, where $\log_{10}(n_i) \in \{4, 5, 6, 7, 8\}$, in Figure S.14.

Figure 5. Using the occupation density for trajectory length $n = 10^8$ and $\Delta = 10^{-2}$ as an accurate estimate of the invariant density $\phi_X$, we compare the invariant density to the estimate $\hat L^{(o)}$ based on the first $n$ data points of the trajectory, to measure the convergence rate of the empirical density.

Next, we realize an SDE trajectory with $n = 10^8$, with the initial point chosen uniformly at random from $N$. We set the time step $\Delta = 10^{-4}$, so that the physical time of the trajectory is $T = 10^4$. The trajectory is then downsampled by keeping every 100th observation, which preserves the physical time but reduces the number of sampled points to $n = 10^6$ with uniform time step $\Delta = 10^{-2}$.
Thus, the resolution of the trajectory generation is higher than that used in the estimation procedure. The downsampled single trajectory is then mapped by $\varphi$ to the Klein bottle in $\mathbb{R}^4$.

To visualize and compare the drift estimators, we reduce the dimension of a local patch using the singular value decomposition (SVD). Take a local patch of $B = 465$ trajectory points that lie within $[\pi - 1, \pi + 1] \times [2\pi - \frac{1}{2}, 2\pi + \frac{1}{2}] \subset N$ (interpreted modulo the identifications) and are mapped to $\mathbb{R}^4$ by $\varphi$. Denote the center $(c_u, c_v) = (\pi, 2\pi) \in N$, with image $x_c = \varphi(c_u, c_v)$. Form the centered data matrix $X = [x_1 - x_c, \ldots, x_B - x_c] \in \mathbb{R}^{4\times B}$ and compute its SVD $X = USV^\top$ with the singular values ordered decreasingly. Define $U_3 = [u_1, u_2, u_3] \in \mathbb{R}^{4\times 3}$ from the top three left singular vectors, and construct $z_i = U_3^\top (x_i - x_c) \in \mathbb{R}^3$, $i = 1, \ldots, B$. Then, plot the projected vectors $U_3^\top \mu^{(o)}(x_i)$, $U_3^\top \hat\mu_E(x_i)$, $U_3^\top \hat\mu^{(o)}(x_i)$, and $U_3^\top (P\hat\mu_E)(x_i)$ at $\{z_i\}$. To aid visualization, we rotate the $z_i$ so that the least weighted direction of $\{z_i\}$ lies on the $z$-axis, and plot the rotated patch together with the rotated drift vectors from three viewing angles in Figure 6. The curvature-induced bias in $\hat\mu_E$ is visually clear. To avoid crowding, we randomly select only 58, or approximately $B/8$, of the base points in the patch to plot the drift estimators.

To quantify these visual findings, we follow the same principle as in the ellipsoid case. We sample $10^4$ points on $M$ by taking a uniform $100 \times 100$ grid on $N$ and mapping it by $\varphi$. Since the vector field on $N$ is non-vanishing, we evaluate the NRMSE at all points. The results are shown in Table 2. The associated histograms can be found in Figure S.16.
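The SVD-based local visualization step can be sketched as follows. Here a synthetic point cloud on a mildly curved 2-surface in $\mathbb{R}^4$ stands in for the actual Klein-bottle patch (the toy map and its parameters are ours):

```python
import numpy as np

rng = np.random.default_rng(5)
B = 465

# Synthetic stand-in for the local patch: B points near a curved 2D surface
# in R^4, parameterized over a small square around the origin.
uv = rng.uniform(-0.5, 0.5, size=(B, 2))
pts = np.column_stack([uv[:, 0], uv[:, 1],
                       uv[:, 0] ** 2 - uv[:, 1] ** 2,   # mild curvature terms
                       uv[:, 0] * uv[:, 1]])
x_c = np.zeros(4)                       # image of the parameter center under the toy map

# Center, compute the SVD, and project onto the top three left singular vectors.
X = (pts - x_c).T                       # 4 x B centered data matrix
U, S, Vt = np.linalg.svd(X, full_matrices=False)
U3 = U[:, :3]                           # top three left singular vectors
Z = U3.T @ X                            # 3 x B coordinates used for plotting
```

The same $U_3$ is then applied to the drift vectors at each base point so that the patch and the estimated vector fields are viewed in one common 3D frame.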
For the NRMSE and angle error of the drift estimates, $\hat\mu_E$ has substantially larger error than $\hat\mu^{(o)}$, and $\hat\mu^{(o)}$ is slightly worse than $P_x\hat\mu_E$ (adjusted $p$-values $< 10^{-5}$). For the relative norm error, $\hat\mu^{(o)}$ has significantly smaller error than $\hat\mu_E$ (adjusted $p$-values $< 10^{-5}$), while the comparison between $P_x\hat\mu_E$ and $\hat\mu^{(o)}$ is not statistically significant.

Figure 6. From left to right: visualizations of $\hat\mu_E(x)$, $\hat\mu^{(o)}(x)$, and $P_x\hat\mu_E(x)$, where $P_x$ is the projection operator onto the tangent space $T_x M$. The ground-truth drift vector is superimposed as blue arrows. The associated local patch, mapped via $U_3$ and rotated, is overlaid to enhance the visualization.

Error type | $\hat\mu_E$ | $\hat\mu^{(o)}$ | $P_x\hat\mu_E$ | $\hat\pi^{(o)}$
$\|\hat\mu(x) - \mu^{(o)}(x)\| / \|\mu^{(o)}(x)\|$ | 0.363 ± 0.206 | 0.144 ± 0.083 | 0.137 ± 0.086 | —
$\big|\|\hat\mu(x)\| - \|\mu^{(o)}(x)\|\big| / \|\mu^{(o)}(x)\|$ | 0.117 ± 0.090 | 0.100 ± 0.079 | 0.102 ± 0.080 | —
$\Theta(\hat\mu(x), \mu^{(o)}(x))$ | 0.319 ± 0.202 | 0.089 ± 0.065 | 0.074 ± 0.065 | —
$\|\hat\pi(x) - \pi^{(o)}(x)\|_F / \|\pi(x)\|_F$ | — | — | — | 0.048 ± 0.016
$\|\sin\Theta_2(x)\|_F$ | — | — | — | 0.052 ± 0.041

Table 2. Summary of the various evaluation metrics. $\hat\mu(x)$ is the estimator of $\mu^{(o)}(x)$, which can be $\hat\mu_E$, $\hat\mu^{(o)}$, or $P_x\hat\mu_E$, where $P_x$ is the projection onto $T_x M$, listed in the top row. $\hat\pi(x)$ is the estimator of $\pi^{(o)}(x)$, which is $\hat\pi^{(o)}(x)$. $\Theta(\hat\mu, \mu^{(o)})$ is the angle between $\mu^{(o)}$ and $\hat\mu$ in radians. $\|\sin\Theta_2\|_F$ is the subspace distance between the dominant 2D eigenspaces of $\hat\pi^{(o)}$ and $\pi^{(o)}$.

6. Discussion

This work develops drift and diffusion estimators, along with tangent space and occupation density estimators, from uniform high-frequency sampling under a manifold-based time-homogeneous SDE framework with theoretical guarantees.
Several important issues merit further discussion. A central practical question is bandwidth selection, which remains delicate in diffusion-based regression. Achieving an optimal choice of $h$ is complicated, particularly when $X_t$ is null Harris recurrent; see [8, Remark 12]. Existing methods [3, 7] provide guidance in Euclidean settings, but their extension to manifold-supported, temporally dependent data is largely unexplored. Identifying data-driven bandwidth choices that balance curvature effects, sampling density, and temporal dependence is a promising direction that is also important for manifold learning algorithms.

The present analysis assumes a fixed sampling interval $\Delta$. In many biomedical applications [15, 55], however, $\Delta$ is nonuniform and dependent on $X_t$. A thorough study of nonuniform sampling, particularly for high-frequency data corrupted by noise or synchronization errors, is beyond the scope of this paper but represents a natural extension. Related challenges include model misspecification [43], low-frequency sampling [22], high-frequency noise amplification [43, 57], and asynchrony or randomized sampling schemes [5]. All of these issues become more intricate on manifolds.

Another practical question is the convergence rate of the proposed estimators. Weak convergence follows from technical limit theorems [27], and different techniques are needed to derive the rate. More generally, biomedical time series are often time-inhomogeneous and contaminated by nonstationary noise or measurement artifacts. Extending the framework to accommodate such behavior, possibly through generalized Fourier-Malliavin techniques [5, 38], is an important and largely open direction.
Locally linear manifold regression methods [17], or higher-order variations thereof, may further reduce the bias in curved regions, though controlling the curvature-dependent remainder terms in such higher-order variations requires additional technical development. From a manifold-learning perspective, an important direction is to adapt existing algorithms to dependent sampling schemes; for instance, can we still approximate the Laplace-Beltrami operator from the graph Laplacian? If not, how should the construction be modified when the Laplace-Beltrami operator is needed? We leave these extensions to future work.

References

[1] Yacine Aït-Sahalia. Closed-form likelihood expansions for multivariate diffusions. Ann. Statist., 36(2):906-937, 2008.
[2] Yacine Aït-Sahalia and Lars Peter Hansen. Handbook of Financial Econometrics: Tools and Techniques, volume 1. Elsevier, 2009.
[3] Yacine Aït-Sahalia and Joon Y. Park. Bandwidth selection and asymptotic properties of local nonparametric estimators in possibly nonstationary continuous-time models. J. Econom., 192(1):119-138, 2016.
[4] Yacine Aït-Sahalia and Dacheng Xiu. Principal component analysis of high-frequency data. J. Am. Stat. Assoc., 114(525):287-303, 2019.
[5] Jiro Akahori, Reika Kambara, Nien-Lin Liu, Maria Elvira Mancino, Tommaso Mariotti, and Yukie Yasuda. Symmetric positive semi-definite Fourier estimator of the spot covariance matrix with high frequency data. Risks, 13(10):197, 2025.
[6] K. B. Athreya and P. Ney. A new approach to the limit theory of recurrent Markov chains. Transactions of the American Mathematical Society, 245:493-501, 1978.
[7] Federico Bandi, Valentina Corradi, and Guillermo Moloche. Bandwidth selection for continuous-time Markov processes. 2009.
[8] Federico M. Bandi and Guillermo Moloche. On the functional estimation of multivariate diffusion processes. Econ. Theory, 34(4):896-946, 2018.
[9] Lorenzo Boninsegna, Feliks Nüske, and Cecilia Clementi. Sparse learning of stochastic dynamical equations. The Journal of Chemical Physics, 148(24), 2018.
[10] Denis Bosq. Nonparametric statistics for stochastic processes: estimation and prediction, volume 110. Springer Science & Business Media, 2012.
[11] George E. P. Box, Gwilym M. Jenkins, Gregory C. Reinsel, and Greta M. Ljung. Time series analysis: forecasting and control. John Wiley & Sons, 2015.
[12] Eugene N. Bruce. Temporal variations in the pattern of breathing. J. Appl. Physiol., 80(4):1079–1087, 1996.
[13] Ruijun Bu, Degui Li, Oliver Linton, and Hanchao Wang. Nonparametric estimation of large spot volatility matrices for high-frequency financial data. Econ. Theory, pages 1–38, 2025.
[14] P. Brugière. Théorème de limite centrale pour un estimateur non paramétrique de la variance d'un processus de diffusion multidimensionnelle. In Annales de l'Institut Henri Poincaré, Section B, Calcul des Probabilités et Statistique, volume 29, pages 357–389, 1993.
[15] Cheng-Hsi Chang, Shen-Chih Wang, Wen-Kuei Chang, Chien-Kun Ting, Ming-Jen Lu, Chia-Hsun Lin, Chai-Hock Chua, Hau-Tieng Wu, and Yu-Ting Lin. Intraoperative arterial blood pressure waveform variation predicts short-term acute kidney injury after cardiac surgery. medRxiv, 2025.
[16] Dachuan Chen, Per A. Mykland, and Lan Zhang. The five trolls under the bridge: Principal component analysis with asynchronous and noisy high frequency data. J. Am. Stat. Assoc., 115(532):1960–1977, 2020.
[17] Ming-Yen Cheng and Hau-Tieng Wu. Local linear regression on manifolds and its geometric interpretation. J. Am. Stat. Assoc., 108(504):1421–1434, 2013.
[18] Daan Crommelin and Eric Vanden-Eijnden. Diffusion estimation from multiscale data by operator eigenpairs. Multiscale Modeling & Simulation, 9(4):1588–1623, 2011.
[19] Lea Duncker, Gergo Bohner, Julien Boussard, and Maneesh Sahani.
Learning interpretable continuous-time models of latent stochastic dynamical systems. In ICML, pages 1726–1734. PMLR, 2019.
[20] K. D. Elworthy. Stochastic differential equations on manifolds. Cambridge University Press, 1982.
[21] Jianqing Fan, Yingying Fan, and Jinchi Lv. Aggregation of nonparametric estimators for volatility matrix. Journal of Financial Econometrics, 5(3):321–357, 2007.
[22] Emmanuel Gobet, Marc Hoffmann, and Markus Reiß. Nonparametric estimation of scalar diffusions based on low frequency data. Ann. Statist., 32(5):2223–2253, 2004.
[23] Alexander Grigor'yan. Heat kernel upper bounds on a complete non-compact manifold. Revista Matemática Iberoamericana, 10(2):395–452, 1994.
[24] Alexander Grigor'yan and Laurent Saloff-Coste. Heat kernel on manifolds with ends. Annales de l'Institut Fourier, 59(5):1917–1997, 2009.
[25] Takaki Hayashi and Nakahiro Yoshida. On covariance estimation of non-synchronously observed diffusion processes. Bernoulli, 11(2):359–379, 2005.
[26] Yu-Chieh Ho, Te-Sheng Lin, Shen-Chih Wang, Cheng-Hsi Chang, and Yu-Ting Lin. Variability of morphology in photoplethysmographic waveform quantified with unsupervised wave-shape manifold learning for clinical assessment. Physiological Measurement, 45(9):095005, 2024.
[27] R. Höpfner and E. Löcherbach. Limit Theorems for Null Recurrent Markov Processes. American Mathematical Society, 2003.
[28] E. P. Hsu. Stochastic Analysis on Manifolds, volume 38. American Mathematical Society, 2002.
[29] Satoshi Ishiwata and Hiroshi Kawabi. A graph discretized approximation of semigroups for diffusion with drift and killing on a complete Riemannian manifold. Mathematische Annalen, 390(2):2459–2495, 2024.
[30] Mark Kac. Can one hear the shape of a drum? The American Mathematical Monthly, 73(4P2):1–23, 1966.
[31] Ioannis Karatzas and Steven E. Shreve.
Brownian Motion and Stochastic Calculus. Springer-Verlag, 1988.
[32] Yuji Kasahara. Limit theorems for Lévy processes and Poisson point processes and their applications to Brownian excursions. Journal of Mathematics of Kyoto University, 24(3):521–538, 1984.
[33] E. S. Lobanova and F. I. Ataullakhanov. Running pulses of complex shape in a reaction-diffusion model. Physical Review Letters, 93(9):098303, 2004.
[34] Eva Löcherbach and Dasha Loukianova. On Nummelin splitting for continuous time Harris recurrent Markov processes and application to kernel estimation for multi-dimensional diffusions. Stochastic Processes and their Applications, 118(8):1301–1321, 2008.
[35] Eva Löcherbach and Dasha Loukianova. The law of iterated logarithm for additive functionals and martingale additive functionals of Harris recurrent Markov processes. Stochastic Processes and their Applications, 119(7):2312–2335, 2009.
[36] John Malik, Chao Shen, Hau-Tieng Wu, and Nan Wu. Connecting dots: from local covariance to empirical intrinsic geometry and locally linear embedding. Pure and Applied Analysis, 1(4):515–542, 2019.
[37] Marek Malik and A. John Camm. Heart rate variability. Clinical Cardiology, 13(8):570–576, 1990.
[38] Paul Malliavin and Maria Elvira Mancino. A Fourier transform method for nonparametric estimation of multivariate volatility. Ann. Statist., 37(4):1983–2010, 2009.
[39] Sean P. Meyn and Richard L. Tweedie. Stability of Markovian processes III: Foster–Lyapunov criteria for continuous-time processes. Advances in Applied Probability, 25(3):518–548, 1993.
[40] Sean P. Meyn and R. L. Tweedie. A survey of Foster-Lyapunov techniques for general state space Markov processes. In Proceedings of the Workshop on Stochastic Stability and Stochastic Stabilization, Metz, France, 1993.
[41] Harold Modell, William Cliff, Joel Michael, Jenny McFarland, Mary Pat Wenderoth, and Ann Wright.
A physiologist's view of homeostasis. Advances in Physiology Education, 2015.
[42] E. Nummelin. A splitting technique for Harris recurrent Markov chains. Z. Wahrscheinlichkeitstheorie verw. Gebiete, 43:309–318, 1978.
[43] Grigorios A. Pavliotis and A. M. Stuart. Parameter estimation for multiscale diffusions. Journal of Statistical Physics, 127(4):741–781, 2007.
[44] Raman N. Pillai. On Mittag-Leffler functions and related distributions. Annals of the Institute of Statistical Mathematics, 42(1):157–161, 1990.
[45] Simon Schwarz, Michael Herrmann, Anja Sturm, and Max Wardetzky. Efficient random walks on Riemannian manifolds. Foundations of Computational Mathematics, 25(1):145–161, 2025.
[46] R. Seeley. Contraction semigroups for diffusion with drift. Transactions of the American Mathematical Society, 283(2):717–728, 1984.
[47] Amit Singer and H.-T. Wu. Vector diffusion maps and the connection Laplacian. Commun. Pure Appl. Math., 65(8):1067–1144, 2012.
[48] Helle Sørensen. Parametric inference for diffusion processes observed at discrete points in time: a survey. International Statistical Review, 72(3):337–354, 2004.
[49] Daniel W. Stroock. Partial Differential Equations for Probabilists, volume 112. Cambridge University Press, 2010.
[50] H. Tyagi, E. Vural, and P. Frossard. Tangent space estimation for smooth embeddings of Riemannian manifolds. Information and Inference, 2(1):69–114, 2013.
[51] N. P. Van Der Aa, H. G. Ter Morsche, and R. M. M. Mattheij. Computation of eigenvalue and eigenvector derivatives for a general complex-valued eigensystem. Electronic Journal of Linear Algebra, 16:300–314, 2007.
[52] Mark Veraar. The stochastic Fubini theorem revisited. Stochastics: An International Journal of Probability and Stochastic Processes, 84(4):543–551, 2012.
[53] Wilhelm von Rosenberg, Marc-Oscar Hoting, and Danilo P. Mandic. A physiology based model of heart rate variability.
Biomedical Engineering Letters, 9(4):425–434, 2019.
[54] Feng-Yu Wang. Analysis for Diffusion Processes on Riemannian Manifolds, volume 18. World Scientific, 2014.
[55] Shen-Chih Wang, Chien-Kun Ting, Cheng-Yen Chen, Chinsu Liu, Niang-Cheng Lin, Che-Chuan Loong, Hau-Tieng Wu, and Yu-Ting Lin. Arterial blood pressure waveform in liver transplant surgery possesses variability of morphology reflecting recipients' acuity and predicting short term outcomes. J. Clin. Monit. Comput., 37(6):1521–1531, 2023.
[56] Hau-Tieng Wu and Nan Wu. Think globally, fit locally under the manifold setup: Asymptotic analysis of locally linear embedding. Ann. Statist., 46(6B):3805–3837, 2018.
[57] Lan Zhang, Per A. Mykland, and Yacine Aït-Sahalia. A tale of two time scales: Determining integrated volatility with noisy high-frequency data. J. Am. Stat. Assoc., 100(472):1394–1411, 2005.

Appendix S.1. Necessary mathematical background

In this section, we summarize the mathematical background necessary for the analysis of the proposed estimators.

S.1.1. Life-cycle decomposition and Löcherbach-Loukianova embedding. Recall that an additive functional [27, Definition 1.5] of a Harris recurrent process $X = (X_t)_{t \ge 0}$ on $(\Omega, \mathcal{F}, P)$ taking values in a manifold $M$ with càdlàg paths, and with $X_0 = x \in M$ a.s., is a process $(A_t)_{t \ge 0} \subset \mathbb{R}_+ \cup \{\infty\}$ such that $A_t$ is $\mathcal{F}$-adapted, $A_0 = 0$, all paths are nondecreasing and right-continuous, and for every $x \in M$ and all $s, t \ge 0$ we have $A_{t+s} = A_t + A_s \circ \vartheta_t$ $P_x$-a.s., where $\vartheta_t$ is the shift operator. In our asymptotic analysis of the proposed estimators, we encounter additive functionals of the form
$$A_t := \int_0^t f(X_s)\, ds, \tag{S.1}$$
where $t \ge 0$ and $f \ge 0$ is bounded and measurable, and its discretization.
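As a concrete illustration of (S.1) and its Riemann sum discretization, the following sketch simulates a positive recurrent diffusion and compares the discretized additive functional with its ergodic limit. The Ornstein-Uhlenbeck process and the particular $f$ below are illustrative choices, not objects from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative stand-in: the Ornstein-Uhlenbeck diffusion dX_t = -X_t dt + dW_t
# on M = R, positive Harris recurrent with invariant measure phi_X = N(0, 1/2),
# and a bounded nonnegative test function f.
T, dt = 500.0, 1e-3
f = lambda x: 1.0 / (1.0 + x * x)

x, A = 0.0, 0.0
for _ in range(int(T / dt)):
    A += f(x) * dt                                      # Riemann sum discretization of (S.1)
    x += -x * dt + np.sqrt(dt) * rng.standard_normal()  # Euler-Maruyama step

# For a positive Harris recurrent process, A_T / T -> <phi_X, f>_M as T -> infinity.
z = np.linspace(-10.0, 10.0, 20001)
phi = np.exp(-z * z) / np.sqrt(np.pi)                   # density of N(0, 1/2)
target = np.sum(f(z) * phi) * (z[1] - z[0])             # <phi_X, f>_M by quadrature
print(A / T, target)
```

Here $A_T/T$ should track $\langle \varphi_X, f \rangle_M \approx 0.76$; in the null recurrent case the normalization $T$ must be replaced by a slower-growing deterministic equivalent, which is exactly the role of the Darling-Kac-type theory reviewed in this appendix.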
We focus on studying the asymptotic behavior of $A_t$ and its Riemann sum discretization when $t \to \infty$ and $X_t$ is an Itô diffusion that is Harris recurrent with an invariant measure $\varphi_X$. For such $f$, the additive functional $A_t$ defines a measure on $(M, \mathcal{B}(M))$ via $\nu_A(B) := E^M_{\varphi_X}\big(\int_0^1 \mathbf{1}_B(X_s)\, dA_s\big)$, where $B \in \mathcal{B}(M)$ is a Borel measurable set. We call the additive functional integrable if $\nu_A(M) = E^M_{\varphi_X}(A_1) < \infty$ [27, page 12]. Note that
$$E^M_{\varphi_X}\Big(\int_0^1 \mathbf{1}_B(X_s)\, dA_s\Big) = t^{-1}\, E^M_{\varphi_X}\Big(\int_0^t \mathbf{1}_B(X_s)\, dA_s\Big) \tag{S.2}$$
for any $t > 0$, since $t \mapsto E^M_{\varphi_X}(A_t)$ is linear. Alternatively, when $\langle \varphi_X, f \rangle_M < \infty$, the additive functional is integrable [35], and we have $E^M_{\varphi_X}(A_1) = \langle \varphi_X, f \rangle_M$.

S.1.1.1. Recurrent atom. The analysis of (S.1) would become straightforward if this additive functional of $X_t$ could be decomposed into i.i.d. (or even stationary and strongly mixing) components. Such a decomposition would exist if the invariant measure $\varphi_X$ of the process $X_t$ contained a recurrent atom $A$; i.e., whenever the process enters $A$, it returns to $A$ infinitely often with probability 1.

Definition S.1 (Definition 1.9.A, [27]). Consider a random process $X_t$ on the Polish state space $(E, \mathcal{E})$ adapted to the filtration $\mathcal{F}$. The set $A$ is an atom for $X_t$ if
(1) $\sigma_A := \inf\{t > 0 : X_t \in A\}$ and $\tau_A := \inf\{t > 0 : X_t \notin A\}$ are $\mathcal{F}$-stopping times;
(2) for $x \in A$, the distribution of $X_{\tau_A}$ conditional on $X_0 = x$ does not depend on $x$.
An atom is called recurrent if for all $x \in A$, almost surely for every $N$ there is some $t > N$ with $X_t \in A$, given $X_0 = x$.

The existence of a recurrent atom $A$ implies a sequence of stopping times $\{R_m\}_{m=0}^\infty$, called a life-cycle decomposition [27, Definition 1.9.B], such that $X_{R_m} \in A$ and, for all $m$, we have
(1) the stopping time $R_m < \infty$ with $R_{m+1} = R_m + R_1 \circ \vartheta_{R_m}$, where $\vartheta$ is the shift operator;
(2) $X_{t+R_m}$ is independent of $\mathcal{F}_{R_m}$ for $t \ge 0$.
Supposing such a life-cycle decomposition exists, we have standard tools to understand the asymptotic behavior of the additive functional of the process $X_t$. However, Harris recurrence does not in general guarantee a life-cycle decomposition, even in the Euclidean setting, since the process $X_t$ may fail to admit a recurrent atom. Nevertheless, the desired result can still be obtained by invoking the embedding technique introduced in [34], which we call the Löcherbach-Loukianova (LL) embedding. The LL embedding extends the classical constructions of [6, 42] to continuous-time processes. The key idea is to embed $X_t$ into an augmented process $Z_t$ that possesses a recurrent atom $A$, thereby providing a surrogate notion of life-cycle decomposition suitable for our analysis. This technique, central to [8], will likewise prove applicable in our setting. For self-containedness, and to facilitate its extension to the manifold-valued setting, we outline the construction below.

S.1.1.2. Löcherbach-Loukianova embedding. We summarize the LL embedding $Z_t := Z(X_t)$ of the process $X_t$ defined to solve a manifold-valued SDE as follows (full details can be found in Section 2.2 of [34]). Suppose $X_t$ satisfies Assumptions 1, 2, and 4. As discussed in Section 2.1, it is strong Markov [20, IX.3-5] with càdlàg paths [20, VIII.6]. When $M$ is compact, we have all necessary conditions. Since $X_t$ is positive Harris recurrent, it has an invariant probability measure $\varphi_X$ with a smooth density function $p_X > 0$ by Assumption 4. By [49, Theorem 6.4.1], there exist smooth kernel functions $p_t(x, y)$ on $(0, \infty) \times M \times M$ such that $p_t(x, y) > 0$ for all $t > 0$ and $x, y \in M$, and $p_t(x, y)$ admits Gaussian upper and lower bounds. Therefore, Assumption 2.1 in [34] is satisfied.
Moreover, [34, condition (2.8)] about the resolvent control,
$$\sup_{x, x' \in M} \int_0^\infty \frac{t\, e^{-t}\, p_t(x, x')}{u_1(x, x')}\, dt < \infty, \quad \text{where } u_1(x, x') := \int_0^\infty e^{-t} p_t(x, x')\, dt,$$
holds by a direct calculation. Clearly, $d\nu_{x,x'}(t) := \frac{e^{-t} p_t(x,x')}{u_1(x,x')}\, dt$ is a probability measure on $\mathbb{R}_+$. With this validation, Assumption 3.1 in [34] is also satisfied. When $M$ is noncompact, we use the imposed Assumption 2 so that Assumption 3.1 in [34] is satisfied and the LL embedding can be applied.

We first assemble the ingredients required for the LL embedding. By Harris recurrence, the process returns infinitely often, almost surely, to any set with positive $\varphi_X$ measure. Choose a compact set $C \subset M$ with $\varphi_X(C) > 0$, where $\varphi_X$ is the invariant measure of $X_t$; the particular choice of $C$ is immaterial to the result. Let $\xi$ be the probability measure equivalent to $\varphi_X(\cdot \cap C)$. Recall that the transition kernel of $X_t$ is given by $P_t(x, dy) = p_t(x, y)\, dV_g(y)$, where $dV_g(y)$ is the Riemannian volume measure. Recall that the discrete-time random process $\{\bar{X}_n := X_{T_n}\}_{n \in \mathbb{N}}$, where $T_n := \sum_{j=1}^n W_j$ and the $W_j$ are i.i.d. $\exp(1)$ waiting times, is Harris recurrent with a transition kernel $U_1(x, dy)$ satisfying the minorisation condition [34, (2.3)]
$$U_1(x, dy) := \int_0^\infty e^{-t} P_t(x, dy)\, dt \ge \beta\, \mathbf{1}_C(x)\, \xi(dy),$$
where $\beta \in (0, 1)$. By assumption, $U_1 \ll dV_g$, and the associated density function is $u_1(x, y) := \int_0^\infty e^{-t} p_t(x, y)\, dt$. Let $U(du)$ denote the uniform probability measure on $[0, 1]$. Following [34, (2.4)], define the transition kernel $Q((x, u), dy)$ from $M \times [0, 1]$ to $M$ by
$$Q((x, u), dy) = \begin{cases} \xi(dy) & (x, u) \in C \times [0, \beta],\\ (1 - \beta)^{-1}\big(U_1(x, dy) - \beta\, \xi(dy)\big) & (x, u) \in C \times (\beta, 1],\\ U_1(x, dy) & x \notin C. \end{cases}$$
By construction, $\int_0^1 Q((x, u), dy)\, du = U_1(x, dy)$ [34, (2.5)].
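The role of the minorisation condition and the splitting kernel $Q$ can be seen in a discrete-time toy model. The sketch below implements Nummelin splitting for a Gaussian AR(1) chain (an illustrative assumption, not the resolvent chain of the paper): on a compact set $C$, the one-step kernel dominates $\beta\,\xi$, and flagging a regeneration with probability proportional to the minorant whenever $x \in C$ reproduces the split-chain dynamics, so excursions between flags form i.i.d. blocks:

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.5                          # AR(1): X_{n+1} = rho * X_n + N(0, 1)
C = (-1.0, 1.0)                    # compact set playing the role of C above

def p(x, y):
    # one-step transition density N(y; rho * x, 1)
    return np.exp(-0.5 * (y - rho * x) ** 2) / np.sqrt(2.0 * np.pi)

def minorant(y):
    # min over x in C of p(x, y); the minimum sits at an endpoint of C
    return np.minimum(p(C[0], y), p(C[1], y))

# beta = total mass of the minorant, xi = minorant / beta (regeneration law),
# so that P(x, dy) >= beta * 1_C(x) * xi(dy): the minorisation condition.
ys = np.linspace(-10.0, 10.0, 20001)
beta = minorant(ys).sum() * (ys[1] - ys[0])

# Run the split chain: draw Y ~ P(x, .), then flag a regeneration with
# probability minorant(Y) / p(x, Y) when x is in C; conditional on a flag,
# Y ~ xi regardless of x, so the past is forgotten at each flag.
x, regenerations = 0.0, 0
for _ in range(20000):
    y = rho * x + rng.standard_normal()
    if C[0] <= x <= C[1] and rng.random() < minorant(y) / p(x, y):
        regenerations += 1
    x = y

print(f"beta = {beta:.3f}, regenerations = {regenerations}")
```

Conditional on $x \in C$, the flag fires with total probability $\int \min_{x' \in C} p(x', y)\, dy = \beta$, matching the first branch of $Q$; the continuous-time LL embedding replaces the one-step kernel by the resolvent kernel $U_1$ and attaches the auxiliary coordinates $Z^2_t, Z^3_t$ to realize the same coin flip.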
We now quickly review the LL embedding of $X_t$ by constructing a new random process $Z_t = (Z^1_t, Z^2_t, Z^3_t)$ that takes values in $\mathcal{M} := M \times [0,1] \times M$ using $Q$. Set $T_0 = 0$. Initially, set $Z^1_0 = X_0 = x$, $Z^2_0 \sim U$, where $U$ is the uniform distribution on $[0,1]$, and set $Z^3_0 \sim Q((x, u), dx')$ conditional on $Z^2_0 = u$. The construction proceeds iteratively over $m \in \mathbb{N}$, and it depends on a sequence of independent jump times $\{T_m\}_{m=1}^\infty$ iteratively defined by $T_{m+1} := T_m + \sigma_{m+1}$, where $\sigma_m$ follows the law $\frac{e^{-t} p_t(x, x')}{u_1(x, x')}$ on $\mathbb{R}_+$ with $x = Z^1_{T_m}$ and $x' = Z^3_{T_m}$, and $\{T_m\}_{m=0}^\infty$ is independent of the process $X_t$. Denote $I_m := [T_m, T_{m+1})$. Then, on $I_m$, construct $Z^1$ to bridge $X_{T_m}$ and $X_{T_{m+1}}$, and set $Z^2_t = Z^2_{T_m}$ and $Z^3_t = Z^3_{T_m}$ for all $t \in I_m$. At the jump time $T_{m+1}$, set $Z^1_{T_{m+1}} := Z^3_{T_m}$, choose $Z^2_{T_{m+1}} \sim U[0,1]$ independent of $Z_s$, $s < T_{m+1}$, and, conditional on $Z^1_{T_{m+1}} = x'$ and $Z^2_{T_{m+1}} = u'$, choose $Z^3_{T_{m+1}} \sim Q((x', u'), dx'')$. Clearly, the evolution of $Z_t$ during $I_m$ does not depend on $Z^2_{T_m}$. Then, iterate. Here, $Z^2_t$ and $Z^3_t$ are piecewise constant; $Z^2_t$ is used to realize the splitting and $Z^3_t$ records the future information. We refer readers interested in the details to [34, Section 2]. Denote by $\mathcal{F}_t$ the filtration generated by $Z_t$ and by $\mathcal{G}_t$ the filtration generated by $(Z^1_t, Z^2_t)$. Several properties follow from this construction. While we do not need all of them, we collect them here for self-containedness.

Proposition S.1.1. The LL embedded process $Z_t$ satisfies:
(1) [34, Theorem 2.12] $Z_t$ is strong Markov at any stopping time $S$ such that $Z^1_S \neq Z^3_S$ a.s., and $Z_t$ is càdlàg.
(2) $Z_t$ is Harris recurrent with the invariant probability measure
$$\Pi(dy, du, dz) = dV_g(y)\, U(du) \int_M \varphi_X(dx)\, \frac{u_1(x, y)\, u_1(y, z)}{u_1(x, z)}\, Q((x, u), dz).$$
(3) By construction, while $\{T_m\}_m$ is not independent of $Z_t$, it is independent of $Z^1_t$. Also, $\{T_m\}_m$ is the sequence of jump times of $(Z^2_t, Z^3_t)$.
(4) [34, Proposition 2.6] $(T_n - T_{n-1})_{n \ge 1}$ are i.i.d. $\exp(1)$ random variables, and $T_{n+1} - T_n$ is independent of $\mathcal{F}_{T_n-}$.

The next property, which is the most important one for us, is the existence of a "recurrent atom" on the extended space $\mathcal{M}$.

Proposition S.1.2 ([34, Proposition 2.8(a)], [35, Proposition 4.1]). The set
$$A = C \times [0, \beta] \times M, \tag{S.3}$$
where $C$ is compact with $\varphi_X(C) > 0$ and $\beta \in (0, 1)$, is a "recurrent atom" for $Z_t$ with respect to the measure $\bar{\lambda}$ in (S.4), in the sense that, for $R := \inf\{n \mid Z_{T_n} \in A\}$, the conditional law $[Z_{T_{R+1}} \mid Z^1_{T_R}, Z^2_{T_R}]$ has density $\bar{\lambda}$. The law of $(X_{T_n})_{n \ge 0}$ conditional on $X_0 = x \in M$ is the same as that of $(Z^1_{T_n})_{n \ge 0}$ conditional on $Z^1_0 = x \in M$, and the law of $(X_t)_{t \ge 0}$ conditional on $X_0 = x \in M$ is the same as that of $(Z^1_t)_{t \ge 0}$ conditional on $Z^1_0 = x \in M$. Moreover, the law of $[Z^3_t \mid Z^1_t]$ follows $u_1(Z^1_t, x')\, dx'$.

This proposition says that $X_t$ and $Z^1_t$ have the same law, which justifies the term "embedding". Moreover, in this richer random process, we possess a recurrent atom $A$. Note that the third coordinate $M$ of $A$ is trivial, as the process $Z^3_t$ is always contained in $M$.

Before introducing the "generalized life-cycle decomposition", we shall make clear the origin and relation of several measures relevant to the embedding construction, which is critical for the upcoming analyses. Let $\lambda$ be a generic initial measure on $M$ modeling the distribution of $X_0$. The associated expectation $E^M_\lambda$ (or $E_\lambda$ when there is no confusion) is defined as
$$E^M_\lambda\Big(\int_0^T f(X_t)\, dt\Big) := \int_{X_0 = x \in M} \Big(\int_0^T f(X_t)\, dt\Big)\, d\lambda(x),$$
where $f$ is a real-valued, measurable function on $M$.

Remark 4. Although not strictly necessary, it is useful to record an alternative representation of $E^M_\lambda\big(\int_0^T f(X_t)\, dt\big)$. Let $W^r_0 := \{\omega \in C([0, \infty), \mathbb{R}^r) : \omega(0) = 0\}$, equipped with its natural filtration $\{\mathcal{F}_t\}_{t > 0}$ and the Wiener measure $P_W$. Let $W(M) = C([0, \infty), M)$, endowed with the filtration $\mathcal{B}_t(W(M))$ generated by cylinder sets, and denote by $P^M_W$ the Wiener measure on $W(M)$. Equip $M \times W^r_0$ with the product measure $\mathbb{P}^M_\lambda := \lambda \otimes P_W$. Define the path map $F^M : M \times W^r_0 \to W(M)$ by $F^M(x_0, \omega)(t) = X_t(x_0, \omega)$, where $X_t(x_0, \omega)$ denotes the location at time $t$ of the trajectory $X_t$ starting from $x_0 \in M$ and driven by the Brownian motion determined by $\omega$. By construction, $F^M$ is measurable. Then, we induce a measure $P^M_\lambda$ on $W(M)$ by $P^M_\lambda := \mathbb{P}^M_\lambda \circ (F^M)^{-1}$. With this notation, $E^M_\lambda\big(\int_0^T f(X_t)\, dt\big) := \int_{\gamma \in W(M)} \int_0^T f(\gamma(t))\, dt\, dP^M_\lambda(\gamma)$, which provides a path-space formulation.

From the initial measure $\lambda$ for $X_t$, we consider an initial measure
$$\bar{\lambda}(dx, du, dy) := \lambda(dx)\, U(du)\, Q((x, u), dy) \tag{S.4}$$
for $Z_t$ on $\mathcal{M} := M \times [0, 1] \times M$, where the variables $dx$, $du$, and $dy$ indicate placeholders for the upcoming integration. Since $\mathcal{M} = M \times [0, 1] \times M$ is the space in which $Z_t$ evolves, we denote by $W(\mathcal{M})$ the path space induced by $W(M)$ and the LL embedding construction, with filtration $\mathcal{B}_t(W(\mathcal{M}))$ generated by cylinder sets, and $P^{\mathcal{M}}_W$ the Wiener measure on $W(\mathcal{M})$. Note that paths in $W(\mathcal{M})$ are not continuous by construction, and $\mathcal{B}_t(W(\mathcal{M}))$ is the same as $\mathcal{F}_t$. Similarly, for $g$ a real-valued, measurable function on $\mathcal{M}$, we define
$$E^{\mathcal{M}}_{\bar{\lambda}}\Big(\int_0^T g(Z_t)\, dt\Big) := \int_{Z_0 = z \in \mathcal{M}} \Big(\int_0^T g(Z_t)\, dt\Big)\, d\bar{\lambda}(z), \tag{S.5}$$
where $\bar{\lambda}$ is the initial measure of $Z_t$ related to the generic initial measure $\lambda$ of $X_t$ via (S.4).
In our analysis, we study the deterministic equivalent of $\int_0^T f(X_t)\, dt$, defined as $E^M_\lambda\big(\int_0^T f(X_t)\, dt\big)$, which is unique up to a positive constant in the sense of being asymptotically of the same order. Note that since $Z^1_t$ follows the same law as $X_t$ by the LL embedding construction, for a function $f$ defined on $\mathcal{M}$ that depends only on the first coordinate, that is, $f = f_0 \circ P^1$, where $P^1$ is the projection onto the first coordinate, we have
$$E^{\mathcal{M}}_{\bar\lambda}\Big(\int_0^T f(Z_t)\, dt\Big) = E^{\mathcal{M}}_{\bar\lambda}\Big(\int_0^T f_0(Z^1_t)\, dt\Big) = E^M_\lambda\Big(\int_0^T f_0(X_t)\, dt\Big). \tag{S.6}$$

S.1.1.3. Generalized life-cycle decomposition and generalized regeneration times. With the LL embedding and the recurrent atom $A$ of $Z_t$ (S.3), we introduce the desired generalized life-cycle decomposition.

Definition S.2 (Generalized life-cycle decomposition). Let $A$ be the recurrent atom of $Z_t$ and $\{T_m\}_{m=0}^\infty$ the associated sequence of exponential jump times used to construct $Z_t$. We construct a sequence of $\mathcal{F}_t$-stopping times $\{S_m, R_m\}_{m=0}^\infty$, called the generalized life-cycle decomposition, as follows. Initially, set $R_0 = S_0 = 0$; subsequent times are iteratively set by
$$S_{m+1} := \inf\{T_{m'} \mid T_{m'} > R_m,\ Z_{T_{m'}} \in A\} \quad \text{and} \quad R_{m+1} := \inf\{T_{m'} \mid T_{m'} > S_{m+1}\}$$
for all $m \in \mathbb{N} \cup \{0\}$.

By this construction, we summarize the properties of $S_m$ and $R_m$ in the following proposition.

Proposition S.1.3. With the construction in Definition S.2, we have:
(1) [34, Proposition 2.13(a)] For all $n$, $Z_{R_n+}$ is independent of $\mathcal{G}_{S_n}$ and $\mathcal{F}_{S_n-}$; that is, at $R_n$ we start afresh and have independence after a waiting time. Hence, $Z_{R_n+}$ is also independent of $\mathcal{F}_{R_{n-1}}$.
(2) [35, Proposition 4.2(c)] The law of $Z^1_{R_n}$ conditional on $\mathcal{G}_{S_n}$ is $\xi$.
(3) [34, Proposition 2.13(b)] $E^{\mathcal{M}}_{\bar\lambda}(R_n - S_n \mid \mathcal{F}_{S_n-}) \le 1/\beta$ for all $n \in \mathbb{N}$.
(4) [35, Proposition 4.2(b)] We have
$$Z_{R_n} \sim \xi(dx)\, U(du)\, Q((x, u), dx'), \quad n \ge 1. \tag{S.7}$$
(5) [35, Proposition 4.4] Denote $U_m := \int_{R_m}^{R_{m+1}} f(X_s)\, ds$ for $m \ge 0$. The sequence $(U_n)_{n \ge 0}$ is a stationary ergodic sequence under $P_\xi$, and the sequence $(U_{2k})_{k \ge 0}$ is i.i.d., as is the sequence $(U_{2k+1})_{k \ge 0}$.

The second point of Proposition S.1.3 says that the sequence of $\mathcal{F}$-stopping times $\{R_m\}_{m=0}^\infty$ is a life-cycle decomposition for the process $Z_t$, and the sequence $Z_{R_n}$, $n \ge 1$, is i.i.d. The fourth point of Proposition S.1.3 says that at regeneration times we start from a fixed distribution that does not depend on the past. The following proposition is needed in our analysis.

Proposition S.1.4 ([34, Proposition 2.20], [35, Proposition 4.3]). For a Harris recurrent $X_t$ with invariant measure $\varphi_X$ and $\langle \varphi_X, |f| \rangle_M < \infty$, denote $A_t := \int_0^t f(X_s)\, ds$. For any initial measure $\lambda$ and $n \ge 1$, we have
$$E^M_\lambda(A_{R_{n+1}} - A_{R_n}) = E^M_\xi(A_{R_1}) = C_X\, \langle \varphi_X, f \rangle_M,$$
where $C_X \neq 0$ does not depend on $f$.

Note that the additive functional over the first cycle, $A_{R_1}$, depends on the initial measure. When the initial measure is $\xi$, it behaves the same as the later cycles $A_{R_{n+1}} - A_{R_n}$, $n \ge 1$; this follows from (S.7) and the strong Markov property of $Z_t$. If the initial measure is not $\xi$, then $A_{R_1}$ differs from the later cycles, and we need assumptions to control it. For example, by [34, (2.8) in Proposition 2.16], if
$$\sup_{x, x' \in M} \int_0^\infty \frac{t\, e^{-t}\, p_t(x, x')}{u_1(x, x')}\, dt < \infty,$$
where $u_1(x, x') = \int_0^\infty e^{-t} p_t(x, x')\, dt$, then $E^M_\lambda(A_{R_1})$ and $E^M_\lambda(A_{S_1})$ are both bounded. This assumption is used to control $E^M_\lambda(T_1)$. It holds when $M$ is compact since the diffusion kernel has a nice Gaussian control.
However, when $M = \mathbb{R}$ and $X_t$ is Brownian motion, which is null Harris recurrent ($\mu = 0$ and $\sigma = 1$), we have $p_t(x, x') = \frac{1}{\sqrt{4\pi t}} \exp\big(-d(x, x')^2/4t\big)$ and $\int_0^\infty \frac{t\, e^{-t}\, p_t(x, x')}{u_1(x, x')}\, dt \asymp d(x, x')$ as $d(x, x') \to \infty$, so the assumption fails. Combined with the motivation from biomedical signal analysis, we introduce the assumption that $X_0$ follows a distribution supported on a compact set $C \subset M$ with a bounded density function. Recall that due to (S.2), we have $E^M_{\varphi_X}(A_1) = \langle \varphi_X, f \rangle_M$, so from time to time, when $f$ is positive and measurable, we may use $E^M_{\varphi_X}(A_1) < \infty$ to indicate that the associated additive functional is integrable.

S.1.2. A quick review of the Mittag-Leffler process. Next, we define the quantities that we use to describe the asymptotic behavior of additive functionals of Harris recurrent processes. We focus on our setup and assume the random process $X_t$ is manifold-valued, although the discussion can be easily generalized; see the setup in [27] for details.

Definition S.3 ([27, Definition 2.5]). For $\alpha \in (0, 1)$, a stable increasing process of index $\alpha$ is a process $S_\alpha(t)$, $t \ge 0$, with the properties that
(1) all paths of $S_\alpha$ are càdlàg, non-decreasing, and $S_\alpha(0) = 0$ almost surely;
(2) the process has independent and stationary increments satisfying $E[e^{-\lambda S_\alpha(t)}] = e^{-\lambda^\alpha t}$ for all $t, \lambda \ge 0$.
For $\alpha = 1$, we define the deterministic process $S_1(t) = t$.

By construction, the stable increasing process of index $\alpha$ defines a unique probability law on the Skorokhod space $D(\mathbb{R}_+, \mathbb{R})$ with Borel $\sigma$-algebra and canonical filtration. See [27, Definition 2.5] for more details.

Definition S.4 ([27, Definition 2.6]). For $\alpha \in (0, 1)$, the process inverse of $S_\alpha(t)$ is the Mittag-Leffler process of index $\alpha$, denoted $g_\alpha(t)$ and defined by $g_\alpha(t) = \inf\{s > 0 : S_\alpha(s) > t\}$ for all $t \ge 0$. For $\alpha = 1$, we define the deterministic process $g_1(t) = t$.
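Definitions S.3 and S.4 can be made concrete by simulation. Since $S_\alpha(s) \overset{d}{=} s^{1/\alpha} S_\alpha(1)$, the inverse process satisfies $g_\alpha(1) \overset{d}{=} S_\alpha(1)^{-\alpha}$, so a Mittag-Leffler variable can be sampled from a one-sided stable draw. The sketch below uses the standard Zolotarev-Kanter sampler (an implementation choice, not from the paper):

```python
import numpy as np
from math import gamma

rng = np.random.default_rng(0)

def stable_subordinator(alpha, size, rng):
    """One-sided alpha-stable draws with E[exp(-lam*S)] = exp(-lam**alpha),
    via the Zolotarev-Kanter representation, for 0 < alpha < 1."""
    u = np.pi * rng.random(size)        # Uniform(0, pi)
    e = rng.exponential(size=size)      # Exp(1)
    a = (np.sin(alpha * u) ** (alpha / (1.0 - alpha))
         * np.sin((1.0 - alpha) * u)
         / np.sin(u) ** (1.0 / (1.0 - alpha)))
    return (a / e) ** ((1.0 - alpha) / alpha)

alpha = 0.5
S = stable_subordinator(alpha, 200_000, rng)   # samples of S_alpha(1)
g = S ** (-alpha)                              # samples of g_alpha(1)

# Checks: the defining Laplace transform of S_alpha(1), and the first moment
# E[g_alpha(1)] = 1/Gamma(1 + alpha) of the Mittag-Leffler law.
print(np.mean(np.exp(-S)), np.exp(-1.0))
print(np.mean(g), 1.0 / gamma(1.0 + alpha))
```

For $\alpha = 1/2$, $g_{1/2}(1)$ is half-Gaussian with mean $2/\sqrt{\pi} \approx 1.13$; such draws are exactly the random clock $g_\alpha(1)$ that randomizes the Gaussian limits in the null recurrent case.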
By construction, almost surely $g_\alpha(0) = 0$ and $g_\alpha(t)$ is continuous, nondecreasing, and increases to $\infty$ as $t \to \infty$.

Proposition S.1.5 ([27, Remark 2.8]). Let $g_\alpha$ be the Mittag-Leffler process of index $\alpha \in (0, 1]$. Then $g_\alpha(1) \sim G_\alpha$, where $G_\alpha$ is a Mittag-Leffler random variable of index $\alpha$.

Remark 5. We note that two closely related but distinct families of Mittag-Leffler random variables appear in the literature. To avoid confusion, we briefly describe their construction and relationship. Both are associated with the Mittag-Leffler function $E_\alpha(z) := \sum_{r=0}^\infty \frac{z^r}{\Gamma(1 + \alpha r)}$, defined for $\alpha \in \mathbb{C}$ with $\Re(\alpha) > 0$. For $\alpha \in (0, 1]$, the first family has cumulative distribution function $1 - E_\alpha(-s^\alpha)$. These distributions are heavy-tailed and do not possess a finite mean for $0 < \alpha < 1$ [44]. The second family, which is the relevant one for our purposes (see Proposition S.1.5), arises from the inverse stable subordinator. In contrast to the first family, these distributions are not heavy-tailed and admit finite moments given by [32]
$$E(g_\alpha^n) = \frac{n!}{\Gamma(1 + \alpha n)}, \quad n \in \mathbb{N}.$$
By definition, when $\alpha = 0$, this family reduces to the exponential random variable with parameter 1, which is precisely the case appearing in the classical Darling-Kac theorem (Theorem 4.2).

S.1.3. Long-time asymptotics for additive functionals. While we do not directly need it, for the sake of self-containedness we mention a stronger ratio limit theorem, the Chacon-Ornstein limit theorem [34]. Recall that we call a measurable function $f : M \to \mathbb{R}_+$ special [34, Definition 2.3] when the function
$$x \mapsto E_x \int_0^\infty \exp\Big[-\int_0^t h(X_s)\, ds\Big] f(X_t)\, dt$$
is bounded for all bounded and positive measurable functions $h$ such that $\langle \varphi_X, h \rangle_M := \int_M h(x)\, \varphi_X(dx) > 0$. Since $X_t$ is strongly Feller, all positive bounded functions with compact support are special [34, Corollary 2.22].
For two special functions $f, g$ such that $\langle \varphi_X, g \rangle_M > 0$, and for any initial measures $\lambda_1, \lambda_2$, we have, $\varphi_X$-a.s.,
$$\lim_{T \to \infty} \frac{E_{\lambda_1} \int_0^T f(X_s)\, ds}{E_{\lambda_2} \int_0^T g(X_s)\, ds} = \frac{\langle \varphi_X, f \rangle_M}{\langle \varphi_X, g \rangle_M}. \tag{S.8}$$

We now state the Darling-Kac theorem that describes the asymptotic behavior of the proposed estimators. Recall that for a random process $X_t$, the quadratic variation is defined by $[X_t] := \lim_{n \to \infty} \sum_{k=0}^{n-1} (X_{t(k+1)/n} - X_{tk/n})^2$, whenever the limit exists in probability. Since we work with Itô processes with continuous sample paths, we do not distinguish between predictable quadratic variation and quadratic variation. Moreover, if $M_t$ is a continuous, square-integrable, real-valued local martingale, then by the Doob-Meyer decomposition [31, Theorem 4.10, page 24], the process $M_t^2 - [M_t]$ is itself a local martingale. In our analysis, we encounter additive functionals of the form $A_t := \int_0^t g(X_s)\, dW_s$, where $X_t$ is an Itô process and $\langle \varphi_X, |g| \rangle_M < \infty$. In this case, $[A_t] = \int_0^t g^2(X_s)\, ds$. To study the asymptotic behavior of such terms, we will rely on Theorem 3.16 in [27]. Recall the notation in Theorem 4.2.

Proposition S.1.6 ([27, Theorem 3.16 and the following discussion]). Let $A_t$ be a $p$-dimensional additive functional of a Harris recurrent $X_t$ with invariant measure $\varphi_X$. Assume $A_t$ is a locally square integrable local martingale that is locally bounded. Denote $J \in \mathbb{R}^{p \times p}$ such that $J_{k,l} := E^M_{\varphi_X}([e_k^\top A_1, e_l^\top A_1])$, and assume $J_{k,k} < \infty$ for $k = 1, \ldots, p$. Then, for some $0 < \alpha \le 1$, we have the weak convergence of pairs
$$\Big(\frac{A_{sT}}{\sqrt{\Upsilon(T)}},\ \frac{[A_{sT}]}{\Upsilon(T)}\Big)_{s \ge 0} \xrightarrow{\ d\ } \big(J^{1/2} W_\alpha,\ J g_\alpha\big)$$
in $D(\mathbb{R}_+, \mathbb{R})$ as $T \to \infty$ under $P_x$ for all $x \in M$, where $g_\alpha$ is a Mittag-Leffler process of order $\alpha$, $W_\alpha := W(g_\alpha)$, and $W$ is a $p$-dimensional standard Brownian motion independent of $g_\alpha$.

Note that $W(g_\alpha)$ is a centered Gaussian mixture with covariance $g_\alpha$; equivalently, it can be viewed as a Brownian motion time-changed by $g_\alpha$, that is, a Brownian motion whose variance is randomized by a Mittag-Leffler clock. When $\alpha < 1$, the marginal of $W(g_\alpha)$ is non-Gaussian. In our application, we encounter $t = 1$, when $W_\alpha(1) \mid g_\alpha(1) \sim N(0, g_\alpha(1) I_p)$, or $W_\alpha(1) = \sqrt{g_\alpha(1)}\, Z$, where $Z \sim N(0, I_p)$ is independent of $g_\alpha(1)$. While the quadratic variation $[A_{sT}]$ is itself an additive functional, Proposition S.1.6 treats the special case in which $A_{sT}$ has martingale behavior.

The next inequality, the Burkholder-Davis-Gundy (BDG) inequality, is also necessary, particularly when we analyze the diffusion estimator, which involves triple discretization.

Theorem S.1.7 ([31, Theorem 3.28, page 166]). For a continuous local martingale $M_t$ with $M_0 = 0$ and any $p > 0$, there exist process-independent constants $c_p$ and $C_p$ such that for any stopping time $T$,
$$c_p\, E\big([M_T]^p\big) \le E\big((M^*_T)^{2p}\big) \le C_p\, E\big([M_T]^p\big),$$
where $M^*_T = \sup_{t \le T} |M_t|$ is the maximum process.

S.1.4. Itô's formula. Recall that the SDE for $X_t \in M$ in the Stratonovich form is
$$dX_t = \nu(X_t)\, dt + \sigma_\alpha(X_t) \circ dW^\alpha(t),$$
which appears in the Itô form as
$$dX_t = \underbrace{\Big(\nu + \frac{1}{2} \sum_\alpha \nabla_{\sigma_\alpha} \sigma_\alpha\Big)(X_t)}_{:= \mu(X_t)}\, dt + \sigma_\alpha(X_t)\, dW^\alpha(t).$$
With this fact, here is another way to see why curvature comes into play and why projection is needed in the drift estimator. Consider $f = \iota : M \to \mathbb{R}^p$, where $\iota$ is the isometric embedding of $(M, g)$ into $\mathbb{R}^p$.
The explicit Euclidean-embedded process $Z_t = \iota(X_t)$ satisfies
\[
dZ_t = \iota_*\nu(Z_t)\,dt + \iota_*\sigma_\alpha(Z_t) \circ dW^\alpha_t = \underbrace{\Bigl( \iota_*\nu + \frac12 D_{\iota_*\sigma_\alpha}(\iota_*\sigma_\alpha) \Bigr)}_{\neq\, \iota_*\mu}(Z_t)\,dt + \iota_*\sigma_\alpha(Z_t)\,dW^\alpha_t,
\]
where $D_{\iota_*\sigma_\alpha}(\iota_*\sigma_\alpha)$ denotes the covariant derivative of the vector field $\iota_*\sigma_\alpha$ along itself in the ambient space $\mathbb{R}^p$, which satisfies
\[
D_{\iota_*\sigma_\alpha}(\iota_*\sigma_\alpha)(\iota(x)) = \iota_*|_x \nabla_{\sigma_\alpha}\sigma_\alpha + \mathrm{II}_x(\sigma_\alpha, \sigma_\alpha)
\]
and contains a nontrivial normal component. This computation shows that if we estimate the drift term using the Euclidean estimator, the normal component biases the estimator: what we recover are the dynamics of the extrinsic process $Z_t = \iota(X_t)$, not the intrinsic drift term of $X_t$ that we want.

Next we document Itô's formula. Consider $f \in C^2(\mathcal M, \mathbb{R}^p)$ with $\mathrm{supp}\, f \subset B_r(x)$ for $r < \mathrm{inj}_x(\mathcal M)$. For $0 \le t_1 \le t_2$, Itô's formula gives
\[
f(X_{t_2}) - f(X_{t_1}) = \int_{t_1}^{t_2} (\nu f)(X_s)\,ds + \int_{t_1}^{t_2} (\sigma_\alpha f)(X_s) \circ dW^\alpha_s = \int_{t_1}^{t_2} \underbrace{\Bigl[ (\nu f)(X_s) + \frac12 \sum_{\alpha=1}^r (\sigma_\alpha\sigma_\alpha f)(X_s) \Bigr]}_{:=\,\mu f(X_s)} ds + \int_{t_1}^{t_2} \underbrace{(\sigma_\alpha f)(X_s)}_{:=\,\sigma_{f,\alpha}(X_s)}\, dW^\alpha_s. \tag{S.9}
\]
Recall that the second equality comes from the definition of the Hessian, $\mathrm{Hess}\, q(X, Y) = X(Yq) - (\nabla_X Y)q$ [28, page 23], used for the conversion from the Stratonovich formulation to the Itô formulation. We will use this notation throughout the proofs.

S.1.5. Sufficient conditions for Assumption 2 when $\mathcal M$ is noncompact. For Assumption 2(1), the simplest condition is the linear growth bound and locally Lipschitz condition via extension. Denote the smooth extensions of $\nu$ and $\sigma_k$, $k = 1, \dots, r$, in (3) to $\mathbb{R}^p$ as $\tilde\nu$ and $\tilde\sigma_k$, and $\tilde\sigma(x) := [\tilde\sigma_1(x), \dots, \tilde\sigma_r(x)] \in \mathbb{R}^{p\times r}$. The linear growth bound condition is
\[
\|\tilde\sigma(x)\| + |\tilde\nu(x)| \le C(1 + |x|)
\]
for some $C > 0$, together with boundedness of their derivatives of any order. The locally Lipschitz condition is: for any $N > 0$, there exists $C_N > 0$ such that
\[
\|\tilde\sigma(x) - \tilde\sigma(x')\| + |\tilde\nu(x) - \tilde\nu(x')| \le C_N |x - x'|
\]
for any $x, x' \in B_0(N)$. The linear growth bound and locally Lipschitz conditions on $\tilde\nu$ and $\tilde\sigma$ guarantee that the solution does not explode [28, (1.1.11)]. See [54, Theorem 2.1.1 and Corollary 2.1.2] for other conditions.

The Foster-Lyapunov criterion [40] provides sufficient conditions for Assumptions 2(1) and 2(4). A function $V : \mathcal M \to \mathbb{R}_+$ is called a norm-like function if $V(x) \to \infty$ as $x \to \infty$. Norm-like functions are essentially a specific type of Lyapunov function tailored for studying stability and ergodicity of Markov processes. Assume there exist a constant $c \ge 0$ and a norm-like function $V$ so that
\[
(LV)(x) \le cV(x);
\]
then $P\{d_g(X_t, x) = \infty\} = 0$ for any $x \in \mathcal M$, which is another sufficient condition for Assumption 2(1).

To discuss Harris recurrence, we need more quantities. Define $K_a := \int_0^\infty P_t\, a(dt)$, where $a$ is a probability distribution on $\mathbb{R}_+$. A non-empty set $C \in \mathcal B(\mathcal M)$ is called $\phi_a$-petite if $\phi_a$ is a non-trivial measure on $\mathcal B(\mathcal M)$ and $K_a(x, \cdot) \ge \phi_a(\cdot)$ for all $x \in C$. The notion of a petite set is related to irreducibility of $X_t$. It is known [39, Theorem 3.2] that if all compact subsets of $\mathcal M$ are petite, and if there exist a compact set $C \subset \mathcal M$, a constant $d > 0$, and a norm-like function $V$ such that the condition for recurrence
\[
(LV)(x) \le d\, \mathbf 1_C(x)
\]
holds for all $x \in \mathcal M$, then $X_t$ is Harris recurrent. Next, if $X_t$ is Feller and non-explosive, and the positive recurrence condition
\[
(LV)(x) \le -c f(x) + d\, \mathbf 1_C(x)
\]
holds for all $x \in \mathcal M$ for some $c, d > 0$, $f : \mathcal M \to [1, \infty)$, a compact set $C \subset \mathcal M$, and $V \ge 0$, then $X_t$ is positive Harris recurrent [39, Theorem 4.5]. Note that this sufficient condition does not involve irreducibility.
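As a toy sanity check of the positive recurrence condition (our illustrative choices of $V$, $f$, $c$, $d$, $C$, not from the paper): for the scalar Ornstein-Uhlenbeck process $dX_t = -X_t\,dt + \sqrt2\,dW_t$ on the noncompact manifold $\mathcal M = \mathbb{R}$, the generator is $L = \partial_x^2 - x\partial_x$, and the norm-like function $V(x) = x^2$ gives $LV = 2 - 2x^2$. The drift condition can be verified on a grid with $f(x) = 1 + x^2$, $c = 1$, $d = 3$, $C = [-\sqrt3, \sqrt3]$:

```python
import numpy as np

# Positive-recurrence check (LV)(x) <= -c f(x) + d 1_C(x) for the OU process
# dX = -X dt + sqrt(2) dW, generator L = d^2/dx^2 - x d/dx, V(x) = x^2.
x = np.linspace(-10, 10, 100_001)
LV = 2 - 2 * x**2                      # LV = V'' - x V' = 2 - 2 x^2
f = 1 + x**2                           # f >= 1 everywhere
c, d = 1.0, 3.0
C = np.abs(x) <= np.sqrt(3.0)          # compact set C = [-sqrt(3), sqrt(3)]
assert np.all(LV <= -c * f + d * C)    # drift condition holds on the whole grid
```

Algebraically the inequality reduces to $3 - x^2 \le 3\,\mathbf 1_C(x)$, which holds with equality exactly at the boundary of $C$; this is why the drift pulls the process back toward $C$ and prevents escape to infinity.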
Another sufficient condition for a non-explosive $X_t$ to be positive Harris recurrent is that the positive recurrence condition holds with $C$ a closed petite set and $V$ bounded on $C$ [39, Theorem 4.2]. The existence of a norm-like function satisfying the above conditions guarantees that the drift term pulls the process back toward a compact region, preventing escape to infinity. In practice, constructing an explicit norm-like function can be technically demanding.

There are also sufficient conditions for Assumption 2(3). We call a probability measure $\mu$ on $\mathcal B(\mathcal M)$ quasi-invariant for the Markov operator $P_t$ if $\mu P_t$ is absolutely continuous with respect to $\mu$, where $\mu P_t(A) := \mu(P_t \mathbf 1_A)$ and $A \in \mathcal B(\mathcal M)$. Assume $L = \Delta + Z$, where $Z \in C^\infty(T\mathcal M)$. Define $\mathrm{Ric}_Z(\mathcal X, \mathcal Y) = \mathrm{Ric}(\mathcal X, \mathcal Y) - \langle \nabla_{\mathcal X} Z, \mathcal Y\rangle$, where $\mathcal X, \mathcal Y, Z \in C^\infty(\Gamma\mathcal M)$. By [54, Theorem 2.3.3], $\mathrm{Ric}_Z \ge K$ for some $K \in \mathbb{R}$ is equivalent to the Harnack-type inequality that for $p > 1$ and any bounded non-negative measurable function $f$,
\[
(P_t f)^p(x) \le P_t(f^p)(y)\, \exp\Bigl( \frac{K p\, d_g(x,y)^2}{2(p-1)(e^{2Kt} - 1)} \Bigr).
\]
If $\mathrm{Ric}_Z \ge K$ for $K \in \mathbb{R}$ and there exists a quasi-invariant probability measure $\mu$ of $P_t$, then $P_t$ is strong Feller [54, Theorem 1.3.1]. Moreover, $P_t$ has at most one invariant probability measure, and if one exists, the kernel $p_t$ of $P_t$ with respect to the invariant probability measure is strictly positive. See [29, 46] for sufficient conditions for the Feller property.

Assumption 2(3) might be the trickiest one. It holds when the diffusion kernel has Gaussian-like control. Such Gaussian-like control holds, for example, when $L = \Delta$ and $\mathcal M$ has non-negative Ricci curvature, is quasi-isometric to a manifold with non-negative Ricci curvature, or covers a compact manifold with deck-transformation group of polynomial volume growth [24].
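As a numeric sanity check of the Harnack-type inequality in a toy flat case (our choice, not an example from the paper): take $\mathcal M = \mathbb{R}$, $Z = 0$, $L = \Delta = \partial_x^2$, so $\mathrm{Ric}_Z \ge K$ holds with $K = 0$, and by $\lim_{K\to0} K/(e^{2Kt} - 1) = 1/(2t)$ the exponent reduces to $p\, d_g(x,y)^2 / (4(p-1)t)$. The semigroup is a Gaussian convolution with variance $2t$, discretized by a Riemann sum:

```python
import numpy as np

# Harnack check on M = R with generator d^2/dx^2; heat kernel variance is 2t.
grid = np.linspace(-30.0, 30.0, 60_001)
dx = grid[1] - grid[0]

def P(t, fvals, x):
    """Heat semigroup applied to f (sampled on `grid`), evaluated at x."""
    kernel = np.exp(-(x - grid) ** 2 / (4 * t)) / np.sqrt(4 * np.pi * t)
    return np.sum(kernel * fvals) * dx

f = np.exp(-grid**2) + 0.1            # bounded, non-negative test function
t, p, x, y = 0.5, 2.0, 1.0, -0.5
lhs = P(t, f, x) ** p
rhs = P(t, f ** p, y) * np.exp(p * (x - y) ** 2 / (4 * (p - 1) * t))
print(lhs <= rhs)  # True
```

The bound is sharp in this flat case (equality is approached for exponential test functions), so the margin observed for the bump above comes entirely from $f$ not being extremal.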
It is possible that the diffusion kernel does not have Gaussian-like control but still fulfills Assumption 2(3). A concrete example is when $\mathcal M$ is a 3-dimensional Cartan-Hadamard manifold $\mathbb H^3_k$ with constant curvature $-k^2 < 0$, where $k > 0$. Its heat kernel is
\[
p_t(x, y) = \frac{1}{(4\pi t)^{3/2}} \frac{k\, d(x,y)}{\sinh(k\, d(x,y))} \exp\Bigl( -\frac{d(x,y)^2}{4t} - k^2 t \Bigr)
\]
[23]. See [24] for various examples of non-compact manifolds with finitely many ends, whose heat kernels have non-Gaussian control.

When the drift term is nontrivial, here is a relevant control. Assume again $L = \Delta + Z$. Assume $Z = \nabla V$ for some $V \in C^2(\mathcal M)$ and the invariant measure is $\mu(dx) := e^{V(x)}\,dx$. Let $p_t$ be the diffusion kernel of $P_t$ with respect to $\mu$. When $\mathrm{Ric}_Z \ge K$ for $K \in \mathbb{R}$, for any $\delta > 0$ there exists $c(\delta) > 0$ such that a Gaussian-like upper bound holds [54, Theorem 2.4.4]:
\[
p_t(x, y) \le \bigl( \mu(B_{\sqrt t}(x))\, \mu(B_{\sqrt t}(y)) \bigr)^{-1/2} \exp\Bigl( c(\delta)(1 + t) - \frac{\rho(x,y)^2}{2\delta t} \Bigr)
\]
for all $t > 0$ and $x, y \in \mathcal M$. If we further assume $\phi_X$ is a probability measure, we have the lower bound
\[
p_t(x, y) \ge \exp\bigl( -K \rho(x,y)^2 / (2(e^{Kt} - 1)) \bigr)
\]
for $t > 0$ and $x, y \in \mathcal M$. The above examples and sufficient conditions show the complications of diffusion kernel control, and the necessity of Assumption 2(3).

S.1.6. General tools. In the analysis, we encounter various kinds of error controls in ratio form. We summarize common ones in the following lemmas.

Lemma S.1.1. Suppose that $X_t \xrightarrow{d} X$ as $t \to \infty$, with $X > 0$ almost surely. Moreover, suppose $A_t \xrightarrow{p} A$ and $B_t \xrightarrow{p} B$ for $A, B \in \mathbb{R}$ and $B \neq 0$, and $C_t, D_t \xrightarrow{p} 0$ as $t \to \infty$. Then we have convergence in probability of the following ratio:
\[
\frac{X_t A_t + C_t}{X_t B_t + D_t} \xrightarrow{p} \frac{A}{B} \quad \text{as } t \to \infty.
\]
Proof. Rewrite the ratio as
\[
\frac{X_t A_t + C_t}{X_t B_t + D_t} = \frac{A_t + c_t}{B_t + d_t},
\]
where $c_t = C_t / X_t$ and $d_t = D_t / X_t$. We have $c_t = o_p(1)$ and $d_t = o_p(1)$.
It follows that $A_t + c_t \xrightarrow{p} A$ and $B_t + d_t \xrightarrow{p} B$, and so by the continuous mapping theorem,
\[
\frac{A_t + c_t}{B_t + d_t} \xrightarrow{p} \frac{A}{B}. \qquad \square
\]
We have another technical lemma related to ratios.

Lemma S.1.2. Suppose $X_t \xrightarrow{d} X$ and $Y_t \xrightarrow{d} Y$ as $t \to \infty$, where $X, Y > 0$ almost surely. Suppose $A_t \to 0$ and $B_t \to 0$ deterministically. Then,
\[
\frac{X_t + A_t}{Y_t + B_t} = \frac{X_t}{Y_t} + O_p(A_t) + O_p(B_t).
\]
Proof. Rewrite
\[
\frac{X_t + A_t}{Y_t + B_t} = \frac{X_t}{Y_t} + A_t \Bigl( \frac{1}{Y_t} \frac{Y_t}{Y_t + B_t} \Bigr) - B_t \Bigl( \frac{X_t}{Y_t^2} \frac{Y_t}{Y_t + B_t} \Bigr).
\]
Since $Y_t/(Y_t + B_t) \xrightarrow{p} 1$ by Lemma S.1.1, and $1/Y_t \xrightarrow{d} 1/Y$ and $X_t/Y_t^2 \xrightarrow{d} X/Y^2$ by the continuous mapping theorem, we have
\[
\frac{1}{Y_t} \frac{Y_t}{Y_t + B_t} \xrightarrow{d} \frac{1}{Y} \quad \text{and} \quad \frac{X_t}{Y_t^2} \frac{Y_t}{Y_t + B_t} \xrightarrow{d} \frac{X}{Y^2}
\]
by Slutsky's theorem. Thus,
\[
A_t \Bigl( \frac{1}{Y_t} \frac{Y_t}{Y_t + B_t} \Bigr) = O_p(A_t) \quad \text{and} \quad B_t \Bigl( \frac{X_t}{Y_t^2} \frac{Y_t}{Y_t + B_t} \Bigr) = O_p(B_t),
\]
which completes the proof. $\square$

Next, we summarize stochastic integration by parts. To be self-contained, we include a portion of Theorem 2.2 of [52] in our Lemma S.1.3.

Lemma S.1.3 (Stochastic Fubini ([52], Theorem 2.2, Assertion (iii))). Let $(\Omega, \mathcal A, \mathbb P)$ be a probability space equipped with a complete, right-continuous filtration $(\mathcal F_t)_{t\ge0}$. Let $(\mathcal X, \Sigma, \mu)$ be a $\sigma$-finite measure space and let $S_t = M_t + A_t$ be a continuous semimartingale. Let $\psi : \mathcal X \times [0,T] \times \Omega \to \mathbb{R}$ be progressively measurable, and assume that, for almost all $\omega$,
\[
\int_{\mathcal X} \Bigl( \int_0^T |\psi(x,t,\omega)|^2\, d[M](t,\omega) \Bigr)^{1/2} d\mu(x) < \infty, \tag{S.10}
\]
\[
\int_{\mathcal X} \int_0^T |\psi(x,t,\omega)|\, d|A|(t,\omega)\, d\mu(x) < \infty. \tag{S.11}
\]
Then for almost all $\omega$ and every $t \in [0,T]$,
\[
\int_{\mathcal X} \Bigl( \int_0^t \psi(x,s,\omega)\, dS_s(\omega) \Bigr) d\mu(x) = \int_0^t \Bigl( \int_{\mathcal X} \psi(x,s,\omega)\, d\mu(x) \Bigr) dS_s(\omega). \tag{S.12}
\]

Lemma S.1.4. Let $0 < a < b$ and $f, g \in C(\mathcal M)$. Suppose $X_t$ is adapted to the Brownian motion $W_t$, and $M_t$ is a continuous martingale.
Then,
\[
\int_a^b \Bigl( \int_a^s M_v f(X_v)\, dv \Bigr) g(X_s)\, ds = \int_a^b M_s f(X_s) \Bigl( \int_s^b g(X_v)\, dv \Bigr) ds, \tag{S.13}
\]
\[
\int_a^b \Bigl( \int_a^s M_v f(X_v)\, dW_v \Bigr) g(X_s)\, ds = \int_a^b M_s f(X_s) \Bigl( \int_s^b g(X_v)\, dv \Bigr) dW_s \tag{S.14}
\]
\[
= \Bigl( \int_a^b M_s f(X_s)\, dW_s \Bigr) \Bigl( \int_a^b g(X_t)\, dt \Bigr) - \int_a^b M_s f(X_s) \Bigl( \int_a^s g(X_v)\, dv \Bigr) dW_s.
\]
Moreover, when $M_t \equiv 1$ and $g(X_t) \equiv 1$, we have the following stochastic integration by parts formulae:
\[
\int_a^b \Bigl( \int_a^s f(X_v)\, dv \Bigr) ds = \int_a^b (b - s) f(X_s)\, ds, \tag{S.15}
\]
\[
\int_a^b \Bigl( \int_a^s f(X_v)\, dW_v \Bigr) ds = \int_a^b (b - s) f(X_s)\, dW_s. \tag{S.16}
\]
Proof. For (S.13), consider the triangle $\{(v,s) : a \le v \le s \le b\}$ in the $(v,s)$-plane and apply Fubini's theorem. For (S.14), we verify that the stochastic Fubini theorem stated in Lemma S.1.3 applies. We take the parameter space $\mathcal X = [0, b]$ with Lebesgue measure, set $T = b$, and take the semimartingale $S_t = W_t$. Define the progressively measurable function $\psi : [0,b] \times [0,b] \times \Omega \to \mathbb{R}$ by
\[
\psi(v, t, \omega) := g(X_v(\omega))\, M_t(\omega) f(X_t(\omega))\, \mathbf 1\{a \le t \le v \le b\}, \quad (v, t) \in [0,b]^2.
\]
Since $S_t = W_t$, we have $A_t \equiv 0$, and thus condition (S.11) holds. It suffices to verify the square-integrability condition (S.10); i.e., for almost all $\omega$, $\int_0^b \bigl( \int_0^b |\psi(v,t,\omega)|^2\, dt \bigr)^{1/2} dv < \infty$. Using $\|g\|_\infty < \infty$ and that $\psi(v,t,\omega) = 0$ unless $a \le t \le v \le b$, we obtain for almost all $\omega$,
\[
\int_0^b \Bigl( \int_0^b |\psi(v,t,\omega)|^2\, dt \Bigr)^{1/2} dv = \int_a^b |g(X_v(\omega))| \Bigl( \int_a^v |M_t(\omega) f(X_t(\omega))|^2\, dt \Bigr)^{1/2} dv \le \|g\|_\infty (b - a) \Bigl( \int_a^b |M_t(\omega) f(X_t(\omega))|^2\, dt \Bigr)^{1/2} < \infty,
\]
where the last inequality follows since $M$ has continuous sample paths and is therefore bounded on the compact interval $[a, b]$ almost surely; together with $\|f\|_\infty < \infty$ this implies $\int_a^b |M_t(\omega) f(X_t(\omega))|^2\, dt < \infty$. Therefore, Lemma S.1.3 yields (S.12), so that we obtain
\[
\int_0^b \Bigl( \int_0^b \psi(v, r, \omega)\, dW_r(\omega) \Bigr) dv = \int_0^b \Bigl( \int_0^b \psi(v, r, \omega)\, dv \Bigr) dW_r(\omega), \quad \text{a.s.}
\]
By the definition of $\psi$, suppressing the $\omega$ notation, the left-hand side equals
\[
\int_a^b g(X_v) \Bigl( \int_a^v M_r f(X_r)\, dW_r \Bigr) dv = \int_a^b \Bigl( \int_a^s M_v f(X_v)\, dW_v \Bigr) g(X_s)\, ds,
\]
and the right-hand side equals
\[
\int_a^b M_r f(X_r) \Bigl( \int_r^b g(X_v)\, dv \Bigr) dW_r = \int_a^b M_s f(X_s) \Bigl( \int_s^b g(X_v)\, dv \Bigr) dW_s.
\]
This gives the first equality in (S.14). The second equality in (S.14) follows from the identity
\[
\int_s^b g(X_v)\, dv = \int_a^b g(X_v)\, dv - \int_a^s g(X_v)\, dv.
\]
When $M_t \equiv 1$ and $g \equiv 1$, the remaining claim follows immediately. $\square$

Appendix S.2. Technical lemmas

In this section, we provide technical lemmas to streamline the proofs of our main results. The first lemma involves calculations in the manifold setup.

Lemma S.2.1. Assume Assumptions 1, 2, and 4. Define
\[
U_1(x) := \frac{1}{h^d} \int_0^1 K\Bigl( \frac{D_x(X_s)}{h} \Bigr) H(X_s)\, ds, \tag{S.17}
\]
where $H \in C_b(\mathcal M)$ is non-negative and $K \in C(\mathbb{R})$ is non-negative and supported on $[0, L]$, where $L > 0$. Suppose $D_x(x') = \|\iota(x) - \iota(x')\|$. When $h \to 0$, we have
\[
E^{\mathcal M}_\lambda(U_1(x)) \to C_X H(x) p_X(x) \int_{\mathbb{R}^d} K(\|u\|)\, du,
\]
where $C_X$ is the constant defined in Proposition S.1.4. Define
\[
U_{-1}(x) := \frac{1}{h^d} \int_0^1 K\Bigl( \frac{D_x(X_s)}{h} \Bigr) \bigl( H(X_s) - H(x) \bigr)\, ds, \tag{S.18}
\]
where we further assume $H \in C^3(\mathcal M)$ and $K \in C^3(\mathbb{R})$. Since $D_x(x') = \|\iota(x) - \iota(x')\|$, we have
\[
\frac{E^{\mathcal M}_\lambda(U_{-1}(x))}{h^2} \to \frac{C_X}{2} \bigl( p_X(x) \Delta_{\mathcal M} H(x) + 2 \nabla_{\mathcal M} H(x) \cdot \nabla_{\mathcal M} p_X(x) \bigr) \int_{\mathbb{R}^d} K(\|u\|) \|u\|^2\, du
\]
when $h \to 0$, where $\Delta_{\mathcal M}$ and $\nabla_{\mathcal M}$ are the Laplace-Beltrami operator and covariant derivative on $(\mathcal M, g)$.

Proof. By Proposition S.1.4 with $f(\cdot) = \frac{1}{h^d} K\bigl( \frac{D_x(\cdot)}{h} \bigr) H(\cdot)$, we obtain
\[
E^{\mathcal M}_\lambda(U_1(x)) = C_X \Bigl\langle \phi_X, \frac{1}{h^d} K\Bigl( \frac{D_x(\cdot)}{h} \Bigr) H(\cdot) \Bigr\rangle_{\mathcal M} = C_X \int_{\mathcal M} \frac{1}{h^d} K\Bigl( \frac{D_x(x')}{h} \Bigr) H(x') p_X(x')\, dV_g(x').
\]
Similarly, by linearity of $E^{\mathcal M}_\lambda$, apply Proposition S.1.4 twice to get
\[
E^{\mathcal M}_\lambda(U_{-1}(x)) = C_X \int_{\mathcal M} \frac{1}{h^d} K\Bigl( \frac{D_x(x')}{h} \Bigr) \bigl( H(x') - H(x) \bigr) p_X(x')\, dV_g(x').
\]
Since $D_x(x') = \|\iota(x) - \iota(x')\|_{\mathbb{R}^p}$, recall that for $x' = \exp_x(t\theta)$ with $\|\theta\| = 1$, we have [36, Lemma 4]
\[
\iota(x) - \iota(x') = t\,\iota_*\theta + \frac12 t^2\, \mathrm{II}_x(\theta, \theta) + \frac16 t^3\, \nabla_\theta \mathrm{II}_x(\theta, \theta) + O(t^4),
\]
where $\mathrm{II}_x$ is the second fundamental form of the embedding $\iota$, and hence
\[
\|\iota(x) - \iota(x')\|_{\mathbb{R}^p} = t + \frac16 t^3\, \iota_*\theta^\top \nabla_\theta \mathrm{II}_x(\theta, \theta) + O(t^4)
\]
when $t$ is sufficiently small. Also recall that under the exponential map in polar coordinates, when $v = t\theta$, $\|\theta\| = 1$, and $t > 0$ is sufficiently small, we have [56, Lemma SI.1]
\[
|\det d\exp_x(v)| = t^{d-1} - \frac16 \mathrm{Ric}_x(\theta, \theta)\, t^{d+1} + O(t^{d+2}).
\]
With a change of variables under the exponential map $\exp_x$ in polar coordinates on $\mathbb{R}^d$ and Assumption 4, Taylor's expansion gives
\[
\int_{\mathcal M} \frac{1}{h^d} K\Bigl( \frac{D_x(x')}{h} \Bigr) H(x') p_X(x')\, dV_g(x') = \int_{\mathbb{R}^d} \frac{1}{h^d} \Bigl[ K\Bigl( \frac{t}{h} \Bigr) + K'\Bigl( \frac{t}{h} \Bigr) \frac{t^3\, \iota_*\theta^\top \nabla_\theta \mathrm{II}_x(\theta, \theta)}{6h} + O(t^4) \Bigr] \times \Bigl[ H(x) + t\, \nabla H|_x \theta + \frac{t^2}{2} \nabla^2 H|_x(\theta, \theta) + O(t^3) \Bigr]
\]
\[
\times \Bigl[ p_X(x) + t\, \nabla p_X|_x \theta + \frac{t^2}{2} \nabla^2 p_X|_x(\theta, \theta) + O(t^3) \Bigr] \times \Bigl[ t^{d-1} - \frac16 \mathrm{Ric}_x(\theta, \theta)\, t^{d+1} + O(t^{d+2}) \Bigr] dt\, d\theta,
\]
where in the Taylor expansion of $K$ we use the fact that $t \le Lh$. Thus, by collecting all terms of order $t^{d-1}$, we have
\[
E^{\mathcal M}_\lambda(U_1(x)) \to C_X H(x) p_X(x) \int_{\mathbb{R}^d} K(\|u\|)\, du
\]
as $h \to 0$, where the convergence holds since $K$ is uniformly continuous with compact support. Regarding $E^{\mathcal M}_\lambda(U_{-1}(x))$, since the dominant term is removed and all terms of order $t^d$ are odd and cancel out by the kernel symmetry, we collect the terms of order $t^{d+1}$ and obtain
\[
\frac{1}{h^2} E^{\mathcal M}_\lambda(U_{-1}(x)) \to \frac{C_X}{2} \bigl( p_X(x) \Delta_{\mathcal M} H(x) + 2 \nabla_{\mathcal M} H(x) \cdot \nabla_{\mathcal M} p_X(x) \bigr) \int_{\mathbb{R}^d} K(\|u\|) \|u\|^2\, du
\]
as $h \to 0$. $\square$

Lemma S.2.2. Assume Assumptions 1 and 2 hold.
Denote the LL-embedding of $X_t$ as $Z_t$, which is defined on $\mathbf M := \mathcal M \times [0,1] \times \mathcal M$. Denote the generalized life-cycle decomposition determined from $Z_t$ as $\{S_m, R_m\}_{m=0}^\infty$, where $R_0 = S_0 = 0$. Set
\[
N_t = \sum_{m\ge1} \mathbf 1\{S_m \le t\}, \tag{S.19}
\]
where $t > 0$. Asymptotically as $T \to \infty$, we have
\[
N_T - E^{\mathbf M}_\lambda(N_T) = O_p\bigl( \Upsilon(T)^{1/2} \bigr)
\]
and
\[
\frac{N_{sT}}{\Upsilon(T)} \xrightarrow{d} C_X^{-1} g_\alpha(s), \quad s > 0,
\]
in $D(\mathbb{R}_+, \mathbb{R})$ as $T \to \infty$, where $g_\alpha$ is the Mittag-Leffler process of index $\alpha \in (0, 1]$.

Proof. By Proposition S.1.1, $Z_t$ is Harris recurrent, and by definition $N_t$ is an integrable additive functional of $Z_t$. First, by Proposition S.1.4,
\[
E^{\mathbf M}_{\phi_X}(N_1) = C_X^{-1} E^{\mathbf M}_\xi(N_{R_1}),
\]
where $\xi$ is the probability measure supported on a nontrivial compact set $C$ used in the construction of $Z_t$. By the construction of the generalized life-cycle decomposition,
\[
N_{R_1} = \sum_{m\ge1} \mathbf 1\{S_m \le R_1\} = 1.
\]
Therefore, by plugging $\xi(x) \otimes U(u) \otimes Q(x, u, dy)$ into the expectation, since $\int_{\mathcal M} \int_0^1 Q(x, u)\, dy\, du = U_1(x, dy)$ and $\int_{\mathcal M} U_1(x, dy) = 1$ for any $x$, we have
\[
E^{\mathbf M}_\xi(N_{R_1}) = \int_{(x,u,y)\in\mathbf M} Q(x, u, dy)\, du\, d\xi(x) = 1,
\]
since $\xi$ is a probability measure.

To study $\frac{N_{sT}}{\Upsilon(T)}$, note that we assume Proposition 4.2(1) holds for $X_t$ in Assumption 3. We claim that the same assumption holds for $Z_t$ by the ratio limit theorem (see Theorem 4.1). For a bounded measurable function $f : \mathbf M \to \mathbb{R}$ satisfying $f(x, u, y) = f_1(x)$, where $f_1$ is a bounded measurable function on $\mathcal M$, and another bounded measurable function $g : \mathcal M \to \mathbb{R}$ so that $\langle g, \phi_X \rangle_{\mathcal M} \neq 0$, we have
\[
\frac{E^{\mathbf M}_\lambda\bigl( \int_0^T f(Z_t)\, dt \bigr)}{E^{\mathcal M}_\lambda\bigl( \int_0^T g(X_t)\, dt \bigr)} = \frac{E^{\mathbf M}_\lambda\bigl( \int_0^T f_1(Z^1_t)\, dt \bigr)}{E^{\mathcal M}_\lambda\bigl( \int_0^T g(X_t)\, dt \bigr)} = \frac{E^{\mathcal M}_\lambda\bigl( \int_0^T f_1(X_t)\, dt \bigr)}{E^{\mathcal M}_\lambda\bigl( \int_0^T g(X_t)\, dt \bigr)} \to \frac{\langle f_1, \phi_X \rangle_{\mathcal M}}{\langle g, \phi_X \rangle_{\mathcal M}}
\]
as $T \to \infty$, where the second equality holds by Proposition S.1.2 and the limit holds by the ratio limit theorem.
Therefore, $E^{\mathcal M}_\lambda\bigl( \int_0^T g(X_t)\, dt \bigr) \sim \Upsilon(T)$ implies $E^{\mathbf M}_\lambda\bigl( \int_0^T f(Z_t)\, dt \bigr) \sim \Upsilon(T)$, and hence $Z_t$ satisfies the condition in Proposition 4.2(1). Since $0 < E^{\mathbf M}_{\phi_X}(N_1) = C_X^{-1} < \infty$, we have
\[
\frac{N_{sT}}{\Upsilon(T)} \xrightarrow{d} C_X^{-1} g_\alpha(s)
\]
in $D(\mathbb{R}_+, \mathbb{R})$ as $T \to \infty$, where $g_\alpha$ is the Mittag-Leffler process of index $\alpha$. For the additive functional $A_t := N_t - E^{\mathbf M}_\lambda(N_t)$, note that it is compensated and hence a martingale. So we can apply Proposition S.1.6 to $A_t$ and obtain
\[
\frac{A_T}{\sqrt{\Upsilon(T)}} \xrightarrow{d} \sqrt{E^{\mathbf M}_{\phi_X}([A_1])}\; W_\alpha(1)
\]
as $T \to \infty$. It follows that $A_T = O_p\bigl( \Upsilon(T)^{1/2} \bigr)$. $\square$

The following Master lemma generalizes results in [8] to the manifold setup, where we carefully address the geometric challenges.

Lemma S.2.3 (Master lemma). Assume Assumptions 1, 2, 3, and 4 hold. Define
\[
I(x) := \frac{1}{h^d} \int_0^T K\Bigl( \frac{D_x(X_s)}{h} \Bigr) H(X_s)\, ds, \tag{S.20}
\]
where $H \in C(\mathcal M)$ is non-negative and bounded and $K \in C(\mathbb{R})$ is non-negative and supported on $[0, L]$ for $L > 0$. When $D(x') = \|x - x'\|_{\mathbb{R}^p}$, as $h \to 0$ and $T \to \infty$,
\[
\frac{I(x)}{\Upsilon(T)} \xrightarrow{d} g_\alpha(1)\, p_X(x) H(x) \int_{\mathbb{R}^d} K(\|u\|)\, du.
\]
Our drift and diffusion estimators naturally take the form of ratios of two additive functionals as in (S.20), where the denominator estimates the occupation density.

Proof of Lemma S.2.3. Denote by $\lambda$ the initial measure for $X_t$ on $\mathcal M$. Recall that we assume $\lambda$ is compactly supported on $C \subset \mathcal M$ and $dV_g(C) > 0$ in Assumption 2. Adapt the LL embedding detailed in Section S.1.1 with the probability measure $\xi$ equivalent to $\phi_X(\cdot \cap C)$. Since we assume $p_X$ is strictly positive on $\mathcal M$, we have $\phi_X(C) > 0$. The LL-embedded $X_t$ is denoted as $Z_t$. For any initial measure $\lambda$ of $X_t$, let $E^{\mathbf M}_\lambda$ be defined as in (S.5), where $\mathbf M := \mathcal M \times [0,1] \times \mathcal M$. Recall the generalized life-cycle decomposition for the process, $\{S_m, R_m\}_{m=0}^\infty$, where $R_0 = S_0 = 0$, determined from $Z_t$. Denote $K_h(\cdot) := \frac{1}{h^d} K\bigl( \frac{D_x(\cdot)}{h} \bigr)$.
For $m \in \mathbb N \cup \{0\}$, set
\[
U_m := \int_{R_m}^{R_{m+1}} K_h(X_s) H(X_s)\, ds, \tag{S.21}
\]
which yields a decomposition of $I(x)$. Recall Proposition S.1.3. By construction, $U_l$ is a stationary ergodic sequence under $P_\xi$, so that $U_1, U_3, U_5, \dots$ and $U_0, U_2, U_4, \dots$ are two sequences of i.i.d. random variables. Also set
\[
N_T = \sum_{m\ge1} \mathbf 1\{S_m \le T\}, \tag{S.22}
\]
which is an integrable additive functional of $Z_t$. Decompose
\[
\frac{I(x)}{\Upsilon(T)} = \underbrace{\frac{I(x)}{\Upsilon(T)} - E^{\mathbf M}_\lambda\Bigl( \frac{I(x)}{\Upsilon(T)} \Bigr)}_{\mathcal A} + \underbrace{E^{\mathbf M}_\lambda\Bigl( \frac{I(x)}{\Upsilon(T)} \Bigr) - \frac{N_T}{\Upsilon(T)} E^{\mathbf M}_\lambda(U_1(x))}_{\mathcal B} + \underbrace{\frac{N_T}{\Upsilon(T)} E^{\mathbf M}_\lambda(U_1(x))}_{\mathcal C}.
\]
Here $\mathcal A$ replaces $\frac{I(x)}{\Upsilon(T)}$ by its deterministic equivalent, $\mathcal B$ controls the error of replacing $E^{\mathbf M}_\lambda\bigl( \frac{I(x)}{\Upsilon(T)} \bigr)$ by its generalized life-cycle decomposition, and $\mathcal C$ is the main term to quantify; i.e., to finish the proof, we show that asymptotically $\mathcal A$ and $\mathcal B$ are negligible and $\mathcal C$ is the dominant term.

Claim S.2.1. The quantity $\mathcal A = O_p\bigl( \frac{1}{\Upsilon(T)^{1/2}} \bigr)$.

Proof of Claim S.2.1. Since $X_t$ is strong Feller by Assumption 2 and $K\bigl( \frac{D_x(\cdot)}{h} \bigr) H(\cdot)$ is a special function of positive measure (in the sense of [34]), by (S.6) we have
\[
E^{\mathbf M}_\lambda\Bigl( \frac{I(x)}{\Upsilon(T)} \Bigr) = E^{\mathcal M}_\lambda\Bigl( \frac{I(x)}{\Upsilon(T)} \Bigr),
\]
since the first coordinate of $Z_t$ follows the same law as $X_t$ (see Proposition S.1.2). By Chebyshev's inequality,
\[
P^{\mathcal M}_\lambda(|\mathcal A| \ge \zeta) \le \frac{1}{\zeta^2} E^{\mathcal M}_\lambda\biggl( \Bigl( \frac{I(x)}{\Upsilon(T)} - E^{\mathcal M}_\lambda\Bigl( \frac{I(x)}{\Upsilon(T)} \Bigr) \Bigr)^2 \biggr).
\]
Recall $\tilde g_{s,t}(a, b) = \tilde\phi_{s,t}(a, b) - \tilde\phi_s(a) \tilde\phi_t(b)$ in (15). Then rewrite the above quantity in terms of quantifying path dependence, i.e.,
\[
\frac{1}{\zeta^2} E^{\mathcal M}_\lambda\biggl( \Bigl( \frac{I(x)}{\Upsilon(T)} - E^{\mathcal M}_\lambda\Bigl( \frac{I(x)}{\Upsilon(T)} \Bigr) \Bigr)^2 \biggr) = \frac{1}{\zeta^2 \Upsilon^2(T)} \iint_{[0,T]^2} \mathrm{Cov}\bigl( K_h(X_s) H(X_s), K_h(X_t) H(X_t) \bigr)\, ds\, dt
\]
\[
= \frac{1}{\zeta^2 \Upsilon^2(T)} \iint_{[0,T]^2} \Bigl( \iint_{\mathcal M\times\mathcal M} K_h(a) H(a) K_h(b) H(b)\, \tilde g_{s,t}(a, b)\, dV_{g\otimes g}(a, b) \Bigr) ds\, dt,
\]
where $dV_{g\otimes g}(a, b)$ is the Riemannian volume form of the product metric $g \otimes g$ on the product manifold $\mathcal M \times \mathcal M$, and the last equality comes from the fact that
\[
E^{\mathcal M}_\lambda\bigl( K_h(X_s) H(X_s) \bigr) = \int_{\mathcal M} K_h(a) H(a) \tilde\phi_s(a)\, dV_g(a).
\]
This formula allows us to produce an ergodic-like result as $T \to \infty$ following our assumptions on $\tilde g_{s,t}$. By assumption, the joint and marginal densities are continuous and bounded on $\mathcal M$ [49], and hence on $\mathcal M \times \mathcal M$. Thus, $\tilde g_{s,t}(a, b) \in L^\infty(\mathcal M\times\mathcal M, dV_{g\otimes g})$. By Hölder's inequality, for $1/q + 1/r = 1$, where $r$ is from Assumption 3,
\[
\iint_{\mathcal M\times\mathcal M} K_h(a) H(a) K_h(b) H(b)\, \tilde g_{s,t}(a, b)\, dV_{g\otimes g}(a, b) \le \frac{1}{h^{2d(1 - 1/q)}} \|\tilde g_{s,t}(a, b)\|_r \Bigl( \iint_{\mathcal M\times\mathcal M} (K^q)_h(a) H^q(a) (K^q)_h(b) H^q(b)\, da\, db \Bigr)^{1/q},
\]
where the functional norms are with respect to the measure $dV_{g\otimes g}$ on $\mathcal M\times\mathcal M$. Note that by the regularity assumption and the approximation of identity,
\[
\sup_{h>0} \Bigl| \iint_{\mathcal M\times\mathcal M} (K^q)_h(a) H^q(a) (K^q)_h(b) H^q(b)\, da\, db \Bigr| \le C_1
\]
for some constant $C_1 > 0$. Indeed, by Fubini,
\[
\bigl\| (K^q)_h(a) H^q(a) (K^q)_h(b) H^q(b) \bigr\|_{L^1(\mathcal M\times\mathcal M)} = \Bigl( \int_{\mathcal M} (K^q)_h(a) H^q(a)\, dV_g(a) \Bigr)^2 \le \|H^q\|^2_{L^\infty(\mathcal M)} \Bigl( \int_{\mathcal M} (K^q)_h(a)\, dV_g(a) \Bigr)^2.
\]
By the approximation-of-identity estimate, there exists a geometric constant $C_{\mathrm{geo}} > 0$ such that
\[
\sup_{h>0} \int_{\mathcal M} (K^q)_h(a)\, dV_g(a) \le C_{\mathrm{geo}} \int_{\mathbb{R}^d} K^q(\|u\|)\, du,
\]
and one may take $C_1 = \bigl( \|H^q\|_{L^\infty(\mathcal M)}\, C_{\mathrm{geo}} \int_{\mathbb{R}^d} K^q(\|u\|)\, du \bigr)^2$. Hence,
\[
\iint_{\mathcal M\times\mathcal M} K_h(a) H(a) K_h(b) H(b)\, \tilde g_{s,t}(a, b)\, dV_{g\otimes g} \le \frac{1}{h^{2d(1 - 1/q)}}\, C_1^{1/q} \|\tilde g_{s,t}(a, b)\|_r.
\]
Finally, we have
\[
P^{\mathcal M}_\lambda\bigl( |\mathcal A|\, \Upsilon(T)^{1/2} \ge \zeta \bigr) \le \frac{C_1^{1/q}}{\zeta^2} \Bigl( \frac{1}{\Upsilon(T) h^{2d(1 - 1/q)}} \iint_{[0,T]^2} \|\tilde g_{s,t}(a, b)\|_r\, ds\, dt \Bigr),
\]
which goes to $0$ as $T \to \infty$ by Assumptions 3 and 6. We thus conclude $\mathcal A = O_p\bigl( \frac{1}{\Upsilon(T)^{1/2}} \bigr)$. $\square$

Claim S.2.2. The quantity $\mathcal B = O_p\bigl( \frac{1}{\Upsilon(T)^{1/2}} \bigr)$.

Proof of Claim S.2.2.
Recall that by construction, $S_0 = R_0 = 0$ and $S_{N_T} \le T < S_{N_T+1} < R_{N_T+1}$, which leads to
\[
I(x) = \sum_{m=0}^{N_T} U_m - \int_T^{R_{N_T+1}} K_h(X_s) H(X_s)\, ds.
\]
Thus, the quantity $\mathcal B$ can be decomposed into an initial term, a bulk sum, and a terminal term:
\[
\mathcal B = \frac{1}{\Upsilon(T)} E^{\mathbf M}_\lambda\Bigl( \sum_{m=0}^{N_T} U_m - \int_T^{R_{N_T+1}} K_h(X_s) H(X_s)\, ds \Bigr) - \frac{N_T}{\Upsilon(T)} E^{\mathbf M}_\lambda(U_1(x))
\]
\[
= \underbrace{E^{\mathbf M}_\lambda\Bigl( \frac{U_0(x)}{\Upsilon(T)} \Bigr)}_{\mathcal B_1} + \underbrace{E^{\mathbf M}_\lambda\Bigl( \frac{1}{\Upsilon(T)} \sum_{m\ge1} \mathbf 1\{S_m \le T\} U_m \Bigr) - \frac{N_T}{\Upsilon(T)} E^{\mathbf M}_\lambda(U_1(x))}_{\mathcal B_2} - \underbrace{E^{\mathbf M}_\lambda\Bigl( \frac{1}{\Upsilon(T)} \int_T^{R_{N_T+1}} K_h(X_s) H(X_s)\, ds \Bigr)}_{\mathcal B_3},
\]
where we rewrite the term $\sum_{m=1}^{N_T} U_m$ as $\sum_{m\ge1} \mathbf 1\{S_m \le T\} U_m$ in $\mathcal B_2$ and $\mathbf 1$ is the indicator function. To see this, since $S_l < T$ for all $1 \le l \le N_T$, we have $U_l = \mathbf 1\{S_l < T\} U_l$. On the other hand, since $T < S_{N_T+1} < R_{N_T+1}$ by construction, the terms $\mathbf 1\{S_{N_T+1} < T\} U_{N_T+1} = \mathbf 1\{S_{N_T+k} < T\} U_{N_T+k} = 0$ for all $k > 1$. Note that $\mathcal B_1$ and $\mathcal B_3$ are boundary terms, while $\mathcal B_2$ is the dominant term. Also note that the distribution of $Z_0$ is in general different from that of $Z_{R_n}$ for $n \ge 1$, unless we choose $\lambda = \xi$ [35, Proposition 4.4], so $\mathcal B_1$ is separated from $\mathcal B_2$.

Remark 6. Rewriting $\sum_{m=1}^{N_T} U_m$ as $\sum_{m\ge1} \mathbf 1\{S_m \le T\} U_m$ may at first appear superficial, but it is in fact necessary. Recall that the generalized life-cycle decomposition of [34] produces a dual sequence of stopping times $\{S_m, R_m\}_{m=0}^\infty$ satisfying $S_m < R_m < S_{m+1} < \cdots$. The construction guarantees that $X_{R_m + s}$, for $s > 0$, is independent of the sigma algebra $\sigma\{X_t \mid t \le S_m\}$ but not $\sigma\{X_t \mid t \le R_m\}$. Consequently, the decision whether to include $U_m$ in the sum must be made at time $S_m < R_m$.

We start with $\mathcal B_2$. Since $U_m$ and $\mathbf 1\{S_m < T\}$ are independent, the expectation becomes
\[
E^{\mathbf M}_\lambda\bigl( \mathbf 1\{S_m \le T\}\, U_m(x) \bigr) = E^{\mathbf M}_\lambda(U_m(x))\; E^{\mathbf M}_\lambda\bigl( \mathbf 1\{S_m \le T\} \bigr).
\]
Recall that the sequence $U_m$, $m \ge 0$, is a stationary ergodic sequence under $P_\xi$ [35, Proposition 4.4] arising from the generalized life-cycle decomposition. By Proposition S.1.4, with the general initial measure $\lambda$, we have $E^{\mathbf M}_\lambda[U_1(x)] = \cdots = E^{\mathbf M}_\lambda[U_{N_T}(x)]$, and hence
\[
\mathcal B_2 = E^{\mathbf M}_\lambda\Bigl( \frac{1}{\Upsilon(T)} \sum_{m\ge1} \mathbf 1\{S_m \le T\} U_m(x) \Bigr) - \frac{N_T}{\Upsilon(T)} E^{\mathbf M}_\lambda(U_1(x)) = \Upsilon(T)^{-1} \Bigl( E^{\mathbf M}_\lambda(U_1(x))\, E^{\mathbf M}_\lambda\Bigl( \sum_{m\ge1} \mathbf 1\{S_m \le T\} \Bigr) - N_T\, E^{\mathbf M}_\lambda(U_1(x)) \Bigr) = E^{\mathbf M}_\lambda(U_1(x))\, \Upsilon(T)^{-1} \bigl( E^{\mathbf M}_\lambda(N_T) - N_T \bigr).
\]
Since $E^{\mathbf M}_\lambda(U_1(x)) = E^{\mathcal M}_\lambda(U_1(x))$ by (S.6), $E^{\mathbf M}_\lambda(U_1(x))$ is analyzed by Lemma S.2.1. With Lemma S.2.2 for $E^{\mathbf M}_\lambda(N_T) - N_T$, it follows that $\mathcal B_2 = O_p\bigl( \Upsilon(T)^{-1/2} \bigr)$.

Next we control $\mathcal B_1$. By Assumption 2, the density function of $\lambda$ is bounded, so we write $\lambda(dx) = p_\lambda(x)\, dV_g(x)$, where $p_\lambda(x)$ is bounded on $C$. On the other hand, $\phi_X(dx) = p_X(x)\, dV_g(x)$, where $p_X$ is strictly positive by assumption. Since $\lambda$ is compactly supported on $C$, the same as the support of $\xi$ by construction, we have by Radon-Nikodym that
\[
\lambda(dx) = \frac{p_\lambda(x)}{p_X(x)}\, p_X(x)\, dV_g(x).
\]
Since $K$ and $H$ are non-negative, we have
\[
E^{\mathbf M}_\lambda(U_0(x)) = \int_{(x,u,x')\in\mathbf M} U_0(x)\, \lambda(dx) \otimes U(du) \otimes Q((x, u), dx') \le \Bigl( \sup_{x\in C} \frac{p_\lambda(x)}{p_X(x)} \Bigr) \int_{\mathbf M} U_0(x)\, \xi(dx) \otimes U(du) \otimes Q((x, u), dx') = \Bigl( \sup_{x\in C} \frac{p_\lambda(x)}{p_X(x)} \Bigr) E^{\mathbf M}_\xi(U_0(x)).
\]
By assumption, $\sup_{x\in C} \frac{p_\lambda(x)}{p_X(x)}$ is bounded, and $E^{\mathbf M}_\xi(U_0(x))$ is bounded by Proposition S.1.4, so $E^{\mathbf M}_\lambda(U_0(x))$ is bounded. Since $\Upsilon(T)\, \mathcal B_1 = E^{\mathbf M}_\lambda(U_0(x))$, we conclude that $\mathcal B_1 = O_p(\Upsilon(T)^{-1})$.

Finally, since $K$ and $H$ are non-negative functions,
\[
|\mathcal B_3| = E^{\mathbf M}_\lambda\Bigl( \frac{1}{\Upsilon(T)} \int_T^{R_{N_T+1}} K_h(X_s) H(X_s)\, ds \Bigr) \le E^{\mathbf M}_\lambda\Bigl( \frac{1}{\Upsilon(T)} \int_{R_{N_T}}^{R_{N_T+1}} K_h(X_s) H(X_s)\, ds \Bigr) = \Upsilon(T)^{-1} E^{\mathbf M}_\lambda(U_{N_T}(x)) = \Upsilon(T)^{-1} E^{\mathbf M}_\lambda(U_1(x)),
\]
where the last equality holds by Proposition S.1.4. Since $E^{\mathbf M}_\lambda(U_1(x)) = E^{\mathcal M}_\lambda(U_1(x))$ can be controlled by Lemma S.2.1, we have $\mathcal B_3 = O_p(\Upsilon(T)^{-1})$. In aggregate, we obtain the claim that $\mathcal B = O_p(\Upsilon(T)^{-1/2})$. $\square$

Finally, we control $\mathcal C$. The quantities $E^{\mathbf M}_\lambda(U_1(x))$ and $\frac{N_T}{\Upsilon(T)}$ can be controlled by Lemmas S.2.1 and S.2.2. By Proposition S.1.5, $g_\alpha(1) \sim G_\alpha$. As a result, since $D_x(x') = \|\iota(x) - \iota(x')\|$, we have
\[
\mathcal C \xrightarrow{d} G_\alpha\, p_X(x) H(x) \int_{\mathbb{R}^d} K(\|u\|)\, du
\]
as $h \to 0$ and $T \to \infty$.

We can now finalize the proof of Lemma S.2.3 by assembling the above claims. By Claims S.2.1 and S.2.2, we have the controls $\mathcal A = O_p\bigl( \frac{1}{\Upsilon(T)^{1/2}} \bigr)$ and $\mathcal B = O_p\bigl( \frac{1}{\Upsilon(T)^{1/2}} \bigr)$, respectively. With the control of $\mathcal C$, we finish the proof. $\square$

Remark 7. Using our notation, the initial measure of $X_t$ in [8, Lemma A.1] is implicitly assumed to be $\xi$ (see the proof on page 924, line 19 of [8]). We relax this assumption by assuming the distribution of $X_0$ is compactly supported on a nontrivial compact subset with a bounded density function.

Next, we need to prepare some controls of the kernel function when derivatives of $D_x(\cdot)$ are involved. As a motivation, consider $\mathcal M = \mathbb{R}^2$ with $K(s) = e^{-s^2}$ and $D_0(u) = \sqrt{u_1^2 + u_2^2}$ for $u \in \mathbb{R}^2$. Then, even though
\[
\frac{\partial^2}{\partial u_1 \partial u_2} D_0(u) = -\frac{u_1 u_2}{(u_1^2 + u_2^2)^{3/2}}
\]
is unbounded near $0$, the product function
\[
K'\Bigl( \frac{D_0(u)}{h} \Bigr) \frac{\partial^2 D_0}{\partial u_1 \partial u_2}(u) = \frac{2}{h}\, e^{-(u_1^2 + u_2^2)/h^2}\, \frac{u_1 u_2}{u_1^2 + u_2^2}
\]
is bounded near $0$. This holds due to the fact that $\frac{\partial^2 D_0}{\partial u_1 \partial u_2}(u) = O(\|u\|^{-1})$ at zero, while $K'(\|u\|)$ is approximately linear, with no constant term, near $\|u\| = 0$.
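A quick numeric sanity check of this cancellation near the origin, with $K(s) = e^{-s^2}$ and $D_0(u) = \sqrt{u_1^2 + u_2^2}$ (bandwidth $h = 0.1$ and grid size are illustrative):

```python
import numpy as np

h = 0.1
u1, u2 = np.meshgrid(*(2 * [np.linspace(-0.05, 0.05, 401)]))
r2 = u1**2 + u2**2
r2[r2 == 0] = np.nan                                 # exclude the origin itself
d2D = -u1 * u2 / r2**1.5                             # mixed derivative of D_0, unbounded near 0
Kp = -2 * (np.sqrt(r2) / h) * np.exp(-r2 / h**2)     # K'(D_0(u)/h) for K(s) = exp(-s^2)
product = Kp * d2D
print(np.nanmax(np.abs(d2D)) > 1e3)                  # True: the derivative alone blows up
print(np.nanmax(np.abs(product)) < 1.01 / h)         # True: the product stays O(1/h)
```

The bound $|u_1 u_2| \le (u_1^2 + u_2^2)/2$ shows analytically that the product is at most $1/h$ in magnitude, which is what the grid maximum confirms.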
Since such a term involving derivatives appears frequently in our analysis, we verify in Lemma S.2.5 that this property, present in the Euclidean distance setting, generalizes to the manifold setup.

Lemma S.2.4. Suppose the manifold model in Definition 1 holds. For each $x \in N$, when $r > 0$ is sufficiently small, the designed $D_x : B_x(r) \to [0, \infty)$ satisfies $D_x(x') = \tilde D_x(d_g(x, x'), \theta_{x'})$, where $x' = \exp_x(d_g(x, x')\, \theta_{x'}) \in B_x(r)$, for some non-negative function $\tilde D_x$ defined on $[0, r) \times S^{d-1}$ with $\tilde D_x(0, \theta) = 0$ for all $\theta \in S^{d-1}$. If we set $\chi_x : \mathbb{R}^d \setminus \{0\} \to \mathbb{R}$ by
\[
\chi_x(u) = \frac{\partial \tilde D_x}{\partial \rho}\Bigl( 0, \frac{u}{\|u\|} \Bigr),
\]
then $\chi_x$ is positive and symmetric, i.e., $\chi_x(u) = \chi_x(-u)$ for all $u \in \mathbb{R}^d \setminus \{0\}$. Moreover, with local coordinates $(u_1, \dots, u_d)$ around $x$,
\[
\frac{\partial^2}{\partial u_i \partial u_j} D_x(x') = O\Bigl( \frac{1}{D_x(x')} \Bigr)
\]
for all $i, j = 1, \dots, d$.

Proof. When $D_x(x') = \|\iota(x) - \iota(x')\|_{\mathbb{R}^p}$ is sufficiently small, we have [56, Lemma SI.3] $\|\iota(x) - \iota(x')\|_{\mathbb{R}^p} = d_g(x, x') + O(d_g(x, x')^3)$. Therefore, $\tilde D_x(\rho, \theta) = \rho + O(\rho^3)$, where the implied constant depends on $\theta$, and hence $\chi_x(u) = 1$. Now work in normal coordinates: for $x'$ expressed as $u' = (u'_1, \dots, u'_d)$, we have $D_x(x') = \|u'\| + O(\|u'\|^3)$. Differentiating twice with respect to the coordinates $u_i$ and $u_j$, we see that
\[
\frac{\partial^2}{\partial u_i \partial u_j} D_x(x') = O\Bigl( \frac{1}{D_x(x')} \Bigr)
\]
for all $i, j = 1, \dots, d$. $\square$

Lemma S.2.5. Let $K \in C^3(\mathbb{R})$ be non-negative, compactly supported, and symmetric. Let $x \in N$ and let $D_x$ be defined as in Lemma S.2.4. Then $\nabla D_x(\cdot)$ is bounded on $B_c(x) \setminus \{x\}$, for some $c > 0$.
Moreover, for all $1 \le i, j \le d$, there exists a bounded function $\tilde K$ satisfying the requirements for the kernel used in Lemma S.2.3 and depending on $K''(0)$ so that
\[
\Bigl| K'\Bigl( \frac{D_x(x')}{h} \Bigr) \frac{\partial^2 D_x}{\partial u_i \partial u_j}(x') \Bigr| \le \frac{1}{h} \tilde K\Bigl( \frac{D_x(x')}{h} \Bigr)
\]
for $x' \in B_{c'}(x) \setminus \{x\}$ for some $c' > 0$, as $h \to 0$.

Proof. By the Taylor expansion, for $u_i \in T_x\mathcal M$ with $\|u_i\| = 1$, we have
\[
\lim_{h\to0} \frac{D_x(\exp_x(h u_i))}{h} = \frac{\partial \tilde D_x}{\partial \rho}(0, u_i) = \chi_x(u_i),
\]
where $\tilde D_x$ and $\chi_x$ are defined in Lemma S.2.4. Since $\chi_x$ is continuous and $S^{d-1}$ is compact, we can conclude that $|\chi_x(u_i)| \le C$ for all $u_i \in S^{d-1}$ and some $C > 0$. So $\nabla D_x$ is bounded on $B_c(x) \setminus \{x\}$ for some $c > 0$.

For the second claim, note that given our symmetry assumption on $K$, the Taylor expansion gives $K(s) = K(0) + \frac{K''(0)}{2} s^2 + O(s^4)$, and hence
\[
K'(s) = K''(0)\, s + O(s^3).
\]
For $s = \frac{D_x(x')}{h}$, we obtain
\[
K'\Bigl( \frac{D_x(x')}{h} \Bigr) = K''(0)\, \frac{D_x(x')}{h} + O\Bigl( \Bigl( \frac{D_x(x')}{h} \Bigr)^3 \Bigr). \tag{S.23}
\]
By Lemma S.2.4, there exist $c', C' > 0$ so that for all $x' \in B_{c'}(x) \setminus \{x\}$, we have
\[
\Bigl| \frac{\partial^2 D_x}{\partial u_i \partial u_j}(x') \Bigr| \le \frac{C'}{D_x(x')}. \tag{S.24}
\]
Note that $C', c'$ do not depend on $x$ due to the manifold assumption. It follows from (S.23) and (S.24) that for all $x' \in B_{c'}(x) \setminus \{x\}$,
\[
\Bigl| K'\Bigl( \frac{D_x(x')}{h} \Bigr) \frac{\partial^2 D_x}{\partial u_i \partial u_j}(x') \Bigr| \le \frac{1}{h} \Bigl( C' |K''(0)| + O\Bigl( \Bigl( \frac{D_x(x')}{h} \Bigr)^2 \Bigr) \Bigr).
\]
Since locally a manifold is well approximated by a linear affine subspace, we have $\bigl| \frac{D_x(x')}{h} \bigr| \le C$, where $C > 0$ is independent of $x$, when $c'$ is sufficiently small. As a result, we have
\[
\Bigl| K'\Bigl( \frac{D_x(x')}{h} \Bigr) \frac{\partial^2 D_x}{\partial u_i \partial u_j}(x') \Bigr| \le \frac{1}{h} \bigl( C' |K''(0)| + C^2 \bigr).
\]
By the compact support assumption on $K$, suppose $\mathrm{supp}\, K = [0, L]$.
We can construct $\tilde{K} \in C^3(\mathbb{R})$ so that $\tilde{K}(t) = C'K''(0) + C^2$ on $[0,L]$ and decays to $0$ smoothly on $[L, 2L]$. Then $\tilde{K}$ satisfies the requirements for the kernel used in Lemma S.2.3, so that for all $x' \in B_{c'}(x) \setminus \{x\}$,
$$\Big| K'\Big(\frac{D_x(x')}{h}\Big)\frac{\partial^2 D_x}{\partial u_i \partial u_j}(x') \Big| \le \frac{1}{h}\tilde{K}\Big(\frac{D_x(x')}{h}\Big). \qquad \square$$

Before analyzing the discretization of additive functionals related to the estimators of interest, we establish another technical lemma for the stochastic integral.

Lemma S.2.6. Let $f \in C^3(\mathbb{R})$ be non-negative and compactly supported on $[0,L] \subset [0,\infty)$. Grant Assumptions 1, 2, 3, and 4. Take $q \in C^3(M)$. Then we have
$$\frac{1}{h^{d/2}}\sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\Big(\int_{(k-1)\Delta}^{s} f\Big(\frac{D_x(X_r)}{h}\Big)q(X_r)\,dW_r\Big)\,ds = O_p\big(\sqrt{\Upsilon(T)}\,\Delta\big).$$

Proof. Denote $I_0 := 0$ and, for $m = 1,\ldots,n$,
$$I_{m\Delta} := \frac{1}{h^{d/2}}\sum_{k=1}^{m}\int_{(k-1)\Delta}^{k\Delta}\Big(\int_{(k-1)\Delta}^{s} f\Big(\frac{D_x(X_r)}{h}\Big)q(X_r)\,dW_r\Big)\,ds = \frac{1}{h^{d/2}}\sum_{k=1}^{m}\int_{(k-1)\Delta}^{k\Delta}(k\Delta - s)\,f\Big(\frac{D_x(X_s)}{h}\Big)q(X_s)\,dW_s,$$
where the second equality comes from integrating by parts using Lemma S.1.4, and, for $t \in ((m-1)\Delta, m\Delta]$ for some $m$,
$$I_t := I_{(m-1)\Delta} + \frac{1}{h^{d/2}}\int_{(m-1)\Delta}^{t}(m\Delta - s)\,f\Big(\frac{D_x(X_s)}{h}\Big)q(X_s)\,dW_s.$$
By construction, $I_t$ is a continuous martingale, since it is the stochastic integral of the predictable process $s \mapsto (m\Delta - s)\,f\big(\frac{D_x(X_s)}{h}\big)q(X_s)$ with respect to the Brownian motion $W_s$. When $t \in ((m-1)\Delta, m\Delta]$ for some $m$, the quadratic variation of $I_t$ is controlled by
$$[I_t] = [I_{(m-1)\Delta}] + \int_{(m-1)\Delta}^{t}\frac{(m\Delta - s)^2}{h^d}\,f\Big(\frac{D_x(X_s)}{h}\Big)^2 q^2(X_s)\,ds \le [I_{(m-1)\Delta}] + \frac{\Delta^2\|f\|_\infty}{h^d}\int_{(m-1)\Delta}^{t} f\Big(\frac{D_x(X_s)}{h}\Big)q^2(X_s)\,ds,$$
where we used $f^2 \le \|f\|_\infty f$. We remark that while $I_t$ is defined as a sum of several terms, distinct terms have disjoint time support, and hence the cross terms cancel in the quadratic variation; only the quadratic variation of each individual term remains in $[I_t]$.
Thus, by iterating the bound, when $T = n\Delta$, we have
$$[I_T] \lesssim \frac{\Delta^2}{h^d}\int_0^T f\Big(\frac{D_x(X_s)}{h}\Big)q^2(X_s)\,ds.$$
Apply Lemma S.2.3 with $K, H$ set to $f, q^2$ to conclude that $[I_T] = O_p(\Upsilon(T)\Delta^2)$, and so $I_T = O_p\big(\sqrt{\Upsilon(T)}\,\Delta\big)$. □

With Lemma S.2.6, we now analyze the discretization of an additive functional associated with the estimators of interest. This result corresponds to [8, Lemma A.2]. We emphasize that this discretization reflects practical data acquisition and differs from the discretization obtained via the generalized life-cycle decomposition in Lemma S.2.3. Also note that the function $N$ here is not restricted to be non-negative, unlike the corresponding function in Lemma S.2.3.

Lemma S.2.7. Let $K \in C^2(\mathbb{R})$ be non-negative and compactly supported on $[0,L] \subset [0,\infty)$. Let $N \in C^2(M)$. Suppose Assumptions 1, 2, 3, 4, and 6 hold. Then, when $T = n\Delta$ is sufficiently large, we have
$$\frac{1}{h^d\Upsilon(T)}\int_0^T K\Big(\frac{D_x(X_s)}{h}\Big)N(X_s)\,ds - \frac{\Delta}{h^d\Upsilon(T)}\sum_{k=0}^{n-1}K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)N(X_{k\Delta}) = O_p\Big(\frac{\Delta}{h^2}\Big).$$
Moreover,
$$\mathbb{E}_{M_\lambda}\Big(\frac{1}{h^d\Upsilon(T)}\int_0^T K\Big(\frac{D_x(X_s)}{h}\Big)N(X_s)\,ds\Big) - \mathbb{E}_{M_\lambda}\Big(\frac{\Delta}{h^d\Upsilon(T)}\sum_{k=0}^{n-1}K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)N(X_{k\Delta})\Big) = O\Big(\frac{\Delta}{h^2}\Big).$$

Proof. To simplify the notation, denote $K_h(\cdot) := \frac{1}{h^d}K\big(\frac{D_x(\cdot)}{h}\big)$ and define $q \in C^3(M)$ by $q(x') := K_h(x')N(x')$. Since $K$ is compactly supported on $[0,L]$, when $h > 0$ is sufficiently small, $2Lh < \mathrm{inj}_x$ and $q(X_t) \equiv 0$ when $X_t \notin B_{Lh}(x)$. By Itô's formula (S.9), for $s \in ((k-1)\Delta, k\Delta]$, we have
$$q(X_s) - q(X_{(k-1)\Delta}) = \int_{(k-1)\Delta}^{s}\Big[(\mu q)(X_v) + \frac{1}{2}\sum_{\alpha=1}^{r}\sigma_\alpha(\sigma_\alpha q)(X_v)\Big]\,dv + \int_{(k-1)\Delta}^{s}(\sigma_\alpha q)(X_v)\,dW_v^\alpha.$$
Also recall that, with local coordinates $(u_1,\ldots,u_d)$ on $B_r(x)$ centered at $x$, we have $\nabla^2_{\sigma_\alpha,\sigma_\alpha}q = \sigma_\alpha^k\sigma_\alpha^l(q_{kl} - \Gamma^j_{kl}q_j)$ and $(\nabla_{\sigma_\alpha}\sigma_\alpha)(q) = \big(\sigma_\alpha^k\partial_k\sigma_\alpha^j + \Gamma^j_{kl}\sigma_\alpha^k\sigma_\alpha^l\big)q_j$, where $q_i := \partial_i q = \frac{\partial q}{\partial u_i}$ and $q_{ij} := \partial_{ij}q = \frac{\partial^2 q}{\partial u_i\partial u_j}$. Take normal coordinates so that the Christoffel symbols vanish. We then have $\sigma_\alpha(\sigma_\alpha q) = \sigma_\alpha^k\sigma_\alpha^l q_{kl} + \sigma_\alpha^k\partial_k\sigma_\alpha^j q_j$. Hence,
$$\int_{(k-1)\Delta}^{k\Delta}\big(q(X_s) - q(X_{(k-1)\Delta})\big)\,ds \quad (S.25)$$
$$= \int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\Big(\mu^j + \frac{1}{2}\sum_{\alpha=1}^{r}\sigma_\alpha^k\partial_k\sigma_\alpha^j\Big)(X_v)\,q_j(X_v)\,dv\,ds + \int_{(k-1)\Delta}^{k\Delta}\Big(\int_{(k-1)\Delta}^{s}(\sigma_\alpha^j q_j)(X_v)\,dW_v^\alpha\Big)\,ds$$
$$+ \int_{(k-1)\Delta}^{k\Delta}\Big(\int_{(k-1)\Delta}^{s}\frac{1}{2}\sum_{\alpha=1}^{r}\big((\sigma_\alpha^k)^2 q_{kk}\big)(X_v)\,dv\Big)\,ds + \int_{(k-1)\Delta}^{k\Delta}\Big(\int_{(k-1)\Delta}^{s}\frac{1}{2}\sum_{\alpha=1}^{r}(1-\delta_{kl})\big(\sigma_\alpha^k\sigma_\alpha^l q_{kl}\big)(X_v)\,dv\Big)\,ds.$$
We now calculate the derivatives. For $i = 1,\ldots,d$,
$$q_i(X_v) = \underbrace{\frac{1}{h}(K')_h(X_v)\frac{\partial D_x}{\partial u_i}(X_v)N(X_v)}_{=:A^1_i(v)} + \underbrace{K_h(X_v)N_i(X_v)}_{=:A^2_i(v)}.$$
Continuing, for $i = 1,\ldots,d$, the quantities $A^j_{ii}(v) := \frac{\partial A^j_i(v)}{\partial u_i}$ for $j \in \{1,2\}$ satisfy
$$A^1_{ii}(v) = \frac{1}{h^2}(K'')_h(X_v)\Big(\frac{\partial D_x}{\partial u_i}(X_v)\Big)^2 N(X_v) + \frac{1}{h}(K')_h(X_v)\Big(\frac{\partial^2 D_x}{\partial u_i^2}(X_v)N(X_v) + \frac{\partial D_x}{\partial u_i}(X_v)N_i(X_v)\Big),$$
$$A^2_{ii}(v) = \frac{1}{h}(K')_h(X_v)\frac{\partial D_x}{\partial u_i}(X_v)N_i(X_v) + K_h(X_v)N_{ii}(X_v).$$
Aggregating terms and labeling, we compute
$$q_{ii}(X_v) = A^1_{ii}(v) + A^2_{ii}(v) = \underbrace{\frac{1}{h^2}(K'')_h(X_v)\Big(\frac{\partial D_x}{\partial u_i}(X_v)\Big)^2 N(X_v) + \frac{1}{h}(K')_h(X_v)\frac{\partial^2 D_x}{\partial u_i^2}(X_v)N(X_v)}_{=:B^1_i(v)} + \underbrace{\frac{2}{h}(K')_h(X_v)\frac{\partial D_x}{\partial u_i}(X_v)N_i(X_v)}_{=:B^2_i(v)} + \underbrace{K_h(X_v)N_{ii}(X_v)}_{=:B^3_i(v)}.$$
Similarly, for $1 \le i \ne j \le d$, we compute with $A^k_{ij}(v) := \frac{\partial A^k_j(v)}{\partial u_i}$ for $k \in \{1,2\}$ that
$$A^1_{ij}(v) = \frac{1}{h^2}(K'')_h(X_v)\frac{\partial D_x}{\partial u_j}(X_v)\frac{\partial D_x}{\partial u_i}(X_v)N(X_v) + \frac{1}{h}(K')_h(X_v)\Big(\frac{\partial^2 D_x}{\partial u_i\partial u_j}(X_v)N(X_v) + \frac{\partial D_x}{\partial u_j}(X_v)N_i(X_v)\Big),$$
$$A^2_{ij}(v) = \frac{1}{h}(K')_h(X_v)\frac{\partial D_x}{\partial u_i}(X_v)N_j(X_v) + K_h(X_v)N_{ij}(X_v).$$
Again aggregating, we obtain for $1 \le i \ne j \le d$ that
$$q_{ij}(X_v) = A^1_{ij}(v) + A^2_{ij}(v) = \underbrace{\frac{1}{h^2}(K'')_h(X_v)\frac{\partial D_x}{\partial u_j}(X_v)\frac{\partial D_x}{\partial u_i}(X_v)N(X_v) + \frac{1}{h}(K')_h(X_v)\frac{\partial^2 D_x}{\partial u_i\partial u_j}(X_v)N(X_v)}_{=:C^1_{ij}(v)}$$
$$+ \underbrace{\frac{1}{h}(K')_h(X_v)\Big(\frac{\partial D_x}{\partial u_j}(X_v)N_i(X_v) + \frac{\partial D_x}{\partial u_i}(X_v)N_j(X_v)\Big)}_{=:C^2_{ij}(v)} + \underbrace{K_h(X_v)N_{ij}(X_v)}_{=:C^3_{ij}(v)}.$$
By inserting these terms into (S.25), we have
$$\frac{1}{\Upsilon(T)}\sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\big[q(X_s) - q(X_{(k-1)\Delta})\big]\,ds = \frac{1}{\Upsilon(T)}\sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\Big(\mu^j + \frac{1}{2}\sum_{\alpha=1}^{r}\sigma_\alpha^k\partial_k\sigma_\alpha^j\Big)(X_v)\big(A^1_j(v) + A^2_j(v)\big)\,dv\,ds$$
$$+ \frac{1}{\Upsilon(T)}\sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\sigma_\alpha^j(X_v)\big(A^1_j(v) + A^2_j(v)\big)\,dW_v^\alpha\,ds + \frac{1}{\Upsilon(T)}\sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\frac{1}{2}\sum_{\alpha=1}^{r}\sigma_\alpha^k(X_v)^2\big(B^1_k(v) + B^2_k(v) + B^3_k(v)\big)\,dv\,ds$$
$$+ \frac{1}{\Upsilon(T)}\sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\frac{1}{2}\sum_{\alpha=1}^{r}(1-\delta_{kl})\big(\sigma_\alpha^k\sigma_\alpha^l\big)(X_v)\big(C^1_{kl}(v) + C^2_{kl}(v) + C^3_{kl}(v)\big)\,dv\,ds,$$
and we label the terms in this expansion as
$$I_1 := \sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\Big(\mu^j + \frac{1}{2}\sum_{\alpha=1}^{r}\sigma_\alpha^k\partial_k\sigma_\alpha^j\Big)(X_v)A^1_j(v)\,dv\,ds,$$
$$I_2 := \sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\frac{1}{2}\sum_{\alpha=1}^{r}\sigma_\alpha^k(X_v)^2 B^1_k(v)\,dv\,ds, \quad I_3 := \sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\frac{1}{2}\sum_{\alpha=1}^{r}(1-\delta_{kl})\big(\sigma_\alpha^k\sigma_\alpha^l\big)(X_v)C^1_{kl}(v)\,dv\,ds,$$
$$I_4 := \sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\sigma_\alpha^j(X_v)A^1_j(v)\,dW_v^\alpha\,ds.$$
$$J_1 := \sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\Big(\mu^j + \frac{1}{2}\sum_{\alpha=1}^{r}\sigma_\alpha^k\partial_k\sigma_\alpha^j\Big)(X_v)A^2_j(v)\,dv\,ds, \quad J_2 := \sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\frac{1}{2}\sum_{\alpha=1}^{r}\sigma_\alpha^k(X_v)^2 B^3_k(v)\,dv\,ds,$$
$$J_3 := \sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\frac{1}{2}\sum_{\alpha=1}^{r}(1-\delta_{kl})\big(\sigma_\alpha^k\sigma_\alpha^l\big)(X_v)C^3_{kl}(v)\,dv\,ds, \quad J_4 := \sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\sigma_\alpha^j(X_v)A^2_j(v)\,dW_v^\alpha\,ds,$$
$$L := \sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\frac{1}{2}\sum_{\alpha=1}^{r}\Big(\sigma_\alpha^k(X_v)^2 B^2_k(v) + (1-\delta_{kl})\big(\sigma_\alpha^k\sigma_\alpha^l\big)(X_v)C^2_{kl}(v)\Big)\,dv\,ds.$$
As a result, the lemma statement can be reformulated as
$$\sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\big(q(X_s) - q(X_{(k-1)\Delta})\big)\,ds = L + \sum_{\ell=1}^{4}(I_\ell + J_\ell).$$
Next, we analyze and control the asymptotics of each $I_\ell$, $J_\ell$, and $L$, beginning with $I_1$. Since $K$ is compactly supported, we only focus on the case that $X_v$ is sufficiently close to $x$. We have
$$|I_1| \le \Delta\sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\Big|\Big(\mu^j + \frac{1}{2}\sum_{\alpha=1}^{r}\sigma_\alpha^k\partial_k\sigma_\alpha^j\Big)(X_v)\Big|\,\big|A^1_j(v)\big|\,dv \le \frac{C\Delta}{h}\sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\big|(K')_h(X_v)\big|\,\Big|\Big(\mu^j + \frac{1}{2}\sum_{\alpha=1}^{r}\sigma_\alpha^k\partial_k\sigma_\alpha^j\Big)(X_v)\Big|\,|N(X_v)|\,dv,$$
where in the last inequality we use the fact that $\big|\frac{\partial D_x}{\partial u_i}(X_v)\big| \le C$ for some constant $C > 0$ (see the proof of Lemma S.2.5) when $T$ is sufficiently large and $h$ is sufficiently small. For each $j = 1,\ldots,d$, we can apply Lemma S.2.3 with $f$ set to $|K'|$ and $H$ set to $\big|\big(\mu^j + \frac{1}{2}\sum_{\alpha=1}^{r}\sigma_\alpha^k\partial_k\sigma_\alpha^j\big)N\big|$ and obtain $I_1 = O_p\big(\frac{\Delta}{h}\Upsilon(T)\big)$. Analogous computations lead to $I_2 = O_p\big(\frac{\Delta}{h^2}\Upsilon(T)\big)$ and $I_3 = O_p\big(\frac{\Delta}{h^2}\Upsilon(T)\big)$. Next, rewrite
$$I_4 = \sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}\sigma_\alpha^j A^1_j(v)\,dW_v^\alpha\,ds = \frac{1}{h}\sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\int_{(k-1)\Delta}^{s}(K')_h(X_v)\frac{\partial D_x}{\partial u_j}(X_v)\big(\sigma_\alpha^j N\big)(X_v)\,dW_v^\alpha\,ds.$$
For each $j = 1,\ldots,d$, since $\big|\frac{\partial D_x}{\partial u_j}(X_s)\big| \le C$, we can apply Lemma S.2.6 with $f$ set to $(K')^2$ and $q$ set to $(\sigma_\alpha^j N)^2$ to conclude $I_4 = O_p\Big(\frac{\sqrt{\Upsilon(T)}\,\Delta}{h^{(d+2)/2}}\Big)$. With the same approach, it follows that
$$J_1 = O_p(\Delta\Upsilon(T)), \quad J_2 = O_p(\Delta\Upsilon(T)), \quad J_3 = O_p(\Delta\Upsilon(T)), \quad J_4 = O_p\Big(\frac{\sqrt{\Upsilon(T)}\,\Delta}{h^{d/2}}\Big), \quad L = O_p\Big(\frac{\Delta}{h}\Upsilon(T)\Big).$$
Combining all of these, we obtain
$$\frac{1}{\Upsilon(T)}\sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\big[q(X_s) - q(X_{(k-1)\Delta})\big]\,ds = O_p\Big(\frac{\Delta}{h^2}\Big),$$
where we use Assumption 6 that $(h^{d-2}\Upsilon(T))^{-1} = o(1)$, and so we finish the first claim.

Moreover, collecting the bounds on all terms in the decomposition, our argument bounded the difference as
$$\Big|\sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\big[q(X_s) - q(X_{(k-1)\Delta})\big]\,ds\Big| \le |A_T| + |M_T|,$$
where $A_T := I_1 + I_2 + I_3 + J_1 + J_2 + J_3 + L$ contains the predictable processes and $M_T := I_4 + J_4$ contains the continuous martingales. Using Lemma S.2.3 on $A_T$ and on $[M_T]$, we conclude the probability orders $A_T = O_p\big(\frac{\Upsilon(T)\Delta}{h^2}\big)$ and $[M_T] = O_p\big(\frac{\Upsilon(T)\Delta^2}{h^{d+2}}\big)$, and hence $\mathbb{E}_{M_\lambda}|A_T| = O\big(\frac{\Upsilon(T)\Delta}{h^2}\big)$ and $\mathbb{E}_{M_\lambda}[M_T] = O\big(\frac{\Upsilon(T)\Delta^2}{h^{d+2}}\big)$. So, by Cauchy–Schwarz and the Itô isometry, $\mathbb{E}_{M_\lambda}(|M_T|) = O\Big(\frac{\sqrt{\Upsilon(T)}\,\Delta}{h^{(d+2)/2}}\Big)$. By Assumption 6, it follows that
$$\mathbb{E}_{M_\lambda}\Big|\frac{1}{\Upsilon(T)}\sum_{k=1}^{n}\int_{(k-1)\Delta}^{k\Delta}\big[q(X_s) - q(X_{(k-1)\Delta})\big]\,ds\Big| = O\Big(\frac{\Delta}{h^2}\Big),$$
and hence the second claim. □

Lemma S.2.7 quantifies the difference between a continuous integral and its "double discretization", in which both the kernel and the function arguments replace $X_s$ by $X_{k\Delta}$. In the analysis of our estimators, we also encounter a "single discretization", where the discretization appears only in the kernel, while function increments such as $f(X_{(k+1)\Delta}) - f(X_{k\Delta})$ are represented via continuous stochastic integrals.
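(The double discretization of Lemma S.2.7 can be illustrated numerically: replacing $X_s$ by $X_{k\Delta}$ inside a kernel-weighted additive functional incurs an error that shrinks with $\Delta$. The sketch below is a hypothetical toy in the flat one-dimensional case, with a Brownian path standing in for $X_t$ and our own choices of kernel and bandwidth; it is not part of the proof.)

```python
import numpy as np

rng = np.random.default_rng(0)

# Fine path of a 1-D Brownian motion, standing in for X_t in a flat toy setting.
T, n_fine = 20.0, 400_000
dt = T / n_fine
X = np.concatenate([[0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_fine))])

h = 0.5
K = lambda s: np.maximum(1.0 - s**2, 0.0)  # compactly supported kernel on [0, 1]

def kernel_functional(step):
    # Riemann sum of K(|X_s| / h) on the grid of spacing step*dt;
    # step = 1 approximates the continuous-time integral.
    Xs = X[::step]
    return (step * dt) * K(np.abs(Xs[:-1]) / h).sum()

ref = kernel_functional(1)
for step in [10, 100, 1000]:
    err = abs(kernel_functional(step) - ref)
    print(f"Delta = {step * dt:7.4f}   |discretized - continuous| = {err:.4f}")
```

On a single path the error is not exactly monotone in $\Delta$, but it stays a small fraction of the functional itself, in line with the $O_p(\Delta/h^2)$ control after normalization.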
To streamline the subsequent analysis, we isolate this single-discretization setting in the following lemma.

Lemma S.2.8. With the same setup as Lemma S.2.7, when $T = n\Delta$ is sufficiently large,
$$\frac{1}{\Upsilon(T)h^d}\sum_{k=0}^{n-1}\int_{k\Delta}^{(k+1)\Delta}K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)\big(N(X_{k\Delta}) - N(X_s)\big)\,ds = O_p(\Delta). \quad (S.26)$$

Proof. Denote $K_h(\cdot) := \frac{1}{h^d}K\big(\frac{D_x(\cdot)}{h}\big)$. Applying Lemma S.1.4 to the decomposition of $N(X_s) - N(X_{k\Delta})$ as in (S.25), we can compute
$$\mathcal{S} := \sum_{k=0}^{n-1}\int_{k\Delta}^{(k+1)\Delta}K_h(X_{k\Delta})\big(N(X_s) - N(X_{k\Delta})\big)\,ds = \mathcal{S}_1 + \mathcal{S}_2 + \mathcal{S}_3 + \mathcal{S}_4,$$
where
$$\mathcal{S}_1 := \sum_{k=0}^{n-1}\int_{k\Delta}^{(k+1)\Delta}K_h(X_{k\Delta})\big((k+1)\Delta - s\big)\Big(\mu^j + \frac{1}{2}\sum_{\alpha=1}^{r}\sigma_\alpha^k\partial_k\sigma_\alpha^j\Big)N_j(X_s)\,ds,$$
$$\mathcal{S}_2 := \sum_{k=0}^{n-1}\int_{k\Delta}^{(k+1)\Delta}K_h(X_{k\Delta})\big((k+1)\Delta - s\big)\sigma_\alpha^j N_j(X_s)\,dW_s^\alpha,$$
$$\mathcal{S}_3 := \frac{1}{2}\sum_{k=0}^{n-1}\int_{k\Delta}^{(k+1)\Delta}K_h(X_{k\Delta})\big((k+1)\Delta - s\big)\sum_{\alpha=1}^{r}(\sigma_\alpha^k)^2 N_{kk}(X_s)\,ds,$$
$$\mathcal{S}_4 := \frac{1}{2}\sum_{k=0}^{n-1}\int_{k\Delta}^{(k+1)\Delta}K_h(X_{k\Delta})\big((k+1)\Delta - s\big)\sum_{\alpha=1}^{r}(1-\delta_{kl})\sigma_\alpha^k\sigma_\alpha^l N_{kl}(X_s)\,ds.$$
By the regularity and growth assumptions, there exists a finite constant $C > 0$ such that $\mathcal{S}_1$ is controlled by
$$|\mathcal{S}_1| \le C\sum_{k=0}^{n-1}\int_{k\Delta}^{(k+1)\Delta}K_h(X_{k\Delta})\big((k+1)\Delta - s\big)\,ds \le C\Delta^2\sum_{k=0}^{n-1}K_h(X_{k\Delta}) = C\Delta\int_0^T K_h(X_s)\,ds + O_p\Big(\frac{\Delta^2\Upsilon(T)}{h^2}\Big) = O_p(\Delta\Upsilon(T)),$$
where the first equality comes from Lemma S.2.7 with $K$ set to $K$ and $N$ set to $1$, and the last equality comes from Lemma S.2.3 with $K$ set to $K$ and $H$ set to $1$, together with $\frac{\Delta}{h^2} = o(1)$. The same argument applies to $\mathcal{S}_3$ and $\mathcal{S}_4$. Finally, $\mathcal{S}_2$ can be evaluated just as in Lemma S.2.6 to obtain a probability order of $O_p\big(\frac{\Delta\Upsilon(T)^{1/2}}{h^{d/2}}\big)$. So, overall,
$$|\mathcal{S}| = O_p(\Delta\Upsilon(T)) + O_p\Big(\frac{\Delta\Upsilon(T)^{1/2}}{h^{d/2}}\Big).$$
As we assume that $h^d\Upsilon(T) \to \infty$, it follows that $\frac{\Delta\Upsilon(T)^{1/2}}{h^{d/2}} = o(\Delta\Upsilon(T))$. Thus, $|\mathcal{S}| = O_p(\Delta\Upsilon(T))$ as claimed.
□

With Lemma S.2.8, we provide an analogue of Lemma S.2.7 for the case in which the kernel is discretized but the function of interest is not.

Lemma S.2.9. With the same setup as Lemma S.2.7, when $T = n\Delta$ is sufficiently large,
$$\frac{1}{h^d\Upsilon(T)}\sum_{k=0}^{n-1}\int_{k\Delta}^{(k+1)\Delta}\Big(K\Big(\frac{D_x(X_{k\Delta})}{h}\Big) - K\Big(\frac{D_x(X_s)}{h}\Big)\Big)N(X_s)\,ds = O_p\Big(\frac{\Delta}{h^2}\Big).$$

Proof. Denote $K_h(\cdot) := \frac{1}{h^d}K\big(\frac{D_x(\cdot)}{h}\big)$. We may decompose
$$\frac{1}{\Upsilon(T)}\sum_{k=0}^{n-1}\int_{k\Delta}^{(k+1)\Delta}\big[K_h(X_{k\Delta}) - K_h(X_s)\big]N(X_s)\,ds = \underbrace{\frac{1}{\Upsilon(T)}\sum_{k=0}^{n-1}\int_{k\Delta}^{(k+1)\Delta}K_h(X_{k\Delta})\big(N(X_s) - N(X_{k\Delta})\big)\,ds}_{=:S_1} + \underbrace{\frac{1}{\Upsilon(T)}\sum_{k=0}^{n-1}\int_{k\Delta}^{(k+1)\Delta}\big[K_h(X_{k\Delta})N(X_{k\Delta}) - K_h(X_s)N(X_s)\big]\,ds}_{=:S_2}.$$
Then, apply Lemma S.2.8 to $S_1$ and Lemma S.2.7 to $S_2$. □

In addition to Lemma S.2.8, which treats additive functionals, we also need to handle the single-discretization case for stochastic integrals. The following lemma completes the collection of technical results needed for the main proofs concerning the estimators.

Lemma S.2.10. Let $K \in C^3(\mathbb{R})$ be non-negative and compactly supported on $[0,L] \subset [0,\infty)$. Suppose Assumptions 1, 2, 3, and 4 hold. Consider two continuous martingales $Z^{(1)}_T$ and $Z^{(2)}_T$ of the form
$$Z^{(i)}_T = \frac{1}{h^{d/2}}\sum_{k=0}^{n-1}\sqrt{K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)}\int_{k\Delta}^{(k+1)\Delta}q^{(i)}(X_s)\,dW_s,$$
where $i = 1,2$ and $q^{(1)}, q^{(2)} \in C^2(M)$. Then, when $T = n\Delta$ is sufficiently large, we have
$$\frac{[Z^{(1)}, Z^{(2)}]_T}{\Upsilon(T)} = \frac{1}{h^d\Upsilon(T)}\int_0^T K\Big(\frac{D_x(X_s)}{h}\Big)q^{(1)}(X_s)q^{(2)}(X_s)\,ds + O_p\Big(\frac{\Delta}{h^2}\Big).$$

Proof. By a direct calculation, for $t \in [m\Delta, (m+1)\Delta)$ we have
$$[Z^{(1)}, Z^{(2)}]_t = \frac{1}{h^d}\sum_{k=0}^{m-1}\int_{k\Delta}^{(k+1)\Delta}K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)q^{(1)}(X_s)q^{(2)}(X_s)\,ds + \frac{1}{h^d}\int_{m\Delta}^{t}K\Big(\frac{D_x(X_{m\Delta})}{h}\Big)q^{(1)}(X_s)q^{(2)}(X_s)\,ds.$$
We can approximate this quantity by replacing the continuous process $X_s$ with the discrete observations $X_{k\Delta}$.
Define
$$\widehat{[Z^{(1)}, Z^{(2)}]}_T := \frac{1}{h^d}\sum_{k=0}^{n-1}\int_{k\Delta}^{(k+1)\Delta}K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)q^{(1)}(X_{k\Delta})q^{(2)}(X_{k\Delta})\,ds$$
and consider
$$[Z^{(1)}, Z^{(2)}]_T = \widehat{[Z^{(1)}, Z^{(2)}]}_T + \big([Z^{(1)}, Z^{(2)}]_T - \widehat{[Z^{(1)}, Z^{(2)}]}_T\big).$$
First,
$$[Z^{(1)}, Z^{(2)}]_T - \widehat{[Z^{(1)}, Z^{(2)}]}_T = \frac{1}{h^d}\sum_{k=0}^{n-1}\int_{k\Delta}^{(k+1)\Delta}K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)\big(q^{(1)}(X_s)q^{(2)}(X_s) - q^{(1)}(X_{k\Delta})q^{(2)}(X_{k\Delta})\big)\,ds,$$
and by Lemma S.2.8 with $N = q^{(1)}q^{(2)}$ we obtain $[Z^{(1)}, Z^{(2)}]_T - \widehat{[Z^{(1)}, Z^{(2)}]}_T = O_p(\Delta\Upsilon(T))$. Moreover, by Lemma S.2.7,
$$\widehat{[Z^{(1)}, Z^{(2)}]}_T = \frac{1}{h^d}\int_0^T K\Big(\frac{D_x(X_s)}{h}\Big)q^{(1)}(X_s)q^{(2)}(X_s)\,ds + O_p\Big(\frac{\Delta\Upsilon(T)}{h^2}\Big).$$
So, overall,
$$[Z^{(1)}, Z^{(2)}]_T = \frac{1}{h^d}\int_0^T K\Big(\frac{D_x(X_s)}{h}\Big)q^{(1)}(X_s)q^{(2)}(X_s)\,ds + O_p\Big(\frac{\Delta\Upsilon(T)}{h^2}\Big),$$
as claimed. □

Appendix S.3. Proof of Theorem 4.3

With the technical preparation in Section S.2, we can easily prove Theorem 4.3.

Proof. By Lemma S.2.7 with $N \equiv 1$, when $T$ is sufficiently large, we immediately have
$$\frac{\hat{L}^{(o)}(x)}{\Upsilon(T)} = \frac{L^{(o)}(x)}{\Upsilon(T)} + O_p\Big(\frac{\Delta}{h^2}\Big), \quad \text{where } L^{(o)}(x) := \frac{1}{h^d}\int_0^T K\Big(\frac{D_x(X_s)}{h}\Big)\,ds.$$
As we have assumed $\frac{\Delta}{h^2} = o(1)$, and $\Upsilon(T)^{-1}L^{(o)}(x) \xrightarrow{d} g_\alpha(1)p_X(x)$ by Lemma S.2.3 with $H = 1$, we obtain (16). Moreover, by Lemma S.2.7, when $T$ is sufficiently large,
$$\frac{1}{\Upsilon(T)}\mathbb{E}_{M_\lambda}\big|\hat{L}^{(o)}(x) - L^{(o)}(x)\big| = O\Big(\frac{\Delta}{h^2}\Big),$$
and, as in Lemma S.2.3, we have $\mathbb{E}_{M_\lambda}(L^{(o)}(x)) = O(\Upsilon(T))$. □

Note that when the assumptions of Theorem 4.3 hold, since $g_\alpha(1) \ne 0$ almost surely, the ratio quantity satisfies
$$\frac{\Upsilon(T)}{\hat{L}^{(o)}(x)} \xrightarrow{d} \big(g_\alpha(1)p_X(x)\big)^{-1}, \quad (S.27)$$
which follows from (16) together with the continuous mapping theorem applied to the function $s \mapsto s^{-1}$.

Appendix S.4. Preparation for the proof of Theorem 4.6

S.4.1. Generalized Drift Estimator.

Theorem S.4.1. Suppose Assumptions 1-6 hold. Fix $x \in M$.
Take $f \in C^2(M, \mathbb{R}^p)$ with $\operatorname{supp} f \subset B_r(x)$ for $r < \mathrm{inj}_x(M)$. Define
$$\hat{\mu}_f(x) := \frac{1}{\Delta}\,\frac{\sum_{k=0}^{n-1}K\big(\frac{D_x(X_{k\Delta})}{h}\big)\big(f(X_{(k+1)\Delta}) - f(X_{k\Delta})\big)}{\sum_{k=0}^{n-1}K\big(\frac{D_x(X_{k\Delta})}{h}\big)}, \quad (S.28)$$
$$\hat{L}(x) := \frac{\Delta}{h^d}\sum_{k=0}^{n-1}K\Big(\frac{D_x(X_{k\Delta})}{h}\Big), \quad (S.29)$$
$$B^{\mu,o}_f(x) := \kappa_{2,0}\Big(\nabla_M\mu_f(x)\cdot\nabla_M\log(p_X(x)) + \frac{1}{2}\Delta_M\mu_f(x)\Big), \quad (S.30)$$
where $\mu_f$ is defined in (S.9), $\Delta_M$ is the Laplace–Beltrami operator on $M$, and $\nabla_M$ is the associated covariant derivative, and
$$\pi_f(x) := \sum_{\alpha=1}^{r}\sigma_{f,\alpha}(x)\sigma_{f,\alpha}(x)^\top \in \mathbb{R}^{p\times p},$$
where $\sigma_{f,\alpha}$ is defined in (S.9). Suppose $h^d\Upsilon(T) \to \infty$, $h^{d+4}\Upsilon(T) \to C > 0$ for a constant $C$, and $\frac{\Delta}{h^2}\sqrt{h^d\Upsilon(T)} \to 0$. Since $D_x(x') = \|\iota(x) - \iota(x')\|_{\mathbb{R}^p}$, we have
$$\sqrt{h^d\hat{L}^{(o)}(x)}\Big(\hat{\mu}_f(x) - \mu_f(x) - h^2 B^{\mu,o}_f(x)\Big) \xrightarrow{d} \mathcal{N}\big(0, \kappa_{2,0}\pi_f(x)\big).$$

Note that $h^d\Upsilon(T) \to \infty$ and $\frac{\Delta}{h^2}\sqrt{h^d\Upsilon(T)} \to 0$ jointly imply $\frac{\Delta}{h^2} \to 0$ as $T \to \infty$. The quantity $\hat{\mu}_f(x)$ is a generalized drift estimator in the sense that it estimates the drift of the transformed process $f(X_t)$ at $f(x)$, and $B^{\mu,o}_f(x)$ quantifies the associated bias. The key idea behind the proof is to control the discrepancy between the additive functional and its uniform discretization, and then apply the generalized life-cycle decomposition to quantify the resulting error.

Before proving Theorem S.4.1, we establish a technical lemma that streamlines the argument and will also be used in the analysis of the diffusion estimator.

Lemma S.4.1. Assume the assumptions in Theorem S.4.1 hold. Consider an $\mathbb{R}^p$-valued continuous semimartingale $\Theta_f(t)$ defined for $t \ge 0$ as
$$\Theta_f(t) := \frac{1}{h^{d/2}}\sum_{k=0}^{m-1}K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)\int_{k\Delta}^{(k+1)\Delta}\sigma_{f,\alpha}(X_s)\,dW_s^\alpha + \frac{1}{h^{d/2}}K\Big(\frac{D_x(X_{m\Delta})}{h}\Big)\int_{m\Delta}^{t}\sigma_{f,\alpha}(X_s)\,dW_s^\alpha, \quad (S.31)$$
where $t \in [m\Delta, (m+1)\Delta)$ for $m = 0, 1, 2, \ldots$, so that $\Theta_f(0) = 0$ immediately. Denote $J_o(x) := \kappa_{2,0}p_X(x)$.
Then, when $D_x(x') = \|\iota(x) - \iota(x')\|_{\mathbb{R}^p}$, we have
$$\Big(\frac{\Theta_f(T)}{\sqrt{\Upsilon(T)}}, \frac{[\Theta_f(T)]}{\Upsilon(T)}\Big) \xrightarrow{d} \Big(\sqrt{J_o(x)\pi_f(x)}\,W_\alpha(1),\; J_o(x)\pi_f(x)g_\alpha(1)\Big) \quad (S.32)$$
in $D(\mathbb{R}_+, \mathbb{R})$ as $T \to \infty$ under $P_x$ for all $x \in M$, and
$$\frac{[\Theta_f(T)]}{\hat{L}(x)} \xrightarrow{p} \kappa_{2,0}\pi_f(x) \quad \text{and} \quad \frac{\Theta_f(T)}{\sqrt{\hat{L}(x)}} \xrightarrow{d} \mathcal{N}\big(0, \kappa_{2,0}\pi_f(x)\big) \quad (S.33)$$
as $T \to \infty$ under $P_x$ for all $x \in M$. Moreover,
$$\Theta_f(T) = O_p\big(\sqrt{\Upsilon(T)}\big). \quad (S.34)$$

Proof. Denote the continuous counterpart
$$\overline{\Theta}_f(t) := \frac{1}{h^{d/2}}\int_0^t K\Big(\frac{D_x(X_s)}{h}\Big)\sigma_{f,l}(X_s)\,dW_s^l,$$
and write $\Theta_f(t) = [\Theta_{f_1}(t), \ldots, \Theta_{f_p}(t)]^\top \in \mathbb{R}^p$, and similarly for $\overline{\Theta}_f(t)$. By Lemma S.2.10, for any $\alpha, \beta = 1, \ldots, p$,
$$\frac{[\Theta_{f_\alpha}(T), \Theta_{f_\beta}(T)]}{\Upsilon(T)} = \frac{[\overline{\Theta}_{f_\alpha}(T), \overline{\Theta}_{f_\beta}(T)]}{\Upsilon(T)} + O_p\Big(\frac{\Delta}{h^2}\Big). \quad (S.35)$$
When $\alpha = \beta$, since $\pi_f^{\alpha,\alpha}(x) \ge 0$ is finite by the regularity assumption and $\frac{\Delta}{h^2} \to 0$ as $T \to \infty$, Lemma S.2.3 with $K(\cdot)$ and $H(\cdot)$ set to $K^2(\cdot)$ and $\pi_f^{\alpha,\alpha}(\cdot)$ gives
$$\frac{[\overline{\Theta}_{f_\alpha}(T), \overline{\Theta}_{f_\alpha}(T)]}{\Upsilon(T)} \xrightarrow{d} J_o(x)\pi_f^{\alpha,\alpha}(x)g_\alpha(1)$$
as $T \to \infty$, where the coefficient $J_o(x) := \kappa_{2,0}p_X(x)$ is calculated via Proposition S.1.4. When $\alpha \ne \beta$, in general we may not have $\pi_f^{\alpha,\beta}(x) \ge 0$, but the analysis is similar to the diagonal case, by applying Lemma S.2.10 to $[\Theta_{f_\alpha}(T), \Theta_{f_\beta}(T)]$ followed by invoking the polarization identity, i.e.,
$$[\overline{\Theta}_{f_\alpha}(T), \overline{\Theta}_{f_\beta}(T)] = \frac{1}{4}\Big([\overline{\Theta}_{f_\alpha}(T) + \overline{\Theta}_{f_\beta}(T)] - [\overline{\Theta}_{f_\alpha}(T) - \overline{\Theta}_{f_\beta}(T)]\Big).$$
We can then apply Lemma S.2.3 to get $\frac{[\overline{\Theta}_{f_\alpha}(T), \overline{\Theta}_{f_\beta}(T)]}{\Upsilon(T)} \xrightarrow{d} J_o(x)\pi_f^{\alpha,\beta}(x)g_\alpha(1)$ as $T \to \infty$, since we can collapse the polarization identity in the limiting distribution by applying the ratio limit theorem. Denote $[\Theta_f(T)] := \big([\Theta_{f_\alpha}(T), \Theta_{f_\beta}(T)]\big)_{\alpha,\beta=1}^p$ and $[\overline{\Theta}_f(T)] := \big([\overline{\Theta}_{f_\alpha}(T), \overline{\Theta}_{f_\beta}(T)]\big)_{\alpha,\beta=1}^p$. We have $\frac{[\Theta_f(T)]}{\Upsilon(T)} \xrightarrow{d} J_o(x)\pi_f(x)g_\alpha(1)$ as $T \to \infty$, and hence (S.34).
Since $\Theta_f(T)$ is locally square integrable and locally bounded, we conclude (S.32) from Proposition S.1.6; that is,
$$\Big(\frac{\Theta_f(T)}{\sqrt{\Upsilon(T)}}, \frac{[\Theta_f(T)]}{\Upsilon(T)}\Big) = \Big(\frac{\overline{\Theta}_f(T)}{\sqrt{\Upsilon(T)}}, \frac{[\overline{\Theta}_f(T)]}{\Upsilon(T)}\Big) + O_p\Big(\frac{\Delta}{h^2}\Big) \xrightarrow{d} \Big(\sqrt{J_o(x)\pi_f(x)}\,W_\alpha(1),\; J_o(x)\pi_f(x)g_\alpha(1)\Big),$$
where the first equality comes from Lemma S.2.9 and (S.35). Recall that $\pi_f(x)$ is non-negative definite by the uniform ellipticity assumption. Next, by the same argument as in [27, Remark 4.26 and Chapter 7], we have
$$\Big(\frac{\Theta_f(T)}{\sqrt{\Upsilon(T)}}, \sqrt{\frac{\hat{L}(x)}{\Upsilon(T)}}\Big) \xrightarrow{d} \Big(\sqrt{J_o(x)\pi_f(x)}\sqrt{g_\alpha(1)}\,Z,\; \sqrt{p_X(x)g_\alpha(1)}\Big), \quad (S.36)$$
where $Z \sim \mathcal{N}(0, I_p)$. By the continuous mapping theorem with $(x, y) \mapsto x/y$, we obtain
$$\frac{\Theta_f(T)}{\sqrt{\hat{L}(x)}} \xrightarrow{d} \mathcal{N}\big(0, \kappa_{2,0}\pi_f(x)\big).$$
The asymptotic behavior of $\frac{[\Theta_f(T)]}{\hat{L}(x)}$ can be analyzed similarly via weak convergence, or analyzed directly by applying Lemma S.2.3 to the denominator and numerator of $\frac{[\Theta_{f_\alpha}(T), \Theta_{f_\beta}(T)]}{\hat{L}(x)}$ simultaneously, which leads to
$$\frac{[\Theta_{f_\alpha}(T), \Theta_{f_\beta}(T)]}{\hat{L}(x)} = \frac{\mathbb{E}_{M_\lambda}\Big(\int_{R_1}^{R_2}(K^2)_h(X_s)\pi_f^{\alpha,\beta}(X_s)\,ds\Big) + \frac{\Upsilon(T)}{N_T}O_p\Big(\frac{1}{\sqrt{\Upsilon(T)}}\Big) + \frac{\Upsilon(T)}{N_T}O_p\Big(\frac{\Delta}{h^2}\Big)}{\mathbb{E}_{M_\lambda}\Big(\int_{R_1}^{R_2}K_h(X_s)\,ds\Big) + \frac{\Upsilon(T)}{N_T}O_p\Big(\frac{1}{\sqrt{\Upsilon(T)}}\Big) + \frac{\Upsilon(T)}{N_T}O_p\Big(\frac{\Delta}{h^2}\Big)},$$
where $K_h(\cdot) := \frac{1}{h^d}K\big(\frac{D_x(\cdot)}{h}\big)$. Since $\frac{\Upsilon(T)}{N_T}$ converges weakly to a strictly positive Mittag-Leffler random variable by Lemma S.2.2, with Lemma S.1.1 we conclude that $\frac{[\Theta_f(T)]}{\hat{L}(x)} \xrightarrow{p} \kappa_{2,0}\pi_f(x)$ as $T \to \infty$, which completes the proof. □

Proof of Theorem S.4.1. Denote $K_h(\cdot) := \frac{1}{h^d}K\big(\frac{D_x(\cdot)}{h}\big)$. Take a normal coordinate chart on $B_r(x)$. First, we proceed in the case that $D_x(x') = \|\iota(x) - \iota(x')\|_{\mathbb{R}^p}$.
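(In this extrinsic-distance case with $M = \mathbb{R}^d$ and $f$ the identity, $\hat{\mu}_f$ in (S.28) reduces to a kernel-weighted average of normalized increments. The following minimal Python sketch is an illustrative toy outside the proof: the Ornstein–Uhlenbeck process, the kernel, and the bandwidth are our own hypothetical choices, not the paper's numerical experiments.)

```python
import numpy as np

def nw_drift(X, dt, x0, h, K=lambda s: np.maximum(1.0 - s**2, 0.0)**2):
    # Nadaraya-Watson drift estimate in the spirit of (S.28) with f = identity:
    # X is an (n+1, p) array of discrete observations, x0 the evaluation point.
    w = K(np.linalg.norm(X[:-1] - x0, axis=1) / h)     # kernel weights K(D_x(X_k)/h)
    incr = (X[1:] - X[:-1]) / dt                       # normalized increments
    return (w[:, None] * incr).sum(axis=0) / w.sum()

# Toy trajectory: dX = -X dt + dW, so the true drift at x0 = 1 is mu(1) = -1.
rng = np.random.default_rng(1)
dt, n = 1e-3, 400_000
xs = np.empty(n + 1)
xs[0] = 0.0
noise = rng.normal(0.0, np.sqrt(dt), n)
for k in range(n):
    xs[k + 1] = xs[k] - xs[k] * dt + noise[k]

est = nw_drift(xs[:, None], dt, x0=np.array([1.0]), h=0.3)
print(est)   # close to the true drift -1, up to sampling noise and O(h^2) bias
```

The recurrence of the toy process plays the role the theorem assigns to recurrence of $X_t$: the path revisits the neighborhood of $x_0$ often enough for the kernel weights to accumulate.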
Plugging Itô's formula (S.9), i.e., $f(X_{(k+1)\Delta}) - f(X_{k\Delta}) = \int_{k\Delta}^{(k+1)\Delta}\mu_f(X_t)\,dt + \int_{k\Delta}^{(k+1)\Delta}\sigma_{f,\alpha}(X_t)\,dW_t^\alpha$, into (S.28), we have the bias and variance decomposition of the error:
$$E := \hat{\mu}_f(x) - \mu_f(x) = \underbrace{\frac{\sum_{k=0}^{n-1}K_h(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta}\mu_f(X_s)\,ds}{\hat{L}(x)} - \mu_f(x)}_{=:B} + \underbrace{\frac{\sum_{k=0}^{n-1}K_h(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta}\sigma_{f,\alpha}(X_s)\,dW_s^\alpha}{\hat{L}(x)}}_{=:V}. \quad (S.37)$$
We start with $V$. Note that $V = \frac{1}{h^{d/2}}\frac{\Theta_f(T)}{\hat{L}(x)}$, where $\Theta_f(T)$ is defined in (S.31). By (S.33) of Lemma S.4.1,
$$\sqrt{h^d\hat{L}^{(o)}(x)}\,V = \frac{\Theta_f(T)}{\sqrt{\hat{L}(x)}} \xrightarrow{d} \mathcal{N}\big(0, \kappa_{2,0}\pi_f(x)\big). \quad (S.38)$$
Next, consider $B$. For $i = 1, \ldots, p$, by Lemma S.2.9 with $N$ set to $e_i^\top\mu_f$, we obtain
$$\sum_{k=0}^{n-1}K_h(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta}\mu_f(X_s)\,ds = \int_0^T K_h(X_s)\mu_f(X_s)\,ds + O_p\Big(\frac{\Delta\Upsilon(T)}{h^2}\Big).$$
We have a similar expression for $\hat{L}(x)$. Therefore,
$$B = \frac{\frac{1}{\Upsilon(T)}\int_0^T K_h(X_s)\big(\mu_f(X_s) - \mu_f(x)\big)\,ds + O_p\big(\frac{\Delta}{h^2}\big)}{\frac{1}{\Upsilon(T)}\int_0^T K_h(X_s)\,ds + O_p\big(\frac{\Delta}{h^2}\big)} = \frac{\int_0^T K_h(X_s)\big(\mu_f(X_s) - \mu_f(x)\big)\,ds}{\int_0^T K_h(X_s)\,ds} + O_p\Big(\frac{\Delta}{h^2}\Big),$$
where the first equality holds since $\mu_f$ is bounded by the drift assumption and the second equality comes from Lemma S.1.2. Write
$$B = \underbrace{\frac{\int_0^T K_h(X_s)\big(\mu_f(X_s) - \mu_f(x)\big)\,ds}{\int_0^T K_h(X_s)\,ds} - B_0}_{=:B_1} + B_0 + O_p\Big(\frac{\Delta}{h^2}\Big),$$
where
$$B_0 := \frac{N_T\,\mathbb{E}_{M_\lambda}\Big(\int_{R_1}^{R_2}K_h(X_s)\big(\mu_f(X_s) - \mu_f(x)\big)\,ds\Big)}{N_T\,\mathbb{E}_{M_\lambda}\Big(\int_{R_1}^{R_2}K_h(X_s)\,ds\Big)}. \quad (S.39)$$
Analyzing $B$ is thus reduced to controlling two terms. We claim that $h^{-2}B_0 \xrightarrow{p} B^{\mu,o}_f(x)$ as $T \to \infty$, where $B^{\mu,o}_f(x)$ is defined in (S.30).
To see this claim, since $\frac{\Delta}{h^2} \to 0$ by the assumptions $h^d\Upsilon(T) \to \infty$ and $\frac{\Delta}{h^2}\sqrt{h^d\Upsilon(T)} \to 0$, we can apply exactly the same analysis as that of $C$ in Lemma S.2.3 to both the denominator and the numerator, together with Lemma S.2.1. Second, applying exactly the same analysis of $A$ and $B$ in Lemma S.2.3, we have $B_1 = o_p(h^2)$. As a result, since $D_x(x') = \|\iota(x) - \iota(x')\|_{\mathbb{R}^p}$,
$$\sqrt{h^d\hat{L}(x)}\big(B - h^2 B^{\mu,o}_f(x)\big) = o_p\Big(\sqrt{h^{d+4}\hat{L}(x)}\Big) + \sqrt{h^{d+4}\hat{L}(x)}\big(h^{-2}B_0 - B^{\mu,o}_f(x)\big) + O_p\Big(\frac{\Delta}{h^2}\sqrt{h^d\hat{L}(x)}\Big) = o_p(1),$$
where the last control comes from Slutsky's theorem, jointly with the assumptions $\frac{\Delta}{h^2}\sqrt{h^d\Upsilon(T)} \to 0$ and $h^{d+4}\Upsilon(T) \to C > 0$, the convergence $\frac{\hat{L}(x)}{\Upsilon(T)} \xrightarrow{d} g_\alpha(1)p_X(x)$ from Lemma S.2.3, and $h^{-2}B_0 \xrightarrow{p} B^{\mu,o}_f(x)$ shown above. We therefore obtain the claim
$$\sqrt{h^d\hat{L}(x)}\Big(\hat{\mu}_f(x) - \mu_f(x) - h^2 B^{\mu,o}_f(x)\Big) \xrightarrow{d} \mathcal{N}\big(0, \kappa_{2,0}\pi_f(x)\big). \qquad \square$$

Appendix S.5. Preparation for the proof of Theorem 4.4

Theorem S.5.1. Suppose Assumptions 1, 2, 3, 4, 5, and 6 hold. Fix $x \in M$. Consider functions $f, q \in C^3(M, \mathbb{R}^p)$ with $\operatorname{supp} f \subset B_r(x)$ and $\operatorname{supp} q \subset B_r(x)$ for $r < \mathrm{inj}_x(M)$. For a kernel function $K$ and $D_x(x') = \|\iota(x) - \iota(x')\|_{\mathbb{R}^p}$, define
$$\hat{\pi}_{f,q}(x) := \frac{1}{\Delta}\,\frac{\sum_{k=0}^{n-1}K\big(\frac{D_x(X_{k\Delta})}{h}\big)\big(f(X_{(k+1)\Delta}) - f(X_{k\Delta})\big)\big(q(X_{(k+1)\Delta}) - q(X_{k\Delta})\big)^\top}{\sum_{k=0}^{n-1}K\big(\frac{D_x(X_{k\Delta})}{h}\big)} \quad \text{and} \quad \hat{L}(x) := \frac{\Delta}{h^d}\sum_{k=0}^{n-1}K\Big(\frac{D_x(X_{k\Delta})}{h}\Big).$$
Denote
$$\pi_{f,q}(x) := \sum_{\alpha=1}^{r}\sigma_{f,\alpha}(x)\sigma_{q,\alpha}(x)^\top \in \mathbb{R}^{p\times p},$$
where $\sigma_{f,\alpha} \in \mathbb{R}^p$ is defined in (S.9),
$$B^\pi_{f,q}(x) := \sum_{i=1}^{d}\partial_i\pi_{f,q}(x)\,\partial_i\log p_X(x) + \frac{1}{2}\sum_{i=1}^{d}\partial_i^2\pi_{f,q}(x), \quad (S.40)$$
where the $\partial_i$ are the normal coordinate vector fields centered at $x$, and
$$\Xi_{f,q}(x) := \frac{1}{2}\big(\pi_{f,f}(x)\otimes\pi_{q,q}(x) + \pi_{q,q}(x)\otimes\pi_{f,f}(x)\big).$$
Assume the conditions $\frac{h^d\Upsilon(T)}{\Delta} \to \infty$, $\frac{h^{d+4}\Upsilon(T)}{\Delta} \to C$ for a constant $C > 0$, and $h^{d-4}\Upsilon(T)\Delta \to 0$. Then we have
$$\sqrt{\frac{h^d\hat{L}(x)}{\Delta}}\Big(\hat{\pi}_{f,q}(x) - \pi_{f,q}(x) - h^2 B^\pi_{f,q}(x)\Big) \xrightarrow{d} \mathcal{N}\big(0, \kappa_{2,0}\Xi_{f,q}(x)\big).$$

Note that $\hat{\pi}_{f,q}$ resembles a kernel diffusion estimator, but is generalized to capture the interaction between the functions $f$ and $q$. When $q = f$, it estimates the $p \times p$ diffusion matrix of the process $f(X_t)$ at $f(x)$. Here, $B^\pi_{f,q}(x)$ quantifies the bias of the estimator. Before proving Theorem S.5.1, we present some technical lemmas that streamline the argument.

Lemma S.5.1. Assume the assumptions in Theorem S.5.1 hold. Consider $f \in C(M)$ and $q \in C^3(M, \mathbb{R}^p)$ with $\operatorname{supp} f \subset B_r(x)$ and $\operatorname{supp} q \subset B_r(x)$ for $r < \mathrm{inj}_x(M)$. Denote
$$A_T := \frac{1}{h^d}\sum_{k=0}^{n-1}K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)\int_{k\Delta}^{(k+1)\Delta}\Big(\int_{k\Delta}^{s}f(X_t)\,dt\Big)\sigma_{q,\alpha}(X_s)\,dW_s^\alpha,$$
$$B_T := \frac{1}{h^d}\sum_{k=0}^{n-1}K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)\Delta f(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta}\sigma_{q,\alpha}(X_s)\,dW_s^\alpha,$$
where $T = n\Delta$. When $T$ is sufficiently large, we have
$$\frac{A_T}{\sqrt{\Upsilon(T)}} = O_p\Big(\frac{\Delta}{h^{d/2}}\Big) \quad \text{and} \quad \frac{B_T}{\sqrt{\Upsilon(T)}} = O_p\Big(\frac{\Delta}{h^{d/2}}\Big).$$

Proof of Lemma S.5.1. With $K_h(\cdot) := \frac{1}{h^d}K\big(\frac{D_x(\cdot)}{h}\big)$ and $\operatorname{Tr}$ denoting the matrix trace, we have
$$[A_T] = \sum_{k=0}^{n-1}K_h(X_{k\Delta})^2\int_{k\Delta}^{(k+1)\Delta}\Big(\int_{k\Delta}^{s}f(X_t)\,dt\Big)^2\operatorname{Tr}\pi_{q,q}(X_s)\,ds \le \sum_{k=0}^{n-1}K_h(X_{k\Delta})^2\int_{k\Delta}^{(k+1)\Delta}\Big(\int_{k\Delta}^{(k+1)\Delta}f(X_t)\,dt\Big)^2\operatorname{Tr}\pi_{q,q}(X_s)\,ds$$
$$\le \Delta^2 C_1^2\sum_{k=0}^{n-1}K_h(X_{k\Delta})^2\int_{k\Delta}^{(k+1)\Delta}\operatorname{Tr}\pi_{q,q}(X_s)\,ds = \frac{\Delta^2 C_1^2}{h^d}[\Theta^q_T],$$
where $C_1 > 0$ is finite, depending on the regularity assumption on $f$, and $\Theta^q_T$ is defined as in (S.31) with $f$ replaced by $q$. By the same argument as for $[\Theta^q_T]$ in Lemma S.4.1, we have $[\Theta^q_T] = O_p(\Upsilon(T))$, since our assumptions imply $\frac{\Delta}{h^2} \to 0$, and thus $[A_T] = O_p\Big(\frac{\Delta^2\Upsilon(T)}{h^d}\Big)$, which leads to $A_T = O_p\Big(\frac{\Delta\sqrt{\Upsilon(T)}}{h^{d/2}}\Big)$.
For $B_T$, we have
$$[B_T] = \Delta^2\sum_{k=0}^{n-1}K_h(X_{k\Delta})^2 f(X_{k\Delta})^2\int_{k\Delta}^{(k+1)\Delta}\operatorname{Tr}\pi_{q,q}(X_s)\,ds \le \frac{\Delta^2 C_1^2}{h^d}[\Theta^q_T],$$
and hence the claim. □

Now we prove the main theorem of this section.

Proof of Theorem S.5.1. We start by introducing notation. Denote $K_h(\cdot) := \frac{1}{h^d}K\big(\frac{D_x(\cdot)}{h}\big)$. For $k = 0, 1, \ldots, n-1$, we suppress the dependence of $\Omega$ on $k$ and define
$$\Omega(x') := \big(f(x') - f(X_{k\Delta})\big)\big(q(x') - q(X_{k\Delta})\big)^\top \in C^3_0(M, \mathbb{R}^{p\times p}),$$
which appears in the numerator of $\hat{\pi}_{f,q}(x)$. We use superscripts to index coordinates. Specifically, $f^a$ and $q^b$ denote the $a$-th and $b$-th components of $f$ and $q$, respectively, and $\Omega^{a,b}$ denotes the $(a,b)$-th entry of $\Omega$, where $a, b = 1, \ldots, p$. We work in normal coordinates on $B_r(x)$. Since $\Omega(X_{k\Delta}) = 0$, for $s \in [k\Delta, (k+1)\Delta]$, Itô's formula (S.9) gives
$$\Omega(X_s) = \int_{k\Delta}^{s}\underbrace{\Big(\mu^j + \frac{1}{2}\sum_{l=1}^{r}\sigma^i_l\partial_i\sigma^j_l\Big)(X_v)\,\Omega_j(X_v)}_{=:\mathcal{M}(X_v)}\,dv + \int_{k\Delta}^{s}\underbrace{(\sigma^j_l\Omega_j)(X_v)}_{=:\mathcal{S}_l(X_v)}\,dW^l_v + \int_{k\Delta}^{s}\underbrace{\frac{1}{2}\sum_{l=1}^{r}(\sigma^i_l\sigma^j_l\Omega_{i,j})(X_v)}_{=:\mathcal{H}(X_v)}\,dv. \quad (S.41)$$
We therefore have
$$\hat{\pi}_{f,q}(x) = \hat{\Pi}_1(x) + \hat{\Pi}_2(x) + \hat{\Pi}_3(x), \quad \text{where } \hat{\Pi}_i(x) := \frac{\Gamma_i(x)}{\hat{L}(x)} \quad (S.42)$$
for $i = 1, 2, 3$, and
$$\Gamma_1(x) := \sum_{k=0}^{n-1}K_h(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta}\mathcal{M}(X_s)\,ds \in \mathbb{R}^{p\times p},$$
$$\Gamma_2(x) := \sum_{k=0}^{n-1}K_h(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta}\mathcal{S}_\alpha(X_s)\,dW^\alpha_s \in \mathbb{R}^{p\times p}, \quad (S.43)$$
$$\Gamma_3(x) := \sum_{k=0}^{n-1}K_h(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta}\mathcal{H}(X_s)\,ds \in \mathbb{R}^{p\times p}.$$
As we will show below, $\hat{\Pi}_1(x)$ is asymptotically negligible, $\hat{\Pi}_2(x)$ converges to a Gaussian random matrix, and $\hat{\Pi}_3(x)$ yields the targeted diffusion matrix.
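(A numerical counterpart to this decomposition: in the flat case with $f = q$ the identity, $\hat{\pi}_{f,q}$ is the kernel-weighted average of outer products of increments, and the $\hat{\Pi}_3$ part dominates. The sketch below is an illustrative toy with our own hypothetical choices of process, kernel, and bandwidth; it is not the paper's experiment.)

```python
import numpy as np

def nw_diffusion(X, dt, x0, h, K=lambda s: np.maximum(1.0 - s**2, 0.0)**2):
    # Kernel diffusion estimator in the spirit of pi_hat_{f,q} with f = q = identity:
    # kernel-weighted average of outer products of increments, normalized by dt.
    w = K(np.linalg.norm(X[:-1] - x0, axis=1) / h)
    d = X[1:] - X[:-1]
    return np.einsum('k,ki,kj->ij', w, d, d) / (dt * w.sum())

# Toy 2-D process dX = -X dt + Sigma dW with Sigma = diag(1, 0.5),
# so the constant diffusion matrix is Sigma Sigma^T = diag(1, 0.25).
rng = np.random.default_rng(2)
dt, n = 1e-3, 200_000
sig = np.array([1.0, 0.5])
X = np.empty((n + 1, 2))
X[0] = 0.0
noise = rng.normal(0.0, np.sqrt(dt), (n, 2)) * sig
for k in range(n):
    X[k + 1] = X[k] - X[k] * dt + noise[k]

est = nw_diffusion(X, dt, x0=np.zeros(2), h=0.4)
print(np.round(est, 3))   # approximately diag(1, 0.25)
```

Because the diffusion matrix is constant here, the $h^2 B^\pi_{f,q}$ bias term vanishes and the estimate is driven by the martingale fluctuation alone, mirroring the role of $\hat{\Pi}_2$.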
Note that, with subscripts denoting differentiation in normal coordinates,
$$\Omega^{a,b}_i(x') = f^a_i(x')\big(q^b(x') - q^b(X_{k\Delta})\big) + q^b_i(x')\big(f^a(x') - f^a(X_{k\Delta})\big),$$
$$\Omega^{a,b}_{i,j}(x') = f^a_{i,j}(x')\big(q^b(x') - q^b(X_{k\Delta})\big) + q^b_{i,j}(x')\big(f^a(x') - f^a(X_{k\Delta})\big) + f^a_i(x')q^b_j(x') + f^a_j(x')q^b_i(x'). \quad (S.44)$$
We start with $\hat{\Pi}_3(x)$. Plugging (S.44) into $\mathcal{H}(X_s)$, by symmetry we obtain
$$e_a^\top\mathcal{H}(X_s)e_b - \pi^{a,b}_{f,q}(X_s) = \frac{1}{2}\sum_{\alpha=1}^{r}(\sigma^i_\alpha\sigma^j_\alpha)(X_s)\Big(f^a_{i,j}(X_s)\big(q^b(X_s) - q^b(X_{k\Delta})\big) + q^b_{i,j}(X_s)\big(f^a(X_s) - f^a(X_{k\Delta})\big)\Big).$$
Therefore, by the same analysis of $B$ as in (S.37), for $a, b = 1, \ldots, p$, we get
$$e_a^\top\big(\hat{\Pi}_3(x) - \pi_{f,q}(x)\big)e_b = h^2 e_a^\top B^\pi_{f,q}(x)e_b + O_p\Big(\frac{\Delta}{h^2}\Big) + o_p(h^2).$$
Hence, under the assumptions that $\frac{h^d\Upsilon(T)}{\Delta} \to \infty$ and $h^{d-4}\Upsilon(T)\Delta \to 0$, we have $\frac{\hat{L}(x)}{\Upsilon(T)} \xrightarrow{d} g_\alpha(1)p_X(x)$ by Lemma S.2.3, which, combined with the assumption that $\frac{h^{d+4}\Upsilon(T)}{\Delta} \to C$, leads to
$$\sqrt{\frac{h^d\hat{L}(x)}{\Delta}}\,e_a^\top\big(\hat{\Pi}_3(x) - \pi_{f,q}(x) - h^2 B^\pi_{f,q}(x)\big)e_b = o_p(1).$$
Next, we analyze $\hat{\Pi}_1(x)$. When $s \in [k\Delta, (k+1)\Delta]$, plugging Itô's formula (S.9),
$$q^b(X_s) - q^b(X_{k\Delta}) = \int_{k\Delta}^{s}\mu_{q^b}(X_t)\,dt + \int_{k\Delta}^{s}\sigma_{q^b,\alpha}(X_t)\,dW^\alpha_t,$$
into (S.44), we have the decomposition $\Gamma_1(x) = \Gamma^{f,\mu}_1(x) + \Gamma^{f,\sigma}_1(x) + \Gamma^{q,\mu}_1(x) + \Gamma^{q,\sigma}_1(x)$, where
$$e_a^\top\Gamma^{f,\mu}_1(x)e_b := \sum_{k=0}^{n-1}K_h(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta}\Big(\int_{k\Delta}^{s}\mu_{q^b}(X_t)\,dt\Big)\zeta^a(X_s)\,ds,$$
$$e_a^\top\Gamma^{f,\sigma}_1(x)e_b := \sum_{k=0}^{n-1}K_h(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta}\Big(\int_{k\Delta}^{s}\sigma_{q^b,\alpha}(X_t)\,dW^\alpha_t\Big)\zeta^a(X_s)\,ds,$$
$$\zeta^a := \Big(\mu^i + \frac{1}{2}\sum_{\alpha=1}^{r}\sigma^\ell_\alpha\partial_\ell\sigma^i_\alpha\Big)f^a_i,$$
and $\Gamma^{q,\mu}_1(x)$ and $\Gamma^{q,\sigma}_1(x)$ are defined analogously with the roles of $f$ and $q$ exchanged. We only analyze $\Gamma^{f,\mu}_1(x)$ and $\Gamma^{f,\sigma}_1(x)$, since the analysis of the reversely defined quantities is identical.
By the regularity assumption, for all indices $a, b$ we have
$$\big|e_a^\top\Gamma^{f,\mu}_1(x)e_b\big| \le C\Delta^2\sum_{k=0}^{n-1}K_h(X_{k\Delta})$$
for some finite constant $C > 0$. By Lemmas S.2.3 and S.2.7,
$$\Delta\sum_{k=0}^{n-1}K_h(X_{k\Delta}) = O_p(\Upsilon(T)) + O_p\Big(\frac{\Delta\Upsilon(T)}{h^2}\Big),$$
and hence $e_a^\top\Gamma^{f,\mu}_1(x)e_b = O_p(\Delta\Upsilon(T)) + O_p\big(\frac{\Delta^2\Upsilon(T)}{h^2}\big)$. The term $\Gamma^{f,\sigma}_1$ involves martingales and is more complicated. Using stochastic integration by parts (Lemma S.1.4), we decompose
$$e_a^\top\Gamma^{f,\sigma}_1 e_b = \underbrace{\sum_{k=0}^{n-1}K_h(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta}\zeta^a(X_s)\,ds\times\int_{k\Delta}^{(k+1)\Delta}\sigma_{q^b,\alpha}(X_s)\,dW^\alpha_s}_{=:e_a^\top\Gamma^{f,\sigma}_{1,1}e_b} - \underbrace{\sum_{k=0}^{n-1}K_h(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta}\Big(\int_{k\Delta}^{s}\zeta^a(X_t)\,dt\Big)\sigma_{q^b,\alpha}(X_s)\,dW^\alpha_s}_{=:e_a^\top\Gamma^{f,\sigma}_{1,2}e_b}.$$
Lemma S.5.1 gives $e_a^\top\Gamma^{f,\sigma}_{1,2}e_b = O_p\Big(\frac{\Delta\sqrt{\Upsilon(T)}}{h^{d/2}}\Big)$. We turn to $e_a^\top\Gamma^{f,\sigma}_{1,1}e_b$, which involves a kernel-weighted summation of products of $\int_{k\Delta}^{(k+1)\Delta}\zeta^a(X_s)\,ds$ and $\int_{k\Delta}^{(k+1)\Delta}\sigma_{q^b,\alpha}(X_s)\,dW^\alpha_s$. We estimate it by leveraging approximations at uniform discrete times:
$$e_a^\top\Gamma^{f,\sigma}_{1,1}e_b = \underbrace{\Delta\sum_{k=0}^{n-1}K_h(X_{k\Delta})\zeta^a(X_{k\Delta})\times\int_{k\Delta}^{(k+1)\Delta}\sigma_{q^b,\alpha}(X_s)\,dW^\alpha_s}_{=:e_a^\top\hat{\Gamma}^{f,\sigma}_{1,1}e_b} + \underbrace{\sum_{k=0}^{n-1}K_h(X_{k\Delta})\int_{k\Delta}^{(k+1)\Delta}\sigma_{q^b,\alpha}(X_s)\,dW^\alpha_s\int_{k\Delta}^{(k+1)\Delta}\big(\zeta^a(X_s) - \zeta^a(X_{k\Delta})\big)\,ds}_{=:e_a^\top(\Gamma^{f,\sigma}_{1,1}-\hat{\Gamma}^{f,\sigma}_{1,1})e_b}.$$
Applying Lemma S.5.1 again, we obtain $e_a^\top\hat{\Gamma}^{f,\sigma}_{1,1}e_b = O_p\Big(\frac{\Delta\sqrt{\Upsilon(T)}}{h^{d/2}}\Big)$. We continue to control $e_a^\top\big(\Gamma^{f,\sigma}_{1,1}-\hat{\Gamma}^{f,\sigma}_{1,1}\big)e_b$. Note that this term is a kernel-weighted summation of products of a martingale term and a discrete-time approximation error, which needs a treatment different from all those above. Recall that $\mathcal{F}_{\Delta k}$ is the sub-$\sigma$-algebra of the filtration generated by the process $X_t$ up to time $\Delta k$.
By the linearity of expectation and the tower property, we have
\[
\mathbb{E}_{M_\lambda}\Big|\, e_a^\top\big(\Gamma^{f,\sigma}_{1,1} - \hat\Gamma^{f,\sigma}_{1,1}\big) e_b\,\Big| \le \sum_{k=0}^{n-1}\mathbb{E}_{M_\lambda}\bigg( K_h(X_{k\Delta})\,\Big|\int_{k\Delta}^{(k+1)\Delta}\sigma_{q^b,\alpha}(X_s)\,dW^\alpha_s \int_{k\Delta}^{(k+1)\Delta}\big(\zeta^a(X_s)-\zeta^a(X_{k\Delta})\big)\,ds\Big|\bigg)
\]
\[
= \sum_{k=0}^{n-1}\mathbb{E}_{M_\lambda}\Bigg( K_h(X_{k\Delta})\,\mathbb{E}_{M_\lambda}\bigg( \Big|\int_{k\Delta}^{(k+1)\Delta}\sigma_{q^b,\alpha}(X_s)\,dW^\alpha_s\Big|\; \Big|\int_{k\Delta}^{(k+1)\Delta}\big(\zeta^a(X_s)-\zeta^a(X_{k\Delta})\big)\,ds\Big| \;\bigg|\;\mathcal{F}_{\Delta k}\bigg)\Bigg).
\]
Then, apply the Cauchy–Schwarz inequality to bound the inner expectation by $I_1^{1/2}\, I_2^{1/2}$, where
\[
I_1 := \mathbb{E}_{M_\lambda}\bigg( \Big|\int_{k\Delta}^{(k+1)\Delta}\big(\zeta^a(X_s)-\zeta^a(X_{k\Delta})\big)\,ds\Big|^2 \;\bigg|\;\mathcal{F}_{\Delta k}\bigg), \qquad I_2 := \mathbb{E}_{M_\lambda}\bigg( \Big|\int_{k\Delta}^{(k+1)\Delta}\sigma_{q^b,\alpha}(X_s)\,dW^\alpha_s\Big|^2 \;\bigg|\;\mathcal{F}_{\Delta k}\bigg).
\]
First, we use the trivial bound
\[
I_2 \le \mathbb{E}_{M_\lambda}\bigg( \sup_{s\in[k\Delta,(k+1)\Delta]}\Big|\int_{k\Delta}^{s}\sigma_{q^b,\alpha}(X_u)\,dW^\alpha_u\Big|^2 \;\bigg|\;\mathcal{F}_{\Delta k}\bigg) \le C_1\,\mathbb{E}\big([M^k]_\Delta\big),
\]
where the second bound is via the BDG inequality in Theorem S.1.7 with $p=1$ and $M^k_t := \int_{k\Delta}^{k\Delta+t}\sigma_{q^b,\alpha}(X_s)\,dW^\alpha_s$ for $t\in[0,\Delta]$. Then, since
\[
[M^k]_\Delta = \int_{k\Delta}^{(k+1)\Delta}\sum_{\alpha=1}^{r}\sigma_{q^b,\alpha}(X_s)^2\,ds \le C_2\,\Delta,
\]
where $C_2>0$ is finite by the regularity assumption, we obtain $I_2\le C_1C_2\Delta$. With Itô's formula (S.9),
\[
\zeta^a(X_s) - \zeta^a(X_{k\Delta}) = \int_{k\Delta}^{s}\mu_{\zeta^a}(X_t)\,dt + \int_{k\Delta}^{s}\sigma_{\zeta^a,\alpha}(X_t)\,dW^\alpha_t
\]
when $s\in[k\Delta,(k+1)\Delta]$, the bound $(a+b)^2\le 2(a^2+b^2)$ gives the control of $I_1$:
\[
I_1 \le 2\underbrace{\mathbb{E}_{M_\lambda}\bigg( \Delta\int_{k\Delta}^{(k+1)\Delta}\Big(\int_{k\Delta}^{s}\mu_{\zeta^a}(X_t)\,dt\Big)^2 ds \;\bigg|\;\mathcal{F}_{k\Delta}\bigg)}_{:=\, I_{1,1}} + 2\underbrace{\mathbb{E}_{M_\lambda}\bigg( \Delta\int_{k\Delta}^{(k+1)\Delta}\Big(\int_{k\Delta}^{s}\sigma_{\zeta^a,\alpha}(X_t)\,dW^\alpha_t\Big)^2 ds \;\bigg|\;\mathcal{F}_{k\Delta}\bigg)}_{:=\, I_{1,2}}.
\]
By the Cauchy–Schwarz inequality again,
\[
I_{1,1} \le \mathbb{E}_{M_\lambda}\bigg( \Delta\int_{k\Delta}^{(k+1)\Delta}(s-k\Delta)\int_{k\Delta}^{s}\mu_{\zeta^a}(X_t)^2\,dt\,ds \;\bigg|\;\mathcal{F}_{k\Delta}\bigg) \le \mathbb{E}_{M_\lambda}\bigg( \Delta^2\int_{k\Delta}^{(k+1)\Delta}\int_{k\Delta}^{(k+1)\Delta}\mu_{\zeta^a}(X_t)^2\,dt\,ds \;\bigg|\;\mathcal{F}_{k\Delta}\bigg) \le C_3\,\Delta^4,
\]
where $C_3>0$ is finite due to the regularity assumption.
W e turn to I 1 , 2 . By inv ok- ing the BDG inequality (Theorem S.1.7 ) for the pro cess M k t := R k ∆+ t k ∆ σ ζ a ,α ( X s ) dW α s and setting s ∗ ∈ argmax s ∈ [ k ∆ , ( k +1)∆) ( M k s − k ∆ ) 2 , w e b ound I 1 , 2 ≤ E M λ  ∆ Z ( k +1)∆ k ∆ ( M k s ∗ − k ∆ ) 2 ds     F k  ≤ C 1 E M λ  ∆ Z ( k +1)∆ k ∆  Z ( k +1)∆ k ∆ C 4 dt  ds     F k  = C 1 C 4 ∆ 3 , where the first b ound is trivial b y definition of s ∗ , and the second b ound arises f rom the BDG inequality applied with p = 1 and |S a α ( X t ) | 2 ≤ C 4 for a finite constant C 4 > 0 by the regularity assumption. W e thus hav e I 1 = O (∆ 3 ). As a result,    E M λ  e ⊤ a  Γ f ,σ 1 , 1 − ˆ Γ f ,σ 1 , 1  e b     ≤ C 1 p C 2 C 4 n − 1 X k =0 E M λ  K h ( X k ∆ )∆ 2  = C 1 p C 2 C 4 ∆ E M λ ( ˆ L ( x )) = O (∆Υ( T )) , where the last equality comes from Lemmas S.2.3 and S.2.7 , and hence e ⊤ a  Γ f ,σ 1 , 1 − ˆ Γ f ,σ 1 , 1  e b = O p (∆Υ( T )) . (S.45) By the assumptions h d Υ( T ) ∆ − → ∞ and h d − 4 Υ( T )∆ − → 0, we hav e ∆ h 2 − → 0. Com bin- ing the analyses, we conclude e ⊤ a Γ 1 e b = O p (∆Υ( T )) + O p ∆ p Υ( T ) h d/ 2 ! . By assumptions h d − 4 Υ( T )∆ → 0 and ∆ − → 0, with ( S.27 ) we conclude s h d ˆ L ( x ) ∆ e ⊤ a ˆ Π 1 e b = s h d ˆ L ( x )∆ e ⊤ a Γ 1 e b = O p q h d Υ( T )∆ s Υ( T ) ˆ L ( x ) ! + O p ∆ 1 / 2 s Υ( T ) ˆ L ( x ) ! = o p (1) . S.41 It remains to analyze ˆ Π 2 ( x ). By applying Itˆ o’s formula to q b ( X s ) − q b ( X k ∆ ) and f b ( X s ) − f b ( X k ∆ ), w e rewrite Γ 2 ( x ) = Γ q ,µ 2 ( x ) + Γ q ,σ 2 ( x ) + Γ f ,µ 2 ( x ) + Γ f ,σ 2 ( x ), where e ⊤ a Γ q ,µ 2 e b := n − 1 X k =0 K h ( X k ∆ ) Z ( k +1)∆ k ∆  Z s k ∆ µ q b ( X t ) dt  σ f a ,α ( X s ) dW α s , e ⊤ a Γ q ,σ 2 e b := n − 1 X k =0 K h ( X k ∆ ) Z ( k +1)∆ k ∆  Z s k ∆ σ q b ,β ( X t ) dW β t  σ f a ,α ( X s ) dW α s , and Γ f ,µ 2 ( x ) and Γ f ,σ 2 ( x ) are defined similarly . 
As we did for $\Gamma_1$, we only analyze $\Gamma^{q,\mu}_2(x)$ and $\Gamma^{q,\sigma}_2(x)$, as the analysis of the reverse scenario is identical. By Lemma S.5.1, we have $e_a^\top\Gamma^{q,\mu}_2 e_b = O_p\big(\frac{\Delta\sqrt{\Upsilon(T)}}{h^{d/2}}\big)$. So,
\[
\frac{h^{d/2}}{\sqrt{\hat L^{(o)}(x)\,\Delta}}\; e_a^\top\Gamma^{q,\mu}_2 e_b = O_p\Bigg(\sqrt{\frac{\Delta\,\Upsilon(T)}{\hat L(x)}}\Bigg) = o_p(1), \tag{S.46}
\]
where we use (S.27) and $\Delta\to 0$ in the last equality. The analysis of $\Gamma^{q,\sigma}_2$, as a locally square integrable local martingale, is more involved due to the double stochastic integration. Denote
\[
M^{b,k}(t) := \int_{k\Delta}^{k\Delta+t}\sigma_{q^b,\alpha}(X_s)\,dW^\alpha_s \quad\text{for } t\in[0,\Delta).
\]
The quadratic variation of $e_a^\top\Gamma^{q,\sigma}_2 e_b$ becomes
\[
\big[e_a^\top\Gamma^{q,\sigma}_2 e_b\big] = \sum_{k=0}^{n-1} K_h(X_{k\Delta})^2 \int_{k\Delta}^{(k+1)\Delta} M^{b,k}(t-k\Delta)^2\,\pi^{a,a}_f(X_t)\,dt,
\]
where we use $\pi_f := \pi_{f,f}$ to simplify the notation. By the Doob–Meyer decomposition, we have
\[
M^{b,k}(s-k\Delta)^2 = \int_{k\Delta}^{s}\pi^{b,b}_q(X_t)\,dt + 2\int_{k\Delta}^{s}\Big(\int_{k\Delta}^{t}\sigma_{q^b,\gamma}(X_u)\,dW^\gamma_u\Big)\sigma_{q^b,\beta}(X_t)\,dW^\beta_t,
\]
and hence
\[
\big[e_a^\top\Gamma^{q,\sigma}_2 e_b\big] = \underbrace{\sum_{k=0}^{n-1} K_h(X_{k\Delta})^2 \int_{k\Delta}^{(k+1)\Delta}\Big(\int_{k\Delta}^{s}\pi^{b,b}_q(X_t)\,dt\Big)\pi^{a,a}_f(X_s)\,ds}_{:=\, e_a^\top A e_b} + \underbrace{2\sum_{k=0}^{n-1} K_h(X_{k\Delta})^2 \int_{k\Delta}^{(k+1)\Delta}\int_{k\Delta}^{s} M^{b,k}(t-k\Delta)\,\sigma_{q^b,\beta}(X_t)\,dW^\beta_t\;\pi^{a,a}_f(X_s)\,ds}_{:=\, e_a^\top Z e_b}.
\]
By the stochastic Fubini theorem (Lemma S.1.4), $e_a^\top Z e_b$ becomes
\[
\underbrace{2\sum_{k=0}^{n-1} K_h(X_{k\Delta})^2 \int_{k\Delta}^{(k+1)\Delta} M^{b,k}(t-k\Delta)\,\sigma_{q^b,\beta}(X_t)\,dW^\beta_t \int_{k\Delta}^{(k+1)\Delta}\pi^{a,a}_f(X_s)\,ds}_{e_a^\top Z_1 e_b} \;-\; \underbrace{2\sum_{k=0}^{n-1} K_h(X_{k\Delta})^2 \int_{k\Delta}^{(k+1)\Delta} M^{b,k}(s-k\Delta)\,\sigma_{q^b,\beta}(X_s)\Big(\int_{k\Delta}^{s}\pi^{a,a}_f(X_t)\,dt\Big)\,dW^\beta_s}_{e_a^\top Z_2 e_b}.
\]
The quadratic variation of $e_a^\top Z_2 e_b$ can be directly controlled by
\[
\Big[\frac{h^d}{\Delta}\, e_a^\top Z_2 e_b\Big] = \frac{4h^{2d}}{\Delta^2}\sum_{k=0}^{n-1} K_h(X_{k\Delta})^4 \int_{k\Delta}^{(k+1)\Delta}\Big(\int_{k\Delta}^{s}\pi^{a,a}_f(X_t)\,dt\Big)^2 M^{b,k}(s-k\Delta)^2\,\pi^{b,b}_q(X_s)\,ds \le 4C^3 h^{2d}\sum_{k=0}^{n-1} K_h(X_{k\Delta})^4 \int_{k\Delta}^{(k+1)\Delta} M^{b,k}(s-k\Delta)^2\,ds,
\]
where in the last inequality we use $\max\{|\pi^{a,b}_f(X_t)|,\,|\pi^{a,b}_q(X_t)|\}\le C$, with $C>0$ finite by the regularity assumptions on $f$, $q$, $\mu$, and $\sigma$. We now apply the BDG inequality, similarly to the control of $I_{1,2}$ above, and obtain
\[
\mathbb{E}_{M_\lambda}\Big[\frac{h^d}{\Delta}\, e_a^\top Z_2 e_b\Big] \le \frac{4C^3}{h^d}\,\mathbb{E}_{M_\lambda}\Bigg(\sum_{k=0}^{n-1}\frac{1}{h^d}K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)^4 \int_{k\Delta}^{(k+1)\Delta}\mathbb{E}_{M_\lambda}\big( M^{b,k}(s^*-k\Delta)^2 \,\big|\,\mathcal{F}_k\big)\,ds\Bigg) \le \frac{4C^3C_1\Delta}{h^d}\,\mathbb{E}_{M_\lambda}\Bigg(\sum_{k=0}^{n-1}\frac{1}{h^d}K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)^4 \int_{k\Delta}^{(k+1)\Delta}\pi^{b,b}_q(X_u)\,du\Bigg),
\]
where the term inside the expectation is the same as $[\Theta_{q^b}(T)]$, with $\Theta_{q^b}(T)$ defined in (S.31) with $K^2$. By Lemma S.4.1, $[\Theta_{q^b}(T)] = O(\Upsilon(T))$. Therefore, we have
\[
\mathbb{E}_{M_\lambda}\Big[\frac{h^d}{\Delta}\, e_a^\top Z_2 e_b\Big] = O\Big(\frac{\Delta\,\Upsilon(T)}{h^d}\Big),
\]
and hence
\[
\frac{h^d}{\Delta}\, e_a^\top Z_2 e_b = O_p\Big(\frac{\sqrt{\Delta\,\Upsilon(T)}}{h^{d/2}}\Big). \tag{S.47}
\]
Next, we control $e_a^\top Z_1 e_b$ by approximation at uniform discrete times:
\[
e_a^\top Z_1 e_b = \underbrace{2\sum_{k=0}^{n-1} K_h(X_{k\Delta})^2\,\Delta\,\pi^{a,a}_f(X_{k\Delta}) \int_{k\Delta}^{(k+1)\Delta}\Big(\int_{k\Delta}^{t}\sigma_{q^b,\gamma}(X_u)\,dW^\gamma_u\Big)\sigma_{q^b,\beta}(X_t)\,dW^\beta_t}_{e_a^\top \hat Z_1 e_b} \;+\; \underbrace{2\sum_{k=0}^{n-1} K_h(X_{k\Delta})^2 \int_{k\Delta}^{(k+1)\Delta}\big(\pi^{a,a}_f(X_s)-\pi^{a,a}_f(X_{k\Delta})\big)\,ds \times \int_{k\Delta}^{(k+1)\Delta}\Big(\int_{k\Delta}^{t}\sigma_{q^b,\gamma}(X_u)\,dW^\gamma_u\Big)\sigma_{q^b,\beta}(X_t)\,dW^\beta_t}_{e_a^\top (Z_1-\hat Z_1) e_b}.
\]
Just as for the computations leading to (S.47), we can compute $\frac{h^d}{\Delta}\, e_a^\top \hat Z_1 e_b = O_p\big(\frac{\sqrt{\Delta\,\Upsilon(T)}}{h^{d/2}}\big)$. The analysis of $e_a^\top(Z_1-\hat Z_1) e_b$ is similar to that of $e_a^\top(\Gamma^{f,\sigma}_{1,1}-\hat\Gamma^{f,\sigma}_{1,1}) e_b$ above in (S.45), except that the quadratic variation of the former contains one fewer factor of $\Delta$. Hence, we achieve $e_a^\top(Z_1-\hat Z_1) e_b = O_p(\Delta^{1/2}\,\Upsilon(T))$.
We thus have
\[
\frac{h^d}{\Delta}\, e_a^\top Z e_b = O_p\Big(\frac{\sqrt{\Delta\,\Upsilon(T)}}{h^{d/2}}\Big) + O_p(\Delta^{1/2}\,\Upsilon(T)). \tag{S.48}
\]
We proceed to examine $A$. By again approximating the process at uniform discrete times,
\[
\pi^{b,b}_q(X_t)\,\pi^{a,a}_f(X_s) = \big(\pi^{b,b}_q(X_t)-\pi^{b,b}_q(X_{k\Delta})\big)\pi^{a,a}_f(X_s) + \pi^{b,b}_q(X_{k\Delta})\big(\pi^{a,a}_f(X_s)-\pi^{a,a}_f(X_{k\Delta})\big) + \pi^{b,b}_q(X_{k\Delta})\,\pi^{a,a}_f(X_{k\Delta}),
\]
$e_a^\top A e_b$ is decomposed into
\[
\underbrace{\sum_{k=0}^{n-1} K_h(X_{k\Delta})^2 \int_{k\Delta}^{(k+1)\Delta}\int_{k\Delta}^{s}\big(\pi^{b,b}_q(X_t)-\pi^{b,b}_q(X_{k\Delta})\big)\pi^{a,a}_f(X_s)\,dt\,ds}_{:=\, A_1} + \underbrace{\sum_{k=0}^{n-1} K_h(X_{k\Delta})^2 \int_{k\Delta}^{(k+1)\Delta}\int_{k\Delta}^{s}\pi^{b,b}_q(X_{k\Delta})\big(\pi^{a,a}_f(X_s)-\pi^{a,a}_f(X_{k\Delta})\big)\,dt\,ds}_{:=\, A_2} + \underbrace{\sum_{k=0}^{n-1} K_h(X_{k\Delta})^2 \int_{k\Delta}^{(k+1)\Delta}\int_{k\Delta}^{s}\pi^{b,b}_q(X_{k\Delta})\,\pi^{a,a}_f(X_{k\Delta})\,dt\,ds}_{:=\, A^q_3}.
\]
By the same application of Itô's lemma (e.g., the treatment of $[Z_t]-\langle Z_t\rangle$ in Lemma S.2.10), we obtain
\[
\frac{h^d}{\Delta\,\Upsilon(T)}\,A_1 = O_p(\Delta^{1/2}) + O_p\Bigg(\sqrt{\frac{\Delta}{h^d\,\Upsilon(T)}}\Bigg), \qquad \frac{h^d}{\Delta\,\Upsilon(T)}\,A_2 = O_p(\Delta^{1/2}) + O_p\Bigg(\sqrt{\frac{\Delta}{h^d\,\Upsilon(T)}}\Bigg).
\]
We thus have $\frac{h^d}{\Delta\,\Upsilon(T)}\, e_a^\top(A_1+A_2) e_b = O_p(\Delta^{1/2}) + O_p\big(\sqrt{\frac{\Delta}{h^d\Upsilon(T)}}\big) = o_p(1)$ by assumption. The nontrivial term in the analysis of $\hat\Pi_2$ is $A^q_3$. Note that when we analyze $\Gamma^{f,\sigma}_2$, there exists a similar term, denoted likewise as $A^f_3$, which is also nontrivial. We have
\[
\frac{h^d}{\Delta\,\Upsilon(T)}\,\big(A^q_3 + A^f_3\big) = \frac{\Delta}{h^d\,\Upsilon(T)}\sum_{k=0}^{n-1} K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)^2\,\frac{1}{2}\big(\pi^{b,b}_q(X_{k\Delta})\,\pi^{a,a}_f(X_{k\Delta}) + \pi^{b,b}_f(X_{k\Delta})\,\pi^{a,a}_q(X_{k\Delta})\big) \xrightarrow{d} \frac{1}{2}\,\kappa_{2,0}\,g_\alpha(1)\,p_X(x)\big(\pi^{b,b}_q(x)\,\pi^{a,a}_f(x) + \pi^{b,b}_f(x)\,\pi^{a,a}_q(x)\big),
\]
where the convergence comes from Lemmas S.2.3 and S.2.7. With (S.48),
\[
\frac{\Big[\frac{h^{d/2}}{\Delta^{1/2}}\, e_a^\top\big(\Gamma^{q,\sigma}_2+\Gamma^{f,\sigma}_2\big) e_b\Big]}{\Upsilon(T)} = \frac{h^d}{\Delta\,\Upsilon(T)}\,\big(A^q_3 + A^f_3\big) + o_p(1) \xrightarrow{d} \kappa_{2,0}\,g_\alpha(1)\,p_X(x)\,\frac{1}{2}\big(\pi^{b,b}_q(x)\,\pi^{a,a}_f(x) + \pi^{b,b}_f(x)\,\pi^{a,a}_q(x)\big).
\]
To obtain the desired weak convergence to the normal distribution, we need to evaluate the cross quadratic variation. We sketch the key steps here without details, since they are the same as for the quadratic variation terms above. The cross quadratic variation $\big[e_a^\top\Gamma^{q,\sigma}_2 e_b,\; e_c^\top\Gamma^{q,\sigma}_2 e_d\big]$ is
\[
\sum_{k=0}^{n-1} K_h(X_{k\Delta})^2 \int_{k\Delta}^{(k+1)\Delta} M^{b,k}(t-k\Delta)\,M^{d,k}(t-k\Delta)\,\pi^{a,c}_f(X_t)\,dt.
\]
With the polarization $M^{b,k}M^{d,k} = \frac{1}{4}\big((M^{b,k}+M^{d,k})^2 - (M^{b,k}-M^{d,k})^2\big)$, we can apply the Doob–Meyer decomposition. The associated $A$ part of the term involving $(M^{b,k}+M^{d,k})^2$ is
\[
\sum_{k=0}^{n-1} K_h(X_{k\Delta})^2 \int_{k\Delta}^{(k+1)\Delta}\Big(\int_{k\Delta}^{s}\big(\pi^{b,b}_q + \pi^{d,d}_q + 2\pi^{b,d}_q\big)(X_t)\,dt\Big)\,\pi^{a,c}_f(X_s)\,ds,
\]
and hence the associated $\frac{h^d}{\Delta\Upsilon(T)}(A^q_3+A^f_3)$ part of the term involving $M^{b,k}M^{d,k}$ is
\[
\frac{\Delta}{h^d\,\Upsilon(T)}\sum_{k=0}^{n-1} K\Big(\frac{D_x(X_{k\Delta})}{h}\Big)^2\,\frac{1}{2}\big(\pi^{b,d}_q(X_{k\Delta})\,\pi^{a,c}_f(X_{k\Delta}) + \pi^{b,d}_f(X_{k\Delta})\,\pi^{a,c}_q(X_{k\Delta})\big) \xrightarrow{d} \kappa_{2,0}\,g_\alpha(1)\,p_X(x)\,\frac{1}{2}\big(\pi^{b,d}_q(x)\,\pi^{a,c}_f(x) + \pi^{b,d}_f(x)\,\pi^{a,c}_q(x)\big).
\]
Recall the index convention: for a random matrix $M\in\mathbb{R}^{p\times p}$ with $\operatorname{cov}(\operatorname{vec}(M)) = A\otimes B\in\mathbb{R}^{p^2\times p^2}$, where $A$ and $B$ are non-negative definite, the $(i,j)$-th entry of $\operatorname{cov}(\operatorname{vec}(M))$ is $A_{b,d}\,B_{a,c}$, where $i=(b-1)p+a$ and $j=(d-1)p+c$. With these preparations, by Proposition S.1.6 and the same argument as (S.36) in Theorem S.4.1, with $\Theta_f$ replaced by $\frac{h^d}{\Delta}\, e_a^\top\Gamma^{q,\sigma}_2 e_b$ and $K$ replaced by $K^2$, we have
\[
\frac{h^{d/2}}{\Delta^{1/2}}\,\frac{\operatorname{vec}\big(\Gamma^{q,\sigma}_2+\Gamma^{f,\sigma}_2\big)}{\sqrt{\hat L(x)}} \xrightarrow{d} \mathcal{N}\Big(0,\;\frac{\kappa_{2,0}}{2}\big(\pi_{f,f}(x)\otimes\pi_{q,q}(x) + \pi_{q,q}(x)\otimes\pi_{f,f}(x)\big)\Big).
\]
With (S.46) and Slutsky's theorem, we obtain
\[
\sqrt{\frac{h^d\,\hat L(x)}{\Delta}}\,\hat\Pi_2(x) = \frac{h^{d/2}}{\sqrt{\hat L(x)\,\Delta}}\,\big(\Gamma^{q,\sigma}_2+\Gamma^{f,\sigma}_2\big) + \frac{h^{d/2}}{\sqrt{\hat L(x)\,\Delta}}\,\big(\Gamma^{q,\mu}_2+\Gamma^{f,\mu}_2\big) = \frac{h^{d/2}}{\sqrt{\hat L(x)\,\Delta}}\,\big(\Gamma^{q,\sigma}_2+\Gamma^{f,\sigma}_2\big) + o_p(1) \xrightarrow{d} \mathcal{N}\Big(0,\;\frac{\kappa_{2,0}}{2}\big(\pi_{f,f}(x)\otimes\pi_{q,q}(x) + \pi_{q,q}(x)\otimes\pi_{f,f}(x)\big)\Big).
\]
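The index convention above can be checked numerically. The following sketch (ours, not from the paper) verifies that, for column-major $\operatorname{vec}$, the $(i,j)$-th entry of $A\otimes B$ with $i=(b-1)p+a$ and $j=(d-1)p+c$ equals $A_{b,d}B_{a,c}$:

```python
import numpy as np

# Numerical check (ours) of the index convention for cov(vec(M)) = A kron B:
# with column-major vec, entry (i, j), i = (b-1)p + a, j = (d-1)p + c,
# equals A[b, d] * B[a, c] in 1-indexed notation.
rng = np.random.default_rng(1)
p = 3
A = rng.standard_normal((p, p)); A = A @ A.T  # non-negative definite factors
B = rng.standard_normal((p, p)); B = B @ B.T
C = np.kron(A, B)  # candidate cov(vec(M))

for a in range(1, p + 1):
    for b in range(1, p + 1):
        for c in range(1, p + 1):
            for d in range(1, p + 1):
                i = (b - 1) * p + a
                j = (d - 1) * p + c
                # shift to 0-indexing when reading the arrays
                assert np.isclose(C[i - 1, j - 1], A[b - 1, d - 1] * B[a - 1, c - 1])
print("index convention verified")
```

This is exactly the block structure of the Kronecker product: row block $b$, within-block row $a$, column block $d$, within-block column $c$.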
Putting all of the above together and applying $\operatorname{vech}$, Slutsky's theorem gives the desired convergence:
\[
\sqrt{\frac{h^d\,\hat L(x)}{\Delta}}\,\Big(\hat\pi_{f,g}(x) - \pi_{f,g}(x) - h^2 B_{\pi_{f,g}}(x)\Big) = \sqrt{\frac{h^d\,\hat L(x)}{\Delta}}\,\Big[\hat\Pi_1(x) + \hat\Pi_2(x) + \big(\hat\Pi_3(x) - \pi_{f,g}(x) - h^2 B_{\pi_{f,g}}(x)\big)\Big] = \sqrt{\frac{h^d\,\hat L(x)}{\Delta}}\,\hat\Pi_2(x) + o_p(1) \xrightarrow{d} \mathcal{N}\big(0,\;\kappa_{2,0}\,\Xi_{f,g}(x)\big). \qquad\square
\]
To analyze the drift estimators, we require the following corollary, which will also be used in establishing tangent space estimation. Note that the relationship among $h$, $\Upsilon(T)$, and $\Delta$ differs from that in Theorem S.5.1 by a factor of $\Delta$.

Corollary S.5.2. Suppose the assumptions in Theorem S.5.1 hold, but replace the assumptions $\frac{h^d\Upsilon(T)}{\Delta}\to\infty$, $\frac{h^{d+4}\Upsilon(T)}{\Delta}\to C$ with $C>0$, and $h^{d-4}\Upsilon(T)\Delta\to 0$ by $h^d\Upsilon(T)\to\infty$, $h^{d+4}\Upsilon(T)\to C$ for $C>0$, and $h^{d-4}\Upsilon(T)\Delta^2\to 0$. Then,
\[
\sqrt{h^d\,\hat L^{(o)}(x)}\;\operatorname{vech}\Big(\hat\pi_{f,g}(x) - \pi_{f,g}(x) - h^2 B_{\pi_{f,g}}(x)\Big) \xrightarrow{p} 0 \in \mathbb{R}^p.
\]
Proof. Since the assumptions $h^d\Upsilon(T)\to\infty$, $h^{d+4}\Upsilon(T)\to C$, and $h^{d-4}\Upsilon(T)\Delta^2\to 0$ differ from those in Theorem S.5.1 by a factor of $\Delta$, we scale up $\hat\pi_{f,g}(x) - \pi_{f,g}(x) - h^2 B_{\pi_{f,g}}(x)$ by a factor of $\sqrt{\Delta}$. The error analysis is identical to that in the proof of Theorem S.5.1, while we now have the dominant term $\sqrt{h^d\,\hat L^{(o)}(x)}\,\hat\Pi_2(x) = o_p(1)$ since $\Delta = o(1)$. Thus, we obtain the convergence to $0$ in probability. $\square$

Appendix S.6. Proofs of Main Theorems

S.6.1. Proof of Theorem 4.4 about the diffusion estimator. The proof follows immediately by an application of Theorem S.5.1 to the map $f = q := \iota: M\to\mathbb{R}^p$. By definition, $\hat\pi^{(o)}(x) = \hat\pi_{\iota,\iota}(x)$.
Next, note that $\sigma_{\iota,l}(x)$ defined in (S.9) is a directional derivative of $\iota$, so that $\sigma_{\iota,l} = \sigma_l\iota = \sigma^k_l\,\partial_k\iota^j\,e_j$, where $e_j$ is the standard orthonormal basis of $\mathbb{R}^p$; the pushforward vector satisfies $\iota_*\sigma_l = \sigma^k_l\,\partial_k\iota^j\,e_j$, so we have $\sigma_{\iota,l}(x) = \iota_*\sigma_l(\iota(x))$. Therefore, we have $\pi^{(o)}(x) = \pi_{\iota,\iota}(x)$. This leads to $\pi_{\iota,\iota}(x)\otimes\pi_{\iota,\iota}(x) = \pi^{(o)}(x)\otimes\pi^{(o)}(x)$, and hence $\Xi_{\iota,\iota}(x) = \Xi^{(o)}(x)$. Also, by the definition in (17), $b^{(o)}_\pi(x) = B_{\pi_{f,f}}(x)$. With all of the above, Theorem S.5.1 gives the desired result.

S.6.2. Proof of Theorem 4.5 about tangent space estimation. Fix $x\in M$. We will apply Corollary S.5.2 to the map $f = q := \iota: M\to\mathbb{R}^p$. In this setup, as discussed in Section S.6.1, $\hat\pi^{(o)}(x) = \hat\pi_{\iota,\iota}(x)$ and $\pi^{(o)}(x) = \pi_{\iota,\iota}(x)$. Choose ambient coordinates around $\iota(x)$ so that $\pi^{(o)}(x)$ is nonzero only on the upper-left $d\times d$ block. Denote the eigenvalue decomposition $\pi^{(o)}(x) = UDU^\top$, where $U\in O(p)$, $D = \operatorname{diag}(\lambda_1,\dots,\lambda_d,0,\dots,0)\in\mathbb{R}^{p\times p}$, and $\lambda_1\ge\cdots\ge\lambda_d$. By construction, the first $d$ eigenvectors of $\pi^{(o)}(x)$, i.e., the first $d$ columns of $U$, form an orthonormal basis of $\iota_*T_xM$. By Corollary S.5.2, we have $\hat\pi^{(o)}(x) = \pi^{(o)}(x) + h^2\,b^{(o)}_\pi(x)\in\mathbb{R}^{p\times p}$, where $b^{(o)}_\pi(x)\in\mathbb{R}^{p\times p}$ is a symmetric matrix corresponding to the bias of the diffusion estimator at $x$, with
\[
b^{(o)}_\pi(x) = O(1) + o_p\Big(\tfrac{1}{\sqrt{h^{d+4}\,\hat L^{(o)}(x)}}\Big).
\]
Note that by the assumption $h^{d+4}\Upsilon(T)\to C$ and (S.27), $h^{d+4}\hat L^{(o)}(x)$ converges in distribution to a random variable as $T\to\infty$. Thus, $o_p\big(\frac{1}{\sqrt{h^{d+4}\hat L^{(o)}(x)}}\big) = o_p(1)$ by Slutsky's theorem. Also, we can replace $o_p\big(\frac{1}{\sqrt{h^{d+4}\hat L^{(o)}(x)}}\big)$ by $o_p\big(\frac{1}{\sqrt{h^{d+4}\Upsilon(T)}}\big)$. Clearly, $b^{(o)}_\pi(x) = O_p(1)$.
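The mechanism behind the tangent space estimator — that the top-$d$ eigenprojector of $\hat\pi^{(o)}(x) = \pi^{(o)}(x) + h^2 b^{(o)}_\pi(x)$ is an $O(h^2)$ perturbation of the true projector — can be illustrated numerically. In the following sketch (ours; the dimensions, bandwidth, eigenvalues, and bias matrix are arbitrary illustrative choices), a rank-$d$ matrix is perturbed and the projectors onto the dominant eigenspaces are compared:

```python
import numpy as np

# Illustration (ours, not the paper's code): the top-d eigenprojector of
# pi_hat = pi + h^2 * bias is an O(h^2) perturbation of the true projector.
rng = np.random.default_rng(2)
p, d, h = 5, 2, 0.1

# True "diffusion matrix": rank d, supported on a random d-dimensional subspace.
Q, _ = np.linalg.qr(rng.standard_normal((p, p)))
U_d = Q[:, :d]                       # orthonormal basis of the tangent space
pi = U_d @ np.diag([2.0, 1.0]) @ U_d.T
P = U_d @ U_d.T                      # true projector

bias = rng.standard_normal((p, p)); bias = (bias + bias.T) / 2  # symmetric O(1) bias
pi_hat = pi + h ** 2 * bias

# Estimated projector from the top-d eigenvectors of pi_hat.
w, V = np.linalg.eigh(pi_hat)        # eigenvalues in ascending order
U_hat = V[:, -d:]
P_hat = U_hat @ U_hat.T

err = np.linalg.norm(P_hat - P)      # Frobenius norm, expected to be O(h^2)
print(err, h ** 2)
```

The observed error is of order $h^2$, consistent with a Davis–Kahan-type perturbation argument, since the spectral gap between $\lambda_d$ and the zero eigenvalues stays bounded away from $0$.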
Denote the eigendecomposition $\hat\pi^{(o)}(x) = \hat U\hat D\hat U^\top$, where $\hat U\in O(p)$ (the space of $p\times p$ orthogonal matrices) and $\hat D$ is diagonal with eigenvalues ordered non-increasingly. Denote by $\hat U_d$ and $U_d$ the dominant $d$ columns of $\hat U$ and $U$, respectively. Using the perturbation technique [51], asymptotically as $h\to 0$ we have
\[
\hat P_x := \hat U_d\hat U_d^\top = U_dU_d^\top + h^2\,\mathbf{b}^{(\mathrm{tan})}(x) = P_x + h^2\,\mathbf{b}^{(\mathrm{tan})}(x)
\]
for some $p\times p$ symmetric matrix $\mathbf{b}^{(\mathrm{tan})}(x)$, where $\mathbf{b}^{(\mathrm{tan})}(x) = O(1) + o_p\big(\frac{1}{\sqrt{h^{d+4}\Upsilon(T)}}\big)$. We thus conclude the theorem.

S.6.3. Proof of Theorem 4.6 about the drift estimator.

Proof. Analogous to proving Theorem 4.4 by applying Theorem S.5.1, we prove Theorem 4.6 by applying Theorem S.4.1, with the additional consideration of the projection $\hat P_x$ of (8). With $f = \iota: M\to\mathbb{R}^p$ and $D_x(x') := \|\iota(x)-\iota(x')\|_{\mathbb{R}^p}$, we obtain
\[
\sqrt{h^d\,\hat L^{(o)}(x)}\,\big(\hat\mu_\iota(x) - \mu_\iota(x) - h^2 B^{\mu,o}_\iota(x)\big) \xrightarrow{d} \mathcal{N}\big(0,\;\kappa_{2,0}\,\pi^{(o)}(x)\big) \tag{S.49}
\]
by Theorem S.4.1, where $B^{\mu,o}_\iota$ is defined in (S.30). Following the same argument as in the proof of Theorem 4.4, we have $\mu_\iota(x) := \iota_*\mu(\iota(x))$, and hence $P_x\mu_\iota(x) = \mu^{(o)}(x)$. Recall that $\hat\mu^{(o)}(x) = \hat P_x\hat\mu_\iota(x)$. However, the targeted bias term $b^{(o)}_\mu(x) \ne P_x B^{\mu,o}_\iota(x)$; rather, $b^{(o)}_\mu(x) = P_x B^{\mu,o}_\iota(x) - b^{(\mathrm{tan})}(x)\,\mu_\iota(x)$ by (20), since the drift estimator bias also contains the bias induced by the tangent space estimation. By Theorem 4.5, we have $\hat P_x = P_x + h^2\,\mathbf{b}^{(\mathrm{tan})}(x)$, where $\mathbf{b}^{(\mathrm{tan})}(x) = b^{(\mathrm{tan})}(x) + \epsilon^{(\mathrm{tan})}(x)$ with $b^{(\mathrm{tan})}(x) = O(1)$ and $\epsilon^{(\mathrm{tan})}(x) = o_p\big(\frac{1}{\sqrt{h^{d+4}\Upsilon(T)}}\big)$.
Putting it all together, write
\[
\hat\mu^{(o)}(x) - \mu^{(o)}(x) - h^2\,b^{(o)}_\mu(x) = P_x\big(\hat\mu_\iota(x) - \mu_\iota(x) - h^2 B^{\mu,o}_\iota(x)\big) + (P_x - \hat P_x)\,\hat\mu_\iota(x) + h^2\,b^{(\mathrm{tan})}(x)\,\mu_\iota(x)
\]
\[
= P_x\big(\hat\mu_\iota(x) - \mu_\iota(x) - h^2 B^{\mu,o}_\iota(x)\big) \tag{S.50}
\]
\[
\quad -\, h^2\,b^{(\mathrm{tan})}(x)\big(\hat\mu_\iota(x) - \mu_\iota(x)\big) - h^2\,\epsilon^{(\mathrm{tan})}(x)\,\mu_\iota(x). \tag{S.51}
\]
By (S.49) and the continuous mapping theorem, (S.50) after normalization becomes
\[
\sqrt{h^d\,\hat L^{(o)}(x)}\; P_x\big(\hat\mu_\iota(x) - \mu_\iota(x) - h^2 B^{\mu,o}_\iota(x)\big) \xrightarrow{d} \mathcal{N}\big(0,\;\kappa_{2,0}\,\pi^{(o)}(x)\big),
\]
where we use the fact that $P_x\,\pi^{(o)}(x)\,P_x^\top = \pi^{(o)}(x)$. The first term of (S.51) after normalization becomes
\[
\sqrt{h^{d+4}\,\hat L^{(o)}(x)}\; b^{(\mathrm{tan})}(x)\big(\hat\mu_\iota(x) - \mu_\iota(x)\big) = o_p(1)
\]
by (S.49) and the fact that $b^{(\mathrm{tan})}(x) = O(1)$. The second term of (S.51) after normalization becomes
\[
\sqrt{h^{d+4}\,\hat L^{(o)}(x)}\;\epsilon^{(\mathrm{tan})}(x)\,\mu_\iota(x) = o_p\Bigg(\sqrt{\frac{\hat L^{(o)}(x)}{\Upsilon(T)}}\Bigg) = o_p(1),
\]
where the last equality comes from (S.27). By combining all of the above controls, we obtain the desired conclusion. $\square$

Appendix S.7. More details on numerical simulation

Figure S.7. Top to bottom row: $\hat L^{(o)}$ on ellipsoids with eccentricities $(1,1,1)$, $(1.5,1,1)$, and $(2,1.5,1)$, evaluated from a single long trajectory with different $n_i$ and $\Delta = 10^{-2}$, where $\log_{10}(n_i)\in\{4,5,6,7,8\}$, from left to right column. Bins are colored by the percentile of their density across all eccentricities (within each column) for a fixed $T$. This enables meaningful visual comparison across plots sharing the same $T$ value.

Figure S.8.
From left to right: visualizations of $\hat\mu_E(x)$, $\hat\mu^{(o)}(x)$, and $P_x\hat\mu_E(x)$, where $P_x$ is the projection operator onto the tangent space $T_xM$, for base-point samples $x$ drawn uniformly from a spherical cap centered at $(1,0,0)^\top$ and observed on ellipsoids with eccentricity $(1,1,1)$, shown from two viewing angles. The ground-truth drift vector is superimposed as blue arrows.

Figure S.9. From left to right: visualizations of $\hat\mu_E(x)$, $\hat\mu^{(o)}(x)$, and $P_x\hat\mu_E(x)$, where $P_x$ is the projection operator onto the tangent space $T_xM$, for base-point samples $x$ drawn uniformly from a spherical cap centered at $(1,0,0)^\top$ and observed on ellipsoids with eccentricity $(1.5,1,1)$, shown from two viewing angles. The ground-truth drift vector is superimposed as blue arrows.

Figure S.10. Histograms associated with Table 1. Left: means and standard deviations of the NRMSE of various drift estimators at $x$ where $\|\mu^{(o)}(x)\|/\|\mu^{(o)}\|_\infty > 0.05$. Right: means and standard deviations of the RMSE of various estimators at $x$ where $\|\mu^{(o)}(x)\|/\|\mu^{(o)}\|_\infty \le 0.05$.

Figure S.11. Histograms associated with the error of $\pi^{(o)}$ in Table 1.

Figure S.12. QQ plots corresponding to the drift estimation errors shown in Figure 3, assessing agreement with the theoretical Gaussian reference; the normal component after estimated projection to the tangent space.

Figure S.13. QQ plots corresponding to the vector field coordinate $(i,j)$ errors for $\hat\pi^{(o)}(x)$ of Figure 4.

Columns of the table below: $\hat\mu_E$, $\hat\mu^{(o)}$, $P_x\hat\mu_E$, $\hat\pi^{(o)}$.

Eccentricity $(1,1,1)$ (top block):
- $\frac{\|\hat\mu(x)-\mu^{(o)}(x)\|}{\|\mu^{(o)}(x)\|}$ (at $x$ with $\frac{\|\mu^{(o)}(x)\|}{\|\mu^{(o)}\|_\infty}\ge 0.05$): $1.530\pm 1.120$; $0.259\pm 0.247$; $0.252\pm 0.244$; —.
- $\frac{\big|\|\hat\mu(x)\|-\|\mu^{(o)}(x)\|\big|}{\|\mu^{(o)}(x)\|}$ ($\ge 0.05$): $0.856\pm 1.050$; $0.162\pm 0.184$; $0.157\pm 0.179$; —.
- $\Theta(\hat\mu(x),\mu^{(o)}(x))$ ($\ge 0.05$): $0.936\pm 0.191$; $0.181\pm 0.251$; $0.177\pm 0.252$; —.
- $\|\hat\mu(x)-\mu^{(o)}(x)\|$ ($< 0.05$): $0.984\pm 0.014$; $0.117\pm 0.053$; $0.114\pm 0.051$; —.
- $\frac{\|\hat\pi(x)-\pi^{(o)}(x)\|_F}{\|\pi\|_F}$ ($\hat\pi^{(o)}$ only): $0.038\pm 0.007$.
- $\|\sin\Theta\|_F$ ($\hat\pi^{(o)}$ only): $0.017\pm 0.006$.

Eccentricity $(1.5,1,1)$ (bottom block):
- $\frac{\|\hat\mu(x)-\mu^{(o)}(x)\|}{\|\mu^{(o)}(x)\|}$ ($\ge 0.05$): $1.410\pm 0.976$; $0.257\pm 0.267$; $0.250\pm 0.264$; —.
- $\frac{\big|\|\hat\mu(x)\|-\|\mu^{(o)}(x)\|\big|}{\|\mu^{(o)}(x)\|}$ ($\ge 0.05$): $0.764\pm 0.883$; $0.174\pm 0.209$; $0.170\pm 0.205$; —.
- $\Theta(\hat\mu(x),\mu^{(o)}(x))$ ($\ge 0.05$): $0.887\pm 0.250$; $0.165\pm 0.248$; $0.161\pm 0.249$; —.
- $\|\hat\mu(x)-\mu^{(o)}(x)\|$ ($< 0.05$): $0.838\pm 0.015$; $0.121\pm 0.049$; $0.119\pm 0.049$; —.
- $\frac{\|\hat\pi(x)-\pi^{(o)}(x)\|_F}{\|\pi\|_F}$ ($\hat\pi^{(o)}$ only): $0.039\pm 0.012$.
- $\|\sin\Theta\|_F$ ($\hat\pi^{(o)}$ only): $0.016\pm 0.009$.

Table S.3. Same as Table 1, summarizing the various evaluation metrics, but with eccentricities $(1,1,1)$ (top block) and $(1.5,1,1)$ (bottom block). $\hat\mu(x)$ is the estimator of $\mu^{(o)}(x)$, which can be $\hat\mu_E$, $\hat\mu^{(o)}$, or $P_x\hat\mu_E$, where $P_x$ is the projection onto $T_xM$, listed at the top. $\hat\pi(x)$ is the estimator of $\pi^{(o)}(x)$, which is $\hat\pi^{(o)}(x)$. $\Theta(\mu^{(o)},\hat\mu)$ is the angle between $\mu^{(o)}$ and $\hat\mu$ in radians. $\|\sin\Theta\|_F$ is the subspace distance between the dominant 2D eigenspaces of $\hat\pi^{(o)}$ and $\pi^{(o)}$.

Figure S.14. A single long trajectory of length $n = 10^8$ with time-step $\Delta = 10^{-2}$ is run on the parameter space $N = [0,2\pi)\times[0,2\pi)$ and mapped to the Klein bottle in $\mathbb{R}^4$; the occupation density $\hat L^{(o)}$ is plotted in $N$, where $\log_{10}(n_i)\in\{4,5,6,7,8\}$, from left to right column.
In each image, bins are colored according to the percentile of their density values within that plot.

Figure S.15. The drift vector field $\mu(u,v) = \big(1 + \tfrac{1}{2}\cos(\tfrac{u}{2})\sin(v),\; \tfrac{1}{2}\sin(2v)\big)^\top$ on $N = [0,2\pi)\times[0,2\pi)$ used to generate the dynamics on the Klein bottle.

Figure S.16. Top: histogram of the NRMSE of various estimators on the Klein bottle. Bottom: histogram of the NRMSE of the diffusion estimator.

Table S.5. Commonly used notation.
- $b^{(o)}_\mu$: bias of the observed drift vector estimator
- $b^{(o)}_\pi$: bias of the observed diffusion matrix estimator
- $C_X$: process-specific constant
- $D(\mathbb{R}_+,\mathbb{R})$: the Skorokhod space with the Borel $\sigma$-algebra and canonical filtration
- $d$: dimension of the manifold
- $\Delta$: sampling period
- $D_x$: a distance-like function at $x\in M$
- $e(X)$: explosion time of the manifold-valued diffusion
- $\mathbb{E}_x$: expectation conditional on $X_0 = x$
- $\Gamma$: Gamma function
- $h$: bandwidth
- $I$, $I_{p\times p}$: the identity matrix (of size $p\times p$)
- $\iota$: an embedding of $M$ into $\mathbb{R}^p$
- $\mathbf{1}_A$: the indicator function of the set $A$
- $K$: a kernel function
- $\kappa_{p,q}$: kernel-dependent constant for $p\in\mathbb{N}$, $q\in\{0\}\cup\mathbb{N}$
- $\mu^{(o)}$: observed drift vector
- $n$: number of process observations
- $p$: ambient space dimension
- $\pi^{(o)}$: observed diffusion matrix
- $R_m$: $m$-th regeneration time of the generalized life-cycle decomposition
- $S_m$: $m$-th subsequent regeneration time of the generalized life-cycle decomposition
- $\sigma_\alpha$: $\alpha$-th component diffusion vector field
- $\sim$: asymptotically equivalent (ratio $\to 1$)
- $T$: sampling period
- $\theta_s$: shift operator for time $s\ge 0$
- $\tilde g_{s,t}(a,b)$: $\tilde\phi_{s,t}(a,b) - \tilde\phi_s(a)\tilde\phi_t(b)$, a measure of path dependence
- $\Upsilon$: the scaling factor of the process $X_t$
- $W(M)$: $C([0,\infty), M)$
- $W^r_0$: $\{\omega\in C([0,\infty),\mathbb{R}^r): \omega(0)=0\}$
- $\mathcal{B}_t(W(M))$: the $\sigma$-algebra generated by cylinder sets of $W(M)$ up to time $t>0$
- $\mathcal{B}_t(W^r_0)$: the $\sigma$-algebra generated by cylinder sets of $W^r_0$ up to time $t>0$
- $W_t$, $W^\alpha_t$: standard $r$-dimensional Brownian motion and its components
- $X$: high-dimensional time series
- $\mathcal{A}$: recurrent atom
- $T_m$: exponential jump-times

Department of Mathematics, Duke University, Durham, NC 27708, USA
Courant Institute of Mathematical Sciences, New York University, New York, NY 10012, USA
