Plug-In Classification of Drift Functions in Diffusion Processes Using Neural Networks

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We study a supervised multiclass classification problem for diffusion processes, where each class is characterized by a distinct drift function and trajectories are observed at discrete times. Extending the one-dimensional multiclass framework of Denis et al. (2024) to multidimensional diffusions, we propose a neural network-based plug-in classifier that estimates the drift functions for each class from independent sample paths and assigns labels based on a Bayes-type decision rule. Under standard regularity assumptions, we establish convergence rates for the excess misclassification risk, explicitly capturing the effects of drift estimation error and time discretization. Numerical experiments demonstrate that the proposed method achieves faster convergence and improved classification performance compared to Denis et al. (2024) in the one-dimensional setting, remains effective in higher dimensions when the underlying drift functions admit a compositional structure, and consistently outperforms direct neural network classifiers trained end-to-end on trajectories without exploiting the diffusion model structure.


💡 Research Summary

The paper addresses supervised multiclass classification for diffusion processes where each class is distinguished solely by its drift function. The authors consider a d‑dimensional stochastic differential equation
(dX_t = b_{Y}(X_t)dt + \sigma(X_t)dB_t)
with a known diffusion coefficient σ and known class priors (p_k). For each class (k) the drift (b_k) is unknown and must be learned from high‑frequency discrete observations of independent sample paths.
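The data-generating model above can be illustrated with a standard Euler–Maruyama discretization. This is a minimal sketch, not code from the paper; the function name `simulate_path` and the Ornstein–Uhlenbeck-type example drift are our own choices for illustration.

```python
import numpy as np

def simulate_path(b, sigma, x0, T=1.0, M=200, rng=None):
    """Euler-Maruyama scheme for dX_t = b(X_t) dt + sigma(X_t) dB_t,
    observed on the grid t_m = m * (T / M)."""
    rng = np.random.default_rng(rng)
    d = len(x0)
    dt = T / M
    X = np.empty((M + 1, d))
    X[0] = x0
    for m in range(M):
        dB = rng.normal(scale=np.sqrt(dt), size=d)  # Brownian increment
        X[m + 1] = X[m] + b(X[m]) * dt + sigma(X[m]) @ dB
    return X

# Example class drift: mean reversion toward the origin, identity diffusion.
b0 = lambda x: -x
sigma = lambda x: np.eye(len(x))
path = simulate_path(b0, sigma, x0=np.ones(2), M=100, rng=0)
```

Each class k would use its own drift `b_k` here; the diffusion coefficient is shared across classes, matching the model assumption that classes differ only through the drift.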

The theoretical contribution begins with a Bayes optimal classifier (g^*) for continuous‑time observations. Using Girsanov’s theorem, the posterior class probabilities are expressed as a soft‑max of functionals
(F_k^*(X)=\int_0^T b_k^\top a^{-1}\,dX - \frac12\int_0^T|\sigma^{-1}b_k|^2\,ds)
where (a=\sigma\sigma^\top). Proposition 2.4 shows that (g^*(X)=\arg\max_k \phi_k(F^*(X))) with (\phi_k) the soft‑max weighted by the priors. This extends the one‑dimensional result of Denis et al. (2024) to the multivariate case.

Because real data are observed on a fine grid (t_m=m\Delta), the authors introduce a discretized version (\bar F_k) of the functional and replace the unknown drifts by estimators (\hat b_k). The resulting plug‑in classifier computes scores
(\hat F_k = \sum_{m=0}^{M-1}\hat b_k(X_{t_m})^\top a^{-1}(X_{t_m})(X_{t_{m+1}}-X_{t_m}) - \frac{\Delta}{2}\sum_{m=0}^{M-1}|\sigma^{-1}(X_{t_m})\hat b_k(X_{t_m})|^2)
and assigns the label with the largest soft‑max probability.
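The plug-in score is a Riemann-sum discretization of the continuous-time functional: the stochastic integral becomes a sum against observed increments, and the time integral picks up a factor of (\Delta). A sketch under the assumption of an invertible (\sigma) (the helper name `plug_in_score` is ours):

```python
import numpy as np

def plug_in_score(path, b_hat, sigma, delta):
    """Discretized functional: sum of b^T a^{-1} dX minus (delta/2) times
    the sum of |sigma^{-1} b|^2, with a = sigma sigma^T, along one path
    of shape (M+1, d) observed on a grid with step delta."""
    F = 0.0
    for m in range(len(path) - 1):
        x, dx = path[m], path[m + 1] - path[m]
        S = sigma(x)
        a_inv = np.linalg.inv(S @ S.T)
        b = b_hat(x)
        F += b @ a_inv @ dx                              # stochastic-integral term
        F -= 0.5 * delta * np.sum(np.linalg.solve(S, b) ** 2)  # time-integral term
    return F

# Toy check in d=1 with sigma = identity: two unit increments, constant drift 1,
# gives F = (1 + 1) - 0.5 * 1.0 * (1 + 1) = 1.
path = np.array([[0.0], [1.0], [2.0]])
score = plug_in_score(path, lambda x: np.array([1.0]), lambda x: np.eye(1), delta=1.0)
# score → 1.0
```

Computing one such score per class and feeding them into the prior-weighted soft-max yields the plug-in label.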

Drift estimation is performed component‑wise with feed‑forward ReLU neural networks. The network class (\mathcal F(L,p,s,F)) imposes an (\ell_\infty) bound on weights and biases, a sparsity constraint (s), and a uniform supremum bound (F). For each class (k) and each coordinate (i), the estimator (\hat b_{i}^{(k)}) minimizes the empirical squared error between the observed increments (\Delta^{-1}(X_{t_{m+1}}-X_{t_m})) and the network output.
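The least-squares criterion behind the drift estimators can be written down directly: over a small step, ((X_{t_{m+1}}-X_{t_m})/\Delta) is a noisy proxy for (b(X_{t_m})), and the network is fit by squared error against these proxies. A schematic of the empirical risk, with a generic candidate function standing in for a member of the network class (\mathcal F(L,p,s,F)) (the helper name is ours):

```python
import numpy as np

def empirical_drift_risk(paths, f, delta):
    """Empirical squared error between increment-based drift proxies
    (X_{t_{m+1}} - X_{t_m}) / delta and a candidate drift f, averaged
    over all sample paths and grid points of one class."""
    total, count = 0.0, 0
    for X in paths:                       # each X has shape (M+1, d)
        proxies = (X[1:] - X[:-1]) / delta
        preds = np.apply_along_axis(f, 1, X[:-1])
        total += np.sum((proxies - preds) ** 2)
        count += len(proxies)
    return total / count
```

In the paper this objective is minimized coordinate-wise over sparse ReLU networks; any gradient-based optimizer can play that role in practice, with the optimization error entering the risk bound as a separate term.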

Theorem 2.5 decomposes the excess misclassification risk of the plug‑in classifier into two terms: a discretization error of order (\sqrt{\Delta}) and a drift‑estimation error of order (\sqrt{\mathcal E(\hat b_k,b_k)}), where (\mathcal E) is the global (L^2) error of the drift estimator. This clean separation shows that improving either the sampling frequency or the drift estimator directly reduces classification error.

Theorem 2.7 leverages recent non‑parametric convergence results for sparse ReLU networks (Zhao et al., 2025). Assuming each true drift belongs to a compositional Hölder class (G(q,d,t,\beta)), the authors choose depth (L\asymp\log N), widths ({p_i}) and sparsity (s) proportional to the effective complexity (\varphi_N = N^{-\beta^*/(2\beta^*+t)}) (up to logarithmic factors). Under the condition (\Delta \lesssim \varphi_N \log^3 N) and a small empirical optimization error, the excess risk satisfies
(R(\hat g)-R(g^*) \le C\big(\sqrt{\Delta} + \varphi_N^{1/2-\varepsilon}\big))
for any (\varepsilon\in(0,1/4]). Hence, with sufficiently fine time steps, the classifier attains the Bayes risk up to the minimax‑optimal drift‑estimation rate, independent of the ambient dimension when the drift has a compositional structure.

Two simulation studies validate the theory. The first experiment uses a high‑dimensional diffusion ((d=10) or more) with locally fluctuating drifts that admit a compositional representation. The neural‑network plug‑in classifier outperforms B‑spline drift estimators and a direct end‑to‑end classifier that ignores the SDE structure, achieving the predicted convergence rate and showing dimension‑free behavior. The second experiment reproduces the one‑dimensional benchmark of Denis et al. (2024); the proposed method matches or slightly exceeds their plug‑in classifier and the Bayes benchmark, confirming that the extension to higher dimensions does not sacrifice performance in the original setting.

In summary, the paper delivers (i) an explicit Bayes optimal rule for multivariate diffusion classification, (ii) a practical plug‑in algorithm that combines neural‑network drift estimation with a theoretically justified decision rule, (iii) rigorous risk bounds that separate discretization and estimation errors, and (iv) empirical evidence that the method scales to higher dimensions when drifts possess a compositional structure. This work substantially broadens the applicability of diffusion‑based classification beyond the one‑dimensional case and provides a solid foundation for future research on supervised learning with stochastic differential equation models.

