DNS: Data-driven Nonlinear Smoother for Complex Model-free Process
We propose a data-driven nonlinear smoother (DNS) to estimate the hidden state sequence of a complex dynamical process from a noisy, linear measurement sequence. The dynamical process is model-free: we have no knowledge of the nonlinear dynamics of the process, and no state-transition model (STM) is available. The proposed DNS uses a recurrent architecture that yields a closed-form posterior of the hidden state sequence given the measurement sequence. DNS learns in an unsupervised manner: the training dataset consists only of measurement data, with no state data. We demonstrate DNS in simulations on several stochastic dynamical processes, including a benchmark Lorenz system. Experimental results show that DNS significantly outperforms a deep Kalman smoother (DKS) and an iterative data-driven nonlinear state estimation (iDANSE) smoother.
💡 Research Summary
The paper introduces a novel data‑driven nonlinear smoother (DNS) that can estimate the hidden state sequence of a complex dynamical process without any knowledge of its state‑transition model (STM). The only available information is a linear measurement model yₜ = H xₜ + wₜ, where H and the measurement‑noise covariance C_w are known, and a large collection of measurement‑only sequences for training. This setting—referred to as “model‑free”—makes traditional Bayesian smoothers such as the Rauch‑Tung‑Striebel (RTS) smoother or particle smoothers inapplicable, because they rely on an STM to regularize the inference.
DNS tackles the problem in two phases: inference and unsupervised learning. In the inference phase the posterior p(x_{1:T} | y_{1:T}) is factorized as

p(x_{1:T} | y_{1:T}) = p(x_1 | y_{1:T}) ∏_{t=2}^{T} p(x_t | \hat{x}_{1:t-1}, y_{1:T})

where \hat{x}_{1:t-1} denotes the sequence of states estimated in previous steps. Crucially, the conditional distribution p(x_t | \hat{x}_{1:t-1}, y_{1:T}) is allowed to depend on future measurements y_{t+1:T} (anti-causal information). This is a key distinction from the iterative DANSE (iDANSE) smoother, which uses only past measurements and therefore loses smoothing power.
A deep recurrent architecture (DRA) parameterizes the prior p(x_t | \hat{x}_{1:t-1}, y_{1:t-1}, y_{t+1:T}) as a Gaussian N(m_t, L_t). The DRA consists of three gated recurrent units (GRUs): one processes the past measurement sequence, another processes the future measurement sequence, and the third processes the previously estimated states. Their outputs are concatenated, passed through two dense layers, and combined with a skip-connection that directly injects the state-estimate stream. The resulting mean m_t(θ) and covariance L_t(θ) are functions of the network parameters θ.
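The fusion of the three GRU streams described above can be sketched as follows. The hidden sizes, activation choices, and the diagonal parameterization of L_t are illustrative assumptions, not details taken from the paper:

```python
import torch
import torch.nn as nn

class DRA(nn.Module):
    """Sketch of the deep recurrent architecture: three GRUs encode past
    measurements, future measurements, and previously estimated states;
    dense layers plus a skip-connection output a Gaussian prior (m_t, L_t)."""

    def __init__(self, state_dim, meas_dim, hidden=32):
        super().__init__()
        self.gru_past = nn.GRU(meas_dim, hidden, batch_first=True)    # y_{1:t-1}
        self.gru_future = nn.GRU(meas_dim, hidden, batch_first=True)  # y_{t+1:T}
        self.gru_state = nn.GRU(state_dim, hidden, batch_first=True)  # x_hat_{1:t-1}
        self.fc1 = nn.Linear(3 * hidden, hidden)
        self.fc_mean = nn.Linear(2 * hidden, state_dim)
        self.fc_logvar = nn.Linear(2 * hidden, state_dim)  # diagonal L_t (assumption)

    def forward(self, y_past, y_future, x_prev):
        _, h_p = self.gru_past(y_past)     # final hidden state of each GRU
        _, h_f = self.gru_future(y_future)
        _, h_x = self.gru_state(x_prev)
        z = torch.relu(self.fc1(torch.cat([h_p[-1], h_f[-1], h_x[-1]], dim=-1)))
        z = torch.cat([z, h_x[-1]], dim=-1)  # skip-connection: inject state stream
        m_t = self.fc_mean(z)
        L_t = torch.diag_embed(torch.exp(self.fc_logvar(z)))  # PSD by construction
        return m_t, L_t
```

The exponential on the log-variance head keeps L_t positive definite, which the closed-form posterior update below requires.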
Given the Gaussian prior and the current measurement y_t, a closed-form Kalman-like update yields the posterior N(\tilde{m}_t, \tilde{L}_t) (Eq. 7). This step uses the Woodbury matrix identity and "completing the square", preserving the computational efficiency of classic Kalman filtering while avoiding any explicit STM.
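The update is ordinary Gaussian conditioning, the same algebra as a Kalman measurement step. A minimal numpy sketch, with variable names chosen for illustration:

```python
import numpy as np

def posterior_update(m, L, y, H, C_w):
    """Closed-form measurement update: prior x ~ N(m, L), measurement
    y = H x + w with w ~ N(0, C_w). Returns the posterior mean and
    covariance of x given y."""
    S = H @ L @ H.T + C_w            # innovation covariance
    K = L @ H.T @ np.linalg.inv(S)   # Kalman-like gain
    m_post = m + K @ (y - H @ m)     # corrected mean
    L_post = L - K @ H @ L           # reduced covariance
    return m_post, L_post
```

Note that no state-transition model appears anywhere: the prior (m, L) comes entirely from the learned DRA rather than from propagating a known dynamics model.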
Training is fully unsupervised. For each time step the log‑likelihood of the observed measurement can be expressed analytically after integrating out the hidden state (Eq. 8). Summing over all time steps and all training sequences gives a total log‑likelihood L(D; θ). Maximizing L with respect to θ via stochastic gradient ascent yields the DRA that best explains the measurement data. Because the likelihood depends on the recursively estimated states \hat{x}_{1:t‑1}, the inference pass is embedded inside the learning loop, making the training process end‑to‑end.
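The per-step likelihood has the standard form for a Gaussian prior pushed through a linear-Gaussian measurement: y_t ~ N(H m_t, H L_t Hᵀ + C_w). A sketch of the per-step term that would be summed over t and over training sequences (function and variable names are illustrative):

```python
import numpy as np

def measurement_loglik(y, m, L, H, C_w):
    """Log-likelihood of one measurement after integrating out the state:
    y ~ N(H m, H L H^T + C_w). Summed over time steps and sequences, this
    is the unsupervised objective maximized w.r.t. the DRA parameters."""
    S = H @ L @ H.T + C_w                 # marginal covariance of y
    r = y - H @ m                         # residual
    k = y.shape[0]
    _, logdet = np.linalg.slogdet(S)      # numerically stable log-determinant
    return -0.5 * (k * np.log(2 * np.pi) + logdet + r @ np.linalg.solve(S, r))
```

In practice this would be written in an autodiff framework so that gradients flow through m_t and L_t back into the recurrent network, with the inference recursion over \hat{x}_{1:t-1} unrolled inside the training loop as the paper describes.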
The authors evaluate DNS on three stochastic dynamical systems: (1) a stochastic Lorenz‑63 system (3‑dimensional, Markovian), (2) a stochastic Chen system (3‑dimensional, Markovian), and (3) a stochastic double‑spring pendulum (SDSP) where the observable 4‑dimensional position vector is non‑Markovian (the underlying internal state has eight dimensions). For all experiments H = I and isotropic Gaussian noise are used. Training uses 1 000 short sequences (length = 100); testing uses 100 long sequences (length = 1 000) to assess generalization. Performance is measured in signal‑to‑measurement‑noise ratio (SMNR) and normalized mean‑square error (NMSE) at three noise levels (‑10 dB, 0 dB, 10 dB).
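Assuming the standard definitions of these two metrics (the paper may normalize slightly differently), they can be computed as:

```python
import numpy as np

def nmse_db(x_true, x_hat):
    """Normalized mean-square error in dB: estimation error power
    relative to the power of the true state sequence."""
    return 10 * np.log10(np.sum((x_true - x_hat) ** 2) / np.sum(x_true ** 2))

def smnr_db(clean_meas, noise):
    """Signal-to-measurement-noise ratio in dB: power of the noiseless
    measurement H x relative to the measurement-noise power."""
    return 10 * np.log10(np.sum(clean_meas ** 2) / np.sum(noise ** 2))
```

Lower (more negative) NMSE is better; the reported SMNR levels of -10, 0, and 10 dB correspond to noise power 10x larger than, equal to, and 10x smaller than the signal power, respectively.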
Results (Table 1) show that DNS consistently outperforms the deep Kalman smoother (DKS), the iterative DANSE (iDANSE), and a baseline DANSE estimator across all systems and noise levels. The advantage is most pronounced at low SMNR (‑10 dB), where the inclusion of anti‑causal measurements yields a substantial NMSE reduction. A lighter version of DNS (DNS‑S) that omits past measurements still beats DKS and iDANSE but falls short of the full DNS, confirming the importance of using both past and future information. An extended RTS smoother (ERTSS) that knows the true STM serves as an upper bound; DNS approaches its performance despite having no STM.
Key contributions of the paper are: (i) a fully unsupervised Bayesian smoother for model‑free processes, (ii) a recurrent neural network architecture that fuses past, future, and prior‑state information to produce a Gaussian prior, and (iii) a closed‑form posterior update that retains the computational simplicity of Kalman filtering. Limitations include the reliance on a linear measurement model, potential sensitivity to error accumulation in the recursive state estimates during training, and the lack of real‑world hardware validation. Future work could extend DNS to nonlinear measurement models, multi‑sensor fusion, online adaptation, and deployment on embedded platforms.
In summary, DNS represents a significant step toward practical, data‑driven smoothing for systems where physics‑based models are unavailable, offering both theoretical elegance (closed‑form Bayesian updates) and empirical superiority over existing data‑driven smoothers.