Developing fast and efficient algorithms for retrieving objects in response to a given user query is an area of active research. The present study investigates the retrieval of time series objects from a phoneme database in response to a given user pattern or query. The proposed method maps one-dimensional time series retrieval into a sequence retrieval problem by partitioning the multi-dimensional phase space using k-means clustering. Both whole-sequence and subsequence matching are considered. The robustness of the proposed technique is investigated on phoneme time series corrupted with additive white Gaussian noise. The shortcoming of classical power-spectral techniques for time series retrieval is also discussed.
Information retrieval from a phoneme time series database
Real-world physical and biological systems are inherently nonlinear feedback systems that evolve with time.
External recording from such a system results in data discretized in time as well as amplitude, also known as digital data. Such digital data, generated as a function of time, represent a time series. Time series analysis is an involved topic accompanied by rigorous theoretical constructs [1]. However, there has been recent emphasis on developing generic techniques for approximate time series and sequence matching from an information retrieval perspective [2][3][4][5].
The present study explores a nonlinear dynamical approach to time series matching and retrieval. The interdisciplinary nature of nonlinear dynamical systems lends itself to a wide range of applications [6,7]. It is well appreciated that the various components of real-world experimental systems are coupled nonlinearly with complex feedback loops. This restricts the use of linear systems approaches for their analysis. External recordings sampled from such systems provide insight into the underlying dynamics, which may exist in more than one dimension. This multi-dimensional representation is also known as the state-space or phase-space representation of the time series. The theory of nonlinear dynamical analysis provides a way to reconstruct the multi-dimensional representation from the observed one-dimensional realization. Such a multi-dimensional representation also captures the rich geometry and inherent nonlinear correlations not evident in the one-dimensional representation.
The report is organized as follows. In Sec 2, the mapping of the one-dimensional time series onto a multi-dimensional phase space is discussed. Subsequently, partitioning of the vectors in the phase space into a binary sequence is discussed in Sec 3. Such an approach implicitly maps the time series matching problem into a sequence matching problem. In Sec 4, whole-sequence and subsequence retrieval from a phoneme database in response to a given user query is discussed. The effectiveness of the proposed technique in the presence of additive white Gaussian noise (AWGN) is also investigated.
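The first two steps of this outline, embedding followed by partitioning into a binary sequence, can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the binary sequence arises from k-means with k = 2, uses a deterministic farthest-point initialization, and relies on standard-library Python only. The names `delay_embed` and `two_means_labels` and the toy series are hypothetical.

```python
import math


def delay_embed(series, d, tau=1):
    """Stack delay vectors (forward-delay convention, assumed) as rows."""
    n_vectors = len(series) - (d - 1) * tau
    if n_vectors < 1:
        raise ValueError("series too short for the requested d and tau")
    return [[series[k + j * tau] for j in range(d)] for k in range(n_vectors)]


def two_means_labels(vectors, n_iter=50):
    """Lloyd's algorithm with k = 2; returns one 0/1 symbol per vector.

    Initialization (an assumption made here for reproducibility):
    centroid 0 is the first vector, centroid 1 is the vector farthest from it.
    """
    c0 = list(vectors[0])
    c1 = list(max(vectors, key=lambda v: math.dist(v, c0)))
    centroids = [c0, c1]
    labels = None
    for _ in range(n_iter):
        # Assignment step: ties go to cluster 0.
        new_labels = [
            0 if math.dist(v, centroids[0]) <= math.dist(v, centroids[1]) else 1
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in (0, 1):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Toy series with two amplitude regimes (illustrative data only).
series = [0, 0, 0, 0, 0, 0, 10, 10, 10, 10]
symbols = two_means_labels(delay_embed(series, d=2, tau=1))
print(symbols)
```

Each time series is thereby reduced to a string over the alphabet {0, 1}, which is what allows sequence-matching techniques to be applied in place of direct time series comparison.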
Consider a dynamical system in a D-dimensional state space, $\mathbb{R}^D$. While the original state space may be high dimensional, in dissipative dynamical systems, which characterize most real-world systems, the dynamics settle onto a low-dimensional attractor as the system evolves. This attractor exhibits rich geometry. The one-dimensional time series with a finite number of points (N) is obtained by sampling a single dynamical variable with the measurement function ϕ at finite intervals of time (T), given by $\phi_n \in \mathbb{R}$. The method of delays by Takens [8] provides a way to reconstruct the vector series in an equivalent state space $\mathbb{R}^d$ from the observed one-dimensional time series $\phi_n$, $n = 1, \ldots, N$.
Such a mapping has been found to preserve the topological properties of the original state space. The reconstructed vector series in the equivalent state space is given by
where w(k) represents the state of the system in the d-dimensional space. The above procedure is called embedding [8]. The quantities d and τ in (1) represent the embedding dimension and the time delay, respectively. The embedded vectors can be represented in matrix form, known as the trajectory matrix (Γ), where
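A minimal sketch of the method of delays follows. It assumes the common forward-delay convention, in which w(k) collects the samples $\phi_k, \phi_{k+\tau}, \ldots, \phi_{k+(d-1)\tau}$ and the vectors are stacked as the rows of Γ, giving $N-(d-1)\tau$ rows. The convention, the 0-based indexing, and the helper name `delay_embed` are assumptions for illustration rather than the authors' definitions.

```python
def delay_embed(series, d, tau=1):
    """Return the trajectory matrix as a list of d-dimensional delay vectors.

    Row k is [series[k], series[k + tau], ..., series[k + (d - 1) * tau]]
    (forward-delay convention, assumed), so there are
    len(series) - (d - 1) * tau rows in total.
    """
    if d < 1 or tau < 1:
        raise ValueError("d and tau must be positive integers")
    n_vectors = len(series) - (d - 1) * tau
    if n_vectors < 1:
        raise ValueError("series too short for the requested d and tau")
    return [[series[k + j * tau] for j in range(d)] for k in range(n_vectors)]


# Illustrative data only: six samples embedded with d = 3 and tau = 2.
phi = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
gamma = delay_embed(phi, d=3, tau=2)
for row in gamma:
    print(row)
```

With τ = 1, consecutive rows overlap in all but one coordinate, so Γ has the structure of a Hankel matrix built from the series.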
Recent extensions and modifications to Takens' theorem have broadened the application of nonlinear dynamical concepts to a wide range of data sets. Stark et al. [8] extended Takens' theorem to the class of forced systems. The concept of over-embedding was recently suggested [9] to accommodate even nonstationarities due to slowly varying time-dependent parameters. The vectors form the rows of the trajectory matrix (Γ).
It should be noted that a proper choice of the embedding dimension (d) and time delay (τ) is necessary for a proper unfolding of the geometry in the phase space. If $d_e$ is the true dimension of the data in the original state space, then $d > 2d_e$ is a sufficient choice of the embedding dimension [6,7]. In real-world data sets, $d_e$ is not known. In such cases, the method of false nearest neighbors (FNN) [11] is used to determine the minimal embedding dimension (d). Vectors that are close to each other in a lower dimension may show significant separation as the dimension is increased; such vectors are false neighbors. The fraction of FNN decreases as the embedding dimension increases. The value of d at which the fraction of FNN is lowest (almost zero) is the minimal sufficient embedding dimension. A popular example where tow 13]. In this report, the time delay is chosen as τ = 1.
The number of FNN decreases as the embedding dimension (d) increases, which is characteristic of dissipative dynamical systems.
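The false-nearest-neighbor idea described above can be sketched as follows. This is a dependency-free illustration and not the authors' implementation: it uses a commonly used distance-ratio test, in which a nearest neighbor in dimension d is declared false if adding the (d+1)-th delay coordinate separates the pair by more than `r_tol` times their distance in dimension d. The function name `fnn_fraction`, the brute-force neighbor search, the threshold `r_tol = 10.0`, and the sine test signal are all assumptions.

```python
import math


def fnn_fraction(series, d, tau=1, r_tol=10.0):
    """Fraction of nearest neighbors in dimension d that are false.

    Uses forward-delay vectors (assumed convention) and an O(N^2) search.
    """
    # Only keep vectors whose (d + 1)-th delay coordinate also exists.
    n_vectors = len(series) - d * tau
    if n_vectors < 2:
        raise ValueError("series too short for the requested d and tau")
    vectors = [[series[k + j * tau] for j in range(d)] for k in range(n_vectors)]

    n_false = 0
    n_tested = 0
    for i in range(n_vectors):
        # Brute-force nearest neighbor of vector i in dimension d.
        best_j, best_dist = -1, math.inf
        for j in range(n_vectors):
            if j == i:
                continue
            dist = math.dist(vectors[i], vectors[j])
            if dist < best_dist:
                best_j, best_dist = j, dist
        if best_dist == 0.0:
            continue  # identical vectors: the ratio is undefined, so skip
        # Separation contributed by the extra coordinate in dimension d + 1.
        extra = abs(series[i + d * tau] - series[best_j + d * tau])
        n_tested += 1
        if extra / best_dist > r_tol:
            n_false += 1
    return n_false / n_tested if n_tested else 0.0


# Illustrative signal: a sampled sine wave, which unfolds fully in two dimensions.
signal = [math.sin(0.3 * n) for n in range(300)]
for dim in (1, 2, 3):
    print(dim, fnn_fraction(signal, d=dim, tau=1))
```

The minimal embedding dimension is then read off as the smallest d at which the returned fraction is close to zero. For long series, the quadratic neighbor search would normally be replaced by a spatial index such as a k-d tree.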
Linear transformation techniques such as power spectral analysis and singular value decomposition (PCA) have been used extensively in mining time series data [3,4]. A close co
…(Full text truncated)…