This thesis derives, tests and applies two linear projection algorithms for machine learning under non-stationarity. The first finds a direction in a linear space onto which the projection of a data set is maximally non-stationary. The second aims to robustify two-way classification against non-stationarity. The latter algorithm is tested on a key application scenario, namely Brain Computer Interfacing.
Non-stationarity of a stochastic process is defined loosely as the variability of its probability distribution over time.
Conversely, stationarity of a stochastic process corresponds to constancy of its distribution. In machine learning, typical tasks include regression, classification and system identification. For regression and classification, no guarantee of generalization from a training set to a test set can be made when the underlying process yielding the training and test sets is non-stationary. Thus the quantification of non-stationarity, and algorithms for choosing features which are stationary, are indispensable for these tasks.
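The loose definition above can be made concrete with a small numerical illustration (not taken from the thesis; the seed, epoch length and variance values are arbitrary choices for this sketch): a process whose variance changes over time is non-stationary, and this is visible in epoch-wise moment estimates.

```python
import numpy as np

# Toy illustration: a Gaussian process whose variance changes halfway
# through is non-stationary in the loose sense defined above.
rng = np.random.default_rng(0)

# First half: unit variance; second half: variance 9.
x = np.concatenate([rng.normal(0.0, 1.0, 500),
                    rng.normal(0.0, 3.0, 500)])

# Split into epochs and compare epoch-wise variances; constancy of such
# moment estimates is a necessary condition for stationarity.
epoch_vars = [x[i:i + 100].var() for i in range(0, 1000, 100)]
print(["%.2f" % v for v in epoch_vars])  # first five near 1, last five near 9
```

A stationary process would yield epoch-wise variances that fluctuate only within sampling error; here the jump between the two halves of the series reveals the change in distribution.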
On the other hand, in system identification, a stationary or maximally non-stationary subsystem often carries important significance in terms of the primitives of the domain under consideration. For example, in neuroscience, consider the dynamics over the synaptic weights of a neural network with n weights: identifying the subsystem consisting of the subset of m < n weights whose distribution is maximally non-stationary during learning is of vital interest to the analysis of the neural substrate underlying the learning process.
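The notion of a maximally non-stationary direction can be sketched in a simplified setting. The following toy example (not the SSA-based algorithm developed in this thesis; the data, seed and restriction to changes in the mean are assumptions of the sketch) finds the single direction in which the epoch-wise means of a multivariate time series vary the most, by taking the leading eigenvector of the scatter matrix of the epoch means.

```python
import numpy as np

# Toy sketch: find one direction w maximizing the variability of the
# epoch-wise means w^T mu_i, i.e. a direction that is maximally
# non-stationary with respect to the first moment.
rng = np.random.default_rng(1)

# 10 epochs of 3-D data; only the first coordinate's mean drifts.
epochs = [rng.normal([t, 0.0, 0.0], 1.0, size=(200, 3)) for t in range(10)]

# Epoch means and their scatter around the grand mean.
mus = np.array([e.mean(axis=0) for e in epochs])  # shape (10, 3)
scatter = np.cov(mus.T)                           # 3 x 3 scatter of the means

# The w maximizing the variance of w^T mu_i over epochs (with ||w|| = 1)
# is the leading eigenvector of this scatter matrix.
eigvals, eigvecs = np.linalg.eigh(scatter)
w = eigvecs[:, -1]
print(np.abs(w))  # dominated by the first coordinate
```

Realistic algorithms must of course handle changes in higher moments and recover a whole subspace rather than one direction, which is what motivates the SSA-based method presented in Chapter 2.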
The classical literature on machine learning and statistical learning theory assumes that the samples used for training the parameters of, for instance, classifiers, are drawn from a single probability distribution, rather than from a possibly non-stationary process [52]. That is to say, the time series from which the data are taken is assumed stationary over time. Thus, approaches to learning which relax this stationarity assumption have so far been sparse within the Machine Learning literature. In particular, the first algorithm for obtaining a linear stationary projection of a data set, Stationary Subspace Analysis (SSA), was published only recently, in 2009 [54]. Since 2009, algorithms have been published which refine SSA computationally under additional assumptions [22], develop a maximum likelihood approach to SSA [27] and derive an algebraic algorithm for its solution [28]. In addition to SSA, application-specific linear algorithms have been published, for example sCSP for Brain Computer Interfacing [56]. However, non-stationarity in Machine Learning remains largely an open problem: for instance, no general technique for classification under non-stationarity has been proposed which is non-adaptive. (Adaptation has been studied in some detail; for instance, for an adaptive neural network for regression or classification, see [46]; otherwise, only the covariate shift problem has been studied in detail by, for instance, [50].) Methods which seek to perform robust classification under non-stationarity remain to be fully investigated.
The present thesis’s contribution is twofold. Firstly, in Chapter 2, we present a method based on Stationary Subspace Analysis (SSA) [54] which addresses the system identification task described above, namely identifying a maximally non-stationary subsystem. We subsequently apply this method to a specific machine learning task, namely Change Point Detection. In particular, we show that using the SSA-based method as a prior feature extraction step boosts the performance of three representative Change Point Detection algorithms on synthetic data and on data adapted from real-world recordings. Note that Chapter 2 consists in part of joint work between the present author and the authors of the submitted paper [10], of which the chapter is an adapted version. Secondly, in Chapter 3, we address the two-way classification problem under non-stationarity and propose a method for adapting a commonly used method for two-way classification, namely Linear Discriminant Analysis (LDA), to this setting; we call the resulting algorithm stationary Linear Discriminant Analysis, or sLDA for short. We investigate the properties and performance of the algorithm on simulated data and on data recorded for Brain Computer Interface experiments. Finally, having tested the algorithm, we perform a rigorous investigation of the results obtained, by means of statistical testing and comparison with baseline methods. Please note, in addition, that Chapter 3 includes joint work with Wojciech Samek.
Maximizing Non-Stationarity with Applications to Change Point Detection
Change Point Detection is a task that appears in a broad range of applications such as biomedical signal processing [19,39,31], speech recognition [2,45], industrial process monitoring [4,38], fault state detection [17] and econometrics [13]. The goal of Change Point Detection is to find the time points at which a time series changes from one macroscopic state to another. As a result, the time series is decomposed into segments [4] of similar behavior. Change Point Detection is based on finding changes in the properties of the data, such as in the moments (mean, variance, kurtosis) [4], in the spectral properties [3], in the temporal structure [30] or change
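As a minimal illustration of detecting a change in the first moment (this is a generic textbook-style sketch, not one of the algorithms studied in this chapter; the seed, window length and change magnitude are arbitrary assumptions), one can compare the means of two adjacent sliding windows and report the time point where their difference peaks:

```python
import numpy as np

# Toy sketch: locate a single change in mean by sliding two adjacent
# windows over the series and scoring the difference of their means.
rng = np.random.default_rng(2)
x = np.concatenate([rng.normal(0.0, 1.0, 300),
                    rng.normal(2.0, 1.0, 300)])  # true change point at t = 300

w = 50  # window length; an arbitrary choice for this sketch
scores = np.array([abs(x[t - w:t].mean() - x[t:t + w].mean())
                   for t in range(w, len(x) - w)])
t_hat = w + int(scores.argmax())
print(t_hat)  # close to the true change point at t = 300
```

Detecting changes in variance, spectral content or temporal structure requires richer statistics than the window mean, which is precisely where a prior projection onto a maximally non-stationary subspace can concentrate the change and ease detection.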