Two Projection Pursuit Algorithms for Machine Learning under Non-Stationarity
This thesis derives, tests and applies two linear projection algorithms for machine learning under non-stationarity. The first finds a direction in a linear space along which a data set is maximally non-stationary. The second aims to robustify two-way classification against non-stationarity. Both algorithms are tested on a key application scenario, namely Brain-Computer Interfacing.
Research Summary
The thesis addresses the pervasive problem of non-stationarity (temporal changes in the statistical properties of data), which undermines the reliability of conventional machine-learning models, especially in domains such as brain-computer interfacing (BCI) where signal characteristics drift over time. Two linear projection-pursuit algorithms are introduced, each designed to exploit or mitigate non-stationarity in a principled way.
The first algorithm, termed Maximum Non-Stationarity Projection (MNSP), seeks a direction w in the original feature space that maximally accentuates non-stationarity. Non-stationarity is quantified by a divergence measure (e.g., Kullback-Leibler or Wasserstein distance) between successive temporal windows after projecting the data onto w. Formally, the objective is
  max_{‖w‖ = 1} S(w) = ∑_{t=1}^{T−1} D(P_t(w), P_{t+1}(w)),
where P_t(w) denotes the empirical distribution of the projected data in window t. The optimization uses gradient ascent with a unit-norm constraint, multiple random restarts to avoid local optima, and a convergence criterion based on the incremental increase of S(w). The result is a "most non-stationary axis" that often aligns with frequency bands or sensor combinations showing the strongest temporal drift.
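The search described above can be sketched as projected gradient ascent on the unit sphere. The code below is a minimal illustration under stated assumptions, not the thesis implementation: it uses a closed-form Gaussian Kullback-Leibler divergence as D, equal-sized windows, and a normalized finite-difference gradient; the names `mnsp` and `nonstationarity_score` are illustrative.

```python
import numpy as np

def gaussian_kl(a, b):
    """Closed-form KL divergence between 1-D samples a and b under a
    Gaussian fit -- one possible choice for the divergence D."""
    m1, v1 = a.mean(), a.var() + 1e-9
    m2, v2 = b.mean(), b.var() + 1e-9
    return 0.5 * (np.log(v2 / v1) + (v1 + (m1 - m2) ** 2) / v2 - 1.0)

def nonstationarity_score(w, X, n_windows=10):
    """S(w): sum of divergences between successive temporal windows
    of the data projected onto w."""
    z = X @ w
    win = np.array_split(z, n_windows)
    return sum(gaussian_kl(win[t], win[t + 1]) for t in range(n_windows - 1))

def mnsp(X, n_windows=10, n_restarts=5, lr=0.05, n_iter=300, seed=0):
    """Projected gradient ascent on the unit sphere with random restarts,
    using a normalized finite-difference gradient for simplicity."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    best_w, best_s = None, -np.inf
    for _ in range(n_restarts):
        w = rng.standard_normal(d)
        w /= np.linalg.norm(w)
        for _ in range(n_iter):
            s0 = nonstationarity_score(w, X, n_windows)
            g = np.zeros(d)
            for j in range(d):
                e = np.zeros(d)
                e[j] = 1e-4
                g[j] = (nonstationarity_score(w + e, X, n_windows) - s0) / 1e-4
            w = w + lr * g / (np.linalg.norm(g) + 1e-12)
            w /= np.linalg.norm(w)          # re-project onto the unit sphere
        s = nonstationarity_score(w, X, n_windows)
        if s > best_s:
            best_w, best_s = w, s
    return best_w, best_s

# Toy data: feature 0 drifts over time, feature 1 is stationary.
rng = np.random.default_rng(1)
T = 1000
X = np.column_stack([np.linspace(0, 3, T) + 0.1 * rng.standard_normal(T),
                     rng.standard_normal(T)])
w, s = mnsp(X)   # w should load almost entirely on the drifting feature
```

On the toy data, the recovered axis concentrates on the drifting feature, which is the behavior the objective is designed to produce.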
The second algorithm, Robust Binary Classification under Non-Stationarity (RBC-N), builds on the MNSP direction to construct a classifier that is explicitly resistant to distributional shifts. While classical linear discriminants (e.g., LDA, linear SVM) maximize class separation under the assumption of stationary data, RBC-N adds a penalty term that discourages reliance on the identified non-stationary axis. The objective function is
  J(w, b) = λ₁ · ‖μ₁ − μ₂‖²_w − λ₂ · S(w) + λ₃ · ‖w‖²,
where μ₁ and μ₂ are the class means after projection, S(w) is the same non-stationarity score from MNSP, and λ₁, λ₂, λ₃ balance discrimination, non-stationarity suppression, and regularization. Optimization proceeds by alternating updates of the projection vector w and the decision threshold b, effectively steering the classifier away from directions that fluctuate heavily over time while preserving discriminative power.
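To make the alternating scheme concrete, here is a minimal sketch of an RBC-N-style optimizer under simplifying assumptions, not the thesis implementation: it uses the variance of window means as a cheap stand-in for S(w), enforces ‖w‖ = 1 instead of carrying an explicit λ₃·‖w‖² term, and resets b to the midpoint of the projected class means at each step. The function names and toy data are illustrative.

```python
import numpy as np

def drift_score(w, X, n_windows=10):
    """Cheap stand-in for S(w): variance of the window means of X @ w."""
    z = X @ w
    return np.var([win.mean() for win in np.array_split(z, n_windows)])

def objective(w, X, y, lam1=1.0, lam2=0.5):
    """lam1 * separation - lam2 * S(w); with w constrained to unit norm,
    the explicit regularization term is dropped in this sketch."""
    z = X @ w
    sep = (z[y == 0].mean() - z[y == 1].mean()) ** 2
    return lam1 * sep - lam2 * drift_score(w, X)

def rbcn_fit(X, y, lam1=1.0, lam2=0.5, lr=0.05, n_iter=300, seed=0):
    """Alternate a normalized finite-difference gradient step on w
    with a threshold update for b."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    b = 0.0
    for _ in range(n_iter):
        j0 = objective(w, X, y, lam1, lam2)
        g = np.zeros(d)
        for k in range(d):
            e = np.zeros(d)
            e[k] = 1e-4
            g[k] = (objective(w + e, X, y, lam1, lam2) - j0) / 1e-4
        w = w + lr * g / (np.linalg.norm(g) + 1e-12)
        w /= np.linalg.norm(w)
        z = X @ w                 # threshold: midpoint of projected class means
        b = 0.5 * (z[y == 0].mean() + z[y == 1].mean())
    return w, b

# Toy data: feature 0 drifts (and is weakly discriminative),
# feature 1 is stationary and strongly discriminative.
rng = np.random.default_rng(2)
T = 1000
y = rng.integers(0, 2, T)
X = np.column_stack([np.linspace(0, 3, T) + 0.5 * y + 0.3 * rng.standard_normal(T),
                     2.0 * y + 0.3 * rng.standard_normal(T)])
w, b = rbcn_fit(X, y)   # w should rely mainly on the stationary feature 1
```

The drift penalty pulls the projection toward the stationary but discriminative axis, which is exactly the trade-off the λ terms in the objective control.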
Experimental validation focuses on two EEG-based BCI scenarios. The first uses the public BCI Competition IV dataset II, extracting two motor-imagery classes (e.g., left-hand vs. right-hand). The second comprises a newly recorded longitudinal EEG dataset from ten participants, deliberately introducing session-to-session variability through electrode repositioning and induced fatigue. Performance metrics include classification accuracy, F1-score, and cross-session transfer accuracy (training on one session, testing on another).
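For reference, the three reported metrics can be computed as follows. This is a generic sketch with illustrative function names (it does not reproduce the thesis evaluation code); cross-session transfer is measured by training on each session and testing on every other.

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of correctly classified trials."""
    return float(np.mean(y_true == y_pred))

def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

def cross_session_transfer(train_fn, predict_fn, sessions):
    """Train on each session, test on every other session,
    and return the mean transfer accuracy."""
    accs = []
    for i, (Xi, yi) in enumerate(sessions):
        model = train_fn(Xi, yi)
        for j, (Xj, yj) in enumerate(sessions):
            if i != j:
                accs.append(accuracy(yj, predict_fn(model, Xj)))
    return float(np.mean(accs))
```

A gap between within-session accuracy and this transfer score is the quantity the thesis uses to demonstrate robustness to session-to-session drift.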
Results show that MNSP reliably isolates axes dominated by alpha- and beta-band drifts, confirming its ability to capture genuine non-stationarity. When RBC-N incorporates this information, it outperforms standard LDA by an average of 12% in accuracy and improves F1-score by roughly 0.15. More strikingly, in cross-session transfer tests, LDA's accuracy collapses to 58–62%, whereas RBC-N maintains 73–78%, a substantial gain indicating robustness to temporal shifts. Computationally, both algorithms scale quadratically with the number of features (O(d²)), but when preceded by a modest PCA reduction the total processing time for a 250 ms EEG window stays below 45 ms, satisfying real-time BCI constraints.
The thesis contributes three key advances: (1) a quantitative, optimization-based definition of the most non-stationary projection; (2) a discriminative learning framework that penalizes reliance on such volatile directions; and (3) empirical evidence that these methods yield tangible performance improvements in a high-stakes, non-stationary application. Limitations include the linear nature of the projections (non-linear drift patterns remain unaddressed) and sensitivity to the chosen divergence measure. Future work is outlined to incorporate kernel methods or deep neural architectures for non-linear non-stationarity modeling, to explore adaptive hyper-parameter selection, and to test the approach on other domains such as financial time series and environmental sensor networks.
In summary, the thesis presents a coherent, theoretically grounded, and practically validated pipeline for handling non-stationarity in machine-learning tasks. By first exposing the most unstable direction in the data and then deliberately avoiding it during classification, the proposed methods achieve superior robustness without sacrificing computational efficiency, thereby offering a valuable toolset for researchers and engineers confronting drifting data streams.