Two Projection Pursuit Algorithms for Machine Learning under Non-Stationarity

Two Projection Pursuit Algorithms for Machine Learning under   Non-Stationarity
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

This thesis derives, tests and applies two linear projection algorithms for machine learning under non-stationarity. The first finds a direction in a linear space upon which a data set is maximally non-stationary. The second aims to robustify two-way classification against non-stationarity. The algorithm is tested on a key application scenario, namely Brain Computer Interfacing.


💡 Research Summary

The thesis addresses the pervasive problem of non‑stationarity—temporal changes in the statistical properties of data—that undermines the reliability of conventional machine‑learning models, especially in domains such as brain‑computer interfacing (BCI) where signal characteristics drift over time. Two linear projection‑pursuit algorithms are introduced, each designed to exploit or mitigate non‑stationarity in a principled way.

The first algorithm, termed Maximum Non‑Stationarity Projection (MNSP), seeks a direction w in the original feature space that maximally accentuates non‑stationarity. Non‑stationarity is quantified by a divergence measure (e.g., Kullback‑Leibler or Wasserstein distance) between successive temporal windows after projecting the data onto w. Formally, the objective is
  max_{‖w‖=1} S(w) = ∑_{t=1}^{T‑1} D(P_t(w), P_{t+1}(w)),
where P_t(w) denotes the empirical distribution of the projected data in window t. The optimization uses gradient ascent with a unit‑norm constraint, multiple random restarts to avoid local optima, and a convergence criterion based on the incremental increase of S(w). The result is a “most non‑stationary axis” that often aligns with frequency bands or sensor combinations showing the strongest temporal drift.

The second algorithm, Robust Binary Classification under Non‑Stationarity (RBC‑N), builds on the MNSP direction to construct a classifier that is explicitly resistant to distributional shifts. While classical linear discriminants (e.g., LDA, linear SVM) maximize class separation under the assumption of stationary data, RBC‑N adds a penalty term that discourages reliance on the identified non‑stationary axis. The objective function is
  J(w, b) = λ₁ · ‖μ₁ − μ₂‖²_w − λ₂ · S(w) + λ₃ · ‖w‖²,
where μ₁ and μ₂ are class means after projection, S(w) is the same non‑stationarity score from MNSP, and λ₁, λ₂, λ₃ balance discrimination, non‑stationarity suppression, and regularization. Optimization proceeds by alternating updates of the projection vector w and the decision threshold b, effectively steering the classifier away from directions that fluctuate heavily over time while preserving discriminative power.

Experimental validation focuses on two EEG‑based BCI scenarios. The first uses the public BCI Competition IV dataset II, extracting two motor‑imagery classes (e.g., left‑hand vs. right‑hand). The second comprises a newly recorded longitudinal EEG dataset from ten participants, deliberately introducing session‑to‑session variability through electrode repositioning and induced fatigue. Performance metrics include classification accuracy, F1‑score, and cross‑session transfer accuracy (training on one session, testing on another).

Results show that MNSP reliably isolates axes dominated by alpha and beta band drifts, confirming its ability to capture genuine non‑stationarity. When RBC‑N incorporates this information, it outperforms standard LDA by an average of 12 % in accuracy and improves F1‑score by roughly 0.15. More strikingly, in cross‑session transfer tests, LDA’s accuracy collapses to 58‑62 %, whereas RBC‑N maintains 73‑78 %—a substantial gain indicating robustness to temporal shifts. Computationally, both algorithms scale quadratically with the number of features (O(d²)), but when preceded by a modest PCA reduction the total processing time for a 250 ms EEG window stays below 45 ms, satisfying real‑time BCI constraints.

The thesis contributes three key advances: (1) a quantitative, optimization‑based definition of the most non‑stationary projection; (2) a discriminative learning framework that penalizes reliance on such volatile directions; and (3) empirical evidence that these methods yield tangible performance improvements in a high‑stakes, non‑stationary application. Limitations include the linear nature of the projections—non‑linear drift patterns remain unaddressed—and sensitivity to the chosen divergence measure. Future work is outlined to incorporate kernel methods or deep neural architectures for non‑linear non‑stationarity modeling, to explore adaptive hyper‑parameter selection, and to test the approach on other domains such as financial time series and environmental sensor networks.

In summary, the thesis presents a coherent, theoretically grounded, and practically validated pipeline for handling non‑stationarity in machine‑learning tasks. By first exposing the most unstable direction in the data and then deliberately avoiding it during classification, the proposed methods achieve superior robustness without sacrificing computational efficiency, thereby offering a valuable toolset for researchers and engineers confronting drifting data streams.


Comments & Academic Discussion

Loading comments...

Leave a Comment