Is margin preserved after random projection?


Random projections have been applied in many machine learning algorithms. However, whether the margin is preserved after random projection is non-trivial and has not been well studied. In this paper we analyse margin distortion after random projection and give conditions under which the margin is preserved for binary classification problems. We also extend our analysis to the multiclass margin, and provide theoretical bounds on the multiclass margin of the projected data.


💡 Research Summary

The paper investigates a fundamental yet under‑explored question in the theory of random projections (RP): does the geometric margin that underlies the generalisation ability of classifiers survive the dimensionality reduction? While the Johnson‑Lindenstrauss (JL) lemma guarantees that pairwise Euclidean distances (and consequently inner products) are preserved up to a factor \((1\pm\epsilon)\) when projecting from a high‑dimensional space \(\mathbb{R}^d\) to a lower‑dimensional space \(\mathbb{R}^k\), margin is a more intricate quantity. It depends not only on distances but also on the relative orientation of data points with respect to a decision hyper‑plane and on the label information. The authors therefore develop a dedicated theoretical framework that links JL‑type distance preservation to margin preservation for both binary and multiclass linear classifiers.

Binary classification model
The authors start with a standard linear binary classifier. For a training set \(\{(x_i,y_i)\}_{i=1}^N\) with \(x_i\in\mathbb{R}^d\) and labels \(y_i\in\{-1,+1\}\), the optimal separating hyper‑plane is defined by parameters \((w,b)\). The (geometric) margin is \(\gamma = \min_i y_i (w^\top x_i + b)/\|w\|\). A larger \(\gamma\) is known to correlate with better generalisation.
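The geometric margin above is straightforward to compute. Below is a minimal sketch in NumPy; the data points, labels, and hyper-plane parameters are hypothetical, chosen only to illustrate the formula:

```python
import numpy as np

def geometric_margin(X, y, w, b):
    """Geometric margin: gamma = min_i y_i (w^T x_i + b) / ||w||.

    X : (N, d) data matrix, y : (N,) labels in {-1, +1},
    (w, b) : hyper-plane parameters.
    """
    return np.min(y * (X @ w + b)) / np.linalg.norm(w)

# Hypothetical linearly separable toy data
X = np.array([[2.0, 2.0], [3.0, 1.0], [-2.0, -1.0], [-1.0, -3.0]])
y = np.array([1, 1, -1, -1])
w = np.array([1.0, 1.0])  # normal of the separating hyper-plane
b = 0.0

gamma = geometric_margin(X, y, w, b)  # closest point is (-2, -1) at 3/sqrt(2)
```

Note that \(\gamma\) is scale-invariant in \(w\): rescaling \((w,b)\) leaves the geometric margin unchanged, which is why the normalisation by \(\|w\|\) matters.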

Random projection
A random matrix \(R\in\mathbb{R}^{k\times d}\) with i.i.d. Gaussian entries (or a suitable sparse sub‑Gaussian distribution) is used to map each point to \(\tilde{x}_i = \frac{1}{\sqrt{k}} R x_i\). The JL lemma states that if \(k = O(\epsilon^{-2}\log N)\), then with high probability all pairwise squared distances are preserved up to a multiplicative factor:
\[
(1-\epsilon)\,\|x_i - x_j\|^2 \;\le\; \|\tilde{x}_i - \tilde{x}_j\|^2 \;\le\; (1+\epsilon)\,\|x_i - x_j\|^2 \quad \text{for all } i, j.
\]
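This distance-preservation guarantee is easy to verify empirically. The sketch below (hypothetical dimensions \(N=50\), \(d=1000\), \(k=300\)) draws a Gaussian projection matrix, maps the data to \(\tilde{x}_i = \frac{1}{\sqrt{k}} R x_i\), and measures the worst-case distortion of pairwise squared distances:

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, k = 50, 1000, 300  # hypothetical sample size and dimensions

X = rng.normal(size=(N, d))

# Gaussian random projection: x -> (1/sqrt(k)) R x, entries of R i.i.d. N(0, 1)
R = rng.normal(size=(k, d))
X_proj = (X @ R.T) / np.sqrt(k)

# Pairwise squared distances before and after projection
D_orig = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
D_proj = np.sum((X_proj[:, None, :] - X_proj[None, :, :]) ** 2, axis=-1)

# Worst-case multiplicative distortion epsilon over all distinct pairs
mask = ~np.eye(N, dtype=bool)
ratio = D_proj[mask] / D_orig[mask]
eps = max(ratio.max() - 1.0, 1.0 - ratio.min())
```

For these dimensions the observed \(\epsilon\) is small, consistent with the \(O(\epsilon^{-2}\log N)\) scaling; the paper's contribution is precisely to determine when such pairwise guarantees translate into guarantees on the margin, which also depends on the orientation of the projected points relative to the hyper-plane.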

