Multiple Kernel Learning: A Unifying Probabilistic Viewpoint
We present a probabilistic viewpoint on multiple kernel learning, unifying well-known regularised risk approaches with recent advances in approximate Bayesian inference relaxations. The framework proposes a general objective function, suitable for regression, robust regression and classification, that is a lower bound of the marginal likelihood and contains many regularised risk approaches as special cases. Furthermore, we derive an efficient and provably convergent optimisation algorithm.
Research Summary
This paper presents a unified probabilistic framework that bridges the two dominant paradigms in kernel learning: regularised risk minimisation (the "classical" SVM/ridge-regression view) and Bayesian Gaussian-process (GP) modelling. The authors start by formalising the kernel as a linear combination of base kernels, K(θ) = ∑ₗ θₗKₗ, with non-negative coefficients θ. In the regularised-risk setting the objective is
uᵀK(θ)⁻¹u + C ∑ᵢ₌₁ⁿ ℓ(yᵢ, uᵢ),
where ℓ is a loss function and C a trade-off parameter. By the representer theorem the solution can be expressed as u = K(θ)α, turning the problem into a finite-dimensional optimisation over α.
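For squared loss, ℓ(y, u) = (y − u)², this finite-dimensional problem has a closed form in α; the sketch below illustrates it on synthetic data (the two base kernels, the mixing weights and C are illustrative assumptions, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(40, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=40)

# Two illustrative base kernels: linear and RBF.
K_lin = X @ X.T
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
K_rbf = np.exp(-0.5 * sq)

def K(theta):
    """Combined kernel K(theta) = sum_l theta_l * K_l, theta_l >= 0."""
    return theta[0] * K_lin + theta[1] * K_rbf

# Minimising u'K(theta)^{-1}u + C * sum_i (y_i - u_i)^2 over u and
# substituting u = K(theta) @ alpha (representer theorem) gives the
# closed form alpha = (K(theta) + I/C)^{-1} y.
theta = np.array([0.3, 0.7])
C = 10.0
n = len(y)
alpha = np.linalg.solve(K(theta) + np.eye(n) / C, y)
u = K(theta) @ alpha
print("mean |u - y|:", np.abs(u - y).mean())
```

For non-quadratic losses (hinge, robust variants) the same substitution applies, but α must be found by numerical optimisation rather than a linear solve.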
From a Bayesian perspective the same setting is described by a GP prior P(u|θ) = 𝒩(u|0, K(θ)) and a likelihood P(y|u). The joint MAP estimate then minimises
uᵀK(θ)⁻¹u − 2 ∑ᵢ₌₁ⁿ log P(yᵢ|uᵢ) + log|K(θ)|.
The log-determinant term log|K(θ)| is the normalising constant of the prior; it grows without bound when any θₗ → ∞, thereby automatically penalising overly complex kernels (an Occam's-razor effect).
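This Occam effect is easy to check numerically: scaling the kernel by c adds n·log c to the log-determinant, so the penalty diverges as the coefficients grow. A small sketch (the base kernel here is just an arbitrary positive-definite matrix, not one of the paper's kernels):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
A = rng.normal(size=(n, n))
K_base = A @ A.T + 1e-3 * np.eye(n)   # arbitrary positive-definite "kernel"

def logdet(M):
    # Numerically stable log-determinant via Cholesky
    # (valid because M is positive definite).
    return 2.0 * np.log(np.diag(np.linalg.cholesky(M))).sum()

# log|c * K| = n*log(c) + log|K|: unbounded as the scale c grows.
for c in (1.0, 10.0, 100.0):
    print(c, logdet(c * K_base))
```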
Multiple kernel learning (MKL) traditionally adds a convex regulariser λ‖θ‖ₚᵖ (or an ℓₚ-norm ball constraint) to the risk objective, yielding
ψ_MKL(θ) = min_u [ uᵀK(θ)⁻¹u + C ∑ᵢ₌₁ⁿ ℓ(yᵢ, uᵢ) ] + λ‖θ‖ₚᵖ .
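Pairing the risk term with the θ-penalty suggests the usual alternating scheme: solve for u with θ fixed, then take a projected gradient step in θ (using ∂/∂θₗ uᵀK(θ)⁻¹u = −αᵀKₗα with α = K(θ)⁻¹u). The sketch below is a hypothetical illustration for squared loss, not the paper's provably convergent algorithm; kernels, penalties and step sizes are all assumed:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(30, 2))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=30)
n = len(y)

# Illustrative base kernels: linear and RBF (both PSD).
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
Ks = [X @ X.T, np.exp(-0.5 * sq)]

C, lam, p, eta = 10.0, 0.1, 1.0, 1e-3   # illustrative settings
theta = np.ones(len(Ks))

def combined(theta):
    # K(theta) = sum_l theta_l * K_l, with jitter for stable solves.
    return sum(t * Kl for t, Kl in zip(theta, Ks)) + 1e-8 * np.eye(n)

for _ in range(50):
    Kt = combined(theta)
    # u-step (squared loss, closed form): u = Kt @ alpha with
    # alpha = (Kt + I/C)^{-1} y; this alpha also equals Kt^{-1} u.
    alpha = np.linalg.solve(Kt + np.eye(n) / C, y)
    # theta-step: projected gradient on the penalised objective,
    # using d/dtheta_l [u'K^{-1}u] = -alpha' K_l alpha.
    grad = np.array([-alpha @ Kl @ alpha for Kl in Ks]) \
        + lam * p * theta ** (p - 1)
    theta = np.clip(theta - eta * grad, 1e-6, None)

print("learned kernel weights:", theta)
```

The clipping step enforces the non-negativity constraint θₗ ≥ 0; this simple coordinate scheme carries no convergence guarantee, which is precisely the gap the paper's algorithm addresses.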