Scalable Greedy Algorithms for Transfer Learning


In this paper we consider the binary transfer learning problem, focusing on how to select and combine sources from a large pool to yield good performance on a target task. To keep the scenario realistic, we do not assume direct access to the source data; instead, we employ the source hypotheses trained from them. We propose an efficient algorithm that selects relevant source hypotheses and feature dimensions simultaneously, building on the literature on the best subset selection problem. Our algorithm achieves state-of-the-art results on three computer vision datasets, substantially outperforming both transfer learning and popular feature selection baselines in a small-sample setting. We also present a randomized variant that achieves the same results with a computational cost independent of the number of source hypotheses and feature dimensions. Finally, we prove theoretically that, under reasonable assumptions on the source hypotheses, our algorithm can learn effectively from few examples.


💡 Research Summary

The paper tackles a binary transfer‑learning scenario in which a learner has access only to a large pool of pre‑trained source models (hypotheses) and a very small set of labeled examples from a target task. Unlike classic domain‑adaptation approaches that assume raw source data are available, the authors adopt the Hypothesis Transfer Learning (HTL) framework, treating each source model as a black‑box. Their goal is to select a limited number k of source hypotheses and raw feature dimensions that together yield the best predictor for the target domain.
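To make the black-box setup concrete, here is a minimal sketch of how source hypotheses can be combined with raw features in the HTL setting: each pre-trained model contributes only its scores, never its training data. The function name `augment_features` is hypothetical, not from the paper.

```python
import numpy as np

def augment_features(X, source_models):
    """Concatenate raw features with black-box source-model scores.

    X: (m, d) array of target examples.
    source_models: list of n callables, each mapping an (m, d) array
        to an (m,) vector of scores (the source hypotheses' outputs).
    Returns an (m, d + n) array. Note that only the models' predictions
    are used; the source training data are never accessed.
    """
    scores = np.column_stack([h(X) for h in source_models])
    return np.hstack([X, scores])
```

A linear predictor learned on the augmented matrix then carries both the feature weights (w) and the source-combination coefficients (\beta) in a single weight vector.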

Formally, the target predictor is expressed as
(h_{\text{trg}}(x)=w^{\top}x+\sum_{i=1}^{n}\beta_i h_{\text{src},i}(x)),
where (w) are weights on the original features, (\beta_i) are coefficients for the (n) source hypotheses, and the pair ((w,\beta)) is learned from the (m) target training points ({(x_j,y_j)}_{j=1}^{m}). The learning objective combines a regularized empirical risk with an ℓ₀ sparsity constraint:

(\min_{w,\beta}\ \frac{1}{m}\sum_{j=1}^{m}\ell\big(h_{\text{trg}}(x_j),\,y_j\big)+\lambda\big(\lVert w\rVert_2^2+\lVert\beta\rVert_2^2\big)\quad\text{s.t.}\quad \lVert w\rVert_0+\lVert\beta\rVert_0\le k.)
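An ℓ₀-constrained objective of this kind is typically attacked by greedy forward selection over the candidate columns (raw features and source scores alike). The sketch below is a simplified illustration of that idea under squared loss and ridge regularization, not the paper's exact algorithm; `greedy_subset_selection` and its parameters are hypothetical names.

```python
import numpy as np

def greedy_subset_selection(Z, y, k, lam=1e-3):
    """Greedy forward selection for l0-constrained regularized least squares.

    Z: (m, D) augmented design matrix (raw features + source scores).
    y: (m,) target labels.
    k: budget on the number of selected columns (the l0 constraint).
    lam: ridge regularization weight.
    Returns the selected column indices S and their coefficients.
    """
    m, D = Z.shape
    S = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(D):
            if j in S:
                continue
            cols = S + [j]
            A = Z[:, cols]
            # Regularized least-squares solution restricted to the candidate support.
            u = np.linalg.solve(A.T @ A / m + lam * np.eye(len(cols)),
                                A.T @ y / m)
            err = np.mean((A @ u - y) ** 2)
            if err < best_err:
                best_err, best_j = err, j
        S.append(best_j)
    A = Z[:, S]
    u = np.linalg.solve(A.T @ A / m + lam * np.eye(len(S)), A.T @ y / m)
    return S, u
```

Each round scans all remaining columns and keeps the one that most reduces the regularized empirical risk; the paper's contribution includes making this scan scalable, and its randomized variant evaluates only a random subsample of candidates per round, which is what decouples the cost from the number of source hypotheses and feature dimensions.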

