A UCB Bandit Algorithm for General ML-Based Estimators

๐Ÿ“ Original Info

  • Title: A UCB Bandit Algorithm for General ML-Based Estimators
  • ArXiv ID: 2601.01061
  • Date: 2026-01-03
  • Authors: Yajing Liu, Erkao Bao, Linqi Song

๐Ÿ“ Abstract

We present ML-UCB, a generalized upper confidence bound algorithm that integrates arbitrary model-based estimators into multi-armed bandit frameworks. A fundamental challenge in deploying sophisticated ML models for sequential decision-making is the lack of tractable concentration inequalities required for principled exploration. We overcome this by directly modeling the learning curve behavior of the underlying estimator. Assuming the Mean Squared Error follows a power-law decay as training samples increase, we derive a generalized concentration inequality and prove ML-UCB achieves sublinear regret. This framework enables principled integration of any ML model whose learning curve can be empirically characterized, eliminating model-specific theoretical analysis. Our approach significantly reduces implementation complexity while saving compute and memory resources through its simple formula based on offline-trained parameters. Experiments on collaborative filtering with synthetic data demonstrate substantial improvements over LinUCB.
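
The abstract's central idea, a confidence bonus driven by the estimator's learning curve, can be illustrated with a minimal sketch. The abstract does not state the exact ML-UCB formula, so everything here is an assumption made for illustration: the function names fit_power_law, ucb_score, and select_arm, the scale parameter beta, and the bonus form sqrt(c * n^(-alpha) * log t). The sketch only shows how power-law parameters fitted offline could feed a UCB-style selection rule.

```python
import numpy as np

def fit_power_law(n_samples, mse_values):
    """Fit MSE(n) ~ c * n**(-alpha) by least squares in log-log space.

    Hypothetical helper: stands in for the offline learning-curve
    characterization mentioned in the abstract.
    """
    slope, intercept = np.polyfit(np.log(n_samples), np.log(mse_values), 1)
    return float(np.exp(intercept)), float(-slope)  # (c, alpha)

def ucb_score(estimate, n_pulls, t, c, alpha, beta=1.0):
    """Estimate plus an exploration bonus that shrinks along the fitted curve.

    The bonus form below is an illustrative assumption, not the paper's formula.
    """
    n = max(n_pulls, 1)                      # avoid division by zero for unpulled arms
    bonus = beta * np.sqrt(c * n ** (-alpha) * np.log(max(t, 2)))
    return estimate + bonus

def select_arm(estimates, pulls, t, c, alpha):
    """Pick the arm with the highest optimistic score at round t."""
    scores = [ucb_score(m, n, t, c, alpha) for m, n in zip(estimates, pulls)]
    return int(np.argmax(scores))

# Usage: fit the curve from offline (sample size, MSE) pairs, then select online.
sizes = np.array([10.0, 100.0, 1000.0])
c, alpha = fit_power_law(sizes, 2.0 * sizes ** -0.5)
arm = select_arm(estimates=[0.5, 0.5], pulls=[100, 5], t=50, c=c, alpha=alpha)
```

Because the bonus depends only on the two fitted scalars and the per-arm pull counts, the online step needs no model-specific confidence computation, which is consistent with the abstract's claim of a simple formula based on offline-trained parameters.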

Full Content

...(The full text has been omitted because of its length. Please see the site for the complete article.)
