A UCB Bandit Algorithm for General ML-Based Estimators

February 09, 2026

...

📝 Original Info

Title: A UCB Bandit Algorithm for General ML-Based Estimators
ArXiv ID: 2601.01061
Date: 2026-01-03
Authors: Yajing Liu, Erkao Bao, Linqi Song

📝 Abstract

We present ML-UCB, a generalized upper confidence bound algorithm that integrates arbitrary model-based estimators into multi-armed bandit frameworks. A fundamental challenge in deploying sophisticated ML models for sequential decision-making is the lack of tractable concentration inequalities required for principled exploration. We overcome this by directly modeling the learning curve behavior of the underlying estimator. Assuming the Mean Squared Error follows a power-law decay as training samples increase, we derive a generalized concentration inequality and prove ML-UCB achieves sublinear regret. This framework enables principled integration of any ML model whose learning curve can be empirically characterized, eliminating model-specific theoretical analysis. Our approach significantly reduces implementation complexity while saving compute and memory resources through its simple formula based on offline-trained parameters. Experiments on collaborative filtering with synthetic data demonstrate substantial improvements over LinUCB.
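The abstract does not state the exact ML-UCB formula, so what follows is only a minimal Python sketch under stated assumptions. The helpers `fit_power_law`, `ucb_index`, `mean_estimator`, and `run_bandit` are hypothetical names. The learning curve is fit offline as MSE(n) ≈ C·n^(-alpha) by least squares in log-log space. The exploration bonus sqrt(2·C·n^(-alpha)·log t) is an assumed form, chosen because it reduces to the familiar sub-Gaussian UCB bonus when alpha = 1 and C = sigma². It is not the concentration inequality derived in the paper. A running mean stands in for the ML estimator; any callable that maps an arm's observed rewards to an estimate could be plugged in instead.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def fit_power_law(ns: Sequence[float], mses: Sequence[float]) -> Tuple[float, float]:
    """Fit MSE(n) ~= C * n**(-alpha) by least squares on log(MSE) vs log(n)."""
    xs = [math.log(n) for n in ns]
    ys = [math.log(m) for m in mses]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx                      # slope of the log-log line equals -alpha
    alpha = -slope
    c = math.exp(my - slope * mx)          # intercept of the log-log line is log(C)
    return c, alpha


def ucb_index(estimate: float, n: int, t: int, c: float, alpha: float) -> float:
    """Assumed index: estimate + sqrt(2 * C * n**(-alpha) * log t).

    Hypothetical form, not the paper's bound; with alpha = 1 and C = sigma**2
    it matches the standard sub-Gaussian UCB bonus.
    """
    if n == 0:
        return math.inf                    # force one pull of every arm first
    return estimate + math.sqrt(2.0 * c * n ** (-alpha) * math.log(t))


def mean_estimator(obs: List[float]) -> float:
    """Stand-in for an ML model: predicts the arm's reward as the sample mean."""
    return sum(obs) / len(obs)


def run_bandit(
    arm_means: Sequence[float],
    horizon: int,
    c: float,
    alpha: float,
    estimator: Callable[[List[float]], float] = mean_estimator,
    seed: int = 0,
) -> List[int]:
    """Simulate Bernoulli arms and return how many times each arm was pulled."""
    rng = random.Random(seed)
    history: List[List[float]] = [[] for _ in arm_means]
    for t in range(1, horizon + 1):
        # Score every arm with its estimate plus the power-law-based bonus.
        scores = [
            ucb_index(estimator(h) if h else 0.0, len(h), t, c, alpha)
            for h in history
        ]
        arm = max(range(len(arm_means)), key=scores.__getitem__)
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        history[arm].append(reward)
    return [len(h) for h in history]


if __name__ == "__main__":
    # Synthetic learning curve: MSE = 0.25 / n, so the fit should give C = 0.25, alpha = 1.
    ns = [10, 20, 40, 80, 160]
    c, alpha = fit_power_law(ns, [0.25 / n for n in ns])
    print(round(c, 6), round(alpha, 6))
    print(run_bandit([0.2, 0.5, 0.8], horizon=2000, c=c, alpha=alpha))
```

In this sketch, swapping in a different estimator only requires passing another callable together with its offline-fitted (C, alpha). The selection rule itself stays the same, which is roughly the model-agnostic plug-in the abstract describes.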

📄 Full Content

...(The full text is long and has been omitted here. Please see the original site for the complete text.)
