Online Learning with Improving Agents: Multiclass, Budgeted Agents and Bandit Learners

Reading time: 5 minutes

📝 Original Info

  • Title: Online Learning with Improving Agents: Multiclass, Budgeted Agents and Bandit Learners
  • ArXiv ID: 2602.17103
  • Date: 2026-02-19
  • Authors: Not listed in the source data; please consult the original paper.

📝 Abstract

We investigate the recently introduced model of learning with improvements, where agents are allowed to make small changes to their feature values in order to be granted a more desirable label. We substantially extend previously published results by providing combinatorial dimensions that characterize online learnability in this model, analyzing the multiclass setup, studying learnability under bandit feedback, modeling agents' costs for making improvements, and more.

💡 Deep Analysis

📄 Full Content

With the proliferation of machine-learning-based decision-making tools and their application to societal and personal domains, there has been growing interest in understanding how the use of such tools affects the behavior of the individuals influenced by their decisions.

One aspect of such implications is addressed under the title of strategic classification. It concerns the possible manipulation of users' data aimed at achieving a desirable classification by the decision algorithm. The research along this line considers setups where the feature vectors available to the learner differ from the true instance feature vectors in a way that increases the likelihood of some desirable outcomes. For example, users sign up for a gym club in order to appear healthier to an algorithm assigning life-insurance rates. Strategic classification learning aims to develop learning algorithms that mitigate the effects of such data manipulations ([HMPW16], [ABY23], [AYZ24]).

Another related line of research aims to design algorithms that incentivize users to change their behavior (and consequently their true attributes) in a direction that improves their label ([MMH20] and more). In the gym example above, the designer of such an algorithm wishes to incentivize the individuals to actually exercise more. This paper follows a recent line of work titled learning with improvements ([ABN+25], [SS25]), where the learner assumes that the affected agents do change their true attributes towards achieving a desired label, but the focus is on accurate prediction of the resulting classification (rather than on incentivizing behavioral change).

We extend the earlier published work on this topic along several axes:

1. While earlier work analyzed only learning with finite hypothesis classes, we characterize learnability via a combinatorial dimension. Our dimension-based analysis implies learnability, with explicit successful learners, for infinite classes as well (Section 3). We note that the mistake bound in this model is always upper bounded by the usual (no-improvements) mistake bound; however, in some cases there is a large gap between the two (Observations 1 and 2 in that section).

2. We extend the scope addressed by earlier works on learning with improving agents by analyzing the multiclass case. Namely, we consider setups in which there are (arbitrarily) many possible labels with a user-preference ordering over these labels. This reflects a situation like having one's work evaluated at several appreciation levels (say, a paper may be accepted as a poster, for a spotlight talk, for a full oral presentation, or for the best paper award) (Section 4).

3. Once we discuss the multiclass setup, a natural question arises about the feedback provided to an online learner. When the labels are binary, feedback indicating whether a predicted label is true or false also reveals the true label. In contrast, with more than two possible labels there is a distinction between full-information feedback, which reveals the correct label, and a partial-information setup, where the feedback is restricted to correct/wrong. In Section 5 we analyze this 'bandit' setup (which is irrelevant to binary classification). We provide a combinatorial characterization of the optimal mistake bound for this setup and describe the optimal (mistake-minimizing) learner. In Subsection 5.1 we analyze the price, in terms of additional mistakes, of the learner having only limited bandit feedback.

4. An underlying feature of learning with improvements is the notion of an improvement graph, whose nodes are agents' feature vectors and whose edges correspond to the ability of an agent to shift their feature vector (say, from "paper with typos" to "paper without typos"). We extend previous work by removing the requirement that these graphs have bounded degree. (A minimal sketch of the resulting online protocol appears after this list.)

5. Finally, in Section 6, we extend our investigation to the setup in which the agents incur a cost for improving their features.
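To make the setup concrete, here is a minimal sketch of one round of the online multiclass protocol with improving agents. This is our own illustration rather than code from the paper; the names `improvement_graph` and `predictor`, and the convention that a higher label index is more desirable to the agent, are assumptions made purely for illustration.

```python
# Minimal sketch of one round of online multiclass learning with improving
# agents (illustrative assumptions: feature vectors are hashable, labels are
# integers ordered so that a higher label is more desirable to the agent,
# and improvement_graph[x] lists the points the agent at x can move to).

def agent_best_response(x, predictor, improvement_graph):
    """The agent moves to the reachable point whose predicted label it prefers most."""
    candidates = [x] + list(improvement_graph.get(x, []))
    return max(candidates, key=predictor)  # higher predicted label = more preferred

def play_round(x_true, true_label, predictor, improvement_graph, bandit=False):
    """One round: the agent best-responds to the learner's committed predictor,
    and the learner is then scored on the improved point."""
    x_moved = agent_best_response(x_true, predictor, improvement_graph)
    y_hat = predictor(x_moved)
    y = true_label(x_moved)          # correct label of the *improved* point
    mistake = (y_hat != y)
    # Full-information feedback reveals y; bandit feedback only reveals
    # whether the prediction was correct.
    feedback = y if not bandit else (not mistake)
    return mistake, feedback
```

The only difference between the full-information and bandit variants in this sketch is the last step: under bandit feedback the learner must update from a single correct/wrong bit, which is exactly the extra difficulty quantified in Section 5.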

We study the learning-with-improvements setting initiated by Attias et al. [ABN+25], who studied the problem in the PAC setting. They assume each agent can improve their features to any point in a set of allowed features, if doing so would help them obtain a more desirable prediction. They show a separation between PAC learning and PAC learning with improvements. They also show that for some classes, allowing improvement makes it possible to find classifiers that achieve zero error, as opposed to only arbitrarily small error. This problem was extended by Sharma and Sun [SS25] to the online setting, where the agents are chosen adversarially. However, Sharma and Sun [SS25] only consider finite, binary hypothesis classes. Moreover, they assume the number of points each agent can improve to is bounded. In this work we first extend their results (in the online setting) to infinite hypothesis classes and general improvement sets. We also introduce a model for studying multiclass hypothesis classes where we a
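As a toy illustration of why improvements can reduce mistakes (and, in the PAC setting of [ABN+25], even allow zero error for some classes), consider a binary threshold classifier and an agent whose improvement set lets it cross the threshold. The threshold, the numeric values, and the helper below are hypothetical and chosen only to show the mechanism, under the assumption that the learner is scored against the true label of the improved point.

```python
# Toy binary example (hypothetical values, not from the paper): an agent
# improves only if doing so flips the learner's prediction to the desirable
# positive label, so a hypothesis that would err on the original point can
# end up correct on the improved point.

def binary_best_response(x, h, improvement_set):
    """Keep x if it is already predicted positive; otherwise move to any
    reachable point predicted positive, if one exists."""
    if h(x) == 1:
        return x
    for x_alt in improvement_set.get(x, ()):
        if h(x_alt) == 1:
            return x_alt
    return x

h = lambda x: int(x >= 3)        # hypothetical threshold classifier
improvement_set = {2: (3,)}      # the agent at feature value 2 can improve to 3
x_moved = binary_best_response(2, h, improvement_set)
print(x_moved, h(x_moved))       # 3 1 -- no mistake, even though h(2) == 0
```

If the improved point's true label is indeed positive, the learner is correct here even though h would have erred had the agent stayed at its original feature value; this is the intuition behind the gap between the standard mistake bound and the mistake bound with improvements.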

Reference

This content is AI-processed based on open access ArXiv data.
