Online Learning of Assignments that Maximize Submodular Functions
Which ads should we display in sponsored search in order to maximize our revenue? How should we dynamically rank information sources to maximize value of information? These applications exhibit strong diminishing returns: Selection of redundant ads and information sources decreases their marginal utility. We show that these and other problems can be formalized as repeatedly selecting an assignment of items to positions to maximize a sequence of monotone submodular functions that arrive one by one. We present an efficient algorithm for this general problem and analyze it in the no-regret model. Our algorithm possesses strong theoretical guarantees, such as a performance ratio that converges to the optimal constant of 1-1/e. We empirically evaluate our algorithm on two real-world online optimization problems on the web: ad allocation with submodular utilities, and dynamically ranking blogs to detect information cascades.
💡 Research Summary
The paper tackles a novel online optimization problem that combines assignment constraints with monotone submodular objectives. In many real‑world web services, a decision maker must repeatedly assign a set of items (ads, news sources, blog posts, etc.) to a limited number of positions (ad slots, ranking slots, sensor locations) while the utility of each assignment is captured by a submodular function that exhibits diminishing returns. Unlike classic submodular maximization, which assumes a static ground set, the authors consider a sequence of functions f₁,…,f_T that arrive online, and the algorithm must choose an assignment S_t from a feasible family ℳ (typically a matching constraint) at each round t without knowledge of future functions.
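To make the diminishing-returns property concrete, here is a minimal toy sketch (not from the paper): a coverage-style monotone submodular utility for assigning ads to slots, where each ad covers a set of user interests and the value of an assignment is the number of distinct interests covered. The ad names and interest sets are invented for illustration.

```python
def coverage_value(assignment, interests):
    """Value of an assignment as the number of distinct interests covered.

    assignment: dict mapping slot -> ad id
    interests:  dict mapping ad id -> set of interests that ad covers
    """
    covered = set()
    for ad in assignment.values():
        covered |= interests[ad]  # union: duplicates add nothing
    return len(covered)

# Hypothetical ads: "b" is redundant with "a" on the "sports" interest.
interests = {
    "a": {"sports", "news"},
    "b": {"sports"},
    "c": {"music"},
}

coverage_value({1: "a"}, interests)            # covers 2 interests
coverage_value({1: "a", 2: "b"}, interests)    # still 2: zero marginal gain
coverage_value({1: "a", 2: "c"}, interests)    # 3: "c" adds a new interest
```

Adding the redundant ad "b" yields zero marginal gain, exactly the diminishing-returns behavior that submodularity captures and that classic modular (additive) ad-allocation models miss.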
The core contribution is an efficient online greedy algorithm that maintains an estimate of marginal gains for each item‑position pair based on the history of observed functions. At round t the algorithm computes a surrogate gain Δ̂_{i,j} for every possible assignment (i,j) and selects the feasible assignment that maximizes the sum of these surrogate gains. The algorithm incorporates an adaptive learning rate η_t derived from a Lagrangian‑based online convex optimization framework; η_t decays as O(1/√t) to balance exploration and exploitation while respecting the submodular structure.
Two main theoretical guarantees are proved. First, the expected (1−1/e)-regret R_T = (1−1/e)·max_{S∈ℳ} Σ_{t=1}^T f_t(S) − E[Σ_{t=1}^T f_t(S_t)] grows sublinearly in T, so the algorithm's average performance ratio converges to the optimal constant 1−1/e as T → ∞. Second, the guarantees extend to the bandit-feedback setting, in which the algorithm observes only the value f_t(S_t) of the assignment it chose rather than the full function f_t.