A Two-Layer Framework for Joint Online Configuration Selection and Admission Control

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We study the problem of online configuration selection with admission control, which arises in LLM serving, GPU scheduling, and revenue management. Over a planning horizon of $T$ periods, we consider a two-layer framework for the decisions made within each period. In the first layer, the decision maker selects one of $K$ configurations (e.g., quantization, parallelism, fare class), which induces a distribution over the reward-resource pair of the incoming request. In the second layer, the decision maker observes the request and then decides whether to accept it. Benchmarking this framework requires care. We introduce a switching-aware fluid oracle that accounts for the value of mixing configurations over time and provably upper-bounds the expected reward of any online policy. We derive a max-min formulation for evaluating the benchmark and characterize saddle points of the max-min problem via primal-dual optimality conditions linking equilibrium, feasibility, and complementarity. This guides the design of the SP-UCB–OLP algorithm, which solves an optimistic saddle-point problem and achieves $\tilde{O}(\sqrt{KT})$ regret.
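
To see why the benchmark must account for mixing, the sketch below computes a fluid (LP-relaxation) value for a toy instance. It is an illustrative formulation under stated assumptions, not the paper's oracle: a single resource, finitely supported distributions $D_\theta$ encoded as (probability, reward, resource) atoms, and a per-period budget $\rho = B/T$, where $B$ is the total budget. Allowing the time fractions spent on each configuration to vary gives the Lagrangian dual $\min_{\lambda \ge 0} \lambda\rho + \max_\theta \mathbb{E}_{D_\theta}[(r-\lambda a)^+]$, a convex one-dimensional problem solved here by ternary search; committing to a single configuration instead gives $\max_\theta \min_{\lambda \ge 0} \{\lambda\rho + \mathbb{E}_{D_\theta}[(r-\lambda a)^+]\}$, which can be strictly smaller. The helper names (`surplus`, `mixing_fluid_value`, `single_config_value`) are hypothetical.

```python
"""Toy fluid benchmark with and without configuration mixing.

Illustrative assumptions (not taken from the paper): one resource,
finitely supported distributions, per-period budget rho = B / T, and the
LP-relaxation dual  min_{lam >= 0} lam * rho + max_theta E_theta[(r - lam * a)^+].
"""
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical encoding: each configuration is a list of
# (probability, reward, resource) atoms.
Dist = List[Tuple[float, float, float]]


def surplus(dist: Dist, lam: float) -> float:
    """E[(r - lam * a)^+]: value of admitting exactly the requests whose
    reward exceeds their priced resource use lam * a."""
    return sum(p * max(r - lam * a, 0.0) for p, r, a in dist)


def ternary_min(f: Callable[[float], float], lo: float, hi: float,
                iters: int = 200) -> float:
    """Minimum value of a convex function of one variable on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2  # a minimizer lies in [lo, m2]
        else:
            lo = m1  # a minimizer lies in [m1, hi]
    return f((lo + hi) / 2)


def lam_upper(dists: Iterable[Dist]) -> float:
    """Beyond the largest reward/resource ratio every surplus is constant,
    so the dual objective does not decrease; search below this bound."""
    ratios = [r / a for d in dists for _, r, a in d if a > 0]
    return max(ratios, default=0.0) + 1.0


def mixing_fluid_value(configs: Dict[str, Dist], rho: float) -> float:
    """Per-period fluid value when time can be split across configurations."""
    dists = list(configs.values())
    return ternary_min(
        lambda lam: lam * rho + max(surplus(d, lam) for d in dists),
        0.0, lam_upper(dists))


def single_config_value(dist: Dist, rho: float) -> float:
    """Per-period fluid value when one configuration is used in every period."""
    return ternary_min(lambda lam: lam * rho + surplus(dist, lam),
                       0.0, lam_upper([dist]))


configs = {"A": [(1.0, 1.0, 1.0)], "B": [(1.0, 0.5, 0.2)]}
print(mixing_fluid_value(configs, rho=0.5))  # ~0.6875
print(max(single_config_value(d, rho=0.5) for d in configs.values()))  # ~0.5
```

In this toy instance, configuration A always yields the pair $(1, 1)$ and B always yields $(0.5, 0.2)$. With $\rho = 0.5$, either configuration alone is worth 0.5 per period, while spending 37.5% of periods on A and the rest on B is worth 0.6875, so a benchmark restricted to a single configuration would not upper-bound a policy that switches.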

💡 Research Summary

The paper addresses a novel online decision-making problem that arises naturally in large-language-model (LLM) serving, GPU scheduling, and revenue management: a two-layer process in which a system first selects a configuration (e.g., quantization level, parallelism degree, fare class) and then, after observing the realized reward-resource pair of an incoming request, decides whether to admit the request under a cumulative resource budget. Formally, over a horizon of $T$ periods, there are $K$ configurations $\Theta=\{1,\dots,K\}$. Selecting configuration $\theta_t$ at time $t$ induces a distribution $D_{\theta_t}$ over the reward $r_t$ and the resource-consumption vector $a_t$. After observing $(r_t,a_t)$, the decision maker chooses a binary admission variable $x_t\in\{0,1\}$. The path-wise budget constraint requires $\sum_{t=1}^T a_t x_t \le B$ component-wise, and the objective is to maximize the expected total reward of admitted requests, $\mathbb{E}\left[\sum_{t=1}^T r_t x_t\right]$.
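
The per-period protocol can be simulated directly. The sketch below uses a hypothetical policy for illustration only, not SP-UCB–OLP: it assumes a single resource, rewards in $[0,1]$, and a fixed dual price `lam`. Layer 1 picks the configuration with the largest UCB index on the estimated surplus $(r-\lambda a)^+$; layer 2 observes $(r_t, a_t)$ and admits the request iff $r_t \ge \lambda a_t$ and the remaining budget covers $a_t$, so the path-wise constraint $\sum_t a_t x_t \le B$ holds on every sample path. The names `run` and `sample` are illustrative.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical encoding: (probability, reward, resource) atoms per configuration.
Dist = List[Tuple[float, float, float]]


def sample(dist: Dist, rng: random.Random) -> Tuple[float, float]:
    """Draw one (reward, resource) pair from a finitely supported distribution."""
    u, acc = rng.random(), 0.0
    for p, r, a in dist:
        acc += p
        if u < acc:
            return r, a
    return dist[-1][1], dist[-1][2]  # guard against rounding in the probabilities


def run(configs: Dict[str, Dist], T: int, B: float, lam: float, seed: int = 0):
    """Simulate T periods of the two-layer protocol with a fixed price lam.

    Returns (total admitted reward, resource consumed, pulls per configuration).
    """
    rng = random.Random(seed)
    names = list(configs)
    counts = {k: 0 for k in names}
    means = {k: 0.0 for k in names}  # running mean of observed surplus
    budget, reward = B, 0.0
    for t in range(1, T + 1):
        # Layer 1: try each configuration once, then follow the UCB index
        # (bonus scale assumes rewards in [0, 1]).
        untried = [k for k in names if counts[k] == 0]
        if untried:
            theta = untried[0]
        else:
            theta = max(names, key=lambda k: means[k]
                        + math.sqrt(2 * math.log(t) / counts[k]))
        r, a = sample(configs[theta], rng)
        counts[theta] += 1
        means[theta] += (max(r - lam * a, 0.0) - means[theta]) / counts[theta]
        # Layer 2: price-threshold admission that never overdraws the budget.
        if r >= lam * a and a <= budget:
            budget -= a
            reward += r
    return reward, B - budget, counts


reward, used, counts = run({"A": [(1.0, 1.0, 1.0)], "B": [(1.0, 0.5, 0.2)]},
                           T=1000, B=500.0, lam=0.625)
print(round(reward, 2), round(used, 2), counts)
```

Fixing `lam` keeps the two layers easy to see; a practical method would also update the price over time rather than holding it constant.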