Online Learning Algorithms for Stochastic Water-Filling
Water-filling is the term for the classic solution to the problem of allocating constrained power to a set of parallel channels to maximize the total data rate. It is used widely in practice, for example, for power allocation to sub-carriers in multi-user OFDM systems such as WiMAX. The classic water-filling algorithm is deterministic and requires perfect knowledge of the channel gain-to-noise ratios. In this paper we consider how to allocate power over stochastically time-varying (i.i.d.) channels with unknown gain-to-noise ratio distributions. We adopt an online learning framework based on stochastic multi-armed bandits. We consider two variations of the problem, one in which the goal is to find a power allocation to maximize $\sum\limits_i \mathbb{E}[\log(1 + SNR_i)]$, and another in which the goal is to find a power allocation to maximize $\sum\limits_i \log(1 + \mathbb{E}[SNR_i])$. For the first problem, we propose a \emph{cognitive water-filling} algorithm that we call CWF1. We show that CWF1 obtains a regret (defined as the cumulative gap over time between the sum-rate obtained by a distribution-aware genie and this policy) that grows polynomially in the number of channels and logarithmically in time, implying that it asymptotically achieves the optimal time-averaged rate that can be obtained when the gain distributions are known. For the second problem, we present an algorithm called CWF2, which is, to our knowledge, the first algorithm in the literature on stochastic multi-armed bandits to exploit non-linear dependencies between the arms. We prove that the number of times CWF2 picks the incorrect power allocation is bounded by a function that is polynomial in the number of channels and logarithmic in time, implying that its frequency of incorrect allocation tends to zero.
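For reference, the deterministic baseline mentioned above can be sketched as follows. This is a minimal illustration of textbook water-filling with known, strictly positive gain-to-noise ratios, not code from the paper; the function name `water_filling`, its parameters, and the bisection tolerance are assumptions made for this example.

```python
import numpy as np

def water_filling(gnr, total_power, tol=1e-12):
    """Textbook water-filling for KNOWN gain-to-noise ratios (illustrative names).

    Maximizes sum_i log(1 + gnr[i] * p[i]) subject to sum(p) <= total_power, p >= 0.
    Assumes every entry of gnr is strictly positive.
    """
    gnr = np.asarray(gnr, dtype=float)
    inv = 1.0 / gnr                          # "floor height" of each channel
    lo, hi = 0.0, inv.max() + total_power    # bracket for the water level
    # Bisect on the water level until the poured power matches the budget.
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - inv, 0.0).sum() > total_power:
            hi = mu
        else:
            lo = mu
    # Channels whose floor lies above the water level receive no power.
    return np.maximum(lo - inv, 0.0)

powers = water_filling([1.0, 0.5, 0.1], total_power=2.0)
```

With these inputs the floors are 1, 2, and 10, so the water level settles at 2.5 and the weakest channel is left unused. The stochastic setting studied in the paper removes the assumption that `gnr` is known.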
💡 Research Summary
The paper tackles the classic water‑filling power allocation problem in a setting where the channel gain‑to‑noise ratios are stochastic, i.i.d. over time, and their distributions are unknown. Instead of assuming perfect instantaneous channel state information (CSI), the authors cast the problem as a stochastic combinatorial multi‑armed bandit (MAB) task. Each feasible power allocation vector (subject to total and per‑carrier power constraints) is treated as an arm, and the reward is the sum‑rate obtained on the selected sub‑carriers, which is a non‑linear function of the random channel gains (typically log(1 + a_i X_i)).
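To make the reward structure concrete, the sketch below estimates the two objectives from the abstract for one fixed allocation, using the per-channel reward log(1 + a_i X_i). It is an illustration only: the function name `estimate_objectives`, the exponential gain distribution, and the sample size are assumptions, not details taken from the paper.

```python
import numpy as np

def estimate_objectives(alloc, gain_samples):
    """Empirical versions of the two objectives for one allocation (illustrative).

    alloc:        shape (n,), power a_i placed on each channel
    gain_samples: shape (T, n), observed gain-to-noise ratios X_i(t)
    """
    alloc = np.asarray(alloc, dtype=float)
    snr = alloc * np.asarray(gain_samples, dtype=float)   # a_i * X_i(t)
    mean_of_log = np.log1p(snr).mean(axis=0).sum()        # sum_i E[log(1 + SNR_i)]
    log_of_mean = np.log1p(snr.mean(axis=0)).sum()        # sum_i log(1 + E[SNR_i])
    return mean_of_log, log_of_mean

# Hypothetical i.i.d. gains: exponential with a different mean per channel.
rng = np.random.default_rng(0)
samples = rng.exponential(scale=[1.0, 0.5, 0.1], size=(10_000, 3))
o1, o2 = estimate_objectives([1.5, 0.5, 0.0], samples)
```

Because log(1 + x) is concave, Jensen's inequality guarantees that the first value never exceeds the second, which is one reason the two objectives can favor different allocations.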
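Two optimization objectives are considered. The first (O₁) maximizes the expected sum‑rate, i.e., $\max_a \sum\limits_i \mathbb{E}[\log(1 + SNR_i)]$. The second (O₂) maximizes the sum‑rate evaluated at the expected SNRs, i.e., $\max_a \sum\limits_i \log(1 + \mathbb{E}[SNR_i])$.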