Exploiting Channel Memory for Joint Estimation and Scheduling in Downlink Networks

We address the problem of opportunistic multiuser scheduling in downlink networks with Markov-modeled outage channels. We consider the scenario in which the scheduler does not have full knowledge of the channel state information, but instead estimates the channel state information by exploiting the memory inherent in the Markov channels along with ARQ-styled feedback from the scheduled users. Opportunistic scheduling is optimized in two stages: (1) Channel estimation and rate adaptation to maximize the expected immediate rate of the scheduled user; (2) User scheduling, based on the optimized immediate rate, to maximize the overall long term sum-throughput of the downlink. The scheduling problem is a partially observable Markov decision process with the classic ’exploitation vs exploration’ trade-off that is difficult to quantify. We therefore study the problem in the framework of Restless Multi-armed Bandit Processes (RMBP) and perform a Whittle’s indexability analysis. Whittle’s indexability is traditionally known to be hard to establish and the index policy derived based on Whittle’s indexability is known to have optimality properties in various settings. We show that the problem of downlink scheduling under imperfect channel state information is Whittle indexable and derive the Whittle’s index policy in closed form. Via extensive numerical experiments, we show that the index policy has near-optimal performance. Our work reveals that, under incomplete channel state information, exploiting channel memory for opportunistic scheduling can result in significant performance gains and that almost all of these gains can be realized using an easy-to-implement index policy.

💡 Research Summary

The paper tackles opportunistic downlink scheduling in a multi‑user wireless system where each user’s channel evolves as a two‑state Markov process (good/bad) and the scheduler does not have perfect instantaneous channel state information (CSI). Instead, the scheduler receives only ACK/NACK feedback from the user that is scheduled in a given slot. By exploiting the temporal correlation (memory) inherent in the Markov model, the scheduler maintains a belief state for each user – the probability that the user’s channel is in the good state – and updates this belief recursively using Bayes’ rule after each feedback observation.
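The recursive belief update described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's code: the names `MarkovChannel`, `p`, `r`, and `update_belief` are hypothetical, and the sketch assumes error-free ACK/NACK feedback in which an ACK means the channel was good in the slot just used and a NACK means it was bad.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MarkovChannel:
    """Two-state (good/bad) Markov channel; parameter names are assumed."""
    p: float  # P(good in next slot | good in current slot)
    r: float  # P(good in next slot | bad in current slot)


def update_belief(belief: float, ch: MarkovChannel,
                  ack: Optional[bool] = None) -> float:
    """Return the probability that the channel is good in the next slot.

    ack is None  -> user was not scheduled; propagate the belief one step
                    through the transition probabilities.
    ack is True  -> state was observed to be good (assumed perfect feedback).
    ack is False -> state was observed to be bad.
    """
    if ack is None:
        return belief * ch.p + (1.0 - belief) * ch.r
    return ch.p if ack else ch.r


ch = MarkovChannel(p=0.8, r=0.2)
print(update_belief(0.6, ch))             # passive update
print(update_belief(0.6, ch, ack=True))   # scheduled, ACK received
print(update_belief(0.6, ch, ack=False))  # scheduled, NACK received
```

Under these assumptions, a scheduled user's belief resets to `p` or `r`, while an unscheduled user's belief drifts toward the stationary probability `r / (1 - p + r)`.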

The authors decompose the overall problem into two sequential stages. In the first stage, given the current belief for a selected user, the scheduler chooses a transmission rate (or modulation/coding scheme) that maximizes the expected immediate throughput for that user. This rate‑adaptation problem is convex and admits a closed‑form solution because the expected instantaneous rate is a monotone function of the belief. In the second stage, the scheduler must decide which user to serve in each slot so as to maximize the long‑term sum‑throughput of the downlink. This decision problem is a partially observable Markov decision process (POMDP) with the classic exploration‑vs‑exploitation trade‑off: serving a user yields immediate reward but also provides fresh feedback that improves future belief estimates, while leaving a user idle allows the belief to evolve passively according to the Markov transition matrix.
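To make the first stage concrete, here is a toy rate-adaptation sketch. It is an assumption-laden simplification rather than the paper's estimator: it supposes only two candidate rates, a hypothetical `rate_safe` that is decodable in both channel states and a hypothetical `rate_high` that is decodable only in the good state (an outage otherwise).

```python
from typing import Tuple


def choose_rate(belief: float, rate_safe: float,
                rate_high: float) -> Tuple[float, float]:
    """Pick the rate with the larger expected immediate throughput.

    Assumed outage model (illustrative only):
      - rate_safe always succeeds, so its expected throughput is rate_safe;
      - rate_high succeeds only in the good state, so its expected
        throughput is belief * rate_high.
    Returns (chosen_rate, expected_throughput).
    """
    expected_high = belief * rate_high
    if expected_high > rate_safe:
        return rate_high, expected_high
    return rate_safe, rate_safe


print(choose_rate(0.9, rate_safe=1.0, rate_high=2.0))
print(choose_rate(0.3, rate_safe=1.0, rate_high=2.0))
```

In this toy model the decision reduces to a threshold on the belief at `rate_safe / rate_high`, and the resulting expected throughput is non-decreasing in the belief. Note that an ACK at the safe rate carries no information about the channel state here, which is one simple way to see how rate choice and channel learning can interact.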

Solving the POMDP exactly is computationally prohibitive: the joint belief state is a continuous vector with one entry per user, so the cost of dynamic programming grows rapidly with the number of users. To obtain a tractable solution, the authors cast the problem as a Restless Multi‑armed Bandit Process (RMBP). Each user corresponds to an arm that can be either “active” (scheduled) or “passive” (not scheduled). Even when passive, the arm’s state (the belief) continues to evolve, which is the hallmark of a restless bandit. The authors then perform a Whittle indexability analysis. By introducing a Lagrange multiplier λ that penalizes activation, they show that for each belief level there exists a threshold λ* at which the optimal action switches between active and passive. Crucially, they prove that the set of beliefs for which staying passive is optimal grows monotonically with λ, which is the defining condition for Whittle indexability and establishes it for the downlink scheduling problem under imperfect CSI.
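For a single user, the λ relaxation can be written out explicitly. The block below is a sketch under stated assumptions, not the paper's exact formulation: it uses a discounted criterion with discount factor β, assumed notation p = P(good → good) and r = P(bad → good), an unspecified expected immediate reward R(π), and feedback that reveals the channel state whenever the user is scheduled. Here λ is written as a subsidy for staying passive, which yields the same optimal policy as charging λ for activation.

```latex
\begin{aligned}
T(\pi) &= \pi p + (1-\pi)\, r
  && \text{(passive belief evolution)} \\
V_\lambda(\pi) &= \max\Big\{
  \underbrace{\lambda + \beta\, V_\lambda\big(T(\pi)\big)}_{\text{passive}},\;
  \underbrace{R(\pi) + \beta\big[\pi\, V_\lambda(p) + (1-\pi)\, V_\lambda(r)\big]}_{\text{active}}
  \Big\} \\
\mathcal{P}(\lambda) &= \big\{\pi \in [0,1] : \text{the passive action is optimal at } \pi \big\} \\
W(\pi) &= \inf\big\{\lambda : \pi \in \mathcal{P}(\lambda)\big\}
  && \text{(Whittle index)}
\end{aligned}
```

Indexability means that the passive set 𝒫(λ) grows monotonically from the empty set to the whole interval [0, 1] as λ increases from −∞ to +∞, so that W(π) is well defined for every belief.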

Having proved indexability, the authors derive an explicit closed‑form expression for the Whittle index of each user. The index is essentially the immediate expected rate (a function of the belief) plus a correction term that captures the future value of information gained by scheduling the user. Computing the index requires only the current belief and the known Markov transition probabilities, making it an O(1) operation per user. The scheduling policy then simply selects the user with the highest Whittle index in each slot.
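The resulting scheduling loop is simple enough to sketch. The closed-form Whittle index is not reproduced in this summary, so the example below takes the index as a pluggable function and uses a hypothetical stand-in, `myopic_index`, which returns only the belief (the immediate expected rate for a unit-rate transmission) and omits the correction term. The channel parameters `(p, r)` and the assumption that ACK/NACK reveals the scheduled user's state are likewise illustrative.

```python
import random
from typing import Callable, List, Tuple

Channel = Tuple[float, float]  # (p, r): P(good|good), P(good|bad); assumed names


def myopic_index(belief: float) -> float:
    """Stand-in index: immediate expected rate only (NOT the Whittle index)."""
    return belief


def simulate_index_policy(channels: List[Channel],
                          index_fn: Callable[[float], float],
                          num_slots: int, seed: int = 0) -> float:
    """Simulate 'schedule the user with the highest index' and return the
    fraction of slots with a successful (ACKed) transmission.

    Assumes 1 - p + r > 0 for every channel so the stationary
    probability used for initialization is defined.
    """
    rng = random.Random(seed)
    beliefs = [r / (1.0 - p + r) for p, r in channels]  # stationary start
    states = [rng.random() < b for b in beliefs]        # True means good
    delivered = 0
    for _ in range(num_slots):
        chosen = max(range(len(channels)), key=lambda i: index_fn(beliefs[i]))
        ack = states[chosen]  # assumed: ACK if and only if channel is good
        delivered += int(ack)
        for i, (p, r) in enumerate(channels):
            if i == chosen:
                beliefs[i] = p if ack else r  # state observed via feedback
            else:
                beliefs[i] = beliefs[i] * p + (1.0 - beliefs[i]) * r
            states[i] = rng.random() < (p if states[i] else r)  # true evolution
    return delivered / num_slots


channels = [(0.9, 0.1), (0.7, 0.3), (0.8, 0.2)]
print(simulate_index_policy(channels, myopic_index, num_slots=10_000))
```

Swapping `myopic_index` for a function that implements the paper's closed-form index would turn this loop into the Whittle index policy; each slot still costs one index evaluation per user.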

Extensive simulations with 4–10 users, varying Markov transition probabilities (i.e., channel memory strength), and different feedback error rates demonstrate that the Whittle‑index policy achieves 98–99% of the throughput of the optimal policy obtained by dynamic programming (DP), which is computationally infeasible at practical system sizes. Moreover, compared with conventional “max‑weight” or “channel‑state‑based” schedulers that assume perfect CSI, the proposed scheme yields 10–20% higher sum‑throughput, especially when the channel exhibits strong memory. The performance gap to the optimal policy shrinks as the channel memory increases because the belief updates become more accurate.

In summary, the paper makes three major contributions: (1) a Bayesian belief‑state update that leverages Markov channel memory and ARQ feedback; (2) a rate‑adaptation step that maximizes expected immediate throughput given the belief; and (3) a rigorous Whittle indexability proof leading to a simple, closed‑form index‑based scheduling rule that is near‑optimal in long‑term sum‑throughput. The work demonstrates that even with incomplete CSI, exploiting channel memory can provide substantial gains, and that these gains can be captured by an easy‑to‑implement index policy suitable for real‑time deployment in modern cellular systems.

📜 Original Paper Content