Multiuser Scheduling in a Markov-modeled Downlink using Randomly Delayed ARQ Feedback


We focus on the downlink of a cellular system, which corresponds to the bulk of the data transfer in such wireless systems. We address the problem of opportunistic multiuser scheduling under imperfect channel state information, by exploiting the memory inherent in the channel. In our setting, the channel between the base station and each user is modeled by a two-state Markov chain and the scheduled user sends back an ARQ feedback signal that arrives at the scheduler with a random delay that is i.i.d. across users and time. The scheduler indirectly estimates the channel via accumulated delayed-ARQ feedback and uses this information to make scheduling decisions. We formulate a throughput maximization problem as a partially observable Markov decision process (POMDP). For the case of two users in the system, we show that a greedy policy is sum throughput optimal for any distribution on the ARQ feedback delay. For the case of more than two users, we prove that the greedy policy is suboptimal and demonstrate, via numerical studies, that it has near optimal performance. We show that the greedy policy can be implemented by a simple algorithm that does not require the statistics of the underlying Markov channel or the ARQ feedback delay, thus making it robust against errors in system parameter estimation. Establishing an equivalence between the two-user system and a genie-aided system, we obtain a simple closed form expression for the sum capacity of the Markov-modeled downlink. We further derive inner and outer bounds on the capacity region of the Markov-modeled downlink and tighten these bounds for special cases of the system parameters.


💡 Research Summary

The paper tackles the downlink scheduling problem in a cellular system where the base station (BS) must serve multiple users under imperfect channel state information (CSI). Each user’s wireless channel is modeled as a two‑state (ON/OFF) Markov chain with transition probabilities p = Pr(ON→ON) and q = Pr(OFF→OFF). After each transmission the scheduled user returns an ARQ acknowledgment (ACK) or negative acknowledgment (NACK). Crucially, this feedback does not arrive instantly; it is delayed by a random, i.i.d. amount D that is independent across users and time slots. Consequently, the BS can only infer the channel indirectly through a history of delayed ACK/NACK signals.

The authors formulate the scheduling problem as a partially observable Markov decision process (POMDP). The belief state for user i at time t, denoted π_i(t), is the conditional probability that the channel is ON given all feedback received up to t. When a (delayed) ACK or NACK arrives, the belief is reset to the value implied by the revealed channel state and then propagated forward through the Markov transition matrix; between feedback arrivals it evolves by the one-step Chapman‑Kolmogorov recursion. The action space consists of selecting a single user for transmission in each slot, and the instantaneous reward is 1 if the transmission succeeds (i.e., the channel is ON) and 0 otherwise. The objective is to maximize the long‑term expected average sum throughput over an infinite horizon.
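The belief recursion for the two-state chain can be sketched as follows. This is a minimal illustration; the function names and the reduction to a single most-recent feedback are our simplifications, not the paper's notation:

```python
def propagate(belief: float, p: float, q: float, steps: int = 1) -> float:
    """Run `steps` one-slot Chapman-Kolmogorov updates of an ON-belief,
    where p = Pr(ON->ON) and q = Pr(OFF->OFF)."""
    for _ in range(steps):
        belief = belief * p + (1.0 - belief) * (1.0 - q)
    return belief

def belief_from_feedback(ack: bool, p: float, q: float, delay: int) -> float:
    """An ACK (NACK) reveals the channel was ON (OFF) `delay` slots ago;
    propagate that exact knowledge forward to the current slot."""
    return propagate(1.0 if ack else 0.0, p, q, delay)
```

For example, one slot after an ACK the belief is exactly p, and one slot after a NACK it is exactly 1 − q; as the age of the feedback grows, both converge to the chain's stationary ON-probability (1 − q)/(2 − p − q).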

Two‑user case (K = 2).
For the special case of two users, the paper proves that a greedy policy—always scheduling the user with the highest current belief—achieves the optimal sum throughput for any distribution of the feedback delay D. The proof hinges on two observations: (1) the immediate expected reward is maximized by picking the larger belief, and (2) because the two belief processes evolve independently and the Markov chain is time‑invariant, the greedy choice also maximizes the expected future reward. By showing that the greedy action satisfies the Bellman optimality equation, the authors establish global optimality. This result is notable because it holds without any restriction on the delay statistics, making the policy robust to arbitrary feedback latency.
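The greedy rule itself reduces to an argmax over the belief vector (an illustrative sketch; `greedy_schedule` is our name, and the lowest-index tie-break is our choice):

```python
def greedy_schedule(beliefs: list[float]) -> int:
    """Schedule the user with the largest current ON-belief
    (ties broken in favor of the lowest index)."""
    return max(range(len(beliefs)), key=lambda i: beliefs[i])
```

For two users this single comparison is, per the result above, sum-throughput optimal regardless of how the ARQ delays are distributed.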

More than two users (K ≥ 3).
When the number of users exceeds two, the greedy policy is no longer universally optimal. The authors construct explicit counter‑examples in which selecting the highest‑belief user yields a lower expected cumulative reward than a non‑greedy choice that sacrifices immediate gain for better future beliefs. Numerical simulations confirm that the gap between the greedy and the true optimal policy is modest (typically 1–3 % of the maximum sum throughput), so the greedy rule is near‑optimal while being dramatically simpler to implement.
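The greedy-versus-optimal comparison can be probed with a small brute-force experiment. The sketch below evaluates the greedy policy against an exhaustive finite-horizon search over scheduling decisions; for tractability it assumes zero-delay feedback, and the values of p, q, the horizon, and the initial beliefs are illustrative choices, not numbers from the paper:

```python
def step(b: float, p: float, q: float) -> float:
    # One-slot Chapman-Kolmogorov update of an ON-belief.
    return b * p + (1.0 - b) * (1.0 - q)

def value(beliefs, horizon, p, q, greedy):
    """Expected sum reward over `horizon` slots with zero-delay ACK/NACK
    feedback; greedy=False brute-forces the optimal policy by recursion."""
    if horizon == 0:
        return 0.0
    if greedy:
        actions = [max(range(len(beliefs)), key=lambda i: beliefs[i])]
    else:
        actions = range(len(beliefs))
    best = float("-inf")
    for a in actions:
        nxt = [step(b, p, q) for b in beliefs]
        ack, nack = list(nxt), list(nxt)
        ack[a] = p          # channel was ON -> next-slot belief is p
        nack[a] = 1.0 - q   # channel was OFF -> next-slot belief is 1 - q
        r = (beliefs[a] * (1.0 + value(ack, horizon - 1, p, q, greedy))
             + (1.0 - beliefs[a]) * value(nack, horizon - 1, p, q, greedy))
        best = max(best, r)
    return best

p, q = 0.9, 0.8
beliefs = [0.9, 0.5, 0.2]          # three users, K >= 3
g = value(beliefs, 4, p, q, greedy=True)
o = value(beliefs, 4, p, q, greedy=False)
```

Since the greedy rule is one feasible policy, g can never exceed o; sweeping such instances gives a feel for how small the gap typically is.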

Implementation without statistical knowledge.
A key practical contribution is an algorithm that implements the greedy policy without requiring knowledge of the Markov transition probabilities (p, q) or the delay distribution of D. The BS only needs to keep track of the most recent ACK/NACK for each user and apply a deterministic belief‑evolution rule based on elapsed time. This “statistics‑agnostic” approach makes the scheduler robust against model misspecification and adaptive to changing environments.
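One way such a statistics-agnostic rule can work is via ordering rather than belief computation. The sketch below assumes positive channel memory (p > 1 − q), under which the belief after an ACK decays monotonically toward the steady state from above while the belief after a NACK recovers from below, so the greedy ordering depends only on the type and age of each user's most recent feedback; the names and this particular key are our illustration, not necessarily the paper's exact algorithm:

```python
def rank_key(last_ack: bool, age: int):
    """Smaller key = higher belief, assuming p > 1 - q:
    among ACKs, fresher is better; among NACKs, staler is better;
    every ACK outranks every NACK."""
    return (0, age) if last_ack else (1, -age)

def statistics_agnostic_greedy(users) -> int:
    """users: per-user (last_ack, slots_since_that_observation) pairs."""
    return min(range(len(users)), key=lambda i: rank_key(*users[i]))
```

Sorting by this key reproduces the belief ordering without ever computing a belief, which is why no estimate of p, q, or the delay distribution is needed.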

Capacity analysis.
For the two‑user system the authors establish an equivalence with a genie‑aided scenario in which the BS instantly knows the exact channel states. Leveraging this equivalence, they derive a simple closed‑form expression for the sum capacity of the Markov‑modeled downlink. They further obtain inner and outer bounds on the capacity region and tighten these bounds for special cases of the system parameters.