Poisson-MNL Bandit: Nearly Optimal Dynamic Joint Assortment and Pricing with Decision-Dependent Customer Arrivals


📝 Original Info

  • Title: Poisson-MNL Bandit: Nearly Optimal Dynamic Joint Assortment and Pricing with Decision-Dependent Customer Arrivals
  • ArXiv ID: 2602.16923
  • Date: 2026-02-18
  • Authors: Not provided in the source; see the original paper for the author list.

📝 Abstract

We study dynamic joint assortment and pricing where a seller updates decisions at regular accounting/operating intervals to maximize the cumulative per-period revenue over a horizon $T$. In many settings, assortment and prices affect not only what an arriving customer buys but also how many customers arrive within the period, whereas classical multinomial logit (MNL) models assume arrivals as fixed, potentially leading to suboptimal decisions. We propose a Poisson-MNL model that couples a contextual MNL choice model with a Poisson arrival model whose rate depends on the offered assortment and prices. Building on this model, we develop an efficient algorithm PMNL based on the idea of upper confidence bound (UCB). We establish its (near) optimality by proving a non-asymptotic regret bound of order $\sqrt{T\log{T}}$ and a matching lower bound (up to $\log T$). Simulation studies underscore the importance of accounting for the dependency of arrival rates on assortment and pricing: PMNL effectively learns customer choice and arrival models and provides joint assortment-pricing decisions that outperform those of methods assuming fixed arrival rates.

📄 Full Content

Assortment (what products to offer) and pricing (at what prices) are among the central decision problems in revenue management. In many retail and platform settings, these decisions are made at regular "accounting" intervals, such as daily, weekly, or other operational cycles (Ma et al. 2018, Brown et al. 2023, Aparicio et al. 2023). Over such a period, the offered assortment and prices influence revenue through two channels: how many customers arrive and what those customers purchase. Individual purchase behavior is often modeled with discrete choice models, most notably the multinomial logit (MNL) model, and decisions are optimized for expected revenue per arriving customer while the arrival process is typically taken as fixed. In practice, however, customer arrivals are affected by assortment and pricing decisions: a more attractive selection or more competitive prices can pull in additional traffic (Kahn 1995, Wang 2021). Models that ignore such decision-dependent arrivals can therefore lead to suboptimal decisions for per-period revenue maximization.

In this paper, we incorporate customer arrivals into choice-based modeling for the dynamic joint assortment-pricing problem. Our goal is to sequentially determine assortment and pricing decisions $\{S_t, p_t\}_{t=1}^{T}$ among $N$ products to maximize cumulative expected revenue over a time horizon $T$. Under a classical MNL model, given an assortment $S \subseteq [N] = \{1, 2, \ldots, N\}$ and prices $p = \{p_j\}_{j \in S}$, an arriving customer purchases product $j \in S$ with probability

$$q_j(S, p) = \frac{\exp(v_j - p_j)}{1 + \sum_{k \in S} \exp(v_k - p_k)},$$

where $v_j$ denotes the intrinsic value of product $j$ and can be further contextualized by $d_z$-dimensional product features $z$. Consequently, the classical approach maximizes the expected customer-wise cumulative reward based on the per-customer expected reward $\sum_{j \in S} q_j(S, p)\, p_j$. However, this overlooks a key aspect: the number of customer arrivals in each period is random and depends on the assortment-pricing decision. To capture the effect of assortment and pricing on arrivals, we model the number of arrivals in each period as a Poisson count with mean proportional to a decision-dependent arrival rate $\lambda(S, p)$, i.e., the arrival rate is a function of the assortment $S$ and prices $p$. Combining the arrival and choice models, given an assortment $S$ and prices $p$, the conditional mean of the reward for this period is proportional to

$$\lambda(S, p) \sum_{j \in S} q_j(S, p)\, p_j.$$
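The following minimal Python sketch computes three quantities. The first is the MNL choice probabilities, with the no-purchase option normalized to utility 0. The second is the per-customer expected revenue. The third is the expected per-period revenue, obtained by multiplying the per-customer revenue by an arrival rate. Several choices are illustrative assumptions rather than the paper's implementation: the linear form $v_j = z_j^\top \theta$ for contextual values, the function names, and setting the expected number of arrivals equal to the rate (proportionality constant 1).

```python
import math
from typing import Dict, Sequence

FloatMap = Dict[int, float]


def utility_values(features: Dict[int, Sequence[float]], theta: Sequence[float]) -> FloatMap:
    """Contextual intrinsic values v_j = z_j^T theta (an assumed linear form)."""
    return {j: sum(a * b for a, b in zip(z, theta)) for j, z in features.items()}


def choice_probs(values: FloatMap, prices: FloatMap, assortment: Sequence[int]) -> FloatMap:
    """MNL purchase probabilities q_j(S, p) = exp(v_j - p_j) / (1 + sum_k exp(v_k - p_k))."""
    weights = {j: math.exp(values[j] - prices[j]) for j in assortment}
    denom = 1.0 + sum(weights.values())  # the 1.0 is the no-purchase option
    return {j: w / denom for j, w in weights.items()}


def per_customer_revenue(values: FloatMap, prices: FloatMap, assortment: Sequence[int]) -> float:
    """Expected revenue from one arriving customer: sum_{j in S} q_j(S, p) * p_j."""
    q = choice_probs(values, prices, assortment)
    return sum(q[j] * prices[j] for j in assortment)


def per_period_revenue(rate: float, values: FloatMap, prices: FloatMap,
                       assortment: Sequence[int]) -> float:
    """Expected period revenue, assuming the expected number of arrivals equals `rate`."""
    return rate * per_customer_revenue(values, prices, assortment)


# Illustrative numbers: two products with two-dimensional features.
features = {1: [1.0, 0.0], 2: [0.0, 1.0]}
theta = [1.0, 0.5]
values = utility_values(features, theta)  # v_1 = 1.0, v_2 = 0.5
prices = {1: 1.0, 2: 0.5}
S = [1, 2]

q = choice_probs(values, prices, S)  # both equal 1/3 because v_j - p_j = 0 for each j
print(q)
print(per_customer_revenue(values, prices, S))
print(per_period_revenue(10.0, values, prices, S))
```

The per-customer revenue depends on the decision only through the choice probabilities. The per-period revenue additionally scales with the arrival rate, which is the coupling the model is designed to capture.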

This intuitive form highlights the coupling between per-customer revenue and per-period revenue induced by an assortment-pricing decision. A decision can improve conversion or margins yet reduce customer arrivals enough to lower total period revenue. Conversely, one that appears worse in terms of per-customer revenue can be optimal if it attracts substantially more customers. For example, discounting a “magnet” product may reduce sales of other products through substitution but still increase overall revenue by drawing in more arrivals during the period. Similarly, adding an item to the assortment may dilute purchase probabilities yet increase arrivals by making the offer set more attractive. These examples show that the effect of assortment and pricing on arrivals can take many forms. The key, then, is a flexible model for the arrival rate that captures how arrivals depend on the assortment-pricing decision. To this end, instead of confining this dependency to a specific functional form, we parametrize $\lambda(S, p)$ through a rich set of basis functions of $(S, p)$, so that the model is expressive enough to accommodate a wide range of dependence structures.
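This summary does not list the basis functions used for $\lambda(S, p)$. As a hypothetical illustration only, the sketch below uses a log-linear rate $\lambda(S, p) = \exp(\theta^\top \phi(S, p))$. It has two made-up features: an intercept and the MNL inclusive value $\log(1 + \sum_{k \in S} \exp(v_k - p_k))$, used here as a scalar measure of how attractive the offer is. The sketch then replays the "magnet" discount example with invented numbers. Setting the second coefficient of $\theta$ to zero recovers a fixed arrival rate.

```python
import math
from typing import Dict, List, Sequence

FloatMap = Dict[int, float]


def mnl_weights(values: FloatMap, prices: FloatMap, assortment: Sequence[int]) -> FloatMap:
    """Attraction weights exp(v_j - p_j) of the offered products (no-purchase weight is 1)."""
    return {j: math.exp(values[j] - prices[j]) for j in assortment}


def per_customer_revenue(values: FloatMap, prices: FloatMap, assortment: Sequence[int]) -> float:
    """sum_{j in S} q_j(S, p) * p_j under the MNL model."""
    w = mnl_weights(values, prices, assortment)
    denom = 1.0 + sum(w.values())
    return sum(w[j] / denom * prices[j] for j in assortment)


def basis(values: FloatMap, prices: FloatMap, assortment: Sequence[int]) -> List[float]:
    """HYPOTHETICAL basis phi(S, p): intercept and inclusive value log(1 + sum_k exp(v_k - p_k))."""
    w = mnl_weights(values, prices, assortment)
    return [1.0, math.log(1.0 + sum(w.values()))]


def arrival_rate(theta: Sequence[float], values: FloatMap, prices: FloatMap,
                 assortment: Sequence[int]) -> float:
    """Log-linear arrival rate lambda(S, p) = exp(theta^T phi(S, p)), positive by construction."""
    phi = basis(values, prices, assortment)
    return math.exp(sum(t * f for t, f in zip(theta, phi)))


def period_revenue(theta: Sequence[float], values: FloatMap, prices: FloatMap,
                   assortment: Sequence[int]) -> float:
    """Expected per-period revenue: arrival rate times expected revenue per customer."""
    return arrival_rate(theta, values, prices, assortment) * per_customer_revenue(values, prices, assortment)


# Invented example: product 1 is the "magnet" that can be discounted.
values = {1: 2.0, 2: 1.0}
S = [1, 2]
regular = {1: 2.0, 2: 1.0}
magnet = {1: 1.0, 2: 1.0}
fixed = [math.log(100.0), 0.0]      # arrivals ignore the offer
dependent = [math.log(100.0), 1.0]  # arrivals grow with the offer's attractiveness

for name, theta in [("fixed", fixed), ("dependent", dependent)]:
    for label, p in [("regular", regular), ("magnet", magnet)]:
        lam = arrival_rate(theta, values, p, S)
        pc = per_customer_revenue(values, p, S)
        print(f"{name:9s} {label:7s} lambda={lam:7.2f} per-customer={pc:.3f} per-period={lam * pc:7.2f}")
```

The two arrival models pick different prices:

- **Fixed arrivals:** the regular prices earn more per period, about 100 versus 78.8.
- **Decision-dependent arrivals:** once arrivals respond to the offer's attractiveness, the discounted prices earn more, about 371.8 versus 300.

In both cases the discounted prices have the lower per-customer revenue, about 0.79 versus 1.0.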

Since neither customer arrival nor choice behavior is known a priori, both must be learned from data generated under past decisions while maximizing cumulative reward over time. Specifically, in each period, the firm chooses an assortment and prices, observes the realized customer arrivals and purchases, and updates its inference on the model parameters using all available observations. Such an online learning problem falls under the umbrella of bandit problems (Lattimore and Szepesvári 2020). A key challenge in such problems is to balance "exploration" and "exploitation" over the action space (i.e., feasible assortments and prices). The aim is to maximize cumulative reward or, equivalently, to minimize cumulative regret relative to an oracle policy that knows the model parameters and always selects the optimal action. We propose a new algorithm, Poisson-MNL (PMNL), which jointly learns the parameters of the Poisson arrival model and the MNL choice model and makes dynamic assortment and pricing decisions using an upper confidence bound (UCB) approach. We establish regret bounds showing that PMNL is optimal up to a logarithmic factor in the time horizon $T$. In simulation studies, PMNL outperforms benchmarks that assume a fixed arrival rate when customer arrivals are influenced by the offered assortment and prices.
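PMNL's own estimator and confidence construction are not reproduced in this summary. As a generic illustration of how the arrival side could be learned and made optimistic, the sketch below takes two steps under stated assumptions:

1. It fits a log-linear Poisson regression $\lambda = \exp(\theta^\top \phi)$ to simulated per-period counts by iteratively reweighted least squares, which is equivalent here to Newton's method.
2. It forms a UCB-style optimistic rate by adding a bonus $\beta \sqrt{\phi^\top H^{-1} \phi}$ inside the exponential, where $H$ is the regularized Fisher information.

The following are all hypothetical choices: the scalar feature $x$ standing in for a summary of $(S, p)$, the ridge term, the constant $\beta$, and the simulated data. The MNL choice side, which PMNL learns jointly, is omitted.

```python
import numpy as np


def fit_poisson_irls(Phi, counts, ridge=1e-6, n_iter=100, tol=1e-10):
    """Ridge-regularized Poisson MLE with log link, fitted by IRLS (= Newton's method).

    Phi:    (n, d) matrix of basis features phi(S_t, p_t), one row per period.
    counts: (n,) observed arrival counts N_t.
    Returns the estimate theta_hat and the regularized Fisher information H.
    """
    d = Phi.shape[1]
    mu = counts + 0.5  # GLM-style start: fitted means close to the observed counts
    eta = np.log(mu)
    theta = np.zeros(d)
    for _ in range(n_iter):
        z = eta + (counts - mu) / mu                          # working response
        H = Phi.T @ (Phi * mu[:, None]) + ridge * np.eye(d)  # Poisson weights W = mu
        theta_new = np.linalg.solve(H, Phi.T @ (mu * z))     # weighted least-squares step
        eta = Phi @ theta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(theta_new - theta)) < tol
        theta = theta_new
        if converged:
            break
    H = Phi.T @ (Phi * mu[:, None]) + ridge * np.eye(d)  # information at the final estimate
    return theta, H


def optimistic_rate(phi, theta_hat, H, beta=2.0):
    """UCB-style optimistic arrival rate exp(phi^T theta_hat + beta * sqrt(phi^T H^{-1} phi))."""
    width = np.sqrt(phi @ np.linalg.solve(H, phi))
    return float(np.exp(phi @ theta_hat + beta * width))


# Simulated history (hypothetical): intercept plus one scalar feature per period.
rng = np.random.default_rng(0)
theta_true = np.array([np.log(100.0), 1.0])
x = rng.uniform(0.0, 1.6, size=500)
Phi = np.column_stack([np.ones_like(x), x])
counts = rng.poisson(np.exp(Phi @ theta_true))
theta_hat, H = fit_poisson_irls(Phi, counts)

phi_new = np.array([1.0, 1.2])  # features of a candidate assortment-price decision
point = float(np.exp(phi_new @ theta_hat))
ucb = optimistic_rate(phi_new, theta_hat, H)
print("theta_hat:", theta_hat)
print(f"estimated rate {point:.1f}, optimistic rate {ucb:.1f}")
```

The bonus is added on the log scale, so the optimistic rate stays positive. The bonus also shrinks as $H$ accumulates information in feature directions that have been tried often. A full method such as PMNL must also account for uncertainty in the MNL choice parameters before it optimizes over assortments and prices.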

Let us summarize some of

This content is AI-processed based on open access ArXiv data.
