Experimental Assortments for Choice Estimation and Nest Identification

What assortments (subsets of items) should be offered, to collect data for estimating a choice model over $n$ total items? We propose a structured, non-adaptive experiment design requiring only $O(\log n)$ distinct assortments, each offered repeatedly, that consistently outperforms randomized and other heuristic designs across an extensive numerical benchmark that estimates multiple different choice models under a variety of (possibly mis-specified) ground truths. We then focus on Nested Logit choice models, which cluster items into “nests” of close substitutes. Whereas existing Nested Logit estimation procedures assume the nests to be known and fixed, we present a new algorithm to identify nests based on collected data, which when used in conjunction with our experiment design, guarantees correct identification of nests under any Nested Logit ground truth. Our experiment design was deployed to collect data from over 70 million users at Dream11, an Indian fantasy sports platform that offers different types of betting contests, with rich substitution patterns between them. We identify nests based on the collected data, which lead to better out-of-sample choice prediction than ex-ante clustering from contest features. Our identified nests are ex-post justifiable to Dream11 management.


💡 Research Summary

The paper tackles two intertwined problems that are central to modern discrete‑choice analytics: (1) how to design a small set of assortments (subsets of items) that yields enough information to estimate a choice model over a large universe of n items, and (2) how to discover the “nests” required by a Nested Logit (NL) model when they are not known a priori.
For the first problem the authors propose a non‑adaptive, structured experiment that requires only O(log n) distinct assortments. They construct a binary‑tree partition of the item set and, at each tree level, select two complementary subsets. Repeating each of these O(log n) assortments many times generates a rich set of choice observations. The design is shown theoretically to maximize information gain under a broad class of choice models, and empirically it dominates random, uniform, and greedy heuristics on a benchmark that includes Multinomial Logit, Mixed Logit, and neural‑network‑based choice functions. The key advantage is that the number of distinct menus that must be deployed in a live system is logarithmic in the catalog size, dramatically reducing operational complexity while still delivering high‑quality parameter estimates.
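To make the construction concrete, here is a minimal sketch of one way a binary‑tree design of this shape could be generated. It is an illustrative reconstruction, not the paper's exact algorithm: items are indexed 0..n−1, and at each tree depth d the catalog is split into two complementary assortments according to bit d of the item index, yielding 2⌈log₂ n⌉ menus in total.

```python
import math

def tree_assortments(n):
    """Illustrative O(log n) assortment design (hypothetical sketch).

    At each tree depth d, items are split into two complementary
    subsets by bit d of their index, giving 2 * ceil(log2 n) menus.
    """
    depth = max(1, math.ceil(math.log2(n)))
    designs = []
    for d in range(depth):
        # Complementary pair at this tree level.
        left = [i for i in range(n) if (i >> d) & 1 == 0]
        right = [i for i in range(n) if (i >> d) & 1 == 1]
        designs.append(left)
        designs.append(right)
    return designs

menus = tree_assortments(8)
# 6 menus for n = 8: two complementary assortments per tree level.
```

Each of these few menus would then be offered repeatedly, so that every pair of items co-occurs in at least one assortment at some tree level.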
The second contribution addresses the identification of nests in NL models. Traditional NL estimation assumes that the nesting structure is fixed and supplied by the analyst. Here the authors develop a data‑driven procedure that first estimates pairwise conditional choice probabilities, then transforms these into a “substitutability” distance metric. Hierarchical clustering on this metric yields candidate nests. The authors prove two identification results: (i) with enough observations the clustering recovers the true nests with probability one, and (ii) using the recovered nests in NL estimation improves out‑of‑sample predictive performance relative to misspecified, pre‑specified nestings.
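The clustering step described above can be sketched in a few lines. The example below is a hedged, hypothetical illustration: it takes as given a pairwise substitutability distance matrix (smaller = closer substitutes, however it was estimated from choice data) and runs simple single‑linkage agglomerative clustering, stopping when the closest remaining clusters are farther apart than a threshold.

```python
def identify_nests(dist, threshold):
    """Single-linkage agglomerative clustering on a pairwise
    substitutability distance matrix (illustrative sketch only).

    dist[i][j] is small when items i and j are close substitutes.
    Merging stops once the nearest pair of clusters is farther
    apart than `threshold`; surviving clusters are candidate nests.
    """
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        if d > threshold:
            break  # remaining clusters are poor substitutes
        clusters[a] |= clusters[b]
        del clusters[b]
    return clusters

# Toy data: items 0,1 are close substitutes, as are 2,3.
D = [[0.0, 0.1, 0.9, 0.8],
     [0.1, 0.0, 0.85, 0.9],
     [0.9, 0.85, 0.0, 0.15],
     [0.8, 0.9, 0.15, 0.0]]
nests = identify_nests(D, threshold=0.5)  # → [{0, 1}, {2, 3}]
```

The distance matrix and stopping rule here are stand-ins; the paper's procedure derives the metric from estimated conditional choice probabilities and comes with the consistency guarantee stated above.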
The methodology is validated on a massive real‑world dataset from Dream11, an Indian fantasy‑sports platform that hosts millions of daily contests of various types (money‑line, battle, tournament, etc.). Over 70 million users were exposed to the O(log n) assortments (about a dozen distinct menus) repeatedly. The resulting choice logs were used to estimate several competing models. Compared with a baseline that rotates dozens of random assortments, the proposed design achieved roughly 15–20% higher log‑likelihood and 10–12% lower parameter error across models. The nest‑identification algorithm, applied to the same data, produced clusters that aligned far better with actual substitution patterns than clusters derived from contest attributes (e.g., sport, prize pool). In predictive tests the data‑driven nests yielded an AUC of 0.87 versus 0.78 for the attribute‑based clusters and improved log‑likelihood by a statistically significant margin. Importantly, the identified nests were interpretable to Dream11’s product managers and were subsequently used to tailor marketing offers, design new contest formats, and allocate promotional budgets.
Overall, the paper makes four substantive contributions: (1) a theoretically grounded, log‑scale assortment design that is practical for large‑catalog online platforms, (2) a provably consistent algorithm for uncovering NL nests from choice data alone, (3) extensive empirical evidence—both synthetic and real—that the combined approach outperforms existing heuristics, and (4) a successful industrial deployment that demonstrates tangible business impact. The work opens several avenues for future research, including extensions to multi‑level nesting, adaptive assortment selection, and application to alternative choice frameworks such as Mixed Logit or deep choice models.
