The predictability of consumer visitation patterns
We consider hundreds of thousands of individual economic transactions to ask: how predictable are consumers in their merchant visitation patterns? Our results suggest that, in the long run, much of our seemingly elective activity is actually highly predictable. Notwithstanding a wide range of individual preferences, shoppers share regularities in how they visit merchant locations over time. Yet while aggregate behavior is largely predictable, the interleaving of shopping events introduces important stochastic elements at short time scales. These short- and long-scale patterns suggest a theoretical upper bound on predictability, and describe the accuracy of a Markov model in predicting a person’s next location. We incorporate population-level transition probabilities in the predictive models, and find that in many cases these improve accuracy. While our results point to the elusiveness of precise predictions about where a person will go next, they suggest the existence, at large time scales, of regularities across the population.
💡 Research Summary
This paper investigates how predictable individual consumers are in their patterns of visiting merchants, using large-scale credit-card transaction records from two major financial institutions: one in North America (≈5 × 10⁷ accounts over six months) and one in Europe (≈4 × 10⁶ accounts over eleven months). After filtering to retain only genuine personal cards (10–50 unique stores and 50–120 purchases per month), the authors obtain time-ordered sequences of store visits for tens of thousands of shoppers.
To quantify predictability, the authors employ two entropy measures. Temporally-uncorrelated (TU) entropy is computed solely from the marginal visitation frequencies p_i of each store i, S_TU = −∑_i p_i log p_i, and therefore ignores the order of visits. Sequence-dependent (SD) entropy incorporates the compressibility of the full visit sequence using a Lempel-Ziv approximation of Kolmogorov complexity; repeated subsequences lower the entropy, reflecting regularity in the order of visits. Both entropies display narrow distributions across the populations and, notably for the credit-card data, TU and SD entropies are very close, indicating that the ordering of visits carries little information beyond the visitation frequencies themselves.
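The two measures can be illustrated with a minimal Python sketch. The function names are hypothetical, and the SD estimator shown is one common Lempel-Ziv style estimator (n log n divided by the sum of shortest-novel-substring lengths); the summary does not state which variant the authors used, so treat that choice, and the use of base-2 logarithms, as assumptions.

```python
import math
from collections import Counter


def tu_entropy(visits):
    """Temporally-uncorrelated entropy in bits: -sum_i p_i log2 p_i."""
    n = len(visits)
    counts = Counter(visits)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def _occurs(sub, history):
    """True if the list `sub` appears as a contiguous run inside `history`."""
    m = len(sub)
    return any(history[j:j + m] == sub for j in range(len(history) - m + 1))


def sd_entropy_lz(visits):
    """Lempel-Ziv style entropy estimate in bits (assumed variant).

    For each position i, find the length of the shortest run starting at i
    that has not been seen in visits[:i]; long repeats give long runs and
    therefore a low entropy estimate.
    """
    n = len(visits)
    total = 0
    for i in range(n):
        k = 1
        while i + k <= n and _occurs(visits[i:i + k], visits[:i]):
            k += 1
        total += k
    return n * math.log2(n) / total


routine = ["grocery", "cafe"] * 20   # strictly alternating visits
novel = list(range(40))              # every visit goes to a new store
print(tu_entropy(routine), sd_entropy_lz(routine), sd_entropy_lz(novel))
```

The alternating sequence has a TU entropy of one bit but a much lower SD estimate, because its ordering is fully regular; a sequence of all-new stores has no repeats to exploit. This quadratic-time search is only suitable for short sequences.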
The authors also examine the Zipf law governing store visitation frequencies: the probability of visiting the store ranked r in a shopper’s personal list follows P(r) ∝ r^−α, with α ≈ 0.80 (North America) and α ≈ 1.13 (Europe). This scaling holds regardless of the total number of stores visited, suggesting a universal regularity in how consumers allocate attention across merchants.
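As a rough illustration of how such an exponent can be read off a visit history, the sketch below ranks stores by visit share and fits a straight line to log P(r) against log r. The function names are hypothetical, and ordinary least squares on log-log axes is an assumption made here for simplicity; the summary does not say how the authors fitted α.

```python
import math
from collections import Counter


def rank_frequencies(visits):
    """Visit shares sorted from most to least visited store (rank 1 first)."""
    n = len(visits)
    counts = sorted(Counter(visits).values(), reverse=True)
    return [c / n for c in counts]


def zipf_exponent(freqs):
    """Least-squares slope of log P(r) vs log r, returned as alpha = -slope.

    Requires at least two ranked stores.
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var


# Synthetic check: frequencies generated with a known exponent.
weights = [r ** -1.13 for r in range(1, 51)]
total = sum(weights)
synthetic = [w / total for w in weights]
print(zipf_exponent(synthetic))
```

On real visit counts a log-log regression is sensitive to the noisy low-frequency tail, so the fitted value should be read as indicative only.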
For dynamic prediction, first-order Markov chain models are built for each individual. Transition probabilities p_{ij} = Pr(next store = j | current store = i) are estimated from training windows ranging from one to six months, and predictions are tested on the subsequent one to four months. Several key findings emerge: (1) extending the training period yields only marginal gains in accuracy; (2) when training data are limited (< 3 months), a naïve frequentist model, which chooses among the top-visited stores according to the empirical distribution, outperforms the Markov model; (3) seasonal effects are evident, with lower prediction rates in summer and December.
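The comparison between a per-shopper Markov model and a frequency baseline can be sketched as follows. This is an illustrative simplification, not the authors' code: all names are hypothetical, the Markov predictor returns the single most likely next store, and the baseline always predicts the most-visited training store rather than sampling from the empirical distribution.

```python
from collections import Counter, defaultdict


def fit_markov(train):
    """Count first-order transitions: transitions[i][j] = times j followed i."""
    transitions = defaultdict(Counter)
    for current, nxt in zip(train, train[1:]):
        transitions[current][nxt] += 1
    return transitions


def predict_markov(transitions, current, fallback):
    """Most frequent successor of `current`, or `fallback` if it is unseen."""
    successors = transitions.get(current)
    if not successors:
        return fallback
    return successors.most_common(1)[0][0]


def evaluate(train, test):
    """Return (Markov accuracy, most-frequent-store accuracy) on `test`."""
    transitions = fit_markov(train)
    top_store = Counter(train).most_common(1)[0][0]
    current = train[-1]
    hits_markov = hits_naive = 0
    for actual in test:
        hits_markov += predict_markov(transitions, current, top_store) == actual
        hits_naive += top_store == actual
        current = actual  # condition the next prediction on the true visit
    return hits_markov / len(test), hits_naive / len(test)


train = ["grocery", "cafe", "grocery", "cafe", "grocery", "gas"]
test = ["grocery", "cafe", "grocery", "cafe"]
print(evaluate(train, test))
```

The toy sequence is deliberately regular, so the Markov model wins here. With short or noisy histories the transition counts become sparse, which is consistent with the finding that the frequency baseline can do better when training data are limited.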
A “global” Markov model, aggregating transition matrices across all shoppers, achieves slightly higher mean accuracy (≈25–27 %) than either the individual Markov or naïve models, but its performance varies considerably (standard deviation ≈ 3.6 %) depending on the sampled subset of users. This indicates that while shared merchant transitions provide modest additional predictive power, individual spontaneity remains dominant.
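One simple way to build such a population-level model is to pool transition counts over all shoppers before normalizing, as in the sketch below. The names are hypothetical, and pooling raw counts (rather than, say, averaging per-shopper matrices) is an assumption; the summary does not specify the aggregation rule. It also assumes store identifiers are shared across shoppers.

```python
from collections import Counter, defaultdict


def fit_global(sequences):
    """Pool first-order transition counts over every shopper's sequence."""
    pooled = defaultdict(Counter)
    for seq in sequences:
        for current, nxt in zip(seq, seq[1:]):
            pooled[current][nxt] += 1
    return pooled


def transition_probs(pooled):
    """Normalize each row of counts into probabilities p_ij."""
    probs = {}
    for current, successors in pooled.items():
        row_total = sum(successors.values())
        probs[current] = {nxt: c / row_total for nxt, c in successors.items()}
    return probs


shoppers = [
    ["grocery", "cafe", "grocery", "cafe"],
    ["grocery", "gas"],
]
probs = transition_probs(fit_global(shoppers))
print(probs["grocery"])
```

Because heavy shoppers contribute more transitions under count pooling, the resulting matrix depends on which users are sampled, which may be one reason accuracy varies across subsets.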
The paper emphasizes that entropy, while useful for comparing populations, does not fully capture the nuanced notion of predictability. High TU entropy can coexist with highly regular daily routines, and SD entropy is sensitive to the temporal window chosen for analysis. Moreover, the stochastic interleaving of visits at short time scales—e.g., swapping the order of a grocery run and a post‑office stop—limits the effectiveness of both Markov and frequency‑based forecasts.
In conclusion, consumer visitation behavior exhibits strong long-term regularities (Zipf scaling, low TU entropy) that make it broadly predictable over extended horizons. However, at the granularity of days or weeks, the sequence of visits is highly variable, and even a model with perfect knowledge of transition probabilities would not substantially outperform a simple “most-frequent-store” heuristic. The findings suggest that models of human mobility must account for the dual nature of choice (introducing unpredictability) and necessity (imposing regularity), and that predictability should be evaluated with respect to the relevant temporal scale.