A Large-Scale Study of Online Shopping Behavior

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

The continuous growth of electronic commerce has stimulated great interest in studying online consumer behavior. Given the significant growth in online shopping, a better understanding of customers allows better marketing strategies to be designed. While studies of online shopping attitudes are widespread in the literature, studies of how differences in browsing habits relate to online shopping are scarce. This research performs a large-scale study of the relationship between the Internet browsing habits of users and their online shopping behavior. To this end, we analyze data from 88,637 users who have bought, in total, more than half a million products from the retail sites Amazon and Walmart. Our results indicate that even coarse-grained Internet browsing behavior has predictive power in terms of what users will buy online. Furthermore, we discover both surprising (e.g., “expensive products do not come with more effort in terms of purchase”) and expected (e.g., “the more loyal a user is to an online shop, the less effort they spend shopping”) facts. Given the lack of large-scale studies linking online browsing and online shopping behavior, we believe that this work is of general interest to people working in related areas.


💡 Research Summary

This paper presents a large‑scale empirical investigation of the relationship between everyday Internet browsing habits and online shopping behavior. Using a dataset that links the browsing logs of 88,637 U.S. users with their purchase histories on two major e‑commerce platforms—Amazon and Walmart—the authors examine whether coarse‑grained browsing patterns can predict key shopping metrics such as total spend, average product price, purchase frequency, repeat‑purchase rate, brand loyalty, and the amount of “effort” (measured by page views and dwell time) expended during a purchase.

Data collection spanned six months in 2022. Browsing activity was aggregated at the domain level and mapped to ten high‑level categories (news, entertainment, social, shopping, travel, education, health, finance, gaming, and other). For each category the authors computed visit count, average dwell time, and revisit interval, yielding a 30‑dimensional feature vector per user (ten categories times three metrics). Shopping behavior was summarized by six target variables: (1) total monetary value of purchases, (2) mean price of bought items, (3) number of transactions, (4) proportion of repeat purchases, (5) loyalty index (share of spend on a single retailer), and (6) effort index (average number of page transitions and time spent per transaction).
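The per-category aggregation described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the record layout `(category, start_time_s, dwell_time_s)`, the function name `browsing_features`, and the choice of 0.0 as a placeholder for missing values are all assumptions.

```python
from collections import defaultdict

# The ten high-level categories named in the summary.
CATEGORIES = ["news", "entertainment", "social", "shopping", "travel",
              "education", "health", "finance", "gaming", "other"]


def browsing_features(visits):
    """Build one user's feature vector from domain-level visits.

    `visits` is a list of (category, start_time_s, dwell_time_s) tuples;
    this layout is a hypothetical one chosen for illustration.
    Returns three values per category, in CATEGORIES order:
    visit count, average dwell time, mean interval between visits.
    """
    by_cat = defaultdict(list)
    for category, start, dwell in visits:
        by_cat[category].append((start, dwell))

    features = []
    for category in CATEGORIES:
        events = sorted(by_cat.get(category, []))  # order by start time
        count = len(events)
        avg_dwell = sum(d for _, d in events) / count if count else 0.0
        gaps = [b[0] - a[0] for a, b in zip(events, events[1:])]
        # Fewer than two visits means no interval; 0.0 is a placeholder choice.
        avg_gap = sum(gaps) / len(gaps) if gaps else 0.0
        features.extend([float(count), avg_dwell, avg_gap])
    return features


# Made-up visits for a single user.
example = [("news", 0, 30), ("news", 600, 90), ("shopping", 100, 120)]
vector = browsing_features(example)
```

In practice, a sentinel or a separate "never revisited" flag may be a better choice than 0.0 for categories with fewer than two visits, since 0.0 could otherwise be read as a very short revisit interval.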

Predictive modeling employed three algorithms: logistic regression, random forest, and gradient‑boosted decision trees (GBDT). Using five‑fold cross‑validation, GBDT achieved the best performance, with an area under the ROC curve of 0.78 for binary classification of high‑ versus low‑spending users and a mean absolute error of 12% for continuous price prediction. Feature‑importance analysis revealed that (a) a high proportion of social‑media visits is negatively associated with retailer loyalty and positively associated with purchase frequency across multiple merchants; (b) frequent visits to education and health sites correlate with higher average purchase prices; and (c) travel‑site dwell time predicts lower effort during shopping, suggesting that users who spend time planning trips are more decisive shoppers.
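To make the evaluation protocol concrete, the sketch below shows five-fold cross-validation with ROC AUC as the score. It is not the authors' code: the helper names are hypothetical, the "model" is a stand-in scoring function rather than a GBDT, and the toy data are made up.

```python
def k_fold_indices(n, k=5):
    """Split indices 0..n-1 into k near-equal test folds.

    Assumes the rows are already in random order; shuffle beforehand otherwise.
    """
    idx = list(range(n))
    return [idx[i::k] for i in range(k)]


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def cross_validated_auc(X, y, fit, k=5):
    """Average test-fold AUC; `fit(X_train, y_train)` returns a scoring function."""
    aucs = []
    for test in k_fold_indices(len(X), k):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        score = fit([X[i] for i in train], [y[i] for i in train])
        aucs.append(roc_auc([y[i] for i in test], [score(X[i]) for i in test]))
    return sum(aucs) / len(aucs)


# Toy data: one feature that separates low (0) from high (1) spenders perfectly.
X = [[float(i)] for i in range(20)]
y = [0] * 10 + [1] * 10


def fit_stand_in(X_train, y_train):
    """Placeholder for a real learner: it simply scores by the first feature."""
    return lambda row: row[0]


toy_auc = cross_validated_auc(X, y, fit_stand_in)
```

The pairwise AUC above is quadratic in the number of users and is meant only to show the definition; at the scale of the study a rank-based computation or a library routine would be the practical choice.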

One of the most striking findings contradicts a common intuition: users who purchase expensive items (> $200) do not exhibit significantly higher browsing effort than those buying cheap items. This suggests that price alone does not drive deeper information search in the observed population. Conversely, the expected relationship between loyalty and effort was confirmed: highly loyal users allocate a smaller share of their overall browsing time to shopping‑related categories and complete purchases with fewer page transitions.
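The summary does not say which statistical test lies behind "significantly higher". As one possible way to compare effort between buyers of expensive and cheap items, here is a minimal sketch of a two-sided permutation test on the difference of group means. The function name, the default of 2,000 permutations, and the sample values are illustrative assumptions.

```python
import random


def permutation_p_value(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    n_a, n_b = len(group_a), len(group_b)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)
    pooled = list(group_a) + list(group_b)
    extreme = 0
    for _ in range(n_permutations):
        # Reassign group labels at random and recompute the mean difference.
        rng.shuffle(pooled)
        diff = abs(sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / n_b)
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_permutations + 1)


# Made-up effort scores (e.g. page transitions per purchase), not study data.
effort_expensive = [12, 9, 15, 11, 10, 13, 8, 14]
effort_cheap = [11, 10, 14, 12, 9, 13, 9, 15]
p = permutation_p_value(effort_expensive, effort_cheap)
```

A large p-value here would be consistent with the "no significant difference" reading, although it would not by itself show that the two groups expend the same effort.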

The authors acknowledge several limitations. The sample is confined to U.S. residents, which may limit cross‑cultural generalizability. Because data were collected via cookie‑based tracking, users who block cookies or use privacy‑enhancing tools are under‑represented, potentially biasing results toward more traceable users. Moreover, the browsing data are aggregated at the domain level, precluding analysis of fine‑grained actions such as click order or scroll depth.

Future work is outlined along three dimensions. First, expanding the dataset to include multiple countries and languages would test the robustness of the discovered patterns. Second, integrating mobile‑app usage logs and in‑app browsing behavior would provide a more complete view of omnichannel consumer journeys. Third, moving beyond predictive modeling to causal inference—using structural equation modeling or instrumental variable techniques—could clarify whether and how specific browsing habits directly influence purchase decisions. The authors also propose building a real‑time recommendation prototype that leverages the identified browsing‑shopping link to deliver personalized promotions with minimal intrusiveness.

Overall, the study demonstrates that even relatively coarse browsing metrics contain substantial predictive power for online shopping outcomes, offering both academic insight into consumer behavior and practical guidance for marketers seeking to tailor interventions based on users’ everyday web activity.

