Analyzing the Shopping Journey Computing Shelf Browsing Visits in a Physical Retail Store
đ Original Paper Info
- Title: Analyzing the Shopping Journey Computing Shelf Browsing Visits in a Physical Retail Store- ArXiv ID: 2601.00928
- Date: 2026-01-02
- Authors: Luis Yoichi Morales, Francesco Zanlungo, David M. Woollard
đ Abstract
Motivated by recent challenges in the deployment of robots into customer-facing roles within retail, this work introduces a study of customer activity in physical stores as a step toward autonomous understanding of shopper intent. We introduce an algorithm that computes shoppers' ``shelf visits'' -- capturing their browsing behavior in the store. Shelf visits are extracted from trajectories obtained via machine vision-based 3D tracking and overhead cameras. We perform two independent calibrations of the shelf visit algorithm, using distinct sets of trajectories (consisting of 8138 and 15129 trajectories), collected in different stores and labeled by human reviewers. The calibrated models are then evaluated on trajectories held out of the calibration process both from the same store on which calibration was performed and from the other store. An analysis of the results shows that the algorithm can recognize customers' browsing activity when evaluated in an environment different from the one on which calibration was performed. We then use the model to analyze the customers' ``browsing patterns'' on a large set of trajectories and their relation to actual purchases in the stores. Finally, we discuss how shelf browsing information could be used for retail planning and in the domain of human-robot interaction scenarios.đĄ Summary & Analysis
1. **Key Contribution**: This study develops an algorithm to analyze customer behavior in physical stores, enabling robots to autonomously understand shopper intent. 2. **Simplified Explanation**: The research is like observing which products customers are interested in at a store and using that information to guide robots to assist them better. 3. **Sci-Tube Style Script**: - **Beginner Level**: Analyze customer behavior in physical stores to help robots identify what products customers are interested in. - **Intermediate Level**: Use machine vision-based algorithms to track customersâ âshelf visitâ patterns and provide automated assistance via robots using this information. - **Advanced Level**: The study analyzes customer trajectories to understand the correlation between browsing behaviors and purchasing patterns, enabling robots to autonomously support customers based on their interests.đ Full Paper Content (ArXiv Source)
Motivated by recent challenges in the deployment of robots into customer-facing roles within retail, this work introduces a study of customer activity in physical stores as a step toward autonomous understanding of shopper intent. We introduce an algorithm that computes shoppersâ âshelf visitsâ â capturing their browsing behavior in the store. Shelf visits are extracted from trajectories obtained via machine vision-based 3D tracking and overhead cameras. We perform two independent calibrations of the shelf visit algorithm, using distinct sets of trajectories (consisting of 8138 and 15129 trajectories), collected in different stores and labeled by human reviewers. The calibrated models are then evaluated on trajectories held out of the calibration process both from the same store on which calibration was performed and from the other store. An analysis of the results shows that the algorithm can recognize customersâ browsing activity when evaluated in an environment different from the one on which calibration was performed. We then use the model to analyze the customersâ âbrowsing patternsâ on a large set of trajectories and their relation to actual purchases in the stores. Finally, we discuss how shelf browsing information could be used for retail planning and in the domain of human-robot interaction scenarios.
Introduction
Despite the potential to increase customer satisfaction, improve store performance and grow sales, one of the main limiting factors to the introduction of robots in customer-facing roles is the difficulty of enabling autonomous behavior . While AI promises to revolutionize the ability of robots to have open-world understanding, the general consensus is that such a revolution is far away and retailers should be conservative . Current limitations of customer-facing service robots in retail can be illustrated in Softbankâs shift from Pepper, a customer service robot designed to directly interact with customers that met with limited success, to Whiz, a robot more specifically designed for non customer-facing roles such as cleaning .
A key component of missing functionality is the ability of robots to quickly ascertain shopper intent in order to directly address acute customer needs without significant assistance. While shopper intent can be considered as a complex open-world problem, we can look at parallels between shopper behaviors (and the measurement of those behaviors) in e-commerce settings for possible approaches to developing more understanding.
Data concerning âbrowsing activityâ are crucial in the analysis and modeling of e-commerce, but have been neglected in the performance analysis of physical retail. Nevertheless, customers of physical stores do perform parallel actions to those of online browsing. For example, shoppers can spend longer or shorter time in front of a shelf, check different products, pick up products, and then proceed to buy only some of them (if any). It is evident that this âbrowsing activityâ correlates with customersâ purchases, and thus is extremely valuable information for both store management and Consumer Packaged Goods (CPG) manufacturers who are interested not only in their productsâ sales performance, but also in how shoppers perceive and interact with products in stores.
A considerable amount of research into physical store optimization has focused on more readily available data, including the analysis of sales , and at building models to optimize store layouts and product placement in order to increase revenue.
Despite these efforts, an obvious gap in physical retail analytics is in measuring the sales conversion funnel (i.e., what happens between when a shopper enters a store and when they make their purchases). We argue that, in understanding a shopperâs browsing activities in physical stores, we can start to bridge the gap in autonomy of robots by providing more targeting and meaningful support to shoppers, leading to better outcomes, both in terms of customer support and increased sales.
In this work we introduce an algorithm that analyzes the trajectories of shoppers moving in a physical store and extracts their âshelf visitsâ â i.e., records of when, where and how long customers stop in front of shelves. We use the algorithm to analyze the customersâ âbrowsing patternsâ and their relation with âbuying patternsâ, and discuss possible applications, limitations and future work.
Related Works
Related works regarding shopper behavior analysis in physical retail stores may be divided in two main categories, estimation of shopper paths given a shopping list and retail shelf placement optimization.
Previous works have proposed methods to estimate the path that shoppers take in a store given a shopping list and the impact of that path on purchases. For example, proposes to estimate profit computing the average impulse purchase on shelf end caps, while estimates traffic at a shelf based on its location in order to compute impulse buying. Impulse buying in particular is important to study as store revenue is highly impacted by unplanned purchases , so optimization of promotional product placement is a key factor is overall store performance.
In recent years, the retail industry has become aware of the benefits of shelf layout optimization techniques based on available retail data , and several studies have focused on applying shelf space allocation models in practice . Furthermore, numerical studies applying shelf space optimization have shown that cross-space elasticity has limited impact in retail profit . A state-of-the-art literature review of the retail shelf planning problem is given in . Layout designs which improve store revenue have been previously proposed in and the relation between shelf layout and marketing effectiveness and sales is studied in . Also in the field of robotics some retail-oriented applications have been proposed. A work on digital twin models for retail logistics was presented in and robot data collection for store modeling is presented in .
To the extent of our knowledge, previous works have presented neither models nor analysis of shelf browsing activity which is the focus on this work.
Data collection and processing
Target Stores
For this study we used data from two convenience stores (see Fig. 1). Store $`s1`$ has an area of $`87.39`$ m$`^2`$ and has $`n_{s,1}=19`$ shelves and $`n_{e,1}=2`$ exits/entrances. Store $`s2`$ has an area of $`109.16`$ m$`^2`$ and has $`n_{s,2}=50`$ shelves and $`n_{e,2}=2`$ entrances/exits.
style="width:70.0%" />
In both stores all entrances can be also used as exits, and our trajectory analysis system records if and when they are used as entrances or exits by the customers (i.e. a visit to the same entrance/exit geometrical area is recorded as a visit to an âentranceâ at the beginning of a trajectory tracking, and as a visit to an âexitâ at the end).
Store Map
We build and use a high density colored 3D point cloud map of the store. This map is used to place cameras, compute their coverage and extract the layout of the store. The store layout contains the perimeter, entrances, exits, and shelves. Shelves are represented as a set of 3D vertices and a normal vector that represents the interactive shelf face (i.e., the face displaying the products). Each shelf has only 1 interactive face. The algorithm proposed in this work takes into consideration only the projection on the 2D floor of the shelvesâ interactive faces and of obstacles that can obstruct the customers vision, such as the shelvesâ non-interacting faces. We consider these 2D projections as segments, and we may denote the segment corresponding to the $`j`$th shelf as $`\mathbf{s}_j`$, with $`j \in 1,\hdots, n_s`$, and the segment corresponding to the $`k`$-th obstacle as $`\mathbf{s}_{n_s+k}`$.
People Tracking System
We track shopper movement using Standard AIâs Vision ML platform. 2D pose detection models, each running at 10 FPS on a calibrated set of cameras located on the ceiling of the store (similar to a typical security cameras setup) are synthesized into three-dimensional poses based on triangulation of the 2D poses from the separate cameras. We use the neck key point to track peopleâs centroids.
Data processing
We define a trajectory as a set of time-ordered arrays including the position and body orientation of a person from the entrance to the exit of the store. The sampling rate of our tracking system is $`10Hz`$. We low pass filter the trajectory points before computing the velocity as described below.
Although our tracking system provides a larger amount of data (such as acceleration, hand key points, etc.) that is stored and analyzed for other purposes and future developments, the proposed algorithm uses only the following information.
The 2D vector
\begin{equation}
\label{eq:posvector}
\mathbf{x}_i(t_k),
\end{equation}
represents the 2D projection on the storeâs floor of customerâs $`i`$ (3D) centroid at the $`k`$th time stamp,
\begin{equation}
t_k= k \Delta_t,
\end{equation}
$`\Delta_t=0.1`$ being the tracking time step, while the angle
\begin{equation}
\label{eq:orangle}
\theta_i(t_k)
\end{equation}
identifies customerâs $`i`$ body orientation (the forward normal to the 2D projection of the line connecting the shoulders, where the position of the shoulders and the bodyâs forward are identified using the aforementioned 3D tracking system) through the normal unit vector
\begin{equation}
\mathbf{n}_i(t_k)=(\cos(\theta_i(t_k),\sin(\theta_i(t_k)).
\end{equation}
Velocity is defined as
\begin{equation}
\mathbf{v}_i(t_k)= \frac{\mathbf{x}_i(t_{k+1})-\mathbf{x}_i(t_{k-1})}{2 \Delta_t}.
\end{equation}
Although this is a 2D vector in the following algorithm we only use its norm (speed) $`v_i(t_k)`$.
Shelf Stop Algorithm
This section explains our heuristic algorithm to determine if a shelf stop happened within a trajectory. The algorithm is relatively simple, to enable calibration on a relatively small set of trajectories, and to allow the algorithm to be used with simpler tracking systems (any tracking system providing the 2D vector $`\mathbf{x}`$ and angle $`\theta`$ of eqs. ([eq:posvector],[eq:orangle]) will suffice).
The main idea is that a âstopâ in front of a shelf $`j`$ is a portion of a customerâ trajectory satisfying the following conditions, defined by 3 parameters: $`T_B`$ (minimum browsing time), $`\Delta_B`$ (maximum distance to shelf) and $`v_B`$ (maximum browsing velocity).
More specifically, for each customer $`i`$ we identify a single candidate shelf (or no candidate) in the following way. We first find all the intersections between the shelvesâ faces or obstacles $`\mathbf{s}_k`$ and the half line defined by a positive multiple of the customerâs orientation vector,
\begin{equation}
\lambda \mathbf{n}_i.
\end{equation}
style="width:70.0%" />
We then identify each such intersection with a shelf or obstacle $`l`$ using the value assumed by $`\lambda`$ at the intersection point,
\begin{equation}
\lambda_{l}>0,
\end{equation}
and look for the segment corresponding to the minimum distance,
\begin{equation}
j=\text{argmin}_l \lambda_l.
\end{equation}
If the minimum distance corresponds to an interactive shelf face,
\begin{equation}
j\in 1,\hdots,n_s,
\end{equation}
such shelf is identified as the candidate (otherwise, there is no candidate shelf).
This computation provides us also with the distance to the shelf, simply defined as $`\lambda_j`$.
A stop is then defined as a time interval
\begin{equation}
t_k\in [t_s,t_f], \; t_f-t_s\geq T_B
\end{equation}
during which for all $`k`$
-
the candidate shelf $`j`$ does not change,
-
the distance to the shelf satisfies
MATH\begin{equation} \lambda_j(t_k)\leq\Delta_B, \end{equation}Click to expand and view more -
and the customerâs velocity is smaller than a threshold value
MATH\begin{equation} v_i(t_k)\leq v_B. \end{equation}Click to expand and view more
For each customer $`i`$, shelf $`j`$ and time $`k`$ the algorithm produces a Boolean output
\begin{equation}
\label{eq:output}
S^i_j(t_k)
\end{equation}
assuming a value of 1 if the above conditions are satisfied, and 0 otherwise.
Algorithm Parameter Calibration
The values of the 3 parameters ($`T_B`$, $`\Delta_B`$ and $`v_B`$) are optimized through a calibration process based on human labeling.
We built two calibration sets, the first consisting of $`n_1=279`$ trajectories from $`s1`$, and the second consisting of $`n_2=270`$ trajectories from $`s2`$. For each trajectory we produced a $`2D`$ video including position, velocity and orientation information, that human reviewers then used to identify and encode the shelf browsing behavior.
$`n_l=4`$ human reviewers analyzed the full trajectory sets, and identified, based on their understanding of browsing behavior, when customer $`i`$ performed a stop in front of shelf $`j`$. Based on this labeling, each time stamp $`k`$ of customer $`i`$ receives a âvisit booleanâ to shelf $`j`$ defined by a voting system.
Namely, defining $`n^i_j(t_k)`$ as the number of reviewers that identified $`i`$ as stopping in front of $`j`$ at time $`k`$, the visit boolean is
\begin{equation}
V^i_j(t_k)=\begin{cases}
1, \text{ if } n^i_j(t_k)>n_l/2,\\
0, \text{ if } n^i_j(t_k)\leq n_l/2.
\end{cases}
\end{equation}
We may then define the number of true positives as
\begin{equation}
TP=\#\left(k: S^i_j(t_k)=V^i_j(t_k)=1\right),
\end{equation}
where $`S^i_j(t_k)`$ in the output of the algorithm (eq. [eq:output]); the number of false positives as
\begin{equation}
FP=\#\left(k: S^i_j(t_k)\neq V^i_j(t_k)=0\right),
\end{equation}
and the the number of false negatives as
\begin{equation}
FN=\#\left(k: S^i_j(t_k)\neq V^i_j(t_k)=1\right).
\end{equation}
We define precision as
\begin{equation}
P=\frac{TP}{TP+FP},
\end{equation}
and recall as
\begin{equation}
R=\frac{TP}{TP+FN}.
\end{equation}
The calibration process is based on finding the parameter values that maximize the $`F_1`$ score, defined as
\begin{equation}
F_1=\frac{2PR}{P+R}.
\end{equation}
We performed two independent calibrations. The calibration process based on the $`s1`$ calibration set provided the following parameter values, corresponding to a maximum value of $`F_1\approx 0.86`$: $`T_I=2`$ s, $`\Delta_I\approx 1.2`$ m, $`v_I\approx 0.55`$ m/s.
On the other hand, in the $`s2`$ calibration test a value of $`F_1\approx 0.89`$ was attained, corresponding to the following parameter values: $`T_I=1.7`$ s, $`\Delta_I\approx 1.55`$ m, $`v_I\approx 0.48`$ m/s.
The value of $`\Delta_I`$ is larger on $`s2`$, most likely due to the different shop geometry.
Algorithm Evaluation
We performed same-store evaluation by randomly selecting a fraction $`p`$ of the trajectories and use it to calibrate the model, and then run the algorithm using the optimized parameters on the remaining trajectories and compute the corresponding $`F_1`$ score. The results concerning $`s2`$ are shown in Fig. 3.
/>
Inter-store evaluation is performed again by randomly selecting a fraction $`p`$ of the trajectories and using them to calibrate the model. The calibrated model is then evaluated on the full trajectory set of the other store.
The $`F_1`$ score obtained by calibrating on $`s1`$ and evaluating on $`s2`$ is $`0.84`$ and the score obtained by calibrating on $`s2`$ and evaluating on $`s1`$ is $`0.87`$.
Analysis of Browsing Behavior
We perform two kinds of analyses of browsing behavior; one on the largest set of trajectories available for $`s1`$ and $`s2`$, and one on a reduced set of $`s1`$ trajectories for which purchasing activity data are available.
Large set analysis, $`s1`$
We analyze $`N_1=8138`$ trajectories for $`s1`$, where $`n_{s1}=19`$ shelves are available. For each trajectory, we build a binary $`n_{s1}`$ dimensional vector, whose entries are 1 if the shopper visited the corresponding shelf, 0 otherwise. By averaging over all trajectories, we obtain an average number of visits per trip of $`2.54`$. See Fig. 4 for shelf visit counts.
Large set analysis $`s2`$
We use the same method to analyze $`N_2=15129`$ trajectories for $`s2`$, where $`n_{s2}=50`$ shelves are available, obtaining an average number of visits per trip of $`2.82`$. See Fig. 5 for shelf visit counts.
/>
/>
Comparison to purchase activity on $`s1`$
For a subset of $`473`$ trajectories of $`s1`$ we have matched the trajectory to the shoppersâ purchase activity (as provided by the retailersâ transaction logs from their point-of-sale system). We can thus compare the average visit vector built from these trajectories with the corresponding average number of purchases per shelf. By taking the ratio between the average purchase per shelf and the average visit per shelf, we can obtain a vector of visit/purchase conversion rates, shown in Fig 8. Although this is a very preliminary analysis, it shows the potential that our approach may have in highlighting how visits to different shelves influence purchases and ultimately provide insight into overall store performance.
/>
Conclusions and Future Works
This paper presented an approach to compute shelf visits from trajectories of shoppers in physical retail stores. The calibration of the model using human-labeled data was discussed and the evaluation of the approach using two different sets of trajectories was presented. The calibrated model was used to analyze browsing patterns and the relation between these patterns and shopper purchases. Though this work was done in convenience stores, the same approach applies to other formats.
Future extensions of our research will involve the refinement of the model to detect more complex shopper behaviors, introduction of machine learning and other techniques to increase accuracy of browsing detection, and more targeted understanding of which products the shopper focused on during shelf browsing through analysis of gaze and head orientation. By honing in on shoppersâ intent, we can make any interventions more targeted and meaningful, including interaction with service robots whose task is to provide useful and practical help to customers .
Additionally, with knowledge of shopping patterns in the store, a robot system could also provide real time recommendations and advertise different product categories to shoppers based on their trajectories and previous shelf-browsing history. Similar to click histories in e-commerce, understanding shopper intent in physical stores is a significant factor in personalization of product recommendations. While this work presupposes a service robot use-case, these same capabilities (both shopper assistance and personalized recommendations) could be surfaced to shoppers through mobile applications and even in-store media displays.
Finally, we can derive insights into how different layouts might influence shopper engagement with products. This capability will allow a data-driven approach to store layout optimization, potentially revolutionizing the way retailers think about product placement and shelf organization to maximize customer interaction and, ultimately, increase revenue.
đ ë ŒëŹž ìê°ìëŁ (Figures)





