Explaining Why Things Go Where They Go: Interpretable Constructs of Human Organizational Preferences
Original Paper Info
- Title: Explaining Why Things Go Where They Go: Interpretable Constructs of Human Organizational Preferences
- ArXiv ID: 2512.24829
- Date: 2025-12-31
- Authors: Emmanuel Fashae, Michael Burke, Leimin Tian, Lingheng Meng, Pamela Carreno-Medrano
Abstract
Robotic systems for household object rearrangement often rely on latent preference models inferred from human demonstrations. While effective at prediction, these models offer limited insight into the interpretable factors that guide human decisions. We introduce an explicit formulation of object arrangement preferences along four interpretable constructs: spatial practicality (putting items where they naturally fit best in the space), habitual convenience (making frequently used items easy to reach), semantic coherence (placing items together if they are used for the same task or are contextually related), and commonsense appropriateness (putting things where people would usually expect to find them). To capture these constructs, we designed and validated a self-report questionnaire through a 63-participant online study. Results confirm the psychological distinctiveness of these constructs and their explanatory power across two scenarios (kitchen and living room). We demonstrate the utility of these constructs by integrating them into a Monte Carlo Tree Search (MCTS) planner and show that when guided by participant-derived preferences, our planner can generate reasonable arrangements that closely align with those generated by participants. This work contributes a compact, interpretable formulation of object arrangement preferences and a demonstration of how it can be operationalized for robot planning.
Summary & Analysis
1. **Interpretable formulation of arrangement preferences:** People decide where to place objects based on spatial practicality, habitual convenience, semantic coherence, and commonsense appropriateness.
2. **A measurement tool for the proposed constructs:** A questionnaire measures how strongly each construct influences decisions and verifies their reliability.
3. **Preferences-aligned arrangement generation:** Using a Monte Carlo Tree Search (MCTS) planner, arrangements are generated that align with human preferences.

This research enables robots to arrange objects in a way similar to humans while also explaining why they make such choices, making the process more transparent and understandable for users.
Full Paper Content (ArXiv Source)
Introduction
Object rearrangement, the problem of organizing items within a space to achieve a desired configuration, is a central challenge for service robots operating in everyday environments. Here, a robot must be capable not only of manipulating objects, but also of deciding where each object should go in a way that aligns with a user's organizational preferences. Human organizational preferences are diverse (e.g., one person may want mugs by the kettle, while another may prefer them in a cabinet), and one-size-fits-all definitions of an acceptable arrangement may fail to account for these differences. For robots to be useful in this context, they must be equipped with object rearrangement models that capture the salient criteria behind these preferences and that can adapt to differences across users and scenes, especially in shared environments.
Prior work on the personalization of object rearrangement has aimed to tailor placements to reflect an individual user's subjective spatial preferences rather than a universal notion of tidiness. Abdo et al. predicted user-specific groupings via collaborative filtering, while other work introduced a framework for learning latent embeddings of tidying style from demonstrations. More recent systems approximate user preferences with zero-shot visual prompting of vision–language models, infer them from prior and current scene context, or actively query users when demonstrations are ambiguous. While these methods move beyond a "one-size-fits-all" approach, they do so by implicitly using latent representations that capture an overall preference signal without revealing the underlying factors that shape it. This makes it difficult either to understand why objects are placed where they are or to tune arrangements according to specific priorities (e.g., convenience over aesthetics) or different scenarios without intensive retraining.
To address these limitations, we propose grounding personalized object rearrangement in interpretable constructs that reflect how people organize their environments, while remaining adaptable to variation across users and contexts. Specifically, we formulate a compact representation of human organizational preferences in terms of four constructs: spatial practicality, habitual convenience, semantic coherence, and commonsense appropriateness, and investigate whether these human-aligned constructs are sufficient to explain how people reason about object arrangements in common household spaces. Our work makes three contributions:
- Interpretable formulation of arrangement preferences: We show that the four explicit arrangement constructs (spatial, habitual, semantic, commonsense) capture variation across individuals and scenarios (i.e., kitchen and living room).
- A measurement tool for the proposed constructs: We design and validate a self-report questionnaire that quantifies how strongly each construct influences participants' judgments and establish that the constructs form a reliable and psychologically meaningful basis.
- Preferences-aligned arrangement generation: We formulate cost functions for the constructs and integrate these into a Monte Carlo Tree Search (MCTS) planner for arrangement. This approach produces arrangements that align with human preferences when using participant-derived weights.
Related Work
Most robotic object rearrangement systems optimize for a single, universal definition of what constitutes a "good" organization. In indoor household environments, e.g., kitchens and living rooms, organization is primarily defined at the object and room levels, which are often described in spatial cognition as figural and vista spaces. These methods use visuo-semantic priors and commonsense reasoning to move objects to plausible locations, minimize spatial flow fields, learn arrangement cost functions, or leverage 3D mapping and semantic search. While effective at achieving tidy configurations, these methods cannot account for diverse user-specific organizational styles. In contrast, our work formulates users' object (re)arrangement preferences as a combination of four interpretable constructs, which is flexible and capable of accommodating diverse user preferences.
For personalized rearrangement, Abdo et al. used collaborative filtering to model co-occurrence patterns of object groupings, but this assumes that a fixed organizational schema is given a priori and thus captures statistical regularities without explaining the underlying rationale. Other approaches to personalized rearrangement extract latent "tidying styles" from user-arranged scenes, use large language models to summarize examples into rules, infer preferred placements from partial arrangements, employ zero-shot vision–language models, or actively query users to resolve ambiguities. These advances enable personalization and achieve good predictive performance, but rely on implicit representations that hide the principles guiding the generated arrangements.
This lack of interpretability limits practical adoption. Reviews in human–robot interaction (HRI) and explainable robotics emphasize that users (especially in personal spaces) benefit from explanations that communicate a robot's goals and reasoning in human terms, rather than abstract model outputs. People prefer robots whose actions are legible and explainable. Both robotics research and broader AI contexts increasingly recognize that inherently interpretable models are preferable to black-box systems requiring post-hoc explanation, particularly when trust and transparency affect adoption. We address these drawbacks by explicitly formulating arrangement preferences along four interpretable constructs. This design provides two key benefits. First, it enables transparent characterization of individual and group organizational styles within a unified framework. Second, it provides a foundation for robots that can personalize behavior and communicate reasoning using simple, understandable terms.
Methodology
To address the lack of interpretable constructs in current robotic object rearrangement research, we propose four constructs motivated by human organizational reasoning, as detailed in Sec. 3.1. We validate the proposed constructs with a user study detailed in Sec. 3.2, and demonstrate how they can be used for computational generation of human-like arrangements, as detailed in Sec. 3.3.
Theoretical Motivation
Drawing on psychological research on spatial cognition, ergonomics, and human–environment interaction, together with a review of the robotics literature, we propose four constructs intended to provide comprehensive coverage of human organizational reasoning: spatial practicality, habitual convenience, semantic coherence, and commonsense appropriateness.
Spatial practicality captures how people place items in locations that fit the physical layout of the room and support efficient, physically feasible use of the arranged objects. Because our scenarios involve indoor kitchens and living rooms, we focus on organization at figural and vista spatial scales, that is, object-to-surface relations and within-room layouts, rather than larger environmental navigation scales. At these indoor scales, research on scene grammar shows that people learn regularities about where objects typically appear relative to functional regions and stable anchors (e.g., sinks or stoves), and that violations of these regularities reduce perceived plausibility and can incur measurable processing costs. Contextual cueing studies demonstrate that people implicitly learn recurring spatial configurations and use them to guide expectation and attention during visual search. In robotics, related ideas appear in object-placement systems that evaluate candidate placements using geometric structure and physical feasibility criteria (e.g., support contact and stability), including learning-based placing from 3D point clouds and planners that search for stable poses on available surfaces.
Habitual convenience reflects how people make frequently used items easy to reach. Actions repeated in the same environment become automatic rather than deliberate. Neuroscience research shows that familiar environments trigger these automatic behaviors instead of conscious decision-making. This creates a natural drive to minimize effort for routine tasks by positioning frequently used objects within easy reach. This principle is also used in design guidelines and ergonomic standards, which often recommend placing high-use items in primary reach zones to reduce physical strain. Manufacturing guidelines like 5S apply the same logic, organizing tools by usage frequency to eliminate wasted motion.
Semantic coherence emerges when people place items together if they are used for the same task or are contextually related. People tend to group objects that participate in the same activities because our brains link them through functional relationships. Prior research also shows that humans classify environments primarily by the activities they afford rather than by how things look. As a result, items used together become mentally chunked as units, improving both memory retrieval and search efficiency. This follows associative learning principles under which items that regularly co-occur in our experience become mentally linked and are treated as belonging together by the brain. Recent approaches to object rearrangement in robotics, such as ConSOR and ContextSortLM, exploit this semantic context by grouping objects according to their functional relationships and organizational schemas.
Commonsense appropriateness drives people to put things where others usually expect to find them. Humans rapidly detect when objects are "out of place" because of internalized expectations about what belongs where. These expectations often reflect accumulated wisdom about widely accepted safety, hygiene, and social norms. This construct is compelling because, while aesthetic preferences might vary across cultures, many basic safety and social norms (e.g., placing heavy objects on stable surfaces, placing utensils near where they are used, keeping cleaning chemicals away from food) are more standard. Systems like TIDEE achieve human-like tidying performance precisely by respecting these fundamental normative constraints, demonstrating that commonsense rules can be learned and applied systematically.
User Study
| Construct | Extended Form (3 items) |
|---|---|
| Spatial practicality | I had a clear spot in mind for each item. |
| Spatial practicality | I avoided placements that felt awkward or out of place. |
| Spatial practicality | I tried to place items as close as possible to their ideal spot. |
| Habitual convenience | I placed each item based on my everyday routine. |
| Habitual convenience | I made sure the items I use most often were easier to grab. |
| Habitual convenience | I considered how often I use each item when deciding placement. |
| Semantic coherence | I placed items near each other if they are usually used together. |
| Semantic coherence | I placed items together if they served a similar purpose. |
| Semantic coherence | I avoided grouping items that do not belong near each other. |
| Commonsense appropriateness | I placed items where most people would expect to find them. |
| Commonsense appropriateness | I used what I've generally learned about how rooms are organized. |
| Commonsense appropriateness | I avoided placements that would look messy or unusual to others. |
We conducted an online user study via Qualtrics to validate the proposed constructs. We adopted a within-subjects design in which each participant completed four organization tasks: two in the kitchen and two in the living room (Fig. 1). In each scenario, participants performed Task 1, arranging a set of objects from scratch, and Task 2, rearranging a pre-existing configuration into a layout they found preferable. Two distinct scenarios were used to determine whether the selected constructs generalise across settings, while the task variations were chosen to increase measurement validity.
Participants interacted with pre-rendered household scenes drawn from the Habitat Synthetic Scenes Dataset (HSSD-200D). Each scenario contained a fixed set of objects and receptacles chosen to reflect realistic organization challenges. Participants were tasked with placing each object into one of the available receptacle zones in the scene, using a drag-and-drop interface, to create an arrangement that felt natural and appropriate (see Fig. 1). After completing each task, participants rated their satisfaction with the resulting arrangement on a scale from 0 to 100 (with both pre- and post-ratings collected in Task 2).
Measures were collected on a 5-point Likert scale along the four proposed constructs of organizational preference introduced in Section 3.1. To capture these, participants rated their agreement with three items per construct (12 items total). To minimize potential bias, the constructs themselves were never presented explicitly to participants; instead, items were phrased as natural self-reflection statements (e.g., "I placed each item based on my everyday routine"). Table [tab:principle_items] summarizes the constructs and the corresponding extended-form items.
In addition to these structured ratings, the survey included several open-ended prompts asking participants about what influenced their satisfaction ratings, what additional factors may have shaped their placement decisions, and whether any aspect of the task felt difficult or unnatural. A final prompt invited participants to share any additional reflections about how they organized items across tasks or about the survey in general. These open-ended prompts allowed participants to articulate considerations beyond the four proposed constructs, ensuring that emergent factors could be captured and qualitatively analyzed. Attention checks were also included to maintain engagement and detect poor quality submissions.
We recruited a total of $`N=63`$ participants through the Prolific online crowd-sourcing platform and institutional networks. Participants were required to be at least 18 years old and proficient in English. Participants had a mean age of $`M=32`$ years ($`SD=13`$), spanning the 18–65+ range. Recruitment and study procedures were approved by the Monash University Human Research Ethics Committee (ID: 47370) prior to data collection. Participants provided informed consent and received £3 for a median completion time of $`\sim`$20 minutes.
Computational Generation of Human-Aligned Arrangements
The four proposed constructs can be formulated as cost functions within a personalized object rearrangement task. We model personalized object rearrangement as the task of assigning a set of objects $`\mathcal{O}=\{o_i\}_{i=1}^N`$ to a set of receptacles $`\mathcal{R}=\{\rho_j\}_{j=1}^M`$. Arrangements are represented as a set of object–receptacle placements:
\begin{equation}
X = \{ (o_i, \rho_j, v_i) \mid o_i \in \mathcal{O}, \, \rho_j \in \mathcal{R}, \, v_i \in P_j \}, \label{eq:arrangement}
\end{equation}
where $`v_i`$ is the placement position of object $`o_i`$ on receptacle $`\rho_j`$, and $`P_j`$ is the valid placement surface of $`\rho_j`$. Feasible arrangements $`\mathcal{F}`$ must satisfy three conditions: (i) unique assignment: each object is placed exactly once; (ii) surface containment: $`v_i \in P_j`$ for all placements $`(o_i,\rho_j,v_i)`$; and (iii) non-overlap: objects placed on the same receptacle do not intersect in 3D space. Arrangement quality is evaluated through four normalized scoring functions $`\{f_k(X)\}_{k=1}^4`$ with outputs in $`[0,1]`$, corresponding to the constructs introduced in Sec. 3.1. We mathematically instantiate these constructs as follows:
Spatial practicality, where $`v_i`$ is the current placement of object $`o_i`$ and $`v_i^\star`$ is a preferred prior location inferred from demonstrations:
\begin{equation}
f_1(X) = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{1 + \| v_i - v_i^\star \|}.
\end{equation}
Habitual convenience, where $`u_i`$ is the usage frequency of object $`o_i`$, $`u_{\max} = \max_{i=1,\dots,N} u_i`$ is used to normalize usage frequency, and $`\alpha_j \in [0,1]`$ denotes the accessibility of the receptacle $`\rho_j`$ on which $`o_i`$ is placed (higher values indicate greater accessibility):
\begin{equation}
f_2(X) = 1 - \frac{1}{N} \sum_{i=1}^{N} \left( \frac{u_i}{u_{\max}} - \alpha_j \right)^2.
\end{equation}
Semantic coherence, with $`d_{ij} = \| v_i - v_j \|`$ the distance between objects $`o_i`$ and $`o_j`$. Object affinities $`\sigma_{ij} \in [-1,1]`$ are estimated from demonstrations, usage statistics, or semantic knowledge bases:
\begin{equation}
f_3(X) = 1 - \frac{1}{N(N-1)} \sum_{i=1}^{N} \sum_{\substack{j=1 \\ j \neq i}}^{N}
\begin{cases}
\sigma_{ij} \cdot \frac{d_{ij}}{1 + d_{ij}}, & \sigma_{ij} > 0 \\
|\sigma_{ij}| \cdot \left(1 - \frac{d_{ij}}{1 + d_{ij}} \right), & \sigma_{ij} < 0
\end{cases}.
\end{equation}
Commonsense appropriateness, where $`\text{commonsense\_score}(\cdot) \in [0,1]`$ is estimated by querying a language model conditioned on the current arrangement state $`s_t`$ and a JSON description of the objects and receptacles:
\begin{equation}
f_4(X) = \frac{1}{N} \sum_{i=1}^{N} \text{commonsense\_score}(o_i, \rho_j; s_t).
\end{equation}
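As a rough illustration of how the first three scoring functions could be evaluated, the following Python sketch implements $`f_1`$, $`f_2`$, and $`f_3`$ directly from the formulas above. The dictionary-based data layout and the function names are assumptions made for this example and are not the authors' implementation; $`f_4`$ is omitted because it depends on a language-model query.

```python
from math import dist

def f1_spatial(positions, preferred):
    """Mean of 1 / (1 + ||v_i - v_i*||) over all objects."""
    scores = [1.0 / (1.0 + dist(positions[o], preferred[o])) for o in positions]
    return sum(scores) / len(scores)

def f2_habitual(usage, accessibility, assignment):
    """1 - mean squared gap between normalized usage and receptacle accessibility."""
    u_max = max(usage.values())
    gaps = [(usage[o] / u_max - accessibility[assignment[o]]) ** 2 for o in assignment]
    return 1.0 - sum(gaps) / len(gaps)

def f3_semantic(positions, affinity):
    """1 - mean pairwise penalty over ordered pairs (needs at least two objects).

    Related objects (sigma > 0) are penalized for being far apart,
    unrelated objects (sigma < 0) for being close together.
    Pairs missing from `affinity` are treated as sigma = 0 (an assumption).
    """
    objs = list(positions)
    n = len(objs)
    total = 0.0
    for i in objs:
        for j in objs:
            if i == j:
                continue
            sigma = affinity.get((i, j), 0.0)
            d = dist(positions[i], positions[j])
            far = d / (1.0 + d)  # squashes distance into [0, 1)
            if sigma > 0:
                total += sigma * far
            elif sigma < 0:
                total += abs(sigma) * (1.0 - far)
    return 1.0 - total / (n * (n - 1))
```

Each function returns a value in $`[0,1]`$ when usage frequencies are non-negative and affinities lie in $`[-1,1]`$, which is what allows the scores to be combined on a common scale.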
These functions are aggregated into a scalar reward:
\begin{equation}
R(X; \mathbf{w}^{(k)}) = \sum_{m=1}^4 w^{(k)}_m f_m(X), \label{eq:reward}
\end{equation}
where $`\mathbf{w}^{(k)} = \left[ w^{(k)}_1,\dots,w^{(k)}_4 \right] \in [0,1]^4`$ denotes the preference vector of user $`k`$, which captures how strongly the user prioritizes each organizational construct. Given a vector $`\mathbf{w}^{(k)}`$ encoding a user's organizational preferences, the objective is then to find a sequence of placement actions that produces arrangements reflecting these preferences by maximizing the corresponding weighted reward function in Eq. [eq:reward]. Formally, this is formulated as:
\begin{equation}
X^{*(k)} = \arg\max_{X \in \mathcal{F}} R(X; \mathbf{w}^{(k)}). \label{eq:optimization}
\end{equation}
where $`\mathcal{F}`$ denotes the set of all feasible arrangements.
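To make the weighted objective concrete, the sketch below computes the scalar reward and solves the arg max by exhaustive enumeration on a toy problem. It is only meant to illustrate the formulation: arrangements are simplified to object-to-receptacle dictionaries without positions, the `is_feasible` hook and the two toy scoring functions are hypothetical, and enumeration is only viable for very small instances (which is why the paper turns to MCTS).

```python
from itertools import product

def reward(arrangement, weights, scorers):
    """Weighted sum R(X; w) = sum_m w_m * f_m(X)."""
    return sum(w * f(arrangement) for w, f in zip(weights, scorers))

def best_arrangement(objects, receptacles, weights, scorers, is_feasible=lambda x: True):
    """Exhaustive arg max over feasible object-to-receptacle assignments."""
    best, best_r = None, float("-inf")
    for choice in product(receptacles, repeat=len(objects)):
        x = dict(zip(objects, choice))
        if not is_feasible(x):
            continue
        r = reward(x, weights, scorers)
        if r > best_r:
            best, best_r = x, r
    return best, best_r

# Toy scoring functions (hypothetical stand-ins for two of the constructs).
def toy_habitual(x):
    return sum(r == "counter" for r in x.values()) / len(x)

def toy_commonsense(x):
    return 1.0 if x["bleach"] == "cabinet" else 0.0

objects, receptacles = ["mug", "bleach"], ["counter", "cabinet"]
scorers = [toy_habitual, toy_commonsense]
convenience_first, _ = best_arrangement(objects, receptacles, [1.0, 0.0], scorers)
norms_first, _ = best_arrangement(objects, receptacles, [0.2, 1.0], scorers)
```

Changing only the weight vector moves the bleach from the counter to the cabinet, which is the kind of interpretable, per-user adjustment the formulation is designed to support.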
We employ Monte Carlo Tree Search (MCTS) to efficiently explore the combinatorial space of object-to-receptacle assignments. MCTS is well-suited to this domain due to its ability to balance exploration and exploitation in large discrete action spaces, making it an effective method to find high-quality arrangement policies.
We specify a user profile with a ground-truth preference vector $`\mathbf{w}_{\text{gt}}^{(k)} = \left[w_1^{(k)}, w_2^{(k)}, \dots, w_4^{(k)}\right]`$, where each $`w_i^{(k)}`$ denotes the importance assigned to the $`i`$-th organizational principle by user $`k`$, along with construct-specific priors, which are estimated separately and held constant during planning (Sec. 4.3). Given this profile, MCTS constructs a search tree where nodes correspond to partial arrangements $`X_t`$ at time step $`t`$ and edges represent actions of assigning unplaced objects to valid receptacle locations. At each step $`t`$, the admissible action space $`\mathcal{A}(X_t)`$ is state-dependent, consisting of all feasible placements of currently unplaced objects. At time step $`t`$, the tree policy selects an action $`a_t`$ using the Upper Confidence Bound (UCB) criterion:
\begin{equation}
a_t = \arg\max_{a \in \mathcal{A}(X_t)} \left(Q(X_t, a) + c\sqrt{\frac{\ln n(X_t)}{n(X_t, a)}}\right),
\end{equation}
where $`Q(X_t, a)`$ is the empirical action-value estimate computed as the mean return from rollouts initiated with $`(X_t,a)`$ using Eq. [eq:optimization], $`n(X_t)`$ and $`n(X_t,a)`$ are the visit counts for state $`X_t`$ and state–action pair $`(X_t, a)`$, and $`c>0`$ controls the exploration–exploitation trade-off. We set $`c = 1/\sqrt{2}`$, following the UCT analysis, which establishes this value under rewards bounded in $`[0,1]`$. The action selection process terminates once every object has been placed. We set the horizon length to $`T=N`$, i.e., the number of objects, to ensure that different action sequences leading to the same final configuration are equivalent and to prevent degenerate behaviors such as reward hacking through repeated placements.
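The UCB criterion can be sketched in a few lines of Python. The bookkeeping format (a dictionary from action to accumulated return and visit count) and the convention of giving unvisited actions infinite priority are assumptions of this example, not details stated in the text.

```python
from math import log, sqrt

def ucb_select(stats, parent_visits, c=1 / sqrt(2)):
    """Pick the action maximizing Q + c * sqrt(ln n(X_t) / n(X_t, a)).

    stats maps each admissible action to (total_return, visit_count).
    """
    def score(item):
        _, (total, n) = item
        if n == 0:
            return float("inf")  # assumed convention: try every action once first
        return total / n + c * sqrt(log(parent_visits) / n)
    return max(stats.items(), key=score)[0]

# A rarely tried action can outrank a well-explored one with a higher mean.
stats = {"mug->counter": (9.0, 10), "mug->cabinet": (0.5, 1)}
explore = ucb_select(stats, parent_visits=11)
exploit = ucb_select(stats, parent_visits=11, c=0.0)
```

With $`c = 1/\sqrt{2}`$ the exploration bonus favours the action tried only once, whereas setting $`c = 0`$ reduces the rule to picking the highest empirical mean.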
At each node, the best action, i.e., the assignment of an object to a valid receptacle, is determined using
\begin{equation}
a^{*}_{t}(X_t) = \arg\max_{a \in \mathcal{A}(X_t)} \frac{\text{TotalReward}(X_t,a)}{n(X_t,a)},
\end{equation}
where $`\text{TotalReward}(X_t,a)=\sum_{k=1}^{n(X_t,a)}\sum_{t}^{T}R(X_t;w^{(k)})`$ is the accumulated reward over rollouts starting from state-action pair $`(X_t,a)`$.
The resulting sequence of actions defines a trajectory:
\begin{equation}
\pi^{*} = [a^{*}_{1}(X_1), a^{*}_{2}(X_2), \dots, a^{*}_{T}(X_{T})],
\end{equation}
that showcases how arrangements consistent with a given preference profile can be realized.
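The overall planning loop can be sketched as a compact UCT search. This is a simplified illustration under several assumptions that go beyond the text: objects are placed in a fixed order, every receptacle is always admissible (no capacity or overlap checks), arrangements carry no positions, the return of a rollout is the reward of the completed arrangement rather than a sum over steps, and the tree is rebuilt for each decision. All names are hypothetical.

```python
import random
from math import log, sqrt

def mcts_plan(objects, receptacles, reward_fn, iterations=300, c=1 / sqrt(2), seed=0):
    """Place objects one at a time, running a fresh tree search for each decision."""
    rng = random.Random(seed)
    plan = {}
    for obj in objects:
        plan[obj] = _search(plan, objects, receptacles, reward_fn, iterations, c, rng)
    return plan

def _search(prefix, objects, receptacles, reward_fn, iterations, c, rng):
    # stats[state][action] = [total_return, visits]; a state is a tuple of choices so far.
    stats = {}
    root = tuple(prefix[o] for o in objects[:len(prefix)])
    for _ in range(iterations):
        state, path = root, []
        # Selection / expansion: follow UCB until an untried action or a full arrangement.
        while len(state) < len(objects):
            node = stats.setdefault(state, {r: [0.0, 0] for r in receptacles})
            untried = [r for r, (_, n) in node.items() if n == 0]
            if untried:
                action = rng.choice(untried)
            else:
                n_parent = sum(n for _, n in node.values())
                action = max(node, key=lambda r: node[r][0] / node[r][1]
                             + c * sqrt(log(n_parent) / node[r][1]))
            path.append((state, action))
            state += (action,)
            if untried:
                break
        # Rollout: complete the arrangement with uniformly random placements.
        while len(state) < len(objects):
            state += (rng.choice(receptacles),)
        ret = reward_fn(dict(zip(objects, state)))
        # Backpropagation along the visited state-action pairs.
        for s, a in path:
            stats[s][a][0] += ret
            stats[s][a][1] += 1
    root_stats = stats[root]
    # Final choice: highest mean return, mirroring TotalReward / n.
    return max(root_stats, key=lambda r: root_stats[r][0] / max(root_stats[r][1], 1))

# Toy reward in [0, 1]: fraction of objects at a hypothetical preferred receptacle.
target = {"mug": "counter", "plate": "shelf", "bleach": "cabinet"}
def toy_reward(x):
    return sum(x[o] == target[o] for o in x) / len(x)

plan = mcts_plan(list(target), ["counter", "cabinet", "shelf"], toy_reward)
```

In a fuller implementation, `reward_fn` would be the weighted reward over the four construct scores and the action set would be filtered by the feasibility constraints.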
Results and Discussion
We structure our results to answer complementary questions about the proposed constructs and their roles in explaining arrangement preferences. First, Sec. 4.1 validates the questionnaire and examines whether participants' responses organize into coherent factors aligned with the four constructs. Next, Sec. 4.2 tests whether variation in these construct ratings is reflected in participants' reported satisfaction with arrangements, establishing their explanatory value. Finally, Sec. 4.3 shows how participant-derived construct weights can be operationalized within our planning framework to generate arrangements that better align with human preferences. All statistical analyses (factor analyses, regressions, and nonparametric tests) were conducted in jamovi (Version 2.7) and R (Version 4.5), with regression models estimated using GAMLj.
Psychometric Validation of Questionnaire
To assess whether the questionnaire provides a reliable basis for measuring the four proposed constructs, we conducted an Exploratory Factor Analysis (EFA) on the 12 Likert items (three per construct) in Table [tab:principle_items]. Responses from all task–scene combinations were included, yielding four observations per participant. The analysis used minimum residual extraction with oblimin rotation, which is recommended when psychological constructs are expected to correlate rather than be strictly orthogonal. This choice was also consistent with the observed inter-factor correlations in our data, which fell in the moderate range ($`r = 0.30 \text{ to } 0.48`$). Data suitability checks were conducted before the EFA: Bartlett's test of sphericity ($`\chi^2(66) = 660`$, $`p < 0.001`$) indicated that the variables were significantly correlated and suitable for EFA, and the Kaiser–Meyer–Olkin (KMO) measure of sampling adequacy, with a value of 0.80 (0.5 is considered the minimum acceptable threshold), also supported the use of EFA.
Factor retention was guided by parallel analysis and inspection of the scree plot, both supporting a four-factor solution. Together, these factors explained 61% of the variance in participants' item responses. The factor loadings broadly aligned with the hypothesized structure: semantic items loaded strongly on a single factor ($`0.70`$–$`0.92`$), spatial items clustered together ($`0.55`$–$`0.78`$), and habitual ($`0.62`$–$`0.81`$) and commonsense ($`0.34`$–$`0.93`$) items generally grouped as expected, though with greater variability. Higher loading values indicate a stronger correspondence between an item and its intended underlying construct and can be taken as evidence that the questionnaire items functioned as intended.
The EFA also revealed two instances in which an item showed significant factor loadings on more than one construct. First, one habitual item loaded on the spatial factor ($`0.62`$). Given the substantial inter-factor correlation between the spatial and habitual factors ($`r = 0.40`$), this pattern is interpretable as conceptual overlap, that is, routine-driven placement decisions often involve some notion of spatial reasoning (e.g., people may habitually store coffee mugs near the kettle, a choice that is both convenient for daily use and spatially logical relative to the appliance). Second, one commonsense item showed near-equal loadings on both the spatial ($`0.35`$) and commonsense ($`0.34`$) factors. This cross-loading may reflect ambiguity in item wording but is also consistent with the observed correlation between the spatial and commonsense factors ($`r = 0.40`$), suggesting these constructs are likely related. Internal consistency was acceptable to excellent across the four scales (Cronbach's $`\alpha`$: Spatial = 0.72, Habitual = 0.73, Semantic = 0.87, Commonsense = 0.75), indicating that items within each scale exhibited correlated response patterns that reliably measured the same underlying construct.
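For readers unfamiliar with the internal-consistency statistic, Cronbach's $`\alpha`$ for a scale of $`k`$ items is $`\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i s_i^2}{s_T^2}\right)`$, where $`s_i^2`$ is the variance of item $`i`$ and $`s_T^2`$ is the variance of the summed scale score. The following Python sketch computes it with the standard library; the responses shown are made up for illustration and are not the study data.

```python
from statistics import variance

def cronbach_alpha(items):
    """Cronbach's alpha for a scale.

    items: one list of responses per item, all of equal length
    (entry r of every list belongs to the same respondent).
    """
    k = len(items)
    totals = [sum(responses) for responses in zip(*items)]  # per-respondent scale score
    item_variance_sum = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))

# Hypothetical 5-point Likert responses for a three-item scale, six respondents.
scale_items = [
    [4, 5, 3, 4, 2, 5],
    [4, 4, 3, 5, 2, 5],
    [5, 5, 2, 4, 3, 4],
]
alpha = cronbach_alpha(scale_items)
```

When all items move together perfectly the statistic equals 1, and it decreases as the items become less correlated.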
Overall, the EFA successfully validated the four constructs introduced in Sec. 3.1. Items designed for each construct clustered together as hypothesized, and they explained a substantial proportion of the variance in ratings. However, the observed cross-loadings suggest that participants' reasoning about organizational constructs is intertwined. For example, storing mugs near a kettle reflects both habitual convenience and spatial logic, while placing heavy items low involves both commonsense safety and spatial coherence.
Empirical Validation of Proposed Constructs
We assumed that organizational preferences can be represented by four shared constructs and that individuals differ in how they prioritize them. To evaluate this claim, we tested (i) whether variation along these constructs is associated with participants' reasoning about arrangements, and (ii) whether participants' arrangements exhibit heterogeneity consistent with individualized preferences.
| Factor | $`\hat{\beta}`$ | SE | OR [95% CI] | $`z`$ | $`p`$ | Marginal $`R^2`$ | Conditional $`R^2`$ | ICC |
|---|---|---|---|---|---|---|---|---|
| Spatial | 1.10 | 0.30 | 3.01 [1.67, 5.44] | 3.65 | $`<`$0.001 | 0.20 | 0.73 | 0.66 |
| Habitual | 0.64 | 0.25 | 1.90 [1.17, 3.09] | 2.59 | 0.010 | 0.12 | 0.70 | 0.66 |
| Semantic | 1.13 | 0.37 | 3.10 [1.49, 6.45] | 3.03 | 0.002 | 0.19 | 0.77 | 0.71 |
| Commonsense | 0.48 | 0.30 | 1.62 [0.91, 2.90] | 1.64 | 0.102 | 0.10 | 0.71 | 0.67 |
| Construct | Example open codes (satisfaction / placement) | Representative quotes |
|---|---|---|
| Spatial | access_reach, workflow_proximity, save_space, design_affordance / access_reach, design_affordance, exit_location, proximity_task | (S) "Mix of practicality, safety, and how often I use them." (P) "I would place items where they would be easiest to reach and most convenient for the task." |
| Habitual | habit_schema, freq_use, label_confusion / habit_schema, freq_use, less_used_far, memory_findability | (S) "Initial arrangement non-functional … reorganized based on how I function in my kitchen." (P) "The less often used items can be stored further away … the controller can be stored on the fireplace mantle." |
| Semantic | semantic_grouping, context_unknown, clutter_risk / semantic_grouping, canonical | (S) "A few items don't fit this room overall." (P) "Things like spoons in a block naturally go on the top … useful to have basics out to grab while cooking." |
| Commonsense | perishability, hygiene_safety, workflow_proximity / perishability, hygiene_safety, safety | (S) "Worked well – keeping the kitchen island clear … toddler … prefer drawers." (P) "I also make sure that perishable items are either in the fridge or in a cupboard (not on the counter where they'll spoil)." |
| Emergent | aesthetics, clutter_risk, personal_constraint, label_confusion / aesthetics, social_others, temporary_use, personal_constraint | (S) "Spice jars away from stove didn't work; prefer drawer for ease (height)." (P) "Because I saw many of the items as ones I would not store in the living room … I placed them on the coffee table." |
Shared Dimensions as a Basis for Assessing Arrangements
We performed a regression analysis to examine whether the four proposed constructs were associated with participants' satisfaction ratings for each arrangement task. Specifically, we tested whether variations in these dimensions' ratings are statistically associated with differences in reported satisfaction. We also analyzed participants' responses to open-ended questions to check alignment with participants' stated reasoning and to identify any considerations outside our hypothesized set.
Regression Analysis: We fit Generalized Linear Mixed Models (GLMMs) for ordinal outcomes, using proportional odds models with satisfaction ratings recoded into three categories (low, medium, and high) via quantile binning at the 33rd and 66th percentiles to mitigate skew in the continuous scale. The aim of this step was not to claim that "more of a given principle always yields higher satisfaction" in any universal sense, but rather to assess whether variation along these dimensions was statistically associated with satisfaction.
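The recoding step can be sketched as follows. The choice of the inclusive percentile method and the handling of ratings that fall exactly on a cut point (assigned to the lower category) are assumptions of this example; the text does not specify them.

```python
from statistics import quantiles

def bin_satisfaction(ratings):
    """Recode 0-100 ratings as 'low' / 'medium' / 'high' at the 33rd and 66th percentiles."""
    cuts = quantiles(ratings, n=100, method="inclusive")  # 99 percentile cut points
    lo, hi = cuts[32], cuts[65]  # 33rd and 66th percentiles
    return ["low" if r <= lo else "medium" if r <= hi else "high" for r in ratings]

# Hypothetical ratings, evenly spread for illustration.
labels = bin_satisfaction(list(range(0, 101, 10)))
```

The resulting three-level ordinal variable is the outcome $`Y`$ used in the proportional-odds models that follow.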
For each latent construct $`Z \in \{\text{Spatial, Habitual, Semantic, Commonsense}\}`$, we estimated a proportional-odds mixed model of the form:
\begin{equation}
\begin{split}
\text{logit}\!\left(P(Y_{ij} \leq k)\right)
&= \theta_k
- \big( \beta_1 Z_{ij} + \beta_2 \,\text{room/task}_{ij} \\
& \quad + \beta_3 \,(Z_{ij} \times \text{room/task}_{ij}) + u_i \big),
\end{split}
\label{eq:glmm_model}
\end{equation}
where $`Y_{ij}`$ is the binned satisfaction rating for participant $`i`$ on observation $`j`$; $`k \in \{1,2\}`$ indexes the ordinal thresholds, with $`\theta_1`$ separating Low from Medium/High and $`\theta_2`$ separating Low/Medium from High; $`Z_{ij}`$ is the construct predictor; $`\text{room/task}_{ij}`$ is a fixed effect for scene $`\times`$ task; $`\beta_1`$, $`\beta_2`$, and $`\beta_3`$ are the coefficients of the construct, the room/task effect, and their interaction; and $`u_i`$ is a random intercept for participant $`i`$ ($`u_i \sim \mathcal{N}(0,\sigma^2)`$).
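To make the model form concrete, the sketch below turns thresholds and a linear predictor into probabilities for the three satisfaction categories. The threshold values, the 0/1 coding of the room/task effect, and the zeroed $`\beta_2`$, $`\beta_3`$, and $`u_i`$ are hypothetical choices for illustration only; they are not fitted estimates.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def linear_predictor(z, room_task, beta1, beta2, beta3, u_i=0.0):
    """Term subtracted from each threshold in the proportional-odds model."""
    return beta1 * z + beta2 * room_task + beta3 * z * room_task + u_i

def category_probs(thetas, eta):
    """Return [P(Low), P(Medium), P(High)] from logit P(Y <= k) = theta_k - eta."""
    cumulative = [sigmoid(theta - eta) for theta in thetas] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs

THETAS = (-1.0, 1.0)  # hypothetical thresholds theta_1, theta_2
BETA1 = 1.10          # illustrative construct coefficient

for z in (0.0, 1.0):
    eta = linear_predictor(z, room_task=0, beta1=BETA1, beta2=0.0, beta3=0.0)
    print(z, [round(p, 3) for p in category_probs(THETAS, eta)])
```

Under this parameterization, raising $`Z_{ij}`$ by one unit multiplies the odds of exceeding any threshold by $`e^{\beta_1}`$, which is why a single coefficient per construct can be read as an odds ratio.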
Given the partial cross-loadings and moderate correlations observed in the EFA (Sec. 4.1), we fit a separate model for each construct to keep the coefficients interpretable. Results (Table [tab:regression_results]) show that three of the four hypothesized constructs were significantly associated with satisfaction. Spatial ratings were the strongest predictor ($`\hat{\beta}=1.10`$, OR = 3.01, 95% CI [1.67, 5.44], $`p<0.001`$), indicating that a 1-standard-deviation increase in spatial alignment was associated with roughly tripled odds of reporting higher satisfaction. Habitual ratings were also predictive ($`\hat{\beta}=0.64`$, OR = 1.90, 95% CI [1.17, 3.09], $`p=0.010`$), though with a smaller effect size. Semantic ratings had an effect magnitude similar to Spatial ($`\hat{\beta}=1.13`$, OR = 3.10, 95% CI [1.49, 6.45], $`p=0.002`$). Commonsense ratings showed a positive trend but were not statistically significant as a unique predictor ($`\hat{\beta}=0.48`$, OR = 1.62, 95% CI [0.91, 2.90], $`p=0.102`$). All models converged successfully with an acceptable fit. Marginal $`R^2`$ values (0.10–0.20) indicated that fixed effects explained modest variance, while high conditional $`R^2`$ values (0.70–0.77) and ICCs (0.66–0.71) indicate substantial between-participant variation in satisfaction ratings.
Given the moderate inter-factor correlations and cross-loading patterns observed in the EFA (Sec. 4.1), the non-significant commonsense coefficient is consistent with the fact that commonsense appropriateness is often applied alongside other reasoning modes in everyday organization. Normative judgments about what is safe, hygienic, or socially appropriate frequently co-occur with spatial feasibility (e.g., reachable and stable placements), habitual accessibility, or semantic grouping, so their explanatory variance is shared. As a consequence, the GLMM coefficient for commonsense can be attenuated even when commonsense reasoning is active.
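As a reading aid, the reported quantities can be related to one another with a short sketch: the odds ratio is $`e^{\hat{\beta}}`$, and, assuming Wald-type intervals (an assumption; the text does not state how the intervals were computed), a standard error and an approximate two-sided p-value can be backed out of each confidence interval. The function name and dictionary layout are hypothetical.

```python
import math

# (beta_hat, reported OR, CI low, CI high) as reported in the text.
REPORTED = {
    "Spatial":     (1.10, 3.01, 1.67, 5.44),
    "Habitual":    (0.64, 1.90, 1.17, 3.09),
    "Semantic":    (1.13, 3.10, 1.49, 6.45),
    "Commonsense": (0.48, 1.62, 0.91, 2.90),
}

def wald_summary(beta, ci_low, ci_high, z_crit=1.959964):
    """Odds ratio, implied SE, z, and two-sided p, assuming a Wald 95% CI on the log-odds scale."""
    odds_ratio = math.exp(beta)
    se = (math.log(ci_high) - math.log(ci_low)) / (2.0 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return odds_ratio, se, z, p

for name, (beta, _, lo, hi) in REPORTED.items():
    odds_ratio, se, z, p = wald_summary(beta, lo, hi)
    print(f"{name:12s} OR={odds_ratio:.2f} SE={se:.3f} z={z:.2f} p={p:.4f}")
```

Small differences from the reported odds ratios are expected because the coefficients are rounded to two decimals before exponentiation.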
Overall, these results suggest that satisfaction judgments varied systematically with spatial, habitual, and semantic principles, while commonsense expectations played a more context-dependent role and were less influential as independent predictors. These differences in predictive strength and substantial participant-level variance support modeling preferences as personalized weightings over a shared basis of latent constructs.
Qualitative Analysis of Reasoning: We analyzed the free-text responses from participants who provided reasoning for their arrangement decisions. Participants explained both their satisfaction ratings and placement considerations across room-task contexts, yielding 118 satisfaction reasoning responses and 47 placement consideration responses. We employed a two-stage inductive-deductive coding procedure to analyze participants' responses. First, responses were coded openly using thematic analysis. Second, codes were mapped to our four constructs, with unmapped codes retained as emergent categories. This approach allowed us to confirm whether the hypothesized constructs spontaneously emerged in participants' reasoning, as well as to identify additional themes as potential extensions for future modeling. Table [tab:qual_examples_combined] shows this mapping with open codes and illustrative participant quotes.
Spatial considerations dominated both satisfaction reasoning (57%) and placement considerations (60%). Habitual factors appeared consistently (31% and 30%, respectively), while Semantic reasoning was more prominent in satisfaction judgments (27%) than in placement decisions (4%). Commonsense appeared infrequently (7% and 15%), typically combined with other principles rather than independently. These patterns broadly mirror our quantitative findings, where spatial and habitual were the strongest predictors, semantic played a secondary role, and commonsense showed context-dependent effects. Constructs frequently co-occurred rather than appearing in isolation. For instance, spatial reasoning was commonly paired with habitual (17 satisfaction; 10 placement responses) and semantic considerations (15 satisfaction responses). Commonsense rarely appeared independently, instead coupling with spatial or habitual factors. This qualitative pattern is consistent with a "filtering" role for commonsense: normative constraints (e.g., safety or social norms) can rule out otherwise plausible placements, while the remaining variation in satisfaction is more strongly explained by spatial, habitual, and semantic considerations.
Emergent themes appeared in 21–32% of responses, including design affordances (missing hooks, outlets), label confusion, context uncertainty, aesthetics, social influences, and personal constraints. Many emergent themes represent refinements of core constructs. For instance, design affordances and context uncertainty relate to spatial practicality, while personal constraints like reachability align with habitual convenience. However, themes like aesthetics and social influences extend beyond our framework, suggesting other potential constructs to explore in future work.
Overall, the qualitative analysis strongly supports our hypothesized constructs. Spatial and habitual reasoning dominated participants' explanations, semantic coherence appeared as a consistent secondary factor, and commonsense contributed primarily through combinations with other principles. The four constructs provide a parsimonious foundation for modeling arrangement preferences, capturing stable organizational logic. Emergent themes highlight situational variations that could inform future extensions.
Behavioral Heterogeneity in Organization
We further hypothesize that participants make individualized placement decisions, reflecting distinct trade-offs in how different organizational considerations are prioritized. To test this, we analyzed (i) the similarity of participants' final layouts and (ii) the relative importance they assigned to the four hypothesized principles. For each scenario $`c \in \left\{ (a, b) \mid a\in \left\{ \text{Kitchen}, \text{Living}\right\},\; b\in \left\{ \text{Task 1}, \text{Task 2}\right\} \right\}`$ (see Sec. 3.2), we represent a participant's arrangement as a set of object-receptacle assignments $`S_p = \{(o,\rho)\}`$, which is a simplified version of Eq. [eq:arrangement], where $`o`$ denotes an object, $`\rho`$ is a receptacle, and $`p`$ indexes a participant. Similarity between any two arrangements $`S_p`$ and $`S_q`$ was quantified using the Jaccard similarity index.
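The Jaccard index on this representation is the number of shared object-receptacle pairs divided by the number of distinct pairs across both arrangements. A minimal sketch follows; the object and receptacle names are hypothetical and not taken from the study.

```python
def jaccard(s_p, s_q):
    """Jaccard similarity |S_p & S_q| / |S_p | S_q| between two sets of (object, receptacle) pairs."""
    s_p, s_q = set(s_p), set(s_q)
    union = s_p | s_q
    if not union:
        return 1.0  # convention chosen here: two empty arrangements are identical
    return len(s_p & s_q) / len(union)

# Hypothetical arrangements of the same three objects by two participants.
S_p = {("mug", "cabinet"), ("knife", "drawer"), ("apple", "fridge")}
S_q = {("mug", "cabinet"), ("knife", "counter"), ("apple", "fridge")}
print(jaccard(S_p, S_q))  # 2 shared pairs out of 4 distinct pairs
```

Note that a single disagreement removes one pair from the intersection and adds one to the union, so the index drops quickly even when most objects are placed identically.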
For each scenario, we computed pairwise Jaccard similarities across participants' arrangements and reported the mean values with bootstrapped 95% confidence intervals. Similarity was consistently low overall ($`M = 0.27`$, 95% CI [0.26, 0.28]), indicating substantial variation in how participants organized the same objects. Kitchen scenarios showed modestly higher similarity ($`M = 0.33`$, 95% CI [0.32, 0.34]) compared to living room scenarios ($`M = 0.22`$, 95% CI [0.21, 0.23]). This difference likely reflects stronger functional constraints in kitchens, where established conventions dictate logical placements, e.g., placing cooking utensils near the stove or storing dishes near the sink. Living rooms, by contrast, offer greater flexibility in object arrangement, as items like books, decorations, or electronics can be placed in multiple locations without violating clear functional principles. Task type had minimal impact: whether participants arranged objects from scratch (Task 1) or modified an existing layout (Task 2) yielded similar agreement levels within each scenario.
We further examined whether participants differed in the importance they assigned to the four constructs using a repeated-measures Friedman test on participants' average construct ratings. The Friedman test revealed significant overall differences in ratings ($`\chi^2(3)=154`$, $`p<0.001`$). Post-hoc Durbin-Conover comparisons (Fig. 2) indicated that Spatial and Habitual were both rated significantly higher than Semantic and Commonsense ($`p<0.001`$ in all cases). Spatial and Habitual also differed slightly ($`p{=}0.015`$), while Semantic and Commonsense did not ($`p{=}0.876`$). Both analyses point to strong behavioral heterogeneity. Participant placements showed little similarity, and their construct ratings revealed distinct trade-off patterns: Spatial and Habitual were prioritized, while Semantic and Commonsense were treated as secondary. These results confirm that organizational choices are individual rather than following a fixed canonical template, and that modeling must accommodate user-specific priorities over different arrangement principles.
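The aggregation described above can be sketched as follows. The resampling unit is an assumption: this sketch applies a percentile bootstrap to the pairwise similarity values themselves, whereas the text does not state whether pairs or participants were resampled. All object, receptacle, and function names are hypothetical.

```python
import random
from itertools import combinations

def jaccard(s_p, s_q):
    union = s_p | s_q
    return len(s_p & s_q) / len(union) if union else 1.0

def pairwise_similarities(arrangements):
    """Jaccard similarity for every unordered pair of participants."""
    return [jaccard(a, b) for a, b in combinations(arrangements, 2)]

def bootstrap_mean_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Mean with a percentile-bootstrap CI obtained by resampling the values (assumed scheme)."""
    rng = random.Random(seed)
    n = len(values)
    means = sorted(sum(rng.choices(values, k=n)) / n for _ in range(n_boot))
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return sum(values) / n, lo, hi

# Three hypothetical participants arranging the same three objects.
arrangements = [
    {("mug", "cabinet"), ("knife", "drawer"), ("apple", "fridge")},
    {("mug", "cabinet"), ("knife", "counter"), ("apple", "fridge")},
    {("mug", "counter"), ("knife", "drawer"), ("apple", "fridge")},
]
sims = pairwise_similarities(arrangements)
mean_sim, ci_low, ci_high = bootstrap_mean_ci(sims)
print(sims, round(mean_sim, 2))
```

Pairwise similarities that share a participant are not independent, so resampling participants rather than pairs may be the more conservative choice in practice.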
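For readers unfamiliar with the Friedman test, the statistic ranks the four construct ratings within each participant and compares the rank sums across constructs; with $`k=4`$ constructs it is referred to a $`\chi^2`$ distribution with $`k-1=3`$ degrees of freedom. The sketch below computes the uncorrected statistic on hypothetical ratings. It omits the tie correction that statistical packages such as `scipy.stats.friedmanchisquare` apply, and it is not the authors' analysis code.

```python
def average_ranks(row):
    """Rank values within one participant (1 = lowest), averaging ranks over ties."""
    order = sorted(range(len(row)), key=lambda i: row[i])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for t in range(i, j + 1):
            ranks[order[t]] = avg
        i = j + 1
    return ranks

def friedman_statistic(ratings):
    """Uncorrected Friedman chi-square for n participants x k conditions."""
    n, k = len(ratings), len(ratings[0])
    rank_sums = [0.0] * k
    for row in ratings:
        for c, r in enumerate(average_ranks(row)):
            rank_sums[c] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)

# Hypothetical mean ratings per participant: (Spatial, Habitual, Semantic, Commonsense).
ratings = [
    [5, 4, 2, 1],
    [4, 5, 1, 2],
    [5, 3, 2, 1],
]
stat = friedman_statistic(ratings)
print(round(stat, 2))
```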
Preference-Aligned Trajectory Generation
Given a participant's preference vector $`w^{(k)}`$ (see Eq. 2), we generated object arrangements by optimizing the weighted sum of four construct-specific scores via MCTS planning (see Sec. [sec:mcts]). To ensure a fair evaluation, all construct parameters and priors were estimated exclusively from Task 1 participant data. Specifically, spatial priors, receptacle accessibility scores, usage frequencies, and object correlations were estimated from Task 1 placements. Commonsense object-receptacle priors were obtained by querying a large language model (GPT-4) with a structured textual representation (JSON graph) of the scene context and candidate object-receptacle pairs. Preference weights were extracted from questionnaire responses by mapping Likert ratings to numerical scores, averaging items within each construct, and normalizing to yield a personalized weight vector. Generated arrangements were compared against Task 2 participant placements using object accuracy, defined as the proportion of objects placed in the same receptacle as a participant.
We selected four representative participant profiles spanning distinct weighting patterns, i.e., spatial-dominant, balanced, habitual-dominant, and semantic-dominant, across the two scenes to evaluate the planner. Table 1 summarizes key characteristics of these profiles, including the scenes, weights, and resulting accuracies (see Appendix B for per-object placement details).
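The weight extraction and the evaluation metric described above can be sketched as follows. The 5-point label set, the sum-to-one normalization, and all item responses and object names are assumptions for illustration; the text specifies only that Likert ratings are mapped to numbers, averaged per construct, and normalized.

```python
# Assumed 5-point Likert coding (the questionnaire's actual scale is not given here).
LIKERT = {
    "Strongly disagree": 1, "Disagree": 2, "Neutral": 3,
    "Agree": 4, "Strongly agree": 5,
}

def preference_weights(responses):
    """Map Likert labels to scores, average per construct, and normalize to sum to 1."""
    means = {c: sum(LIKERT[x] for x in items) / len(items)
             for c, items in responses.items()}
    total = sum(means.values())
    return {c: m / total for c, m in means.items()}

def object_accuracy(generated, reference):
    """Proportion of objects placed in the same receptacle as the participant."""
    matches = sum(1 for obj, rec in reference.items() if generated.get(obj) == rec)
    return matches / len(reference)

# Hypothetical questionnaire responses for one participant.
responses = {
    "spatial":     ["Strongly agree", "Agree"],
    "habitual":    ["Agree", "Agree"],
    "semantic":    ["Neutral", "Disagree"],
    "commonsense": ["Neutral", "Neutral"],
}
weights = preference_weights(responses)

# Hypothetical planner output versus the participant's Task 2 placements.
generated = {"mug": "cabinet", "knife": "drawer", "apple": "counter",
             "remote": "shelf", "book": "shelf"}
reference = {"mug": "cabinet", "knife": "drawer", "apple": "fridge",
             "remote": "coffee table", "book": "shelf"}
print(weights, object_accuracy(generated, reference))
```

Object accuracy is an exact-match metric, so a placement in a functionally similar receptacle (for example, an adjacent shelf) counts as an error.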
| Pattern | ID | Scene | Weights (S, H, Se, C) | Accuracy |
|---|---|---|---|---|
| Spa.-dom. | P23 | Living R. | [0.37, 0.29, 0.17, 0.17] | 0.60 |
| Balanced | P32 | Living R. | [0.25, 0.25, 0.25, 0.25] | 0.40 |
| Hab.-dom. | P24 | Kitchen | [0.34, 0.40, 0.18, 0.08] | 0.80 |
| Sem.-dom. | P16 | Kitchen | [0.26, 0.21, 0.30, 0.23] | 0.90 |
Representative participant profiles used for trajectory generation. Weights are normalized across spatial (S), habitual (H), semantic (Se), and commonsense (C).
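To illustrate how a weight profile shifts placements, the sketch below scores candidate receptacles for a single object as a weighted sum of four construct scores and picks the best one. This is a one-step greedy choice, not the MCTS planner, and the construct scores are hypothetical values in [0, 1] where higher means better; only the weight vectors are taken from Table 1.

```python
# Weight vectors (S, H, Se, C) from Table 1.
PROFILES = {
    "P23": (0.37, 0.29, 0.17, 0.17),  # spatial-dominant
    "P32": (0.25, 0.25, 0.25, 0.25),  # balanced
    "P24": (0.34, 0.40, 0.18, 0.08),  # habitual-dominant
    "P16": (0.26, 0.21, 0.30, 0.23),  # semantic-dominant
}

def weighted_score(weights, scores):
    """Weighted sum of the four construct-specific scores."""
    return sum(w * s for w, s in zip(weights, scores))

def best_receptacle(weights, candidates):
    """Greedy one-object choice: the receptacle with the highest weighted score."""
    return max(candidates, key=lambda r: weighted_score(weights, candidates[r]))

# Hypothetical (S, H, Se, C) scores for placing a spice jar.
spice_jar = {
    "drawer": (0.5, 0.9, 0.4, 0.6),  # easy to reach, weaker grouping
    "shelf":  (0.7, 0.3, 0.9, 0.7),  # sits with related items, harder to reach
}

for pid in ("P24", "P16"):
    print(pid, best_receptacle(PROFILES[pid], spice_jar))
```

With these illustrative scores, the habitual-dominant profile selects the drawer and the semantic-dominant profile selects the shelf, which mirrors the kind of profile-specific divergence discussed for the generated arrangements.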
Spatial-dominant (Living Room, P23). With spatial practicality carrying the most weight, the planner emphasized placing items where they "naturally fit" into the room's layout. This produced successes that aligned with furniture affordances. At the same time, the weaker semantic and commonsense weights meant the system tolerated oddities. These are not random errors but the trade-off of prioritizing layout coherence above category or normative consistency. The overall signature is a room that looks spatially coherent, but with some object groupings that are counterintuitive.
Habitual-dominant (Kitchen, P24). Strong habitual weighting drove the planner to place frequently used objects into the most accessible receptacles. This approach yielded high alignment with the participant's placements. Discrepancies arose primarily when the participant's storage preferences diverged from the broader usage patterns. The low commonsense weight prevented the planner from correcting toward more typical placements, while the habitual component, which models receptacle accessibility based on usage frequency, could not account for these participant-specific associations. This shows how habitual bias produces strong routine fidelity but also exposes the limits of our current modeling when participants deviate from normative patterns.
Semantic-dominant (Kitchen, P16). Semantic grouping dominated this profile, with commonsense moderately supporting it. The planner produced category-faithful groupings that align closely with human expectations and the Task 2 ground truth. Errors arose when semantic links to storage locations were weak and commonsense weighting failed to compensate, leading to atypical placements a human would likely avoid. The high accuracy (0.90) demonstrates that semantic bias reliably produces human-like groupings, but is dependent on the completeness of semantic priors.
Balanced (Living Room, P32). This participant reported nearly uniform weights, leaving the planner without a dominant construct to guide the search. While some placements were still correct, other items drifted. Because no construct provided strong direction, the planner explored many near-equal options, producing an arrangement that was acceptable but not tightly structured. The lower accuracy (0.40) may reflect limited guidance from a flat profile and the use of estimated hyperparameters and coarse cost-term definitions, which together can mask subtle behavioral biases.
These examples show that the generated arrangements were reasonably accurate overall, with accuracies ranging from 0.40 to 0.90 depending on the participant's weight profile (mean $`\approx 0.68 \pm 0.20`$). Notably, we observed that the kitchen scene yielded higher accuracies than the living room. This likely reflects that kitchens contain stronger habitual and semantic regularities (e.g., food in fridges, cutlery in drawers) that our cost terms captured well. By contrast, living rooms involve more ambiguous placements where multiple surfaces are equally plausible (e.g., a magazine could be on a coffee table, side table, or shelf), making subtle individual biases harder to model with hyperparameters estimated from limited data.
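The summary figures can be recomputed directly from the four accuracies in Table 1. The sketch below assumes the reported spread is a standard deviation over the four profiles; the text does not say whether the population or sample estimator was used, so both are shown.

```python
import statistics

# Accuracies and scenes from Table 1.
ACCURACY = {"P23": 0.60, "P32": 0.40, "P24": 0.80, "P16": 0.90}
SCENE = {"P23": "Living R.", "P32": "Living R.", "P24": "Kitchen", "P16": "Kitchen"}

values = list(ACCURACY.values())
mean_acc = statistics.mean(values)
pop_sd = statistics.pstdev(values)    # population standard deviation
sample_sd = statistics.stdev(values)  # sample standard deviation (n - 1)

# Per-scene means over the two profiles in each scene.
scene_means = {
    scene: statistics.mean(a for p, a in ACCURACY.items() if SCENE[p] == scene)
    for scene in set(SCENE.values())
}

print(round(mean_acc, 3), round(pop_sd, 2), round(sample_sd, 2), scene_means)
```

With only two profiles per scene, the kitchen versus living room gap is best read as suggestive rather than as a tested difference.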
Conclusions and Future Work
We proposed four interpretable constructs of human organizational preference (i.e., spatial practicality, habitual convenience, semantic coherence, and commonsense appropriateness) and validated them with a user study of 63 participants. Our analyses confirmed that these constructs capture meaningful variation in arrangement preferences across both users and scene contexts (kitchen and living room). Qualitative responses also showed that participants naturally reasoned about their organizational choices in terms consistent with these constructs. We mathematically formulated the constructs as cost functions and integrated them into an MCTS planner guided by participant-specific weight profiles. The generated arrangements mostly aligned with human placements in both quantitative accuracy and qualitative signature, demonstrating that the constructs can be operationalized for planning.
Although the constructs aligned well with human reasoning, operationalizing them within a computational model required certain simplifying assumptions. Cost function hyperparameters were estimated by proxy rather than learned from demonstrations. Additionally, we combined the constructs linearly as if they were independent, despite moderate correlations in the user data suggesting these dimensions can be entangled in human reasoning. While the four constructs accounted for substantial variance in organizational preferences, the residual variation indicates that additional factors may be needed to capture remaining individual idiosyncrasies.
Future work will learn construct parameters and preference weights directly from human demonstrations, refine the formulation to account for construct interactions, and integrate emergent factors from our qualitative analysis, such as aesthetics and design affordances. We also aim to extend the method to handle continuous receptacle configurations. Overall, our results demonstrate that organizational principles can be explicitly modeled and leveraged as interpretable building blocks for personalized robot assistance in household object rearrangement.
Figures


