Decoding Psychological States Through Movement: Inferring Human Kinesic Functions with Application to Built Environments

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Social infrastructure and other built environments are increasingly expected to support well-being and community resilience by enabling social interaction. Yet civil and built-environment research lacks a consistent, privacy-preserving way to represent and measure socially meaningful interaction in these spaces, leaving studies to operationalize “interaction” differently across contexts and limiting practitioners’ ability to evaluate whether design interventions change the forms of interaction that social capital theory predicts should matter. To address this field-level and methodological gap, we introduce the Dyadic User Engagement DataseT (DUET) and an embedded kinesics recognition framework that operationalize Ekman and Friesen’s kinesics taxonomy as a function-level interaction vocabulary aligned with social capital-relevant behaviors (e.g., reciprocity and attention coordination). DUET captures 12 dyadic interactions spanning all five kinesic functions (emblems, illustrators, affect displays, adaptors, and regulators) across four sensing modalities and three built-environment contexts, enabling privacy-preserving analysis of communicative intent through movement. Benchmarking six open-source, state-of-the-art human activity recognition models quantifies the difficulty of communicative-function recognition on DUET and highlights the limitations of ubiquitous monadic, action-level recognition when extended to dyadic, socially grounded interaction measurement. Building on DUET, our recognition framework infers communicative function directly from privacy-preserving skeletal motion without handcrafted action-to-function dictionaries; using a transfer-learning architecture, it reveals structured clustering of kinesic functions and a strong association between representation quality and classification performance while generalizing across subjects and contexts.


💡 Research Summary

The paper addresses a critical methodological gap in built‑environment research: the lack of a consistent, scalable, and privacy‑preserving way to quantify socially meaningful interaction, which is the conduit through which social infrastructure influences social capital. Drawing on Ekman and Friesen’s kinesics taxonomy, the authors define five communicative functions—emblems, illustrators, affect displays, adaptors, and regulators—that map directly onto social‑capital mechanisms such as trust, reciprocity, and norm formation.

To operationalize this theory, they introduce the Dyadic User Engagement Dataset (DUET). DUET contains 12 carefully designed dyadic activities covering all five kinesic functions. Data were captured from two participants simultaneously using four sensing modalities (RGB video, infrared, depth, and 3D skeletal keypoints) across three distinct campus locations, yielding 14,400 samples (1,200 per class), the highest sample‑to‑class ratio among publicly available dyadic datasets. The multimodal collection protocol varies viewpoints while keeping a fixed sensor, ensuring robustness to background variation.
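The reported dataset size follows directly from the class structure, and can be sanity-checked in two lines:

```python
# Sanity check on the reported DUET size: 12 dyadic activity classes
# with 1,200 samples each should total 14,400 samples.
n_classes, samples_per_class = 12, 1_200
total_samples = n_classes * samples_per_class
print(total_samples)  # 14400
```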

The authors argue that conventional human activity recognition (HAR) models, trained on monadic datasets, excel at identifying physical actions but fail to infer the underlying social intent, especially in dyadic contexts. To overcome this, they propose an embedded kinesics recognition framework that bypasses handcrafted action‑to‑function dictionaries. The framework freezes a pre‑trained spatial‑temporal graph convolutional network (ST‑GCN) to extract low‑level motion embeddings from skeletal data, then attaches a trainable convolutional head that learns to cluster activities according to their communicative function. This transfer‑learning architecture enables function‑level inference directly from privacy‑preserving skeletal motion.
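The transfer-learning idea described above, freezing a pre-trained encoder and training only a lightweight classification head, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the "frozen backbone" is a fixed random projection standing in for the pre-trained ST-GCN, the head is a plain linear-softmax layer rather than a convolutional one, and all shapes and data are invented.

```python
import numpy as np

rng = np.random.default_rng(0)

N_FUNCTIONS = 5   # emblems, illustrators, affect displays, adaptors, regulators
EMBED_DIM = 16    # size of the frozen backbone's embedding (assumed)
INPUT_DIM = 75    # e.g. 25 joints x 3 coordinates per clip, flattened (assumed)

# "Frozen backbone": fixed weights standing in for a pre-trained ST-GCN encoder.
W_frozen = rng.standard_normal((INPUT_DIM, EMBED_DIM))

def backbone(x):
    """Map raw skeletal-motion features to embeddings; weights never update."""
    return np.tanh(x @ W_frozen)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy stand-in for DUET clips: random features with random function labels.
X = rng.standard_normal((200, INPUT_DIM))
y = rng.integers(0, N_FUNCTIONS, size=200)
Y = np.eye(N_FUNCTIONS)[y]

# Train only the head by gradient descent on cross-entropy; the backbone
# embeddings E are computed once and stay fixed throughout.
E = backbone(X)
W_head = np.zeros((EMBED_DIM, N_FUNCTIONS))
for _ in range(300):
    P = softmax(E @ W_head)
    grad = E.T @ (P - Y) / len(X)
    W_head -= 0.5 * grad

train_acc = (softmax(E @ W_head).argmax(axis=1) == y).mean()
```

The design point the sketch makes is the same as the framework's: only the head's parameters change during training, so function-level structure must already be recoverable from the frozen embeddings.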

Benchmarking six state‑of‑the‑art HAR models on DUET shows that these models achieve only modest accuracy (often below 70 %) and struggle in particular to distinguish adaptors from regulators, highlighting the inadequacy of monadic, action‑level approaches for socially grounded measurement. In contrast, the proposed ST‑GCN + CNN pipeline attains an average classification accuracy above 88 % across the five functions. Moreover, the representation quality of the frozen ST‑GCN correlates strongly with classification performance (Pearson ρ = 0.91).
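The reported association is a standard Pearson correlation between a representation-quality score and classification accuracy. As an illustration only (the numbers below are invented stand-ins, not the paper's values), it can be computed directly with `numpy.corrcoef`:

```python
import numpy as np

# Toy per-class scores standing in for representation quality and accuracy.
rep_quality = np.array([0.62, 0.71, 0.55, 0.80, 0.68])
accuracy = np.array([0.84, 0.90, 0.79, 0.95, 0.88])

# Off-diagonal entry of the 2x2 correlation matrix is Pearson's r;
# it is strongly positive for these toy values.
r = np.corrcoef(rep_quality, accuracy)[0, 1]
```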

