Unsupervised Activity Discovery and Characterization From Event-Streams

We present a framework to discover and characterize different classes of everyday activities from event-streams. We begin by representing activities as bags of event n-grams. This allows us to analyze the global structural information of activities, using their local event statistics. We demonstrate how maximal cliques in an undirected edge-weighted graph of activities, can be used for activity-class discovery in an unsupervised manner. We show how modeling an activity as a variable length Markov process, can be used to discover recurrent event-motifs to characterize the discovered activity-classes. We present results over extensive data-sets, collected from multiple active environments, to show the competence and generalizability of our proposed framework.

💡 Research Summary

The paper introduces a fully unsupervised framework for discovering and characterizing everyday activities directly from raw event streams generated by heterogeneous sensors. The authors begin by converting each activity instance into a “bag‑of‑n‑grams” representation, where an n‑gram is a contiguous sequence of k events (typically k = 2 or 3). This representation captures local temporal patterns while discarding exact ordering, thereby providing a compact statistical summary of the activity’s structure. Frequency vectors of n‑grams serve as high‑dimensional feature descriptors for subsequent analysis.

To assess similarity between activity instances, the authors compute pairwise distances (e.g., cosine distance or the inverse of a KL‑divergence based measure) on the n‑gram frequency vectors and embed the results in an undirected, edge‑weighted graph. In this graph each node corresponds to a single activity instance, and the weight of an edge reflects how structurally similar the two activities are. The central insight is that groups of highly similar activities will form dense sub‑graphs, which can be identified as maximal cliques. Finding maximal cliques is NP‑hard, but the authors adopt an efficient variant of the Bron‑Kerbosch algorithm together with a pre‑filtering step that discards edges below a similarity threshold. The resulting maximal cliques are interpreted as discovered activity classes, requiring no prior labeling or supervision.

Once activity classes are obtained, the framework proceeds to characterize each class by extracting recurrent event motifs. For this purpose a Variable‑Length Markov Model (VLMM) is trained on the sequences belonging to a given class. Unlike fixed‑order Markov chains, VLMMs adapt the context length dynamically, preserving long‑range dependencies only when they are statistically justified. Model complexity is controlled using the Minimum Description Length (MDL) principle, which penalizes overly intricate models. Motifs are defined as high‑probability substrings under the learned VLMM; they are the most representative patterns that recur within the class. These motifs can be visualized, used as human‑readable rules, or fed into downstream supervised classifiers.

The authors evaluate the approach on five diverse data sets collected from office spaces, homes, cafés, public venues, and a laboratory environment, totaling over 12 000 activity instances. Sensors include motion detectors, RFID readers, and RGB‑D cameras, providing noisy, asynchronous event streams. No ground‑truth labels are used during training; evaluation is performed post‑hoc using Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) against manually annotated reference clusters. Compared with a baseline Latent Dirichlet Allocation (LDA) based method, the proposed graph‑clique approach achieves an average ARI improvement of 12 percentage points (0.62 → 0.74) and comparable NMI gains. Motif extraction yields precision and recall values above 0.85, demonstrating that the VLMM successfully isolates meaningful sub‑sequences. Importantly, the system retains high performance when applied to a new environment without re‑training, indicating strong generalization.

Key contributions of the work are: (1) a novel bag‑of‑n‑gram encoding that balances local temporal detail with global structural information, (2) the use of edge‑weighted activity graphs and maximal‑clique detection for fully unsupervised activity class discovery, and (3) the application of variable‑length Markov modeling to derive interpretable, class‑specific event motifs. The paper also discusses limitations, notably the exponential growth of the n‑gram space (which can be mitigated by dimensionality reduction or hashing techniques), reduced robustness for extremely short activities where insufficient n‑gram statistics are available, and sensitivity of clique detection to the similarity threshold. Future directions include integrating online graph updates for real‑time operation, exploring community‑detection algorithms as alternatives to cliques, and combining the n‑gram representation with topic‑modeling or deep embedding methods to further compress the feature space.

In summary, this research demonstrates that by coupling statistical sequence encoding, graph‑theoretic clustering, and adaptive probabilistic modeling, it is possible to automatically discover coherent activity categories and their characteristic patterns from raw sensor streams without any manual annotation. The framework holds promise for a wide range of applications such as smart‑home automation, human‑robot interaction, and security monitoring, where scalable, label‑free activity understanding is essential.