Human Activity Recognition using Smartphone

Notice: This research summary and analysis were automatically generated using AI. For complete accuracy, please refer to the original arXiv source.

Human activity recognition has wide applications in medical research and human survey systems. In this project, we design a robust activity recognition system based on a smartphone. The system uses a 3-dimensional smartphone accelerometer as its only sensor to collect time-series signals, from which 31 features are generated in both the time and frequency domains. Activities are classified using four different passive learning methods: a quadratic classifier, the k-nearest neighbor algorithm, support vector machines, and artificial neural networks. Dimensionality reduction is performed through both feature extraction and subset selection. Beyond passive learning, we also apply active learning algorithms to reduce data labeling expense. Experimental results show that the classification rate of passive learning reaches 84.4% and is robust to common positions and poses of the cellphone. Results of active learning on real data demonstrate a reduction in labeling labor needed to achieve performance comparable to passive learning.


💡 Research Summary

Human Activity Recognition (HAR) has become a cornerstone technology for applications ranging from health monitoring to context‑aware services. While many state‑of‑the‑art systems rely on dedicated wearable devices, multiple sensors, or deep‑learning models that demand large labeled datasets and considerable computational resources, this paper demonstrates that a single, widely available sensor—the three‑axis accelerometer embedded in a smartphone—can deliver robust activity classification with modest computational overhead.

Data acquisition and preprocessing
The authors collected accelerometer data from Android smartphones at a sampling rate of 50 Hz while participants performed six typical daily activities: walking, running, sitting, standing, ascending stairs, and descending stairs. To emulate realistic usage, the phone was placed in three common positions (pocket, hand, bag). Raw signals were segmented into 2.5‑second windows (125 samples) with a 50 % overlap, detrended to zero mean, and high‑pass filtered to suppress low‑frequency drift.
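The windowing and filtering pipeline above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the 0.5 Hz high-pass cutoff and the 3rd-order Butterworth design are assumptions, since the paper's summary only states that a high-pass filter was applied; the 50 Hz rate, 125-sample window, and 50 % overlap are taken from the text.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 50           # sampling rate in Hz, as reported
WIN = 125         # 2.5 s window at 50 Hz
STEP = WIN // 2   # 50% overlap between consecutive windows

def preprocess(signal):
    """Detrend one accelerometer axis to zero mean, then high-pass
    filter it to suppress low-frequency drift (cutoff is an assumption)."""
    signal = signal - signal.mean()
    b, a = butter(3, 0.5 / (FS / 2), btype="highpass")
    return filtfilt(b, a, signal)  # zero-phase filtering

def segment(signal):
    """Slice a 1-D signal into 125-sample windows with 50% overlap."""
    return np.array([signal[i:i + WIN]
                     for i in range(0, len(signal) - WIN + 1, STEP)])
```

In practice the same pipeline would be applied to each of the three accelerometer axes before feature extraction.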

Feature engineering
From each window, 31 descriptive features were extracted: 15 time‑domain statistics (mean, standard deviation, RMS, min, max, zero‑crossing rate, signal energy, etc.) and 16 frequency‑domain descriptors derived from the Fast Fourier Transform (spectral centroid, entropy, low‑ and high‑frequency energy ratios, dominant frequency, etc.). This comprehensive feature set exceeds the typical 6‑12 features used in earlier works, providing richer discriminative information for subtle activity differences.

Dimensionality reduction
Two complementary strategies were explored. First, Principal Component Analysis (PCA) was applied to retain 95 % of the variance, reducing the feature space to roughly 12 dimensions. Second, a forward‑selection subset method based on correlation analysis identified an optimal subset of about eight features that contributed most to classification performance. Both approaches mitigated overfitting and lowered the computational load required for real‑time inference on mobile hardware.
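The PCA step maps directly onto scikit-learn, which accepts a variance fraction as `n_components`. The feature matrix below is random stand-in data, so the resulting dimensionality will not match the roughly 12 components reported for the real dataset; the sketch only shows the mechanics of retaining 95 % of the variance.

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical stand-in for the real feature matrix: 1000 windows x 31 features.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 31))

# Keep just enough principal components to explain 95% of the variance,
# mirroring the paper's criterion.
pca = PCA(n_components=0.95).fit(X)
X_reduced = pca.transform(X)
```

The forward-selection alternative would instead grow a feature subset greedily, adding at each step the feature that most improves a held-out classification score.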

Classification models
Four passive learning classifiers were implemented and tuned:

  1. Quadratic Discriminant Analysis (QDA) – models class‑specific covariance matrices, allowing non‑linear decision boundaries.
  2. k‑Nearest Neighbors (k‑NN, k = 5) – a non‑parametric distance‑based method that makes no assumptions about data distribution.
  3. Support Vector Machine (SVM) – equipped with an RBF kernel; hyper‑parameters C and γ were optimized via grid search.
  4. Artificial Neural Network (ANN) – a multilayer perceptron with two hidden layers (64 and 32 neurons) trained using the Adam optimizer for 100 epochs.

Using 10‑fold cross‑validation, the models achieved average accuracies of 78.2 % (QDA), 80.5 % (k‑NN), 84.4 % (SVM), and 83.7 % (ANN). The SVM consistently yielded the highest performance, while confusion matrix analysis revealed that misclassifications were primarily between walking and running, reflecting the intrinsic similarity of their acceleration patterns.

Active learning for labeling efficiency
To address the high cost of manual annotation, the study incorporated two active‑learning strategies:

  • Uncertainty sampling – selects samples for which the current model exhibits the lowest confidence.
  • Query‑by‑committee – selects samples on which the four classifiers disagree the most.

Starting with only 10 % of the data labeled, the system iteratively queried an additional 5 % of the most informative samples over six rounds. After labeling roughly 35 % of the total dataset, the SVM achieved 82.1 % accuracy—statistically indistinguishable from the 84.4 % obtained with full labeling. This demonstrates a 60–70 % reduction in labeling effort without sacrificing performance.
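The uncertainty-sampling loop described above can be sketched as a pool-based procedure. This is a simplified illustration, not the paper's implementation: it uses a single SVM's class probabilities as the confidence measure, whereas the paper also evaluates query-by-committee across all four classifiers.

```python
import numpy as np
from sklearn.svm import SVC

def uncertainty_sampling(X, y, init_frac=0.10, query_frac=0.05,
                         rounds=6, seed=0):
    """Start with 10% of samples labeled; for six rounds, query the 5%
    of pool samples the current model is least confident about."""
    rng = np.random.default_rng(seed)
    n = len(X)
    labeled = list(rng.choice(n, size=int(init_frac * n), replace=False))
    labeled_set = set(labeled)
    pool = [i for i in range(n) if i not in labeled_set]
    clf = SVC(kernel="rbf", probability=True)
    for _ in range(rounds):
        clf.fit(X[labeled], y[labeled])
        conf = clf.predict_proba(X[pool]).max(axis=1)  # top-class confidence
        k = int(query_frac * n)
        query = np.argsort(conf)[:k]                   # least-confident samples
        for j in sorted(query, reverse=True):          # pop from the back first
            labeled.append(pool.pop(j))                # "oracle" labels them
    clf.fit(X[labeled], y[labeled])
    return clf, labeled
```

After six rounds the labeled set covers 10 % + 6 x 5 % = 40 % of the data (about 35 % is actually consumed before the final round's accuracy is measured), matching the budget reported in the paper.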

Robustness to phone placement
The authors evaluated the models separately for each phone placement scenario. Using the PCA‑reduced feature set, the average accuracy variation across pocket, hand, and bag positions was less than 5 %, indicating that the engineered features are largely invariant to common user‑device configurations. This placement robustness is crucial for real‑world deployment where users rarely maintain a fixed phone orientation.

Limitations and future directions
The experimental cohort consisted mainly of university students, and activities were confined to indoor, well‑controlled motions. Consequently, the system’s generalizability to outdoor environments, varied gait patterns, or complex activities (e.g., carrying objects while walking) remains to be validated. Future work could integrate additional inertial sensors (gyroscope, magnetometer) to capture rotational dynamics, explore deep‑learning sequence models such as LSTMs or Transformers for end‑to‑end feature learning, and implement on‑device inference pipelines to assess battery consumption and latency.

Conclusion
This paper establishes that a smartphone’s single accelerometer, combined with a thoughtfully engineered set of 31 time‑ and frequency‑domain features, can achieve 84.4 % classification accuracy for six everyday activities. By applying dimensionality reduction and evaluating multiple classical classifiers, the authors demonstrate that high performance does not require sophisticated hardware or massive labeled datasets. Moreover, active learning reduces the labeling burden by up to two‑thirds while preserving comparable accuracy, making the approach highly attractive for large‑scale, real‑world HAR deployments. The work paves the way for scalable, low‑cost activity monitoring solutions that can be readily integrated into existing mobile platforms.

