Hybrid Deep Learning Framework for CSI-Based Activity Recognition in Bandwidth-Constrained Wi-Fi Sensing
This paper presents a novel hybrid deep learning framework designed to improve the robustness of CSI-based Human Activity Recognition (HAR) in bandwidth-constrained Wi-Fi sensing environments. The methodology begins with a Doppler trace extraction stage that amplifies salient motion-related signal features before classification. These enhanced inputs are then processed by a hybrid neural architecture that combines Inception networks, which perform hierarchical spatial feature extraction, with Bidirectional Long Short-Term Memory (BiLSTM) networks, which capture temporal dependencies. A Support Vector Machine (SVM) serves as the final classifier to optimize decision boundaries. The framework was validated on a public dataset under 20, 40, and 80 MHz bandwidth configurations, yielding accuracies of 89.27% (20 MHz), 94.13% (40 MHz), and 95.30% (80 MHz). These results show a clear improvement over standalone deep learning baselines, most notably in the lowest-bandwidth (20 MHz) setting. This study underscores the value of combining Doppler-based feature engineering with a hybrid learning architecture for reliable HAR in bandwidth-limited wireless sensing applications.
💡 Research Summary
The paper introduces IBIS, a hybrid deep‑learning framework designed to maintain high human activity recognition (HAR) accuracy when Wi‑Fi channel state information (CSI) is captured under severe bandwidth constraints. Because narrow bandwidths (e.g., 20 MHz) sharply reduce the number of usable sub‑carriers, and therefore the richness of CSI‑based features, the authors first apply a Doppler‑trace extraction step to the CSI phase. Converting phase variations into time‑frequency spectrograms amplifies motion‑related information, while the amplitude, which is more susceptible to noise and hardware variability, is discarded.
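As a rough illustration of the Doppler‑trace step, the NumPy sketch below keeps only the CSI phase, subtracts the per‑window mean to suppress static (zero‑Doppler) reflections, and computes a short‑time Fourier transform over time, summing power across sub‑carriers. It assumes the phase has already been sanitized (random carrier‑frequency and timing offsets removed), which the summary does not detail. The window length, hop size, packet rate and the `doppler_spectrogram` name are hypothetical, and this is a generic STFT approach, not the authors' exact procedure.

```python
import numpy as np

def doppler_spectrogram(csi, fs, win=128, hop=32):
    """Sketch of a Doppler trace from (already sanitized) CSI.

    csi : complex ndarray, shape (T, K): T packets, K sub-carriers.
    fs  : packet (sampling) rate in Hz.
    Returns (freqs, spec): Doppler frequencies (win,) and power (n_frames, win).
    """
    # Keep only the phase: a unit-magnitude signal, amplitude discarded.
    x = np.exp(1j * np.angle(csi))
    window = np.hanning(win)[:, None]
    frames = []
    for start in range(0, x.shape[0] - win + 1, hop):
        seg = x[start:start + win]
        seg = seg - seg.mean(axis=0, keepdims=True)  # suppress static paths
        spec = np.fft.fftshift(np.fft.fft(seg * window, axis=0), axes=0)
        frames.append((np.abs(spec) ** 2).sum(axis=1))  # sum power over sub-carriers
    freqs = np.fft.fftshift(np.fft.fftfreq(win, d=1.0 / fs))
    return freqs, np.array(frames)

# Synthetic check: a 50 Hz phase rotation with arbitrary amplitude (ignored).
fs = 1000.0  # assumed packet rate in Hz
t = np.arange(1024)[:, None]
k = np.arange(8)[None, :]
csi = 3.0 * np.exp(1j * (2 * np.pi * 50.0 * t / fs + 0.1 * k))
freqs, spec = doppler_spectrogram(csi, fs)
print(spec.shape, freqs[spec.mean(axis=0).argmax()])
```

The dominant Doppler bin should fall within one frequency bin (fs / win = 7.8125 Hz here) of the 50 Hz rotation. Scaling the amplitude leaves the spectrogram unchanged, which mirrors the phase‑only design described above.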
The processed Doppler spectrograms are then fed into a two‑stage neural architecture. An Inception module performs multiscale 1‑D convolutions (kernel sizes 1, 3, 5) in parallel, extracting spatial (frequency‑domain) patterns that survive even when the spectral resolution is limited. The resulting feature maps are passed to a bidirectional LSTM (BiLSTM) that captures forward and backward temporal dependencies, crucial for distinguishing activities with similar instantaneous Doppler signatures (e.g., walking vs. running).
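The NumPy forward pass below is a minimal sketch of how the two stages could fit together. It assumes the Inception branches (kernel sizes 1, 3, 5) convolve along the Doppler‑frequency axis of each spectrogram frame, followed by ReLU and global average pooling, and that the resulting per‑frame vectors form the sequence read by the BiLSTM. The filter count, hidden size, pooling step and the `ibis_like_forward` name are hypothetical choices. The weights are random rather than trained, so the sketch only demonstrates data flow and shapes, not the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)  # random, untrained weights: shapes only

def conv1d_same(x, w):
    """x: (L, C_in), w: (k, C_in, C_out) -> (L, C_out); zero 'same' padding, odd k."""
    k = w.shape[0]
    xp = np.pad(x, ((k // 2, k // 2), (0, 0)))
    return np.stack([np.einsum("kc,kco->o", xp[i:i + k], w) for i in range(x.shape[0])])

def make_inception(filters=8):
    # One weight tensor per branch (kernel sizes 1, 3, 5), shared by all frames.
    return [rng.standard_normal((k, 1, filters)) * 0.1 for k in (1, 3, 5)]

def inception_frame(profile, branches):
    """Doppler profile of one frame, shape (F,) -> features, shape (3 * filters,)."""
    x = profile[:, None]  # Doppler bins form the 1-D axis being convolved
    # Parallel branches: ReLU, then global average pooling over Doppler bins.
    outs = [np.maximum(conv1d_same(x, w), 0.0).mean(axis=0) for w in branches]
    return np.concatenate(outs)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def make_lstm(c_in, hidden):
    return (rng.standard_normal((c_in, 4 * hidden)) * 0.1,
            rng.standard_normal((hidden, 4 * hidden)) * 0.1,
            np.zeros(4 * hidden))

def lstm(seq, params):
    """Unidirectional LSTM over seq (T, C); returns hidden states (T, H)."""
    W, U, b = params
    h = np.zeros(U.shape[0])
    c = np.zeros(U.shape[0])
    states = []
    for x_t in seq:
        # Input, forget, candidate and output gate pre-activations.
        i, f, g, o = np.split(x_t @ W + h @ U + b, 4)
        c = sigmoid(f) * c + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        states.append(h)
    return np.array(states)

def ibis_like_forward(spec, n_classes=5, filters=8, hidden=16):
    """spec: (T, F) Doppler spectrogram -> class-probability vector (n_classes,)."""
    branches = make_inception(filters)
    seq = np.stack([inception_frame(p, branches) for p in spec])  # (T, 3 * filters)
    fwd = lstm(seq, make_lstm(seq.shape[1], hidden))        # reads frames 0 .. T-1
    bwd = lstm(seq[::-1], make_lstm(seq.shape[1], hidden))  # reads frames T-1 .. 0
    summary = np.concatenate([fwd[-1], bwd[-1]])            # final state, both directions
    logits = summary @ (rng.standard_normal((2 * hidden, n_classes)) * 0.1)
    p = np.exp(logits - logits.max())                       # numerically stable softmax
    return p / p.sum()
```

Running the two LSTMs over the sequence in opposite orders and concatenating their final states is what makes the temporal stage bidirectional. Each activity prediction therefore draws on context both before and after any given frame.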
Instead of relying solely on the softmax output of the BiLSTM, the authors employ a Support Vector Machine (SVM) as a final classifier. The SVM receives the probability vector from the BiLSTM and is tuned via exhaustive grid search over kernel type (polynomial, radial basis function, sigmoid) and hyper‑parameters C and γ. This post‑processing step maximizes the margin between classes, which is especially beneficial when the underlying feature space is low‑dimensional due to bandwidth reduction.
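A minimal scikit‑learn sketch of the SVM stage follows. It assumes `probs` holds the BiLSTM softmax vectors and `labels` the integer activity ids. The C, γ and polynomial `degree` grids are hypothetical because the summary does not list the search ranges, and `fit_svm_head` is an illustrative name. The synthetic probability vectors in the usage lines stand in for real network outputs.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

def fit_svm_head(probs, labels, cv=5):
    """Grid-search an SVC over kernel type, C and gamma on class-probability vectors.

    probs  : (N, n_classes) softmax outputs of the network.
    labels : (N,) integer class ids.
    Grid values are illustrative, not the ones used in the paper.
    """
    Cs = [0.1, 1, 10, 100]
    gammas = ["scale", 0.1, 1]
    param_grid = [
        # degree only affects the polynomial kernel, so search it only there.
        {"kernel": ["poly"], "C": Cs, "gamma": gammas, "degree": [2, 3]},
        {"kernel": ["rbf", "sigmoid"], "C": Cs, "gamma": gammas},
    ]
    search = GridSearchCV(SVC(), param_grid, cv=cv, scoring="accuracy")
    search.fit(probs, labels)
    return search

# Usage: synthetic probability vectors peaked on the true class (5 activities).
rng = np.random.default_rng(0)
labels = rng.integers(0, 5, size=200)
alpha = np.ones((200, 5))
alpha[np.arange(200), labels] = 8.0
probs = np.stack([rng.dirichlet(a) for a in alpha])
search = fit_svm_head(probs, labels)
print(search.best_params_, round(search.best_score_, 3))
```

Splitting the grid into a list of dictionaries keeps the search from wasting fits on `degree` values that the RBF and sigmoid kernels ignore. The selected kernel is then available in `search.best_params_`, which is how a per‑bandwidth kernel comparison could be read off.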
Experiments are conducted on the publicly available CSI dataset of Meneghello et al., focusing on the most challenging scenario (S7), which contains many reflective surfaces and strong interference. The data were collected on an 80 MHz IEEE 802.11ac channel and extracted with the Nexmon tool. Narrower 40 MHz and 20 MHz channels are emulated by selecting appropriate sub‑carrier subsets, giving three bandwidth configurations (20, 40, and 80 MHz). The scenario covers five activities (Empty, Sitting, Walking, Running, Jumping) and comprises 234 samples. For each bandwidth, the authors repeat training and testing ten times and report the average accuracy.
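The bandwidth emulation and repeated evaluation could look roughly like the sketch below. It assumes 256 sub‑carrier values per 80 MHz packet (the FFT size of an 802.11ac 80 MHz channel), keeps a contiguous block whose width scales with bandwidth, and ignores guard, pilot and null sub‑carriers. The `start` offset, the function names and the `run_once` callback are hypothetical; the summary does not say which subset the authors kept.

```python
import numpy as np

def restrict_bandwidth(csi, bw_mhz, full_bw_mhz=80, start=0):
    """Emulate a narrower channel by keeping a contiguous block of sub-carriers.

    csi   : (T, K) complex CSI captured over the full_bw_mhz channel.
    start : first kept sub-carrier index (hypothetical; which block the
            authors kept is not stated).
    """
    if full_bw_mhz % bw_mhz != 0:
        raise ValueError("bw_mhz must evenly divide full_bw_mhz")
    k_keep = csi.shape[1] * bw_mhz // full_bw_mhz  # sub-carrier count scales with bandwidth
    if start + k_keep > csi.shape[1]:
        raise ValueError("requested block runs past the last sub-carrier")
    return csi[:, start:start + k_keep]

def repeated_accuracy(run_once, n_runs=10, seed=0):
    """Mean and standard deviation of accuracy over n_runs repetitions.

    run_once(rng) is a hypothetical callback that draws a fresh train/test
    split with rng, trains the model and returns its test accuracy.
    """
    rng = np.random.default_rng(seed)
    accs = np.array([run_once(rng) for _ in range(n_runs)])
    return accs.mean(), accs.std()

csi_80 = np.zeros((500, 256), dtype=complex)  # assumed 256 sub-carriers at 80 MHz
for bw in (20, 40, 80):
    print(bw, restrict_bandwidth(csi_80, bw).shape)  # 64, 128 and 256 sub-carriers
```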
Results show that IBIS achieves 89.27 % accuracy at 20 MHz, 94.13 % at 40 MHz, and 95.30 % at 80 MHz. In comparison, a baseline that uses only the Inception network (no BiLSTM, no SVM, no Doppler preprocessing) reaches 73.22 %, 86.48 %, and 94.13 %, respectively. The gain is largest at the lowest bandwidth: roughly 16 percentage points at 20 MHz (89.27 % vs. 73.22 %), compared with 7.65 points at 40 MHz and 1.17 points at 80 MHz. This indicates that Doppler‑trace extraction combined with temporal modeling and SVM refinement recovers much of the information lost to spectral narrowing. Kernel analysis shows that a polynomial kernel works best at 20 MHz, while the RBF kernel is optimal at 40 MHz and 80 MHz. An additional antenna‑merge strategy (majority voting across multiple antennas) provides a modest further boost.
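The summary describes antenna merging only as majority voting. The standard‑library sketch below assumes each antenna's CSI stream is classified independently and that ties go to the tied label predicted by the lowest‑index antenna. That tie‑break rule and the `antenna_majority_vote` name are hypothetical, and the four‑antenna inputs are illustrative.

```python
from collections import Counter

def antenna_majority_vote(per_antenna_labels):
    """Merge one sample's per-antenna predictions into a single label.

    Returns the most frequent label. On a tie, the tied label predicted by
    the lowest-index antenna wins (a hypothetical tie-break rule).
    """
    if not per_antenna_labels:
        raise ValueError("need at least one antenna prediction")
    counts = Counter(per_antenna_labels)
    top = max(counts.values())
    for label in per_antenna_labels:  # scan antennas in index order
        if counts[label] == top:
            return label

print(antenna_majority_vote(["Walking", "Running", "Walking", "Walking"]))  # Walking
print(antenna_majority_vote(["Sitting", "Empty", "Empty", "Sitting"]))     # Sitting (tie)
```

A deterministic tie‑break matters with an even number of antennas, where split votes are common. An alternative would be to sum per‑antenna class probabilities, but that goes beyond the plain voting described here.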
The authors conclude that integrating Doppler‑based feature engineering, multiscale spatial learning (Inception), bidirectional temporal modeling (BiLSTM), and margin‑maximizing classification (SVM) yields a robust solution for bandwidth‑constrained Wi‑Fi sensing. They highlight three contributions: (1) the novel IBIS pipeline, (2) a systematic evaluation of how bandwidth affects CSI‑based HAR, and (3) empirical evidence of substantial performance improvements in low‑bandwidth regimes. Planned future work includes real‑time lightweight implementations, multi‑user signal separation, and multimodal fusion with other wireless or optical sensing modalities.