AtGCN: A Graph Convolutional Network For Ataxic Gait Detection

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv source.

Video-based gait analysis is the task of diagnosing pathologies, such as ataxia, from videos of patients walking in front of a camera. This paper presents a graph convolutional network, AtGCN, for detecting ataxic gait and identifying its severity from 2D videos. The problem is especially challenging because the deviation of an ataxic gait from a healthy gait is very subtle. Datasets for ataxic gait detection are also quite small, the largest containing only 149 videos. The paper addresses the first problem with a specialized spatiotemporal graph convolution that captures important gait-related features. To handle the small dataset size, a deep spatiotemporal graph convolutional network pre-trained on an action-recognition dataset is systematically truncated and then fine-tuned on the ataxia dataset to obtain the AtGCN model. The paper also presents an augmentation strategy that segments a video sequence into multiple gait cycles; the AtGCN model then operates on a graph of body-part locations belonging to a single gait cycle. The evaluation results support the strength of the proposed model: it outperforms the state of the art in detection and severity prediction, with an accuracy of 93.46% and an MAE of 0.4169 respectively, while being 5.5 times smaller than the state of the art.


💡 Research Summary

The paper introduces AtGCN, a specialized spatiotemporal graph convolutional network designed to detect ataxic gait and estimate its severity from ordinary 2‑D video recordings. The authors first convert each video into a sequence of 2‑D skeletal keypoints using OpenPose, and track individuals across frames with DeepSORT. To increase the effective training data, they segment each video into individual gait cycles by analyzing the sinusoidal pattern of the distance between the left and right ankles; smoothing filters (Savitzky‑Golay and moving average) are applied to reduce noise. Each gait cycle is then represented as a spatiotemporal graph: nodes correspond to body joints at each time step (including x, y coordinates and confidence scores), intra‑frame edges follow the anatomical connectivity defined by OpenPose’s part‑affinity fields, and inter‑frame edges connect the same joint across consecutive frames.
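The cycle-segmentation step can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the paper smooths the ankle-distance signal with both Savitzky‑Golay and moving-average filters, while the sketch below uses a moving average only, and the window sizes (`win`, `min_gap`) are illustrative defaults.

```python
import numpy as np

def split_gait_cycles(lankle, rankle, win=9, min_gap=10):
    """Split a walking sequence into gait cycles from 2-D ankle keypoints.

    lankle, rankle: (T, 2) arrays of per-frame (x, y) ankle positions.
    The inter-ankle distance oscillates roughly once per stride, so peaks
    of the smoothed signal mark cycle boundaries.
    """
    d = np.linalg.norm(lankle - rankle, axis=1)      # per-frame ankle distance
    smooth = np.convolve(d, np.ones(win) / win, mode="same")
    # Local maxima separated by at least `min_gap` frames; the first and
    # last `win` frames are skipped to avoid the filter's edge artifacts.
    peaks = []
    for t in range(win, len(smooth) - win):
        if smooth[t] >= smooth[t - 1] and smooth[t] > smooth[t + 1]:
            if not peaks or t - peaks[-1] >= min_gap:
                peaks.append(t)
    # Consecutive peaks delimit one gait cycle each.
    return list(zip(peaks, peaks[1:]))
```

Each returned `(start, end)` frame range then becomes one spatiotemporal graph sample, which is what drives the up-to-threefold data expansion reported below.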
The core of AtGCN builds on the ST‑GCN architecture (Yan et al., 2018) but adapts it for the subtle variations characteristic of ataxic gait. A neighbor set for each node is defined both spatially (within one graph hop) and temporally (within a configurable window Γ). Nodes are partitioned into three spatial labels (same distance, closer, farther from the body centre) and temporal labels are added linearly, enabling a weight‑sharing scheme that respects both spatial hierarchy and temporal ordering. The network consists of six spatiotemporal graph convolution blocks: the first four with 64 channels, the last two with 128 channels, each using a temporal kernel size of 9. Batch normalization, a dropout of 0.5 after the first four blocks, and global average pooling are employed to improve stability and prevent over‑fitting. Classification is performed with a softmax head, while severity regression uses a 1×1 convolution to output a continuous SARA gait score.
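The three-way spatial labelling can be illustrated with a toy helper. This is a sketch of the partitioning rule described above, not the paper's implementation; the joint names and distances below are illustrative stand-ins for graph distances from each joint to the body centre.

```python
def spatial_label(root_dist, neighbor_dist):
    """Label a neighbour of a root joint for weight sharing.

    0 = same distance from the body centre as the root (includes the
        root itself), 1 = centripetal (closer to the centre),
    2 = centrifugal (farther from the centre).
    """
    if neighbor_dist == root_dist:
        return 0
    return 1 if neighbor_dist < root_dist else 2

def partition_neighbors(root, neighbors, dist):
    """Group a root joint's 1-hop neighbours into the three spatial bins.

    `dist` maps joint name -> distance to the body centre; the joint
    names used here are illustrative, not the exact OpenPose layout.
    """
    bins = {0: [], 1: [], 2: []}
    for j in [root] + neighbors:
        bins[spatial_label(dist[root], dist[j])].append(j)
    return bins
```

Each bin gets its own convolution weight, which is how the scheme encodes whether motion information flows toward or away from the body centre.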
Because publicly available ataxic‑gait datasets are tiny (the largest contains only 149 videos), the authors adopt a transfer‑learning strategy. They start from a ten‑block spatiotemporal graph model pre‑trained on the large‑scale Kinetics‑400 action‑recognition dataset. By systematically truncating the backbone to 5, 6, or 7 blocks and fine‑tuning the remaining layers on the ataxia data, they obtain compact models ranging from 0.38 M to 0.78 M parameters. Training uses stochastic gradient descent (learning rate = 3 × 10⁻⁵, batch size = 64, 500 epochs) on a single NVIDIA V100 GPU.
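The truncation step amounts to keeping only the first k blocks of the pre-trained backbone. The channel layout below follows the common ST‑GCN implementation; treating it as the backbone here is an assumption, though it is consistent with the six-block AtGCN configuration (four 64-channel blocks followed by two 128-channel blocks) described above.

```python
# Channel widths of the ten-block ST-GCN backbone pre-trained on
# Kinetics-400 (layout assumed from the common ST-GCN implementation).
STGCN_CHANNELS = [64, 64, 64, 64, 128, 128, 128, 256, 256, 256]

def truncated_channels(k):
    """Channel layout after truncating the backbone to its first k blocks."""
    return STGCN_CHANNELS[:k]
```

Truncating to six blocks reproduces the AtGCN layout, while the 5- and 7-block variants bracket it; the classification or regression head is then attached to the last remaining block and the whole model is fine-tuned.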
Evaluation is carried out on two publicly released datasets: Auto‑Gait (89 participants, 149 videos, SARA scores 0‑6) and CA‑Gait (20 participants, 40 videos, simulated ataxic gait). Data augmentation via gait‑cycle splitting expands the training set up to threefold. The authors perform 20 repetitions of 10‑fold cross‑validation on Auto‑Gait and a 5‑fold scheme on CA‑Gait. AtGCN with six blocks achieves 93.46 % accuracy, 93.76 % F1‑score, and 93.36 % ROC‑AUC on Auto‑Gait, outperforming the prior Random‑Forest baseline (≈83 % accuracy) and the lightweight GaitGraph model (≈84 % accuracy). The model also predicts SARA severity with a mean absolute error of 0.4169, a substantial improvement over earlier methods. Ablation studies confirm that the full spatiotemporal graph formulation, the chosen temporal kernel size, and the six‑block depth are critical for optimal performance.
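The repeated cross-validation protocol can be sketched as follows. Note one assumption not stated explicitly above: folds are drawn over whole videos, so that gait cycles augmented from the same video never straddle the train/test boundary.

```python
import random

def repeated_kfold(video_ids, k=10, repeats=20, seed=0):
    """Yield (train, test) video-id splits for repeated k-fold CV.

    Splitting at the video level keeps all gait cycles from one video on
    the same side of the split (an assumption about the protocol).
    """
    rng = random.Random(seed)
    ids = list(video_ids)
    for _ in range(repeats):
        rng.shuffle(ids)
        folds = [ids[i::k] for i in range(k)]     # k near-equal folds
        for fold in folds:
            test = set(fold)
            train = [v for v in ids if v not in test]
            yield train, sorted(test)
```

Metrics are then averaged over all `k * repeats` splits, which is how the accuracy, F1, and MAE figures above would be aggregated.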
The authors discuss the practical implications of their work: despite the limited size of medical video datasets, careful pre‑training, model truncation, and gait‑cycle augmentation enable a deep graph network to generalize well. The reliance solely on 2‑D RGB cameras makes the approach inexpensive and easily deployable in routine clinical settings, avoiding the privacy concerns associated with depth sensors. Future directions include integrating 3‑D pose estimation, multimodal sensor fusion, and real‑time deployment on edge devices. In summary, AtGCN demonstrates that graph‑based spatiotemporal learning can effectively capture the nuanced dynamics of ataxic gait, delivering high‑accuracy detection and reliable severity estimation while maintaining a lightweight footprint suitable for real‑world medical applications.

