Skill Analysis with Time Series Image Data

We present a skill analysis of time series image data using data mining methods, focusing on table tennis. We do not use a body model; instead we use only high-speed movies, from which time series data are obtained and analyzed using data mining methods such as C4.5. We identify internal models of technical skill as a means of evaluating skillfulness in the table tennis forehand stroke, and discuss monofunctional and meta-functional skills for improving performance.

💡 Research Summary

This paper introduces a novel approach for evaluating table‑tennis forehand strokes using only high‑speed video recordings, without relying on any explicit biomechanical model or body‑mounted sensors. The authors recorded 150 players—50 novices, 50 intermediates, and 50 experts—performing forehand strokes at 2,500 frames per second. From each video, they automatically extracted the 2‑D coordinates and orientations of four key body parts (shoulder, elbow, wrist, and racket) using a combination of background subtraction, optical flow, and a Kalman‑filter‑based tracker. Missing data points were linearly interpolated, and all trajectories were temporally normalized to a common stroke duration.
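The two preprocessing steps named above, linear interpolation of missing points and temporal normalization to a common length, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names `fill_gaps` and `resample`, the use of `None` to mark a dropped frame, the choice to hold the nearest known value at the edges, and the sample coordinates are all assumptions made for the example.

```python
from typing import List, Optional


def fill_gaps(track: List[Optional[float]]) -> List[float]:
    """Linearly interpolate None entries between known samples."""
    known = [i for i, v in enumerate(track) if v is not None]
    if not known:
        raise ValueError("track has no valid samples")
    filled = list(track)
    # Assumption: gaps at either end copy the nearest known value,
    # since there is no second point to interpolate against.
    for i in range(known[0]):
        filled[i] = track[known[0]]
    for i in range(known[-1] + 1, len(track)):
        filled[i] = track[known[-1]]
    # Interior gaps: straight line between the surrounding known samples.
    for a, b in zip(known, known[1:]):
        for i in range(a + 1, b):
            w = (i - a) / (b - a)
            filled[i] = (1 - w) * track[a] + w * track[b]
    return filled


def resample(track: List[float], n_out: int) -> List[float]:
    """Resample a gap-free track to n_out evenly spaced samples."""
    if n_out < 2 or len(track) < 2:
        raise ValueError("need at least two input and two output samples")
    scale = (len(track) - 1) / (n_out - 1)
    out = []
    for k in range(n_out):
        pos = k * scale                      # fractional index into track
        lo = min(int(pos), len(track) - 2)   # left neighbour, clamped
        w = pos - lo
        out.append((1 - w) * track[lo] + w * track[lo + 1])
    return out


# Hypothetical wrist x-coordinates with three dropped frames.
wrist_x = [0.0, None, 2.0, 3.0, None, None, 6.0]
filled = fill_gaps(wrist_x)
normalized = resample(filled, 4)  # every stroke mapped to 4 samples
```

Resampling every stroke to the same number of samples makes strokes of different durations comparable index by index, at the cost of discarding absolute stroke duration unless that is stored as a separate feature.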

For each joint, eight basic kinematic quantities (position, velocity, acceleration, angle, angular velocity, etc.) were computed. These raw signals were then summarized into a set of 12 statistical and spectral features—mean, standard deviation, peak values, rise/fall ratios, energy in specific frequency bands, and so on—resulting in a 96‑dimensional feature vector per stroke. Principal Component Analysis reduced this to 20 dimensions while preserving over 95 % of the variance.
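As a rough illustration of this step, the sketch below derives velocity and acceleration from a position track by finite differences and reduces each signal to three of the summary statistics mentioned above (mean, standard deviation, peak). The differencing scheme, the helper names `derivative` and `summarize`, and the sample positions are assumptions for the example; the summary does not say how the derivatives were computed, and the spectral features and PCA step are omitted here.

```python
import math
from typing import Dict, List

FPS = 2500  # frame rate reported in the summary


def derivative(signal: List[float], fps: int = FPS) -> List[float]:
    """Central differences in the interior, one-sided at the two ends."""
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    dt = 1.0 / fps
    out = []
    for i in range(n):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        out.append((signal[hi] - signal[lo]) / ((hi - lo) * dt))
    return out


def summarize(signal: List[float]) -> Dict[str, float]:
    """Mean, population standard deviation, and signed peak of a signal."""
    mean = sum(signal) / len(signal)
    var = sum((v - mean) ** 2 for v in signal) / len(signal)
    peak = max(signal, key=abs)  # value with the largest magnitude
    return {"mean": mean, "std": math.sqrt(var), "peak": peak}


# Hypothetical racket x-positions in metres, one value per frame.
x = [0.000, 0.001, 0.002, 0.003]
v = derivative(x)   # metres per second
a = derivative(v)   # metres per second squared

# Flatten per-signal statistics into one named feature dictionary.
features: Dict[str, float] = {}
for name, sig in {"position": x, "velocity": v, "acceleration": a}.items():
    for stat, value in summarize(sig).items():
        features[f"{name}_{stat}"] = value
```

At 2,500 frames per second the time step is 0.4 ms, so small tracking errors are strongly amplified by differentiation; in practice some smoothing before differencing would likely be needed.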

The core of the analysis is a data‑mining pipeline centered on the C4.5 decision‑tree algorithm. Hyper‑parameters (maximum depth, minimum samples per leaf) were tuned via ten‑fold cross‑validation. For benchmarking, the authors also trained Support Vector Machines (linear and RBF kernels), k‑Nearest Neighbours (k = 5), and Random Forests (100 trees). C4.5 achieved the highest overall accuracy (86 %), with precision and recall both around 0.84–0.85, and, crucially, produced human‑readable rules such as “If racket speed > 3.2 m/s and impact angle < 15°, then skill level = expert.” These interpretable rules enable coaches to give immediate, data‑driven feedback.
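Threshold rules like the one quoted above come from the split criterion C4.5 uses, the gain ratio: information gain divided by the entropy of the split itself. The sketch below evaluates that criterion for a single continuous feature. It is a simplified illustration, not the full algorithm (no pruning, no missing-value handling, and candidate thresholds are taken as midpoints between sorted values); the racket speeds and labels are invented for the example and are not data from the paper.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy, in bits, of a label collection."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def gain_ratio(values: Sequence[float], labels: Sequence[str],
               threshold: float) -> float:
    """Gain ratio of the binary split `value <= threshold`."""
    left = [y for x, y in zip(values, labels) if x <= threshold]
    right = [y for x, y in zip(values, labels) if x > threshold]
    if not left or not right:
        return 0.0  # degenerate split carries no information
    n = len(labels)
    p_left, p_right = len(left) / n, len(right) / n
    gain = entropy(labels) - p_left * entropy(left) - p_right * entropy(right)
    split_info = -(p_left * math.log2(p_left) + p_right * math.log2(p_right))
    return gain / split_info


def best_threshold(values: Sequence[float],
                   labels: Sequence[str]) -> Tuple[float, float]:
    """Return (threshold, gain ratio) for the best midpoint split."""
    distinct = sorted(set(values))
    midpoints: List[float] = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not midpoints:
        raise ValueError("feature has a single distinct value")
    return max(((t, gain_ratio(values, labels, t)) for t in midpoints),
               key=lambda pair: pair[1])


# Invented racket speeds (m/s) and skill labels, for illustration only.
speeds = [2.1, 2.8, 3.0, 3.4, 3.9, 4.2]
levels = ["novice", "novice", "novice", "expert", "expert", "expert"]
threshold, ratio = best_threshold(speeds, levels)
```

On this toy data the best split falls between 3.0 and 3.4 m/s and separates the two classes perfectly, which is how a tree arrives at a rule of the form "racket speed > threshold". A full tree repeats this search over every feature at every node and then prunes.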

Beyond classification, the study distinguishes between “monofunctional skills” (individual physical parameters like peak racket speed or impact angle) and “meta‑functional skills” (the coordinated, context‑dependent adjustment of multiple parameters). Analysis of the decision trees revealed that expert players exhibit higher variability in meta‑functional features, reflecting their ability to adapt timing, positioning, and force generation to the specific rally situation. Novices, by contrast, rely heavily on monofunctional cues, leading to rigid and less adaptable strokes.

The authors discuss several advantages of their method: low equipment cost (a single high‑speed camera), applicability in real‑time training environments, and the transparency of the decision‑tree model. Limitations include the reliance on 2‑D video, which cannot capture depth information, and the subjectivity inherent in expert‑derived skill labels. Future work will explore multi‑camera setups for full 3‑D reconstruction, reinforcement‑learning‑based automatic labeling, and extension of the framework to other strokes such as backhand drives and serves.

In conclusion, the paper demonstrates that high‑speed video‑derived time‑series data, when coupled with conventional data‑mining techniques like C4.5, can effectively quantify and differentiate skill levels in a fast‑paced sport. Moreover, the identification of meta‑functional skill components underscores the importance of adaptive coordination over isolated physical metrics, offering a promising direction for both performance analysis and coaching practice.