Comparative Evaluation of Machine Learning Algorithms for Affective State Recognition from Children's Drawings
Autism spectrum disorder (ASD) is a neurodevelopmental condition characterized by difficulties with emotional expression and communication, particularly during early childhood. Understanding the affective state of children at an early age remains challenging, as conventional assessment methods are often intrusive, subjective, or difficult to apply consistently. This paper builds upon previous work on affective state recognition from children's drawings by presenting a comparative evaluation of machine learning models for emotion classification. Three deep learning architectures – MobileNet, EfficientNet, and VGG16 – are evaluated within a unified experimental framework to analyze classification performance, robustness, and computational efficiency. The models are trained using transfer learning on a dataset of children's drawings annotated with emotional labels provided by psychological experts. The results highlight important trade-offs between lightweight and deeper architectures when applied to drawing-based affective computing tasks, particularly in mobile and real-time application contexts.
💡 Research Summary
This paper addresses the challenge of non‑invasive emotional assessment for children with autism spectrum disorder (ASD) by leveraging free‑form drawings as a natural expressive medium. The authors assembled a dataset of 1,472 drawings, each annotated by psychological experts with one of five dominant affective states: happy, sad, angry, fear, and insecure. To mitigate moderate class imbalance, they applied on‑the‑fly data augmentation (±15° rotations, ±5% zoom, ±5% translations) during training while preserving the semantic integrity of the drawings. All images were resized to 224 × 224 RGB to match the input requirements of three pretrained convolutional neural networks: MobileNetV2, EfficientNetB0, and VGG16.
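The augmentation ranges above can be made concrete with a small sketch. The paper's exact implementation is not specified (Keras' `ImageDataGenerator` would be a typical choice); the sampler below simply illustrates drawing one random on-the-fly transform within the reported ±15° rotation, ±5% zoom, and ±5% translation bounds. All names here are illustrative, not from the paper.

```python
import random

# Augmentation ranges reported in the summary: +/-15 deg rotation,
# +/-5% zoom, +/-5% translation (applied on the fly during training).
AUG_RANGES = {
    "rotation_deg": 15.0,
    "zoom_frac": 0.05,
    "shift_frac": 0.05,
}

def sample_augmentation(rng=random):
    """Draw one random augmentation configuration for a training image.

    A sketch only: the paper's actual augmentation pipeline (e.g. a
    Keras ImageDataGenerator) is assumed, not documented.
    """
    return {
        "rotation_deg": rng.uniform(-AUG_RANGES["rotation_deg"],
                                    AUG_RANGES["rotation_deg"]),
        # zoom expressed as a scale factor around 1.0
        "zoom": 1.0 + rng.uniform(-AUG_RANGES["zoom_frac"],
                                  AUG_RANGES["zoom_frac"]),
        # shifts expressed as fractions of image width/height
        "shift_x": rng.uniform(-AUG_RANGES["shift_frac"],
                               AUG_RANGES["shift_frac"]),
        "shift_y": rng.uniform(-AUG_RANGES["shift_frac"],
                               AUG_RANGES["shift_frac"]),
    }
```

Keeping the ranges this small is what preserves the drawings' semantic integrity: a 15° tilt or 5% shift changes pixel statistics without distorting the figures children actually drew.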
A unified experimental protocol was employed to ensure a fair comparison. The train-test split was fixed at 75%/25%, and the batch size, early-stopping criteria, optimizer (Adam), and learning-rate schedule were identical across models. The convolutional backbones were frozen after loading ImageNet weights, and a lightweight classification head consisting of global average pooling, dropout, and a softmax layer was attached to each. Performance metrics included overall accuracy, loss, precision, recall, F1-score, and per-class confusion matrices.
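The shared classification head can be sketched numerically. Assuming a frozen backbone emits an (H, W, C) feature map, the head reduces it by global average pooling and maps the pooled vector through a dense softmax layer over the five emotion classes (dropout acts only at training time and is omitted from this inference-time sketch; the weight shapes are assumptions, not taken from the paper):

```python
import numpy as np

def classification_head(feature_map, weights, bias):
    """Inference-time forward pass of the lightweight head:
    global average pooling, then a dense softmax layer.

    feature_map : (H, W, C) output of the frozen backbone
    weights     : (C, 5) dense-layer weights (5 emotion classes)
    bias        : (5,) dense-layer bias
    """
    pooled = feature_map.mean(axis=(0, 1))   # global average pooling -> (C,)
    logits = pooled @ weights + bias         # dense layer -> (5,)
    exp = np.exp(logits - logits.max())      # numerically stable softmax
    return exp / exp.sum()                   # probabilities over the 5 emotions
```

For EfficientNetB0, for example, a 7×7×1280 feature map would collapse to a 1280-dimensional vector before the 5-way softmax, so the trainable head stays tiny relative to the backbone.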
Results show a clear hierarchy of effectiveness. EfficientNetB0 achieved the highest accuracy of 62.77 % with a loss of 1.8688, demonstrating rapid convergence and stable validation performance. Its compound scaling strategy—jointly optimizing depth, width, and resolution—allowed it to capture richer visual cues from the abstract drawings without a proportional increase in computational cost. MobileNetV2 followed closely with 59.24 % accuracy and a lower loss of 1.1821, confirming its suitability for resource‑constrained, real‑time applications due to its reduced parameter count and faster inference. In contrast, VGG16 lagged significantly, reaching only 46.20 % accuracy and a loss of 1.7367; the deep, parameter‑heavy architecture suffered from underfitting and poor generalization on the relatively small, heterogeneous dataset.
Confusion-matrix analysis revealed that EfficientNet excelled at recognizing the "happy" and "fear" categories, while most misclassifications occurred between emotionally adjacent classes such as "sad" and "insecure," reflecting inherent visual overlap in children's drawings. MobileNet displayed a similar pattern but showed slightly higher confusion between "angry" and "sad." VGG16 exhibited the highest overall misclassification rates, especially for minority classes, indicating insufficient adaptation of its frozen deep features to this domain.
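The per-class precision, recall, and F1 figures behind such an analysis follow directly from each confusion matrix. A minimal sketch, assuming the usual convention of rows as true classes and columns as predicted classes (the function name and any matrices passed to it are illustrative, not the paper's actual results):

```python
import numpy as np

def per_class_metrics(cm):
    """Per-class precision, recall and F1 from a confusion matrix.

    cm: square array-like; rows = true class, columns = predicted class.
    Returns three arrays, one value per class.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                                  # correct predictions per class
    precision = tp / np.maximum(cm.sum(axis=0), 1e-12)  # TP / predicted-as-class
    recall = tp / np.maximum(cm.sum(axis=1), 1e-12)     # TP / actually-in-class
    f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    return precision, recall, f1
```

Off-diagonal mass concentrated in specific cells, e.g. the "sad"/"insecure" pair noted above, is what distinguishes systematic confusion between adjacent emotions from uniformly poor predictions.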
The authors discuss practical implications for deployment in assistive technologies, notably the PandaSays mobile application, which aims to provide caregivers and clinicians with AI‑driven insights into a child’s affective state based on their drawings. EfficientNet’s superior accuracy combined with acceptable computational demands makes it a strong candidate for integration, while MobileNet offers a lightweight alternative for devices with stricter resource limits. VGG16’s performance suggests that deeper, unoptimized models are less appropriate for this task unless larger, more diverse datasets become available.
In summary, the study demonstrates that efficient deep‑learning architectures—particularly those employing balanced scaling like EfficientNet—outperform both lighter and deeper counterparts for affective state recognition from children’s drawings. The findings support the feasibility of non‑intrusive, AI‑assisted emotional assessment tools for ASD populations and highlight the importance of model‑dataset alignment, careful augmentation, and computational efficiency for real‑world deployment. Future work is suggested to expand the dataset, explore multimodal inputs (e.g., speech, text), and incorporate explainability techniques to further enhance clinical trust and usability.