Predicting Anemia Among Under-Five Children in Nepal Using Machine Learning and Deep Learning

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Childhood anemia remains a major public health challenge in Nepal and is associated with impaired growth, impaired cognition, and increased morbidity. Using World Health Organization hemoglobin thresholds, we defined anemia status for children aged 6–59 months and formulated a binary classification task by grouping all anemia severities as “anemic” versus “not anemic.” We analyzed Nepal Demographic and Health Survey (NDHS 2022) microdata comprising 1,855 children and initially considered 48 candidate features spanning demographic, socioeconomic, maternal, and child health characteristics. To obtain a stable and substantiated feature set, we applied four feature selection techniques (Chi-square, mutual information, point-biserial correlation, and Boruta) and prioritized features supported by multi-method consensus. Five features (child age, recent fever, household size, maternal anemia, and parasite deworming) were consistently selected by all methods, while amenorrhea, ethnicity indicators, and provinces were frequently retained. We then compared eight traditional machine learning classifiers (LR, KNN, DT, RF, XGBoost, SVM, NB, LDA) with two deep learning models (DNN and TabNet) using standard evaluation metrics, emphasizing F1-score and recall due to class imbalance. Among all models, logistic regression attained the best recall (0.701) and the highest F1-score (0.649), while DNN achieved the highest accuracy (0.709), and SVM yielded the strongest discrimination with the highest AUC (0.736). Overall, the results indicate that both machine learning and deep learning models can provide competitive anemia prediction and that interpretable features such as child age, infection proxy, maternal anemia, and deworming history are central for risk stratification and public health screening in Nepal.


💡 Research Summary

This study addresses the pressing public‑health issue of childhood anemia in Nepal by developing and benchmarking predictive models using the 2022 Nepal Demographic and Health Survey (NDHS) micro‑data. The authors defined anemia status for children aged 6–59 months according to World Health Organization hemoglobin thresholds and framed the problem as a binary classification: “anemic” versus “not anemic.” After cleaning the dataset, 1,855 observations remained, each described by 48 candidate variables spanning demographic, socioeconomic, maternal, and child‑health domains.
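The labeling step described above can be sketched in a few lines. The summary does not state the exact cut-offs or whether hemoglobin was altitude-adjusted, so this sketch assumes the widely used WHO 2011 thresholds for children aged 6–59 months (anemia below 11.0 g/dL); the function names and sample readings are hypothetical.

```python
# Assumed WHO 2011 hemoglobin cut-offs (g/dL) for children aged 6-59 months.
# The source only says "WHO thresholds"; these values are an assumption.
def anemia_severity(hb_g_dl: float) -> str:
    """Map a hemoglobin reading to a severity category."""
    if hb_g_dl < 7.0:
        return "severe"
    if hb_g_dl < 10.0:
        return "moderate"
    if hb_g_dl < 11.0:
        return "mild"
    return "none"


def is_anemic(hb_g_dl: float) -> int:
    """Binary target: 1 for any severity of anemia, 0 otherwise."""
    return int(anemia_severity(hb_g_dl) != "none")


# Hypothetical hemoglobin readings in g/dL.
readings = [6.5, 9.8, 10.4, 11.0, 12.3]
labels = [is_anemic(hb) for hb in readings]
print(labels)
```

Collapsing mild, moderate, and severe into one positive class is what turns the severity scale into the binary task used throughout the study.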

To obtain a robust and interpretable feature set, four feature‑selection techniques were applied in parallel: Chi‑square test, mutual information, point‑biserial correlation, and the Boruta algorithm (a random‑forest‑based wrapper). Consensus across these methods identified five variables that were selected by all: child age, recent fever, household size, maternal anemia, and parasite deworming. Additional variables such as amenorrhea, ethnicity indicators, and provincial identifiers were retained because they were chosen by three of the four methods. In total, 13 features were fed into the predictive models.
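The consensus rule can be illustrated as a simple vote count across methods. The per-method selections below are made up for illustration (including the hypothetical `wealth_index` feature); only the five unanimous features and the "three of four" retention rule come from the summary.

```python
from collections import Counter

# The five features the summary reports as selected by all four methods.
core = {"child_age", "recent_fever", "household_size",
        "maternal_anemia", "deworming"}

# Hypothetical per-method selections, for illustration only.
selected_by = {
    "chi_square": core | {"province", "ethnicity"},
    "mutual_information": core | {"amenorrhea", "ethnicity"},
    "point_biserial": core | {"province", "amenorrhea"},
    "boruta": core | {"province", "amenorrhea", "ethnicity", "wealth_index"},
}

# Count how many methods selected each feature.
votes = Counter(f for feats in selected_by.values() for f in feats)

unanimous = sorted(f for f, v in votes.items() if v == len(selected_by))
retained = sorted(f for f, v in votes.items() if v >= 3)

print("Selected by all methods:", unanimous)
print("Selected by at least three:", retained)
```

Requiring agreement from several selectors trades a little sensitivity for stability: a feature that only one method favors, such as the hypothetical `wealth_index` here, is dropped.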

The dataset exhibited class imbalance (38.5 % anemic, 61.5 % non‑anemic). To mitigate this, the Synthetic Minority Over‑sampling Technique (SMOTE) was integrated into the training pipeline, and stratified 80/20 train‑test splitting was used. Model evaluation employed repeated stratified 5‑fold cross‑validation (three repeats) and a suite of performance metrics: accuracy, precision, recall, F1‑score, area under the ROC curve (AUC), average precision, and Cohen’s Kappa.
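To make the oversampling step concrete, here is a minimal, dependency-free sketch of the core SMOTE idea: each synthetic point is an interpolation between a minority-class sample and one of its nearest minority-class neighbours. It is not the authors' implementation (in practice a library such as imbalanced-learn would typically be used), and the toy points and the `smote` helper are hypothetical. The important detail is that only the training portion is oversampled, so no synthetic point is derived from held-out data.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority neighbours of the base point, excluding itself.
        others = [p for j, p in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment from base to other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


# Toy TRAINING split only; the test split is never oversampled.
train_minority = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.2), (1.2, 2.5)]
train_majority = [(5.0, 5.0), (5.5, 4.8), (6.0, 5.2), (5.2, 5.5),
                  (4.8, 5.1), (6.1, 4.9), (5.7, 5.3)]

n_needed = len(train_majority) - len(train_minority)
synthetic = smote(train_minority, n_new=n_needed, k=3)
balanced_minority = train_minority + synthetic
print(len(balanced_minority), len(train_majority))
```

Under cross-validation the same rule applies per fold: the oversampler is fit on the training folds only and the validation fold is left untouched.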

Eight traditional machine‑learning classifiers were compared: Logistic Regression (LR), K‑Nearest Neighbors (KNN), Decision Tree (DT), Random Forest (RF), XGBoost, Support Vector Machine (SVM), Naïve Bayes (NB), and Linear Discriminant Analysis (LDA). Two deep‑learning architectures were also tested: a fully connected Deep Neural Network (DNN) and TabNet, a recent attention‑based model designed for tabular data. Hyper‑parameter tuning was performed via exhaustive grid search for each algorithm.

Logistic Regression achieved the highest recall (0.701) and the best F1‑score (0.649), meaning it identified the largest share of anemic children while keeping the best balance between precision and recall. The DNN attained the highest overall accuracy (0.709) but had lower recall than LR, so it missed more anemic children (more false negatives). SVM produced the highest AUC (0.736), indicating the best ranking of anemic above non‑anemic children across decision thresholds. Random Forest and XGBoost delivered moderate results, and TabNet, despite its built‑in interpretability via sparse feature masks, did not surpass the traditional models in any metric.
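Why accuracy and recall can rank models differently is easier to see from the definitions. The sketch below computes the main metrics from a confusion matrix; the counts are hypothetical and are not taken from the paper.

```python
def screening_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Compute common classification metrics from confusion-matrix counts."""
    total = tp + fn + fp + tn
    recall = tp / (tp + fn)        # share of anemic children detected
    precision = tp / (tp + fp)     # share of positive predictions that are correct
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / total
    # Cohen's kappa: agreement beyond what chance alone would produce.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / total ** 2
    kappa = (accuracy - p_chance) / (1 - p_chance)
    return {"recall": recall, "precision": precision, "f1": f1,
            "accuracy": accuracy, "kappa": kappa}


# Hypothetical counts: 70 anemic and 130 non-anemic children.
metrics = screening_metrics(tp=50, fn=20, fp=30, tn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because the non-anemic class is larger, a model can raise accuracy by predicting "not anemic" more often, which lowers recall. For screening, where a missed case is costlier than a false alarm, recall and F1 are the more informative targets.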

Statistical analysis of the selected features confirmed known epidemiological patterns: younger children (especially 6–12 months) faced the highest anemia risk; recent fever increased odds by roughly 1.4‑fold; larger households showed a modest risk elevation; maternal anemia more than doubled the child’s odds; and receipt of deworming tablets reduced risk by about 35 %. Ethnicity and provincial location also displayed significant associations, with Madhesh province and certain hill‑caste groups exhibiting higher prevalence.
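For readers less familiar with odds ratios, the sketch below shows how one is computed from a 2×2 table, with an approximate 95 % confidence interval based on the standard error of the log odds ratio (Woolf's method). The counts are hypothetical and chosen only to give a round odds ratio; they are not the study's data. If the reported deworming effect is an odds reduction, "about 35 % lower" would correspond to an odds ratio near 0.65.

```python
import math


def odds_ratio(exp_cases, exp_noncases, unexp_cases, unexp_noncases, z=1.96):
    """Odds ratio with an approximate 95% CI (Woolf's log method)."""
    or_ = (exp_cases * unexp_noncases) / (exp_noncases * unexp_cases)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / exp_cases + 1 / exp_noncases
                   + 1 / unexp_cases + 1 / unexp_noncases)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper


# Hypothetical counts: children of anemic mothers (exposed) versus
# children of non-anemic mothers (unexposed).
or_, lower, upper = odds_ratio(exp_cases=40, exp_noncases=60,
                               unexp_cases=25, unexp_noncases=75)
print(f"OR = {or_:.2f}, 95% CI ({lower:.2f}, {upper:.2f})")
```

An interval that excludes 1 suggests an association unlikely to be due to chance alone, though with cross-sectional survey data it does not establish causation.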

The authors discuss several key insights. First, multi‑method feature selection mitigates bias inherent in any single technique and yields a set of variables that are both statistically robust and clinically meaningful. Second, in a moderate‑size health survey dataset, simpler linear models can outperform complex deep‑learning architectures, especially when the priority is high recall for screening purposes. Third, integrating SMOTE within the modeling pipeline effectively addresses class imbalance without leaking information from the test set. Fourth, the interpretability of Logistic Regression makes it a practical tool for policymakers who need transparent risk scores to guide targeted interventions such as maternal nutrition programs, child deworming campaigns, and focused health‑education efforts.

Limitations noted include the cross‑sectional nature of the NDHS data, which precludes causal inference; exclusion of several potentially relevant variables due to high missingness (e.g., iron‑supplement intake); and a sample size that is relatively small for deep learning, which may have contributed to overfitting. The authors recommend future work to incorporate longitudinal data, environmental covariates (e.g., food security, climate), and model compression techniques to enable deployment on low‑resource mobile platforms for community health workers.

In conclusion, this paper provides the first comprehensive comparison of machine‑learning and deep‑learning methods for predicting childhood anemia in Nepal using nationally representative survey data. It demonstrates that a well‑tuned Logistic Regression model, built on a consensus‑derived feature set, offers the most reliable screening performance, while also delivering clear, actionable insights for public‑health decision‑making.

