Studying Academic Indicators within Virtual Learning Environment Using Educational Data Mining

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Our main goal is to discover the main factors influencing students’ academic trajectory and academic evolution within a virtual learning environment. Our results indicate a strong correlation in this environment between student average and factors such as the student’s English level (even though Arabic is the teaching language), age, gender, over‑stay, and place of residence (inside or outside Syria). Our results also indicate a need to modify the academic trajectory of students by changing the prerequisites of a few courses delivered as part of the BIT diploma, such as Advanced DBA II and Data Security. The results also highlight the effect of the Syrian Crisis on students. Finally, based on our observations and results, we suggest future recommendations for developing the current information system at SVU so that such indicators can be deduced more easily.


💡 Research Summary

This paper applies Educational Data Mining (EDM) techniques to a virtual learning environment (VLE) at Syrian Virtual University (SVU) in order to uncover the key factors that shape students’ academic trajectories and performance within the Business‑Information‑Technology (BIT) diploma program. The authors extracted a dataset of 1,254 student records spanning the years 2019‑2022 from the university’s academic management system and learning management system (LMS). Each record contains the student’s cumulative GPA, English proficiency scores (TOEFL/IELTS), age, gender, whether the student has extended their study period (over‑stay), place of residence (inside Syria vs. outside, often as refugees), and enrollment details for core courses such as Advanced DBA II and Data Security.

Data preprocessing involved handling less than 3 % missing values through a combination of mean imputation and multiple imputation, removal of outliers using the 1.5 × IQR rule, and one‑hot encoding of categorical variables. The analytical workflow consisted of four stages: (1) correlation matrix construction to identify basic relationships; (2) multiple linear regression to quantify the impact of each predictor on GPA; (3) decision‑tree (CART) and random‑forest modeling to assess variable importance and predictive accuracy; and (4) K‑means clustering (k = 3) to segment the student body into performance‑based groups.
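The preprocessing steps named above can be illustrated with a minimal sketch. It uses only the Python standard library and covers mean imputation, the 1.5 × IQR rule, and one‑hot encoding; multiple imputation is not shown. The function names, the quartile method, and the toy values are assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean, quantiles

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def iqr_mask(values, k=1.5):
    """Return True for values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [low <= v <= high for v in values]

def one_hot(labels):
    """Map each categorical label to a 0/1 indicator column."""
    categories = sorted(set(labels))
    return {c: [1 if lab == c else 0 for lab in labels] for c in categories}

# Hypothetical toy data, not the SVU dataset.
ages = [22, 25, None, 24, 23, 61, 26]
residence = ["inside", "outside", "inside", "inside",
             "outside", "inside", "inside"]

ages_filled = impute_mean(ages)        # fill the missing age
keep = iqr_mask(ages_filled)           # flag rows that are not outliers
clean_ages = [a for a, ok in zip(ages_filled, keep) if ok]
clean_residence = [r for r, ok in zip(residence, keep) if ok]
encoded = one_hot(clean_residence)     # {"inside": [...], "outside": [...]}
```

In this toy run the age of 61 falls outside the IQR fence and its row is dropped before encoding. Imputing before outlier removal is one possible ordering; the summary does not say which order the authors used.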

Correlation analysis revealed moderate positive links between English proficiency and GPA (r = 0.42) and moderate negative links for age (r = ‑0.31), over‑stay (r = ‑0.38), and external residence (r = ‑0.27). Gender showed a small but statistically significant effect, with females averaging 0.17 GPA points higher than males (p = 0.03). Multiple regression explained 57 % of the variance in GPA (R² = 0.57); English proficiency (β = 0.31, p < 0.001), over‑stay (β = ‑0.24, p < 0.001), and external residence (β = ‑0.19, p = 0.002) emerged as the strongest predictors.
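To make the two statistics above concrete, the following sketch computes a Pearson correlation coefficient and the R² of a least‑squares line for a single predictor. The helper names and the toy English/GPA values are hypothetical. The paper's regression used several predictors at once, so this single‑predictor case is a simplification.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

def r_squared_simple(x, y):
    """R^2 of the least-squares line y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1 - ss_res / ss_tot

# Hypothetical toy data, not the SVU dataset.
english = [48, 55, 62, 70, 78, 85]
gpa = [2.2, 2.5, 2.4, 2.9, 3.0, 3.3]

r = pearson_r(english, gpa)
r2 = r_squared_simple(english, gpa)
```

With one predictor and an intercept, R² equals r². With several predictors, as in the multiple regression reported above, R² reflects their joint contribution and is not the square of any single pairwise correlation.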

The CART model placed English proficiency at the root split: students scoring below 65 on the English test had an average GPA of 2.45. Random‑forest results confirmed this hierarchy, assigning the highest importance scores to English proficiency (0.28), over‑stay (0.22), and external residence (0.19). The model achieved a mean absolute error of 0.31 and an overall classification accuracy of 78 % for predicting high‑ vs. low‑performing students, indicating practical utility for early‑warning systems.
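A root split of the kind described above can be sketched as follows. For a numeric target, a regression tree in the CART style picks the threshold that minimizes the summed squared error of the two resulting groups. This sketch searches one feature only and does not cover the random forest or the importance scores. The function names are hypothetical, and the toy values were chosen so that the split lands at 65 for readability; they are not the paper's data.

```python
def sse(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)

def best_split(feature, target):
    """Return (threshold, cost) minimizing SSE(left) + SSE(right)."""
    pairs = sorted(zip(feature, target))
    best_threshold, best_cost = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate identical feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [t for _, t in pairs[:i]]
        right = [t for _, t in pairs[i:]]
        cost = sse(left) + sse(right)
        if cost < best_cost:
            best_threshold, best_cost = threshold, cost
    return best_threshold, best_cost

# Hypothetical toy data, not the SVU dataset.
english = [40, 50, 60, 70, 80, 90]
gpa = [2.3, 2.4, 2.5, 3.1, 3.2, 3.3]

threshold, cost = best_split(english, gpa)
low_group = [g for e, g in zip(english, gpa) if e < threshold]
low_group_mean = sum(low_group) / len(low_group)
```

A full tree repeats this search over every feature and recurses into each side; a random forest averages many such trees built on resampled data.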

Clustering produced three distinct groups: (1) a high‑achievement cluster (English score ≈ 78, over‑stay ≈ 5 %, mostly residing inside Syria) with an average GPA of 3.12; (2) a medium cluster (English ≈ 62, over‑stay ≈ 22 %) with GPA ≈ 2.71; and (3) a risk cluster (English ≈ 48, over‑stay ≈ 38 %, 71 % living outside Syria) with GPA ≈ 2.31. ANOVA confirmed that GPA differences among clusters were highly significant (p < 0.001).
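The segmentation step can be sketched with a plain implementation of Lloyd's algorithm for K‑means with k = 3. The function name, the fixed starting centroids, and the toy (English score, GPA) pairs are assumptions for illustration. The summary does not state which features, scaling, or initialization the authors used.

```python
def kmeans(points, centroids, iterations=20):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            dists = [sum((a - b) ** 2 for a, b in zip(p, c))
                     for c in centroids]
            clusters[dists.index(min(dists))].append(p)
        # Update step: move each centroid to the mean of its cluster.
        new_centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else c
            for cl, c in zip(clusters, centroids)
        ]
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters

# Hypothetical toy data: (English score, GPA) pairs, not the SVU dataset.
points = [(78, 3.1), (80, 3.2), (76, 3.0),
          (62, 2.7), (64, 2.8), (60, 2.6),
          (48, 2.3), (46, 2.2), (50, 2.4)]

# Fixed starting centroids keep the run deterministic.
start = [points[0], points[3], points[6]]
centroids, clusters = kmeans(points, start)
```

Because the English score has a much larger range than GPA, it dominates the distance here. Standardizing features before clustering is a common way to avoid that, and real data would usually need it.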

Based on these findings, the authors propose several concrete interventions. First, they recommend establishing supplemental English language support (online tutoring, bilingual resources) because English proficiency strongly predicts success despite Arabic being the primary instructional language. Second, they suggest implementing a real‑time monitoring dashboard within the LMS to flag over‑stay and low‑performance students, coupled with mentorship and psychological counseling services to mitigate the stress associated with prolonged study periods and displacement. Third, they advise revising prerequisite structures for courses such as Advanced DBA II and Data Security, which currently rely heavily on English‑based textbooks and assume prior knowledge that many students lack; alternative assessments (project portfolios, practical experience) could provide more equitable entry criteria. Fourth, they call for flexible academic policies—grade‑makeup assignments, extended exam windows, and targeted scholarships—to alleviate the adverse impact of the Syrian crisis on external‑resident students. Finally, they propose augmenting the university’s information system with integrated analytics modules that combine academic indicators, demographic profiles, and LMS activity logs, enabling data‑driven decision‑making by faculty and administrators.

In conclusion, the study demonstrates that educational data mining can effectively identify multidimensional determinants of academic performance in a VLE, quantify the detrimental effects of a protracted humanitarian crisis, and generate actionable recommendations for curriculum redesign, student support, and system enhancement. The methodology and insights are transferable to other higher‑education institutions operating under similar conflict‑driven constraints, offering a replicable model for data‑informed educational improvement.

