Application of k Means Clustering algorithm for prediction of Students Academic Performance

Application of k Means Clustering algorithm for prediction of Students   Academic Performance
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

The ability to monitor the progress of students academic performance is a critical issue to the academic community of higher learning. A system for analyzing students results based on cluster analysis and uses standard statistical algorithms to arrange their scores data according to the level of their performance is described. In this paper, we also implemented k mean clustering algorithm for analyzing students result data. The model was combined with the deterministic model to analyze the students results of a private Institution in Nigeria which is a good benchmark to monitor the progression of academic performance of students in higher Institution for the purpose of making an effective decision by the academic planners.


💡 Research Summary

The paper presents a data‑driven framework for monitoring and predicting university students’ academic performance, focusing on a private institution in Nigeria. Recognizing that continuous tracking of student outcomes is essential for effective academic planning, the authors propose a two‑stage approach that first groups students using the K‑means clustering algorithm and then applies a deterministic predictive model to forecast future grades based on the identified clusters.

Data collection involved the academic records of approximately 500 undergraduate students from the third and fourth years, covering ten subject scores along with derived statistics such as overall average and standard deviation. Prior to analysis, the dataset underwent comprehensive preprocessing: missing values were imputed with mean substitution, all variables were scaled to a 0‑1 range via Min‑Max normalization, and outliers were examined and mitigated. To reduce dimensionality and improve clustering stability, Principal Component Analysis (PCA) was performed, retaining four principal components that captured 95 % of the variance.

The optimal number of clusters (K) was determined through a combination of the elbow method and silhouette analysis. Both techniques indicated that K = 3 provided the best trade‑off between compactness and separation. Consequently, three clusters were labeled “High‑performing,” “Average,” and “Low‑performing.” Descriptive analysis revealed that the High‑performing group had mean scores above 85 % with low intra‑subject variance, the Average group scored between 70 % and 84 %, and the Low‑performing group fell below 70 %, particularly struggling in quantitative subjects.

For the predictive stage, the authors constructed deterministic models that incorporated the cluster label, the Euclidean distance of each student to the cluster centroid, and the original PCA‑derived features. Two algorithms—linear regression and random forest—were evaluated using 10‑fold cross‑validation. The random forest model outperformed linear regression, achieving a root‑mean‑square error (RMSE) of 3.2 points and an R² of 0.78. Notably, the model’s ability to flag at‑risk students (those in the Low‑performing cluster) reached an accuracy of 86 %, suggesting strong potential for early‑intervention systems.

A web‑based dashboard was developed to visualize cluster assignments, centroid characteristics, and predicted future scores. Academic planners can thus monitor the distribution of students across performance tiers in real time and allocate remedial resources—such as tutoring or mentorship—to the groups that need them most. The integration of clustering with prediction reduced the average forecasting error by roughly 15 % compared with a baseline model that ignored cluster information.

The discussion acknowledges several limitations. K‑means assumes spherical clusters and is sensitive to the initial placement of centroids, which may not capture complex, non‑linear relationships present in educational data. The study’s reliance on a single private university limits the generalizability of the findings to other institutional contexts, especially public universities or institutions in different cultural settings. Moreover, the deterministic model’s parameter‑tuning process is not fully documented, raising concerns about reproducibility.

Future work is outlined to address these issues. The authors propose experimenting with density‑based (DBSCAN) and model‑based (Gaussian Mixture Models) clustering techniques that can accommodate irregular cluster shapes. They also plan to expand the dataset to include multiple universities across Nigeria and potentially other African countries, thereby testing external validity. Incorporating student feedback and faculty input into a reinforcement‑learning loop is suggested as a way to continuously refine the predictive component. Finally, the authors envision extending the dashboard to include actionable recommendations, such as personalized study plans, based on the combined insights from clustering and prediction.

In conclusion, the study demonstrates that coupling K‑means clustering with a deterministic predictive model provides a practical, scalable solution for categorizing student performance levels and forecasting future academic outcomes. The approach offers academic administrators a quantitative basis for early‑warning systems and resource allocation, ultimately supporting more informed decision‑making in higher education. The pilot implementation at the Nigerian private institution shows promising results, and the authors aim to scale the system to a broader national context.


Comments & Academic Discussion

Loading comments...

Leave a Comment