Performance Tuning Of J48 Algorithm For Prediction Of Soil Fertility
Data mining involves the systematic analysis of large data sets, and mining agricultural soil datasets is an exciting and modern research area. The productive capacity of a soil depends on its fertility, and achieving and maintaining appropriate fertility levels is of utmost importance if agricultural land is to remain capable of sustaining crop production. This research explains the steps for building a predictive model of soil fertility and aims at predicting the soil fertility class using decision tree algorithms in data mining. It further focuses on performance tuning of the J48 decision tree algorithm with the help of meta-techniques such as attribute selection and boosting.
💡 Research Summary
The paper addresses the problem of predicting soil fertility classes using data mining techniques, focusing specifically on the J48 decision‑tree algorithm (the WEKA implementation of C4.5). Soil fertility, defined by a set of physicochemical properties such as pH, organic matter, phosphorus, potassium, electrical conductivity, texture, and moisture, is a critical factor for sustainable crop production. The authors collected a dataset of 1,200 soil samples, each labeled with one of five fertility categories ranging from “very fertile” to “very infertile.” After standard preprocessing (missing‑value imputation, outlier removal, normalization, and one‑hot encoding of categorical attributes), the raw data were fed into a baseline J48 model. The baseline achieved an average classification accuracy of 78.4 % with a relatively deep tree (average depth ≈12) and 150 nodes, indicating potential over‑fitting due to high dimensionality and noisy features.
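The preprocessing steps described above can be sketched in plain Python. This is an illustrative sketch only, not the authors' code: the mean-imputation and min-max normalization routines below are generic, and the pH column used as an example is hypothetical.

```python
def impute_mean(values):
    """Replace None (missing) entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    mean = sum(observed) / len(observed)
    return [mean if v is None else v for v in values]

def min_max_normalize(values):
    """Scale values linearly into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: map everything to 0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical pH column with one missing reading.
ph = [6.5, None, 7.1, 5.9, 6.9]
ph_clean = min_max_normalize(impute_mean(ph))
```

Outlier removal and one-hot encoding would follow the same per-column pattern before the data are handed to the learner.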
To improve performance, the study applied two meta‑techniques in sequence: attribute selection and boosting. For attribute selection, a hybrid filter approach combined information‑gain ratio, chi‑square statistics, and Pearson correlation to prune the original 15 attributes down to the seven most informative (pH, organic matter, P₂O₅, K₂O, EC, sand‑clay ratio, and moisture). This reduction lowered computational cost by more than 30 % and produced a more compact tree (average depth ≈9, 95 nodes); the accuracy of the attribute‑selected J48 rose to 84.7 %.
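The information-gain ratio used in the filter is the same criterion C4.5/J48 applies when choosing a split: the entropy reduction an attribute yields, divided by the attribute's own entropy (split information) to penalize many-valued attributes. A minimal pure-Python version, with hypothetical discretized attribute values for illustration:

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(attr_values, labels):
    """Information gain of a nominal attribute divided by its split
    information -- the ranking criterion C4.5/J48 uses at each split."""
    n = len(labels)
    base = entropy(labels)
    cond = 0.0
    for v in set(attr_values):
        subset = [y for x, y in zip(attr_values, labels) if x == v]
        cond += len(subset) / n * entropy(subset)
    split_info = entropy(attr_values)  # entropy of the attribute itself
    return (base - cond) / split_info if split_info > 0 else 0.0

# Hypothetical example: a perfectly predictive discretized pH attribute.
gr = gain_ratio(["low", "low", "high", "high"],
                ["infertile", "infertile", "fertile", "fertile"])
```

Attributes are ranked by this score (alongside chi-square and correlation in the paper's hybrid filter) and the lowest-ranked ones are dropped.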
Next, AdaBoost.M1 was employed with J48 as the weak learner, running 20 boosting rounds. By re‑weighting misclassified instances, the boosted ensemble focused on difficult cases, resulting in a final average accuracy of 89.2 %, with precision and recall both exceeding 0.88. The boosted model retained interpretability while mitigating over‑fitting, demonstrating that ensemble techniques can substantially enhance a single decision‑tree classifier in this domain.
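The re-weighting step at the heart of AdaBoost.M1 can be sketched in a few lines. This is a generic sketch of the standard algorithm, not the paper's implementation: after each round, the weights of correctly classified instances are multiplied by beta = err / (1 - err), so misclassified instances carry relatively more weight in the next round.

```python
import math

def adaboost_m1_reweight(weights, correct):
    """One AdaBoost.M1 round: shrink the weights of correctly classified
    instances by beta = err / (1 - err), then renormalize, so the next
    weak learner (here, a J48 tree) concentrates on the hard cases.
    Returns the new weights and this round's voting weight log(1/beta)."""
    err = sum(w for w, ok in zip(weights, correct) if not ok)
    if err == 0 or err >= 0.5:  # M1 halts if the learner is too weak or perfect
        return weights, None
    beta = err / (1 - err)
    new = [w * beta if ok else w for w, ok in zip(weights, correct)]
    total = sum(new)
    new = [w / total for w in new]
    vote = math.log(1 / beta)  # weight of this round's classifier in the final vote
    return new, vote
```

Running 20 such rounds, as the paper does, yields 20 weighted trees whose votes are combined at prediction time.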
For benchmarking, the authors also trained Random Forest and Support Vector Machine (RBF‑kernel) models on the same data and evaluation protocol (10‑fold cross‑validation). Random Forest achieved 87.5 % accuracy but required a large number of trees (≈100) and higher memory consumption. SVM reached 85.3 % accuracy but demanded extensive hyper‑parameter tuning and longer training times. In contrast, the tuned J48 offered a favorable trade‑off: high accuracy, low computational overhead, and a transparent tree structure that agronomists can readily interpret for field recommendations.
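The shared evaluation protocol, 10-fold cross-validation, is simple to state precisely. The sketch below is a generic, unstratified version for illustration; `train_and_predict` is a hypothetical wrapper standing in for any of the compared classifiers.

```python
def k_fold_indices(n, k=10):
    """Partition indices 0..n-1 into k contiguous, near-equal folds."""
    base, extra = divmod(n, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_val_accuracy(xs, ys, train_and_predict, k=10):
    """Average accuracy over k folds. train_and_predict(train_x, train_y,
    test_x) is any classifier wrapper (e.g. a J48-style tree)."""
    accs = []
    for fold in k_fold_indices(len(xs), k):
        test = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in test]
        tr_y = [y for i, y in enumerate(ys) if i not in test]
        te_x = [xs[i] for i in fold]
        te_y = [ys[i] for i in fold]
        preds = train_and_predict(tr_x, tr_y, te_x)
        accs.append(sum(p == t for p, t in zip(preds, te_y)) / len(fold))
    return sum(accs) / k
```

Each instance is held out exactly once, so the reported accuracies (89.2 %, 87.5 %, 85.3 %) are averages over ten disjoint test folds rather than a single train/test split.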
The paper concludes with a discussion of limitations and future work. The dataset originates from a single geographic region, so external validation on diverse soils and climatic zones is necessary to confirm generalizability. Incorporating temporal data and weather variables (rainfall, temperature) could further refine predictions. Finally, the authors propose integrating the optimized J48 model into a real‑time decision‑support system for farmers, enabling actionable fertility management recommendations directly at the point of need.