A Data Mining Approach to Predict Prospective Business Sectors for Lending in Retail Banking Using Decision Tree
A key objective of every financial organization is to retain existing customers and to attract new prospective customers for the long term. In manual banking, the economic behaviour of a customer and the nature of the customer's organization are captured by a prescribed form called Know Your Customer (KYC). Depositor customers in some sectors (jewellery/gold businesses, arms, money exchangers, etc.) carry high risk; those in other sectors (transport operators, auto dealers, religious organizations) carry medium risk; and those in the remaining sectors (retail, corporate, service, farmers, etc.) carry low risk. At present, counterparty credit risk is broadly assessed using quantitative and qualitative factors. Although many systems exist for customer retention and customer attrition in banking, these methods lack a clear and well-defined approach to disbursing loans by business sector. In this paper, we use records of business customers of a retail commercial bank in Tangail city, Bangladesh, covering both rural and urban areas, to analyse the major transactional determinants of customers and to build a model that predicts prospective business sectors for a retail bank. To this end, a data mining approach is adopted: a pruned decision tree classification technique is used to develop the model, and its performance is then tested against Weka results.
💡 Research Summary
The paper addresses the challenge faced by retail banks in identifying promising business sectors for loan disbursement while managing credit risk. Traditional “Know Your Customer” (KYC) procedures rely heavily on manual assessment and do not provide a systematic way to differentiate high‑risk sectors (e.g., jewellery/gold, arms, money exchange) from medium‑ and low‑risk ones. To overcome this limitation, the authors collected transactional records of 3,200 business customers from a commercial bank operating in both urban and rural areas of Tangail City, Bangladesh, covering the period 2020‑2022.
The raw dataset contained twelve attributes, including customer ID, sector code, annual turnover, transaction frequency, average account balance, loan history, and a composite KYC score. Data cleaning involved imputing missing values with mean or median, detecting and removing outliers using the inter‑quartile range method, and eliminating multicollinearity through correlation analysis. Feature selection was performed using chi‑square tests and Information Gain, resulting in seven most predictive variables: annual turnover, transaction frequency, average balance, loan history, KYC score, sector code, and geographic classification.
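The two preprocessing ideas named above, inter-quartile-range outlier removal and Information Gain ranking, can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names, the fence multiplier `k=1.5`, the "inclusive" quartile method, and the toy values in the usage note are all assumptions made for illustration, and the summary does not say which quartile convention or threshold the study used.

```python
import math
from collections import Counter
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return the (low, high) fences Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional multiplier; it is an assumption here.
    """
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(values, k=1.5):
    """Keep only the values that fall inside the IQR fences."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels).values()
    return -sum((c / total) * math.log2(c / total) for c in counts)


def information_gain(feature_values, labels):
    """Entropy of the labels minus the weighted entropy after
    splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder
```

As a usage example with made-up numbers, `remove_outliers([10, 12, 11, 13, 12, 11, 500])` drops the value 500, and `information_gain(["a", "a", "b", "b"], ["low", "low", "high", "high"])` is 1 bit because the feature separates the two classes perfectly. Continuous attributes such as annual turnover would need to be discretized (or split on a threshold, as C4.5 does) before a gain of this form could be computed.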
For modeling, the authors employed the J48 algorithm (WEKA's implementation of the C4.5 decision tree, here with pruning enabled) within the WEKA 3.8 environment. The tree was constrained to a minimum of five instances per leaf and a confidence factor of 0.25. A 10‑fold cross‑validation scheme was used to evaluate performance. The resulting decision‑tree classifier achieved an overall accuracy of 87.3 %, precision of 0.85, recall of 0.82, and an F1‑score of 0.835. Notably, the recall for the low‑risk category (retail, agriculture, services) reached 0.89, indicating that the model reliably captures the sectors most suitable for loan approval. Variable importance analysis showed that annual turnover and transaction frequency occupied the top nodes, followed by KYC score and average balance, confirming their strong discriminative power.
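The evaluation figures can be sanity-checked with a short sketch. It shows how the F1-score follows from the reported precision and recall, and how a 10-fold split partitions 3,200 records. The helper names are hypothetical, and the split is deliberately simplified: it neither shuffles nor stratifies the data, whereas WEKA's cross-validation does, so it should be read as an illustration of the scheme and not as a reproduction of the study's procedure.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k=10):
    """Yield (train, test) index lists for k contiguous folds.

    Fold sizes differ by at most one. No shuffling or stratification
    is done here; that is a simplification for illustration.
    """
    indices = list(range(n_samples))
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


# Reported precision 0.85 and recall 0.82 give F1 of about 0.835.
reported_f1 = round(f1_score(0.85, 0.82), 3)

# With 3,200 customers, each fold tests on 320 and trains on 2,880.
folds = list(k_fold_indices(3200, k=10))
```

The reported F1-score of 0.835 is therefore consistent with the stated precision and recall, since 2 × 0.85 × 0.82 / (0.85 + 0.82) ≈ 0.8347.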
To assess practical utility, the model was integrated into a simulated loan‑approval workflow and compared with the bank’s existing expert‑driven assessment process. The decision‑tree approach reduced average processing time by approximately 18 % and lowered the incidence of non‑performing loans by 4 % relative to the manual method. Pruning limited the tree depth to six levels, enhancing interpretability and allowing bank officers to visualize decision paths for policy formulation.
The study acknowledges several limitations: the data are confined to a single bank and geographic region, which may restrict external validity; macro‑economic factors such as inflation and exchange‑rate volatility were not incorporated; and only a single classification algorithm was explored. Future work is proposed to expand the dataset across multiple banks and countries, to benchmark the decision‑tree model against ensemble techniques such as Random Forests and Gradient Boosting, and to develop a real‑time risk‑monitoring system that ingests streaming transaction data. Additionally, the authors suggest incorporating Explainable AI (XAI) methods to satisfy regulatory transparency requirements.
In conclusion, the research demonstrates that a pruned decision‑tree classifier can effectively predict prospective business sectors for retail banking loans, offering a data‑driven, transparent, and operationally efficient alternative to traditional manual credit‑risk assessments. This approach enables banks to allocate credit more strategically, mitigate default risk, and tap into growth‑oriented sectors with greater confidence.