Indebted households profiling: a knowledge discovery from database approach
📝 Abstract
A major challenge in consumer credit risk portfolio management is classifying households according to their risk profile. Building such profiles requires an approach that analyses data systematically to detect important relationships, interactions, dependencies and associations among the available continuous and categorical variables jointly, and that accurately generates profiles of the most interesting household segments according to their credit risk. The objective of this work is to employ a knowledge discovery from database process to identify groups of indebted households and describe their profiles, using a database collected by the Consumer Credit Counselling Service (CCCS) in the UK. Using a framework that handles categorical and continuous data together to find hidden structures in unlabelled data, the ideal number of clusters was established, and the resulting clusters were described in order to identify the households that exhibit a high propensity for excessive debt levels.
📄 Content
Authors, affiliations and addresses:
Rodrigo Arnaldo Scarpel, Instituto Tecnológico de Aeronáutica, Praça Marechal Eduardo Gomes, 50 ITA - IEM sala 2311, São José dos Campos - SP, Brazil, rodrigo@ita.br
Alexandros Ladas, University of Nottingham, School of Computer Science, Jubilee Campus, Wollaton Road, Nottingham, UK, NG8 1BB, psxal2@nottingham.ac.uk
Uwe Aickelin, University of Nottingham, School of Computer Science, Jubilee Campus, Wollaton Road, Nottingham, UK, NG8 1BB, uwe.aickelin@nottingham.ac.uk
Corresponding author: Rodrigo Arnaldo Scarpel, rodrigo@ita.br, +55(12)3947-6973
Keywords: Clustering, Homogeneity analysis, Silhouette width, credit risk.
- Introduction

Indebtedness in private households has risen dramatically as a result of growing consumer credit use and, according to Kamleitner and Kirchler [1], it has consequences at social, psychological, economic and political levels. As reported by McCarthy [2], financial distress at the individual and household level can have serious consequences that go far beyond those experienced by the individual or household involved. Furthermore, the enormous fiscal costs associated with a financial crisis are a reminder that heightened financial distress and poor financial behaviour on the part of a relatively small number of people can have serious negative externality effects on the rest of the economy.

Several works in the literature have employed data mining approaches for credit risk assessment. Shi et al. [3] obtained promising results on bankruptcy prediction using a multiple criteria linear programming (MCLP) approach to data mining. Peng et al. [4] employed cluster analysis to classify credit card accounts and improved the clustering results using ensemble and supervised learning methods. Aihua et al. [5] proposed a data mining approach that combines multiple criteria linear programming and principal component analysis to improve the classification of credit cardholders. Peng et al. [6] proposed a mathematical programming model for credit classification problems that addresses speed and scalability, two essential issues in data mining and knowledge discovery. Li, Shi and He [7] proposed three multiple criteria linear programming (MCLP) methods to improve the overall accuracy of classification models: MCLP with unbalanced training set selection, fuzzy linear programming (FLP) with moving boundary, and penalized multiple criteria linear programming (PMCLP).
This work intends to contribute to the existing literature and to consumer credit risk portfolio management by providing an approach to classify households according to their risk profile. Such profiling is useful in several ways. At the aggregate level, it is important to model the distribution of the expected number of defaults according to both economic factors and social and demographic developments, such as an increase in the number of divorces. Moreover, events like recessions may affect some households more acutely than others. The identified profiles are therefore useful in indicating which socio-economic factors should be considered in such macro-level models. Household profiles may also support the development of a portfolio-level forecast by clustering individuals according to the generated profiles: one then needs only to sum the predictions across clusters to produce a portfolio-level forecast. In order to build such risk profiles it is necessary to employ an approach that analyses data systematically in order t