An Integrated Classification Model for Financial Data Mining
📝 Abstract
Nowadays, financial data analysis is becoming increasingly important in the business market. As companies collect more and more data from daily operations, they expect to extract useful knowledge from the collected data to help make reasonable decisions for new customer requests, e.g. user credit category, churn analysis, real estate analysis, etc. Financial institutions have applied different data mining techniques to enhance their business performance. However, a naive application of these techniques can raise performance issues. Besides, there are very few general models for both understanding and forecasting across different financial fields. We present in this paper a new classification model for analyzing financial data. We also evaluate this model on different real-world datasets to show its performance.
📄 Content
An Integrated Classification Model for Financial Data
Mining
Fan Cai
Software School, Computer Science, Fudan University
825 Zhangheng Road, Shanghai, China
fan.cai.cn@gmail.com
N-A. LeKhac, M-Tahar Kechadi
School of Computer Science and Informatics, University College Dublin
Belfield, Dublin 4, Ireland
{an.lekhac, tahar.kechadi}@ucd.ie
Abstract. Nowadays, financial data analysis is becoming increasingly important in
the business market. As companies collect more and more data from daily
operations, they expect to extract useful knowledge from the collected data to
help make reasonable decisions for new customer requests, e.g. user credit
category, churn analysis, real estate analysis, etc. Financial institutions have
applied different data mining techniques to enhance their business performance.
However, a naive application of these techniques can raise performance issues.
Besides, there are very few general models for both understanding and
forecasting across different financial fields. We present in this paper a new
classification model for analyzing financial data. We also evaluate this model
on different real-world datasets to show its performance.
Keywords: Data mining, decision tree, multilayer perceptron, Gaussian Process,
classification model
1 Introduction
Today, we have a deluge of financial datasets. Because the data sources are so
large, a human analyst cannot realistically extract interesting information (or
patterns) that can be used in the decision-making process. Global competition,
dynamic markets, and rapid development in information and communication
technologies are some of the major challenges in today’s financial industry. For
instance, financial institutions constantly need more analysis of data that is
becoming very large and complex. As the amount of available data keeps
increasing, processing it becomes more and more difficult. Efficient discovery
of useful knowledge from these datasets is therefore becoming a challenge and a
massive economic need.
Data mining (DM) is the process of extracting useful, often previously unknown
information, so-called knowledge, from large data sets. This mined knowledge can
be used for various applications such as market analysis, fraud detection [1],
churn analysis [2], etc. DM has also proven to be very effective and profitable
in analyzing financial datasets [3]. However, mining financial data presents
special challenges: complexity, external factors, confidentiality,
heterogeneity, and size. The data miners’ challenge is to find trends quickly
while they are valid, and to recognize when they are no longer valid. Besides,
designing an appropriate process for discovering valuable knowledge in financial
data is also a complex task.
Different DM techniques have been proposed in the literature for data analysis
in various financial applications. For instance, decision trees [4] and
first-order learning [5] are used in stock selection. Neural networks [6] and
support vector machines (SVM) [7] have been used to predict bankruptcy, and
nearest-neighbor classification [8] for fraud detection. These techniques have
also been used for analyzing financial time series [9], imputed financial data
[10], outlier detection [11], etc. Because different businesses have different
behavior-response mapping relationships, finding a universal fitting model for
every particular field is time-consuming, if not impossible. A common approach
to financial data classification that can adapt to different business areas is
therefore needed.
Indeed, financial datasets are typically very large, and building a universal
model for classification is usually impracticable. Many hybrid [12][13] and
parallel models [14] have been developed for particular financial datasets.
However, they do not share a common structure and do not take the
characteristics of financial datasets into account: categorical attributes are
summarized concepts whose rules are uncertain in a classification task, whereas
numerical data usually comes from an ETL (Extract, Transform and Load) process
and is therefore unified within one field. We thus need an approach that
minimizes the use of nominal attributes in a principled way and seeks the
optimal model for classification, to support instant business decision making,
e.g. credit risk analysis, customer churn prediction, and instant notification
of house price rank.
In this paper, we propose a new hybrid classification process that can not only
understand and forecast financial datasets, but also gain useful structural
knowledge, e.g. significant nominal groups and the tightness of groups. We also
evaluate our model with real-world datasets. In addition, we present the
capacity of our model to exploit a parallel computing paradigm to speed up the
training and analysis process.
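The general idea of relying on nominal attributes only to form groups, and
classifying on numerical attributes within each group, can be illustrated with a
minimal sketch. This is not the model proposed in this paper, whose details are
not given in this section. It assumes a single nominal attribute, uses a
nearest-centroid classifier as a stand-in for the numerical model, and all
names (NearestCentroid, fit_per_group, predict) and the toy data are
hypothetical.

```python
from collections import defaultdict
from math import dist
from statistics import fmean


def centroid(points):
    """Component-wise mean of a list of equal-length numeric tuples."""
    return tuple(fmean(axis) for axis in zip(*points))


class NearestCentroid:
    """Stand-in numerical classifier: predict the label of the closest class centroid."""

    def fit(self, X, y):
        by_label = defaultdict(list)
        for features, label in zip(X, y):
            by_label[label].append(features)
        self.centroids = {label: centroid(pts) for label, pts in by_label.items()}
        return self

    def predict(self, features):
        return min(self.centroids, key=lambda label: dist(features, self.centroids[label]))


def fit_per_group(rows):
    """rows: iterable of (nominal_value, numeric_features, label).

    Returns one model per nominal group plus a global fallback model.
    """
    rows = list(rows)
    groups = defaultdict(list)
    for nominal, features, label in rows:
        groups[nominal].append((features, label))
    models = {
        nominal: NearestCentroid().fit(*zip(*members))
        for nominal, members in groups.items()
    }
    fallback = NearestCentroid().fit([r[1] for r in rows], [r[2] for r in rows])
    return models, fallback


def predict(models, fallback, nominal, features):
    """Use the model of the matching nominal group, else the global fallback."""
    return models.get(nominal, fallback).predict(features)


# Toy data invented for illustration: (segment, (income, debt), credit label).
rows = [
    ("retail", (30.0, 5.0), "good"),
    ("retail", (28.0, 20.0), "bad"),
    ("retail", (35.0, 4.0), "good"),
    ("corporate", (300.0, 250.0), "bad"),
    ("corporate", (320.0, 40.0), "good"),
    ("corporate", (280.0, 230.0), "bad"),
]
models, fallback = fit_per_group(rows)
result_retail = predict(models, fallback, "retail", (31.0, 18.0))
result_unseen = predict(models, fallback, "unknown", (310.0, 45.0))
```

Because each group is trained independently, the per-group fits could in
principle run in parallel. A record whose nominal value was never seen during
training falls back to the model trained on all rows, which is one possible
design choice among several.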
The rest of this paper is organized as follows. In Section 2