Text Classification using the Concept of Association Rule of Data Mining

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

As the amount of online text increases, the demand for text classification to aid the analysis and management of text is increasing. Text is cheap, but information, in the form of knowing what classes a text belongs to, is expensive. Automatic classification of text can provide this information at low cost, but the classifiers themselves must be built with expensive human effort, or trained from texts which have themselves been manually classified. In this paper we discuss a procedure for classifying text using the concept of association rules from data mining. An association rule mining technique is used to derive a feature set from pre-classified text documents. A Naïve Bayes classifier is then applied to the derived features for the final classification.


💡 Research Summary

The paper addresses the growing need for automatic text classification in the era of massive online textual data. While raw text is inexpensive to obtain, the information that a document belongs to a particular class is costly because it requires manual labeling. The authors propose a two‑stage framework that reduces the cost of building classifiers by leveraging association rule mining for feature extraction and a Naïve Bayes classifier for the final decision.

In the first stage, a pre‑labeled corpus is transformed into a transaction database where each document is represented as a set of tokens after standard preprocessing (stop‑word removal, stemming, tokenization). Using classic association‑rule algorithms such as Apriori or FP‑Growth, the system discovers frequent itemsets and rules that satisfy user‑defined minimum support and confidence thresholds. A rule of the form {A, B} → {C} indicates that the simultaneous presence of terms A and B strongly predicts the occurrence of term C, thereby capturing semantic co‑occurrence patterns that are not evident in simple bag‑of‑words representations.
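The mining stage described above can be sketched as follows. This is a minimal, self-contained illustration: the toy documents, tokens, and thresholds are invented for the example (the paper does not specify them), and a brute-force itemset enumeration stands in for the candidate-pruning loop of a real Apriori implementation.

```python
from itertools import combinations

# Toy transaction database: each "document" is the set of tokens that
# survives preprocessing (stop-word removal, stemming). Tokens and
# thresholds are invented for illustration, not taken from the paper.
docs = [
    {"ball", "goal", "team"},
    {"ball", "goal", "coach"},
    {"ball", "team", "coach"},
    {"vote", "party", "poll"},
    {"vote", "party", "goal"},
]

MIN_SUPPORT = 0.4      # fraction of documents containing the itemset
MIN_CONFIDENCE = 0.6   # P(consequent present | antecedent present)

def support(itemset):
    """Fraction of documents containing every token in `itemset`."""
    return sum(itemset <= d for d in docs) / len(docs)

# Enumerate frequent itemsets of size 1..3 (brute force in place of
# Apriori's level-wise pruning; the result is the same on this scale).
vocab = sorted(set().union(*docs))
frequent = [
    frozenset(c)
    for k in (1, 2, 3)
    for c in combinations(vocab, k)
    if support(frozenset(c)) >= MIN_SUPPORT
]

# Split each frequent itemset into rules antecedent -> consequent and
# keep those whose confidence clears the threshold.
rules = []
for iset in frequent:
    for k in range(1, len(iset)):
        for ante in map(frozenset, combinations(iset, k)):
            cons = iset - ante
            conf = support(iset) / support(ante)
            if conf >= MIN_CONFIDENCE:
                rules.append((ante, cons, conf))

for ante, cons, conf in rules:
    print(set(ante), "->", set(cons), f"confidence={conf:.2f}")
```

On this toy corpus the loop surfaces rules such as {coach} → {ball} with confidence 1.0, i.e. every document mentioning "coach" also mentions "ball", which is exactly the kind of co-occurrence pattern the paper feeds into the next stage.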

The extracted rules are then mapped to a feature space in two complementary ways. The binary approach encodes the presence of a rule’s antecedent or consequent as a 0/1 value for each document, while the weighted approach uses the rule’s confidence as a continuous weight, yielding a richer representation. This results in a dramatically reduced dimensionality compared with traditional term‑frequency or TF‑IDF vectors, while preserving meaningful relationships among words.
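The two encodings can be sketched side by side. The rules and the sample document below are invented placeholders; the only assumption carried over from the text is that a rule contributes a 0/1 value in the binary encoding and its confidence in the weighted one.

```python
# Mined rules as (antecedent, consequent, confidence) triples.
# These specific rules and confidences are invented for illustration.
rules = [
    (frozenset({"ball"}), frozenset({"goal"}), 0.9),
    (frozenset({"vote"}), frozenset({"party"}), 0.8),
]

def binary_features(doc_tokens):
    # 1 if the document contains the rule's full itemset, else 0.
    return [int((ante | cons) <= doc_tokens) for ante, cons, _ in rules]

def weighted_features(doc_tokens):
    # Replace the 0/1 indicator with the rule's confidence when it fires.
    return [conf if (ante | cons) <= doc_tokens else 0.0
            for ante, cons, conf in rules]

doc = {"ball", "goal", "team"}
print(binary_features(doc))    # -> [1, 0]
print(weighted_features(doc))  # -> [0.9, 0.0]
```

Note that the feature vector has one dimension per mined rule rather than one per vocabulary term, which is where the dimensionality reduction over TF-IDF vectors comes from.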

With the rule‑based feature matrix in hand, the authors employ a Naïve Bayes classifier. Although Naïve Bayes assumes feature independence, the experiments demonstrate that the strong discriminative power of the association‑rule features outweighs the violation of this assumption, leading to higher classification accuracy.
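A minimal version of this final stage, assuming the binary rule-feature encoding, is a Bernoulli Naïve Bayes classifier. The training vectors and labels below are invented; the smoothing choice (Laplace) is a common default, not something the paper specifies.

```python
import math

# Training documents as binary rule-feature vectors with class labels.
# Values and labels are invented for illustration.
X = [[1, 0], [1, 1], [0, 1], [0, 1]]
y = ["sports", "sports", "politics", "politics"]

def train(X, y):
    """Bernoulli Naive Bayes with Laplace smoothing."""
    classes = sorted(set(y))
    prior, cond = {}, {}
    for c in classes:
        rows = [x for x, lab in zip(X, y) if lab == c]
        prior[c] = len(rows) / len(X)
        # P(feature_j = 1 | class c), Laplace-smoothed
        cond[c] = [(sum(r[j] for r in rows) + 1) / (len(rows) + 2)
                   for j in range(len(X[0]))]
    return prior, cond

def predict(x, prior, cond):
    # Pick the class with the highest log-posterior under independence.
    def log_post(c):
        s = math.log(prior[c])
        for j, v in enumerate(x):
            p = cond[c][j]
            s += math.log(p if v else 1 - p)
        return s
    return max(prior, key=log_post)

prior, cond = train(X, y)
print(predict([1, 0], prior, cond))  # -> sports
```

The independence assumption is clearly violated here (rule features share underlying terms), but as the paper argues, the features are discriminative enough that the classifier still separates the classes.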

The experimental evaluation uses three well‑known corpora: 20 Newsgroups, Reuters‑21578, and a Korean news dataset assembled by the authors. For each dataset, the authors conduct a grid search over support and confidence thresholds to locate the optimal rule set. Baselines include TF‑IDF combined with Support Vector Machines, n‑gram features with Naïve Bayes, and Latent Dirichlet Allocation (LDA) based dimensionality reduction. Results show that the proposed method consistently outperforms the baselines by 3–5 percentage points in accuracy while reducing the feature space by roughly 70%. Moreover, the association‑rule mining step completes within seconds for corpora of a few thousand documents and scales to larger collections when parallelized, indicating practical feasibility.
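The threshold search has a simple shape: evaluate the full mine-features-classify pipeline at each (support, confidence) pair and keep the best. In the sketch below, `evaluate` is a hypothetical stand-in for that pipeline, and the candidate grids are invented values, not the paper's.

```python
from itertools import product

def evaluate(min_support, min_confidence):
    # Dummy score used only to make the loop runnable. A real evaluate()
    # would mine rules at these thresholds, build features, train the
    # Naive Bayes classifier, and return held-out accuracy.
    return 1.0 - abs(min_support - 0.1) - abs(min_confidence - 0.7)

# Candidate thresholds (invented); exhaustive search over the grid.
support_grid = [0.05, 0.1, 0.2]
confidence_grid = [0.5, 0.7, 0.9]

best = max(product(support_grid, confidence_grid),
           key=lambda sc: evaluate(*sc))
print("best (support, confidence):", best)
```

The trade-off being tuned: a low support threshold admits many rules (larger, noisier feature space), while a high one keeps only the strongest patterns and risks discarding discriminative features.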

Key contributions of the work are threefold. First, it introduces a novel pipeline that applies association‑rule mining to text, turning co‑occurrence patterns into compact, semantically meaningful features. Second, it demonstrates that these features improve classification performance even when paired with a simple probabilistic classifier, thereby offering a cost‑effective solution for domains lacking extensive labeled data. Third, the approach is straightforward to implement and integrates seamlessly with existing machine‑learning toolkits, making it attractive for real‑world deployment.

The authors acknowledge several limitations. The quality of the extracted rules depends heavily on the choice of minimum support and confidence, requiring careful tuning. For very large corpora, memory consumption during rule mining can become a bottleneck. Additionally, while Naïve Bayes works well with the proposed features, the interaction with more complex classifiers such as deep neural networks remains unexplored.

Future research directions suggested include (1) learning optimal rule weights through supervised training rather than relying on raw confidence values, (2) combining rule‑based features with word embeddings or transformer‑based representations to create hybrid models, and (3) developing incremental or streaming association‑rule algorithms that can update the rule set on‑the‑fly as new documents arrive. Such extensions could broaden the applicability of the method beyond straightforward topic classification to tasks like sentiment analysis, intent detection, and fine‑grained entity recognition.

In summary, the paper presents a compelling case for using association‑rule mining as a bridge between raw textual data and effective, low‑dimensional feature representations, and validates its utility through thorough experiments and clear performance gains over established baselines.

