Text classification based on ensemble extreme learning machine
In this paper, we propose a novel approach based on a cost-sensitive ensemble weighted extreme learning machine, which we call AE1-WELM, and apply it to text classification. AE1-WELM handles both balanced and imbalanced multiclass text classification. Weighted ELM, which assigns different weights to different samples, improves classification accuracy to a certain extent, but it considers only the differences between samples in different categories and ignores the differences between samples within the same category. We measure the importance of each document by its sample information entropy, generate a cost-sensitive matrix and factor from the document importance, and then embed the cost-sensitive weighted ELM seamlessly into the AdaBoost.M1 framework. Vector space model (VSM) text representation produces high-dimensional, sparse features that increase the burden on ELM. To overcome this problem, we develop a text classification framework combining word vectors with AE1-WELM. The experimental results show that our method provides an accurate, reliable, and effective solution for text classification.
💡 Research Summary
This paper introduces a novel text classification framework called AE1‑WELM. The authors identify two major challenges in modern text classification: (1) the high dimensionality and sparsity of traditional vector‑space representations, which impose heavy computational burdens on learning algorithms, and (2) the prevalence of class imbalance, which degrades the performance of many classifiers, including Extreme Learning Machines (ELM). While ELM is praised for its extremely fast training speed and good generalization, its standard formulation suffers from random initialization of input weights and hidden‑layer biases, leading to unstable results, and its weighted variants only address inter‑class imbalance, ignoring intra‑class variations.
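The ELM formulation referred to above is compact enough to sketch: the input weights and hidden biases are drawn at random and never trained, and only the output weights are solved in closed form via the Moore‑Penrose pseudoinverse. The following is a minimal illustrative sketch (function names, activation choice, and sizes are our own, not the paper's):

```python
import numpy as np

def train_elm(X, Y, n_hidden=20, seed=0):
    """Minimal ELM: random hidden layer, output weights by least squares."""
    rng = np.random.default_rng(seed)
    # Input weights W and biases b are random and fixed -- the source of
    # the run-to-run instability the summary mentions.
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)            # hidden-layer activation matrix
    beta = np.linalg.pinv(H) @ Y      # Moore-Penrose least-squares solution
    return W, b, beta

def predict_elm(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta

# Tiny demo: XOR-style labels with one-hot targets.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.eye(2)[[0, 1, 1, 0]]
W, b, beta = train_elm(X, Y)
pred = predict_elm(X, W, b, beta).argmax(axis=1)   # -> [0, 1, 1, 0]
```

Because only `beta` is computed analytically, training is a single matrix solve, which is why ELM trains orders of magnitude faster than iteratively optimized networks.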
To overcome these limitations, the authors propose three key innovations. First, they quantify the importance of each document using Shannon information entropy. Two entropy measures are defined: inter‑class entropy (capturing how uniformly a term appears across all categories) and inner‑class entropy (capturing term distribution within a specific category). By combining these, a “category entropy” is derived for each term, and the aggregate term entropy yields a document‑level importance score. Second, the document importance scores are transformed into a cost‑sensitive matrix and a cost‑sensitive factor, which are incorporated into a weighted ELM (W‑ELM). Unlike conventional W‑ELM that assigns the same weight to all samples of a minority class, the proposed cost‑sensitive weighting differentiates samples both across and within classes, allowing the learner to focus more on hard or informative instances. Third, the cost‑sensitive W‑ELM is embedded as the weak learner inside the AdaBoost.M1 ensemble. During each boosting iteration, the cost‑sensitive weights are updated, effectively re‑balancing the training distribution and mitigating the bias toward majority classes.
To address the high‑dimensionality issue, the authors replace the classic VSM (e.g., TF‑IDF, LSI) with low‑dimensional word embeddings. They adopt the Skip‑gram model (Mikolov et al.) to learn dense vectors for each word from an unlabeled corpus, then construct document vectors by averaging the word vectors of the document. This reduces the feature space to a few hundred dimensions while preserving semantic relationships, thereby easing the computational load of ELM.
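The document‑vector construction described above is simple averaging over per‑word embeddings. A minimal sketch, with a toy embedding table standing in for vectors actually trained by Skip‑gram on a large corpus:

```python
import numpy as np

# Toy 3-dimensional embedding table; in practice these vectors come from
# Skip-gram (word2vec) training and have a few hundred dimensions.
emb = {
    "fast":  np.array([0.9, 0.1, 0.0]),
    "speed": np.array([0.8, 0.2, 0.1]),
    "slow":  np.array([-0.7, 0.1, 0.2]),
}

def doc_vector(tokens, emb, dim=3):
    """Average the embeddings of in-vocabulary tokens;
    fall back to a zero vector if no token is known."""
    vecs = [emb[t] for t in tokens if t in emb]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

v = doc_vector(["fast", "speed", "unknown"], emb)   # -> [0.85, 0.15, 0.05]
```

Out‑of‑vocabulary tokens are simply skipped, which is one reason coverage of domain‑specific vocabulary matters (a limitation noted later in the summary).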
The experimental evaluation uses three widely used benchmark corpora: 20Newsgroups (relatively balanced multi‑class), Reuters‑21578 (highly imbalanced multi‑class), and WebKB (small, noisy). The proposed AE1‑WELM is compared against a suite of baselines: standard ELM, weighted ELM, Ada‑WELM (a previously published cost‑sensitive boosting variant), Bagging‑ELM, as well as traditional classifiers such as SVM, Naïve Bayes, and k‑NN. Performance is measured using accuracy, F1‑score, G‑mean, and AUC to capture both overall correctness and balance across classes.
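Of the metrics listed above, G‑mean is the one specifically chosen for class balance: it is the geometric mean of per‑class recalls, so a classifier that ignores a minority class scores zero regardless of overall accuracy. A small sketch (our own implementation, not the paper's code):

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of per-class recalls."""
    classes = np.unique(y_true)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in classes]
    return float(np.prod(recalls) ** (1.0 / len(recalls)))

# 9 majority samples, 1 minority sample; predicting the majority class
# everywhere gives 90% accuracy but a G-mean of 0.
y_true = np.array([0] * 9 + [1])
y_pred = np.array([0] * 10)
score = g_mean(y_true, y_pred)   # -> 0.0, despite 0.9 accuracy
```

This is why G‑mean (alongside F1 and AUC) is a more honest yardstick than accuracy on the imbalanced Reuters‑21578 corpus.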
Results show that AE1‑WELM consistently achieves the highest accuracy and F1‑score across the datasets. The gains are especially pronounced on the imbalanced Reuters‑21578, where G‑mean improves markedly, indicating that the method successfully avoids the common pitfall of neglecting minority classes. Moreover, the boosting process stabilizes the learning: as the number of AdaBoost rounds increases, variance due to random weight initialization diminishes, confirming that the ensemble mitigates ELM’s inherent instability. The use of word embeddings also yields substantial reductions in training time and memory consumption compared with VSM‑based representations, demonstrating practical scalability.
The paper acknowledges some limitations. Computing the entropy‑based importance scores incurs additional preprocessing cost, which may become significant for very large corpora. The reliance on pre‑trained Skip‑gram embeddings could limit performance on domain‑specific vocabularies where the embeddings lack coverage. Furthermore, while hyper‑parameters such as the number of boosting rounds, hidden‑node count, and cost‑factor scaling are mentioned, detailed settings are not fully disclosed, potentially hindering exact reproducibility.
In conclusion, AE1‑WELM offers a compelling combination of cost‑sensitive weighting and AdaBoost ensemble within the fast‑training ELM paradigm, effectively tackling both high dimensionality and class imbalance in text classification. Future work could explore more efficient entropy computation, adaptive embedding fine‑tuning for specialized domains, and extensive hyper‑parameter optimization to further solidify the method’s applicability to real‑world large‑scale text mining tasks.