Constrained Extreme Learning Machines: A Study on Classification Cases
Extreme learning machine (ELM) is an extremely fast learning method with strong performance on pattern recognition tasks, as demonstrated by numerous researchers and engineers. However, its good generalization ability relies on a large number of hidden neurons, which is detrimental to real-time response in the test process. In this paper, we propose new methods, named “constrained extreme learning machines” (CELMs), that randomly select hidden neurons based on the sample distribution. In contrast to the completely random selection of hidden nodes in ELM, CELMs randomly select hidden nodes from a constrained vector space containing basic combinations of the original sample vectors. The experimental results show that CELMs have better generalization ability than traditional ELM, SVM, and other related methods, while retaining a fast learning speed similar to ELM.
💡 Research Summary
Extreme Learning Machine (ELM) has attracted considerable attention because it can train a single‑hidden‑layer feed‑forward network in a single step: the input‑to‑hidden weights are drawn randomly, the hidden layer output is computed with a nonlinear activation, and the hidden‑to‑output weights are obtained analytically by solving a regularized least‑squares problem. This procedure yields training times that are orders of magnitude faster than conventional gradient‑based methods, and many studies have shown that, given a sufficiently large hidden layer, ELM can achieve competitive classification accuracy. However, the randomness of the hidden weights means that the network does not exploit any information about the underlying data distribution. Consequently, to obtain good generalization one typically needs thousands of hidden neurons, which inflates the computational cost during inference and hampers real‑time deployment.
The paper “Constrained Extreme Learning Machines: A Study on Classification Cases” addresses this drawback by introducing a principled way to constrain the random selection of hidden neurons. Instead of drawing weights from an unrestricted Gaussian or uniform space, the authors first construct a constrained vector space that encodes salient characteristics of the training samples. This space is built from (i) normalized sample vectors, (ii) class‑wise prototype vectors (the mean of each class), (iii) principal component directions obtained by PCA on the whole dataset or on individual classes, and (iv) linear combinations of the aforementioned vectors. The constrained set therefore contains vectors that are aligned with the data geometry, while still preserving randomness because the final selection is performed uniformly at random from this limited pool.
Once the constrained pool is prepared, a user‑specified number of hidden neurons N is chosen by sampling N vectors from the pool. These vectors form the input‑to‑hidden weight matrix W. The hidden‑layer output matrix H is computed as H = σ(XW), where σ denotes a nonlinear activation (e.g., sigmoid, ReLU). The output weights β are then obtained analytically as β = H⁺T, where T is the target label matrix and H⁺ the Moore‑Penrose pseudoinverse (or a regularized variant). This training pipeline is identical in complexity to the original ELM; the only extra overhead is the one‑time construction of the constrained pool.
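Putting the two steps together, the pool construction and constrained sampling might look as follows. This is a simplified sketch: `build_pool` here uses only normalized samples, class-mean prototypes, and random averages of pool members, which is a stand-in for the fuller construction (including PCA directions) described above, and the ridge-regularized solve replaces an explicit pseudoinverse.

```python
import numpy as np

rng = np.random.default_rng(1)

def build_pool(X, y):
    """Constrained pool of candidate weight vectors: normalized samples,
    class-mean prototypes, and random averages of pool members
    (a simplified stand-in for the paper's construction)."""
    samples = X / np.linalg.norm(X, axis=1, keepdims=True)
    means = np.stack([X[y == c].mean(axis=0) for c in np.unique(y)])
    means = means / np.linalg.norm(means, axis=1, keepdims=True)
    pool = np.vstack([samples, means])
    i, j = rng.integers(0, len(pool), size=(2, len(pool)))
    return np.vstack([pool, 0.5 * (pool[i] + pool[j])])  # add combinations

def celm_train(X, y, T, n_hidden, ridge=1e-3):
    """CELM training: draw input weights uniformly from the pool, then
    solve for the output weights exactly as in standard ELM."""
    pool = build_pool(X, y)
    W = pool[rng.integers(0, len(pool), n_hidden)].T   # (d, n_hidden)
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    beta = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ T)
    return W, b, beta

# Toy two-blob problem with one-hot targets
X = np.vstack([rng.normal(-2, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.repeat([0, 1], 50)
T = np.eye(2)[y]
W, b, beta = celm_train(X, y, T, n_hidden=20)
H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
acc = ((H @ beta).argmax(axis=1) == y).mean()
```

The only change relative to standard ELM is where `W` comes from; the analytic output-weight solve, and hence the training cost, is untouched.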
The experimental evaluation covers a broad set of benchmark classification problems, including classic UCI datasets (Iris, Wine, Letter, etc.), subsets of MNIST, and a few image‑based tasks. For each dataset the authors compare three configurations: (1) standard ELM with fully random hidden weights, (2) the proposed Constrained ELM (CELM) using different types of constraints, and (3) strong baselines such as Support Vector Machines (linear and RBF kernels) and shallow convolutional neural networks. The number of hidden neurons is varied (500, 1000, 2000) to assess the trade‑off between model size and performance.
Key findings are as follows:
- Accuracy improvement: Across almost all datasets, CELM outperforms standard ELM by 2–5 % absolute accuracy when the hidden‑layer size is held constant. The gain is most pronounced on datasets where class means are well separated (e.g., Wine), indicating that prototype‑based constraints effectively capture discriminative information.
- Training speed: Because the core learning step remains a single matrix inversion, the training time of CELM is virtually identical to that of ELM (typically a few milliseconds on a modern CPU). The one‑off cost of building the constrained pool is negligible compared with the overall pipeline, especially when the pool is reused across multiple experiments.
- Inference efficiency: With a reduced hidden layer (thanks to the higher expressive power of constrained neurons), inference time drops proportionally. In real‑time streaming scenarios the authors report up to a 30 % reduction in latency compared with a vanilla ELM of comparable accuracy.
- Effect of constraint type: Prototype‑based constraints excel on low‑dimensional, well‑structured data; PCA‑based constraints are more robust on high‑dimensional, sparse data (e.g., text features), where they suppress noise. Purely normalized random vectors provide no advantage, confirming that the benefit stems from embedding data‑driven structure into the weight space.
- Comparison with strong baselines: While RBF‑SVM often achieves the highest accuracy, it requires iterative training and is orders of magnitude slower. Shallow CNNs can match or surpass CELM on image data but need GPU acceleration and longer training cycles. CELM therefore occupies a sweet spot: near‑state‑of‑the‑art accuracy with training times comparable to a single matrix multiplication.
The authors acknowledge two primary limitations. First, constructing the constrained pool incurs a preprocessing step that scales linearly with the number of training samples and may become costly for extremely large datasets. Second, for problems with highly nonlinear decision boundaries, a modest hidden layer may still be insufficient, and the method reverts to the original ELM’s requirement for many neurons.
Future research directions proposed include (a) dynamic constraint adaptation, where the pool is updated online as new data arrive; (b) meta‑learning or reinforcement‑learning strategies to automatically select the most beneficial constraint type for a given task; and (c) deep extensions, stacking multiple constrained hidden layers to capture hierarchical representations while preserving the fast, analytical training property.
In summary, the paper introduces a simple yet effective modification to the classic ELM framework. By restricting the random hidden weight selection to a subspace that reflects the training data’s geometry, Constrained Extreme Learning Machines achieve better generalization with far fewer hidden neurons, all while retaining the hallmark ultra‑fast training of ELM. This makes CELM a compelling candidate for resource‑constrained, real‑time classification applications such as embedded vision, IoT sensor analytics, and rapid prototyping of machine‑learning models.