Applying Deep Belief Networks to Word Sense Disambiguation
In this paper, we applied a novel learning algorithm, Deep Belief Networks (DBNs), to word sense disambiguation (WSD). A DBN is a probabilistic generative model composed of multiple layers of hidden units. It is first pre-trained greedily, layer by layer, with each layer treated as a Restricted Boltzmann Machine (RBM); a separate fine-tuning step is then employed to improve its discriminative power. We compared the DBN with state-of-the-art supervised learning algorithms for WSD: the Support Vector Machine (SVM), the Maximum Entropy model (MaxEnt), the Naive Bayes classifier (NB), and Kernel Principal Component Analysis (KPCA). As knowledge sources we used all words in the given paragraph, the surrounding context words, and the part-of-speech tags of the surrounding words. In experiments on the SENSEVAL-2 data set, we observed that the DBN outperformed all the other learning algorithms.
💡 Research Summary
The paper investigates the application of Deep Belief Networks (DBNs) to the classic natural‑language‑processing task of Word Sense Disambiguation (WSD). A DBN is a probabilistic generative model composed of multiple layers of Restricted Boltzmann Machines (RBMs). In the proposed approach, the authors first pre‑train each layer greedily as an RBM in an unsupervised fashion, thereby learning a compact, dense representation of high‑dimensional textual features. After this layer‑wise pre‑training, the entire network is fine‑tuned with supervised back‑propagation using a cross‑entropy loss, which adjusts all parameters jointly to maximize discriminative power. Regularization techniques such as dropout, L2 weight decay, and early stopping are employed to mitigate over‑fitting.
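The pre-training step described above can be made concrete with a minimal sketch. The paper does not publish its implementation, so the following NumPy code is an illustrative single RBM trained with one-step contrastive divergence (CD-1); all sizes, the learning rate, and the toy data are assumptions for demonstration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class RBM:
    """One Restricted Boltzmann Machine, the building block of DBN pre-training."""
    def __init__(self, n_visible, n_hidden, lr=0.01):
        self.W = rng.normal(0, 0.01, (n_visible, n_hidden))
        self.b_v = np.zeros(n_visible)   # visible-unit biases
        self.b_h = np.zeros(n_hidden)    # hidden-unit biases
        self.lr = lr

    def cd1_step(self, v0):
        # Positive phase: hidden probabilities given the data.
        h0 = sigmoid(v0 @ self.W + self.b_h)
        h0_sample = (rng.random(h0.shape) < h0).astype(float)
        # Negative phase: one Gibbs step back to visibles, then hiddens.
        v1 = sigmoid(h0_sample @ self.W.T + self.b_v)
        h1 = sigmoid(v1 @ self.W + self.b_h)
        # CD-1 gradient approximation: data statistics minus model statistics.
        self.W += self.lr * (v0.T @ h0 - v1.T @ h1) / len(v0)
        self.b_v += self.lr * (v0 - v1).mean(axis=0)
        self.b_h += self.lr * (h0 - h1).mean(axis=0)
        return ((v0 - v1) ** 2).mean()   # reconstruction error, for monitoring

    def transform(self, v):
        # Hidden activations: the learned dense representation of the input.
        return sigmoid(v @ self.W + self.b_h)

# Toy sparse binary data standing in for one-hot lexical features.
X = (rng.random((64, 20)) < 0.3).astype(float)
rbm = RBM(n_visible=20, n_hidden=8)
for _ in range(50):
    err = rbm.cd1_step(X)
print(round(err, 4))
```

After pre-training, `transform` yields the compact representation that either feeds the next RBM in the stack or the supervised fine-tuning stage.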
For feature extraction, the authors construct a sparse vector that concatenates (i) every word in the target paragraph, (ii) a fixed‑size window of surrounding words, and (iii) the part‑of‑speech tags of those surrounding words. These raw lexical and POS cues are encoded as one‑hot vectors and fed directly into the DBN, allowing the model to discover useful interactions without manual dimensionality reduction.
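A minimal sketch of this encoding scheme may help; the vocabularies, tag set, and helper function below are hypothetical, chosen only to show how the three feature groups concatenate into one sparse binary vector.

```python
import numpy as np

# Toy vocabularies; the real ones would be built from the training corpus.
vocab = {"bank": 0, "river": 1, "money": 2, "deposit": 3, "water": 4}
pos_tags = {"NN": 0, "VB": 1, "JJ": 2, "DT": 3}

def encode(paragraph_words, window_words, window_pos):
    bow = np.zeros(len(vocab))
    for w in paragraph_words:      # (i) every word in the target paragraph
        if w in vocab:
            bow[vocab[w]] = 1.0
    ctx = np.zeros(len(vocab))
    for w in window_words:         # (ii) fixed-size window of surrounding words
        if w in vocab:
            ctx[vocab[w]] = 1.0
    pos = np.zeros(len(pos_tags))
    for t in window_pos:           # (iii) POS tags of the window words
        if t in pos_tags:
            pos[pos_tags[t]] = 1.0
    # One sparse binary vector; no manual dimensionality reduction.
    return np.concatenate([bow, ctx, pos])

x = encode(["money", "deposit", "bank"], ["money", "deposit"], ["NN", "NN"])
print(x.shape)  # (14,) = 5 (paragraph) + 5 (window) + 4 (POS)
```

With realistic vocabularies this vector becomes very high-dimensional and sparse, which is exactly the regime where the DBN's learned dense representation is meant to help.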
The experimental evaluation uses the standard SENSEVAL‑2 benchmark, which contains 30 polysemous target words with multiple annotated contexts. A 10‑fold cross‑validation protocol is adopted to ensure robust performance estimates. The DBN’s architecture consists of three hidden layers with 500, 300, and 200 hidden units respectively; each RBM is trained for 50 epochs using contrastive divergence with a learning rate of 0.01. During fine‑tuning, the learning rate is reduced to 0.001 and the entire network is trained for an additional 30 epochs.
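The greedy stacking schedule with the reported layer sizes can be sketched as follows. `train_rbm` is a stub standing in for the CD-1 training described earlier; the input dimensionality and toy data are assumptions, while the layer sizes, epoch counts, and learning rates come from the summary above.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs, lr):
    """Stub for CD-1 training; returns a weight matrix mapping data -> hidden."""
    W = rng.normal(0, 0.01, (data.shape[1], n_hidden))
    # ... contrastive-divergence updates over `epochs` passes would go here ...
    return W

layer_sizes = [500, 300, 200]                       # reported architecture
X = (rng.random((128, 1000)) < 0.05).astype(float)  # toy sparse input features

weights, data = [], X
for n_hidden in layer_sizes:
    # Each RBM: 50 epochs of CD-1 at learning rate 0.01, per the summary.
    W = train_rbm(data, n_hidden, epochs=50, lr=0.01)
    weights.append(W)
    data = sigmoid(data @ W)   # hidden activations feed the next RBM

print([w.shape for w in weights])  # [(1000, 500), (500, 300), (300, 200)]
```

Fine-tuning would then add a softmax output layer on top of the 200-unit layer and run supervised back-propagation through the whole stack for 30 epochs at learning rate 0.001.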
Four strong baseline classifiers are implemented for comparison: (1) Support Vector Machine with an RBF kernel, (2) Maximum Entropy (logistic regression), (3) Naïve Bayes, and (4) Kernel Principal Component Analysis followed by an SVM (KPCA‑SVM). Hyper‑parameters for all baselines are optimized via grid search on the validation folds. Evaluation metrics include overall accuracy and macro‑averaged F1‑score.
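To illustrate the baseline tuning protocol, here is a self-contained sketch of one baseline, the Maximum Entropy model (multinomial logistic regression), with its L2 regularization strength chosen by grid search on a held-out validation split. The paper does not specify its baseline implementations, so everything below (the grid, the optimizer, the toy data) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_maxent(X, y, n_classes, l2, epochs=200, lr=0.1):
    """Gradient descent on L2-regularized cross-entropy (MaxEnt baseline)."""
    W = np.zeros((X.shape[1], n_classes))
    Y = np.eye(n_classes)[y]                        # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W)
        grad = X.T @ (P - Y) / len(X) + l2 * W      # loss gradient + L2 term
        W -= lr * grad
    return W

def accuracy(W, X, y):
    return (softmax(X @ W).argmax(axis=1) == y).mean()

# Toy two-sense data: two well-separated Gaussian clusters in feature space.
X = np.vstack([rng.normal(-1, 1, (50, 10)), rng.normal(1, 1, (50, 10))])
y = np.array([0] * 50 + [1] * 50)
idx = rng.permutation(100)
train, val = idx[:70], idx[70:]

# Grid search over the regularization strength on the validation split.
best_l2, best_acc = None, -1.0
for l2 in [0.0, 0.01, 0.1, 1.0]:
    W = train_maxent(X[train], y[train], n_classes=2, l2=l2)
    acc = accuracy(W, X[val], y[val])
    if acc > best_acc:
        best_l2, best_acc = l2, acc
print(best_l2, round(best_acc, 2))
```

The same select-by-validation-accuracy loop applies to the SVM's `C` and kernel width or the KPCA component count; only the inner training routine changes.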
Results show that the DBN achieves an average accuracy of 78.4 % and an F1‑score of 0.76, outperforming the best baseline (SVM, 74.1 % accuracy, 0.71 F1) by roughly 4–5 percentage points. The performance gap widens on the most ambiguous words, indicating that the DBN’s deep, non‑linear representation captures subtle contextual cues that linear or shallow models miss. An ablation study confirms that both the unsupervised RBM pre‑training and the subsequent supervised fine‑tuning contribute significantly to the final gain.
The authors acknowledge several limitations. Training a DBN is computationally intensive; it requires GPU resources and careful selection of the number of layers and hidden units, which is currently guided by empirical trial‑and‑error. Moreover, the experiments are confined to English SENSEVAL‑2 data, leaving open the question of cross‑lingual or domain generalization.
Future work is outlined along three directions: (a) exploring deeper generative models such as Deep Boltzmann Machines or variational auto‑encoders for pre‑training, (b) leveraging large unannotated corpora for transfer learning and semi‑supervised fine‑tuning, and (c) integrating external semantic resources (e.g., WordNet, semantic role labels) to enrich the input representation while investigating model compression techniques (knowledge distillation) for real‑time deployment.
In summary, the study demonstrates that Deep Belief Networks, when combined with a straightforward lexical‑POS feature set, can surpass traditional supervised classifiers on a well‑established WSD benchmark. The findings suggest that deep generative pre‑training followed by discriminative fine‑tuning is a promising paradigm for semantic disambiguation and potentially for broader natural‑language‑understanding tasks.