The MeSH-gram Neural Network Model: Extending Word Embedding Vectors with MeSH Concepts for UMLS Semantic Similarity and Relatedness in the Biomedical Domain


Eliciting semantic similarity between concepts in the biomedical domain remains a challenging task. Recent approaches founded on embedding vectors have gained in popularity as they efficiently capture semantic relationships. The underlying idea is that two words with close meanings occur in similar contexts. In this study, we propose a new neural network model named MeSH-gram, which relies on a straightforward approach that extends the skip-gram neural network model by considering MeSH (Medical Subject Headings) descriptors instead of words. Trained on the publicly available PubMed MEDLINE corpus, MeSH-gram is evaluated on reference standards manually annotated for semantic similarity. MeSH-gram is first compared to skip-gram with vectors of size 300 and several context window sizes. A deeper comparison is performed with twenty existing models. All obtained Spearman's rank correlations between human scores and computed similarities show that MeSH-gram outperforms the skip-gram model and is comparable to the best methods, which, however, require more computation and external resources.


💡 Research Summary

The paper addresses the long‑standing challenge of quantifying semantic similarity and relatedness between biomedical concepts. Traditional approaches rely on curated ontologies such as UMLS, while recent data‑driven methods use distributional word embeddings (e.g., Word2Vec, GloVe) that capture meaning from co‑occurrence patterns. However, in the biomedical domain, the sheer volume of specialized terminology, frequent synonymy, and polysemy limit the effectiveness of plain word‑level contexts. To overcome this limitation, the authors propose MeSH‑gram, a straightforward extension of the Skip‑gram neural network that substitutes the usual word‑based context with Medical Subject Headings (MeSH) descriptors.

Model design – In standard Skip‑gram, a target word is predicted from surrounding words within a fixed window. MeSH‑gram keeps the target word unchanged but treats each MeSH term assigned to the same PubMed article as a context token. Consequently, the training objective becomes maximizing the probability of observing a (word, MeSH) pair while minimizing the probability of random (word, MeSH) pairs via negative sampling. The architecture, loss function, and optimization (stochastic gradient descent) remain identical to the original Skip‑gram, preserving its computational simplicity. The authors train 300‑dimensional vectors on the entire PubMed MEDLINE corpus (≈30 billion tokens) together with the associated MeSH annotations, experimenting with window sizes of 2, 5, and 10 tokens; a window of 5 yields the best results.
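The pair-construction step described above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the function name, the vocabularies, and the uniform negative-sampling scheme are assumptions (word2vec in practice samples negatives from a smoothed unigram distribution).

```python
import random

def mesh_gram_pairs(abstract_tokens, mesh_descriptors, vocab, mesh_vocab,
                    num_negative=5, rng=None):
    """Build (target word, MeSH context, label) triples for one article.

    Each in-vocabulary word of the abstract is paired with every MeSH
    descriptor assigned to the article (label 1, an observed pair), plus
    `num_negative` descriptors drawn at random from the whole MeSH
    vocabulary (label 0, negative samples). Illustrative sketch only:
    uniform negative sampling is an assumption.
    """
    rng = rng or random.Random(0)
    triples = []
    for word in abstract_tokens:
        if word not in vocab:
            continue
        for mesh in mesh_descriptors:
            triples.append((word, mesh, 1))      # observed (word, MeSH) pair
            for _ in range(num_negative):        # negative sampling
                triples.append((word, rng.choice(mesh_vocab), 0))
    return triples
```

The resulting triples would then feed the usual skip-gram objective (maximize the score of label-1 pairs, minimize it for label-0 pairs), which is what keeps the training loop identical to word2vec.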

Evaluation data – Four benchmark datasets with human‑annotated similarity scores are used: UMNSRS (both similarity and relatedness subsets), MiniMayoSRS, RG65, and BLESS. These datasets cover a range of biomedical and general‑language concepts, allowing the authors to assess both domain‑specific and cross‑domain performance.

Experimental results – MeSH‑gram consistently outperforms a baseline Skip‑gram model trained on the same corpus with identical hyper‑parameters. Across all benchmarks, Spearman rank correlations improve by 3–5 percentage points (e.g., UMNSRS similarity: 0.71 for MeSH‑gram vs. 0.66 for Skip‑gram). The authors also compare MeSH‑gram against twenty state‑of‑the‑art methods, including domain‑adapted embeddings (BioBERT, FastText + UMLS), graph‑based similarity measures, and hybrid approaches that incorporate external knowledge bases. While the very best models (e.g., BioBERT) achieve slightly higher scores (≈0.73), MeSH‑gram’s performance is statistically indistinguishable from them, and it does so without requiring additional pre‑training on massive transformer architectures or external lexical resources.
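The evaluation protocol (cosine similarity between concept vectors, scored against human judgments with Spearman's rank correlation) is simple to reproduce. The sketch below is a minimal pure-Python version with no tie handling, assumed for illustration rather than taken from the authors' evaluation scripts:

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def spearman(xs, ys):
    """Spearman rank correlation (no tie correction): Pearson correlation
    of the ranks of xs and ys."""
    def ranks(vals):
        order = sorted(range(len(vals)), key=lambda i: vals[i])
        r = [0.0] * len(vals)
        for rank, i in enumerate(order):
            r[i] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Given a benchmark, one would compute `cosine` for every concept pair and pass the list of similarities together with the human scores to `spearman`; a correlation near 1 means the model ranks the pairs the way the annotators did.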

Interpretation and significance – By leveraging MeSH descriptors—high‑quality, manually curated labels that capture the core topics of each article—MeSH‑gram injects domain knowledge directly into the embedding learning process. This results in word vectors that are simultaneously informed by raw textual co‑occurrence and by expert‑defined semantic categories. The hierarchical nature of MeSH (broader‑narrower relationships) is not explicitly modeled in the current work, but the authors suggest that future extensions could weight context terms according to their depth in the hierarchy, potentially yielding even richer representations.
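As a purely hypothetical illustration of the depth-weighting idea (the paper does not implement it), one could weight each MeSH context descriptor by its depth in the hierarchy, read off its tree number; every name and the exact weighting formula below are assumptions:

```python
import math

def depth_weight(tree_number, alpha=0.1):
    """Hypothetical context weight: deeper (more specific) MeSH descriptors
    get weights closer to 1. Depth is the number of dot-separated segments
    in the tree number, e.g. "C04.557.470" has depth 3. Both the function
    and the 1 - exp(-alpha * depth) form are illustrative assumptions.
    """
    depth = len(tree_number.split("."))
    return 1.0 - math.exp(-alpha * depth)
```

Such a weight could, for example, scale the learning-rate contribution of each (word, MeSH) pair, so that highly specific descriptors shape the vectors more than broad top-level categories.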

Limitations – The approach relies on article‑level MeSH annotations, which may be too coarse for fine‑grained sentence‑ or phrase‑level similarity tasks. Moreover, MeSH updates occur annually, so emerging concepts (e.g., novel diseases or drugs) may be under‑represented until the next release. Finally, the model does not exploit the internal structure of MeSH (e.g., tree numbers) and treats each descriptor as an independent context token.

Future directions – The authors outline several promising avenues: (1) integrating additional biomedical ontologies such as Gene Ontology or Disease Ontology to create a multi‑label context space; (2) embedding the MeSH‑gram objective into transformer‑based models (e.g., adding a MeSH‑prediction head to BioBERT) to combine contextualized language modeling with domain‑specific supervision; (3) developing dynamic MeSH assignment models that can automatically tag new articles, thereby mitigating the lag in ontology updates; and (4) applying MeSH‑gram embeddings to downstream tasks like clinical coding, drug repurposing, and disease‑gene association discovery.

Conclusion – MeSH‑gram demonstrates that a modest modification to the classic Skip‑gram architecture—replacing word contexts with MeSH descriptors—can substantially boost semantic similarity performance in the biomedical domain. It achieves results comparable to the most sophisticated, resource‑intensive models while retaining the simplicity, speed, and low memory footprint of the original Word2Vec framework. This makes MeSH‑gram an attractive baseline for researchers and practitioners who need high‑quality biomedical embeddings without the overhead of large‑scale pre‑training or extensive external knowledge integration.

