Semantic Similarity Measures Applied to an Ontology for Human-Like Interaction
The focus of this paper is the calculation of similarity between two concepts from an ontology for a Human-Like Interaction system. To facilitate this calculation, a similarity function is proposed based on the five dimensions (sort, compositional, essential, restrictive and descriptive) that constitute the structure of ontological knowledge. The paper proposes a similarity function for each dimension of knowledge. The similarity values obtained are then weighted and aggregated into a global similarity measure. To calculate the weights associated with each dimension, four training methods are proposed, differing in the element they fit to: the user, individual concepts, pairs of concepts, or a hybrid of these. To evaluate the proposal, the knowledge base was fed from WordNet and extended using a knowledge-editing toolkit (Cognos). The evaluation compares system responses with those given by human test subjects, both providing a measure of the soundness of the procedure and revealing ways in which the proposal may be improved.
💡 Research Summary
The paper addresses the problem of quantifying semantic similarity between two concepts within an ontology that underlies a Human‑Like Interaction (HLI) system. Traditional similarity measures often rely on a single structural aspect of an ontology—typically the hierarchical “is‑a” relationship—or on corpus‑based statistical methods. Both approaches fall short of capturing the nuanced way humans judge “sameness.” To overcome this limitation, the authors decompose the ontology into five orthogonal dimensions of knowledge: sort (hierarchical), compositional (part‑of), essential (core attributes), restrictive (constraints on actions or properties), and descriptive (textual annotations).
For each dimension a dedicated similarity function is defined. The sort dimension uses the depth of the lowest common ancestor (LCA) relative to the depths of the two concepts, yielding a value between 0 and 1. The compositional dimension computes a Jaccard similarity over the sets of component parts. The essential dimension measures overlap of core attribute sets, again via Jaccard. The restrictive dimension evaluates how the two concepts share or complement each other’s constraints, while the descriptive dimension converts natural‑language descriptions into vector representations and applies cosine similarity.
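The per-dimension functions described above can be sketched in Python. The function names are illustrative, and the exact formula for the sort dimension is an assumption: the paper only states that it relates the depth of the LCA to the depths of the two concepts, which the Wu–Palmer-style ratio below satisfies.

```python
from math import sqrt

def sort_similarity(depth_lca, depth_a, depth_b):
    """Hierarchical ('sort') similarity in [0, 1] from taxonomy depths.
    A Wu-Palmer-style ratio: 2*depth(LCA) / (depth(a) + depth(b))."""
    return 2.0 * depth_lca / (depth_a + depth_b)

def jaccard(set_a, set_b):
    """Set overlap used for the compositional (part-of) and
    essential (core-attribute) dimensions."""
    if not set_a and not set_b:
        return 1.0  # two empty sets are trivially identical
    return len(set_a & set_b) / len(set_a | set_b)

def cosine(vec_a, vec_b):
    """Cosine similarity over sparse term-weight vectors built from the
    natural-language descriptions (descriptive dimension)."""
    dot = sum(vec_a.get(term, 0.0) * weight for term, weight in vec_b.items())
    norm_a = sqrt(sum(w * w for w in vec_a.values()))
    norm_b = sqrt(sum(w * w for w in vec_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0
```

Each function returns a score in [0, 1], which is what allows the five partial scores to be aggregated on a common scale.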
These five partial similarity scores are combined into a global similarity score through a weighted sum:
$$Sim_{global} = \sum_{i=1}^{5} w_i \times Sim_i$$
The crucial question is how to determine the weights $w_i$. The authors propose four training strategies: (1) a user‑based approach that directly minimizes the error between system scores and explicit user feedback; (2) a concept‑based method that fits weights to expert judgments on individual concepts; (3) a pair‑based approach that optimizes weights to reduce average error across a set of concept pairs; and (4) a hybrid method that integrates the previous three objectives into a multi‑objective optimization. Linear regression, ridge regularization, and non‑linear optimizers (e.g., genetic algorithms) are employed to learn the weight vector.
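As a minimal sketch of the pair-based strategy with ridge regularization: the training matrix below holds hypothetical per-dimension scores for four concept pairs, the targets are human ratings rescaled to [0, 1], and the regularization strength and the weight-normalization convention are assumptions, not values from the paper.

```python
import numpy as np

# Hypothetical training data: one row per concept pair, one column per
# dimension (sort, compositional, essential, restrictive, descriptive).
X = np.array([
    [0.9, 0.8, 0.7, 0.5, 0.6],
    [0.2, 0.1, 0.3, 0.2, 0.1],
    [0.6, 0.5, 0.4, 0.6, 0.5],
    [0.8, 0.9, 0.6, 0.7, 0.8],
])
y = np.array([0.85, 0.15, 0.50, 0.80])  # human ratings rescaled to [0, 1]

lam = 0.1  # assumed ridge regularization strength
# Closed-form ridge solution: w = (X^T X + lam*I)^-1 X^T y
w = np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)
w = np.clip(w, 0.0, None)  # keep weights non-negative
w /= w.sum()               # normalize so the weights sum to 1

def global_similarity(sims, weights=w):
    """Weighted sum of the five partial similarity scores."""
    return float(np.dot(weights, sims))
```

With normalized non-negative weights, the global score stays in [0, 1] whenever the partial scores do.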
The experimental platform uses WordNet as the core lexical resource. The authors extend this base ontology with additional relations and attributes using the Cognos knowledge‑editing toolkit, manually inserting restrictive and descriptive information that WordNet lacks. A test set of 200 concept pairs is created, and 30 human participants rate each pair on a 0‑10 similarity scale. System similarity scores are computed for each pair under the various weighting schemes, and performance is measured using Pearson correlation and mean absolute error (MAE) against the human ratings.
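The two evaluation metrics are standard and can be sketched directly. One assumption: since the human ratings use a 0-10 scale, system scores in [0, 1] would need rescaling (e.g., multiplying by 10) before MAE is comparable.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def mae(xs, ys):
    """Mean absolute error between system scores and human ratings."""
    return sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)
```

Pearson correlation rewards matching the *ranking* of pairs regardless of scale, while MAE penalizes absolute disagreement, so the two metrics capture complementary aspects of agreement with the human judges.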
Results show that a naïve single‑dimension model (e.g., only sort) achieves a correlation of 0.62 and MAE of 1.84. Adding all five dimensions without learned weights improves correlation to 0.71 and MAE to 1.45. The hybrid learning approach yields the best performance, with a correlation of 0.78 and MAE of 1.12. Notably, the restrictive and descriptive dimensions receive higher weights in domains where functional constraints or textual nuances are important, indicating that the multi‑dimensional framework can adapt to domain‑specific similarity judgments.
The paper’s contributions are threefold: (1) a formal multi‑dimensional similarity model that mirrors human cognitive assessment; (2) a set of practical weight‑learning procedures that allow the system to be tuned to user feedback, expert knowledge, or empirical pairwise data; and (3) an empirical validation that demonstrates superior alignment with human judgments compared to traditional single‑dimension measures.
Limitations include the manual effort required to enrich the ontology with restrictive and descriptive relations, the dependence on a relatively small set of human ratings for training, and the static nature of the learned weights, which may not capture evolving user preferences in real‑time interaction.
Future work suggested by the authors involves (a) automating ontology enrichment through machine‑learning techniques such as relation extraction and attribute prediction; (b) incorporating reinforcement learning to adapt weights on‑the‑fly during live user sessions; (c) extending the evaluation to specialized domains like medicine or law, where the balance among dimensions may differ markedly; and (d) personalizing weight vectors per user to achieve truly individualized similarity assessments. By pursuing these directions, the proposed framework could become a cornerstone for more natural, context‑aware, and human‑like conversational agents.