A New Efficient Method for Calculating Similarity Between Web Services
Web services enable communication between heterogeneous systems in a distributed environment. Their enormous success and growing use mean that thousands of Web services are now available on the Internet. This large and ever-increasing number makes Web services difficult to locate and classify, problems encountered mainly during Web service discovery and substitution. Traditional keyword-based search is unsuccessful in this context: its results ignore the structure of Web services, and it considers only the identifiers of Web Service Description Language (WSDL) interface elements. Semantics-based methods (WSDL-S, OWL-S, SAWSDL…), which augment the WSDL description of a Web service with a semantic description, partially address this problem, but their complexity and difficulty delay their adoption in real cases. Measuring the similarity between Web service interfaces is the most suitable solution for this kind of problem: it classifies the available Web services so as to distinguish those that best match the searched profile from those that do not. Thus, the main goal of this work is to measure the degree of similarity between any two Web services by offering a new method that is more effective than existing approaches.
💡 Research Summary
The paper addresses the growing difficulty of discovering and substituting Web services in an environment where thousands of services are publicly available. Traditional keyword‑based search mechanisms, which rely solely on the textual identifiers of WSDL elements, fail to capture the functional similarity between services. Semantic extensions such as SAWSDL, OWL‑S, and other ontology‑based approaches can improve relevance, but their adoption is hampered by the high cost of ontology creation, maintenance, and integration. To bridge this gap, the authors propose a hybrid similarity measurement framework that combines structural analysis of WSDL interfaces with lightweight semantic information derived from word embeddings.
The method proceeds in four stages. First, each WSDL document is parsed to construct an interface tree whose nodes represent operations, inputs, and outputs. Second, the textual labels of these nodes are tokenized and mapped to pre-trained embedding vectors (e.g., Word2Vec or FastText). Third, a structural distance between two interface trees is computed using Tree Edit Distance (TED), while a semantic similarity score is obtained by averaging cosine similarities of the corresponding node embeddings. Fourth, the two scores are merged through a weighted sum. Crucially, the weights are not fixed; they are automatically tuned for a given dataset using Bayesian optimization, allowing the system to adapt the relative importance of structure versus semantics for different domains. To resolve the many-to-many matching problem that arises when aligning the nodes of two trees, the Hungarian algorithm is employed, guaranteeing an optimal bipartite matching. The final similarity value is normalized to the interval [0, 1].
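The core of the pipeline can be sketched as follows. This is a minimal illustration, not the authors' implementation: embedding vectors are assumed to be precomputed for each node label, the weight `alpha` stands in for the tuned weights (the paper tunes them with Bayesian optimization), and for brevity the optimal one-to-one node alignment is found by brute force over permutations rather than the O(n³) Hungarian algorithm the paper uses (in practice one would call `scipy.optimize.linear_sum_assignment`).

```python
import math
from itertools import permutations

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def semantic_similarity(vecs_a, vecs_b):
    """Average cosine similarity under the best one-to-one node alignment.

    Brute-force search over permutations; this finds the same optimum
    the Hungarian algorithm would, but only scales to tiny trees.
    """
    if not vecs_a or not vecs_b:
        return 0.0
    if len(vecs_a) > len(vecs_b):
        vecs_a, vecs_b = vecs_b, vecs_a  # align the smaller side
    best = max(
        sum(cosine(vecs_a[i], vecs_b[j]) for i, j in enumerate(perm))
        for perm in permutations(range(len(vecs_b)), len(vecs_a))
    )
    return best / len(vecs_a)

def combined_similarity(sem_sim, struct_sim, alpha=0.6):
    """Weighted sum of the semantic and structural scores.

    `alpha` is an illustrative fixed weight; per the summary it would
    be tuned per dataset. Inputs in [0, 1] keep the result in [0, 1].
    """
    return alpha * sem_sim + (1 - alpha) * struct_sim
```

For example, two interfaces whose node embeddings are identical up to reordering receive a semantic score of 1.0, and `combined_similarity` then blends that with whatever structural score the TED step produced.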