Collective Evolutionary Concept Distance Based Query Expansion for Effective Web Document Retrieval
C. H. C. Leung
Dept. of Computer Science
Hong Kong Baptist University
Hong Kong
clement@comp.hkbu.edu.hk
Alfredo Milani
Dept. of Mathematics and Computer Science
University of Perugia
Perugia, Italy
milani@unipg.it
Yuanxi Li
Dept. of Computer Science
Hong Kong Baptist University
Hong Kong
yxli@comp.hkbu.edu.hk
Valentina Franzoni
Dept. of Mathematics and Computer Science
University of Perugia
Perugia, Italy
valentina.franzoni@dmi.unipg.it
Abstract— In this work, several semantic approaches to concept-based query expansion and re-ranking schemes are studied and compared with different ontology-based expansion methods in web document search and retrieval. In particular, we focus on concept-based query expansion schemes, where, in order to effectively increase the precision of web document retrieval and to decrease the users’ browsing time, the main goal is to quickly provide users with the most suitable query expansion. Two key tasks for query expansion in web document retrieval are to find the expansion candidates, i.e., the closest concepts in the web document domain, and to rank the expanded queries properly. The approach we propose aims at improving the expansion phase for better web document retrieval and precision. The basic idea is to measure the distance between candidate concepts using the PMING distance, a collaborative semantic proximity measure, i.e., a measure that can be computed from statistical results returned by web search engines. Experiments show that the proposed technique can provide users with more satisfying expansion results and improve the quality of web document retrieval.
Keywords- web document retrieval; concept distance; PMING distance; semantic similarity measures; query expansion; precision and recall
I. INTRODUCTION
Query expansion (QE) is the process of reformulating a seed query to improve retrieval performance in information retrieval operations [1]. In the context of web search engines, query expansion involves evaluating a user’s input (the words typed into the search query area, and sometimes other types of data) and expanding the search query to match additional documents. Query expansion involves techniques such as the following:
Finding synonyms of words, and searching for the synonyms as well
Finding all the various morphological forms of words by stemming each word in the search query
Fixing spelling errors and automatically searching for the corrected form or suggesting it in the results
Re-weighting the terms in the original query
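The four techniques listed above can be combined in a single expansion step. The following is a minimal Python sketch under loudly stated assumptions: the synonym dictionary, the vocabulary, the crude suffix stemmer (not the Porter algorithm), the use of `difflib.get_close_matches` for spelling correction, and the expansion weight of 0.5 are all hypothetical choices for illustration and are not part of the method proposed in this paper.

```python
import difflib

# Hypothetical resources for illustration only; a real system would use a
# thesaurus, a proper stemmer, and a large vocabulary.
SYNONYMS = {
    "car": ["automobile", "vehicle"],
    "cheap": ["inexpensive", "affordable"],
}
VOCABULARY = ["car", "cheap", "rental", "automobile", "vehicle", "hotel"]
SUFFIXES = ("ing", "es", "s")  # crude suffix list, NOT the Porter stemmer


def crude_stem(word: str) -> str:
    """Strip the first matching suffix, keeping at least three characters."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def correct_spelling(word: str) -> str:
    """Replace an unknown word by its closest vocabulary entry, if close enough."""
    if word in VOCABULARY:
        return word
    matches = difflib.get_close_matches(word, VOCABULARY, n=1, cutoff=0.75)
    return matches[0] if matches else word


def expand_query(query: str, expansion_weight: float = 0.5) -> dict[str, float]:
    """Return a weighted bag of terms: original (corrected, stemmed) terms get
    weight 1.0, added synonyms get the lower expansion_weight (re-weighting)."""
    weights: dict[str, float] = {}
    for raw in query.lower().split():
        term = crude_stem(correct_spelling(raw))
        weights[term] = max(weights.get(term, 0.0), 1.0)
        for syn in SYNONYMS.get(term, []):
            # setdefault never lowers the weight of a term already present
            weights.setdefault(syn, expansion_weight)
    return weights


if __name__ == "__main__":
    print(expand_query("cheep cars"))
```

Here the misspelled "cheep" is corrected to "cheap", "cars" is mapped to "car", and each term brings in its synonyms at reduced weight. Giving expansion terms a lower weight than the user's own terms is one common way to limit topic drift; the value 0.5 is arbitrary.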
Query expansion has been widely studied in computer science, particularly in natural language processing and information retrieval.
Most casual users of IR systems type short queries. Recent research [3] has shown that adding new words to these queries can improve their retrieval effectiveness.
In web document search engines, the goal of query expansion is that increasing recall can also increase precision (rather than decrease it), by including in the result set pages that are more relevant (of higher quality), or at least equally relevant. With query expansion, pages that have a higher potential to be relevant, and that would otherwise be excluded, can be included.
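The claim that higher recall need not cost precision can be checked with the standard set-based definitions of the two measures. The short Python sketch below uses a hypothetical set of relevant documents and two hypothetical result sets (toy values chosen for illustration, not experimental results from this paper): the expanded query retrieves more relevant pages, so both recall and precision rise.

```python
def precision_recall(retrieved: set[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision |R ∩ A| / |A| and recall |R ∩ A| / |R|."""
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy data for illustration only.
relevant = {"d1", "d2", "d3", "d4", "d5"}
original_results = {"d1", "d2", "d6", "d7"}        # 2 of 4 retrieved are relevant
expanded_results = {"d1", "d2", "d3", "d4", "d6"}  # 4 of 5 retrieved are relevant

print(precision_recall(original_results, relevant))  # (0.5, 0.4)
print(precision_recall(expanded_results, relevant))  # (0.8, 0.8)
```

The opposite outcome is equally possible: if the added terms pull in mostly irrelevant pages, recall may still rise while precision falls, which is why the choice and ranking of expansion candidates matters.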
In order to increase the precision of web document retrieval and decrease the users’ browsing time, the most important task is to quickly provide users with the most suitable expanded queries. Therefore, finding the closest expansion candidate concepts in the web document domain and ranking the expanded queries properly are the two main issues for query expansion in web document retrieval.
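The two tasks can be separated cleanly if candidate selection and ranking are both driven by a concept distance. The Python sketch below assumes a generic, pluggable distance function (smaller means semantically closer) and a hypothetical toy distance table; the names `rank_expansions`, `TOY_DISTANCES` and `toy_distance` are illustrative only and do not come from the paper.

```python
from typing import Callable


def rank_expansions(seed: str, candidates: list[str],
                    distance: Callable[[str, str], float], k: int = 3) -> list[str]:
    """Sort candidate concepts by distance to the seed concept (closest first)
    and return up to k expanded queries of the form 'seed candidate'."""
    ranked = sorted(candidates, key=lambda c: distance(seed, c))
    return [f"{seed} {c}" for c in ranked[:k]]


# Hypothetical distances for illustration only (smaller = closer).
TOY_DISTANCES = {
    ("jaguar", "car"): 0.2,
    ("jaguar", "cat"): 0.3,
    ("jaguar", "guitar"): 0.7,
    ("jaguar", "bank"): 0.9,
}


def toy_distance(a: str, b: str) -> float:
    """Look up a toy distance; unknown pairs are treated as maximally distant."""
    return TOY_DISTANCES.get((a, b), 1.0)


if __name__ == "__main__":
    print(rank_expansions("jaguar", ["bank", "cat", "car", "guitar"], toy_distance, k=2))
```

Because the distance is passed in as a parameter, the same ranking step can be reused with any proximity measure, such as a search-engine-based one.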
Our work mainly focuses on these two targets in order to improve the expansion results and obtain better precision. In particular, we propose and experimentally evaluate the use of a semantic proximity measure, the PMING distance [2, 37].
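The PMING distance is computed from web search engine statistics, namely the page counts f(x), f(y), f(x,y) and the index size N. The sketch below is a hedged illustration, not the definition given in [2, 37]. It assumes that PMING combines a PMI-derived term and the Normalized Google Distance (NGD) as a weighted sum, with a weight `rho` and normalization constants `mu1` and `mu2` that are treated here as placeholders. The PMI and NGD formulas are the standard page-count forms; the function name `pming_like` and all default parameter values are hypothetical, and the exact weighting and normalization should be taken from [2].

```python
import math


def pmi(fx: float, fy: float, fxy: float, n: float) -> float:
    """Pointwise mutual information from page counts: log(f(x,y) N / (f(x) f(y)))."""
    return math.log(fxy * n / (fx * fy))


def ngd(fx: float, fy: float, fxy: float, n: float) -> float:
    """Normalized Google Distance (Cilibrasi and Vitanyi) from page counts."""
    lx, ly, lxy, ln = (math.log(v) for v in (fx, fy, fxy, n))
    return (max(lx, ly) - lxy) / (ln - min(lx, ly))


def pming_like(fx: float, fy: float, fxy: float, n: float,
               rho: float = 0.5, mu1: float = 1.0, mu2: float = 1.0) -> float:
    """ASSUMED PMING-style distance: rho * (1 - PMI/mu1) + (1 - rho) * NGD/mu2.
    rho, mu1 and mu2 are placeholder values, not those used in [2]."""
    if fxy == 0:
        return math.inf  # no co-occurrence: treat the concepts as unrelated
    return rho * (1.0 - pmi(fx, fy, fxy, n) / mu1) + (1.0 - rho) * ngd(fx, fy, fxy, n) / mu2


if __name__ == "__main__":
    N = 10**6  # hypothetical index size
    # More co-occurrence should yield a smaller distance.
    print(pming_like(1000, 500, 100, N), pming_like(1000, 500, 400, N))
```

Both components shrink as the co-occurrence count f(x,y) grows relative to the individual counts, so the combined value decreases for more strongly associated concepts. In practice, the normalization constants would be chosen so that the two components are on comparable scales within the set of candidates being compared.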
This paper is organized as follows. Related work on query expansion and proximity measures is introduced in Section II; the proposed distance-based query expansion system for web document search is presented in detail in Section III. The experimental results are reported in Section IV, followed by conclusions in the last section.
II. RELATED WORKS
A. Expansion techniques
In order to find the candidate concepts for query expansion in the web document domain, different classes of expansion techniques can be considered.
One of the main approaches to query expansion consists in using the associativity rules underlying the domain and the context of the query. For example, if a document contains two objects/concepts, say U and V, where only U is indexed, then searching for V will not return the web document in the query result, even though V is present in the web document, because for some reason it has not been explicitly i