arXiv:1004.3478v2 [cs.IR] 27 Apr 2010
Learning Better Context Characterizations:
An Intelligent Information Retrieval Approach⋆
Carlos M. Lorenzetti
Ana G. Maguitman
Grupo de Investigación en Recuperación de Información y Gestión del Conocimiento
LIDIA - Laboratorio de Investigación y Desarrollo en Inteligencia Artificial
Departamento de Ciencias e Ingeniería de la Computación
Universidad Nacional del Sur, Av. Alem 1253, (B8000CPB) Bahía Blanca, Argentina
CONICET - Consejo Nacional de Investigaciones Científicas y Técnicas
phone: 54-291-4595135 fax: 54-291-4595136
e-mail: {cml,agm}@cs.uns.edu.ar
Abstract. This paper proposes an incremental method that can be used by an
intelligent system to learn better descriptions of a thematic context. The method
starts with a small number of terms selected from a simple description of the topic
under analysis and uses this description as the initial search context. Using these
terms, a set of queries is built and submitted to a search engine. New documents
and terms are used to refine the learned vocabulary. Evaluations performed on a
large number of topics indicate that the learned vocabulary is much more effective
than the original one when constructing queries to retrieve relevant material.
1 Introduction
Today’s search engine interfaces are appropriate when the user knows what to seek
and how to seek it. However, they are unable to reflect the user context and therefore
they are not smart enough to understand the user’s real needs. For several years,
researchers in the Artificial Intelligence community have discussed the importance of
intelligent systems that cooperate with the user to facilitate a number of computer-mediated
tasks [10,12]. More recently, the problem of accessing relevant information through
intelligent systems has become a major research area. In order to implement intelligent
Information Retrieval (IR) systems, some researchers have proposed taking advantage
of existing services to build more powerful tools on top of them [8,7]. Examples of
systems that apply this approach use major search engines to perform
intelligent context-based search [2,4,17,13,16].
The Web can be regarded as a rich repository of collective memory. An intelligent
system that incrementally searches this repository to find material useful for the
user’s current needs can act as a memory augmentation aid. Through association by
similarity, this aid can help users remember information, ensure that areas relevant to the
current task have been considered, and pursue new directions.
⋆This research work is supported by Agencia Nacional de Promoción Científica y Tecnológica
(PICT 2005 Nro. 32373) and Universidad Nacional del Sur (PGI 24/ZN13).
Descriptions of a user’s needs, however, are usually deficient because they are typically
based on a priori knowledge of the topic of interest. This knowledge might
be insufficient to formulate a good query or, more commonly, the vocabulary used by
the user might not be appropriate for targeting the request at the right kind of material. In
certain scenarios, attaining novelty and diversity may be as important as, or even more
important than, attaining similarity. With human-generated queries, users frequently decide,
based on initial results, to refine subsequent queries. If contextual information is
available, part of the query formation and refinement process can be automated.
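To make automated query formation concrete, here is a minimal Python sketch. It assumes the context is represented as a dictionary of term weights and that queries are formed by roulette-wheel (weight-proportional) sampling of terms. The name `build_queries` and its parameters are hypothetical illustrations. This is not a definitive account of how the method described here builds its queries.

```python
import random


def build_queries(context, num_queries=3, terms_per_query=3, seed=0):
    """Form queries from a weighted context (hypothetical helper).

    context: dict mapping term -> non-negative weight.
    Within each query, terms are drawn without replacement with probability
    proportional to their current weight (roulette-wheel selection).
    """
    rng = random.Random(seed)  # seeded for reproducible runs
    queries = []
    for _ in range(num_queries):
        pool = dict(context)
        query = []
        while pool and len(query) < terms_per_query:
            terms = list(pool)
            weights = [pool[t] for t in terms]
            if sum(weights) <= 0:
                break  # only zero-weight terms remain
            term = rng.choices(terms, weights=weights, k=1)[0]
            query.append(term)
            del pool[term]  # no repeated terms inside one query
        queries.append(query)
    return queries


context = {"java": 3.0, "island": 2.0, "indonesia": 1.5, "coffee": 1.0}
for q in build_queries(context, num_queries=3, terms_per_query=2):
    print(" ".join(q))
```

Weight-proportional sampling lets high-weight terms dominate while still giving lower-weight terms a chance to appear. This in turn provides some of the query diversity the paragraph above argues can matter as much as similarity.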
This paper proposes a new technique for incrementally learning a better characterization
of the user context. The work presented here suggests and tests the following
hypotheses: (1) the vocabulary describing the initial context can be used to identify
semantically related documents and terms, but (2) the terms describing the initial context
are not necessarily the most appropriate ones for generating search queries, and (3)
the characterization of the search context can be incrementally improved by a
semi-supervised learning algorithm.
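The following sketch illustrates one round of the incremental loop behind hypotheses (1) and (3). It is a toy illustration under explicit assumptions. The `search` function is a hypothetical in-memory stand-in for a real search engine. The update rule in `refine_context` is also an assumption made for illustration, not the paper's learning algorithm: it decays old weights, keeps only retrieved documents that share at least `min_overlap` terms with the current context, and adds their term counts.

```python
from collections import Counter


def search(corpus, query, k=2):
    """Toy search engine (hypothetical): rank documents by the number of
    query terms they contain and return the indices of the top k hits."""
    scored = [(sum(t in doc for t in query), i) for i, doc in enumerate(corpus)]
    hits = [(s, i) for s, i in scored if s > 0]
    hits.sort(key=lambda pair: (-pair[0], pair[1]))  # best score first, then index
    return [i for _, i in hits[:k]]


def refine_context(context, queries, corpus, decay=0.5, k=2, min_overlap=2):
    """One incremental refinement step under an assumed update rule.

    context: dict term -> weight describing the current search context.
    Existing weights are multiplied by `decay`; every retrieved document that
    shares at least `min_overlap` distinct terms with the current context adds
    one unit of weight per term occurrence. Each document is used at most once.
    """
    new_weights = Counter({t: w * decay for t, w in context.items()})
    used = set()
    for query in queries:
        for idx in search(corpus, query, k):
            overlap = len(set(corpus[idx]) & set(context))
            if idx in used or overlap < min_overlap:
                continue  # skip repeats and documents unrelated to the context
            used.add(idx)
            new_weights.update(corpus[idx])  # add term counts from the document
    return dict(new_weights)


corpus = [
    ["java", "island", "indonesia", "volcano"],
    ["java", "programming", "language", "jvm"],
    ["island", "indonesia", "travel", "bali"],
    ["coffee", "brewing"],
]
context = {"java": 1.0, "island": 1.0, "indonesia": 1.0}
refined = refine_context(context, [["java", "island"], ["indonesia"]], corpus)
print(sorted(refined.items()))
```

In this toy run, the programming-related document is retrieved but filtered out because it shares only "java" with the context. Terms such as "bali", "travel", and "volcano" enter the learned vocabulary. Repeating the step, with new queries built from the refined weights, gives the incremental behavior the hypotheses describe.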
Our algorithm is based on the dynamic extraction of topic descriptors and discriminators,
as first introduced in [14]. The main contribution of this paper is a new
mechanism for learning rich vocabularies associated with a thematic context.
The learned vocabulary provides an improved characterization of the topic of interest
in the sense that it allows topically relevant material to be identified more accurately. The
effectiveness of our proposal is assessed through a comprehensive evaluation on a large
collection of human-generated topic descriptions.
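The exact definitions of descriptors and discriminators are those of [14] and are not reproduced in this excerpt. The sketch below assumes one simple formalization for illustration only. A term's descriptive power is its L2-normalized frequency in a context document. A term's discriminating power is 1/√n_k, where n_k is the number of documents containing it. The functions `descriptive_power`, `discriminating_power`, and `top_terms` are hypothetical names.

```python
import math
from collections import Counter


def descriptive_power(doc):
    """Assumed form: term frequency in `doc`, L2-normalised over the document,
    so frequent terms in a context document score as good descriptors."""
    tf = Counter(doc)
    norm = math.sqrt(sum(c * c for c in tf.values()))
    return {t: c / norm for t, c in tf.items()}


def discriminating_power(term, docs):
    """Assumed form: 1 / sqrt(n_k), where n_k is the number of documents that
    contain `term`; terms confined to few documents discriminate better."""
    n_k = sum(term in d for d in docs)
    return 1.0 / math.sqrt(n_k) if n_k else 0.0


def top_terms(context_doc, docs, n=3):
    """Return the n best descriptors and n best discriminators of context_doc
    (ties broken alphabetically)."""
    desc = descriptive_power(context_doc)
    descriptors = sorted(desc, key=lambda t: (-desc[t], t))[:n]
    disc = {t: discriminating_power(t, docs) for t in set(context_doc)}
    discriminators = sorted(disc, key=lambda t: (-disc[t], t))[:n]
    return descriptors, discriminators


context_doc = ["java", "java", "island", "volcano"]
docs = [context_doc, ["java", "programming"], ["island", "travel"], ["java", "coffee"]]
print(top_terms(context_doc, docs, n=2))
# → (['java', 'island'], ['volcano', 'island'])
```

Here "java" is the strongest descriptor but the weakest discriminator, because it also occurs in documents about programming and coffee. Separating the two roles is meant to capture exactly this kind of distinction.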
2 Context Characterizations
For many computer-mediated tasks, the user context provides a rich set of terms that can
be exploited by intelligent systems to generate queries and present related information
to the user. Such systems can be equipped with special monitoring capabilities, designed
to generate a model of the user context. The system will be in charge of observing
how the user interacts with different kinds of computer util
…(Full text truncated)…