Learning Better Context Characterizations: An Intelligent Information Retrieval Approach

Reading time: 5 minutes

📝 Original Info

  • Title: Learning Better Context Characterizations: An Intelligent Information Retrieval Approach
  • ArXiv ID: 1004.3478
  • Date: 2010-04-28
  • Authors: Carlos M. Lorenzetti, Ana G. Maguitman

📝 Abstract

This paper proposes an incremental method that can be used by an intelligent system to learn better descriptions of a thematic context. The method starts with a small number of terms selected from a simple description of the topic under analysis and uses this description as the initial search context. Using these terms, a set of queries is built and submitted to a search engine. New documents and terms are used to refine the learned vocabulary. Evaluations performed on a large number of topics indicate that the learned vocabulary is much more effective than the original one when constructing queries to retrieve relevant material.
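The incremental loop summarized in the abstract (seed terms, build queries, search, refine the vocabulary) can be illustrated with a toy sketch. Everything below is an assumption for illustration and not the authors' algorithm: `search` is a hypothetical stand-in for a real search engine that matches against a tiny in-memory corpus, queries are simply pairs of the currently strongest terms, and refinement gives every term in the retrieved documents an equal share of one unit of weight per round.

```python
from collections import Counter
from itertools import combinations

# Hypothetical in-memory corpus standing in for a real search engine's index.
CORPUS = [
    "jaguar speed habitat rainforest predator",
    "jaguar car engine dealer price",
    "rainforest predator prey ecosystem habitat",
    "predator prey population ecosystem dynamics",
]


def search(query_terms, corpus=CORPUS):
    """Hypothetical search: return documents that contain every query term."""
    return [doc for doc in corpus if all(t in doc.split() for t in query_terms)]


def learn_context(initial_terms, iterations=2, query_size=2, top_k=5):
    """Toy incremental context learner (illustrative, not the paper's method)."""
    weights = Counter({term: 1.0 for term in initial_terms})
    for _ in range(iterations):
        # Build queries from combinations of the currently strongest terms.
        strongest = [term for term, _ in weights.most_common(top_k)]
        retrieved = set()
        for query in combinations(strongest, query_size):
            retrieved.update(search(query))
        if not retrieved:
            break  # nothing new to learn from this round
        # Refine the vocabulary: terms in retrieved documents gain weight.
        # Sorting keeps tie-breaking, and therefore later queries, deterministic.
        for doc in sorted(retrieved):
            for term in doc.split():
                weights[term] += 1.0 / len(retrieved)
    return weights


learned = learn_context(["jaguar", "predator", "habitat"])
print(learned.most_common(4))
```

Starting from "jaguar", "predator", and "habitat", the queries keep retrieving the two wildlife documents, so terms such as "rainforest" and "ecosystem" enter the learned vocabulary while the off-topic "car" document is never retrieved.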

📄 Full Content

arXiv:1004.3478v2 [cs.IR] 27 Apr 2010

Learning Better Context Characterizations: An Intelligent Information Retrieval Approach⋆

Carlos M. Lorenzetti, Ana G. Maguitman
Grupo de Investigación en Recuperación de Información y Gestión del Conocimiento
LIDIA - Laboratorio de Investigación y Desarrollo en Inteligencia Artificial
Departamento de Ciencias e Ingeniería de la Computación
Universidad Nacional del Sur, Av. Alem 1253, (B8000CPB) Bahía Blanca, Argentina
CONICET - Consejo Nacional de Investigaciones Científicas y Técnicas
phone: 54-291-4595135, fax: 54-291-4595136
e-mail: {cml,agm}@cs.uns.edu.ar

Abstract. This paper proposes an incremental method that can be used by an intelligent system to learn better descriptions of a thematic context. The method starts with a small number of terms selected from a simple description of the topic under analysis and uses this description as the initial search context. Using these terms, a set of queries is built and submitted to a search engine. New documents and terms are used to refine the learned vocabulary. Evaluations performed on a large number of topics indicate that the learned vocabulary is much more effective than the original one when constructing queries to retrieve relevant material.

1 Introduction

Today's search engine interfaces are appropriate when the user knows what to seek and how to seek it. However, they cannot reflect the user context and are therefore not smart enough to understand the user's real needs. For several years, researchers in the Artificial Intelligence community have discussed the importance of intelligent systems that cooperate with the user to facilitate a number of computer-mediated tasks [10,12]. More recently, the problem of accessing relevant information through intelligent systems has become a major research area.
To implement intelligent Information Retrieval (IR) systems, some researchers have proposed taking advantage of existing services to build more powerful tools on top of them [8,7]. Examples of systems that apply this approach use major search engines to perform intelligent context-based search [2,4,17,13,16].

The Web can be regarded as a rich repository of collective memory. An intelligent system that incrementally searches this repository to find material useful for the user's current needs can act as a memory augmentation aid. By association of similarities, this aid can help users remember information, ensure that areas relevant to the current task have been considered, and pursue new directions.

⋆ This research work is supported by Agencia Nacional de Promoción Científica y Tecnológica (PICT 2005 Nro. 32373) and Universidad Nacional del Sur (PGI 24/ZN13).

Descriptions of a user's needs, however, are usually deficient because they are typically based on a priori knowledge of the topic of interest. This knowledge might be insufficient to formulate a good query or, more commonly, the user's vocabulary might not be appropriate for targeting the request at the right kind of material. In certain scenarios, attaining novelty and diversity may be as important as, or even more important than, attaining similarity. With human-generated queries, users frequently decide, based on initial results, to refine subsequent queries. If contextual information is available, part of the query formation and refinement process can be automated.

This paper proposes a new technique for incrementally learning a better characterization of the user context.
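One way to automate part of query formation is weighted sampling over the context vocabulary, so that strongly weighted terms appear in queries more often while weaker terms still get an occasional chance, which also supports the novelty and diversity goals mentioned above. The sketch below uses roulette-wheel (fitness-proportionate) selection without replacement; `roulette_query`, the example weights, and the query size are hypothetical choices for illustration, not details taken from the paper.

```python
import random
from collections import Counter


def roulette_query(term_weights, size, rng=None):
    """Hypothetical helper: draw `size` distinct terms, each spin proportional to weight."""
    rng = rng or random.Random()
    pool = dict(term_weights)  # copy so the caller's weights are untouched
    if size > len(pool):
        raise ValueError("query size exceeds vocabulary size")
    query = []
    for _ in range(size):
        spin = rng.uniform(0, sum(pool.values()))
        cumulative = 0.0
        for term, weight in pool.items():
            cumulative += weight
            if spin <= cumulative:
                break
        # If rounding leaves `spin` above the final cumulative sum,
        # the loop falls through and `term` is the last entry.
        query.append(term)
        del pool[term]  # sampling without replacement: no repeated terms
    return query


context = {"predator": 5.0, "habitat": 3.0, "rainforest": 1.0, "jaguar": 1.0}
rng = random.Random(42)
print(roulette_query(context, 2, rng))

# Over many spins, heavier terms should be picked first more often.
first_picks = Counter(roulette_query(context, 1, rng)[0] for _ in range(2000))
print(first_picks.most_common())
```

Passing a seeded `random.Random` makes runs reproducible; removing each chosen term from the pool guarantees that a query never repeats a term.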
The work presented here suggests and tests the following hypotheses: (1) the vocabulary describing the initial context can be used to identify semantically related documents and terms, but (2) the terms describing the initial context are not necessarily the most appropriate ones for generating search queries, and (3) the characterization of the search context can be incrementally improved by a semi-supervised learning algorithm.

Our algorithm is based on the dynamic extraction of topic descriptors and discriminators, as first introduced in [14]. The main contribution of this paper is a new mechanism for learning rich vocabularies associated with a thematic context. The learned vocabulary provides an improved characterization of the topic of interest in the sense that it allows the system to better identify topically relevant material. The effectiveness of our proposal is assessed through a comprehensive evaluation on a large collection of human-generated topic descriptions.

2 Context Characterizations

For many computer-mediated tasks, the user context provides a rich set of terms that intelligent systems can exploit to generate queries and present related information to the user. Such systems can be equipped with special monitoring capabilities designed to generate a model of the user context. The system will be in charge of observing how the user interacts with different kinds of computer util

…(Full text truncated)…
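The excerpt stops before the definitions of topic descriptors and discriminators from [14] appear, so the sketch below rests on an assumed, simplified formulation. With raw term counts w(d, t), a term's descriptive power in a document is its weight divided by the Euclidean norm of that document's row (terms that occur often in a topic's documents describe it), and its discriminating power for a document is its weight divided by the Euclidean norm of that term's column across all documents (terms concentrated in the topic's documents discriminate it). Scores are averaged over the documents assumed relevant to the topic. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter


def term_weights(docs):
    """Raw term-frequency weights w(d, t), one Counter per document (assumed)."""
    return [Counter(doc.split()) for doc in docs]


def descriptive_power(w_d):
    """Row-normalize one document: how strongly each term describes it."""
    norm = math.sqrt(sum(v * v for v in w_d.values()))
    return {t: v / norm for t, v in w_d.items()}


def discriminating_power(weights, term):
    """Column-normalize one term: how concentrated it is in each document."""
    norm = math.sqrt(sum(w_d[term] ** 2 for w_d in weights))
    return [w_d[term] / norm if norm else 0.0 for w_d in weights]


def rank(scores, k):
    """Highest score first; ties broken alphabetically for reproducibility."""
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


def topic_descriptors(weights, topic_ids, k):
    """Average descriptive power over the on-topic documents."""
    scores = {}
    for i in topic_ids:
        for t, lam in descriptive_power(weights[i]).items():
            scores[t] = scores.get(t, 0.0) + lam / len(topic_ids)
    return rank(scores, k)


def topic_discriminators(weights, topic_ids, k):
    """Average discriminating power over the on-topic documents."""
    vocab = {t for w_d in weights for t in w_d}
    scores = {}
    for t in vocab:
        delta = discriminating_power(weights, t)
        scores[t] = sum(delta[i] for i in topic_ids) / len(topic_ids)
    return rank(scores, k)


docs = [
    "jaguar rainforest predator",  # on topic
    "jaguar predator habitat",     # on topic
    "jaguar car dealer",           # off topic
    "jaguar car engine",           # off topic
    "jaguar car price",            # off topic
]
W = term_weights(docs)
print(topic_descriptors(W, [0, 1], 2))
print(topic_discriminators(W, [0, 1], 4))
```

In this toy corpus "jaguar" ties for the best descriptor because it occurs in every on-topic document, yet it falls to fourth as a discriminator because most of its occurrences are in the off-topic car documents; "predator" scores well on both measures.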

Reference

This content is AI-processed based on ArXiv data.
