Characterising through Erasing: A Theoretical Framework for Representing Documents Inspired by Quantum Theory

📝 Original Info

  • Title: Characterising through Erasing: A Theoretical Framework for Representing Documents Inspired by Quantum Theory
  • ArXiv ID: 0802.1738
  • Date: 2008-02-19
  • Authors: A. F. Huertas-Rosero, L. A. Azzopardi, C. J. van Rijsbergen

📝 Abstract

The problem of representing text documents within an Information Retrieval system is formulated as an analogy to the problem of representing the quantum states of a physical system. Lexical measurements of text are proposed as a way of representing documents which are akin to physical measurements on quantum states. Consequently, the representation of the text is only known after measurements have been made, and because the process of measuring may destroy parts of the text, the document is characterised through erasure. The mathematical foundations of such a quantum representation of text are provided in this position paper as a starting point for indexing and retrieval within a “quantum like” Information Retrieval system.

📄 Full Content

CHARACTERISING THROUGH ERASING
A Theoretical Framework for Representing Documents Inspired by Quantum Theory

A. F. Huertas-Rosero, L. A. Azzopardi and C. J. van Rijsbergen
Dept. of Computing Science, University of Glasgow, Glasgow, United Kingdom
{alvaro, leif, keith}@dcs.gla.ac.uk

Abstract

The problem of representing text documents within an Information Retrieval system is formulated as an analogy to the problem of representing the quantum states of a physical system. Lexical measurements of text are proposed as a way of representing documents which are akin to physical measurements on quantum states. Consequently, the representation of the text is only known after measurements have been made, and because the process of measuring may destroy parts of the text, the document is characterised through erasure. The mathematical foundations of such a quantum representation of text are provided in this position paper as a starting point for indexing and retrieval within a “quantum like” Information Retrieval system.

Introduction

The problem of indexing, i.e. generating compact and informative representations of documents, is an important issue in Information Retrieval (IR). For text documents, the most successful representations have been based on the occurrence of terms in documents: either their presence or absence, or some statistical information about the term’s occurrence in the document. Consequently, a document is represented as an array of terms, and is assumed to be fixed or static in nature.

These representations are used in standard IR models such as the Boolean model, Binary Independence Model (BIM), Vector Space Model, Language Model, etc. (van Rijsbergen 1979; Ponte & Croft 1998; Salton & Lesk 1968), where the representation employed tends to be dictated by the model. For example, both the Boolean model and BIM expect a binary representation, whereas the Language Model expects a probability distribution over the vocabulary.
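To make the contrast between model-dictated representations concrete, the following is a minimal sketch, not taken from the paper. It builds a binary presence/absence vector (the kind of input the Boolean model and BIM expect) and an unsmoothed unigram distribution (the kind a Language Model expects) from the same token sequence. The tokenizer, the function names and the toy vocabulary are all assumptions made for illustration.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive, assumed tokenizer)."""
    return text.lower().split()


def binary_representation(tokens, vocabulary):
    """Presence/absence vector over the vocabulary (Boolean model / BIM style)."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


def unigram_distribution(tokens, vocabulary):
    """Maximum-likelihood unigram probabilities over the vocabulary, without smoothing."""
    counts = Counter(tokens)
    total = sum(counts[term] for term in vocabulary)
    if total == 0:
        # No vocabulary term occurs in the document: return an all-zero vector.
        return [0.0 for _ in vocabulary]
    return [counts[term] / total for term in vocabulary]


# Hypothetical toy vocabulary and document, chosen only for illustration.
vocabulary = ["quantum", "text", "measurement", "retrieval"]
tokens = tokenize("quantum measurement of text and quantum text")

binary = binary_representation(tokens, vocabulary)
distribution = unigram_distribution(tokens, vocabulary)
```

Both outputs are fixed arrays computed once from the text, which is the static view of a document that the paper sets out to reconsider.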
In this work, a different approach is taken: instead of focusing directly on building an IR model, the focus is put on devising an underlying representation of documents that is inspired by Quantum Theory (QT). Such a representation should be suitable for use by an IR system.

An important part of physics deals with the problem of representing, in the state of a system, the information an observer can obtain from a set of measurements. QT provides a solution in which measurements on a quantum system can be obtained to provide a representation of the state of the system. This theory is based on the science of natural objects (i.e. photons, electrons, etc.). However, IR is a science of artificial objects (i.e. text / documents) (van Rijsbergen 2004). Consequently, it is necessary to explain how QT can be applied in the context of IR.

Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Documents can be thought of as states of a physical system, and their features (such as terms) can be viewed as physical observables to be measured in such a system. If a suitable definition of the measurements to be performed on documents is used, then the powerful theoretical machinery of QT can be engaged to represent and use the information obtained. The main contribution of this position paper is to define suitable lexical measurements which can be performed on text and which will form the basis for a document representation scheme.

Historically, the most successful methods for automatic indexing of text documents have been based mainly on the statistical analysis of the occurrence of terms in documents (Spärck-Jones 2003). It is reasonable, therefore, to propose measurements which are based on features related to the frequency of occurrence of terms in text documents. These will be referred to as lexical measurements.
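As the abstract notes, measuring may destroy parts of the text. The sketch below is one hypothetical way such a measurement could look, assuming an operation that keeps only the tokens within a fixed window around occurrences of a term and erases everything else. The function name `erase_outside_window`, the `width` parameter and the `ERASED` placeholder are illustrative assumptions, not the definitions given in the paper.

```python
ERASED = "_"  # assumed placeholder marking an erased token position


def erase_outside_window(tokens, term, width):
    """Keep tokens within `width` positions of any occurrence of `term`; erase the rest."""
    centres = [i for i, tok in enumerate(tokens) if tok == term]
    return [
        tok if any(abs(i - c) <= width for c in centres) else ERASED
        for i, tok in enumerate(tokens)
    ]


# Hypothetical three-token document, chosen only for illustration.
doc = ["quantum", "theory", "text"]

# Applying the same measurement twice changes nothing further.
once = erase_outside_window(doc, "quantum", 1)
twice = erase_outside_window(once, "quantum", 1)

# Applying two different measurements in opposite orders.
quantum_then_text = erase_outside_window(once, "text", 2)
text_then_quantum = erase_outside_window(
    erase_outside_window(doc, "text", 2), "quantum", 1
)
```

Under these assumptions, `twice` equals `once`, so repeating the measurement is harmless, while `quantum_then_text` and `text_then_quantum` differ: erasing around "quantum" first removes the only occurrence of "text", so the second measurement finds nothing to keep. This is the kind of repeatable but order-sensitive behaviour that makes the analogy with quantum measurements plausible.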
In the next section, lexical measurements on text documents are proposed and defined, and it is shown how these measurements reflect the properties of ideal quantum measurements. Then, operations between the measurements are defined, which enable different relationships to be captured. The proposed measurements are then discussed and directions for further work outlined.

Lexical measurements on Textual Documents

In a physical system, the state of the system is defined by the probabilities of the possible outcomes of measurements performed on that system. However, the state of a quantum system can only have some of the measurement outcomes determined, not all of them. For example, it is impossible to determine both the position and the velocity of an electron (Heisenberg indeterminacy principle): only one of the two properties can be determined with certainty, while the other becomes uncertain when the first is determined. For some pairs of measurements, the value of the corresponding observables will not depend on the order in which

arXiv:0802.1738v2 [cs.IR] 18 Feb 2008

…(Full text truncated)…
