CHARACTERISING THROUGH ERASING
A Theoretical Framework for Representing Documents Inspired by Quantum Theory
A. F. Huertas-Rosero and L. A. Azzopardi and C. J. van Rijsbergen
Dept. of Computing Science, University of Glasgow,
Glasgow, United Kingdom
{alvaro, leif, keith}@dcs.gla.ac.uk
Abstract
The problem of representing text documents within an Information Retrieval system is formulated as an analogy to the problem of representing the quantum states of a physical system. Lexical measurements of text, akin to physical measurements on quantum states, are proposed as a way of representing documents. Consequently, the representation of the text is only known after measurements have been made, and because the process of measuring may destroy parts of the text, the document is characterised through erasure. The mathematical foundations of such a quantum representation of text are provided in this position paper as a starting point for indexing and retrieval within a “quantum-like” Information Retrieval system.
Introduction
The problem of indexing, i.e. generating compact and informative representations of documents, is an important issue in Information Retrieval (IR). For text documents, the most successful representations have been based on the occurrence of terms in documents: either their presence or absence, or some statistical information about each term's occurrence in the document. Consequently, a document is represented as an array of terms and is assumed to be fixed, or static, in nature. These representations are used in standard IR models such as the Boolean model, the Binary Independence Model (BIM), the Vector Space Model and the Language Model (van Rijsbergen 1979; Ponte & Croft 1998; Salton & Lesk 1968), where the representation employed tends to be dictated by the model. For example, both the Boolean model and BIM expect a binary representation, whereas the Language Model expects a probability distribution over the vocabulary.
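The following minimal Python sketch makes the two kinds of static representation concrete for a single toy document. It builds a binary (presence/absence) vector of the kind the Boolean model and BIM expect, and a maximum-likelihood unigram distribution of the kind a Language Model expects. The function names, the toy document and the vocabulary are illustrative assumptions. Practical Language Models usually smooth these estimates, which this sketch deliberately omits.

```python
from collections import Counter
from typing import List


def binary_representation(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """Presence (1) / absence (0) of each vocabulary term, as in the Boolean model or BIM."""
    present = set(tokens)
    return [1 if term in present else 0 for term in vocabulary]


def unigram_language_model(tokens: List[str], vocabulary: List[str]) -> List[float]:
    """Maximum-likelihood estimate of P(term | document); unsmoothed, for illustration only."""
    counts = Counter(tokens)
    total = len(tokens)
    if total == 0:
        # An empty document carries no term statistics.
        return [0.0 for _ in vocabulary]
    return [counts[term] / total for term in vocabulary]


# Hypothetical toy document and vocabulary.
doc = "the cat sat on the mat".split()
vocab = ["cat", "dog", "mat", "on", "sat", "the"]

print(binary_representation(doc, vocab))   # [1, 0, 1, 1, 1, 1]
print(unigram_language_model(doc, vocab))  # "the" gets 2/6, "dog" gets 0
```

Both functions treat the document as a fixed array of terms. This is exactly the static view that the measurement-based approach below departs from.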
In this work, a different approach is taken: instead of focusing directly on building an IR model, the focus is on devising an underlying representation of documents inspired by Quantum Theory (QT). Such a representation should be suitable for use by an IR system.
An important part of physics deals with the problem of representing, in the state of a system, the information an observer can obtain from a set of measurements. QT provides a solution in which the outcomes of measurements on a quantum system provide a representation of the state of that system. This theory is based on the science of natural objects (e.g. photons, electrons). However, IR is a science of artificial objects (e.g. text documents) (van Rijsbergen 2004). Consequently, it is necessary to explain how QT can be applied in the context of IR.
Copyright © 2021, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.
Documents can be thought of as states of a physical system, and their features (such as terms) can be viewed as physical observables to be measured on such a system. If a suitable definition of the measurements to be performed on documents is used, then the powerful theoretical machinery of QT can be engaged to represent and use the information obtained. The main contribution of this position paper is to define suitable lexical measurements that can be performed on text and that will form the basis for a document representation scheme.
Historically, the most successful methods for automatic indexing of text documents have been based mainly on the statistical analysis of the occurrence of terms in documents (Spärck-Jones 2003). It is reasonable, therefore, to propose measurements which are based on features related to the frequency of occurrence of terms in text documents. These will be referred to as lexical measurements.
In the next section, lexical measurements on text documents are proposed and defined, and it is shown how these measurements reflect the properties of ideal quantum measurements. Then, operations between the measurements are defined, which enable different relationships to be captured. The proposed measurements are then discussed and directions for further work outlined.
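The following Python sketch shows, under explicit assumptions, how a lexical measurement might erase parts of a text while still behaving like an ideal measurement. The measurement `erase_outside_window` is hypothetical and is not the authors' definition. It keeps only the tokens within `width` positions of an occurrence of a given term and erases (replaces with `None`) everything else. Repeating such a measurement leaves the result unchanged (idempotence, the defining property of an ideal projective measurement). Composing two different measurements, however, can give results that depend on their order.

```python
from typing import List, Optional

Token = Optional[str]  # None marks a position that has been erased


def erase_outside_window(tokens: List[Token], term: str, width: int) -> List[Token]:
    """Hypothetical lexical measurement: keep tokens within `width` positions
    of any surviving occurrence of `term`, and erase (None) all the others."""
    hits = [i for i, tok in enumerate(tokens) if tok == term]
    return [
        tok if any(abs(i - h) <= width for h in hits) else None
        for i, tok in enumerate(tokens)
    ]


doc: List[Token] = "a x x b".split()

# Idempotence: measuring twice gives the same result as measuring once.
once = erase_outside_window(doc, "b", 1)
twice = erase_outside_window(once, "b", 1)
print(once == twice)  # True

# Order dependence: the two compositions differ.
ab = erase_outside_window(erase_outside_window(doc, "a", 3), "b", 1)
ba = erase_outside_window(erase_outside_window(doc, "b", 1), "a", 3)
print(ab)  # [None, None, 'x', 'b']
print(ba)  # [None, None, None, None]
```

When the erasure around "b" comes first, it destroys the only occurrence of "a", so the later measurement of "a" finds nothing and erases everything. In the other order, the wide window around "a" preserves "b".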
Lexical measurements on Textual Documents
In a physical system, the state of the system is defined by the probabilities of the possible outcomes of measurements performed on that system. However, in a quantum system only some of the measurement outcomes can be determined at the same time, not all of them. For example, it is impossible to determine both the position and the velocity of an electron (Heisenberg's indeterminacy principle): only one of the two properties can be determined with certainty, while the other becomes uncertain when the first is determined.
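A standard two-level example makes this indeterminacy concrete. It is a textbook illustration, not part of the paper's own formalism. Take the projectors $P_0 = |0\rangle\langle 0|$ and $P_+ = |+\rangle\langle +|$, with $|+\rangle = (|0\rangle + |1\rangle)/\sqrt{2}$, which describe two yes/no measurements on a qubit:

```latex
\begin{align*}
P_0 &= \begin{pmatrix} 1 & 0 \\ 0 & 0 \end{pmatrix}, &
P_+ &= \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}, &
P_0^2 &= P_0, \quad P_+^2 = P_+, \\
P_0 P_+ &= \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 0 & 0 \end{pmatrix}, &
P_+ P_0 &= \frac{1}{2}\begin{pmatrix} 1 & 0 \\ 1 & 0 \end{pmatrix}, &
P_0 P_+ &\neq P_+ P_0 .
\end{align*}
```

Each projector is idempotent, but the two do not commute. The eigenvectors of $P_0$ are $|0\rangle$ and $|1\rangle$, while those of $P_+$ are $|+\rangle$ and $|-\rangle$. Because they share no eigenvector, no state has the outcomes of both measurements determined with certainty.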
For some pairs of measurements, the values of the corresponding observables will not depend on the order in which
arXiv:0802.1738v2 [cs.IR] 18 Feb 2008
…(Full text truncated)…