A toy model of information retrieval system based on quantum probability
Recent numerical results show that non-Bayesian knowledge revision may be helpful in search engine training and optimization. In order to demonstrate how basic assumptions about the physical nature (and hence the observed statistics) of retrieved documents can affect the performance of search engines, we suggest an idealized toy model with a minimal number of parameters.
💡 Research Summary
The paper proposes an idealized toy model of an information retrieval (IR) system in order to explore how the underlying physical assumptions about documents and terms influence retrieval performance. The authors start from the premise that both relevance (R) and the presence of a particular term X in a document are measurable properties, analogous to physical observables, and that the collection of documents can be treated as an effectively infinite medium. Two contrasting implementations are examined: a classical model based on Boolean logic and an urn‑ball metaphor, and a quantum model in which documents are represented as spin‑½ particles living in a two‑dimensional complex Hilbert space.
In both models the only retrieval operation employed is query expansion by pre‑filtering documents that contain term X. The authors define a precision‑boost measure Δ(X)=P(R|X)−P(R) and an “Accardi statistical invariant” A=P(X)−P(X|R)P(R)−P(X|¬R)P(¬R), which quantifies the deviation from the classical law of total probability.
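As a minimal sketch, the two measures defined above can be computed directly from estimated probabilities. The function names, argument names, and input values below are illustrative assumptions and do not come from the paper.

```python
def precision_boost(p_r_given_x: float, p_r: float) -> float:
    """Delta(X) = P(R|X) - P(R): change in precision from pre-filtering on term X."""
    return p_r_given_x - p_r


def accardi_invariant(p_x: float, p_x_given_r: float,
                      p_x_given_not_r: float, p_r: float) -> float:
    """A = P(X) - P(X|R)P(R) - P(X|not R)P(not R), as written in the summary."""
    return p_x - p_x_given_r * p_r - p_x_given_not_r * (1.0 - p_r)


# Hypothetical measured statistics (made-up values for illustration only).
delta = precision_boost(p_r_given_x=0.5, p_r=0.2)
a = accardi_invariant(p_x=0.25, p_x_given_r=0.5, p_x_given_not_r=0.1, p_r=0.2)
print(f"Delta(X) = {delta:.3f}, A = {a:.3f}")
```

A positive Δ(X) means that filtering on X raises precision relative to the unfiltered collection; a negative value means the filter hurts.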
For the classical urn model, relevance probability is p=N_R/N, and conditional term probabilities are q_R=N_{X∧R}/N_R and q_N=N_{X∧¬R}/N_{¬R}. Applying Bayes’ theorem yields P(R|X)=q_R p/(q_R p+q_N (1−p)). Substituting into Δ gives a closed‑form expression (5). In this setting the Accardi invariant reduces to A=p, so it is constrained to the interval
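The classical urn computation described above can be sketched as follows, assuming hypothetical document counts. The function name and the numbers are illustrative only. The sketch derives p, q_R and q_N from counts, applies Bayes' theorem, and cross-checks the result against the direct count ratio N_{X∧R}/N_X.

```python
def urn_precision_boost(n_total, n_rel, n_x_and_rel, n_x_and_nonrel):
    """Return (P(R|X), Delta(X)) for a classical urn of documents."""
    n_nonrel = n_total - n_rel
    p = n_rel / n_total                 # P(R) = N_R / N
    q_r = n_x_and_rel / n_rel           # P(X|R)
    q_n = n_x_and_nonrel / n_nonrel     # P(X|not R)
    p_r_given_x = q_r * p / (q_r * p + q_n * (1 - p))  # Bayes' theorem
    return p_r_given_x, p_r_given_x - p


# Hypothetical urn: 1000 documents, 200 of them relevant;
# term X occurs in 100 relevant and 80 non-relevant documents.
posterior, delta = urn_precision_boost(1000, 200, 100, 80)

# In the classical model the Bayes result equals the direct ratio N_{X and R} / N_X.
direct = 100 / (100 + 80)
assert abs(posterior - direct) < 1e-12
print(f"P(R|X) = {posterior:.4f}, Delta(X) = {delta:.4f}")
```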