On the Probability Distribution of Superimposed Random Codes
A systematic study of the probability distribution of superimposed random codes is presented through the use of generating functions. Special attention is paid to two cases: bit structures that are uniformly distributed but not necessarily independent, and bit structures that are non-uniform but independent. Recommendations for optimal coding strategies are derived.
Research Summary
The paper investigates the probabilistic behavior of superimposed random coding, a technique widely used to accelerate subgraph searches in large chemical-structure databases. Each compound is represented by a binary descriptor vector; a query is similarly encoded, and the final "target" bitstring is obtained by OR-combining the codewords associated with the positions of 1-bits in the original descriptor. This operation must preserve the partial order of the source vectors, which leads to the defining equation σ(β) = ⋁_{j ∈ β^{-1}(1)} σ_j.
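The OR-combination described above can be illustrated with a minimal Python sketch. The function names, the fixed codeword weight, and the seeded random assignment are illustrative assumptions and are not taken from the paper; codewords are stored as Python integers used as bitmasks.

```python
import random

def make_codewords(num_source_bits, n, weight, seed=0):
    """Assign each source bit position a random n-bit codeword.

    Assumption for illustration: every codeword has exactly `weight` 1-bits.
    """
    rng = random.Random(seed)
    codewords = []
    for _ in range(num_source_bits):
        positions = rng.sample(range(n), weight)
        codewords.append(sum(1 << pos for pos in positions))
    return codewords

def superimpose(source_bits, codewords):
    """OR together the codewords of all positions where the source has a 1-bit."""
    target = 0
    for j, bit in enumerate(source_bits):
        if bit:
            target |= codewords[j]
    return target

def may_contain(target, query):
    """Screening test: every 1-bit of the query code must also be set in the target."""
    return (target & query) == query

codewords = make_codewords(num_source_bits=5, n=16, weight=3, seed=1)
compound = [1, 0, 1, 1, 0]   # descriptor of a stored compound
query = [1, 0, 1, 0, 0]      # query descriptor, bitwise <= compound
compound_code = superimpose(compound, codewords)
query_code = superimpose(query, codewords)
print(may_contain(compound_code, query_code))
```

Because OR is monotone, a query whose descriptor is bitwise below a compound's descriptor always passes the screening test. The converse does not hold, so a passing test only marks a candidate that still needs exact verification.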
The authors introduce isotropic probability distributions on the n-bit space, i.e., distributions that depend only on the number of 1-bits, not on their positions. Such a distribution is characterized by probabilities p_k (k = 0…n) and its generating function f(t) = ∑_{k=0}^n C(n,k) p_k t^k. From f(t) they define two auxiliary functions, F_α = P(ξ ≤ α) and G_α = P(ξ ≥ α), which are related to f(t) by simple transformations: F(t) = (1+t)^n f(t/(1+t)) and G(t) = t^n f(1/(1+t)).
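The relation between f(t) and F(t) can be checked with exact rational arithmetic. The sketch below rests on assumptions not spelled out in the summary: p_k is taken to be the probability of one particular vector with k 1-bits (so that f(1) = 1), F_m is the value of P(ξ ≤ α) for any α with m 1-bits, and F(t) is read as ∑_m C(n,m) F_m t^m. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def f_poly(p, t):
    """Generating function f(t) = sum_k C(n,k) p_k t^k."""
    n = len(p) - 1
    return sum(comb(n, k) * p[k] * t**k for k in range(n + 1))

def lower_probs(p):
    """F_m = P(xi <= alpha) for alpha with m ones: sum over the C(m,k) sub-vectors."""
    n = len(p) - 1
    return [sum(comb(m, k) * p[k] for k in range(m + 1)) for m in range(n + 1)]

def F_poly(p, t):
    """Assumed generating function F(t) = sum_m C(n,m) F_m t^m."""
    n = len(p) - 1
    F = lower_probs(p)
    return sum(comb(n, m) * F[m] * t**m for m in range(n + 1))

# Example distribution: n independent bits, each set with probability q,
# so a specific vector with k ones has probability q^k (1-q)^(n-k).
n, q = 6, Fraction(1, 3)
p = [q**k * (1 - q)**(n - k) for k in range(n + 1)]
t = Fraction(2, 5)

assert f_poly(p, 1) == 1                                   # probabilities sum to one
assert F_poly(p, t) == (1 + t)**n * f_poly(p, t / (1 + t))  # F(t) = (1+t)^n f(t/(1+t))
```

For the independent-bit example, F_m reduces to (1-q)^(n-m), the probability that all n-m positions outside α are zero, which gives a quick sanity check on `lower_probs`.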
The central theoretical contribution (Theorem 1) states that if the source bit patterns have generating function Π(t) and the codewords have distribution coefficients F_m, then the distribution of the target bits is given by \hat F_m = Π(F_m). In other words, the target distribution depends linearly on the source distribution but non-linearly on the code-word distribution. This relationship enables closed-form expressions for the first two moments of the target distribution:
\hat μ_1 = n
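The relation \hat F_m = Π(F_m) from Theorem 1 can be checked by brute force on a tiny instance. This is a sketch under stated assumptions rather than the paper's own construction: the source patterns and the codewords are both isotropic, the codewords are mutually independent and independent of the source, and each list entry is the probability of one particular vector of the given weight. The function names and the example distributions `pi` and `p` are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import comb

def popcount(x):
    return bin(x).count("1")

def target_lower_prob(pi, p, alpha):
    """Exact P(target <= alpha) by enumerating all sources and codeword tuples."""
    N, n = len(pi) - 1, len(p) - 1
    total = Fraction(0)
    for beta in range(2**N):                          # source bit pattern
        ones = [j for j in range(N) if (beta >> j) & 1]
        for words in product(range(2**n), repeat=N):  # one codeword per source position
            prob = pi[popcount(beta)]
            for w in words:
                prob *= p[popcount(w)]
            target = 0
            for j in ones:                            # OR the selected codewords
                target |= words[j]
            if (target & ~alpha) == 0:                # no 1-bit outside alpha
                total += prob
    return total

def Pi(pi, x):
    """Source generating function Pi(x) = sum_k C(N,k) pi_k x^k."""
    N = len(pi) - 1
    return sum(comb(N, k) * pi[k] * x**k for k in range(N + 1))

def F_coeff(p, m):
    """F_m = P(codeword <= alpha) for any alpha with m ones."""
    return sum(comb(m, k) * p[k] for k in range(m + 1))

# Hypothetical example: N = 3 source bits, n = 3 code bits.
pi = [Fraction(1, 10), Fraction(1, 10), Fraction(1, 10), Fraction(3, 10)]
p = [Fraction(1, 4), Fraction(1, 8), Fraction(1, 12), Fraction(1, 8)]

for alpha in range(2**3):
    m = popcount(alpha)
    assert target_lower_prob(pi, p, alpha) == Pi(pi, F_coeff(p, m))
```

The enumeration grows as 2^N · 2^(nN), so it is only practical for very small N and n; the closed form Π(F_m) is what makes larger cases tractable.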