This paper presents a methodology for encoding information in valuations of a discrete lattice with translationally invariant constraints in an asymptotically optimal way. The method is based on finding a statistical description of such valuations and turning it into a statistical algorithm, which allows us to deterministically construct a valuation with given statistics. Optimal statistics let us generate valuations with a uniform distribution; in this way we get the maximum information capacity. It will be shown that for one-dimensional models we can reach the optimum using a maximal entropy random walk, and that in the general case we can practically get as close to the capacity of the model as we want (found numerically: 10^{-10} bits/node lost for the Hard Square model). We also present a simpler alternative to the arithmetic coding method, which can additionally be used as a cryptosystem and a data correction method.
Consider all projections Z^2 → {0, 1}. In this way we can store 1 bit/node (point of the space). Now introduce some constraints, for example: there cannot be two neighboring "1" (each node has 4 neighbors); this is the so-called Hard Square model (HS). It turns out that this restriction reduces the informational capacity to H_HS ≅ 0.5878911617753406 bits/node.
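For concreteness, the quoted capacity can be approximated with a standard transfer-matrix computation. The following sketch is our own construction (the function name and the choice of open horizontal boundaries are assumptions, not part of the model): for growing strip widths W it builds the matrix of vertically compatible rows and estimates H_HS from the growth of its dominant eigenvalue.

```python
# Transfer-matrix estimate of the Hard Square capacity (a sketch).
# A row of width W is a bitmask with no two adjacent 1s; two rows may be
# stacked iff they share no 1 in the same column. The differences
# lg(lambda_W) - lg(lambda_{W-1}) approach the capacity H_HS as W grows.
import numpy as np

def hs_capacity_estimates(max_width=12):
    prev = None
    for W in range(1, max_width + 1):
        rows = [m for m in range(1 << W) if m & (m >> 1) == 0]
        T = np.array([[float((a & b) == 0) for b in rows] for a in rows])
        lam = np.linalg.eigvalsh(T)[-1]   # dominant eigenvalue (T is symmetric)
        if prev is not None:
            print(W, np.log2(lam / prev))
        prev = lam

hs_capacity_estimates()  # the printed values approach 0.58789116...
```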
The goal of this paper is to introduce a methodology for encoding information in such models as near their capacity as required.
We will call such a triple a model: a space (Z^2), an alphabet ({0, 1}) and some constraints. Its elements are all projections fulfilling the constraints; we can think of them as valuations of the nodes. The number of all such valuations over some finite set of nodes A grows asymptotically exponentially: N ≅ 2^{#A·H}. Because lg(N) bits can be stored in the choice of one of N possibilities, this H (Shannon's entropy) is the maximal capacity in bits/node we can achieve.
We can really store lg(N) bits in the choice of one of N possibilities only if all of them are equally probable. So to use the whole available capacity, we have to make all possible valuations equally probable. Unfortunately the space of valuations over an infinite space is usually quite complicated. But thanks to the translational symmetry, elements should have the same local statistical behavior. If we find it and valuate the space accordingly, we should get near to the uniform distribution over all elements. The statistical algorithm has to encode some information while generating a specific valuation fulfilling the optimal statistics of the space. A statistical description (p) is a function which, for every finite set of nodes (shape), gives the probability distribution of valuations on it (patterns). Thanks to the translational invariance, we can for example write p(01): the probability that any two neighboring nodes form the pattern '01'. In one dimension we can find the optimal statistical description using pure combinatorics, as in the sketch below. In higher dimensions it is much more complicated, but we can for example divide the space into narrow stripes, create a new alphabet from their valuations and apply the one-dimensional method.
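As an illustration of averaging over all elements, here is a minimal sketch (our own construction) for the one-dimensional analogue of HS, binary sequences with no two neighboring '1': it enumerates all valid sequences of length n and measures the frequency of the pattern '01' at an interior pair of nodes, which approaches p(01) = 1/(√5·φ) ≅ 0.2764, where φ is the golden ratio.

```python
# Empirical statistical description of the 1D no-"11" model by brute-force
# averaging over all valid length-n sequences.
from itertools import product
from math import sqrt

def p_pattern(n, pattern, pos):
    valid = [s for s in product((0, 1), repeat=n)
             if all(not (a and b) for a, b in zip(s, s[1:]))]
    hits = sum(s[pos:pos + len(pattern)] == pattern for s in valid)
    return hits / len(valid)

phi = (1 + sqrt(5)) / 2
print(p_pattern(18, (0, 1), 8))   # ~0.27642 for n = 18
print(1 / (sqrt(5) * phi))        # asymptotic value ~0.27639
```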
Having the statistical description, we can use it to construct the statistical algorithm. For example, divide the space into stripes and valuate them successively. Now for successive nodes, depending on the valuations of the already valuated neighbors, we read some probability from a previously created table. According to this probability we valuate the node, encoding some information.
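A minimal sketch of such an algorithm for the one-dimensional no-'11' model (the names are ours; the optimal probability used here is derived in Section 2): nodes are valuated left to right, and the probability for a node depends only on its already valuated neighbor. A random source is used below, so the sketch only generates a typical valuation; replacing random.random() with an entropy decoder, such as the coder of Section 3, makes the generated valuation carry encoded information.

```python
# Generate a valuation of the 1D no-"11" model following its optimal
# statistical description: P(1 | previous node is 0) = 1/phi^2 ~ 0.382.
import random
from math import sqrt

PHI = (1 + sqrt(5)) / 2
P1_AFTER_0 = 1 / PHI ** 2

def generate_valuation(n, rng=random.random):
    seq, prev = [], 0
    for _ in range(n):
        # the constraint forbids "1" directly after "1"
        bit = 0 if prev == 1 else int(rng() < P1_AFTER_0)
        seq.append(bit)
        prev = bit
    return seq

print(generate_valuation(30))
```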
We can think about, for example, a hard disk: locally we valuate nodes (say, magnetize round dots) of a two-dimensional lattice with 0 or 1 (without constraints). Packing the dots twice as densely gives twice as many nodes, but we get some constraints - like in HS - so the capacity is now 2 · 0.587 ≅ 1.17 bits per original dot's area: we get 17% greater capacity. We have got it by more precise positioning of the head, which is technically easier to achieve than shrinking the dot.
We will see that going further, we can increase the capacity potentially to infinity.
We can also use the statistical algorithm approach to generate random states (e.g. spin alignments) in statistical physics. Usually Monte Carlo methods are used: to generate a "good" alignment we have to make many runs (the more, the better).
But using, for example, many such alignments, we can approximate their (local) statistical description to the assumed "goodness". Then, using a statistical algorithm, we can generate equally "good" alignments in one run.
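A sketch of this estimation step (the 2x2 pattern shape and all names are our assumptions; any finite shape can be used): count the local patterns over the sampled alignments and normalize.

```python
# Approximate the local statistical description from Monte Carlo samples
# as empirical frequencies of 2x2 patterns.
from collections import Counter

def local_description(samples):
    """samples: list of 2D grids (lists of lists of symbols)."""
    counts = Counter()
    for grid in samples:
        for i in range(len(grid) - 1):
            for j in range(len(grid[0]) - 1):
                counts[(grid[i][j], grid[i][j + 1],
                        grid[i + 1][j], grid[i + 1][j + 1])] += 1
    total = sum(counts.values())
    return {patt: c / total for patt, c in counts.items()}
```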
In the second section we will see how to solve the one-dimensional model analytically: find its optimal description and capacity. We will motivate that the capacity should be Shannon's entropy. To find the optimal description we will just average over all elements. In this case the statistical algorithm will be simply a Markov process: we valuate node by node from left to right, and the probability distribution for a node is found using only the valuation of the previous one. In this way we get a random walk on a graph (of symbols) which maximizes the global entropy [11]. This approach can be generalized to distributions of sequences other than uniform, by introducing some vertex/edge potentials.
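The following sketch computes this maximal entropy random walk for the one-dimensional no-'11' model (the naming is ours; the formulas P(i→j) = A_ij ψ_j/(λ ψ_i), with λ, ψ the dominant eigenvalue and eigenvector of the adjacency matrix A, are the standard MERW construction of [11]).

```python
# Maximal entropy random walk on the symbol graph of the 1D no-"11" model.
import numpy as np

A = np.array([[1.0, 1.0],     # 0 -> 0 and 0 -> 1 allowed
              [1.0, 0.0]])    # 1 -> 0 allowed, 1 -> 1 forbidden
lams, vecs = np.linalg.eig(A)
k = np.argmax(lams.real)
lam, psi = lams[k].real, np.abs(vecs[:, k].real)
P = A * psi[None, :] / (lam * psi[:, None])   # MERW transition matrix
print("capacity:", np.log2(lam))   # lg(golden ratio) ~ 0.6942 bits/node
print("P(1|0) =", P[0, 1])         # ~ 0.382 = 1/phi^2
```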
In the third section asymmetric numeral systems will be presented: a generalization of numeral systems, which are optimal for encoding sequences of equiprobable digits, to the case where an arbitrary probability distribution of digits is given. It is a natural way to encode data using a given statistical algorithm. This algorithm can be an alternative to the widely used arithmetic coding method: in one table check it compresses/decompresses a few bits (a symbol), and it has the option of making the output encrypted, probably very well. It also has very nice data correction properties.
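As a preview of the binary case, here is a minimal sketch (a common variant, not necessarily the exact formulation of Section 3) of an asymmetric binary coder with digit probability P(1) = num/den, in exact integer arithmetic; symbols are recovered in reverse (last-in, first-out) order.

```python
# Asymmetric binary system sketch, P(1) = num/den.
def uabs_encode(bits, num, den, x=1):
    for s in bits:
        if s:
            x = x * den // num                                      # floor(x/p)
        else:
            x = ((x + 1) * den + den - num - 1) // (den - num) - 1  # ceil((x+1)/(1-p)) - 1
    return x

def uabs_decode(x, n, num, den):
    cp = lambda y: (y * num + den - 1) // den                       # ceil(y*p)
    out = []
    for _ in range(n):
        s = cp(x + 1) - cp(x)          # this difference is 1 with density p
        x = cp(x) if s else x - cp(x)
        out.append(s)
    return x, out[::-1]

bits = [1, 0, 0, 1, 0, 0, 0, 1]
x = uabs_encode(bits, num=1, den=3)
x0, decoded = uabs_decode(x, len(bits), num=1, den=3)
assert decoded == bits and x0 == 1
```

A practical implementation keeps the state x in a fixed interval by streaming out its low bits (renormalization); the unbounded-integer version above only illustrates the single coding step.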
In the fourth section a formalism for general models will be introduced. It will be shown that for "reasonable" models: X = Z^n, translationally invariant constraints with finite range, and which are "simple" -v