The pressing need for efficient compression schemes for XML documents has recently drawn attention to stack computation, and in particular calls for a formulation of information-lossless stack or pushdown compressors that allows a formal analysis of their performance and a more ambitious use of the stack in XML compression, where so far it has mainly been tied to parsing mechanisms. In this paper we introduce the pushdown compressor model, based on pushdown transducers that compute a single injective function while keeping the widest generality regarding stack computation. We also consider online compression algorithms that use at most polylogarithmic space (plogon). These algorithms correspond to compressors in the data stream model. We compare the performance of these two families of compressors with each other and with the general-purpose Lempel-Ziv algorithm. This comparison is made without any a priori assumption on the data's source and considers the asymptotic compression ratio for infinite sequences. We prove that in all cases they are incomparable.
Polylog space compression, pushdown compression, and Lempel-Ziv are incomparable
The compression algorithms required by today's massive data applications necessarily operate under very limited resources. In the data stream setting, the algorithm receives a stream of elements one by one and can store only a brief summary of them; the amount of available memory is far below linear [3,14]. In the context of XML databases, the main limiting factor is document size, which makes syntax-directed compression particularly appropriate, i.e. compression centered on the grammar-based generation of XML texts and performed with stack memory [11,17].
In this paper we introduce and formalize useful compression mechanisms that can be implemented within low resource bounds, namely pushdown compressors and polylogarithmic space online compression algorithms. We compare these two with each other and with the general-purpose Lempel-Ziv algorithm [18].
Finite-state compressors were extensively used and studied before the celebrated result of Lempel and Ziv [18] that their algorithm is asymptotically better than any finite-state compressor. However, until recently the natural extension from finite-state to pushdown compressors received much less attention, a situation that has changed with the arrival of new specialized compressors for XML. The work done on stack transducers has been basic and closely tied to parsing mechanisms. Transducers were initially considered by Ginsburg and Rose in [9] for language generation, further corrected in [10], and summarized in [5]. For these models the role of nondeterminism is especially useful in the concept of a λ-rule, that is, a transition in which a symbol is popped from the stack without reading any input symbol.
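To make the Lempel-Ziv baseline concrete, the following is a minimal sketch of incremental parsing, assuming that [18] refers to the 1978 scheme usually called LZ78. The function names `lz78_parse` and `lz78_output_bits` and the simple bit-cost accounting are illustrative assumptions, not taken from the paper.

```python
from math import ceil, log2

def lz78_parse(text):
    """Split text into phrases, each being the shortest prefix of the
    remaining input that has not occurred as an earlier phrase."""
    seen = set()
    phrases = []
    current = ""
    for ch in text:
        current += ch
        if current not in seen:
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:
        # The input ended inside a phrase that repeats an earlier one.
        phrases.append(current)
    return phrases

def lz78_output_bits(text, alphabet_size=2):
    """Rough output size (assumed accounting): the i-th phrase costs a
    pointer to one of i dictionary entries plus one fresh symbol."""
    bits = 0
    for i in range(1, len(lz78_parse(text)) + 1):
        bits += ceil(log2(i)) + ceil(log2(alphabet_size))
    return bits
```

On the input `"0" * 10` the parse is `0, 00, 000, 0000`: a highly repetitive input yields few, long phrases, which is what drives the output size down.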
We introduce here the concept of pushdown compressor as the most general stack transducer that is compatible with information-lossless compression. We allow the use of λ-rules while keeping the model deterministic (unambiguous). Endmarkers are also permitted, since they allow the compressor to move away from mere prefix extension. A more feasible model will also be considered, in which the pushdown compressor is required to be invertible by a pushdown transducer (see Section 3.1). As mentioned before, stack compression is especially well suited to XML texts and has been extensively used [11,17]. We will also consider an even more restrictive computation model, known as visibly pushdown automata [4,15], on which XML compression can be performed.
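As a rough illustration of why a stack helps on XML-like input, the sketch below shortens every closing tag to a single symbol, because the stack already records which tag must close next. This is a toy under stated assumptions, not the formal model of the paper: it handles only well-nested tag sequences, it assumes no tag is named "/", and the names `compress_tags` and `decompress_tags` are hypothetical.

```python
def compress_tags(tokens):
    """Replace each closing tag by '/', using a stack to check nesting."""
    stack, out = [], []
    for tok in tokens:
        if tok.startswith("</"):
            name = tok[2:-1]
            if not stack or stack[-1] != name:
                raise ValueError("input is not well nested")
            stack.pop()
            out.append("/")          # the name is implied by the stack
        else:
            name = tok[1:-1]
            stack.append(name)
            out.append(name)
    if stack:
        raise ValueError("unclosed tags remain")
    return out

def decompress_tags(symbols):
    """Invert compress_tags with a stack, restoring the closing tags."""
    stack, out = [], []
    for sym in symbols:
        if sym == "/":
            out.append("</" + stack.pop() + ">")
        else:
            stack.append(sym)
            out.append("<" + sym + ">")
    return out
```

The decompressor also works with a single stack and one pass, which mirrors in miniature the requirement that a compressor be invertible by a pushdown transducer.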
Polylogarithmic space online compressors (plogon) are compression algorithms that use at most polylogarithmic memory while accessing the input only once. This type of algorithm models the compression that can actually be performed in the data stream setting, where sublinear space bounds and online input access are assumed, with constant and polylogarithmic being the main bounds [3,14].
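A minimal sketch of an online, small-space compressor is a streaming run-length encoder: it reads each symbol once and stores only the current symbol and a counter, so its state is logarithmic in the input length. It is an illustrative stand-in, not a construction from the paper, and the name `run_length_stream` is hypothetical.

```python
def run_length_stream(symbols):
    """Yield (symbol, run_length) pairs while reading the input once.

    The only state kept between symbols is `current` and `count`,
    i.e. O(log n) bits for an input of length n.
    """
    current, count = None, 0
    for s in symbols:
        if count > 0 and s == current:
            count += 1
        else:
            if count > 0:
                yield (current, count)   # emit the finished run
            current, count = s, 1
    if count > 0:
        yield (current, count)           # emit the last run
```

Because it is a generator, the encoder can consume an arbitrarily long stream without ever holding more than one run in memory.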
For the comparison of different compression mechanisms we consider the asymptotic compression ratio for infinite sequences, without any a priori assumption on the data’s source. Notice that this excludes results that assume a certain probability distribution on the data, for instance the fact that under an ergodic source, the Lempel-Ziv compression ratio coincides exactly with the entropy of the source with high probability on finite inputs [18]. This last result is useful when the data source is known, but it is not informative for arbitrary inputs, i.e. when the data source is unknown (notice that an infinite sequence is Lempel-Ziv incompressible with probability one). Therefore, for the comparison of compression algorithms on general sequences, either an experimental or a formal approach is needed, such as that used in [16]. In this paper we follow [16] and use a worst-case approach, that is, we consider asymptotic performance on every infinite sequence.
We prove that the performance of plogon compressors, pushdown compressors and Lempel-Ziv’s compression scheme is incomparable in the strongest sense. For each pair of these three mechanisms we construct a sequence that is compressed optimally by one scheme but not by the other, and vice versa. In all cases the separation is the strongest possible, i.e. optimal compressibility is achieved in the worst case (almost all prefixes of the sequence are optimally compressible), whereas incompressibility is present even in the best case (only finitely many prefixes of the sequence are compressible).
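One common way to formalize the best-case and worst-case notions used here is through the limit inferior and limit superior of the prefix compression ratio. The notation below ($\rho_C$, $R_C$, and normalization by the prefix length $n$ over a binary alphabet) is an assumption for illustration, since the formal definitions fall in the part of the text not reproduced here.

```latex
% Assumed notation: C is a compressor, S an infinite binary sequence,
% S[0..n-1] its prefix of length n, |C(w)| the output length in bits.
\[
  \rho_C(S) \;=\; \liminf_{n \to \infty} \frac{|C(S[0..n-1])|}{n},
  \qquad
  R_C(S) \;=\; \limsup_{n \to \infty} \frac{|C(S[0..n-1])|}{n}.
\]
% A separation of the kind described above would then read:
\[
  R_{C_1}(S) = 0 \quad \text{(optimal compression on almost all prefixes)},
  \qquad
  \rho_{C_2}(S) = 1 \quad \text{(no compression beyond finitely many prefixes)}.
\]
```

Under this reading, $R_{C_1}(S) = 0$ says that even the worst prefixes are compressed optimally by $C_1$, while $\rho_{C_2}(S) = 1$ says that even the best prefixes are essentially incompressible for $C_2$.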
For the comparison of pushdown compressors with both plogon and Lempel-Ziv, we use the most general pushdown model (where the pushdown compressor need not be invertible by a pushdown transducer) for incompressibility and the more restrictive one (where the pushdown compressor is required to be invertible by a pushdown transducer) for compressibility, thus obtaining the tightest results.
The proofs are of independent interest, since the witnesses of each of the separations show the strengths and drawbacks of each compression mechanism.
…(Full text truncated)…