WaveRange: Wavelet-based data compression for three-dimensional numerical simulations on regular grids


A wavelet-based method for compression of three-dimensional simulation data is presented and its software framework is described. It uses wavelet decomposition and subsequent range coding with quantization suitable for floating-point data. The effectiveness of this method is demonstrated by applying it to example numerical tests, ranging from idealized configurations to realistic global-scale simulations.


💡 Research Summary

WaveRange is a dedicated software framework for compressing three‑dimensional floating‑point data generated by large‑scale numerical simulations on regular Cartesian grids. The authors address a pressing problem in high‑performance computing (HPC): modern simulations routinely produce terabytes of volumetric output, and conventional lossless compressors such as LZMA achieve only modest size reductions (typically <20 %). While down‑sampling or single‑precision storage can cut data volume, these approaches often discard scientifically relevant information, especially for restart files that must preserve the full state of a simulation.

The proposed solution combines three well‑known techniques (wavelet transform, quantization, and entropy coding) into a coherent pipeline that offers controllable lossy compression with guaranteed reconstruction error bounds. The core of the method is a three‑dimensional multiresolution wavelet transform based on the Cohen‑Daubechies‑Feauveau 9/7 (CDF 9/7) bi‑orthogonal wavelet, implemented via the lifting scheme. The transform is applied in‑place: a 1‑D lifting step is performed sequentially along the x, y, and z directions, and the approximation coefficients from one level become the input for the next level. The authors fix the number of decomposition levels to L = 4. Because each level of a dyadic three‑dimensional transform reduces the number of approximation coefficients eightfold, four levels leave only about 8⁻⁴ ≈ 0.02 % of the coefficients as approximations. For smooth fields, most of the energy concentrates in these approximation coefficients and in a small number of large detail coefficients, while the remaining detail coefficients are close to zero. This depth is sufficient for the test cases considered and keeps the algorithm’s computational cost linear in the number of grid points (O(N)).
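The lifting idea can be illustrated with a minimal one-dimensional sketch. It is not the WaveRange implementation: the function names are hypothetical, periodic boundary handling is an assumption, and the normalization constant K follows one common convention among several. The lifting constants are the commonly quoted values for the CDF 9/7 factorization. In a three-dimensional setting, such a 1-D step would be applied along each axis in turn.

```python
# Commonly quoted CDF 9/7 lifting constants (Daubechies-Sweldens factorization).
ALPHA = -1.586134342059924
BETA = -0.052980118572961
GAMMA = 0.882911075530934
DELTA = 0.443506852043971
K = 1.149604398860241  # scaling constant; normalization conventions vary


def cdf97_forward(x):
    """One level of a 1-D CDF 9/7 lifting transform (periodic boundaries assumed).

    Returns (approximation, detail) coefficient lists of half the input length.
    """
    if len(x) < 2 or len(x) % 2:
        raise ValueError("signal length must be even and at least 2")
    s = list(x[0::2])  # even samples -> approximation
    d = list(x[1::2])  # odd samples  -> detail
    n = len(s)
    for i in range(n):  # first predict step
        d[i] += ALPHA * (s[i] + s[(i + 1) % n])
    for i in range(n):  # first update step
        s[i] += BETA * (d[i] + d[(i - 1) % n])
    for i in range(n):  # second predict step
        d[i] += GAMMA * (s[i] + s[(i + 1) % n])
    for i in range(n):  # second update step
        s[i] += DELTA * (d[i] + d[(i - 1) % n])
    s = [v * K for v in s]
    d = [v / K for v in d]
    return s, d


def cdf97_inverse(s, d):
    """Undo cdf97_forward by reversing every lifting step in opposite order."""
    n = len(s)
    s = [v / K for v in s]
    d = [v * K for v in d]
    for i in range(n):
        s[i] -= DELTA * (d[i] + d[(i - 1) % n])
    for i in range(n):
        d[i] -= GAMMA * (s[i] + s[(i + 1) % n])
    for i in range(n):
        s[i] -= BETA * (d[i] + d[(i - 1) % n])
    for i in range(n):
        d[i] -= ALPHA * (s[i] + s[(i + 1) % n])
    x = [0.0] * (2 * n)
    x[0::2] = s
    x[1::2] = d
    return x


signal = [float((3 * i) % 7) for i in range(16)]
approx, detail = cdf97_forward(signal)
restored = cdf97_inverse(approx, detail)
print(max(abs(a - b) for a, b in zip(signal, restored)))  # round-off level
```

Because every lifting step only adds a function of the other half of the samples, the inverse is obtained by subtracting the same terms in reverse order, which is why the transform can run in place and is exactly invertible up to round-off.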

After transformation, the floating‑point wavelet coefficients are quantized to integers. Quantization is driven by a user‑specified error tolerance (e.g., a maximum L∞ error ε). The algorithm computes a global scaling factor based on the maximum absolute coefficient value and the desired ε, then applies the same step size to all detail coefficients at all levels. This simple scheme enables users to trade compression ratio against reconstruction fidelity in an intuitive way.
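A uniform quantizer of this kind can be sketched as follows. The rule `step = rel_tol * max|c|` and the function names are assumptions made for illustration; the exact scaling used by WaveRange is not spelled out in this summary. Note also that the bound shown here holds in coefficient space, and the inverse wavelet transform may change the constant relating it to the error in physical space.

```python
import math


def quantize(coeffs, rel_tol):
    """Map floats to integers with one global step (assumed rule: rel_tol * max|c|)."""
    max_abs = max(abs(c) for c in coeffs)
    step = rel_tol * max_abs
    if step == 0.0:  # all-zero input: nothing to encode
        return [0] * len(coeffs), 0.0
    # Round to the nearest integer multiple of the step.
    q = [math.floor(c / step + 0.5) for c in coeffs]
    return q, step


def dequantize(q, step):
    """Reconstruct approximate coefficient values from the integer codes."""
    return [k * step for k in q]


coeffs = [0.0, 0.001, -0.25, 3.0, -1.5]
codes, step = quantize(coeffs, rel_tol=1e-3)
approx = dequantize(codes, step)
worst = max(abs(c - a) for c, a in zip(coeffs, approx))
print(codes, step, worst)  # worst error does not exceed step / 2 (up to round-off)
```

Rounding to the nearest multiple keeps each coefficient within half a step of its original value, so a smaller tolerance gives a smaller step, larger integer codes, and a lower compression ratio.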

The quantized integer stream is finally entropy‑coded using range coding, an arithmetic‑style method that typically yields higher compression than Huffman coding while remaining relatively easy to implement. Because the quantized symbols are integer values with a highly skewed distribution (most detail coefficients are near zero), range coding efficiently captures the redundancy.
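The benefit of a skewed symbol distribution can be made concrete with the Shannon entropy, which gives the average number of bits per symbol that an ideal order-0 entropy coder would need and that a range coder approaches. The sketch below is not a range coder and is not taken from WaveRange; the function name and the sample data are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy_bits(symbols):
    """Empirical order-0 entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


# Hypothetical quantized detail coefficients: 90 % zeros, a few small values.
skewed = [0] * 90 + [1] * 5 + [-1] * 5
# For comparison: 256 distinct symbols, each equally likely.
uniform = list(range(256))

print(shannon_entropy_bits(skewed))   # about 0.57 bits per symbol
print(shannon_entropy_bits(uniform))  # 8 bits per symbol
```

A Huffman code must spend at least one whole bit per symbol, so for a stream dominated by zeros an arithmetic-style coder that can assign fractional bit costs has a clear advantage.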

WaveRange is released under the GNU GPL‑3.0 license, written in C/C++, and can be used either as a standalone command‑line tool or as a library linked into existing simulation codes. It supports several input formats, including generic Fortran/C arrays, FluSI, and MSSG output/restart files. Command‑line options allow the user to set the number of wavelet levels, the target error tolerance, the quantization step, and the output file name. The source code and a sample compressed dataset (in HDF5 format) are hosted on GitHub (https://github.com/pseudospectators/WaveRange).

The authors evaluate WaveRange on a spectrum of test problems. Synthetic data (smooth functions, random noise) demonstrate compression ratios between 12× and 25× while respecting the prescribed error bounds. Realistic CFD datasets, such as turbulent wake flow velocity fields and atmospheric dynamics simulations, achieve ratios from 8× up to 30× depending on the tolerance. Importantly, when the method is applied to restart files, the reconstructed fields lead to post‑restart simulations whose trajectories are indistinguishable from those obtained with the original uncompressed data, provided the error tolerance is within the range typically required for scientific fidelity (ε ≤ 10⁻⁴).

Performance measurements show that the entire pipeline (transform, quantization, coding) scales linearly with data size and benefits strongly from OpenMP parallelism. On a 16‑core workstation, compressing a 1 GB dataset takes roughly 2–5 seconds, and decompression takes a comparable amount of time. This is an order of magnitude faster than many existing lossy image‑compression‑based approaches that rely on external libraries, and it delivers far higher compression than lossless methods.

The paper also discusses the relationship between compression ratio, reconstruction error, and downstream scientific metrics. For example, in a global atmospheric model the authors compare temperature and wind statistics before and after compression; the differences remain well below the natural variability of the system when ε is set to 10⁻⁴. This demonstrates that WaveRange can be safely integrated into in‑situ data‑reduction pipelines, reducing I/O pressure without compromising scientific outcomes.

In the concluding section, the authors highlight the strengths of WaveRange: (1) a mathematically sound, multiresolution wavelet transform that exploits spatial locality in fluid‑dynamics fields; (2) a simple yet effective quantization strategy that provides explicit error control; (3) efficient entropy coding that maximizes compression; (4) open‑source availability and easy integration into existing workflows; and (5) demonstrated applicability to both synthetic benchmarks and large‑scale Earth‑system simulations. Future work is suggested in three directions: extending support to non‑Cartesian or adaptive meshes, implementing GPU‑accelerated versions of the transform and coding stages, and developing analysis tools that operate directly on compressed data (e.g., computing statistics without full decompression). Overall, WaveRange offers a practical, high‑performance solution for the growing challenge of managing massive 3‑D scientific datasets in modern HPC environments.

