TopoSZp: Lightweight Topology-Aware Error-controlled Compression for Scientific Data
Error-bounded lossy compression is essential for managing the massive data volumes produced by large-scale HPC simulations. While state-of-the-art compressors such as SZ and ZFP provide strong numerical error guarantees, they often fail to preserve topological structures (example, minima, maxima, and saddle points) that are critical for scientific analysis. Existing topology-aware compressors address this limitation but incur substantial computational overhead. We present TopoSZp, a lightweight, topology-aware, error-controlled lossy compressor that preserves critical points and their relationships while maintaining high compression and decompression performance. Built on the high-throughput SZp compressor, TopoSZp integrates efficient critical point detection, local ordering preservation, and targeted saddle point refinement, all within a relaxed but strictly enforced error bound. Experimental results on real-world scientific datasets show that TopoSZp achieves 3 to 100 times fewer non-preserved critical points, introduces no false positives or incorrect critical point types, and delivers 100 to 10000 times faster compression and 10 to 500 times faster decompression compared to existing topology-aware compressors, while maintaining competitive compression ratios.
💡 Research Summary
The paper introduces TopoSZp, a lightweight topology‑aware, error‑controlled lossy compressor built on the high‑throughput SZp framework. The authors identify a critical gap in existing error‑bounded compressors (e.g., SZ, ZFP): while they guarantee pointwise numerical error, they often destroy topological features such as minima, maxima, and saddle points, which are essential for downstream scientific analyses (feature tracking, Morse‑Smale complex extraction, etc.). Existing topology‑preserving compressors address this but rely on expensive global topological analyses (persistence diagrams, contour trees) that dramatically increase runtime and reduce compression ratios.
TopoSZp’s key idea is to embed lightweight topological safeguards directly into SZp’s quantization stage—the main source of distortion. The pipeline consists of three components:
-
Fast critical‑point detection – a local 3×3 (or 5×5) stencil scans each grid point; if a point is strictly larger (or smaller) than all its neighbors it is marked as a potential extremum. This detection is O(N) and parallelized with OpenMP.
-
Extrema stencil preservation – after SZp quantization, the algorithm checks whether any detected extremum has been flattened (i.e., its quantized value equals that of a neighbor). If so, the value is nudged by up to ε/2 to restore the strict ordering while still respecting the global error bound. This ensures that maxima/minima are never lost.
-
RBF‑based saddle‑point refinement – saddle points are more subtle because they involve mixed ascent/descent directions. For each saddle candidate, a radial‑basis‑function interpolation is performed on a small neighbourhood to reconstruct a smoother field. The interpolated value is then adjusted until the classic saddle condition (two higher, two lower neighbours) holds. This step is also parallelized and incurs only a modest overhead.
The authors prove that SZp’s quantization is monotone, meaning false positives (new critical points) and false type changes cannot occur; only false negatives (missing points) are possible. By applying the two corrective mechanisms above, TopoSZp eliminates virtually all false negatives, achieving zero false positives and zero type errors in practice.
Experimental evaluation uses five real‑world HPC datasets from climate modeling, combustion, cosmology, and other domains. Compared against two state‑of‑the‑art topology‑aware compressors (TopoA and TopoSZ) and against plain SZp, TopoSZp delivers:
- 3×–100× fewer missing critical points – especially effective on data with many saddles.
- 0 % false positives and false type errors – confirming the monotonicity‑based guarantee.
- Compression speedups of 100×–10 000× and decompression speedups of 10×–500× relative to the topology‑aware baselines, thanks to the O(N) local checks and OpenMP parallelism.
- Compression ratios comparable to SZp (slightly lower in some cases due to the small value adjustments, but still competitive).
The paper concludes that global topological analysis is not a prerequisite for high‑fidelity topology preservation; local detection plus selective refinement suffices for large‑scale scientific data. Future work includes extending the approach to multi‑scale persistence weighting, GPU‑accelerated implementations, and handling vector/tensor fields.
Overall, TopoSZp offers a practical solution that bridges the gap between fast error‑bounded compression and rigorous topological fidelity, making it suitable for integration into existing HPC data pipelines without prohibitive computational cost.
Comments & Academic Discussion
Loading comments...
Leave a Comment