Floating Point Compression: Lossless and Lossy Solutions
High-precision numerical data from computer simulations, observations, and experiments is often represented in floating point and can easily reach terabytes to petabytes of storage. Moving such large data sets to and from disk, across the internet, between compute nodes, and even through the memory hierarchy presents a significant bottleneck. To address this problem, we have developed lossy and lossless high-speed data compressors that can greatly reduce the amount of data stored and moved.
For lossless compression, where every bit of every floating-point number must be preserved exactly, our memory-efficient streaming compressor fpzip usually provides 1.5x-4x data reduction, depending on data precision and smoothness.
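To make "exactly preserved" concrete, the following minimal sketch checks a lossless round trip at the bit level. It uses Python's standard zlib module purely as a stand-in lossless compressor; it is not fpzip and does not reflect fpzip's algorithm or API, and the helper names and sample data are hypothetical.

```python
import math
import struct
import zlib

def pack_doubles(values):
    """Serialize floats as raw little-endian IEEE 754 doubles (8 bytes each)."""
    return struct.pack(f"<{len(values)}d", *values)

def unpack_doubles(raw):
    """Inverse of pack_doubles."""
    return list(struct.unpack(f"<{len(raw) // 8}d", raw))

# Hypothetical test data: a smooth signal plus a few special values.
samples = [math.sin(0.01 * i) for i in range(10_000)]
samples += [-0.0, float("inf"), float("nan")]

raw = pack_doubles(samples)
compressed = zlib.compress(raw, 9)      # stand-in compressor, NOT fpzip
restored = zlib.decompress(compressed)

# Compare bytes rather than float values: -0.0 == 0.0 is True and
# nan == nan is False, so value comparison cannot prove bit exactness.
is_bit_exact = restored == raw
ratio = len(raw) / len(compressed)
print(f"bit exact: {is_bit_exact}, reduction: {ratio:.2f}x")
```

A general-purpose byte compressor like this one knows nothing about floating-point structure, so the reduction it reports should not be read as representative of fpzip.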
To achieve much higher compression ratios, lossy compression is needed, where small, often imperceptible or numerically negligible errors may be introduced. Our zfp compressor for floating-point and integer data often achieves compression ratios on the order of 100:1, i.e., less than 1 bit of compressed storage per value.
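The relationship between a compression ratio and the storage cost per value is simple arithmetic: divide the uncompressed size of one value by the ratio. The sketch below works this out for the ratios quoted above, assuming 32-bit single-precision and 64-bit double-precision values; the function name is hypothetical.

```python
def bits_per_value(uncompressed_bits, ratio):
    """Average compressed storage per value for a given compression ratio."""
    return uncompressed_bits / ratio

# Lossy compression at roughly 100:1
single_lossy = bits_per_value(32, 100)   # 0.32 bits per value
double_lossy = bits_per_value(64, 100)   # 0.64 bits per value

# Lossless compression at 1.5x to 4x, double precision
double_lossless_worst = bits_per_value(64, 1.5)  # about 42.7 bits per value
double_lossless_best = bits_per_value(64, 4)     # 16 bits per value

print(single_lossy, double_lossy, double_lossless_worst, double_lossless_best)
```

Both lossy figures fall below 1 bit per value, which is what the 100:1 ratio implies for either precision.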
zfp frequently gives more accurate results than competing compressors (including our own fpzip). With a throughput of up to 2 GB/s per CPU core and 150 GB/s in parallel on an NVIDIA Volta GPU, it is also many times faster than they are. zfp supports several compression modes: it can achieve an exact bit rate, ensure that reconstructed values are within an absolute error tolerance, meet a specified precision requirement, or ensure fully lossless compression. zfp also comes with C and C++ compressed-array classes that support random access and that can be used in place of conventional C arrays or STL vectors, e.g., for numerical computations.
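To illustrate what an absolute error tolerance guarantees, here is a minimal sketch built on a plain uniform scalar quantizer. This is a stand-in chosen for simplicity, not zfp's algorithm or API; the function names and sample data are hypothetical. Rounding each value to the nearest multiple of twice the tolerance keeps every reconstructed value within the tolerance of the original.

```python
import math

def quantize(values, tolerance):
    """Map each value to the index of the nearest multiple of 2 * tolerance."""
    step = 2.0 * tolerance
    return [round(v / step) for v in values]

def reconstruct(indices, tolerance):
    """Recover approximate values from quantization indices."""
    step = 2.0 * tolerance
    return [i * step for i in indices]

tolerance = 1e-3
field = [math.sin(0.01 * i) for i in range(1000)]  # hypothetical smooth data

indices = quantize(field, tolerance)
approx = reconstruct(indices, tolerance)

# The largest pointwise error stays at or below the tolerance
# (up to floating-point rounding in the division and multiplication).
max_error = max(abs(a - b) for a, b in zip(field, approx))
print(f"max error: {max_error:.6f} (tolerance {tolerance})")
```

The small integer indices are what a real compressor would then encode compactly; the sketch stops at the error guarantee and makes no claim about achievable compression ratios.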
zfp and fpzip were both designed for compressing logically regular 1D, 2D, 3D, or 4D arrays of single- or double-precision floating-point numbers that exhibit spatial correlation (e.g., regularly sampled continuous functions). They should not be used to compress unstructured data such as triangle mesh geometry, unorganized point sets, or streams of unrelated numbers. Think of fpzip as the floating-point analogue to PNG image compression and zfp as advanced JPEG for floating-point arrays. Source code for both compressors is available for download below.
These projects are related to the hzip compressor and tthresh. hzip is an older project available as an archive; it is a C++ library for lossless compression of structured and unstructured meshes composed of cells with hypercube topology. tthresh is open source and available on GitHub.