zfp & fpzip: Floating-Point Compression
Floating-point data compression
High-precision numerical data from computer simulations, observations, and experiments is often represented in floating point and can easily reach terabytes to petabytes of storage. Moving such large data sets to and from disk, across the internet, between compute nodes, and even through the memory hierarchy presents a significant bottleneck. To address this problem, we have developed lossy and lossless high-speed data compressors that can greatly reduce the amount of data stored and moved.
For lossless compression, where every bit of each floating-point number must be preserved exactly, our memory-efficient streaming fpzip compressor usually provides 1.5x-4x data reduction, depending on data precision and smoothness.
To achieve much higher compression ratios, lossy compression is needed, where small, often imperceptible or numerically negligible errors may be introduced. Our zfp compressor for floating-point and integer data often achieves compression ratios on the order of 100:1, i.e., less than 1 bit of compressed storage per value.
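The link between a compression ratio and "bits per value" is simple arithmetic: divide the uncompressed size of one value (64 bits for double precision, 32 for single) by the ratio. The short sketch below makes that explicit; the function names are illustrative only and not part of zfp.

```python
def bits_per_value(ratio: float, bits_per_uncompressed_value: int) -> float:
    """Average compressed storage per value for a given compression ratio."""
    return bits_per_uncompressed_value / ratio


def compressed_bytes(n_values: int, bits_per_uncompressed_value: int, ratio: float) -> float:
    """Total compressed size in bytes for n_values at the given ratio."""
    return n_values * bits_per_uncompressed_value / 8 / ratio


# A 100:1 ratio on double-precision data leaves 0.64 bits per value,
# and on single-precision data 0.32 bits per value.
print(bits_per_value(100, 64))
print(bits_per_value(100, 32))

# A 1024^3 array of doubles (8 GiB) shrinks to roughly 86 MB at 100:1.
print(compressed_bytes(1024**3, 64, 100))
```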
zfp frequently gives more accurate results than competing compressors (including our own fpzip). Its throughput, up to 2 GB/s per CPU core and 150 GB/s in parallel on an NVIDIA Volta GPU, is also many times higher. zfp can achieve an exact bit rate, ensure that reconstructed values are within an absolute error tolerance, meet a specified precision requirement, or ensure fully lossless compression. zfp also comes with C and C++ compressed-array classes that support random access and that can be used in place of conventional C arrays or STL vectors, e.g., for numerical computations.
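One reason an exact bit rate pairs well with random access: zfp partitions a d-dimensional array into blocks of 4^d values, so in fixed-rate mode every block occupies the same number of bits and a block's location can be computed directly instead of searched for. The sketch below assumes blocks stored in simple raster order and ignores any rounding or word alignment the real library may apply; the helper names are hypothetical.

```python
def block_bits(rate: float, dims: int) -> int:
    """Bits per block in fixed-rate mode: rate (bits/value) times 4**dims values.

    Assumption: no rounding or alignment padding is applied.
    """
    return int(rate * 4 ** dims)


def block_offset_bits(block_index: int, rate: float, dims: int) -> int:
    """Bit offset of a block when every block has the same compressed size."""
    return block_index * block_bits(rate, dims)


def block_index_3d(i: int, j: int, k: int, nx: int, ny: int) -> int:
    """Linear index of the 4x4x4 block containing element (i, j, k).

    Assumption: blocks are laid out in raster order, x varying fastest.
    """
    bx = (nx + 3) // 4  # number of blocks along x
    by = (ny + 3) // 4  # number of blocks along y
    return (i // 4) + bx * ((j // 4) + by * (k // 4))


# Locate element (10, 5, 7) of a 64^3 array compressed at 8 bits/value.
blk = block_index_3d(10, 5, 7, nx=64, ny=64)
print(blk, block_offset_bits(blk, rate=8, dims=3))  # 274 140288
```

In a variable-rate stream the same lookup would require an index of block offsets, which is why fixed-rate mode is the natural fit for random-access arrays.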
zfp and fpzip were both designed for compressing logically regular 1D, 2D, 3D, or 4D arrays of single- or double-precision floating-point numbers that exhibit spatial correlation (e.g., regularly sampled continuous functions). They should not be used to compress unstructured data such as triangle mesh geometry, unorganized point sets, or streams of unrelated numbers. Think of fpzip as the floating-point analogue to PNG image compression and zfp as advanced JPEG for floating-point arrays. Source code for both compressors is available for download below.
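The requirement of spatial correlation can be illustrated with a toy experiment that is not part of zfp or fpzip: predict each value from its predecessor and measure the average prediction error. A regularly sampled sine wave gives residuals on the order of 0.004, small enough to store in few bits, while a stream of unrelated random numbers gives residuals near 0.67, no smaller than the values themselves. The sample count and seed are arbitrary choices.

```python
import math
import random


def mean_abs_residual(values):
    """Mean |x[i] - x[i-1]|: the error of predicting each value from its predecessor."""
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)


n = 1000
# A regularly sampled continuous function: neighbors are strongly correlated.
smooth = [math.sin(2 * math.pi * i / n) for i in range(n)]
# A stream of unrelated numbers: neighbors carry no information about each other.
rng = random.Random(42)
unrelated = [rng.uniform(-1.0, 1.0) for _ in range(n)]

print(mean_abs_residual(smooth))
print(mean_abs_residual(unrelated))
```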
zfp is a BSD-licensed open-source library for compressed floating-point arrays that support high-throughput read and write random access. zfp is primarily written in C and C++ but also has Python and Fortran bindings. zfp is loosely based on the algorithm described in the following paper:
Peter Lindstrom, “Fixed-Rate Compressed Floating-Point Arrays,” IEEE Transactions on Visualization and Computer Graphics, 20(12): 2674–2683, December 2014, doi:10.1109/TVCG.2014.2346458.
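A first step in zfp's approach is to convert each block of values to a block-floating-point representation: all values are expressed as integers relative to the block's largest exponent. The sketch below does this for a 1D block of four doubles using 30 bits of precision. It is a simplified illustration, not zfp's code: it truncates toward zero and omits the decorrelating transform and embedded bit-plane coding that follow in the real library.

```python
import math


def to_block_floating_point(block, precision_bits=30):
    """Express every value relative to the block's largest exponent.

    Returns (emax, ints) such that value ~= ints[i] * 2**(emax - precision_bits).
    """
    # math.frexp(v) returns (m, e) with v = m * 2**e and 0.5 <= |m| < 1 (or v == 0).
    emax = max(math.frexp(v)[1] for v in block)
    scale = precision_bits - emax
    # Scaling by 2**scale makes |v| < 2**precision_bits; int() truncates toward zero.
    ints = [int(math.ldexp(v, scale)) for v in block]
    return emax, ints


def from_block_floating_point(emax, ints, precision_bits=30):
    """Inverse mapping back to floating point."""
    return [math.ldexp(q, emax - precision_bits) for q in ints]


emax, ints = to_block_floating_point([1.0, 0.5, -0.25, 3.0])
print(emax, ints)  # 2 [268435456, 134217728, -67108864, 805306368]

# A value far smaller than its block neighbors loses its bits: the conversion is lossy.
e, q = to_block_floating_point([3.0, 1e-12, 0.0, 0.0])
print(from_block_floating_point(e, q))  # [3.0, 0.0, 0.0, 0.0]
```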
zfp was designed to achieve high compression ratios and therefore uses lossy but optionally error-bounded compression. Bit-for-bit lossless compression of integer and floating-point arrays is also supported. zfp is often more accurate and significantly faster than other lossy compressors, especially in its OpenMP and CUDA multithreaded modes. Our team has developed an FPGA implementation of zfp that further improves throughput by one to two orders of magnitude.
zfp is available for download below and is also hosted on GitHub. zfPy, the Python interface to zfp, can be installed as a conda package written by Kevin Paul or as a pip package written by David Wade. HDF5 users may be interested in the H5Z-ZFP compression plugin written by Mark Miller. zfp is supported by software tools and I/O libraries like Intel IPP, HDF5, ADIOS, VTK-m, TTK, and E4S.
zfp development is supported by the US Department of Energy’s Exascale Computing Project and by the Advanced Simulation and Computing Program. Advanced features, such as variable-rate random-access arrays, were investigated on LLNL’s Variable Precision Computing Project.
fpzip is a BSD licensed open source library for lossless or lossy compression of large multidimensional floating-point arrays. Although written in C++, fpzip has a C interface. fpzip is based on the algorithm described in the following paper:
Peter Lindstrom and Martin Isenburg, “Fast and Efficient Compression of Floating-Point Data,” IEEE Transactions on Visualization and Computer Graphics, 12(5):1245–1250, September–October 2006, doi:10.1109/TVCG.2006.143.
fpzip was primarily designed for lossless compression but also has provision for lossy compression. For lossy compression, our zfp compressor often outperforms fpzip.
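The lossless predictive-coding idea can be sketched in a few lines, assuming a 2D grid of doubles. Each value is predicted from its three previously visited neighbors (a Lorenzo predictor), both the actual and predicted values are mapped to order-preserving integers, and only the integer difference is kept. The float-to-integer mapping shown is a common technique and not necessarily fpzip's exact one, the function names are hypothetical, and the entropy coder that would actually shrink the residuals is omitted.

```python
import struct

SIGN = 1 << 63
MASK = (1 << 64) - 1


def float_to_ordered(x: float) -> int:
    """Map a double's bit pattern to an unsigned integer that preserves numeric order."""
    u = struct.unpack("<Q", struct.pack("<d", x))[0]
    return (~u & MASK) if u & SIGN else (u | SIGN)


def ordered_to_float(m: int) -> float:
    """Inverse of float_to_ordered."""
    u = (m ^ SIGN) if m & SIGN else (~m & MASK)
    return struct.unpack("<d", struct.pack("<Q", u))[0]


def lorenzo(a, i, j):
    """2D Lorenzo prediction from already-visited neighbors (0.0 outside the grid)."""
    def at(r, c):
        return a[r][c] if r >= 0 and c >= 0 else 0.0
    return at(i - 1, j) + at(i, j - 1) - at(i - 1, j - 1)


def encode(grid):
    """Integer residuals: ordered(actual) - ordered(predicted), in raster order."""
    return [[float_to_ordered(grid[i][j]) - float_to_ordered(lorenzo(grid, i, j))
             for j in range(len(grid[0]))]
            for i in range(len(grid))]


def decode(residuals):
    """Rebuild the grid; the decoder repeats the encoder's predictions exactly."""
    ny, nx = len(residuals), len(residuals[0])
    out = [[0.0] * nx for _ in range(ny)]
    for i in range(ny):
        for j in range(nx):
            pred = lorenzo(out, i, j)  # uses only values already decoded
            out[i][j] = ordered_to_float(float_to_ordered(pred) + residuals[i][j])
    return out


# For data of the form g(i) + h(j), the predictor is exact away from the first row and column.
grid = [[float(i * i + 3 * j) for j in range(6)] for i in range(5)]
residuals = encode(grid)
assert decode(residuals) == grid  # bit-exact reconstruction
```

Because the residuals of smooth data cluster near zero, an entropy coder can represent them in far fewer than 64 bits each, which is where the data reduction comes from.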