
fpzip

INTRODUCTION

fpzip is a library and command-line utility for lossless and optionally lossy compression of 2D and 3D floating-point arrays. fpzip assumes spatially correlated scalar-valued data, such as regularly sampled continuous functions, and is not suitable for compressing unstructured streams of floating-point numbers. In lossy mode, fpzip discards some number of least-significant mantissa bits and losslessly compresses the result. fpzip currently supports IEEE-754 single (32-bit) and double (64-bit) precision floating-point data. fpzip is written in C++ but has a C-compatible API that can be called from C and other languages. It conforms to the C++98 and C89 language standards.
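To make the lossy mode concrete, here is a minimal Python sketch of the general idea of discarding low-order mantissa bits from an IEEE-754 double. This is an illustration of the concept only, not fpzip's actual code path; fpzip controls the number of retained bits through its prec parameter.

```python
import struct

def truncate_mantissa(x: float, keep_bits: int) -> float:
    """Zero out the low-order (52 - keep_bits) mantissa bits of a double."""
    raw = struct.unpack("<Q", struct.pack("<d", x))[0]  # view the double as a 64-bit integer
    drop = 52 - keep_bits                               # IEEE-754 doubles store 52 mantissa bits
    mask = ~((1 << drop) - 1) & 0xFFFFFFFFFFFFFFFF      # keep sign, exponent, high mantissa bits
    return struct.unpack("<d", struct.pack("<Q", raw & mask))[0]

pi = 3.141592653589793
print(truncate_mantissa(pi, 52))  # all 52 bits kept: value unchanged (lossless)
print(truncate_mantissa(pi, 20))  # only 20 bits kept: a nearby, more compressible value
```

The zeroed trailing bits make consecutive values agree in more low-order positions, which is what makes the subsequent lossless pass more effective.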

fpzip is released as Open Source under a three-clause BSD license. Please see the file LICENSE for further details.

INSTALLATION

CMake builds

fpzip was developed for Linux and macOS but can be built on Windows using CMake. To use CMake, type:

cd fpzip
mkdir build
cd build
cmake ..
cmake --build . --config Release

fpzip can be configured using compile-time options, e.g.:

cmake .. -DFPZIP_FP=FPZIP_FP_SAFE -DBUILD_UTILITIES=OFF

To display the available options, type:

cmake .. -L

Basic regression testing is available:

ctest -V -C Release

GNU builds

fpzip may also be built using GNU make:

cd fpzip
gmake

This builds lib/libfpzip.a and bin/fpzip.

The GNU make options are listed in the file Config and should preferably be set on the command line, e.g.:

gmake FPZIP_FP=FPZIP_FP_SAFE BUILD_UTILITIES=0

To run the regression tests, type:

gmake test

DOCUMENTATION

Documentation is currently limited to the source files themselves. For information on the API, please see the header file include/fpzip.h. For an example of how to call fpzip, please see the source file utils/fpzip.cpp. This utility may be used to compress binary files of raw floating-point numbers. Usage is given by:

fpzip -h
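For example, a raw binary file of float32 samples suitable as input to the utility can be produced with a few lines of Python. The filename input.raw and the grid shape here are arbitrary choices; see fpzip -h for the utility's actual options.

```python
import array

# A small 2D grid of spatially correlated float32 values, written as raw
# binary in native byte order -- the kind of input the fpzip utility expects.
nx, ny = 64, 64
samples = array.array("f", (float(i % nx) + float(i // nx) for i in range(nx * ny)))
with open("input.raw", "wb") as f:
    samples.tofile(f)
```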

AUTHOR

fpzip was written by Peter Lindstrom at Lawrence Livermore National Laboratory.

CITING fpzip

If you use fpzip for scholarly research, please cite the following paper:

Peter Lindstrom and Martin Isenburg
"Fast and Efficient Compression of Floating-Point Data"
IEEE Transactions on Visualization and Computer Graphics, 12(5):1245-1250, 2006

LICENSE

fpzip is distributed under the terms of the BSD license. See the files LICENSE and NOTICE for details.

SPDX-License-Identifier: BSD-3-Clause

LLNL-CODE-764017

QUESTIONS AND COMMENTS

For questions and comments, please contact us at [email protected]. Please submit bug reports and feature requests via the GitHub issue tracker.


ISSUES

Understanding Compression Ratios

Many thanks for providing these excellent compressors as open source software! I do not believe this is a bug report, but rather an inquiry into the characteristics of the compression algorithm and how to interpret the compression ratio.

I have made a very simple experiment to better understand the characteristics of fpzip's compression algorithm:

// Helper function for losslessly compressing a 1D array of doubles and returning the compression ratio
func compress(data: [Double], toFPZ fpz: UnsafeMutablePointer<FPZ>?) -> (Double, Int) {
    let nx: Int32 = Int32(data.count)
    let size = Int32(MemoryLayout<Double>.size)

    fpz!.pointee.type = FPZIP_TYPE_DOUBLE
    fpz!.pointee.prec = 64 // Int32(CHAR_BIT * size), this is full 64 bits of precision, which is lossless
    fpz!.pointee.nx = nx
    fpz!.pointee.ny = 1
    fpz!.pointee.nz = 1
    fpz!.pointee.nf = 1

    // write header
    fpzip_write_header(fpz)

    // perform actual compression
    let outbytes = fpzip_write(fpz, data)

    fpzip_write_close(fpz)

    let ratio = Double(nx) * Double(size) / Double(outbytes)
    return (ratio, outbytes)
}

var rng = ThreefryRandomNumberGenerator(seed: [42]) // This is provided by Swift for Tensorflow
let randomDistribution = UniformFloatingPointDistribution(lowerBound: 0.0, upperBound: Double(count)) // Also from Swift for Tensorflow
let tensorflowRandomValues: [Double] = (0..<Int(count * 10)).map { _ in randomDistribution.next(using: &rng) }
let tensorflowCompressionRatio = compressionRatio(data: tensorflowRandomValues)
print("uniform compression ratio:", tensorflowCompressionRatio)

// I expect the compression ratio to be better for a non-uniform distribution
let normalDistributionGenerator = NormalDistribution(mean: 0.0, standardDeviation: 1.0)
let normalValues: [Double] = (0..<count).map { _ in normalDistributionGenerator.next(using: &rng) }
let normalCompressionRatio = compressionRatio(data: normalValues)
print("normal compression ratio:", normalCompressionRatio)

Please excuse the fact that the code is written in Swift. I wrote a Swift->C interface to call fpzip from Swift, but the same should be readily achievable directly in C. This code generates random data from a uniform distribution and from a normal distribution, compresses each independently as 1-dimensional arrays, and then reports the compression ratios. It produces this output:

uniform compression ratio: 1.1626494222504358
normal compression ratio: 1.0636699538500198

Two questions arise:

  • Why is the compression ratio on the uniformly distributed random data greater than 1.0? I would expect it to be exactly 1.0, given that the data are random, unless perhaps they aren't properly random. I tried generating the random data using several different methods: Swift's built-in Double.random, TensorFlow's ThreefryRandomNumberGenerator, and truly random values produced by quantum physical processes. All of these gave a compression ratio of ~1.1625541314267445. One hypothesis is that fpzip's minimum compression ratio on floats is about 1.16 no matter how incompressible the data are, due to some improvement over the floating-point representation, but that doesn't hold up against the compression ratio of the normally distributed data.
  • Why is the compression ratio on the normally distributed random data worse than on the uniformly distributed data? I would expect the opposite: the normally distributed points are more likely to be clumped together around 0. Indeed, a differential entropy calculation shows that the uniform distribution has higher entropy:
#!/usr/bin/env python3

from scipy import stats

print(stats.uniform(loc=0.0, scale=100000).entropy()) # uniform distribution from 0 to 100,000
print(stats.norm(loc=0.0).entropy()) # std-deviation 1.0 normal distribution

Producing output:

11.512925464970229
1.4189385332046727

As expected, the uniform distribution is higher entropy than the normal distribution. Similar results are obtained when calculating discrete entropies on discretized samples from uniform and normal distributions.

I realize attempting to compress random data is a fool's errand; this is merely an experiment to aid my understanding of how the compression works.
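One hedged, bit-level way to probe the first question (this describes IEEE-754 structure, not fpzip internals): even uniformly random doubles are not random at the bit level. For values drawn uniformly from [0, 100000), the sign bit is always zero and the 11-bit exponent field is heavily skewed toward the top few binades, so roughly 9-10 of the 64 bits per value are nearly predictable, which by itself would permit a ratio around 64/55 ≈ 1.16. Normal samples centered at zero span more binades, spreading the exponent distribution, which may partly explain the second observation. A sketch that measures this:

```python
import math
import random
import struct
from collections import Counter

def bit_fields(x: float):
    """Split a double into its IEEE-754 sign bit and 11-bit exponent field."""
    raw = struct.unpack("<Q", struct.pack("<d", x))[0]
    return raw >> 63, (raw >> 52) & 0x7FF

def entropy_bits(counter: Counter, n: int) -> float:
    """Empirical Shannon entropy, in bits, of a field's observed distribution."""
    return -sum((c / n) * math.log2(c / n) for c in counter.values())

random.seed(42)
n = 100_000
uniform = [random.uniform(0.0, 100000.0) for _ in range(n)]

signs, exponents = Counter(), Counter()
for x in uniform:
    s, e = bit_fields(x)
    signs[s] += 1
    exponents[e] += 1

print("sign entropy:", entropy_bits(signs, n))       # 0.0 -- the sign bit is always 0
print("exponent entropy:", entropy_bits(exponents, n))  # roughly 2 bits, far below the 11 stored
```

Whether fpzip actually exploits exactly this structure is a question for the maintainers; the sketch only shows that such structure exists in "random" doubles.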

Just a question

Dear writers,

I have a question about the algorithm: what kind of lossless technique (e.g., entropy coding, dictionary-based) does fpzip use?

Thanks in advance!
Best regards,
EP

[Question] Is this repo alive?

I would like to refactor the cmake scripts to make this project more modern in general and more suitable for Conan packaging. Will you accept a PR?

Bounds Checking on fpzip buffer Functions

Hi Dr. Lindstrom,

I was working on integrating a WASM fpzip decoder into a web viewer (Neuroglancer). However, the maintainer felt it would be best if fpzip performed bounds checking in these functions, which can otherwise overflow the buffer when the number of available bytes is not provided:

  • fpzip_read_from_buffer(const void* buffer, const size_t num_bytes)
  • fpzip_read(FPZ* fpz, void* data, const size_t num_bytes)

There may be others, but those are the big ones. It seems like this would be backwards incompatible without adding new functions, so perhaps fpzip_read_from_buffer2 and fpzip_read2 would be better?
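The contract the proposed num_bytes parameters imply can be sketched generically. This is not fpzip code; BoundedReader is a hypothetical illustration of the pattern: every read is validated against the byte count the caller declared, so a truncated or malicious stream fails cleanly instead of reading out of bounds.

```python
class BoundedReader:
    """Hypothetical sketch of a bounds-checked buffer reader."""

    def __init__(self, buffer: bytes, num_bytes: int):
        self.buf = memoryview(buffer)[:num_bytes]
        self.pos = 0

    def read(self, n: int) -> bytes:
        # Refuse to read past the caller-declared end of the buffer.
        if self.pos + n > len(self.buf):
            raise ValueError("read of %d bytes past end of %d-byte buffer" % (n, len(self.buf)))
        out = bytes(self.buf[self.pos:self.pos + n])
        self.pos += n
        return out
```

A decoder built on such a reader would return an error instead of overflowing when the compressed stream claims more data than the buffer holds.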

Thanks so much for your work. Let me know if you need any updates to https://github.com/seung-lab/fpzip. I could potentially help with a PR for this as well.

Will

portability: $^ in makefiles

I am using BSD Make to build fpzip. It does not define the $^ macro, which is not in POSIX [1].

libfpzip builds without error, but it does not include any object files because $^ expands to an empty string:

ar rc ../lib/libfpzip.a

This leads to very hard-to-trace errors when linking with the library.

I think the Makefiles (https://github.com/LLNL/fpzip/blob/master/src/Makefile#L14) can be made portable by writing

$(LIBDIR)/libfpzip.a: $(OBJECTS)
       rm -f $@
       ar rc $@ $(OBJECTS)

instead of

$(LIBDIR)/libfpzip.a: $(OBJECTS)
       rm -f $@
       ar rc $@ $^

[1] https://pubs.opengroup.org/onlinepubs/009695399/utilities/make.html

Endianness documentation and support

I don't see any mention of endianness in the docs or in fpzip -h, and I'm trying to understand whether fpzip supports either endianness and, if so, how to toggle it. I tried both little-endian and big-endian versions of my data and got a much better compression ratio for the big-endian data, so perhaps it only supports big-endian? I'm curious to understand this better.
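As far as I can tell (an assumption, not an official statement), fpzip compresses values in the machine's native in-memory representation and has no switch for the byte order of an external file, so raw data written on a machine of the opposite endianness would need to be byte-swapped before compression. A Python sketch of that conversion:

```python
import array
import sys

def load_doubles(path: str, file_byteorder: str) -> array.array:
    """Read raw doubles from a file and convert them to native byte order."""
    values = array.array("d")
    with open(path, "rb") as f:
        values.frombytes(f.read())
    if file_byteorder != sys.byteorder:  # "little" or "big"
        values.byteswap()                # reverse the 8 bytes of every element in place
    return values
```

The native-order values can then be handed to fpzip, or written back out and fed to the command-line utility.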
