
zstd-rs's Introduction

zstd

crates.io MIT licensed

Build on Linux Build on Windows Build on macOS Build on wasm

This library is a Rust binding for the zstd compression library.

1 - Add to Cargo.toml

$ cargo add zstd
# Cargo.toml

[dependencies]
zstd = "0.13"

2 - Usage

This library provides Read and Write wrappers to handle (de)compression, along with convenience functions to make common tasks easier.

For instance, stream::copy_encode and stream::copy_decode are easy-to-use wrappers around std::io::copy. Check the stream example:

use std::io;

// This function uses the convenient `copy_encode` method
fn compress(level: i32) {
    zstd::stream::copy_encode(io::stdin(), io::stdout(), level).unwrap();
}

// This function does the same thing, directly using an `Encoder`:
fn compress_manually(level: i32) {
    let mut encoder = zstd::stream::Encoder::new(io::stdout(), level).unwrap();
    io::copy(&mut io::stdin(), &mut encoder).unwrap();
    encoder.finish().unwrap();
}

fn decompress() {
    zstd::stream::copy_decode(io::stdin(), io::stdout()).unwrap();
}

Asynchronous support

The async-compression crate provides an async-ready integration of various compression algorithms, including zstd-rs.

Compile it yourself

zstd is included as a submodule. To get everything during your clone, use:

git clone https://github.com/gyscos/zstd-rs --recursive

Or, if you cloned it without the --recursive flag, call this from inside the repository:

git submodule update --init

Then, running cargo build should take care of building the C library and linking to it.

Build-time bindgen

This library includes a pre-generated bindings.rs file. You can also generate new bindings at build-time, using the bindgen feature:

cargo build --features bindgen

TODO

  • Benchmarks, optimizations, ...

Disclaimer

This implementation is largely inspired by bozaro's lz4-rs.

License

  • The zstd C library is under a dual BSD/GPLv2 license.
  • This zstd-rs binding library is under a MIT license.

zstd-rs's People

Contributors

alexargoai, atouchet, benesch, busyjay, cgbur, dependabot[bot], farnz, fauxfaux, figsoda, fitzgen, gyscos, heinrich5991, jake-shadle, jakubonderka, jbms, khuey, kylebarron, marijns95, martinvonz, nazar-pc, nobodyxu, olivierlemasle, phiresky, sfbdragon, stlankes, sunshowers, syncom, vivekpanyam, vlad-shcherbina, vladima


zstd-rs's Issues

Encoder::finish prevents use in struct

I am using an Encoder within a struct, to which I only have a mut reference. According to the docs, I need to call Encoder::finish before dropping the stream, but finish consumes self. Since Rust cannot move out of a borrowed reference, I cannot call finish().

Creating an AutoFinishEncoder is also unsatisfactory, because errors from the finish method can no longer be handled.

The finish method should take only a reference to self, and return Result<()> similar to flush().

This is, of course, assuming finish() does something more than just calling flush() at the end, which is my reading of the documentation: "You need to call this after writing your stuff."

zstd v1.4.1

Hi,

zstd v1.4.1 was released, and it would be nice to update zstd-rs as well, since it brings some performance improvements and a few bug fixes.

Push-`Decoder`?

Currently, Decoder wraps a reader, which is an interface I would call "pull" (Decoder pulls in the data it needs). I already have 3 other compression libraries in rdedup, and they all wrap writers for both compression and decompression ("push", since the user pushes the input data in). I'm adding zstd, and taking a reader instead of a writer is much less convenient, since it reverses the control flow.

Fix broken `Encoder::with_prepared_dictionary`

Follow-up from #54

The current method imposes no lifetime constraints, but zstd specifies that the dictionary must outlive the context in this situation (since the context merely references the dictionary without copying it).

We will need to:

  • Update zstd-safe to add a lifetime to CStream.
  • In the zstd crate, update the Encoder to include something like an Option<Arc<EncoderDictionary>> (so it can be shared) and have the context borrow from it. This means Encoder will be a kind of self-referential struct, and we'll likely need something like owning_ref: a OwningHandle<Option<Arc<EncoderDictionary>>, zstd_safe::CStream>.

Cargo build fails with os_error 1

Contents of Cargo.toml

zstd = "0.4"


$ cargo build
Downloaded zstd-sys v1.4.13+zstd.1.4.3
error: Operation not permitted (os error 1)

System info:

Ubuntu 18.04.3 LTS

Cargo version

 cargo 1.37.0 (9edd08916 2019-08-02)

Rust version

 rustc 1.37.0 (eae3437df 2019-08-13)

confused by performance gap relative to zstd-safe

I'm encountering roughly 10x worse performance (speed) with zstd::block::Compressor::compress_to_buffer relative to zstd_safe::compress_using_cdict or zstd_safe::compress_cctx, and 4-5x worse decompression performance (speed):

test sender::tests::zstd_block_compressor_nodict_compress_to_buffer_3049_byte_json        ... bench:      14,652 ns/iter (+/- 204)
test sender::tests::zstd_block_compressor_with_dict_compress_to_buffer_3049_byte_json     ... bench:     119,456 ns/iter (+/- 730)
test sender::tests::zstd_block_decompressor_with_dict_decompress_to_buffer_3049_byte_json ... bench:      21,099 ns/iter (+/- 493)
test sender::tests::zstd_safe_compress_cctx_3049_byte_json                                ... bench:      14,732 ns/iter (+/- 85)
test sender::tests::zstd_safe_compress_using_cdict_3049_byte_json                         ... bench:       9,998 ns/iter (+/- 73)
test sender::tests::zstd_safe_decompress_dctx_3049_byte_json                              ... bench:       5,969 ns/iter (+/- 37)
test sender::tests::zstd_safe_decompress_using_ddict_3049_byte_json                       ... bench:       4,188 ns/iter (+/- 42)
test sender::tests::zstd_stream_encoder_with_dictionary_3049_byte_json                    ... bench:     306,225 ns/iter (+/- 1,511)

I noticed that zstd::block::Compressor is using zstd_safe::compress_with_dict, which accepts a byte slice of the dictionary instead of zstd_safe::compress_with_cdict, which accepts a "prepared" dictionary - perhaps that could be the issue? It's likely only significant for small data, but zstd is one of the best options for small data with the dictionary functionality, so it seems preferable to "prepare" the dictionary once, up front instead of every compression.

Invert arguments order for `*_to_buffer` functions

The following methods take the destination as first argument, and the source as second argument:

  • stream::decode_to_buffer
  • stream::encode_to_buffer
  • block::compress_to_buffer
  • block::decompress_to_buffer

While this is a common C convention (it is what zstd itself uses), it is not very rustic: the io::copy method is a good example of that.
Therefore, it would make sense to stick to the Rust convention and take the source argument before the destination.

ZSTD depends on partial-io but does not have those features

When trying to add zstd to cargo with the tokio feature, every version after 0.4.9 fails with the following error:

error: failed to select a version for `zstd`.
    ... required by package `bacon v0.1.0 (/Users/nemo/rust/bacon)`
versions that meet the requirements `= 0.4.23` are: 0.4.23+zstd.1.4.0

the package `bacon` depends on `zstd`, with features: `partial-io` but `zstd` does not have these features.


failed to select a version for `zstd` which could resolve this conflict
[dependencies]
zstd = { version = "0.4.23+zstd.1.4.0", features = [ "tokio" ] }

Support for --no-dictID

Currently, I believe, there's no way to:

  • specify the dictionary ID for zstd::Encoder::with_dictionary.
  • omit the dictionary ID from the stream, as the zstd --no-dictID command-line option does.

I'm after the second, as I currently get an extra null byte in my output when using dictionaries; the messages are actually bigger than without a dictionary.

Please add them!

Version 0.5 on crates.io fails on Mac OS

With an empty project with

[package]
name = "testlib"
version = "0.1.0"
authors = ["Ashley <[email protected]>"]
edition = "2018"

# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html

[dependencies]
zstd = "0.5"

I get the following compilation failure:

   Compiling libc v0.2.66
   Compiling log v0.4.8
   Compiling cfg-if v0.1.10
   Compiling glob v0.3.0
   Compiling jobserver v0.1.17
   Compiling num_cpus v1.11.1
   Compiling cc v1.0.48
   Compiling zstd-sys v1.4.15+zstd.1.4.4
error: failed to run custom build command for `zstd-sys v1.4.15+zstd.1.4.4`

Caused by:
  process didn't exit successfully: `/Users/ashley/testlib/target/debug/build/zstd-sys-e30d9343afb06d13/build-script-build` (exit code: 1)
--- stdout
TARGET = Some("x86_64-apple-darwin")
HOST = Some("x86_64-apple-darwin")
CC_x86_64-apple-darwin = None
CC_x86_64_apple_darwin = None
HOST_CC = None
CC = None
CFLAGS_x86_64-apple-darwin = None
CFLAGS_x86_64_apple_darwin = None
HOST_CFLAGS = None
CFLAGS = None
CRATE_CC_NO_DEFAULTS = None
DEBUG = Some("true")
CARGO_CFG_TARGET_FEATURE = Some("fxsr,sse,sse2,sse3,ssse3")
running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "zstd/lib/" "-I" "zstd/lib/common" "-I" "zstd/lib/legacy" "-fvisibility=hidden" "-DZSTD_LIB_DEPRECATED=0" "-DZSTDLIB_VISIBILITY=" "-DZDICTLIB_VISIBILITY=" "-DZSTDERRORLIB_VISIBILITY=" "-DZSTD_LEGACY_SUPPORT=1" "-o" "/Users/ashley/testlib/target/debug/build/zstd-sys-86e1fb70d2837faf/out/zstd/lib/common/fse_decompress.o" "-c" "zstd/lib/common/fse_decompress.c"
running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "zstd/lib/" "-I" "zstd/lib/common" "-I" "zstd/lib/legacy" "-fvisibility=hidden" "-DZSTD_LIB_DEPRECATED=0" "-DZSTDLIB_VISIBILITY=" "-DZDICTLIB_VISIBILITY=" "-DZSTDERRORLIB_VISIBILITY=" "-DZSTD_LEGACY_SUPPORT=1" "-o" "/Users/ashley/testlib/target/debug/build/zstd-sys-86e1fb70d2837faf/out/zstd/lib/common/debug.o" "-c" "zstd/lib/common/debug.c"
running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "zstd/lib/" "-I" "zstd/lib/common" "-I" "zstd/lib/legacy" "-fvisibility=hidden" "-DZSTD_LIB_DEPRECATED=0" "-DZSTDLIB_VISIBILITY=" "-DZDICTLIB_VISIBILITY=" "-DZSTDERRORLIB_VISIBILITY=" "-DZSTD_LEGACY_SUPPORT=1" "-o" "/Users/ashley/testlib/target/debug/build/zstd-sys-86e1fb70d2837faf/out/zstd/lib/common/error_private.o" "-c" "zstd/lib/common/error_private.c"
running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "zstd/lib/" "-I" "zstd/lib/common" "-I" "zstd/lib/legacy" "-fvisibility=hidden" "-DZSTD_LIB_DEPRECATED=0" "-DZSTDLIB_VISIBILITY=" "-DZDICTLIB_VISIBILITY=" "-DZSTDERRORLIB_VISIBILITY=" "-DZSTD_LEGACY_SUPPORT=1" "-o" "/Users/ashley/testlib/target/debug/build/zstd-sys-86e1fb70d2837faf/out/zstd/lib/common/entropy_common.o" "-c" "zstd/lib/common/entropy_common.c"
exit code: 0
running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "zstd/lib/" "-I" "zstd/lib/common" "-I" "zstd/lib/legacy" "-fvisibility=hidden" "-DZSTD_LIB_DEPRECATED=0" "-DZSTDLIB_VISIBILITY=" "-DZDICTLIB_VISIBILITY=" "-DZSTDERRORLIB_VISIBILITY=" "-DZSTD_LEGACY_SUPPORT=1" "-o" "/Users/ashley/testlib/target/debug/build/zstd-sys-86e1fb70d2837faf/out/zstd/lib/common/pool.o" "-c" "zstd/lib/common/pool.c"
exit code: 0
running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "zstd/lib/" "-I" "zstd/lib/common" "-I" "zstd/lib/legacy" "-fvisibility=hidden" "-DZSTD_LIB_DEPRECATED=0" "-DZSTDLIB_VISIBILITY=" "-DZDICTLIB_VISIBILITY=" "-DZSTDERRORLIB_VISIBILITY=" "-DZSTD_LEGACY_SUPPORT=1" "-o" "/Users/ashley/testlib/target/debug/build/zstd-sys-86e1fb70d2837faf/out/zstd/lib/common/threading.o" "-c" "zstd/lib/common/threading.c"
exit code: 0
running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "zstd/lib/" "-I" "zstd/lib/common" "-I" "zstd/lib/legacy" "-fvisibility=hidden" "-DZSTD_LIB_DEPRECATED=0" "-DZSTDLIB_VISIBILITY=" "-DZDICTLIB_VISIBILITY=" "-DZSTDERRORLIB_VISIBILITY=" "-DZSTD_LEGACY_SUPPORT=1" "-o" "/Users/ashley/testlib/target/debug/build/zstd-sys-86e1fb70d2837faf/out/zstd/lib/common/xxhash.o" "-c" "zstd/lib/common/xxhash.c"
cargo:warning=In file included from zstd/lib/common/pool.c:15:
cargo:warning=zstd/lib/common/zstd_internal.h:285:42: error: unknown type name 'ZSTD_CCtx'
cargo:warning=const seqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx);   /* compress & dictBuilder */
cargo:warning=                                         ^
cargo:warning=zstd/lib/common/zstd_internal.h:289:32: error: unknown type name 'ZSTD_customMem'
cargo:warning=void* ZSTD_malloc(size_t size, ZSTD_customMem customMem);
cargo:warning=                               ^
cargo:warning=zstd/lib/common/zstd_internal.h:290:32: error: unknown type name 'ZSTD_customMem'
cargo:warning=void* ZSTD_calloc(size_t size, ZSTD_customMem customMem);
cargo:warning=                               ^
cargo:warning=zstd/lib/common/zstd_internal.h:291:27: error: unknown type name 'ZSTD_customMem'
cargo:warning=void ZSTD_free(void* ptr, ZSTD_customMem customMem);
cargo:warning=                          ^
cargo:warning=zstd/lib/common/zstd_internal.h:324:30: error: unknown type name 'ZSTD_CCtx'
cargo:warning=void ZSTD_invalidateRepCodes(ZSTD_CCtx* cctx);   /* zstdmt, adaptive_compression (shouldn't get this definition from here) */
cargo:warning=                             ^
cargo:warning=zstd/lib/common/zstd_internal.h:342:30: error: unknown type name 'ZSTD_DCtx'
cargo:warning=size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx* dctx, int* nbSeqPtr,
cargo:warning=                             ^
cargo:warning=In file included from zstd/lib/common/pool.c:16:
cargo:warning=zstd/lib/common/pool.h:34:32: error: unknown type name 'ZSTD_customMem'
cargo:warning=                               ZSTD_customMem customMem);
cargo:warning=                               ^
cargo:warning=zstd/lib/common/pool.c:307:56: error: use of undeclared identifier 'ZSTD_defaultCMem'
cargo:warning=    return POOL_create_advanced(numThreads, queueSize, ZSTD_defaultCMem);
cargo:warning=                                                       ^
cargo:warning=zstd/lib/common/pool.c:310:69: error: unknown type name 'ZSTD_customMem'
cargo:warning=POOL_ctx* POOL_create_advanced(size_t numThreads, size_t queueSize, ZSTD_customMem customMem) {
cargo:warning=                                                                    ^
cargo:warning=9 errors generated.
exit code: 1
running: "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "zstd/lib/" "-I" "zstd/lib/common" "-I" "zstd/lib/legacy" "-fvisibility=hidden" "-DZSTD_LIB_DEPRECATED=0" "-DZSTDLIB_VISIBILITY=" "-DZDICTLIB_VISIBILITY=" "-DZSTDERRORLIB_VISIBILITY=" "-DZSTD_LEGACY_SUPPORT=1" "-o" "/Users/ashley/testlib/target/debug/build/zstd-sys-86e1fb70d2837faf/out/zstd/lib/common/zstd_common.o" "-c" "zstd/lib/common/zstd_common.c"
exit code: 0
cargo:warning=In file included from zstd/lib/common/zstd_common.c:19:
cargo:warning=zstd/lib/common/zstd_internal.h:285:42: error: unknown type name 'ZSTD_CCtx'
cargo:warning=const seqStore_t* ZSTD_getSeqStore(const ZSTD_CCtx* ctx);   /* compress & dictBuilder */
cargo:warning=                                         ^
cargo:warning=zstd/lib/common/zstd_internal.h:289:32: error: unknown type name 'ZSTD_customMem'
cargo:warning=void* ZSTD_malloc(size_t size, ZSTD_customMem customMem);
cargo:warning=                               ^
cargo:warning=zstd/lib/common/zstd_internal.h:290:32: error: unknown type name 'ZSTD_customMem'
cargo:warning=void* ZSTD_calloc(size_t size, ZSTD_customMem customMem);
cargo:warning=                               ^
cargo:warning=zstd/lib/common/zstd_internal.h:291:27: error: unknown type name 'ZSTD_customMem'
cargo:warning=void ZSTD_free(void* ptr, ZSTD_customMem customMem);
cargo:warning=                          ^
cargo:warning=zstd/lib/common/zstd_internal.h:324:30: error: unknown type name 'ZSTD_CCtx'
cargo:warning=void ZSTD_invalidateRepCodes(ZSTD_CCtx* cctx);   /* zstdmt, adaptive_compression (shouldn't get this definition from here) */
cargo:warning=                             ^
cargo:warning=zstd/lib/common/zstd_internal.h:342:30: error: unknown type name 'ZSTD_DCtx'
cargo:warning=size_t ZSTD_decodeSeqHeaders(ZSTD_DCtx* dctx, int* nbSeqPtr,
cargo:warning=                             ^
cargo:warning=zstd/lib/common/zstd_common.c:25:44: error: use of undeclared identifier 'ZSTD_VERSION_NUMBER'
cargo:warning=unsigned ZSTD_versionNumber(void) { return ZSTD_VERSION_NUMBER; }
cargo:warning=                                           ^
cargo:warning=zstd/lib/common/zstd_common.c:27:47: error: use of undeclared identifier 'ZSTD_VERSION_STRING'
cargo:warning=const char* ZSTD_versionString(void) { return ZSTD_VERSION_STRING; }
cargo:warning=                                              ^
cargo:warning=zstd/lib/common/zstd_common.c:56:32: error: unknown type name 'ZSTD_customMem'
cargo:warning=void* ZSTD_malloc(size_t size, ZSTD_customMem customMem)
cargo:warning=                               ^
cargo:warning=zstd/lib/common/zstd_common.c:63:32: error: unknown type name 'ZSTD_customMem'
cargo:warning=void* ZSTD_calloc(size_t size, ZSTD_customMem customMem)
cargo:warning=                               ^
cargo:warning=zstd/lib/common/zstd_common.c:75:27: error: unknown type name 'ZSTD_customMem'
cargo:warning=void ZSTD_free(void* ptr, ZSTD_customMem customMem)
cargo:warning=                          ^
cargo:warning=11 errors generated.
exit code: 1
exit code: 0
exit code: 0

--- stderr


error occurred: Command "cc" "-O3" "-ffunction-sections" "-fdata-sections" "-fPIC" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "zstd/lib/" "-I" "zstd/lib/common" "-I" "zstd/lib/legacy" "-fvisibility=hidden" "-DZSTD_LIB_DEPRECATED=0" "-DZSTDLIB_VISIBILITY=" "-DZDICTLIB_VISIBILITY=" "-DZSTDERRORLIB_VISIBILITY=" "-DZSTD_LEGACY_SUPPORT=1" "-o" "/Users/ashley/testlib/target/debug/build/zstd-sys-86e1fb70d2837faf/out/zstd/lib/common/pool.o" "-c" "zstd/lib/common/pool.c" with args "cc" did not execute successfully (status code exit code: 1).

It compiles fine with zstd = { git = "https://github.com/gyscos/zstd-rs" }, so perhaps this is a problem with the version on crates.io?

Duplicate symbols error

I'm not a rust expert, so forgive my terminology. This crate should hide all libzstd symbols in the rlib. Otherwise, when using the zstd-sys module, and linking with other C/C++ code that also uses zstd, you will get duplicate symbol warnings.

You can get zstd to hide all of its symbols with these compiler flags:

-DZSTDLIB_VISIBILITY=
-DZDICTLIB_VISIBILITY=
-DZSTDERRORLIB_VISIBILITY=
-fvisibility=hidden

See the example in python-zstandard.

Add support for checksum?

Currently the produced zstd data does not include a checksum (as examined by zstd -l). Are there any plans for adding a way to configure the Encoder with additional options?

LICENSE of zstd-safe and zstd-sys

The pre-generated bindings shipped in zstd-sys, as well as the API-describing comments in zstd-safe, are (partly) copied verbatim from libzstd's zstd.h, without attribution.

zstd.h contains the following:

/*
 * Copyright (c) 2016-present, Yann Collet, Facebook, Inc.
 * All rights reserved.
 *
 * This source code is licensed under both the BSD-style license (found in the
 * LICENSE file in the root directory of this source tree) and the GPLv2 (found
 * in the COPYING file in the root directory of this source tree).
 * You may select, at your option, one of the above-listed licenses.
 */

This means you need to include proper attribution under the BSD-3-clause license in these two crates. For zstd-sys at least, it probably makes sense to switch the whole crate over to BSD-3-clause, since it's almost entirely code generated directly from zstd.h.

End-of-frame not properly recognized

End-of-frame is currently assumed to be aligned with the underlying stream's termination. This might not always be true, e.g. when other information follows the compressed frame.

The following example

use std::io::{Read, Write};

let mut enc = zstd::stream::Encoder::new(Vec::new(), 1).unwrap();
enc.write_all(b"foo").unwrap();
let mut compressed = enc.finish().unwrap();

// Add footer/whatever to underlying storage.
compressed.push(0);

// Drain zstd stream until end-of-frame.
let mut dec = zstd::stream::Decoder::new(&compressed[..]).unwrap();
let mut buf = Vec::new();
dec.read_to_end(&mut buf).unwrap();
assert_eq!(&buf, b"foo");

will return an instance of Error { repr: Custom(Custom { kind: Other, error: StringError("Context should be init first") }) }.

Add dictionary tests

Also test compatibility with the command-line zstd tool (using a shared dictionary).

Add benchmark

I guess this needs data, but it'll have to be excluded from the crate generation.

How to reuse context for stream decompression?

The stream Decoder can only decompress one stream. For each independent stream you decompress, you have to create a new zstd context.

The block API gives you a Decompressor that can decompress many blocks, but you have to know an upper bound on the decompressed data size in advance and pass it as capacity.

I guess one can use Decompressor and store the uncompressed size at the start of the block. But it's not ideal. I think Decoder needs a way to decode multiple zstd streams while reusing the context.

zstd-sys allows a vulnerable version of memoffset to be included in the build

> cargo generate-lockfile -Z minimal-versions && cargo audit
    Updating crates.io index
    Fetching advisory database from `https://github.com/RustSec/advisory-db.git`
      Loaded 59 security advisories (from /home/wim/.cargo/advisory-db)
    Scanning Cargo.lock for vulnerabilities (86 crate dependencies)
error: Vulnerable crates found!

ID:       RUSTSEC-2019-0011
Crate:    memoffset
Version:  0.2.0
Date:     2019-07-16
URL:      https://rustsec.org/advisories/RUSTSEC-2019-0011
Title:    Flaw in offset_of and span_of causes SIGILL, drops uninitialized memory of arbitrary type on panic in client code
Solution:  upgrade to >= 0.5.0
Dependency tree:
memoffset 0.2.0
└── crossbeam-epoch 0.3.0
    └── crossbeam-deque 0.2.0
        └── rayon-core 1.4.0
            └── rayon 1.0.0
                └── cc 1.0.28
                    ├── zstd-sys 1.4.13+zstd.1.4.3
                    │   └── zstd-safe 1.4.13+zstd.1.4.3
                    │       └── zstd 0.4.28+zstd.1.4.3
                    └── libloading 0.5.0
                        └── clang-sys 0.28.0
                            └── bindgen 0.51.0
                                └── zstd-sys 1.4.13+zstd.1.4.3

Upgrading the cc dependency to 1.0.45 drops rayon as a dependency and avoids this issue.

Add async integration

Work started in the async branch.

TODO:

  • Add AsyncWrite to stream::Encoder
    • Add tests
  • Add AsyncRead to stream::Decoder
    • Add tests

Add inter-compatibility test

Test that:

  • Data compressed with zstd cli tool can be decompressed with zstd-rs
    • This just needs some pre-compressed files during the tests, like those currently present in assets.
  • Data compressed with zstd-rs can be decompressed with zstd cli tool
    • This needs to call the zstd command during test (which may not be present on travis), so may not be easy for now...
    • We'd need to compile the zstd binary (we currently only compile the library) from the submodule, and then use it.

Do it both for stream and block, and with/without dictionary.

support for async-std

I've seen async operation is supported as an optional feature for tokio in the source code, any plan or example for async-std users?

"Error (generic)" when training dict using any method

On Mac OS X, rustc 1.31.1, using zstd 0.4.21+zstd.1.3.7 and zstd-safe 1.4.6+zstd.1.3.7, I get a generic error when trying to create a dict from samples, e.g.:

use zstd;

fn main() {
    let _dict = zstd::dict::from_samples(&["foo"], 10000).unwrap();
}

results in:

thread 'main' panicked at 'called `Result::unwrap()` on an `Err` value: Custom { kind: Other, error: StringError("Error (generic)") }', libcore/result.rs:1009:5

The same happens with from_files with a similar example.

Edit: same error with zstd v0.4.22+zstd.1.3.8 and zstd-sys v1.4.7+zstd.1.3.8.

Add libstd-less functions for embedded environments

May move current Read/Write wrappers under a (default) "libstd" feature, and add some (simpler) methods that don't use libstd.
Those methods may end up looking a lot like the C methods, just with safer parameters.

The lack of fast methods in the high-level wrapper

The high-level Rust wrapper is not zero-cost. It uses some nice abstractions but with a huge performance hit. Even though the hit might be reduced, I think the high-level wrapper needs a way to safely use Zstandard in a zero-cost way!

Here's what I got on my Opteron:

decode_all .. 20,434 ns/iter
ZSTD_decompress .. 428 ns/iter
ZSTD_decompressDCtx .. 74 ns/iter

Benchmark:

use std::mem::uninitialized;
use std::os::raw::c_void;

#[bench]
fn zstd_high(bencher: &mut Bencher) {
    use zstd::stream::{encode_all, decode_all};
    let text = "Eeny, meeny, miny, moe";
    let compressed = encode_all(text.as_bytes(), 3).unwrap();

    bencher.bytes = text.len() as u64;
    bencher.iter(|| {
        assert_eq!(decode_all(&compressed[..]).unwrap(), text.as_bytes())
    });
}

#[bench]
fn zstd_direct(bencher: &mut Bencher) {
    use zstd::stream::encode_all;
    let text = "Eeny, meeny, miny, moe";
    let compressed = encode_all(text.as_bytes(), 3).unwrap();

    let zctx = unsafe { zstd_sys::ZSTD_createDCtx() };

    bencher.bytes = text.len() as u64;
    bencher.iter(|| {
        const BUF_SIZE: usize = 2048;
        let mut buf: [u8; BUF_SIZE] = unsafe { uninitialized() };
        let rc = unsafe {
            zstd_sys::ZSTD_decompressDCtx(
                zctx,
                buf.as_mut_ptr() as *mut c_void,
                BUF_SIZE,
                compressed.as_ptr() as *const c_void,
                compressed.len(),
            )
        };
        assert_eq!(0, unsafe { zstd_sys::ZSTD_isError(rc) });
        assert_eq!(&buf[..rc], text.as_bytes())
    });

    unsafe { zstd_sys::ZSTD_freeDCtx(zctx); }
}
running 2 tests
test by_test::zstd_direct                   ... bench:          74 ns/iter (+/- 14) = 297 MB/s
test by_test::zstd_high                     ... bench:      20,434 ns/iter (+/- 17,473) = 1 MB/s

Second run:

running 2 tests
test by_test::zstd_direct                   ... bench:          64 ns/iter (+/- 3) = 343 MB/s
test by_test::zstd_high                     ... bench:      10,521 ns/iter (+/- 931) = 2 MB/s

When using `single_frame`, don't read past frame end

In single_frame mode, stream::Decoder currently still reads big chunks from the inner Read, potentially reading past the end of the frame, and consuming some bytes from what lies beyond (maybe another frame?).

It is desirable in some situations to only read until the end of the frame. zstd conveniently gives us, as a hint, the number of bytes that it wants to read next.
One option is to never read more than zstd expects (at least with single_frame). That way, we'll never read past the end of the frame.

gcc compiler is not supported

Hi, I tried to compile zstd-rs in a Docker container with the latest Rust and the build-essential package, which contains the gcc compiler:

$ rustc -v --version      
rustc 1.30.1 (1433507eb 2018-11-07)
binary: rustc
commit-hash: 1433507eba7d1a114e4c6f27ae0e1a74f60f20de
commit-date: 2018-11-07
host: x86_64-unknown-linux-gnu
release: 1.30.1
LLVM version: 8.0
$ gcc --version
gcc (Ubuntu 7.3.0-27ubuntu1~18.04) 7.3.0

but with gcc it is not possible to compile this crate, because of this error:

thread 'main' panicked at 'Unable to find libclang: "couldn\'t find any valid shared libraries matching: [\'libclang.so\', \'libclang-*.so\', \'libclang.so.*\'], set the `LIBCLANG_PATH` environment variable to a path where one of these files can be found (invalid: [])"', libcore/result.rs:1009:5
note: Run with `RUST_BACKTRACE=1` for a backtrace.

so it looks like clang is required to compile this crate, but this requirement is not documented anywhere.

Decode concatenated frames

Multiple frames can be concatenated; the decoder should be able to read those frames as one stream.

This should be an option, with two configurations:

  • Read until the end of the first frame (current behaviour)
  • Read until EOF, concatenating all frames.

Question is: which should be the default?

Implement BufRead trait for Decoder

Since Decoder already has a buffer then implementing BufRead using that buffer should be inexpensive and allow using the useful methods this trait provides.

Wrapping a Decoder in a BufReader adds another buffer which is redundant.

Move to bufferless API

zstd-rs currently uses the ZBUFF API, which relies on an internal buffer.
To be more versatile, we could switch to the new zstd streaming API, which does not use an internal buffer.
The currently provided Read/Write structs could then be used directly, or behind a BufRead/BufWrite depending on the situation.

Have to benchmark it to check for performance losses, and check stability with various buffer sizes.
Handling the buffer here in zstd-rs is not planned.

Builds do fail on Windows on Travis

Just in case you did not notice: Your CI build is failing

https://travis-ci.org/gyscos/zstd-rs
e.g. https://travis-ci.org/gyscos/zstd-rs/jobs/634379540

error occurred: Command "gcc.exe" "-O3" "-ffunction-sections" "-fdata-sections" "-g" "-fno-omit-frame-pointer" "-m64" "-I" "zstd/lib/" "-I" "zstd/lib/common" "-I" "zstd/lib/legacy" "-fvisibility=hidden" "-DZSTD_LIB_DEPRECATED=0" "-DZSTDLIB_VISIBILITY=" "-DZDICTLIB_VISIBILITY=" "-DZSTDERRORLIB_VISIBILITY=" "-DZSTD_LEGACY_SUPPORT=1" "-o" "C:\Users\travis\build\gyscos\zstd-rs\target\debug\build\zstd-sys-df2ee4e12446f4cc\out\zstd\lib\compress\huf_compress.o" "-c" "zstd\lib\compress\huf_compress.c" with args "gcc.exe" did not execute successfully (status code exit code: 1).

Maybe related to #69 or #76? (/cc @amrx101 and @expenses)

Non-consuming version of finish() method?

Here's an interesting issue. I'm using zstd with tokio-io, which adds the notion of non-blocking writes signaled with io::Error(io::ErrorKind::WouldBlock). Typically the bottom-most layer will need to be aware of the non-blocking nature of writes. Intermediate layers like Encoder can just pass the error message up and trust that higher layers will handle it.

However, with consuming methods like finish(), there's a wrinkle:

  1. These methods cause writes to happen (a flush() before the finish() is insufficient because the finish() does a write of its own).
  2. They will consume the encoder even if the error is a WouldBlock error (where the caller needs to retry the finish() after some time).

I'm really not sure how to handle this use case. A non-consuming version of finish would solve this issue but makes the API uglier :/

I'm also going to file an issue against tokio-io to talk about it there, because this seems like a general API deficiency.

Fix block compressor API with dictionaries

Follow-up from #54

The current API for block compression using dictionaries is not ideal:

  • Impossible to share a dictionary (Compressor stores a Vec<u8>)
  • Impossible to use a prepared dictionary for higher performance (dictionary is re-hashed every time)

We want to:

  • Use shared ownership of the dictionary as much as possible
  • Keep a map of prepared dictionaries. When compressing a new block, use the prepared dictionary for the given compression level, or prepare a new one if needed. We'll keep the raw dictionary around in case we need to prepare more dictionaries; the CDict itself copies the data internally and doesn't reference the raw data, so there's no lifetime mess and no need for owning_ref here.
  • When not using a dictionary, don't prepare a digested dictionary and directly use compressCCtx - using an empty prepared dictionary is possible, but less efficient.

Parallel (de)compression

Most bindings I know of support parallel (de)compression of zstd blocks.
It'd be nice to have that API in these bindings as well.
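
Since concatenated zstd frames decode to the concatenated payload, one plausible shape is to compress fixed-size chunks on scoped worker threads and join them in order. A std-only sketch, with `compress_frame` as a stub standing in for real per-chunk compression:

```rust
use std::thread;

// Stub only: stands in for compressing one chunk into its own zstd frame.
// Here it just length-prefixes the chunk so ordering is observable.
fn compress_frame(chunk: &[u8]) -> Vec<u8> {
    let mut out = vec![chunk.len() as u8];
    out.extend_from_slice(chunk);
    out
}

fn parallel_compress(data: &[u8], chunk_size: usize) -> Vec<Vec<u8>> {
    thread::scope(|s| {
        // Spawn one worker per chunk; scoped threads may borrow `data`.
        let handles: Vec<_> = data
            .chunks(chunk_size)
            .map(|chunk| s.spawn(move || compress_frame(chunk)))
            .collect();
        // Join in spawn order so the frames concatenate correctly.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let frames = parallel_compress(b"abcdef", 2);
    assert_eq!(frames.len(), 3);
    assert_eq!(frames[0], vec![2, b'a', b'b']);
    assert_eq!(frames[2], vec![2, b'e', b'f']);
}
```

A real implementation would bound the number of workers (a thread pool) and stream frames out as they complete, but the join-in-order step stays the same.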

Add dictionary-using free functions

None of the free functions currently support dictionaries:

  • stream::{encode_all, decode_all}
  • stream::{copy_encode, copy_decode}
  • block::{compress, decompress}
  • block::{compress_to_buffer, decompress_to_buffer}

I suppose we'd need to add 8 duplicate functions to handle the case where a dictionary is used...

Note: those functions are just there for convenience; the same thing can always be done using one of the structs directly. So we don't have to provide every possible combination, only those likely to be used frequently.

io::Write::flush corrupts internal buffer

Hello,

The following code snippet shows that calling flush on the Encoder corrupts its internal buffer.

extern crate zstd;

use std::io::Write;

fn main() {
    let buf = Vec::new();
    let mut z = zstd::Encoder::new(buf, 19).unwrap();

    z.write_all(b"hello").unwrap();

    z.flush().unwrap(); // Corrupts
    let buf = z.finish().unwrap();

    let s = zstd::decode_all(&buf[..]).unwrap();
    let s = ::std::str::from_utf8(&s).unwrap();
    assert_eq!(s, "hello");
}

Maxime

Unbundling zstd

I'm trying to unbundle zstd from zstd-sys to use the version provided by my distribution, but I can't find a clear way to do it. Is there any way to change the build system to use the system-wide zstd library instead of the bundled one?
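
One possible direction, sketched under the assumption that zstd-sys exposes a `pkg-config` feature for probing the system libzstd; the feature name here is an assumption to verify against the crate's own Cargo.toml, not a confirmed flag:

```toml
# Cargo.toml -- hypothetical: ask zstd-sys to link the system libzstd
# via pkg-config instead of building the bundled submodule.
[dependencies]
zstd = "0.13"

[dependencies.zstd-sys]
version = "*"
features = ["pkg-config"]
```

If no such feature exists, the alternative is a build-script change in zstd-sys that probes the system library before falling back to the bundled sources.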

Need WebAssembly support

Hi,
It would be a great help if you could provide guidance on compiling this library to WebAssembly so that it can be used from the browser.
