
python-zstandard's Introduction

python-zstandard

This project provides Python bindings for interfacing with the Zstandard compression library. A C extension and CFFI interface are provided.

The primary goal of the project is to provide a rich interface to the underlying C API through a Pythonic interface while not sacrificing performance. This means exposing most of the features and flexibility of the C API while not sacrificing usability or safety that Python provides.

The canonical home for this project is https://github.com/indygreg/python-zstandard.

For usage documentation, see https://python-zstandard.readthedocs.org/.
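
For orientation, here is a minimal round-trip sketch using the high-level compressor and decompressor objects (assuming a recent release where compress() embeds the content size, so one-shot decompress() works):

import zstandard as zstd

# Compress and decompress an in-memory payload.
data = b"data to compress" * 1024

cctx = zstd.ZstdCompressor(level=3)
compressed = cctx.compress(data)

dctx = zstd.ZstdDecompressor()
assert dctx.decompress(compressed) == data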

python-zstandard's People

Contributors

bra-fsn, dbnicholson, eli-b, encukou, glandium, indygreg, jcristau, jonashaag, jugmac00, klensy, kolanich, maxnoe, mgorny, mhils, ms2ger, odidev, pierref, rathann, rmax, spectral54, talkaminker, terrelln, thedrow, theneuralbit, thewtex, thrasibule, tizianoperrucci, underyx, vstinner, yuja


python-zstandard's Issues

Support arbitrary buffer-like objects

The methods in this package (such as ZstdCompressor.compress() and ZstdDecompressor.decompress()) typically don't support bytearray or memoryview inputs, though for some strange reason they do seem to accept NumPy arrays:

>>> c = zstd.ZstdCompressor(level=1, write_content_size=True)
>>> b = c.compress(b'12345678')
>>> b == c.compress(np.frombuffer(b'12345678', dtype='int8'))
True
>>> b == c.compress(memoryview(b'12345678'))
Traceback (most recent call last):
  File "<ipython-input-40-9ae8177a1d5c>", line 1, in <module>
    b == c.compress(memoryview(b'12345678'))
TypeError: compress() argument 1 must be read-only bytes-like object, not memoryview

>>> b == c.compress(bytearray(b'12345678'))
Traceback (most recent call last):
  File "<ipython-input-41-ddc68fad54ac>", line 1, in <module>
    b == c.compress(bytearray(b'12345678'))
TypeError: compress() argument 1 must be read-only bytes-like object, not bytearray

Note that other compression libraries typically accept arbitrary buffer-like objects, which allows passing them in without first copying to bytes:

>>> zlib.compress(memoryview(b'12345678'))
b'x\x9c3426153\xb7\x00\x00\x07@\x01\xa5'
>>> bz2.compress(memoryview(b'12345678'))
b'BZh91AY&SY\xb6\x1c=\x04\x00\x00\x00\x08\x00?\xc0 \x001\x0c\x08\x19\x1ai\x935s\xf9E\xdc\x91N\x14$-\x87\x0fA\x00'
>>> lzma.compress(memoryview(b'12345678'))
b'\xfd7zXZ\x00\x00\x04\xe6\xd6\xb4F\x02\x00!\x01\x16\x00\x00\x00t/\xe5\xa3\x01\x00\x0712345678\x00\tx\xac+H\x80\x8b\\\x00\x01 \x08\xbb\x19\xd9\xbb\x1f\xb6\xf3}\x01\x00\x00\x00\x00\x04YZ'
>>> blosc.compress(memoryview(b'12345678'))
b'\x02\x01\x13\x08\x08\x00\x00\x00\x08\x00\x00\x00\x18\x00\x00\x0012345678'
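
A minimal workaround sketch until buffer-like inputs are accepted: copy the buffer into an immutable bytes object before compressing (which costs the extra copy this request is trying to avoid):

import zstandard as zstd

cctx = zstd.ZstdCompressor(level=1, write_content_size=True)

# Copy a bytearray into bytes before compressing.
data = bytearray(b'12345678')
compressed = cctx.compress(bytes(data))

# Likewise for a memoryview.
view = memoryview(b'12345678')
compressed2 = cctx.compress(view.tobytes())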

Segmentation fault after catching exception

I encountered a segmentation fault after misusing the API in the interactive REPL. Below is a reproduction. This was found on macOS 10.14; the file empty.txt.zst was created by compressing an empty file with the zstd command line utility obtained via Homebrew. I suspect this is a problem with the Python bindings, though, and probably not dependent on platform or archive contents. Happy to provide more info as necessary.

from zstandard import ZstdDecompressor
f = open('empty.txt.zst', 'rb')
decompressor = ZstdDecompressor()
df = decompressor.stream_reader(f)

try:
  with df:
    df.read() # TypeError: function missing required argument 'size' (pos 1).
except Exception as e:
  print(type(e), e)

with df:
  df.read(100) # Segmentation fault.

No longer compiles on FreeBSD 11

/usr/home/hg/buildslave/FreeBSD_hg_tests/build/contrib/python-zstandard/zstd.c:188:11: error: use of undeclared identifier 'CTL_HW'
        mib[0] = CTL_HW;
                 ^
/usr/home/hg/buildslave/FreeBSD_hg_tests/build/contrib/python-zstandard/zstd.c:189:11: error: use of undeclared identifier 'HW_NCPU'
        mib[1] = HW_NCPU;
                 ^
/usr/home/hg/buildslave/FreeBSD_hg_tests/build/contrib/python-zstandard/zstd.c:190:11: warning: implicit declaration of function 'sysctl' is invalid in C99 [-Wimplicit-function-declaration]
        if (0 != sysctl(mib, 2, &count, &len, NULL, 0)) {
                 ^
1 warning and 2 errors generated.
error: command 'cc' failed with exit status 1

Having trouble building python-zstandard for pypy2 on windows

Recently I was unsuccessfully trying to build the module for pypy2-v6.0.0-win32 on Windows 10 x64:

C:\Program Files (x86)\Microsoft Visual C++ Build Tools>c:\pypy2-v6.0.0-win32\pypy.exe -m pip install zstandard
Collecting zstandard
  Using cached https://files.pythonhosted.org/packages/29/21/4612a4b9e628d61aa045558ff008452378b5a333e5a64f7de29dee8b1e77/zstandard-0.10.1.tar.gz
Requirement already satisfied: cffi>=1.11 in c:\pypy2-v6.0.0-win32\lib_pypy (from zstandard) (1.11.5)
Installing collected packages: zstandard
  Running setup.py install for zstandard ... error
    Complete output from command c:\pypy2-v6.0.0-win32\pypy.exe -u -c "import setuptools, tokenize;__file__='c:\\users\\admin\\appdata\\local\\temp\\pip-install-sjrjic\\zstandard\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\admin\appdata\local\temp\pip-record-2zhkwh\install-record.txt --single-version-externally-managed --compile:
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.30729.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.

    tmp_ppnnx.h
    Microsoft (R) 32-bit C/C++ Optimizing Compiler Version 15.00.30729.01 for 80x86
    Copyright (C) Microsoft Corporation.  All rights reserved.

    tmpbltlc_.h
    generating build\_zstd_cffi.c
    (already up-to-date)
    not modified: 'build\\_zstd_cffi.c'
    running install
    running build
    running build_py
    creating build\lib.win32-2.7
    creating build\lib.win32-2.7\zstandard
    copying zstandard\__init__.py -> build\lib.win32-2.7\zstandard
    running build_ext
    building 'zstd' extension
    creating build\temp.win32-2.7
    creating build\temp.win32-2.7\Release
    creating build\temp.win32-2.7\Release\zstd
    creating build\temp.win32-2.7\Release\zstd\common
    creating build\temp.win32-2.7\Release\c-ext
    creating build\temp.win32-2.7\Release\zstd\compress
    creating build\temp.win32-2.7\Release\zstd\decompress
    creating build\temp.win32-2.7\Release\zstd\dictBuilder
    C:\Users\admin\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ic-ext -Izstd\common -Izstd -Izstd\compress -Izstd\decompress -Izstd\dictBuilder -Ic:\pypy2-v6.0.0-win32\include /Tczstd\common\pool.c /Fobuild\temp.win32-2.7\Release\zstd\common\pool.obj -DZSTD_MULTITHREAD -DZSTDLIB_VISIBILITY= -DZDICTLIB_VISIBILITY= -DZSTDERRORLIB_VISIBILITY=
    pool.c
    C:\Users\admin\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ic-ext -Izstd\common -Izstd -Izstd\compress -Izstd\decompress -Izstd\dictBuilder -Ic:\pypy2-v6.0.0-win32\include /Tczstd\common\threading.c /Fobuild\temp.win32-2.7\Release\zstd\common\threading.obj -DZSTD_MULTITHREAD -DZSTDLIB_VISIBILITY= -DZDICTLIB_VISIBILITY= -DZSTDERRORLIB_VISIBILITY=
    threading.c
    C:\Users\admin\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ic-ext -Izstd\common -Izstd -Izstd\compress -Izstd\decompress -Izstd\dictBuilder -Ic:\pypy2-v6.0.0-win32\include /Tczstd.c /Fobuild\temp.win32-2.7\Release\zstd.obj -DZSTD_MULTITHREAD -DZSTDLIB_VISIBILITY= -DZDICTLIB_VISIBILITY= -DZSTDERRORLIB_VISIBILITY=
    zstd.c
    C:\Users\admin\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ic-ext -Izstd\common -Izstd -Izstd\compress -Izstd\decompress -Izstd\dictBuilder -Ic:\pypy2-v6.0.0-win32\include /Tcc-ext\bufferutil.c /Fobuild\temp.win32-2.7\Release\c-ext\bufferutil.obj -DZSTD_MULTITHREAD -DZSTDLIB_VISIBILITY= -DZDICTLIB_VISIBILITY= -DZSTDERRORLIB_VISIBILITY=
    bufferutil.c
    C:\Users\admin\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ic-ext -Izstd\common -Izstd -Izstd\compress -Izstd\decompress -Izstd\dictBuilder -Ic:\pypy2-v6.0.0-win32\include /Tcc-ext\compressiondict.c /Fobuild\temp.win32-2.7\Release\c-ext\compressiondict.obj -DZSTD_MULTITHREAD -DZSTDLIB_VISIBILITY= -DZDICTLIB_VISIBILITY= -DZSTDERRORLIB_VISIBILITY=
    compressiondict.c
    C:\Users\admin\AppData\Local\Programs\Common\Microsoft\Visual C++ for Python\9.0\VC\Bin\cl.exe /c /nologo /Ox /MD /W3 /GS- /DNDEBUG -Ic-ext -Izstd\common -Izstd -Izstd\compress -Izstd\decompress -Izstd\dictBuilder -Ic:\pypy2-v6.0.0-win32\include /Tcc-ext\compressobj.c /Fobuild\temp.win32-2.7\Release\c-ext\compressobj.obj -DZSTD_MULTITHREAD -DZSTDLIB_VISIBILITY= -DZDICTLIB_VISIBILITY= -DZSTDERRORLIB_VISIBILITY=
    compressobj.c
    c-ext\compressobj.c(62) : error C2065: 'ssize_t' : undeclared identifier
    c-ext\compressobj.c(62) : error C2146: syntax error : missing ')' before identifier 'input'
    c-ext\compressobj.c(62) : error C2059: syntax error : ')'
    c-ext\compressobj.c(62) : warning C4018: '<' : signed/unsigned mismatch
    c-ext\compressobj.c(62) : error C2143: syntax error : missing ';' before '{'
    c-ext\compressobj.c(62) : warning C4552: '<' : operator has no effect; expected operator with side-effect
    error: command 'C:\\Users\\admin\\AppData\\Local\\Programs\\Common\\Microsoft\\Visual C++ for Python\\9.0\\VC\\Bin\\cl.exe' failed with exit status 2

    ----------------------------------------
Command "c:\pypy2-v6.0.0-win32\pypy.exe -u -c "import setuptools, tokenize;__file__='c:\\users\\admin\\appdata\\local\\temp\\pip-install-sjrjic\\zstandard\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record c:\users\admin\appdata\local\temp\pip-record-2zhkwh\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in c:\users\admin\appdata\local\temp\pip-install-sjrjic\zstandard\

Please help me understand what went wrong and how to fix it.

Memory leak copy_stream

This is similar to #35 and #40. With zstandard 0.10.2 and Python 3.6.0.

import io
import pickle
import zstandard as zstd

def write_pkl(data, file):
    cctx = zstd.ZstdCompressor()
    stream = io.BytesIO()
    with open(file, 'wb') as f:
        pickle.dump(data, stream)
        stream.seek(0)
        cctx.copy_stream(stream, f)

The data input is a list of ~270,000 tuples containing NumPy arrays and single numbers. With an array that large no bytes are written, but it works fine with smaller arrays; it was still working for me with a list of ~200,000 entries.

My memory consumption goes from ~1.5 GB while creating the array to +35 GB when write_pkl is executed.

question on stream writer performance

I've been testing out ways of compressing json objects, and surprisingly (at least to me), connecting the streams is much slower than simply dumping an entire object:

In my example code, this takes 0.8s on pypy3, 0.9s on python3

with open(output_path, 'wb') as f:
    cctx = zstd.ZstdCompressor()
    with cctx.stream_writer(f) as compressor:
        compressor.write(json.dumps(my_map).encode())

And this takes 54s on pypy3, 6s on python3

with open(output_path, 'wb') as f:
    cctx = zstd.ZstdCompressor()
    with cctx.stream_writer(f) as compressor:
        compressor2 = codecs.getwriter('utf-8')(compressor)
        json.dump(my_map, compressor2)

Is this expected? Alternatively, is there a better way for streaming to the compressor without putting the whole object in memory?
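
One possible mitigation, assuming the overhead comes from json.dump issuing many tiny write() calls against the stream writer: buffer the text and forward it to the compressor in larger binary chunks. The ChunkedWriter helper below is hypothetical, not part of the library:

import json
import zstandard as zstd

class ChunkedWriter:
    """Hypothetical helper: collect many small text writes and forward
    them to the compressor in larger binary chunks."""

    def __init__(self, compressor, chunk_size=1024 * 1024):
        self._compressor = compressor
        self._chunks = []
        self._size = 0
        self._chunk_size = chunk_size

    def write(self, text):
        self._chunks.append(text)
        self._size += len(text)
        if self._size >= self._chunk_size:
            self.flush()
        return len(text)

    def flush(self):
        if self._chunks:
            self._compressor.write(''.join(self._chunks).encode('utf-8'))
            self._chunks = []
            self._size = 0

with open('output.zst', 'wb') as f:  # hypothetical output path
    cctx = zstd.ZstdCompressor()
    with cctx.stream_writer(f) as compressor:
        writer = ChunkedWriter(compressor)
        json.dump({'example': list(range(1000))}, writer)
        writer.flush()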

decompressobj does not support flush()

decompressobj, which is provided as an API-compatible interface to zlib.decompressobj, does not support flush().
This makes it difficult to use alongside the zlib decompressor.
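
A minimal shim sketch (not part of the library) that layers a zlib-style flush() on top; it assumes a no-op flush is acceptable because decompress() has already returned whatever output was available:

import zstandard as zstd

class DecompressObjShim:
    """Hypothetical wrapper adding a zlib-style flush()."""

    def __init__(self, dctx=None):
        self._dobj = (dctx or zstd.ZstdDecompressor()).decompressobj()

    def decompress(self, data):
        return self._dobj.decompress(data)

    def flush(self):
        # zlib's decompressobj.flush() returns any remaining data; here we
        # assume nothing is buffered and return an empty byte string.
        return b''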

Saving and loading the training dict_data

After we train on a sample, we want to be able to save the dict_data to a binary file so we can load and re-use it across different applications. It seems like the ZstdCompressionDict object is not pickle-able or serializable. We don't want to have to re-train on a sample every time we want to use zstd.

My current work-around is to use the zstd command line tool to generate the dictionary binary, and then load it into python. The issue with that is the command line program has its own limitations, such as this: facebook/zstd#373 (comment)
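
A sketch of saving and reloading a trained dictionary with the Python API, assuming `samples` is a representative list of byte strings and that ZstdCompressionDict.as_bytes() is available as described in the docs:

import zstandard as zstd

dict_data = zstd.train_dictionary(16384, samples)  # `samples`: your training data

# Persist the raw dictionary bytes.
with open('shared.dict', 'wb') as fh:
    fh.write(dict_data.as_bytes())

# Reload it in another application.
with open('shared.dict', 'rb') as fh:
    loaded = zstd.ZstdCompressionDict(fh.read())

cctx = zstd.ZstdCompressor(dict_data=loaded)
dctx = zstd.ZstdDecompressor(dict_data=loaded)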

ZstdDecompressor stream_reader doesn't support readline()

The documentation for ZstdCompressor explicitly says that stream_reader's file-like object doesn't support readline() because it makes no sense for compressed data. For ZstdDecompressor this remark is absent, as it should be, but the method doesn't appear to be implemented and raises NotImplementedError.
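
A possible workaround sketch: layer io.TextIOWrapper on top of the stream reader, which implements readline() in terms of read(). This assumes the reader exposes enough of the io API to be wrapped; the file name is hypothetical:

import io
import zstandard as zstd

dctx = zstd.ZstdDecompressor()
with open('data.txt.zst', 'rb') as fh:
    with dctx.stream_reader(fh) as reader:
        text = io.TextIOWrapper(reader, encoding='utf-8')
        for line in text:
            print(line, end='')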

Decompression fails if the compressed content is an empty bytestring

>>> zstd.ZstdDecompressor().decompress(zstd.ZstdCompressor(write_content_size=True).compress(b''))
ZstdError                                 Traceback (most recent call last)
<ipython-input-11-89c4c9bb5b8c> in <module>()
----> 1 zstd.ZstdDecompressor().decompress(zstd.ZstdCompressor(write_content_size=True).compress(b''))

ZstdError: input data invalid or missing content size in frame header

The cause is that there is no differentiation between a lack of a frame header and a frame header indicating content size = 0.

>>> zstd.ZstdCompressor(write_content_size=True).compress(b'') == zstd.ZstdCompressor(write_content_size=False).compress(b'')
True

Obviously this is a corner case, but one of the fundamental properties of a compression library should be that decompress(compress(x)) == x for all x.
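
A workaround sketch for the meantime: the streaming decompressobj() API does not depend on the content size recorded in the frame header, so it round-trips empty input:

import zstandard as zstd

compressed = zstd.ZstdCompressor(write_content_size=True).compress(b'')

dobj = zstd.ZstdDecompressor().decompressobj()
assert dobj.decompress(compressed) == b''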

Files with multiple frames have frames skipped?

I've run into a use case for files with multiple frames, and it looks like only the first frame is read.
The zstd command line tool has no problems decompressing the file, but when I use a read_to_iter or ZstdDecompressor.decompress call, only the first frame is returned.

I've uploaded an example here where the two_frame.zst file has one frame per line.
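
A sketch assuming a newer release where stream_reader() accepts read_across_frames=True (older releases stop after the first frame):

import zstandard as zstd

dctx = zstd.ZstdDecompressor()
with open('two_frame.zst', 'rb') as fh:
    with dctx.stream_reader(fh, read_across_frames=True) as reader:
        chunks = []
        while True:
            chunk = reader.read(65536)
            if not chunk:
                break
            chunks.append(chunk)
        data = b''.join(chunks)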

class ZstdDecompressor not seen

Hi,

I am not able to see the ZstdDecompressor class in the code. Shouldn't it be part of zstd_cffi.py?
Or am I missing something?

thanks
Prashanth

Linux and macOS wheels

Thanks for putting this package together!

Before depending on it, I would like for there to be Linux and macOS binary wheels on PyPI.

Would a PR be accepted that improves the build system to generate Linux and macOS wheels with scikit-build?

CC: @jcfr

copy_stream drops file suffix

When using copy_stream to compress an on-disk file, the resulting archive is properly compressed and byte-equivalent to the original, but when a different application decompresses it, like 7-Zip with zstd support, the file is missing the suffix. Is that intentional, and if so, how could I compress a "txt" file (for example) to zst using zstandard while maintaining the filename suffix in the archive?

Absolute path error in Python 3.7

While trying to install version 0.9.1 using pip I get the following error:

    reading manifest file 'zstandard.egg-info\SOURCES.txt'
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "C:\R\temp\pip-install-aoqr5ae0\zstandard\setup.py", line 96, in <module>
        install_requires=install_requires,
      File "c:\program files\python37\lib\site-packages\setuptools\__init__.py", line 131, in setup
        return distutils.core.setup(**attrs)
      File "c:\program files\python37\lib\distutils\core.py", line 148, in setup
        dist.run_commands()
      File "c:\program files\python37\lib\distutils\dist.py", line 966, in run_commands
        self.run_command(cmd)
      File "c:\program files\python37\lib\distutils\dist.py", line 985, in run_command
        cmd_obj.run()
      File "c:\program files\python37\lib\site-packages\setuptools\command\install.py", line 61, in run
        return orig.install.run(self)
      File "c:\program files\python37\lib\distutils\command\install.py", line 557, in run
        self.run_command(cmd_name)
      File "c:\program files\python37\lib\distutils\cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "c:\program files\python37\lib\distutils\dist.py", line 985, in run_command
        cmd_obj.run()
      File "c:\program files\python37\lib\site-packages\setuptools\command\install_egg_info.py", line 34, in run
        self.run_command('egg_info')
      File "c:\program files\python37\lib\distutils\cmd.py", line 313, in run_command
        self.distribution.run_command(command)
      File "c:\program files\python37\lib\distutils\dist.py", line 985, in run_command
        cmd_obj.run()
      File "c:\program files\python37\lib\site-packages\setuptools\command\egg_info.py", line 278, in run
        self.find_sources()
      File "c:\program files\python37\lib\site-packages\setuptools\command\egg_info.py", line 293, in find_sources
        mm.run()
      File "c:\program files\python37\lib\site-packages\setuptools\command\egg_info.py", line 524, in run
        self.add_defaults()
      File "c:\program files\python37\lib\site-packages\setuptools\command\egg_info.py", line 567, in add_defaults
        self.read_manifest()
      File "c:\program files\python37\lib\site-packages\setuptools\command\sdist.py", line 199, in read_manifest
        self.filelist.append(line)
      File "c:\program files\python37\lib\site-packages\setuptools\command\egg_info.py", line 466, in append
        path = convert_path(item)
      File "c:\program files\python37\lib\distutils\util.py", line 110, in convert_path
        raise ValueError("path '%s' cannot be absolute" % pathname)
    ValueError: path '/home/travis/build/indygreg/python-zstandard/zstd.c' cannot be absolute

    ----------------------------------------
Command ""c:\program files\python37\python.exe" -u -c "import setuptools, tokenize;__file__='C:\\R\\temp\\pip-install-aoqr5ae0\\zstandard\\setup.py';f=getattr(tokenize, 'open', open)(__file__);code=f.read().replace('\r\n', '\n');f.close();exec(compile(code, __file__, 'exec'))" install --record C:\R\temp\pip-record-oalas6un\install-record.txt --single-version-externally-managed --compile" failed with error code 1 in C:\R\temp\pip-install-aoqr5ae0\zstandard\

Python 2.7.12 test errors with CFFI

There are 53 tests that error with python 2.7.12: https://gist.github.com/terrelln/c76ad7d6034f8f5bbf0e8575c5f11321

======================================================================
ERROR: test_write_size_cffi (tests.test_decompressor.TestDecompressor_write_to)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/terrelln/repos/python-zstandard/tests/test_decompressor.py", line 294, in test_write_size
    decompressor.write(c)
  File "/Users/terrelln/repos/python-zstandard/zstd_cffi.py", line 931, in write
    data_buffer = ffi.from_buffer(data)
TypeError: from_buffer() cannot return the address of the raw string within a str or unicode object

----------------------------------------------------------------------

Provide a file-like object to interface with TarFile

I've got a dataset of about 50,000 files. For easier and faster processing, these are inside one tar archive and compressed with Zstandard. With a file-like object performing decompression, this would be really simple:

with zstd.open('data.tar.zstd', mode='r') as tar:
    with tarfile.open(fileobj=tar, mode='r') as archive:
        # do something with archive...
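
Until such a zstd.open() exists, a workaround sketch is to hand the stream reader to tarfile directly; mode 'r|' streams sequentially, so no seeking is needed on the compressed data:

import tarfile
import zstandard as zstd

dctx = zstd.ZstdDecompressor()
with open('data.tar.zstd', 'rb') as fh:
    with dctx.stream_reader(fh) as reader:
        with tarfile.open(fileobj=reader, mode='r|') as archive:
            for member in archive:
                print(member.name)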

Command line interface to library

Hello,

Since Zstandard is a fairly new compression algorithm, packages for it don't always exist in Linux distros. Distro packages for pip, however, are very common and can provide python-zstandard.

What do you think about providing a small command line interface to your library for working with files, like the zstd binary?

It could improve the popularity of zstd thanks to effortless installation. See python -m gzip -d aa.txt.gz in CPython for comparison.

Greetings,
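
For illustration, a hypothetical minimal CLI sketch built on the existing copy_stream API (not part of the library):

import argparse
import zstandard as zstd

def main():
    parser = argparse.ArgumentParser(description='Minimal zstd CLI sketch')
    parser.add_argument('-d', '--decompress', action='store_true')
    parser.add_argument('infile')
    parser.add_argument('outfile')
    args = parser.parse_args()

    with open(args.infile, 'rb') as ifh, open(args.outfile, 'wb') as ofh:
        if args.decompress:
            zstd.ZstdDecompressor().copy_stream(ifh, ofh)
        else:
            zstd.ZstdCompressor().copy_stream(ifh, ofh)

if __name__ == '__main__':
    main()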

COMPRESSOBJ_FLUSH_BLOCK may not fully flush

I'm reasonably certain that using COMPRESSOBJ_FLUSH_BLOCK with the incremental compressor may result in partially flushed output. I'm pretty sure the comment and code around /* The output buffer is of size ZSTD_CStreamOutSize(), which is guaranteed to hold a full block. */ are wrong.

Is it possible to relax `with` usage requirements?

First of all, thank you for all of the work that went into publishing this package. I have been experimenting with it and would very much like to incorporate it into my work. However, I find that the requirement to only read from the decompressor inside of a with statement is somewhat inconvenient for use in CPython data processing scripts. I understand that it is a best practice to open a file within a with context. However, it is not a requirement for correctly using other file handles and decompressor handles. As a result, I have existing code for which Zstandard does not easily drop in as a replacement for gzip or lzma/xz.

I am curious if there is a strong technical requirement that the decompressor must be used in this fashion, or if it is merely the library enforcing the best practice. If the latter, please consider relaxing the requirement so that the decompressor handle can be read from and then closed without the use of with. Thank you!

--system-zstd fails: error: unknown type name 'ZSTD_dictContentType_e'

Trying to build 0.9.0 with --system-zstd fails:

In file included from /usr/ports/archivers/py-zstandard/work-py27/zstandard-0.9.0/c-ext/decompressionreader.c:9:
/usr/ports/archivers/py-zstandard/work-py27/zstandard-0.9.0/c-ext/python-zstandard.h:86:2: error: unknown type name 'ZSTD_dictContentType_e'
        ZSTD_dictContentType_e dictType;
        ^

read() must be called from an active context manager

ENV

  1. system: linux
  2. python: 3.6.3
  3. zstandard: 0.9.0.dev0

Emulating io.RawIOBase

test.py

import io
import zstandard as zstd
class ZstdDecompressReader(io.RawIOBase):
    def __init__(self, fp):
        self._fp = fp
        self._eof = False
        self._pos = 0
        self._size = -1
        self._decompressor = zstd.ZstdDecompressor()
        self._stream_reader = None
    def close(self):
        self._decompressor = None
        if self._stream_reader:
            self._stream_reader.close()
            self._stream_reader = None
        return super().close()

    def read(self, size=-1):
        if size < 0:
            return self.readall()
        if not size or self._eof:
            return b""
        if not self._stream_reader:
            self._stream_reader = self._decompressor.stream_reader(self._fp)
        data = self._stream_reader.read(size)
        if not data:
            self._eof = True
            self._size = self._pos
            return b""
        self._pos += len(data)
        return data

if __name__ == '__main__':
    with open('1.tar.zstd', 'rb') as fp:
        decompress = ZstdDecompressReader(fp)
        print(decompress.read(10))

python test.py

Traceback (most recent call last):
  File "./test.py", line 36, in <module>
    print(decompress.read(10))
  File "./test.py", line 25, in read
    data = self._stream_reader.read(size)
zstd.ZstdError: read() must be called from an active context manager

This variant runs:

import io
import zstandard as zstd
class ZstdDecompressReader(io.RawIOBase):
    def __init__(self, fp):
        self._fp = fp
        self._eof = False
        self._pos = 0
        self._size = -1
        self._decompressor = zstd.ZstdDecompressor()
        self._stream_reader = None
    def close(self):
        self._decompressor = None
        if self._stream_reader:
            self._stream_reader.__exit__()
            # self._stream_reader.close()
            self._stream_reader = None
        return super().close()

    def read(self, size=-1):
        if size < 0:
            return self.readall()
        if not size or self._eof:
            return b""
        if not self._stream_reader:
            self._stream_reader = self._decompressor.stream_reader(self._fp)
            self._stream_reader.__enter__()
        data = self._stream_reader.read(size)
        if not data:
            self._eof = True
            self._size = self._pos
            return b""
        self._pos += len(data)
        return data

if __name__ == '__main__':
    with open('1.tar.zstd', 'rb') as fp:
        decompressor = zstd.ZstdDecompressor()
        decompress = ZstdDecompressReader(fp)
        print(decompress.read(10))

Or this:

import io
import zstandard as zstd
class ZstdDecompressReader(io.RawIOBase):
    def __init__(self, decompressor, stream_reader):
        self._fp = None  # not needed; the stream_reader is passed in directly
        self._eof = False
        self._pos = 0
        self._size = -1
        self._decompressor = decompressor
        self._stream_reader = stream_reader
    def close(self):
        self._decompressor = None
        if self._stream_reader:
            self._stream_reader.close()
            self._stream_reader = None
        return super().close()

    def read(self, size=-1):
        if size < 0:
            return self.readall()
        if not size or self._eof:
            return b""
        if not self._stream_reader:
            self._stream_reader = self._decompressor.stream_reader(self._fp)
        data = self._stream_reader.read(size)
        if not data:
            self._eof = True
            self._size = self._pos
            return b""
        self._pos += len(data)
        return data

if __name__ == '__main__':
    with open('1.tar.zstd', 'rb') as fp:
        decompressor = zstd.ZstdDecompressor()
        with decompressor.stream_reader(fp) as stream_reader:
            decompress = ZstdDecompressReader(decompressor, stream_reader)
            print(decompress.read(10))
            print(decompress.read(10))

Suggestion

So should read() not require an active context manager?

Linewise iteration over compressed input

Currently it's only possible to iterate over chunks:

dctx = zstd.ZstdDecompressor()
for chunk in dctx.read_from(fh):
    ...  # process chunk

How can I iterate line-by-line as it is possible with gzip.open()?
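
One possible approach, assuming read_from yields raw decompressed chunks: accumulate the chunks and split on newlines in a small generator (a sketch, not a library feature):

import zstandard as zstd

def iter_lines(fh):
    dctx = zstd.ZstdDecompressor()
    pending = b''
    for chunk in dctx.read_from(fh):
        pending += chunk
        *lines, pending = pending.split(b'\n')
        for line in lines:
            yield line
    if pending:
        yield pending

with open('data.txt.zst', 'rb') as fh:  # hypothetical input file
    for line in iter_lines(fh):
        print(line.decode('utf-8'))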

Build fails on FreeBSD with --system-zstd

Beginning from some recent update, the FreeBSD package build fails:

cc -DNDEBUG -O2 -pipe -fstack-protector -isystem /usr/local/include -fno-strict-aliasing -isystem /usr/local/include -fPIC -I/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/zstd/common -I/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext -I/usr/local/include/python3.6m -c /wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext/decompressor.c -o build/temp.freebsd-11.1-RELEASE-p11-i386-3.6/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext/decompressor.o -DZSTD_MULTITHREAD
cc -DNDEBUG -O2 -pipe -fstack-protector -isystem /usr/local/include -fno-strict-aliasing -isystem /usr/local/include -fPIC -I/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/zstd/common -I/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext -I/usr/local/include/python3.6m -c /wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext/compressiondict.c -o build/temp.freebsd-11.1-RELEASE-p11-i386-3.6/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext/compressiondict.o -DZSTD_MULTITHREAD
cc -DNDEBUG -O2 -pipe -fstack-protector -isystem /usr/local/include -fno-strict-aliasing -isystem /usr/local/include -fPIC -I/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/zstd/common -I/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext -I/usr/local/include/python3.6m -c /wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext/decompressoriterator.c -o build/temp.freebsd-11.1-RELEASE-p11-i386-3.6/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext/decompressoriterator.o -DZSTD_MULTITHREAD
cc -DNDEBUG -O2 -pipe -fstack-protector -isystem /usr/local/include -fno-strict-aliasing -isystem /usr/local/include -fPIC -I/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/zstd/common -I/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext -I/usr/local/include/python3.6m -c /wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext/decompressionreader.c -o build/temp.freebsd-11.1-RELEASE-p11-i386-3.6/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext/decompressionreader.o -DZSTD_MULTITHREAD
cc -DNDEBUG -O2 -pipe -fstack-protector -isystem /usr/local/include -fno-strict-aliasing -isystem /usr/local/include -fPIC -I/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/zstd/common -I/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext -I/usr/local/include/python3.6m -c /wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext/constants.c -o build/temp.freebsd-11.1-RELEASE-p11-i386-3.6/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext/constants.o -DZSTD_MULTITHREAD
/wrkdirs/usr/ports/archivers/py-zstandard/work-py36/zstandard-0.9.1/c-ext/constants.c:82:51: error: use of undeclared identifier 'ZSTD_TARGETLENGTH_MIN'
        PyModule_AddIntConstant(mod, "TARGETLENGTH_MIN", ZSTD_TARGETLENGTH_MIN);
                                                         ^
1 error generated.
not modified: 'build/_zstd_cffi.c'

Removing the zstd directory also causes it to fail, even though it shouldn't be needed with --system-zstd:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "setup.py", line 50, in <module>
    import make_cffi
  File "make_cffi.py", line 168, in <module>
    preprocessed = preprocess(header)
  File "make_cffi.py", line 85, in preprocess
    with open(path, 'rb') as fh:
IOError: [Errno 2] No such file or directory: '/usr/ports/archivers/py-zstandard/work-py27/zstandard-0.9.1/zstd/zstd.h'

zstandard module doesn't have __version__

zstd.__version__ exists while zstandard.__version__ doesn't, for version 0.9.0 (both Windows and Linux).
This is probably a problem in the transition from zstd to zstandard.

cffi not working with pypy: ImportError: No module named zstd_cffi

We run our infrastructure using pypy and are trying to incorporate zstd into our pipeline.

The import doesn't work. It looks like the CFFI object is created incorrectly:

$virtualenv -p pypy2-v6.0.0-linux64/bin/pypy
pypy        pypy.debug
$virtualenv -p pypy2-v6.0.0-linux64/bin/pypy pypy6-env
Running virtualenv with interpreter pypy2-v6.0.0-linux64/bin/pypy
New pypy executable in /home/bkish/pypy6-env/bin/pypy
Installing setuptools, pip, wheel...done.
$source pypy6-env/bin/activate
$pip install zstandard
Collecting zstandard
  Using cached https://files.pythonhosted.org/packages/29/21/4612a4b9e628d61aa045558ff008452378b5a333e5a64f7de29dee8b1e77/zstandard-0.10.1.tar.gz
Requirement already satisfied: cffi>=1.11 in ./pypy2-v6.0.0-linux64/lib_pypy (from zstandard) (1.11.5)
Building wheels for collected packages: zstandard
  Running setup.py bdist_wheel for zstandard ... done
  Stored in directory: /home/bkish/.cache/pip/wheels/e6/f1/ac/4498937124aecdc0b12b9ac6de1984619429b02b476427761e
Successfully built zstandard
Installing collected packages: zstandard
Successfully installed zstandard-0.10.1
$python
Python 2.7.13 (ab0b9caf307d, Apr 24 2018, 18:04:42)
[PyPy 6.0.0 with GCC 6.2.0 20160901] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>> import zstandard
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/bkish/pypy6-env/site-packages/zstandard/__init__.py", line 38, in <module>
    from zstd_cffi import *
ImportError: No module named zstd_cffi
>>>>
$pip uninstall zstandard
Uninstalling zstandard-0.10.1:
  Would remove:
    /home/bkish/pypy6-env/site-packages/_zstd_cffi.pypy-41.so
    /home/bkish/pypy6-env/site-packages/zstandard-0.10.1.dist-info/*
    /home/bkish/pypy6-env/site-packages/zstandard/*
    /home/bkish/pypy6-env/site-packages/zstd.pypy-41.so
Proceed (y/n)? y
  Successfully uninstalled zstandard-0.10.1
$

python-zstandard incompatible with pypy

Greetings,

I ran into another issue when I started using python-zstandard with pypy. The module builds, but it fails at runtime because the bytes objects can't be resized. My workaround has been to change decompress2 to return a list of strings (in a loop: decompress, PyBytes_FromStringAndSize on the output, PyList_Append with the result, and Py_DECREF the result) instead of growing a single string. I suspect this change makes things slower (not to mention incompatible), but it was worth it for me to get pypy working quickly.

0.5.0 tarball on PyPI is missing setup_zstd.py

The 0.5.0 tarball which was just uploaded to https://pypi.python.org/pypi/zstandard is missing the setup_zstd.py source file which is imported by setup.py. As a result, running pip install zstandard produces the following error:

Collecting zstandard
  Using cached zstandard-0.5.0.tar.gz
    Complete output from command python setup.py egg_info:
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "/tmp/pip-build-NfA1pg/zstandard/setup.py", line 15, in <module>
        import setup_zstd
    ImportError: No module named setup_zstd

    ----------------------------------------
Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-NfA1pg/zstandard/

Can't pip install from msys2

This is the output while pip install'ing mercurial from msys2:

  Running setup.py (path:c:/users/ta2291~1/appdata/local/temp/pip-build-_zfnyo/mercurial/setup.py) egg_info for package mercurial
    Running command python setup.py egg_info
    Traceback (most recent call last):
      File "<string>", line 1, in <module>
      File "c:/users/ta2291~1/appdata/local/temp/pip-build-_zfnyo/mercurial/setup.py", line 910, in <module>
        extmodules.append(setup_zstd.get_c_extension(name='mercurial.zstd'))
      File "contrib/python-zstandard/setup_zstd.py", line 128, in get_c_extension
        compiler.compiler_type)
    Exception: unhandled compiler type: mingw32

decompressobj inefficiency and work-around, patch included.

Greetings!

I'm working with a network protocol which is essentially a zstandard-compressed stream of newline-delimited lines of text. A Twisted protocol for receiving this data is given a slice of compressed data which can span frames. It's simple to feed chunks of data to decompressobj.decompress until it yields uncompressed data, but it's not clear how to determine how much trailing input data didn't contribute to the uncompressed output and should be processed with a new instance of decompressobj. The only solution I've found in terms of python-zstandard 0.8.1 has been to feed data to decompressobj.decompress one byte at a time and roll to a new decompressobj each time uncompressed data is produced. This is pretty slow. If I've missed something, I'd appreciate advice.

Meanwhile, I've privately replaced decompressobj.decompress with a new function decompressobj.decompress2 and re-implemented decompressobj.decompress as a trivial wrapper around this new function. The new function returns a tuple consisting of (1) the uncompressed result and (2) the value of input.pos before the final call to ZSTD_decompressStream. Using this interface, my protocol's dataReceived method looks like this:

    def dataReceived(self, bytes):
        decompressed, remaining = self.__dobj.decompress2(bytes)
        if len(decompressed) > 0:
            self.__json_receiver.dataReceived(decompressed)
            self.__dobj = self.__dctx.decompressobj()
            if remaining > 0:
                self.dataReceived(bytes[remaining:])

If I haven't missed something in the current python-zstandard API, would you consider a solution like mine for inclusion in the next release? I'm attaching the delta to decompressobj.c for reference.

Thanks!

decompressobj.c.diff.txt

TestCompressor_write_to sporadically failing

Build steps:

> git rev-parse HEAD
13b542c3d8c9c3edf84f3af3138e0091388ea353
> python3 --version
Python 3.5.2
> python3 setup.py test

Test output: https://gist.github.com/terrelln/432be84951e79993977892671bf5ce6d

======================================================================
FAIL: test_dictionary (tests.test_compressor.TestCompressor_write_to)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/terrelln/repos/python-zstandard/tests/test_compressor.py", line 510, in test_dictionary
    b'\x28\xb5\x2f\xfd\x03\x00\x55\x7b\x6b\x5e\x54\x00'
AssertionError: b'(\x[24 chars]\x00\x00\x02\xfc\xf4\xc5\xba#?\x05\xb3T\x00\x00\x18oof\x01\x00' != b'(\x[24 chars]\x00\x00\x02\xfc\xf4\xa5\xba#?\x85\xb3T\x00\x00\x18oof\x01\x00'

----------------------------------------------------------------------

Move away from ZSTDMT

The ZSTDMT API is superseded by the new advanced API and eventually all the functions will be made static. We will keep the ZSTDMT around for a few releases to give people time to update. The new advanced API is capable of multithreading, and supports the magicless format.

`ZstdDecompressionReader.closed` should be a property, not method.

IOBase defines closed as a property, and the builtin decompressor types for gzip and lzma/xz follow this API. It would be great if Zstandard could match that; as it is, when you try to create a TextIOWrapper with a ZstdDecompressionReader, the read fails with "ValueError: I/O operation on closed file.", because the closed method evaluates to True.
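
An illustration of the symptom described above (file name hypothetical): a bound method is always truthy, so any caller that checks reader.closed, as the io machinery does, concludes the stream is closed:

import zstandard as zstd

dctx = zstd.ZstdDecompressor()
with open('data.zst', 'rb') as fh:
    reader = dctx.stream_reader(fh)
    # Prints True even though nothing has been closed, because `closed`
    # is a method object here rather than a property value.
    print(bool(reader.closed))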

ZstdDecompressor.stream_reader is not found anymore

I'm using zstandard==0.8.1.

The ZstdCompressor.stream_reader method is not found, but it's described in the docs.
There is a read_from method which returns an iterator, but it's not documented.

It would be nice to keep the docs up to date :)

Streaming sequentially through open socket not working

I am trying to send numpy arrays through an open socket:

server:

import zstandard as zstd
from numpy import *
import json_tricks as json
import socketserver as SocketServer
import sys

x = array([1,2,3,4,5,6,7,8,9])
y = array(['a','b','c','d','e','f','g','h','i','j'])
li = [x,y]

HOST, PORT = "localhost", 9999

class MyTCPSocketHandler(SocketServer.StreamRequestHandler):

    def handle(self):

        sock = self.wfile
        for i in range(2):
            val = li[i]
            enc = json.dumps(val).encode('utf-8')
            cctx = zstd.ZstdCompressor()
            with cctx.stream_writer(sock) as compressor:
                compressor.write(enc)
   
HOST, PORT = "localhost", 9999
server = SocketServer.TCPServer((HOST, PORT), MyTCPSocketHandler)
print('ok')
server.serve_forever()

client:

import socket, sys
import numpy
import zstandard as zstd

HOST, PORT = "localhost", 9999

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)

try:

        sock.connect((HOST,PORT))
        sock.setsockopt(socket.SOL_TCP, socket.TCP_NODELAY, 1)
    
        mf = sock.makefile()

        for i in range(2):
                data = ''
                dctx = zstd.ZstdDecompressor()
                for chunk in dctx.read_to_iter(mf.buffer, read_size=2):
                    data += chunk.decode('utf-8')

                print(data)


finally:
        pass

Any read_size specified on the client end that isn't 1 will cause an error:

zstd.ZstdError: zstd decompress error: Unknown frame descriptor. 

In this case, the first array is sent and received without problem, but the second (and any more, if there are more) will fail to transmit. In fact, when read_size=1 is specified, the size of the chunk is usually closer to 10 KB. I want a chunk size of 1 MB, but that does not seem to work.

When it comes to sending a larger amount of data, a different problem occurs: instead of receiving an error message, the client hangs while receiving. After closing the connection on the server side (by terminating the program), the client sometimes suddenly receives all the data. Is there a flush() call being missed in the zstandard code?

If it helps, everything here works perfectly fine if I replace the zstandard library with lz4framed.
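
One way to make message boundaries explicit on a socket, sketched here rather than taken from the issue's code: compress each message as its own frame and length-prefix it, so the receiver knows exactly how many bytes belong to each message:

import struct
import zstandard as zstd

def send_message(sock, payload):
    compressed = zstd.ZstdCompressor(write_content_size=True).compress(payload)
    sock.sendall(struct.pack('!I', len(compressed)) + compressed)

def recv_message(sock):
    (length,) = struct.unpack('!I', _recv_exact(sock, 4))
    return zstd.ZstdDecompressor().decompress(_recv_exact(sock, length))

def _recv_exact(sock, n):
    # Read exactly n bytes or raise if the peer closes mid-message.
    buf = b''
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError('socket closed mid-message')
        buf += chunk
    return buf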

Add github link to PyPi page

The PyPi page for the package doesn't contain a link to the github project - would be really useful if it did.

With --system-zstd, build requires zstdmt_compress.h that isn't installed by zstd

https://github.com/facebook/zstd has gmake and cmake builds, but none of them installs zstdmt_compress.h .

In file included from /usr/ports/archivers/py-zstandard/work-py27/python-zstandard-0.8.1-103-gb3b44ff/c-ext/decompressor.c:9:
/usr/ports/archivers/py-zstandard/work-py27/python-zstandard-0.8.1-103-gb3b44ff/c-ext/python-zstandard.h:17:10: fatal error: 'zstdmt_compress.h' file not found
#include <zstdmt_compress.h>
         ^~~~~~~~~~~~~~~~~~~

You should either convince zstd to install the header, or work without it.

memory leak when decompressing with copy_stream on 0.8.1

env: linux
version: 0.8.1 (this is what's currently on pypi as of Jan 3 2018: https://pypi.python.org/pypi/zstandard)
python: 3.6.1

Here's a repro:

import os
import gc
import io
import zstd
import tempfile
import resource
import subprocess


def main():
    with tempfile.NamedTemporaryFile('wb') as compressed:
        uncompressed = os.urandom(1024)
        compressed.write(zstd.ZstdCompressor().compress(uncompressed))
        compressed.flush()
        
        print('using the zstd python bindings leaks')
        for i in range(10001):
            decompressed = io.BytesIO()
            with open(compressed.name, 'rb') as file:
                zstd.ZstdDecompressor().copy_stream(file, decompressed)
            decompressed.seek(0)
            result = decompressed.read()
            assert result == uncompressed
            del result, decompressed
            if i % 1000 == 0:
                print(i, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
                gc.collect()

        print('workaround is to launch zstd as subprocess and skip the python bindings :(')
        for i in range(10001):
            with subprocess.Popen(['zstd', '-dcq', compressed.name], stdout=subprocess.PIPE, stderr=subprocess.PIPE) as p:
                stdout, stderr = p.communicate()
                p.wait()
                assert p.returncode == 0
                assert stdout == uncompressed
                del stdout, stderr
                if i % 1000 == 0:
                    print(i, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
                    gc.collect()


if __name__ == '__main__':
    main()

Here's the output:

$ python3 main.py
using the zstd python bindings leaks
0 13000
1000 14020
2000 15340
3000 16396
4000 17452
5000 18508
6000 19564
7000 20884
8000 21940
9000 22996
10000 24052
workaround is to launch zstd as subprocess and skip the python bindings :(
0 24224
1000 24224
2000 24224
3000 24224
4000 24224
5000 24224
6000 24224
7000 24224
8000 24224
9000 24224
10000 24224

-Wshorten-64-to-32 has things to say when compiling

mercurial-4.1.2/contrib/python-zstandard/c-ext/compressiondict.c:48:30: warning:
      implicit conversion loses integer precision: 'unsigned long' to 'unsigned int' [-Wshorten-64-to-32]
                zparams.selectivityLevel = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 0));
                                         ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mercurial-4.1.2/contrib/python-zstandard/c-ext/compressiondict.c:49:30: warning:
      implicit conversion loses integer precision: 'long' to 'int' [-Wshorten-64-to-32]
                zparams.compressionLevel = PyLong_AsLong(PyTuple_GetItem(parameters, 1));
                                         ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mercurial-4.1.2/contrib/python-zstandard/c-ext/compressiondict.c:50:31: warning:
      implicit conversion loses integer precision: 'unsigned long' to 'unsigned int' [-Wshorten-64-to-32]
                zparams.notificationLevel = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 2));
                                          ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
mercurial-4.1.2/contrib/python-zstandard/c-ext/compressiondict.c:51:20: warning:
      implicit conversion loses integer precision: 'unsigned long' to 'unsigned int' [-Wshorten-64-to-32]
                zparams.dictID = PyLong_AsUnsignedLong(PyTuple_GetItem(parameters, 3));
                               ~ ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
4 warnings generated.

copy_stream leaks memory (when compressing)

Same as https://github.com/indygreg/python-zstandard/issues/35: copy_stream also leaks when compressing, and it leaks a lot of memory. The amount leaked seems to be proportional to the size of the input stream; changing 1024*1024 to 1024 leaks less memory.

import os
import gc
import zstd
import tempfile
import resource

with tempfile.NamedTemporaryFile('wb', delete=False) as f:
    f.write(os.urandom(1024*1024))
input_path = f.name
f = tempfile.NamedTemporaryFile(delete=False); f.close()
output_path = f.name
for i in range(1001):
    with open(input_path, 'rb') as ifh, open(output_path, 'wb') as ofh:
        zstd.ZstdCompressor().copy_stream(ifh, ofh)
    if i % 100 == 0:
        print(i, resource.getrusage(resource.RUSAGE_SELF).ru_maxrss)
        gc.collect()
os.remove(input_path)
os.remove(output_path)

output:
0 12952
100 116052
200 219148
300 322460
400 425780
500 529092
600 632404
700 735716
800 839036
900 942348
1000 1045660

ZstdDecompressorIterator stops before frame is fully decompressed

Thanks for this very useful library!

I encountered an issue with decompression using the streaming decompressor iterator read_from where the iterator would raise StopIteration before reading to the end of the file. The file is however successfully decompressed with the zstd command line utility from the official repo. (I would include the file but it is really big.)

I think this behaviour is due to the following code:
https://github.com/indygreg/python-zstandard/blob/master/zstd.c#L2260

zresult == 1 does not necessarily imply that we are done with the input. It could also reflect that an additional byte needs to be read (as in my case), or more generally, that there's more to be done (this is reflected in the updated documentation in the master branch of zstd).

Removing this conditional assignment allows for the entire file to be decompressed; I think the code correctly handles the case where the buffers are not flushed but it is done reading the input.

That said I'm wondering if running out of input before the output is properly flushed should raise an exception instead of StopIteration.

Roundtrip fails with 0.8.1

With the current package (0.8.1) I see this:

$ python
Python 3.6.4 (default, Jan 17 2018, 12:00:56) 
[GCC 7.2.1 20170915 (Red Hat 7.2.1-2)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import zstd
>>> data = b'123' * 1000
>>> cctx = zstd.ZstdCompressor()
>>> c=cctx.compress(data, allow_empty=True)
>>> dctx = zstd.ZstdDecompressor()
>>> dctx.decompress(c)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
zstd.ZstdError: input data invalid or missing content size in frame header

In case it's helpful:

$ pip freeze
Cython==0.27.3
deprecation==1.0.1
llvmlite==0.21.0
lz4==0.19.1
numba==0.36.2
numpy==1.14.0
pandas==0.22.0
python-dateutil==2.6.1
python-snappy==0.5.1
pytz==2017.3
six==1.11.0
thrift==0.11.0
zstandard==0.8.1
$ rpm -q libzstd
libzstd-1.3.3-1.fc27.x86_64

(currently blocking dask/fastparquet#296)
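
Two possible workarounds for 0.8.x, where compress() does not embed the content size by default: ask the compressor to write it, or give the one-shot decompressor an output size bound:

import zstd

data = b'123' * 1000

# Option 1: embed the content size in the frame header.
c = zstd.ZstdCompressor(write_content_size=True).compress(data)
assert zstd.ZstdDecompressor().decompress(c) == data

# Option 2: supply an upper bound when the frame lacks a content size.
c2 = zstd.ZstdCompressor().compress(data)
assert zstd.ZstdDecompressor().decompress(c2, max_output_size=len(data)) == data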
