rudi-cilibrasi / zlibcomplete

C++ interface to the ZLib library supporting compression with FLUSH, decompression, and std::string. RAII

License: MIT License

C++ 76.41% Shell 0.28% Ruby 8.30% CMake 15.02%

zlibcomplete's Issues

Compression of something larger than ZLIB_COMPLETE_CHUNK

Hello,

I tried to compress input larger than ZLIB_COMPLETE_CHUNK and ran into a problem.

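#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include <zlibcomplete.hpp>  // include path and namespace below are assumed

using namespace std;
using namespace zlibcomplete;

int main(int argc, char **argv)
{
    // Read the whole file into a string.
    std::ifstream t("file");
    std::string str;

    t.seekg(0, std::ios::end);
    str.reserve(t.tellg());
    t.seekg(0, std::ios::beg);

    str.assign((std::istreambuf_iterator<char>(t)),
                    std::istreambuf_iterator<char>());

    // Compress at level 9, then append the stream trailer.
    GZipCompressor compressor(9, auto_flush);
    string output = compressor.compress(str);
    output  += compressor.finish();

    // Decompress again; the result should equal the original input.
    GZipDecompressor decompressor;
    string output2 = decompressor.decompress(output);

    cout << output2;

    return 0;
}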

After decompression, the first 16384 bytes (ZLIB_COMPLETE_CHUNK) are correct, but after that the output repeats the beginning of the file.
The problem is in the compression step.

autoFlush_ is used uninitialized

In zlibtop.cpp, the member autoFlush_ is never initialized, and has no setter, but is used in the baseCompress method.

I discovered this because baseCompress was sometimes failing for no reason; it turns out this happened whenever autoFlush_ randomly had a nonzero value. For some reason this causes the call to deflate to fail.

I fixed the bug by adding a line to the constructor that sets autoFlush_ = false, but it seems that this variable could be removed entirely, since there's no way to ever set it to true.
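
A minimal sketch of the kind of fix described above, not the library's actual code. `CompressorSketch` is a hypothetical stand-in for the compressor base class, and the `no_flush` enumerator is assumed (only `auto_flush` appears in the reports here). Setting the flag in the constructor's initializer list means it always has a defined value before baseCompress-style code reads it:

```cpp
#include <cassert>

// Hypothetical stand-in for the library's flush setting.
enum flush_parameter { no_flush, auto_flush };

// Simplified stand-in for the compressor base class; only the flag is modeled.
class CompressorSketch {
public:
    // Every construction path sets autoFlush_, so it is never read uninitialized.
    explicit CompressorSketch(int level = 9, flush_parameter mode = no_flush)
        : level_(level), autoFlush_(mode == auto_flush) {
        assert(level_ >= 0 && level_ <= 9);  // explicit zlib compression levels
    }

    bool autoFlush() const { return autoFlush_; }

private:
    int level_;
    bool autoFlush_;
};
```

If the flag really cannot be set from outside, deleting the member, as the report suggests, would be the simpler option.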

Bug in ZLibBaseCompressor::baseFinish

Hi,

This function never checks the value of strm_.avail_out. As a result, a call to baseFinish (in practice, a call to finish()) can return only part of the encoded string. This has happened to me.

Thus, this part of the baseFinish function:

      strm_.avail_out = ZLIB_COMPLETE_CHUNK;
      strm_.next_out = (Bytef *) out_;
      retval = deflate(&strm_, Z_FINISH);
      if (retval == Z_STREAM_ERROR) {
          throw std::bad_alloc();
      }
      have = ZLIB_COMPLETE_CHUNK - strm_.avail_out;
      result += std::string(out_, have);

should be replaced by:

  do {
      strm_.avail_out = ZLIB_COMPLETE_CHUNK;
      strm_.next_out = (Bytef *) out_;
      retval = deflate(&strm_, Z_FINISH);
      if (retval == Z_STREAM_ERROR) {
          throw std::bad_alloc();
      }
      have = ZLIB_COMPLETE_CHUNK - strm_.avail_out;
      result += std::string(out_, have);
  } while(strm_.avail_out == 0);
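
To illustrate why a single call can truncate the output, here is a toy model that does not use zlib at all. `ToyStream` and `toyFinish` are hypothetical stand-ins for `z_stream` and `deflate(&strm_, Z_FINISH)`, and `CHUNK` is a tiny stand-in for ZLIB_COMPLETE_CHUNK. The only behavior modeled is that each call writes at most `avail_out` bytes:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

const std::size_t CHUNK = 8;  // tiny stand-in for ZLIB_COMPLETE_CHUNK

// Toy stand-in for z_stream: holds output that is still waiting to be written.
struct ToyStream {
    std::string pending;
    char* next_out;
    std::size_t avail_out;
};

// Toy stand-in for deflate(..., Z_FINISH): copies at most avail_out bytes.
void toyFinish(ToyStream& s) {
    std::size_t n = std::min(s.avail_out, s.pending.size());
    std::memcpy(s.next_out, s.pending.data(), n);
    s.pending.erase(0, n);
    s.next_out += n;
    s.avail_out -= n;
}

// Single call, as in the original baseFinish: output beyond CHUNK is lost.
std::string finishOnce(ToyStream s) {
    char out[CHUNK];
    s.avail_out = CHUNK;
    s.next_out = out;
    toyFinish(s);
    return std::string(out, CHUNK - s.avail_out);
}

// Loop, as in the proposed fix: repeat while the buffer came back full.
std::string finishAll(ToyStream s) {
    std::string result;
    char out[CHUNK];
    do {
        s.avail_out = CHUNK;
        s.next_out = out;
        toyFinish(s);
        std::size_t have = CHUNK - s.avail_out;
        assert(have <= CHUNK);
        result.append(out, have);
    } while (s.avail_out == 0);
    return result;
}
```

With real zlib, the documented signal that finishing is complete is deflate returning Z_STREAM_END, so looping until that return value is seen would be an alternative to testing avail_out.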

Please don't explicitly check for GCC

The following lines in zlibcomplete.hpp trigger a compilation failure when using any compiler other than GCC (e.g. Clang or MSVC):

#if CURRENT_GCC_VERSION < 40800
#error "Please use g++ version 4.8 or later."
#endif

Please don't do this. I want to use this library in a cross-platform product, so now I can't import it as a Git submodule; instead I have to copy the code or fork the repo, then modify it.
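
One possible shape for a more tolerant check, as a sketch only: the version test is applied only when the compiler really is GCC. The macro `SKETCH_GCC_VERSION` and the helper `detectedCompiler` are hypothetical names for this example, and how the header actually computes CURRENT_GCC_VERSION is not shown in this report. Clang is excluded explicitly because it also defines `__GNUC__`:

```cpp
// Enforce the minimum version only for genuine GCC builds.
#if defined(__GNUC__) && !defined(__clang__)
#  define SKETCH_GCC_VERSION \
      (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#  if SKETCH_GCC_VERSION < 40800
#    error "Please use g++ version 4.8 or later."
#  endif
#endif

// Reports which compiler family was detected, for illustration only.
inline const char* detectedCompiler() {
#if defined(__clang__)
    return "clang";
#elif defined(__GNUC__)
    return "gcc";
#elif defined(_MSC_VER)
    return "msvc";
#else
    return "other";
#endif
}
```

With a guard like this, other compilers skip the version test entirely instead of hitting the #error.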

Data is copied/allocated unnecessarily, due to std::string API

Using std::string as the data format in the API means unavoidable memory copying and allocation in most cases. If the caller's data is in any other form, it has to construct a new std::string object to pass it to zlibcomplete, which involves allocating a new copy of the data on the heap. Similarly, zlibcomplete returns std::strings which means the result data always ends up allocated on the heap and has to be copied elsewhere by the caller.

In most situations this overhead won't be noticeable. But compression/decompression can become a CPU bottleneck in some areas, and in that case it's important to cut out any unnecessary memory allocation or copying.

I suggest having the core implementation use direct pointers to the data, and then implementing the existing string-based API as a simple wrapper around that. Then clients can choose performance or convenience.
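
The layering suggested above could look roughly like this. Both function names are hypothetical, and the byte copy is only a placeholder for the real deflate/inflate work. The point of the sketch is the shape: a core that reads and writes caller-owned buffers, plus a thin std::string wrapper on top:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical core: reads from a raw buffer, writes into caller-owned storage,
// and returns the number of bytes written. The copy stands in for the real
// compression work, which is out of scope for this sketch.
std::size_t transformRaw(const char* in, std::size_t inLen,
                         char* out, std::size_t outCap) {
    assert(in != nullptr || inLen == 0);
    std::size_t n = inLen < outCap ? inLen : outCap;
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = in[i];
    }
    return n;
}

// Convenience wrapper that keeps the string-in/string-out shape.
std::string transformString(const std::string& input) {
    std::string result(input.size(), '\0');
    std::size_t n = transformRaw(input.data(), input.size(),
                                 &result[0], result.size());
    result.resize(n);
    return result;
}
```

Callers that already hold data in their own buffers would call the raw function directly and skip the extra allocation; everyone else keeps the convenient string interface.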
