zlibcomplete's People

Contributors

foxcpp, oesteban-vx, petabytestorage, rudi-cilibrasi

zlibcomplete's Issues

Compression of something larger than ZLIB_COMPLETE_CHUNK

Hello,

I tried to compress something larger than ZLIB_COMPLETE_CHUNK and I have a problem.

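#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include "zlibcomplete.hpp"  // adjust the path to match your installation

using namespace std;
using namespace zlibcomplete;

int main(int argc, char **argv)
{
    // Read the whole input file into a string.
    std::ifstream t("file");
    std::string str;

    t.seekg(0, std::ios::end);
    str.reserve(t.tellg());
    t.seekg(0, std::ios::beg);

    str.assign((std::istreambuf_iterator<char>(t)),
                    std::istreambuf_iterator<char>());

    // Compress the whole input in one call, then flush the trailer.
    GZipCompressor compressor(9, auto_flush);
    string output = compressor.compress(str);
    output += compressor.finish();

    // Round-trip: the decompressed output should equal the input.
    GZipDecompressor decompressor;
    string output2 = decompressor.decompress(output);

    cout << output2;

    return 0;
}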

After decompression, the first 16384 bytes (the size of ZLIB_COMPLETE_CHUNK) are correct, but the output then repeats the beginning of the file instead of continuing.
The problem is in the compression step.

autoFlush_ is used uninitialized

In zlibtop.cpp, the member autoFlush_ is never initialized, and has no setter, but is used in the baseCompress method.

I discovered this because baseCompress was sometimes failing for no reason; it turns out this happened whenever autoFlush_ randomly had a nonzero value. For some reason this causes the call to deflate to fail.

I fixed the bug by adding a line to the constructor that sets autoFlush_ = false, but it seems that this variable could be removed entirely, since there is no way to ever set it to true.
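
The fix described above could look roughly like the following. This is a minimal sketch, not the library's actual class: CompressorSketch and its constructor parameter are hypothetical, and only the autoFlush_ member name comes from the report. It shows the flag being given a defined value in the member initializer list, so no method can read it uninitialized.

```cpp
// Hypothetical, simplified stand-in for the compressor base class.
// Only the flag discussed in this issue is modelled.
class CompressorSketch {
public:
    // The member initializer list gives autoFlush_ a defined value
    // before any method can read it. The parameter is an assumption;
    // the reported fix simply hard-codes false.
    explicit CompressorSketch(bool autoFlush = false)
        : autoFlush_(autoFlush) {}

    bool autoFlush() const { return autoFlush_; }

private:
    bool autoFlush_;
};
```

Initializing in the initializer list rather than in the constructor body means the member never holds an indeterminate value, even if the body later grows and reads it early.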

Bug ZLibBaseCompressor::baseFinish

Hi,

This function never checks the value of strm_.avail_out. As a result, the string returned by baseFinish (in practice, by finish()) can be truncated, and this has happened to me.

Thus, this part of the baseFinish function:

      strm_.avail_out = ZLIB_COMPLETE_CHUNK;
      strm_.next_out = (Bytef *) out_;
      retval = deflate(&strm_, Z_FINISH);
      if (retval == Z_STREAM_ERROR) {
          throw std::bad_alloc();
      }
      have = ZLIB_COMPLETE_CHUNK - strm_.avail_out;
      result += std::string(out_, have);

should be replaced by:

  do {
      strm_.avail_out = ZLIB_COMPLETE_CHUNK;
      strm_.next_out = (Bytef *) out_;
      retval = deflate(&strm_, Z_FINISH);
      if (retval == Z_STREAM_ERROR) {
          throw std::bad_alloc();
      }
      have = ZLIB_COMPLETE_CHUNK - strm_.avail_out;
      result += std::string(out_, have);
  } while(strm_.avail_out == 0);
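
To see why the loop matters without linking against zlib, here is a small simulation. Everything in it is hypothetical: FakeStream and fakeFinishStep only imitate how deflate with Z_FINISH fills a bounded output buffer, and kChunk is a tiny stand-in for ZLIB_COMPLETE_CHUNK. The single-call version loses everything past the first buffer, while the looping version drains all pending output.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a z_stream in Z_FINISH mode: it holds output
// that still has to be delivered and writes at most availOut bytes per call.
struct FakeStream {
    std::string pending;       // output not yet delivered
    char* nextOut = nullptr;   // where the next call may write
    std::size_t availOut = 0;  // free space left at nextOut
};

// Imitates one deflate(&strm, Z_FINISH) call: fill the output buffer as far
// as possible and reduce availOut by the number of bytes written.
void fakeFinishStep(FakeStream& s) {
    std::size_t n = std::min(s.pending.size(), s.availOut);
    std::memcpy(s.nextOut, s.pending.data(), n);
    s.pending.erase(0, n);
    s.nextOut += n;
    s.availOut -= n;
}

const std::size_t kChunk = 8;  // tiny stand-in for ZLIB_COMPLETE_CHUNK

// One call only, like the code being replaced: output past kChunk is lost.
std::string finishOnce(FakeStream s) {
    char out[kChunk];
    s.availOut = kChunk;
    s.nextOut = out;
    fakeFinishStep(s);
    return std::string(out, kChunk - s.availOut);
}

// Loop, like the proposed fix: repeat while the buffer came back full.
std::string finishAll(FakeStream s) {
    std::string result;
    char out[kChunk];
    do {
        s.availOut = kChunk;
        s.nextOut = out;
        fakeFinishStep(s);
        result += std::string(out, kChunk - s.availOut);
    } while (s.availOut == 0);
    return result;
}
```

When the pending output is an exact multiple of the buffer size, the loop makes one extra call that writes nothing and then exits, so the condition is safe in that case too.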

Please don't explicitly check for GCC

The following lines in zlibcomplete.hpp trigger a compilation failure when using any compiler other than GCC (e.g. Clang or MSVC):

#if CURRENT_GCC_VERSION < 40800
#error "Please use g++ version 4.8 or later."
#endif

Please don't do this. I want to use this library in a cross-platform product, so now I can't import it as a Git submodule; instead I have to copy the code or fork the repo, then modify it.
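
One possible shape for a more tolerant check is sketched below. It assumes the version number is computed from GCC's predefined macros; SKETCH_GCC_VERSION is a hypothetical name, since the definition of CURRENT_GCC_VERSION is not shown here. The version test is only applied when the compiler really is GCC, so other compilers pass through untouched.

```cpp
// Only enforce the minimum version when the compiler really is GCC.
// Clang and the Intel compiler also define __GNUC__, so they are
// excluded explicitly; MSVC does not define __GNUC__ at all.
#if defined(__GNUC__) && !defined(__clang__) && !defined(__INTEL_COMPILER)
#  define SKETCH_GCC_VERSION \
       (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#  if SKETCH_GCC_VERSION < 40800
#    error "Please use g++ version 4.8 or later."
#  endif
#endif
```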

Data is copied/allocated unnecessarily, due to std::string API

Using std::string as the data format in the API means unavoidable memory copying and allocation in most cases. If the caller's data is in any other form, it has to construct a new std::string object to pass it to zlibcomplete, which involves allocating a new copy of the data on the heap. Similarly, zlibcomplete returns std::strings which means the result data always ends up allocated on the heap and has to be copied elsewhere by the caller.

In most situations this overhead won't be noticeable. But compression/decompression can become a CPU bottleneck in some areas, and in that case it's important to cut out any unnecessary memory allocation or copying.

I suggest having the core implementation use direct pointers to the data, and then implementing the existing string-based API as a simple wrapper around that. Then clients can choose performance or convenience.
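
A rough sketch of that layering follows. The names encodeRaw and encodeString are hypothetical, and the core just copies bytes as a placeholder for the real deflate call, so the example has no external dependencies. It shows a pointer-based core that writes into a caller-supplied buffer, with the string API as a thin wrapper on top.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical pointer-based core: reads inLen bytes from in, writes at most
// outCap bytes to out, and returns the number of bytes produced. A real
// implementation would run deflate here; this placeholder only copies.
std::size_t encodeRaw(const char* in, std::size_t inLen,
                      char* out, std::size_t outCap) {
    std::size_t n = inLen < outCap ? inLen : outCap;
    std::memcpy(out, in, n);
    return n;
}

// Convenience wrapper that keeps a string-based API: it allocates the
// output once and delegates the work to the pointer-based core.
std::string encodeString(const std::string& in) {
    std::string out(in.size(), '\0');
    std::size_t n = encodeRaw(in.data(), in.size(), &out[0], out.size());
    out.resize(n);
    return out;
}
```

A caller that already owns its buffers can call encodeRaw directly and avoid the extra heap allocation, while existing string-based code keeps working through the wrapper.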
