rudi-cilibrasi / zlibcomplete
C++ interface to the ZLib library supporting compression with FLUSH, decompression, and std::string. RAII
License: MIT License
Hello,
I tried to compress something larger than ZLIB_COMPLETE_CHUNK and I have a problem.
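#include <fstream>
#include <iostream>
#include <iterator>
#include <string>
#include "zlibcomplete.hpp"  // adjust the path to match your installation

using namespace std;
using namespace zlibcomplete;

int main(int argc, char **argv)
{
    // Read the whole file into a string.
    std::ifstream t("file");
    std::string str;
    t.seekg(0, std::ios::end);
    str.reserve(t.tellg());
    t.seekg(0, std::ios::beg);
    str.assign((std::istreambuf_iterator<char>(t)),
               std::istreambuf_iterator<char>());

    // Compress, then decompress, and print the round-tripped data.
    GZipCompressor compressor(9, auto_flush);
    string output = compressor.compress(str);
    output += compressor.finish();
    GZipDecompressor decompressor;
    string output2 = decompressor.decompress(output);
    cout << output2;
    return 0;
}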
After decompression, the first 16384 bytes (ZLIB_COMPLETE_CHUNK) are correct, but after that the output repeats the beginning of the file.
The problem is with the compression.
In zlibtop.cpp, the member autoFlush_ is never initialized and has no setter, but it is used in the baseCompress method.
I discovered this because baseCompress was sometimes failing for no apparent reason; it turns out this happened whenever autoFlush_ happened to hold a nonzero value. For some reason this causes the call to deflate to fail.
I fixed the bug by adding a line to the constructor that sets autoFlush_ = false, but it seems that this variable could be removed entirely, since there's no way to ever set it to true.
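For illustration, here is a minimal sketch of how the flag could be given a defined value in every constructor by using a C++11 default member initializer. The class CompressorSketch and its accessors are hypothetical stand-ins, not the actual class in zlibtop.cpp, which carries more state.

```cpp
// Hypothetical stand-in for the compressor base class (assumption: the real
// class also holds a z_stream, an output buffer, and so on).
class CompressorSketch {
public:
    // This constructor does not mention autoFlush_, so the default member
    // initializer below applies and the flag is false.
    CompressorSketch() : level_(9) {}

    // This constructor sets the flag explicitly.
    CompressorSketch(int level, bool autoFlush)
        : level_(level), autoFlush_(autoFlush) {}

    int level() const { return level_; }
    bool autoFlush() const { return autoFlush_; }

private:
    int level_;
    bool autoFlush_ = false;  // default member initializer (C++11)
};
```

With this layout, forgetting the member in a new constructor leaves it false instead of indeterminate.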
Hi,
There is no check on the value of strm_.avail_out in baseFinish. As a result, it can happen (and it has happened to me) that you don't get the full encoded string back from baseFinish (in practice, from finish()).
Thus, this part of the baseFinish function:
    strm_.avail_out = ZLIB_COMPLETE_CHUNK;
    strm_.next_out = (Bytef *) out_;
    retval = deflate(&strm_, Z_FINISH);
    if (retval == Z_STREAM_ERROR) {
        throw std::bad_alloc();
    }
    have = ZLIB_COMPLETE_CHUNK - strm_.avail_out;
    result += std::string(out_, have);
should be replaced by:
    do {
        strm_.avail_out = ZLIB_COMPLETE_CHUNK;
        strm_.next_out = (Bytef *) out_;
        retval = deflate(&strm_, Z_FINISH);
        if (retval == Z_STREAM_ERROR) {
            throw std::bad_alloc();
        }
        have = ZLIB_COMPLETE_CHUNK - strm_.avail_out;
        result += std::string(out_, have);
    } while(strm_.avail_out == 0);
The following lines in zlibcomplete.hpp trigger a compilation failure when using any compiler other than GCC (e.g. Clang or MSVC):
#if CURRENT_GCC_VERSION < 40800
#error "Please use g++ version 4.8 or later."
#endif
Please don't do this. I want to use this library in a cross-platform product, so now I can't import it as a Git submodule; instead I have to copy the code or fork the repo, then modify it.
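As a sketch of one possible alternative, the version check could be applied only when the compiler really is GCC. The macro name SKETCH_GCC_VERSION is hypothetical; how the header actually defines CURRENT_GCC_VERSION is not shown here. Clang also defines __GNUC__, so it is excluded explicitly.

```cpp
// Compute a GCC version number only for GCC itself. Clang defines __GNUC__
// too, so it has to be ruled out by checking __clang__.
#if defined(__GNUC__) && !defined(__clang__)
#define SKETCH_GCC_VERSION \
    (__GNUC__ * 10000 + __GNUC_MINOR__ * 100 + __GNUC_PATCHLEVEL__)
#else
#define SKETCH_GCC_VERSION 0  // not GCC: no version requirement is enforced
#endif

// Reject only GCC releases that are known to be too old.
#if SKETCH_GCC_VERSION != 0 && SKETCH_GCC_VERSION < 40800
#error "Please use g++ version 4.8 or later."
#endif
```

Other compilers then pass through untouched, and the error still fires for old GCC releases.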
Using std::string as the data format in the API means unavoidable memory copying and allocation in most cases. If the caller's data is in any other form, it has to construct a new std::string object to pass it to zlibcomplete, which involves allocating a new copy of the data on the heap. Similarly, zlibcomplete returns std::strings which means the result data always ends up allocated on the heap and has to be copied elsewhere by the caller.
In most situations this overhead won't be noticeable. But compression/decompression can become a CPU bottleneck in some areas, and in that case it's important to cut out any unnecessary memory allocation or copying.
I suggest having the core implementation use direct pointers to the data, and then implementing the existing string-based API as a simple wrapper around that. Then clients can choose performance or convenience.