google / brotli
Brotli compression format
License: MIT License
Hi,
I checked the BrotliCompressBufferParallel function in encode_parallel.cc and can't find
any parallel execution; it looks more like it was prepared for parallel execution with
OpenMP. Can you give me an idea where to find the parallel part?
python setup.py build
running build
running build_ext
creating build
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/dec
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c dec/bit_reader.c -o build/temp.linux-x86_64-2.7/dec/bit_reader.o
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c dec/decode.c -o build/temp.linux-x86_64-2.7/dec/decode.o
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c dec/huffman.c -o build/temp.linux-x86_64-2.7/dec/huffman.o
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c dec/streams.c -o build/temp.linux-x86_64-2.7/dec/streams.o
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c dec/state.c -o build/temp.linux-x86_64-2.7/dec/state.o
creating build/temp.linux-x86_64-2.7/python
creating build/temp.linux-x86_64-2.7/enc
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c python/brotlimodule.cc -o build/temp.linux-x86_64-2.7/python/brotlimodule.o -std=c++0x
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
python/brotlimodule.cc:2:20: fatal error: Python.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1
python3 setup.py build
running build
running build_ext
creating build/temp.linux-x86_64-3.2
creating build/temp.linux-x86_64-3.2/dec
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python3.2mu -c dec/bit_reader.c -o build/temp.linux-x86_64-3.2/dec/bit_reader.o
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python3.2mu -c dec/decode.c -o build/temp.linux-x86_64-3.2/dec/decode.o
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python3.2mu -c dec/huffman.c -o build/temp.linux-x86_64-3.2/dec/huffman.o
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python3.2mu -c dec/streams.c -o build/temp.linux-x86_64-3.2/dec/streams.o
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python3.2mu -c dec/state.c -o build/temp.linux-x86_64-3.2/dec/state.o
creating build/temp.linux-x86_64-3.2/python
creating build/temp.linux-x86_64-3.2/enc
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python3.2mu -c python/brotlimodule.cc -o build/temp.linux-x86_64-3.2/python/brotlimodule.o -std=c++0x
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
python/brotlimodule.cc:2:20: fatal error: Python.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1
If you need any more details, please ask.
Thanks,
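Both logs stop at the missing Python.h, which usually means the development headers for the interpreter running setup.py are not installed. A minimal sketch for checking where that interpreter expects the header, using only the standard library (the helper name python_header_path is made up for illustration):

```python
import os
import sysconfig


def python_header_path():
    """Return the path where this interpreter expects to find Python.h."""
    include_dir = sysconfig.get_paths()["include"]
    return os.path.join(include_dir, "Python.h")


if __name__ == "__main__":
    path = python_header_path()
    status = "found" if os.path.exists(path) else "missing"
    print("%s: %s" % (path, status))
```

If the header is reported missing, installing the distribution's Python development package (often named python-dev or python3-dev on Debian-based systems) is the usual remedy.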
After today's commit 94cd708, I'm getting this error when trying to compile the Python extension on Windows. I'm using the Microsoft Visual C compiler from Visual Studio 2010.
./dec/bit_reader.h(165) : error C2039: 'pos_' : is not a member of 'BrotliBitReader'
./dec/bit_reader.h(39) : see declaration of 'BrotliBitReader'
error: command 'cl.exe' failed with exit status 2
Cheers,
Cosimo
Would you like to add more error handling for return values from functions like the following?
ubsan detects a lot of unaligned stores/loads, as well as a signed integer overflow, and a left shift of a negative value:
/home/nemequ/local/src/squash/plugins/brotli/brotli/enc/./././static_dict.h:57:19: runtime error: left shift of negative value -32
/home/nemequ/local/src/squash/plugins/brotli/brotli/enc/./././static_dict.h:57:29: runtime error: signed integer overflow: -3096880 + -2147483648 cannot be represented in type 'int [25]'
/home/nemequ/local/src/squash/plugins/brotli/brotli/dec/./bit_reader.h:150:38: runtime error: store to misaligned address 0x7fff9b36051d for type 'uint64_t', which requires 8 byte alignment
0x7fff9b36051d: note: pointer points here
59 92 ba 00 00 00 00 f0 4f 22 02 00 00 00 00 f8 4f 22 02 00 00 00 00 70 05 36 9b ff 7f 00 00 22
^
/home/nemequ/local/src/squash/plugins/brotli/brotli/dec/./bit_reader.h:151:42: runtime error: store to misaligned address 0x7fff9b360525 for type 'uint8_t', which requires 8 byte alignment
0x7fff9b360525: note: pointer points here
00 00 00 00 00 00 00 f8 4f 22 02 00 00 00 00 70 05 36 9b ff 7f 00 00 22 5d 54 8f 98 7f 00 00 80
^
/home/nemequ/local/src/squash/plugins/brotli/brotli/dec/./bit_reader.h:152:43: runtime error: store to misaligned address 0x7fff9b36052d for type 'uint8_t', which requires 8 byte alignment
0x7fff9b36052d: note: pointer points here
00 00 00 00 00 00 00 70 05 36 9b ff 7f 00 00 22 5d 54 8f 98 7f 00 00 80 05 36 9b ff 7f 00 00 f8
^
/home/nemequ/local/src/squash/plugins/brotli/brotli/dec/./bit_reader.h:153:43: runtime error: store to misaligned address 0x7fff9b360535 for type 'uint8_t', which requires 8 byte alignment
0x7fff9b360535: note: pointer points here
00 00 00 00 7f 00 00 22 5d 54 8f 98 7f 00 00 80 05 36 9b ff 7f 00 00 f8 4f 22 02 00 00 00 00 b8
^
/home/nemequ/local/src/squash/plugins/brotli/brotli/dec/./bit_reader.h:188:17: runtime error: load of misaligned address 0x7fff9b3600d5 for type 'const uint64_t', which requires 8 byte alignment
0x7fff9b3600d5: note: pointer points here
ab b8 8f 65 ed 9c e7 0f 83 d2 75 fa fc b0 e8 37 25 3e 42 c1 d4 85 c5 8f a0 11 8f 16 90 87 8f 90
^
I'm trying to implement streaming compression using brotli. There's very little documentation and I haven't found anyone else trying to accomplish the same thing. I assumed I needed to specify lgwin to set the block size, but when I run this code it gives me the error "brotli.error: BrotliDecompress failed":
my_str = 'very long string blah blah blah'
a = brotli.compress(my_str, lgwin=16)
b = brotli.decompress(a[0:16])
Am I doing something wrong or did I misunderstand the purpose of the lgwin parameter?
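One possible reading of the error: a[0:16] is only a prefix of the compressed stream, and a one-shot decompressor generally rejects incomplete input, while lgwin describes the sliding window size rather than an independently decodable block. The sketch below reproduces the same pattern with the standard-library zlib module, used purely as a stand-in. That Brotli's Python module behaves the same way is an assumption, and no Brotli streaming API is implied.

```python
import zlib

data = b"very long string blah blah blah" * 100
compressed = zlib.compress(data)

# One-shot decompression of a truncated stream fails.
try:
    zlib.decompress(compressed[:16])
    truncated_ok = True
except zlib.error:
    truncated_ok = False

# A streaming decompressor accepts the same bytes piece by piece.
decoder = zlib.decompressobj()
restored = decoder.decompress(compressed[:16]) + decoder.decompress(compressed[16:])
```

Here truncated_ok ends up False, while restored equals the original data once every piece has been fed in.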
Hi,
As the title says, if I compress an allocated (zeroed) buffer of (0x800000 + 1) bytes, the result of BrotliCompressBuffer cannot be decompressed using BrotliDecompressBuffer.
As there are no error codes or messages, I don't know what the problem is.
Tested on OS X 10.10.5.
Thank you
encode.cc:392
static const int kSampleRate = 13;
static const double kMinEntropy = 7.92;
static const double kBitCostThreshold = bytes * kMinEntropy / kSampleRate;
Because kBitCostThreshold is static, it is computed once on first use and then cached.
However, if the brotli library is reused within the same process to compress first a small piece of data and then a large one, the large one will suffer from a poor compression ratio, since some blocks will mistakenly be left uncompressed.
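The effect can be modelled outside C++. The following is a minimal Python sketch, not the encoder's code: the module-level cache stands in for the function-local static, and the function names are invented for illustration.

```python
K_SAMPLE_RATE = 13
K_MIN_ENTROPY = 7.92

_cached_threshold = None  # plays the role of the function-local static


def threshold_cached(num_bytes):
    """Compute the threshold on the first call only, like a static initializer."""
    global _cached_threshold
    if _cached_threshold is None:
        _cached_threshold = num_bytes * K_MIN_ENTROPY / K_SAMPLE_RATE
    return _cached_threshold


def threshold_fresh(num_bytes):
    """Recompute the threshold for every input size."""
    return num_bytes * K_MIN_ENTROPY / K_SAMPLE_RATE
```

Calling threshold_cached with a small size and then with a large one returns the small-input value both times, whereas threshold_fresh scales with each input.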
Hi.
I'm trying to build the library using "python setup.py build" and I can see the output is placed at "build/lib.macosx-10.10-intel-2.7/brotli.so". However, I thought a .so file was not usable on Mac OS; I expected to see a .dylib file. Is this a problem, and if so, how can I solve it?
Thanks a lot!
Seems to happen with data with a length of a power of two >= 32. 8 and 16 work, 32, 64, 128, and 256 don't. Test case:
#include <stdint.h>
#include <stdio.h>   /* fprintf */
#include <assert.h>
#include "enc/encode.h"
#include "dec/decode.h"
const uint8_t uncompressed[] = {
0x01, 0x4a, 0x2d, 0x82, 0x59, 0x2f, 0xdd, 0xe6,
0x4f, 0x69, 0x6c, 0x70, 0x13, 0x87, 0x9f, 0xe4,
0xd3, 0xb1, 0x9a, 0x86, 0xeb, 0x31, 0xa9, 0x69,
0x1d, 0x2b, 0x22, 0x60, 0x38, 0xaf, 0xb2, 0x27,
0xee, 0x48, 0x9e, 0x67, 0x3d, 0x5f, 0xca, 0xfa,
0x68, 0x74, 0x03, 0x6f, 0x03, 0x5c, 0xcb, 0x45,
0x1c, 0xfc, 0xc1, 0x4b, 0x1d, 0xb1, 0x2b, 0x7b,
0x87, 0xf2, 0xf4, 0xea, 0xc4, 0x34, 0x93, 0x75,
0xb5, 0x45, 0xa0, 0x70, 0x77, 0xe9, 0xe3, 0xe3,
0xe9, 0xf9, 0x36, 0x80, 0x2c, 0x3b, 0x19, 0xab,
0x46, 0xe2, 0xeb, 0x16, 0xf9, 0x4c, 0xac, 0x03,
0x42, 0x8c, 0x25, 0x3a, 0x9e, 0x68, 0xc7, 0x26,
0xce, 0x45, 0x1f, 0x3f, 0xc6, 0x24, 0x38, 0x01,
0xb2, 0x1a, 0x4f, 0x25, 0xd3, 0x7c, 0x5f, 0x37,
0xbc, 0x6b, 0x3d, 0xb1, 0x1d, 0x76, 0xc5, 0xb9,
0xae, 0xbd, 0x4c, 0x67, 0x87, 0xd9, 0xd0, 0x58,
0xe9, 0x42, 0x4e, 0x32, 0xbf, 0x83, 0xfd, 0xad,
0x63, 0x88, 0x7c, 0x9d, 0xd3, 0x25, 0x9e, 0xe0,
0x44, 0x4a, 0xbf, 0x88, 0xc4, 0x46, 0x24, 0x21,
0xf3, 0x35, 0xee, 0xa9, 0xc4, 0x6f, 0xdf, 0xe5,
0xef, 0xc5, 0x13, 0x25, 0x3f, 0x33, 0x1e, 0x54,
0x45, 0x79, 0xc0, 0x5e, 0x67, 0x4e, 0x7b, 0xa7,
0xe1, 0xe8, 0x7c, 0xe6, 0x5a, 0x7a, 0x20, 0x4f,
0x1b, 0xf2, 0xe9, 0x6a, 0x8e, 0xfc, 0x23, 0xfd,
0x5f, 0x47, 0x24, 0x9c, 0xe4, 0xa9, 0x0c, 0x3d,
0x8f, 0x9d, 0x81, 0x46, 0x25, 0x2d, 0x43, 0x49,
0xe2, 0xcc, 0x98, 0x8d, 0x14, 0x2b, 0x17, 0xc2,
0x1b, 0xd3, 0x03, 0x13, 0x8d, 0x72, 0x76, 0x96,
0xce, 0x23, 0x93, 0xee, 0x30, 0x7a, 0xe3, 0x74,
0xb6, 0x28, 0xc4, 0xfc, 0xb6, 0x3e, 0xf9, 0xe0,
0x9a, 0x88, 0x86, 0x5b, 0xfc, 0xc0, 0x0b, 0xdf,
0xa4, 0xfb, 0x79, 0x0f, 0x95, 0x12, 0xaf, 0x43
};
int main(int argc, char *argv[]) {
  uint8_t compressed[sizeof(uncompressed) + 4];
  size_t compressed_length = sizeof(compressed);
  uint8_t decompressed[sizeof(uncompressed)];
  size_t decompressed_length = sizeof(decompressed);
  {
    int result;
    brotli::BrotliParams params;
    params.quality = 11;
    params.mode = brotli::BrotliParams::MODE_GENERIC;
    params.enable_transforms = false;
    result = brotli::BrotliCompressBuffer (params,
                                           sizeof(uncompressed), uncompressed,
                                           &compressed_length, compressed);
    assert (result == 1);
    fprintf (stderr, "Compressed %zu bytes to %zu bytes\n", sizeof(uncompressed), compressed_length);
  }
  {
    BrotliResult result;
    result = BrotliDecompressBuffer (compressed_length, compressed,
                                     &decompressed_length, decompressed);
    assert (result == BROTLI_RESULT_SUCCESS);
    fprintf (stderr, "Decompressed %zu bytes to %zu bytes\n", compressed_length, decompressed_length);
  }
  return 0;
}
I was trying to use google/woff2 on a machine running OS X 10.8 and got the following error:
./brotli/enc/./static_dict.h:21:10: fatal error: 'unordered_map' file not found
It turns out that the version of gcc with Xcode 5 is too low, I had to upgrade it manually from 4.2 to 4.7. If it’s relatively easy to clarify what that error message is about, I think it could be helpful. To be fair, I don’t really know anything about this stuff though, so feel free to disregard as well.
Thanks!
Including installation / setup / usage instructions in the README would allow more people to download / use / test brotli more easily.
I am writing bindings which should be portable across platforms and work with older compilers. Is there a way to fix the warnings below (without suppressing them)?
clang++ -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -Ienc -Idec -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include -std=c++11 -fPIC -Wall -mtune=core2 -g -O2 -c enc/metablock.cc -o enc/metablock.o
In file included from enc/metablock.cc:18:
enc/./metablock.h:28:1: warning: 'BlockSplit' defined as a struct here but previously declared as a class [-Wmismatched-tags]
struct BlockSplit {
^
enc/./histogram.h:30:1: note: did you mean struct here?
class BlockSplit;
^~~~~
struct
Also a few warnings about c++11 extensions. Is there a way to work around these for older compilers?
In file included from enc/encode_parallel.cc:17:
In file included from enc/./encode_parallel.h:25:
In file included from enc/./encode.h:28:
enc/./streams.h:59:46: warning: 'override' keyword is a C++11 extension [-Wc++11-extensions]
const void* Read(size_t n, size_t* OUTPUT) override;
^
enc/./streams.h:77:41: warning: 'override' keyword is a C++11 extension [-Wc++11-extensions]
bool Write(const void* buf, size_t n) override;
^
enc/./streams.h:95:41: warning: 'override' keyword is a C++11 extension [-Wc++11-extensions]
bool Write(const void* buf, size_t n) override;
^
enc/./streams.h:108:50: warning: 'override' keyword is a C++11 extension [-Wc++11-extensions]
const void* Read(size_t n, size_t* bytes_read) override;
^
enc/./streams.h:121:41: warning: 'override' keyword is a C++11 extension [-Wc++11-extensions]
bool Write(const void* buf, size_t n) override;
^
On Ubuntu 14.04:
g++ -I/usr/share/R/include -DNDEBUG -Ienc -Idec -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -c enc/backward_references.cc -o enc/backward_references.o
In file included from enc/./backward_references.h:23:0,
from enc/backward_references.cc:17:
enc/././hash.h:673:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
std::unique_ptr<H1> hash_h1;
^
enc/././hash.h:674:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
std::unique_ptr<H2> hash_h2;
^
enc/././hash.h:675:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
std::unique_ptr<H3> hash_h3;
^
enc/././hash.h:676:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
std::unique_ptr<H4> hash_h4;
^
enc/././hash.h:677:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
std::unique_ptr<H5> hash_h5;
^
enc/././hash.h:678:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
std::unique_ptr<H6> hash_h6;
^
enc/././hash.h:679:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
std::unique_ptr<H7> hash_h7;
^
enc/././hash.h:680:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
std::unique_ptr<H8> hash_h8;
^
enc/././hash.h:681:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
std::unique_ptr<H9> hash_h9;
^
enc/././hash.h: In member function ‘void brotli::Hashers::Init(int)’:
enc/././hash.h:636:15: error: ‘hash_h1’ was not declared in this scope
case 1: hash_h1.reset(new H1); break;
^
enc/././hash.h:637:15: error: ‘hash_h2’ was not declared in this scope
case 2: hash_h2.reset(new H2); break;
^
enc/././hash.h:638:15: error: ‘hash_h3’ was not declared in this scope
case 3: hash_h3.reset(new H3); break;
^
enc/././hash.h:639:15: error: ‘hash_h4’ was not declared in this scope
case 4: hash_h4.reset(new H4); break;
^
enc/././hash.h:640:15: error: ‘hash_h5’ was not declared in this scope
case 5: hash_h5.reset(new H5); break;
^
enc/././hash.h:641:15: error: ‘hash_h6’ was not declared in this scope
case 6: hash_h6.reset(new H6); break;
^
enc/././hash.h:642:15: error: ‘hash_h7’ was not declared in this scope
case 7: hash_h7.reset(new H7); break;
^
enc/././hash.h:643:15: error: ‘hash_h8’ was not declared in this scope
case 8: hash_h8.reset(new H8); break;
^
enc/././hash.h:644:15: error: ‘hash_h9’ was not declared in this scope
case 9: hash_h9.reset(new H9); break;
^
Brotli's compression speed for the x-ray file from the Silesia compression corpus is unreasonably poor. In the benchmark I'm working on it takes about 60x as long as the next slowest codec, which is ZPAQ at its highest setting.
In the Python version, the allowed range of the window is 16 to 24:
PyErr_SetString(BrotliError, "Invalid lgwin. Range is 16 to 24.");
params.add_argument('--lgwin', metavar="LGWIN", type=int,
                    choices=list(range(16, 25)),
                    help='Base 2 logarithm of the sliding window size. Range is '
                         '16 to 24. Defaults to 22.')
But elsewhere, we see a different limit:
// Base 2 logarithm of the sliding window size. Range is 10 to 24.
int lgwin;
if (*lgwin < 10 || *lgwin >= 25) {
  goto error;
Is this expected?
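For scale, here is a small sketch of what the two ranges mean in bytes. It takes the comment "base 2 logarithm of the sliding window size" literally; any small constant the format may subtract from the exact window size is ignored, and the helper name is invented.

```python
def window_bytes(lgwin):
    """Approximate sliding window size for a given base-2 logarithm."""
    return 1 << lgwin


for lgwin in (10, 16, 22, 24):
    print(lgwin, window_bytes(lgwin))
```

The two lower bounds, 10 and 16, differ by a factor of 64 in window size, so the discrepancy matters mostly for callers that want a small memory footprint.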
IMO, it would be cleaner to move the public headers to a separate include directory, and/or give them less generic names like brotli-decode.h and brotli-encode.h, to streamline the use of the library.
I was trying to compress a 30GB SQLite file with data from Wikipedia on a machine with 1GB of memory.
The command crashes with std::bad_alloc at quality 11, 10 and 9.
The command was running on Debian 8.1 x64, compiled with G++ 4.9.2.
As already noted in the code https://github.com/google/brotli/blob/master/dec/decode.c#L89, assuming details about the memcpy implementation and optimizations is playing with fire. Maybe introduce a BROTLI_NO_DANGEROUS_OPTIMIZATIONS #define that avoids this kind of code path, to ease porting to non-mainstream architectures?
An extra null pointer check is not needed in functions like the following.
Is there a JavaScript Implementation of the decompressor part for in browser usage?
I found https://github.com/devongovett/brotli.js and created foliojs/brotli.js#2 ... but maybe someone else knows a decompress implementation?
More an idea than an issue.
Every meta-block could be decoded standalone in its own thread.
Now the issue.
The block size is stored in the meta-block header (see spec section 9.2), but only as the final uncompressed size of the block. The compressed size isn't stored. That is a problem, because the next thread cannot know the position at which to start decoding the next block in parallel.
Solution:
Also store the compressed size of each block.
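The idea can be sketched with a made-up container format: each block is compressed on its own and prefixed with its compressed length, so a reader can locate every block without decoding any of them. zlib stands in for the codec here; none of this is the Brotli format, and it assumes blocks are independent (back-references into earlier blocks would break it).

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def pack_blocks(blocks):
    """Compress each block independently and prefix it with its compressed length."""
    out = bytearray()
    for block in blocks:
        payload = zlib.compress(block)
        out += struct.pack("<I", len(payload)) + payload
    return bytes(out)


def split_blocks(container):
    """Walk the length prefixes to find every compressed block without decoding any."""
    payloads, pos = [], 0
    while pos < len(container):
        (size,) = struct.unpack_from("<I", container, pos)
        pos += 4
        payloads.append(container[pos:pos + size])
        pos += size
    return payloads


def unpack_parallel(container, workers=4):
    """Decode all blocks concurrently and join the results in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, split_blocks(container)))
```

The cost is four extra bytes per block in this sketch; a real format would have to weigh that overhead against the gain from parallel decoding.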
Is this due to having an older version of GCC?
~/src/brotli$ python3 setup.py build
running build
running build_ext
creating build
creating build/temp.linux-x86_64-3.3
creating build/temp.linux-x86_64-3.3/dec
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python3.3m -c dec/bit_reader.c -o build/temp.linux-x86_64-3.3/dec/bit_reader.o
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python3.3m -c dec/decode.c -o build/temp.linux-x86_64-3.3/dec/decode.o
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python3.3m -c dec/huffman.c -o build/temp.linux-x86_64-3.3/dec/huffman.o
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python3.3m -c dec/streams.c -o build/temp.linux-x86_64-3.3/dec/streams.o
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python3.3m -c dec/state.c -o build/temp.linux-x86_64-3.3/dec/state.o
creating build/temp.linux-x86_64-3.3/python
creating build/temp.linux-x86_64-3.3/enc
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python3.3m -c python/brotlimodule.cc -o build/temp.linux-x86_64-3.3/python/brotlimodule.o -std=c++0x
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
In file included from python/../enc/encode.h:28:0,
from python/brotlimodule.cc:4:
python/../enc/./streams.h:59:44: error: expected ‘;’ at end of member declaration
python/../enc/./streams.h:59:46: error: ‘override’ does not name a type
python/../enc/./streams.h:77:39: error: expected ‘;’ at end of member declaration
python/../enc/./streams.h:77:41: error: ‘override’ does not name a type
python/../enc/./streams.h:95:39: error: expected ‘;’ at end of member declaration
python/../enc/./streams.h:95:41: error: ‘override’ does not name a type
python/../enc/./streams.h:108:48: error: expected ‘;’ at end of member declaration
python/../enc/./streams.h:108:50: error: ‘override’ does not name a type
python/../enc/./streams.h:121:39: error: expected ‘;’ at end of member declaration
python/../enc/./streams.h:121:41: error: ‘override’ does not name a type
error: command 'gcc' failed with exit status 1
~/src/brotli$ gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
This is with revision d811b18. I can't really reproduce the full report here, but here is a summary of each item it found.
enc/histogram.h:37: Non-static class member bit_cost_ is not initialized in this constructor nor in any functions that it calls.
enc/command.h:106: Non-static class members insert_len_, copy_len_, cmd_prefix_, dist_prefix_, cmd_extra_, and dist_extra_ are not initialized in this constructor nor in any functions that it calls.
enc/brotli_bit_stream.cc:343: index is passed to a parameter that cannot be negative. IndexOf can return -1.
enc/literal_cost.cc:102: Execution cannot reach the expression 0 inside this statement: last_c = ((i + window_half …
  i + window_half - 2 < 0, the value of i must be at least 0.
  i + window_half - 2 < 0, the value of window_half must be equal to 495.
  i + window_half - 2 < 0 cannot be true.
enc/prefix.h:75: In expression distance_code >> bucket, shifting by a negative amount has undefined behavior. The shift amount, bucket, is -2.
  brotli::Log2Floor(distance_code) returns -1.
  bucket = brotli::Log2Floor(distance_code) - 1. The value of bucket is now -2.
enc/entropy_encode.cc:45: Non-static class members total_count_, index_left_, and index_right_or_value_ are not initialized in this constructor nor in any functions that it calls.
enc/backward_references.cc:157: The compiler-generated constructor for this class does not initialize min_cost_cmd_.
enc/encode.cc:207: Non-static class member literal_cost_mask_ is not initialized in this constructor nor in any functions that it calls.
dec/huffman.c:142: Using uninitialized value sorted[symbol++].
dec/decode.c:871: Using uninitialized value s.loop_counter when calling BrotliDecompressStreaming.
enc/metablock.cc:496: Using tainted variable context as an index to pointer static_context_map.
enc/static_dict.cc:391: data[0] == '\xc2' is always false regardless of the values of its operands. This occurs as the logical operand of if.
If you need more details about an item let me know. I only provided the full information on the one item because it is a bit difficult to follow without it, but I think the others should be pretty easy to figure out.
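The enc/prefix.h item is easier to follow with numbers. The sketch below mirrors the reported arithmetic in Python. It assumes Log2Floor returns -1 for an input of 0 (the report only says that it returns -1, not for which input), and the guard function is a hypothetical illustration rather than a proposed patch.

```python
def log2_floor(n):
    """Floor of log2(n) for n > 0, and -1 for n == 0 (assumed behaviour)."""
    return n.bit_length() - 1


def bucket_for(distance_code):
    """Shift amount as computed in the reported expression."""
    return log2_floor(distance_code) - 1


def safe_shift(distance_code):
    """Refuse to shift by a negative amount instead of relying on undefined behaviour."""
    bucket = bucket_for(distance_code)
    if bucket < 0:
        raise ValueError("negative shift amount: %d" % bucket)
    return distance_code >> bucket
```

Under that assumption, a distance_code of 0 gives a bucket of -2, the value the report flags.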
Can you guys use a proper build system, CMake or even a plain makefile?
At least so that we can get a single .so/.dylib and be able to easily link against it via -lbrotli or something.
There is an issue with the Python 3-compatible pypy (aka pypy3) whereby distutils fails to use the proper C++ compiler/linker while compiling C++ extension modules (such as Brotli).
https://bitbucket.org/pypy/pypy/issues/1763/not-using-proper-c-compilers-linker-while
In the Python 2.7-compatible pypy, they have fixed this by patching distutils/unixccompiler.py so that it uses c++ as the default C++ compiler instead of cc:
https://bitbucket.org/pypy/pypy/commits/c6e45dfbda905fa9e626782c8d2dd313ff3f54cf
However, they haven't ported this patch to pypy3 yet.
At behdad/fonttools, we use Brotli for WOFF2 and we test it on Travis under different python versions, including pypy3.
Because the C++ runtime library isn't being linked when compiling Brotli for pypy3, the module fails to load with an undefined symbol error.
For details, see: fonttools/fonttools#339
As far as I could test, this problem only occurs on pypy3 for Linux (the Travis python setup runs on Ubuntu 12.04). On OS X, where distutils.unixccompiler is also used, I verified that brotli is compiled and imported without problems when using the same pypy3 version (2.4.0) as the one used on Travis, but I guess that's because on OS X the name cc is just a symbolic link to clang, and the latter does the right thing.
As a workaround, we currently do something like this:
python setup.py build_ext --libraries=stdc++
Until the pypy developers fix the issue, Brotli's setup.py could be modified to link with libstdc++ by default whenever platform.python_implementation() == "PyPy" and sys.version_info[0] == 3. (I wonder if there would be portability issues if libstdc++ were linked all the time, on any python platform/version/implementation?)
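A minimal sketch of that first approach follows. The helper name extra_libraries and its optional arguments are invented so that the check can be exercised on any interpreter; only the condition itself comes from the description above.

```python
import platform
import sys


def extra_libraries(implementation=None, major=None):
    """Return the extra libraries to link, given the running interpreter."""
    if implementation is None:
        implementation = platform.python_implementation()
    if major is None:
        major = sys.version_info[0]
    if implementation == "PyPy" and major == 3:
        return ["stdc++"]
    return []
```

The returned list could then be passed as the libraries argument of the Extension object in setup.py, which should have the same effect as the --libraries=stdc++ command-line workaround.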
A second alternative would be to change the value of the UnixCCompiler.compiler_cxx variable from the current ['cc'] to ['c++'] if running on pypy3 (exactly like in the above-mentioned pypy2 patch). This is what they do in spaCy, for example (I found this by chance).
Now, both approaches seem to work, though I'm not sure which one is preferable.
A third approach, of course, is simply to take note and wait.
Anyway, I have mentioned this on pypy's bug tracker, where I found at least two duplicate issues (#1099 and #1763) referencing the same problem.
Please let me know if anyone else has encountered the same problem.
Thanks,
Cosimo
v0.2.0 has been tagged, but the Python package version as stored in python/brotlimodule.cc still says "0.1.0" (see here).
This version string is read by setup.py, and stored in the package metadata upon installing.
I wonder if the Python module's version should also increase every time a Brotli update is tagged, or if language bindings should have their own version numbers, independent from the core library.
WDYT?
Change it to .b please
It would be nice to have some wrapper or native implementation for golang.
std::vector::data is defined in C++11, so Microsoft Visual C++ 9.0 (aka Visual Studio 2008) does not support it and raises the following error:
python/brotlimodule.cc(194) : error C2039: 'data' : is not a member of 'std::vector<_Ty>'
with
[
_Ty=uint8_t
]
The (uglier) method &vector[0] could be used instead of vector.data() to get the pointer to the underlying array (provided it isn't empty).
As you know, Python 2.7 for Windows is still compiled with Visual Studio 2008, so all extension modules should in theory be compiled using the same MSVC compiler version. After VS2008 was discontinued, Microsoft released an ad-hoc "Visual C++ Compiler for Python 2.7" (http://aka.ms/vcpython27) meant to be used for compiling extensions for Windows Python 2.7.
In Brotli's current setup.py, I added a patch to force distutils to use Visual Studio 2010 for Python 2.7. This seems to work well so far, although many people warn against mixing different C runtime versions between the interpreter and the extension modules.
Since you recently added support for C++98, it'd be nice to also try to support the old Microsoft compiler.
Thank you.
woff2_decompress always uses the .ttf extension for the output file, even if the payload is an OTTO font.
You can test with Source Sans Pro.
Shouldn't \dec\dictionary.h and \enc\dictionary.h be identical?
https://github.com/google/brotli/blob/master/enc/dictionary.h looks like it may be out-of-date; it doesn't include, e.g. the
#if defined(__cplusplus) || defined(c_plusplus)
} /* extern "C" */
#endif
...block at the end, for instance.
(Incidentally, declaration of the constant in a header file is apparently frowned upon per people smarter than me. Naively building brotli.exe with VS2015 results in 6 copies of the constant array in the resulting executable.)
I have a random generated file: https://drive.google.com/file/d/0By7bqJ83IOI8WUU5T2xXTzhqMjA/view?usp=sharing
Length of the file is 8192000.
Compress it with: bro --quality 1 --input bro_sample_0 --output bro_sample_0.bro
Then decompress with: bro --decompress --input bro_sample_0.bro --output bro_sample_0.1
And I got a file with length 20774912, which does not make sense.
Per @eustas (in quixdb/squash#113), the brotli decoder doesn't properly handle flushing:
Looks like brotli decoder doesn't support flush... It is forced to dump its internal buffer only when last block is finished.
I would love to see that fixed!
Does the following hold?
(For compression or decompress)
brotli(a + b) = brotli(a) + brotli(b)
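For comparison only, here is how the same question plays out with the standard-library zlib module. This does not answer the question for Brotli; it only shows what such an identity would have to mean in practice for another codec.

```python
import zlib

a = b"hello " * 100
b = b"world " * 100

joined_then_compressed = zlib.compress(a + b)
compressed_then_joined = zlib.compress(a) + zlib.compress(b)

# The two byte strings differ, so the identity does not hold literally.
identical = joined_then_compressed == compressed_then_joined

# A streaming decoder stops at the end of the first stream and
# leaves the second one untouched in `unused_data`.
decoder = zlib.decompressobj()
first = decoder.decompress(compressed_then_joined)
rest = zlib.decompress(decoder.unused_data)
```

With zlib the concatenation is two separate streams, so a reader has to restart the decoder at the boundary to recover both inputs.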
I've long been a proponent of integrating LZMA2/LZMA into the browsers because of its incredible effectiveness for compressing binary data streams. When I saw Brotli I thought that this was likely going to be just as good. It isn't actually great.
I am a frequent contributor to both http://ThreeJS.org as well as the http://Clara.io online 3D editor. One of the biggest issues we run into is the size of mesh downloads. Right now we are using LZMA.js scripts to do the decompression in worker threads, but this isn't optimal, especially on mobile.
For example, this real-world large-ish binary trimesh stream, very typical:
The original size once downloaded is 6,779,000 bytes (be careful, this stream may be delivered with "Content-Encoding: gzip".)
Here are the compression results:
Brotli is significantly less effective than LZMA in this case, not just by a little but by a huge margin.
What this means is that we cannot replace our LZMA.js scripts with Brotli support. This is pretty bad for us in the 3D community as we are still stuck with JavaScript-based decompression.
Hi,
How can I calculate the best values for lgwin and lgblock?
If I use the default values to compress a buffer of only a few KB, brotli allocates 8 MB for the ring buffer.
I think this makes no sense, so wouldn't it be a good idea to calculate the two values depending on the size of the data to be compressed?
thx
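One way the suggestion could look, as a sketch only: pick the smallest window whose size covers the whole input, clamped to the 10 to 24 range mentioned in the encoder comment. The function name is invented, and whether the smallest covering window is actually the best choice for compression ratio is not claimed here.

```python
MIN_LGWIN = 10  # lower bound from the encoder comment
MAX_LGWIN = 24  # upper bound from the encoder comment


def choose_lgwin(input_size):
    """Smallest lgwin with (1 << lgwin) >= input_size, clamped to the valid range."""
    if input_size < 1:
        return MIN_LGWIN
    needed = (input_size - 1).bit_length()
    return max(MIN_LGWIN, min(MAX_LGWIN, needed))
```

For a 4096-byte buffer this yields 12, far below a default of 22, so the ring buffer would shrink accordingly if the encoder sizes it from lgwin.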
I have a ~30GB text file filled with ascii numbers.
If I truncate it to the first 1GB, brotli's output is about 20% smaller than gzip's.
However, if I compress the whole file with quality 1, the brotli-compressed file is only 8% smaller.
Stranger still, if I compress the whole file with quality 6, the brotli-compressed file is actually 11% bigger than gzip's.
Any theory what is going on? Thanks.
Hey, issue #151 mentions how dictionary.h exists in two versions in enc/ and dec/ while they perhaps should be identical.
What about other headers like streams.h that exist in both directories but are very different?
I've been putting together a little build setup that creates a libbrotli so that we can more easily write test applications against the brotli code. When we install the public headers I would like to put them into the same directory (under brotli/), mostly because using more than one slash for a public header of a library is quite unusual for C and C++ programs.
As I suspect you too might want to do this at some point, or just help my project function, I figure it could be an idea to consider having different headers use different file names even when they are in a different directory in your source tree.
Possibly related to #203
> brotli_decompress(y)
*** caught bus error ***
address ffbfc093, cause 'invalid alignment'
Traceback:
1: .Call(R_brotli_decompress, buf)
2: brotli_decompress(y)
It would be nice to add Brotli to the official Python Package Index, so that users can download it with a simple pip install brotli.
We could add just the sdist tarball, or also some pre-compiled wheel packages for Windows and Mac platforms, maybe built automatically via Travis and/or AppVeyor -- like here
/cc @khaledhosny
This is with gcc 4.6.3 on Windows. See log file.
There is a problem if one tries to build the Python extension using the pip installer. The problem is that setup.py is not located in the root of the repository, but in the python binding subfolder. The way pip works is to copy the source files to a temporary folder and try to build from there. But since the C/C++ source files are located higher in the repository tree (../enc, ../dec) relative to setup.py, pip does not (cannot?) copy these over to its temporary build folder, and therefore it fails to build the extension.
A solution would be to make a hard copy (instead of a symlink) of the enc and dec folders inside the python subfolder. I believe git can efficiently handle such duplicate files and store them under the same object, as long as they have the same content.
That means one would have to synchronise it every time there is a change. I don't know if anyone has a better solution...
The reason I want to use pip is to allow publishing the Brotli extension to the official Python Package Index (PyPI) repository. Once that is done, one could simply do pip install brotli to download, build and install the extension. Besides, one could add brotli to the list of dependencies for other packages (e.g. fontTools, etc.).
I wonder whether @khaledhosny has already thought about publishing the Brotli extension to PyPI?
I'm already experimenting in this direction in https://github.com/anthrotype/brotli-wheels
I'm trying to use Travis and Appveyor to automatically build pre-compiled Python wheel packages for Windows and OS X.
Please let me know what you think.
Thank you,
Cosimo
Brotli currently always finds matches using "cache tables": each 4-byte (or 5-byte for lower quality compression) sequence is hashed and placed in a limited-size array of sequences which share the same hash code. This works great in many cases but is not really well suited for large windows, especially in high compression modes.
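To make the "cache table" idea concrete, here is a minimal Python sketch. It is not Brotli's actual code: `find_matches`, `match_length` and `BUCKET_SIZE` are hypothetical names, the bucket cap of 4 is an arbitrary choice, and the 4-byte slice itself is used as the bucket key instead of a real hash function.

```python
from collections import defaultdict, deque

MIN_MATCH = 4    # bytes keyed per position, as in the description above
BUCKET_SIZE = 4  # hypothetical cap on the candidates kept per bucket


def match_length(data, earlier, later):
    """Count how many bytes starting at `earlier` and `later` are equal."""
    n = 0
    while later + n < len(data) and data[earlier + n] == data[later + n]:
        n += 1
    return n


def find_matches(data):
    """Return (position, distance, length) for the best match at each position."""
    # Each bucket is a bounded queue: inserting into a full bucket drops the oldest entry.
    buckets = defaultdict(lambda: deque(maxlen=BUCKET_SIZE))
    found = []
    for pos in range(len(data) - MIN_MATCH + 1):
        key = data[pos:pos + MIN_MATCH]  # stand-in for a hash of the 4-byte sequence
        best_len, best_dist = 0, 0
        # Visit the most recent candidates first so that ties keep the shortest distance.
        for cand in reversed(buckets[key]):
            length = match_length(data, cand, pos)
            if length > best_len:
                best_len, best_dist = length, pos - cand
        if best_len >= MIN_MATCH:
            found.append((pos, best_dist, best_len))
        buckets[key].append(pos)
    return found


print(find_matches(b"abcdabcdabcd")[0])  # (4, 4, 8)
```

The bounded bucket is where the limitation shows: with a large window, older positions that share a key are evicted long before they leave the window, so longer or cheaper matches can be missed.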
An alternative which is used in LZMA and some other compressors is to have each hash bucket store a binary tree of sequences which share that hash code. The tree is sorted in two ways: by sequences in lexicographic order, and as a minheap for distance (the shortest distances are at the top). Matches can be found by searching for the current sequence in the tree, while re-rooting the tree.
There are normally two binary tree nodes allocated for each position in the sliding window, so this does require additional memory (8 times the sliding window size in bytes).
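As a quick check of the "8 times" figure, here is a small sketch that assumes each tree node is a 4-byte index (my assumption; it is what makes two nodes per position come out to 8 bytes). `bt_extra_bytes` is a hypothetical helper, and the window sizes are only examples.

```python
def bt_extra_bytes(lgwin, nodes_per_position=2, bytes_per_node=4):
    """Extra memory for the tree nodes of a sliding window of 2**lgwin bytes."""
    window = 1 << lgwin
    return window * nodes_per_position * bytes_per_node


for lgwin in (22, 24):
    window_mib = (1 << lgwin) // (1 << 20)
    nodes_mib = bt_extra_bytes(lgwin) // (1 << 20)
    print(f"lgwin={lgwin}: window {window_mib} MiB, tree nodes {nodes_mib} MiB")
```

So a 4 MiB window would need about 32 MiB for the nodes and a 16 MiB window about 128 MiB, on top of the hash bucket heads.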
I implemented this as a proof of concept to see what would happen.
It does indeed seem to be better; here are some example results with an uncompressed archive of the silesia corpus (211,941,764 bytes) at quality 11:
Current version: compressed to 51,973,280 bytes in 10 mins 10 secs
With binary tree matchfinder: compressed to 51,609,618 bytes in 7 mins 34 secs
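For reference, the relative gains implied by the two results above can be worked out like this (a plain arithmetic sketch; the variable names are mine):

```python
def seconds(minutes, secs):
    """Convert a 'mins secs' timing to seconds."""
    return minutes * 60 + secs


cache_size, cache_time = 51_973_280, seconds(10, 10)  # current version
bt_size, bt_time = 51_609_618, seconds(7, 34)         # binary tree matchfinder

saved = cache_size - bt_size
size_gain = saved / cache_size
time_gain = 1 - bt_time / cache_time

print(f"{saved} bytes smaller ({size_gain:.2%}), {time_gain:.1%} less time")
# 363662 bytes smaller (0.70%), 25.6% less time
```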
The code can be found at: repository https://github.com/ebiggers/brotli, branch "bt_matchfinder". Please feel free to do whatever you want with the code. I've left several TODOs in it.
This is Ubuntu 14.04
g++ -I/usr/share/R/include -DNDEBUG -Wno-sign-compare -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -Wno-unused -pedantic -c wrapper.cc -o wrapper.o
* installing *source* package ‘brotli’ ...
** libs
In file included from enc/encode.h:23:0,
from wrapper.cc:3:
enc/./hash.h:104:14: warning: invoking macro length argument 1: empty macro arguments are undefined in ISO C90 and ISO C++98 [enabled by default]
int length() const {
^
enc/./hash.h:109:33: warning: invoking macro length argument 1: empty macro arguments are undefined in ISO C90 and ISO C++98 [enabled by default]
return code ? code : length();
^
In file included from enc/./hash.h:34:0,
from enc/encode.h:23,
from wrapper.cc:3:
enc/././transform.h:47:23: warning: comma at end of enumerator list [-Wpedantic]
kOmitFirst9 = 20,
^
In file included from wrapper.cc:3:0:
enc/encode.h:54:18: warning: comma at end of enumerator list [-Wpedantic]
MODE_FONT = 2,
^
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -pedantic -std=gnu99 -c dec/bit_reader.c -o dec/bit_reader.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -pedantic -std=gnu99 -c dec/decode.c -o dec/decode.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -pedantic -std=gnu99 -c dec/dictionary.c -o dec/dictionary.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -pedantic -std=gnu99 -c dec/huffman.c -o dec/huffman.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -pedantic -std=gnu99 -c dec/state.c -o dec/state.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -pedantic -std=gnu99 -c dec/streams.c -o dec/streams.o
ar rcs libdec.a dec/bit_reader.o dec/decode.o dec/dictionary.o dec/huffman.o dec/state.o dec/streams.o
g++ -I/usr/share/R/include -DNDEBUG -Wno-sign-compare -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -Wno-unused -pedantic -c enc/backward_references.cc -o enc/backward_references.o
In file included from enc/././hash.h:34:0,
from enc/./backward_references.h:22,
from enc/backward_references.cc:17:
enc/./././transform.h:47:23: warning: comma at end of enumerator list [-Wpedantic]
kOmitFirst9 = 20,
^
g++ -I/usr/share/R/include -DNDEBUG -Wno-sign-compare -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -Wno-unused -pedantic -c enc/block_splitter.cc -o enc/block_splitter.o
g++ -I/usr/share/R/include -DNDEBUG -Wno-sign-compare -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -Wno-unused -pedantic -c enc/brotli_bit_stream.cc -o enc/brotli_bit_stream.o
enc/brotli_bit_stream.cc:705:42: warning: use of C++0x long long integer constant [-Wlong-long]
uint64_t lenextra = cmd.cmd_extra_ & 0xffffffffffffULL;
^
enc/brotli_bit_stream.cc:811:48: warning: use of C++0x long long integer constant [-Wlong-long]
const uint64_t lenextra = cmd.cmd_extra_ & 0xffffffffffffULL;
^
g++ -I/usr/share/R/include -DNDEBUG -Wno-sign-compare -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -Wno-unused -pedantic -c enc/encode.cc -o enc/encode.o
In file included from enc/././hash.h:34:0,
from enc/./encode.h:23,
from enc/encode.cc:17:
enc/./././transform.h:47:23: warning: comma at end of enumerator list [-Wpedantic]
kOmitFirst9 = 20,
^
In file included from enc/encode.cc:17:0:
enc/./encode.h:54:18: warning: comma at end of enumerator list [-Wpedantic]
MODE_FONT = 2,
^
g++ -I/usr/share/R/include -DNDEBUG -Wno-sign-compare -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -Wno-unused -pedantic -c enc/encode_parallel.cc -o enc/encode_parallel.o
In file included from enc/./././hash.h:34:0,
from enc/././encode.h:23,
from enc/./encode_parallel.h:23,
from enc/encode_parallel.cc:17:
enc/././././transform.h:47:23: warning: comma at end of enumerator list [-Wpedantic]
kOmitFirst9 = 20,
^
In file included from enc/./encode_parallel.h:23:0,
from enc/encode_parallel.cc:17:
enc/././encode.h:54:18: warning: comma at end of enumerator list [-Wpedantic]
MODE_FONT = 2,
^
g++ -I/usr/share/R/include -DNDEBUG -Wno-sign-compare -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -Wno-unused -pedantic -c enc/entropy_encode.cc -o enc/entropy_encode.o
g++ -I/usr/share/R/include -DNDEBUG -Wno-sign-compare -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -Wno-unused -pedantic -c enc/streams.cc -o enc/streams.o
g++ -I/usr/share/R/include -DNDEBUG -Wno-sign-compare -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -Wno-unused -pedantic -c enc/histogram.cc -o enc/histogram.o
g++ -I/usr/share/R/include -DNDEBUG -Wno-sign-compare -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -Wno-unused -pedantic -c enc/literal_cost.cc -o enc/literal_cost.o
g++ -I/usr/share/R/include -DNDEBUG -Wno-sign-compare -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -Wno-unused -pedantic -c enc/metablock.cc -o enc/metablock.o
g++ -I/usr/share/R/include -DNDEBUG -Wno-sign-compare -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -Wno-unused -pedantic -c enc/static_dict.cc -o enc/static_dict.o
In file included from enc/static_dict.cc:8:0:
enc/./transform.h:47:23: warning: comma at end of enumerator list [-Wpedantic]
kOmitFirst9 = 20,
^
g++ -I/usr/share/R/include -DNDEBUG -Wno-sign-compare -fpic -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g -O3 -Wall -pipe -Wno-unused -pedantic -c enc/utf8_util.cc -o enc/utf8_util.o
ar rcs libenc.a enc/backward_references.o enc/block_splitter.o enc/brotli_bit_stream.o enc/encode.o enc/encode_parallel.o enc/entropy_encode.o enc/streams.o enc/histogram.o enc/literal_cost.o enc/metablock.o enc/static_dict.o enc/utf8_util.o
g++ -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o brotli.so wrapper.o -L. -lenc -ldec -L/usr/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/brotli/libs
Hi Brotli team,
Despite being one of the Google "haters", let me share my 2 cents on the current status of Brotli.
In the next several months I intend to juxtapose several high-performance textual compressors with one goal in mind: showing the most balanced ones for the high-ratio/high-decompression-speed scenario.
Yesterday I downloaded your 'master' zip and compiled it (with several syntactic changes) with the Intel v15.0 optimizer.
In my upcoming showdown I want to include Brotli, to see how it performs in its best environment, that is, on textual (mostly English) files.
[Question #1:]
Since my goal is to show the top performers in tightness and decompression speed, are the following enforced defaults the best choice?
struct BrotliParams {
  BrotliParams()
  //    : mode(MODE_GENERIC),
  //      quality(11),
  //      lgwin(22),
  //      lgblock(0),
  //      enable_dictionary(true),
  //      enable_transforms(false),
  //      greedy_block_split(false),
  //      enable_context_modeling(true) {}
      : mode(MODE_TEXT),
        quality(11),
        lgwin(24),
        lgblock(24),
        enable_dictionary(true),
        enable_transforms(false),
        greedy_block_split(false),
        enable_context_modeling(true) {}
It would be very good to make these command line toggleable, no?
[Question #2:]
Your little announcement gives the impression that Brotli is something special on text. What am I missing? My quick test shows goodness but not greatness.
The stats below are for yesterday's commit compiled with Intel v15.0 (/O3 used). Brotli outperforms Shifune on ratio, but in the decompression-speed department a 3x gap is no joke. Don't tell me that in a browser or an English full-text browser/searcher Brotli will load 'dickens' faster than Zstd or even Shifune.
D:>bro_Intel15.exe -i dickens -o dickens.brotli -v
Brotli compression speed: 0.200944 MB/s
D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f
Brotli decompression speed: 142.945 MB/s
D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f
Brotli decompression speed: 138.861 MB/s
D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f
Brotli decompression speed: 145.079 MB/s
D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f -r 5
Brotli decompression speed: 144.647 MB/s
D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f -r 5
Brotli decompression speed: 145.513 MB/s
D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f -r 20
Brotli decompression speed: 145.841 MB/s
D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f -r 40
Brotli decompression speed: 144.701 MB/s
D:>Nakamichi_Shifune_branchfull.exe dickens.Nakamichi /report
Nakamichi 'Shifune-Totenschiff', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Note: This compile can handle files up to 1711MB.
Decompressing 3740418 bytes ...
RAM-to-RAM performance: 512 MB/s.
Compression Ratio (bigger-the-better): 2.72:1
D:>dir dic*
09/25/2015 03:32 AM 10,192,446 dickens
09/25/2015 03:29 AM 2,962,118 dickens.brotli
09/08/2015 02:33 AM 3,740,418 dickens.Nakamichi
D:>
The above quick run was done on my Core 2 laptop; on Haswell the 3x may jump to 5x, hands down. I hate that I don't have a Haswell or similar machine to share actual stats.
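To spell out the numbers from the listing and the verbose runs above, here is a small arithmetic sketch (variable names are mine; the Brotli speed is simply the mean of the seven decompression runs shown):

```python
from statistics import mean

original = 10_192_446      # dickens
brotli_size = 2_962_118    # dickens.brotli
shifune_size = 3_740_418   # dickens.Nakamichi

brotli_runs = [142.945, 138.861, 145.079, 144.647, 145.513, 145.841, 144.701]  # MB/s
shifune_speed = 512.0                                                           # MB/s

brotli_ratio = original / brotli_size
shifune_ratio = original / shifune_size
speed_gap = shifune_speed / mean(brotli_runs)

print(f"Brotli  {brotli_ratio:.2f}:1 at {mean(brotli_runs):.1f} MB/s")
print(f"Shifune {shifune_ratio:.2f}:1 at {shifune_speed:.1f} MB/s")
print(f"speed gap: {speed_gap:.1f}x")
```

The Shifune ratio comes out at 2.72:1, matching the tool's own report, and Brotli lands at about 3.44:1, so the trade-off is roughly a quarter better ratio against a roughly 3.5x slower decode on this machine.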
[Question #3:]
Don't you think your defaults (encode.h) are too low? I do; my big test shows a worse ratio than gzip:
D:>zpaq64 add _Deathship_textual_corpus.tar.method58.zpaq _Deathship_textual_corpus.tar -method 58 -threads 1
D:>bsc e _Deathship_textual_corpus.tar _Deathship_textual_corpus.tar.ST6Block256.bsc -b256 -m6 -cp -Tt
D:>xz -z -k -f -9 -e -v -v --threads=1 _Deathship_textual_corpus.tar
D:>lzturbo.exe -39 -b256 -p0 _Deathship_textual_corpus.tar _Deathship_textual_corpus.tar.256MB.lzturbo12-39.lzt
D:>zpaq64 add _Deathship_textual_corpus.tar.method28.zpaq _Deathship_textual_corpus.tar -method 28 -threads 1
D:>7za a -tgzip -mx9 _Deathship_textual_corpus.tar.zip _Deathship_textual_corpus.tar
D:>bro_Intel15.exe -i _Deathship_textual_corpus.tar -o _Deathship_textual_corpus.tar.brotli -v
D:>zstd.exe _Deathship_textual_corpus.tar
D:>LZ4.exe -9 _Deathship_textual_corpus.tar
09/12/2015 12:59 PM 1,125,281,882 _Deathship_textual_corpus.tar.method58.zpaq
09/12/2015 02:34 AM 1,342,098,184 _Deathship_textual_corpus.tar.ST6Block256.bsc
09/11/2015 11:56 AM 1,471,795,768 _Deathship_textual_corpus.tar.xz
09/13/2015 07:31 PM 1,484,820,599 _Deathship_textual_corpus.tar.256MB.lzturbo12-39.lzt
09/14/2015 09:18 AM 1,800,083,824 _Deathship_textual_corpus.tar.method28.zpaq
Here comes Nakamichi 'Shifune' ...
09/13/2015 06:29 AM 2,181,159,237 _Deathship_textual_corpus.tar.zip
09/24/2015 11:36 PM 2,382,646,308 _Deathship_textual_corpus.tar.brotli
09/13/2015 03:04 AM 2,491,454,533 _Deathship_textual_corpus.tar.zst
09/13/2015 07:50 AM 2,626,828,543 _Deathship_textual_corpus.tar.lz4
09/11/2015 06:41 AM 8,090,119,168 _Deathship_textual_corpus.tar
A glimpse at my unfinished latest benchmark:
www.sanmayce.com/Hayabusa/Deathship_showdown.pdf
www.sanmayce.com/Hayabusa/Nakamichi_Shifune.pdf
[Suggestion #1:]
Your time reports seem problematic: I get 0 MB/s for big files. Please give Brotli a '-b' benchmark or '-t' test (decompression without dump) option; Zstd and LZ4 have very good reports.
Your current speed report includes 'fwrite()' time; I want Brotli's pure RAM-to-RAM performance.
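What I mean by RAM-to-RAM can be sketched as follows. This is only an illustration in Python with zlib standing in for the Brotli decoder (an assumption, so that the example is self-contained); `ram_to_ram_speed` is a hypothetical helper. The point is that only the decode call sits inside the timed region, with no file reads or writes.

```python
import time
import zlib


def ram_to_ram_speed(compressed, repeats=5):
    """Best decompression throughput in MB/s over `repeats` runs, no file I/O timed."""
    best = float("inf")
    out_len = 0
    for _ in range(repeats):
        start = time.perf_counter()
        out = zlib.decompress(compressed)  # stand-in for the Brotli decoder
        elapsed = time.perf_counter() - start
        best = min(best, elapsed)
        out_len = len(out)
    # Floating-point division, guarded against a zero elapsed time on coarse timers.
    return out_len / 1e6 / max(best, 1e-9)


payload = zlib.compress(b"It was the best of times, it was the worst of times. " * 20000)
print(f"{ram_to_ram_speed(payload):.1f} MB/s")
```

Taking the best of several runs is one common choice; reporting the mean or median would be equally reasonable.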
[Suggestion #2:]
Make it compilable with the Intel C/C++ optimizer; I, for one, would appreciate it. The changes I currently make in bro.cc to get it to build:
#1:
//#include <unistd.h>
#include <time.h>
#include <fcntl.h>
#include <io.h>
#2:
static FILE* OpenInputFile(const char* input_path) {
//  if (input_path == 0) {
//    return fdopen(STDIN_FILENO, "rb");
//  }
/*
tools\bro.cc(136): error: identifier "STDIN_FILENO" is undefined
      return fdopen(STDIN_FILENO, "rb");
                    ^
*/
  if (input_path == 0) {
    setmode(_fileno( stdin ), O_BINARY);
    return stdin;
  }
// https://msdn.microsoft.com/en-us/library/aa298581%28v=vs.60%29.aspx
/*
  int result;
  // Set "stdin" to have binary mode:
  result = _setmode( _fileno( stdin ), _O_BINARY );
  if( result == -1 )
    perror( "Cannot set mode" );
  else
    printf( "'stdin' successfully changed to binary mode\n" );
*/
  FILE* f = fopen(input_path, "rb");
  if (f == 0) {
    perror("fopen");
    exit(1);
  }
  return f;
}
static FILE *OpenOutputFile(const char *output_path, const int force) {
//  if (output_path == 0) {
//    return fdopen(STDOUT_FILENO, "wb");
//  }
/*
tools\bro.cc(148): error: identifier "STDOUT_FILENO" is undefined
      return fdopen(STDOUT_FILENO, "wb");
                    ^
*/
  if (output_path == 0) {
    setmode(_fileno( stdout ), O_BINARY);
    return stdout;
  }
  if (!force) {
    struct stat statbuf;
    if (stat(output_path, &statbuf) == 0) {
      fprintf(stderr, "output file exists\n");
      exit(1);
    }
  }
//  int fd = open(output_path, O_CREAT | O_WRONLY | O_TRUNC,
//                S_IRUSR | S_IWUSR);
/*
tools\bro.cc(158): error: identifier "S_IRUSR" is undefined
                S_IRUSR | S_IWUSR);
                ^
tools\bro.cc(158): error: identifier "S_IWUSR" is undefined
                S_IRUSR | S_IWUSR);
                          ^
*/
  FILE* f = fopen(output_path, "wb");
/*
  if (fd < 0) {
    perror("open");
    exit(1);
  }
  return fdopen(fd, "wb");
*/
  if (f == 0) {
    perror("fopen");
    exit(1);
  }
  return f;
}
And the actual console dump of how the compilation went:
// The next log/source is modified (for Windows compatibility) Brotli:
/*
D:\brotli-master>type makeEXE.bat
cd dec
icl /O3 /c bit_reader.c decode.c huffman.c state.c streams.c
cd..
cd enc
icl /O3 /c backward_references.cc block_splitter.cc brotli_bit_stream.cc encode.cc encode_parallel.cc entropy_encode.cc histogram.cc literal_cost.cc metablock.cc static_dict.cc streams.cc
cd..
cd tools
icl /O3 bro.cc ..\dec\bit_reader.obj ..\dec\decode.obj ..\dec\huffman.obj ..\dec\state.obj ..\dec\streams.obj ..\enc\backward_references.obj ..\enc\block_splitter.obj ..\enc\brotli_bit_stream.obj ..\enc\encode.obj ..\enc\encode_parallel.obj ..\enc\entropy_encode.obj ..\enc\histogram.obj ..\enc\literal_cost.obj ..\enc\metablock.obj ..\enc\static_dict.obj ..\enc\streams.obj
D:\brotli-master>makeEXE.bat
D:\brotli-master>cd dec
D:\brotli-master\dec>icl /O3 /c bit_reader.c decode.c huffman.c state.c streams.c
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140726
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
bit_reader.c
decode.c
huffman.c
state.c
streams.c
D:\brotli-master\dec>cd..
D:\brotli-master>cd enc
D:\brotli-master\enc>icl /O3 /c backward_references.cc block_splitter.cc brotli_bit_stream.cc encode.cc encode_parallel.cc entropy_encode.cc histogram.cc literal_cost.cc metablock.cc static_dict.cc streams.cc
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140726
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
backward_references.cc
block_splitter.cc
brotli_bit_stream.cc
encode.cc
encode_parallel.cc
entropy_encode.cc
histogram.cc
literal_cost.cc
metablock.cc
static_dict.cc
streams.cc
D:\brotli-master\enc>cd..
D:\brotli-master>cd tools
D:\brotli-master\tools>icl /O3 bro.cc ..\dec\bit_reader.obj ..\dec\decode.obj ..\dec\huffman.obj ..\dec\state.obj ..\dec\streams.obj ..\enc\backward_references.obj ..\enc\block_splitter.obj ..\enc\brotli_bit_stream.obj ..\enc\encode.obj ..\enc\encode_parallel.obj ..\enc\entropy_encode.obj ..\enc\histogram.obj ..\enc\literal_cost.obj ..\enc\metablock.obj ..\enc\static_dict.obj ..\enc\streams.obj
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140726
Copyright (C) 1985-2014 Intel Corporation. All rights reserved.
bro.cc
Microsoft (R) Incremental Linker Version 10.00.30319.01
Copyright (C) Microsoft Corporation. All rights reserved.
-out:bro.exe
bro.obj
..\dec\bit_reader.obj
..\dec\decode.obj
..\dec\huffman.obj
..\dec\state.obj
..\dec\streams.obj
..\enc\backward_references.obj
..\enc\block_splitter.obj
..\enc\brotli_bit_stream.obj
..\enc\encode.obj
..\enc\encode_parallel.obj
..\enc\entropy_encode.obj
..\enc\histogram.obj
..\enc\literal_cost.obj
..\enc\metablock.obj
..\enc\static_dict.obj
..\enc\streams.obj
D:\brotli-master\tools>dir br*.exe
Volume in drive D is S640_Vol5
Volume Serial Number is 5861-9E6C
Directory of D:\brotli-master\tools
09/24/2015 06:56 AM 1,250,304 bro.exe
1 File(s) 1,250,304 bytes
0 Dir(s) 5,917,040,640 bytes free
D:\brotli-master\tools>bro
;
D:\brotli-master\tools>bro /?
Usage: bro [--force] [--quality n] [--decompress] [--input filename] [--output filename] [--repeat iters] [--verbose]
D:\brotli-master\tools>
*/
And a final note, a byte angry: in your promotional paper you say "Decompresses much faster than current LZMA implementations". Usually even amateurs like me quote 2x, 3x or 15x; your "much" is not good enough, since one could read it as anything from 2x to 20x.
Also, why don't you mention the current best (IMO) decompressor on the Internet?! Not mentioning it (LzTurbo) is like disrespecting not only the man behind it but the BEST as a general notion, yes?
I hope you will refine Brotli and make it a usable high-performance console tool.
Regards,
Kaze
What is the license of the paper Comparison of Brotli, Deflate, Zopfli, LZMA, LZHAM and Bzip2 Compression Algorithms? Can I freely distribute this document?
From https://code.google.com/p/chromium/issues/detail?id=452335 where we are exploring how to support brotli as an HTTP transfer-encoding method
Comment Nb. 20:
"Chrome's networking stack is a single thread event loop. To prevent arbitrary data from being buffered in memory, and to get data to consumers as fast as possible, this will need to be rewritten in a way for the caller to call it repeatedly to get the data out of it."
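The calling pattern asked for in that comment can be illustrated with a small Python sketch. It uses zlib's incremental decoder purely as a stand-in (the Brotli decoder API is not shown here, so nothing below is Brotli's interface), and `stream_decompress` and `max_out` are hypothetical names. The caller feeds whatever compressed bytes have arrived and pulls out bounded pieces of output, instead of buffering the whole stream.

```python
import zlib


def stream_decompress(chunks, max_out=4096):
    """Yield decompressed pieces as compressed chunks arrive from the caller."""
    decoder = zlib.decompressobj()
    for chunk in chunks:
        data = chunk
        while data:
            # Each call returns at most max_out bytes; input that could not be
            # processed yet is handed back in unconsumed_tail.
            piece = decoder.decompress(data, max_out)
            if piece:
                yield piece
            data = decoder.unconsumed_tail
    tail = decoder.flush()  # whatever output is still pending at end of input
    if tail:
        yield tail


original = b"hello brotli " * 10000
compressed = zlib.compress(original)
network_chunks = [compressed[i:i + 64] for i in range(0, len(compressed), 64)]
assert b"".join(stream_decompress(network_chunks)) == original
```

An event loop can interleave other work between pieces, and memory use stays bounded by the chunk and piece sizes rather than by the size of the response.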
Most files of our project have data that is not text, but delta-encoded floats and shorts, and I think it would greatly benefit from creating a dictionary with the most common strings.
Related #165