brotli's People

Contributors

ahorek, anthrotype, aperezdc, dependabot[bot], dsnet, ende76, eustas, fred-wang, hyperxpro, jyrkialakuijala, khaledhosny, lvandeve, mdejong, nemequ, nicksay, oz1, paulvollmer, piotrsikora, rockdaboot, rojkov, rsheeter, ryandesign, sebmarchand, sp1l, stephenkyle-arm, sullis, szabadka, tavrez, trofi, zip753

brotli's Issues

Is BrotliCompressBufferParallel already parallel?

Hi,

I checked the BrotliCompressBufferParallel function in encode_parallel.cc and can't find
any parallel execution; it looks more like it was prepared for parallel execution with
OpenMP. Can you point me to where the parallel execution happens?

Build failure on Linux with gcc 4.7.2 and brotli 0.2.0

With Python 2.7.3:

python setup.py build
running build
running build_ext
creating build
creating build/temp.linux-x86_64-2.7
creating build/temp.linux-x86_64-2.7/dec
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c dec/bit_reader.c -o build/temp.linux-x86_64-2.7/dec/bit_reader.o
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c dec/decode.c -o build/temp.linux-x86_64-2.7/dec/decode.o
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c dec/huffman.c -o build/temp.linux-x86_64-2.7/dec/huffman.o
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c dec/streams.c -o build/temp.linux-x86_64-2.7/dec/streams.o
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c dec/state.c -o build/temp.linux-x86_64-2.7/dec/state.o
creating build/temp.linux-x86_64-2.7/python
creating build/temp.linux-x86_64-2.7/enc
gcc -pthread -fno-strict-aliasing -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -fPIC -I/usr/include/python2.7 -c python/brotlimodule.cc -o build/temp.linux-x86_64-2.7/python/brotlimodule.o -std=c++0x
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
python/brotlimodule.cc:2:20: fatal error: Python.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1

With Python 3.2.3:

python3 setup.py build
running build
running build_ext
creating build/temp.linux-x86_64-3.2
creating build/temp.linux-x86_64-3.2/dec
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python3.2mu -c dec/bit_reader.c -o build/temp.linux-x86_64-3.2/dec/bit_reader.o
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python3.2mu -c dec/decode.c -o build/temp.linux-x86_64-3.2/dec/decode.o
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python3.2mu -c dec/huffman.c -o build/temp.linux-x86_64-3.2/dec/huffman.o
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python3.2mu -c dec/streams.c -o build/temp.linux-x86_64-3.2/dec/streams.o
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python3.2mu -c dec/state.c -o build/temp.linux-x86_64-3.2/dec/state.o
creating build/temp.linux-x86_64-3.2/python
creating build/temp.linux-x86_64-3.2/enc
gcc -pthread -DNDEBUG -g -fwrapv -O2 -Wall -Wstrict-prototypes -D_FORTIFY_SOURCE=2 -g -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -fPIC -I/usr/include/python3.2mu -c python/brotlimodule.cc -o build/temp.linux-x86_64-3.2/python/brotlimodule.o -std=c++0x
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++ [enabled by default]
python/brotlimodule.cc:2:20: fatal error: Python.h: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1

If you need any more details, please ask.
Thanks,

MSVC error: 'pos_' is not a member of 'BrotliBitReader'

After today's commit 94cd708, I'm getting this error when trying to compile the Python extension on Windows. I'm using the Microsoft Visual C++ compiler from Visual Studio 2010.

./dec/bit_reader.h(165) : error C2039: 'pos_' : is not a member of 'BrotliBitReader'
        ./dec/bit_reader.h(39) : see declaration of 'BrotliBitReader'
error: command 'cl.exe' failed with exit status 2

Cheers,

Cosimo

Undefined behavior detected by ubsan

ubsan detects a lot of unaligned stores/loads, as well as a signed integer overflow, and a left shift of a negative value:

/home/nemequ/local/src/squash/plugins/brotli/brotli/enc/./././static_dict.h:57:19: runtime error: left shift of negative value -32
/home/nemequ/local/src/squash/plugins/brotli/brotli/enc/./././static_dict.h:57:29: runtime error: signed integer overflow: -3096880 + -2147483648 cannot be represented in type 'int [25]'
/home/nemequ/local/src/squash/plugins/brotli/brotli/dec/./bit_reader.h:150:38: runtime error: store to misaligned address 0x7fff9b36051d for type 'uint64_t', which requires 8 byte alignment
0x7fff9b36051d: note: pointer points here
 59 92 ba 00 00 00 00  f0 4f 22 02 00 00 00 00  f8 4f 22 02 00 00 00 00  70 05 36 9b ff 7f 00 00  22
             ^ 
/home/nemequ/local/src/squash/plugins/brotli/brotli/dec/./bit_reader.h:151:42: runtime error: store to misaligned address 0x7fff9b360525 for type 'uint8_t', which requires 8 byte alignment
0x7fff9b360525: note: pointer points here
 00 00 00 00 00 00 00  f8 4f 22 02 00 00 00 00  70 05 36 9b ff 7f 00 00  22 5d 54 8f 98 7f 00 00  80
             ^ 
/home/nemequ/local/src/squash/plugins/brotli/brotli/dec/./bit_reader.h:152:43: runtime error: store to misaligned address 0x7fff9b36052d for type 'uint8_t', which requires 8 byte alignment
0x7fff9b36052d: note: pointer points here
 00 00 00 00 00 00 00  70 05 36 9b ff 7f 00 00  22 5d 54 8f 98 7f 00 00  80 05 36 9b ff 7f 00 00  f8
             ^ 
/home/nemequ/local/src/squash/plugins/brotli/brotli/dec/./bit_reader.h:153:43: runtime error: store to misaligned address 0x7fff9b360535 for type 'uint8_t', which requires 8 byte alignment
0x7fff9b360535: note: pointer points here
 00 00 00 00 7f 00 00  22 5d 54 8f 98 7f 00 00  80 05 36 9b ff 7f 00 00  f8 4f 22 02 00 00 00 00  b8
             ^ 
/home/nemequ/local/src/squash/plugins/brotli/brotli/dec/./bit_reader.h:188:17: runtime error: load of misaligned address 0x7fff9b3600d5 for type 'const uint64_t', which requires 8 byte alignment
0x7fff9b3600d5: note: pointer points here
 ab b8 8f 65 ed 9c e7  0f 83 d2 75 fa fc b0 e8  37 25 3e 42 c1 d4 85 c5  8f a0 11 8f 16 90 87 8f  90
             ^

Python binding: add streaming compression / decompression support.

I'm trying to implement streaming compression using brotli. There's very little documentation and I haven't found anyone else trying to accomplish the same thing. I assumed I needed to specify lgwin to set the block size, but when I run this code it fails with the error "brotli.error: BrotliDecompress failed":

my_str = 'very long string blah blah blah'
a = brotli.compress(my_str, lgwin=16)
b = brotli.decompress(a[0:16])

Am I doing something wrong or did I misunderstand the purpose of the lgwin parameter?
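
A note on the snippet above: the failure is most likely unrelated to lgwin. `a[0:16]` is a truncated compressed stream, and a one-shot decompress call cannot complete on a truncated stream. The sketch below uses Python's standard `zlib` module purely as a stand-in (it is not the brotli API, and the chunk size of 16 is arbitrary) to show the difference between decoding a truncated blob in one shot and feeding the same bytes piece by piece to a streaming decoder.

```python
import zlib

data = b"very long string blah blah blah" * 100

# One-shot compression, then an attempt to decode only a prefix of the output.
blob = zlib.compress(data)
try:
    zlib.decompress(blob[:16])
    truncated_ok = True
except zlib.error:
    truncated_ok = False  # a truncated stream is rejected

# Streaming: feed the compressed bytes in 16-byte pieces to a decoder object
# that keeps its state between calls.
decoder = zlib.decompressobj()
pieces = []
for start in range(0, len(blob), 16):
    pieces.append(decoder.decompress(blob[start:start + 16]))
pieces.append(decoder.flush())
restored = b"".join(pieces)
```

A streaming interface for the brotli binding would presumably have the same shape: an object that holds decoder state across calls, rather than a window-size parameter.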

static kBitCostThreshold affects compression ratio

encode.cc:392

      static const int kSampleRate = 13;
      static const double kMinEntropy = 7.92;
      static const double kBitCostThreshold = bytes * kMinEntropy / kSampleRate;

kBitCostThreshold is a function-local static, so it is initialized once and then cached.
If the brotli library is reused within the same process, compressing a small piece of data first and then a large one gives the large input a poor compression ratio, because the threshold computed for the small input is reused and some blocks are mistakenly stored uncompressed.
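
The effect can be imitated outside C++. The following is a minimal Python sketch, not the encoder's code: `threshold_cached` and `threshold_fresh` are hypothetical names, and a module-level variable plays the role of the function-local static. Only the two constants are taken from the snippet above.

```python
K_SAMPLE_RATE = 13
K_MIN_ENTROPY = 7.92

_cached_threshold = None  # plays the role of the function-local static


def threshold_cached(num_bytes):
    """Mimics a function-local static: computed on first call, then frozen."""
    global _cached_threshold
    if _cached_threshold is None:
        _cached_threshold = num_bytes * K_MIN_ENTROPY / K_SAMPLE_RATE
    return _cached_threshold


def threshold_fresh(num_bytes):
    """Recomputed on every call, as a plain (non-static) local would be."""
    return num_bytes * K_MIN_ENTROPY / K_SAMPLE_RATE


small, large = 1_000, 1_000_000
first = threshold_cached(small)
second = threshold_cached(large)  # still the value derived from `small`
```

Here `second` equals `first` even though the input is a thousand times larger, which is the stale-threshold behaviour described above. Dropping `static` in the C++ code would correspond to `threshold_fresh`.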

build the lib on Mac

Hi.
I'm trying to build the lib using "python setup.py build" and I can see the lib is placed at "build/lib.macosx-10.10-intel-2.7/brotli.so". However, I think a .so file is not usable on Mac OS; I expected to see a .dylib file. Is this a problem, and how can I solve it?

Thanks a lot!

Decompression failures with incompressible data

This seems to happen with data whose length is a power of two >= 32. Lengths 8 and 16 work; 32, 64, 128, and 256 don't. Test case:

#include <stdint.h>
#include <stdio.h>   /* fprintf */
#include <assert.h>

#include "enc/encode.h"
#include "dec/decode.h"

const uint8_t uncompressed[] = {
  0x01, 0x4a, 0x2d, 0x82, 0x59, 0x2f, 0xdd, 0xe6,
  0x4f, 0x69, 0x6c, 0x70, 0x13, 0x87, 0x9f, 0xe4,
  0xd3, 0xb1, 0x9a, 0x86, 0xeb, 0x31, 0xa9, 0x69,
  0x1d, 0x2b, 0x22, 0x60, 0x38, 0xaf, 0xb2, 0x27,
  0xee, 0x48, 0x9e, 0x67, 0x3d, 0x5f, 0xca, 0xfa,
  0x68, 0x74, 0x03, 0x6f, 0x03, 0x5c, 0xcb, 0x45,
  0x1c, 0xfc, 0xc1, 0x4b, 0x1d, 0xb1, 0x2b, 0x7b,
  0x87, 0xf2, 0xf4, 0xea, 0xc4, 0x34, 0x93, 0x75,
  0xb5, 0x45, 0xa0, 0x70, 0x77, 0xe9, 0xe3, 0xe3,
  0xe9, 0xf9, 0x36, 0x80, 0x2c, 0x3b, 0x19, 0xab,
  0x46, 0xe2, 0xeb, 0x16, 0xf9, 0x4c, 0xac, 0x03,
  0x42, 0x8c, 0x25, 0x3a, 0x9e, 0x68, 0xc7, 0x26,
  0xce, 0x45, 0x1f, 0x3f, 0xc6, 0x24, 0x38, 0x01,
  0xb2, 0x1a, 0x4f, 0x25, 0xd3, 0x7c, 0x5f, 0x37,
  0xbc, 0x6b, 0x3d, 0xb1, 0x1d, 0x76, 0xc5, 0xb9,
  0xae, 0xbd, 0x4c, 0x67, 0x87, 0xd9, 0xd0, 0x58,
  0xe9, 0x42, 0x4e, 0x32, 0xbf, 0x83, 0xfd, 0xad,
  0x63, 0x88, 0x7c, 0x9d, 0xd3, 0x25, 0x9e, 0xe0,
  0x44, 0x4a, 0xbf, 0x88, 0xc4, 0x46, 0x24, 0x21,
  0xf3, 0x35, 0xee, 0xa9, 0xc4, 0x6f, 0xdf, 0xe5,
  0xef, 0xc5, 0x13, 0x25, 0x3f, 0x33, 0x1e, 0x54,
  0x45, 0x79, 0xc0, 0x5e, 0x67, 0x4e, 0x7b, 0xa7,
  0xe1, 0xe8, 0x7c, 0xe6, 0x5a, 0x7a, 0x20, 0x4f,
  0x1b, 0xf2, 0xe9, 0x6a, 0x8e, 0xfc, 0x23, 0xfd,
  0x5f, 0x47, 0x24, 0x9c, 0xe4, 0xa9, 0x0c, 0x3d,
  0x8f, 0x9d, 0x81, 0x46, 0x25, 0x2d, 0x43, 0x49,
  0xe2, 0xcc, 0x98, 0x8d, 0x14, 0x2b, 0x17, 0xc2,
  0x1b, 0xd3, 0x03, 0x13, 0x8d, 0x72, 0x76, 0x96,
  0xce, 0x23, 0x93, 0xee, 0x30, 0x7a, 0xe3, 0x74,
  0xb6, 0x28, 0xc4, 0xfc, 0xb6, 0x3e, 0xf9, 0xe0,
  0x9a, 0x88, 0x86, 0x5b, 0xfc, 0xc0, 0x0b, 0xdf,
  0xa4, 0xfb, 0x79, 0x0f, 0x95, 0x12, 0xaf, 0x43
};

int main(int argc, char *argv[]) {
  uint8_t compressed[sizeof(uncompressed) + 4];
  size_t compressed_length = sizeof(compressed);
  uint8_t decompressed[sizeof(uncompressed)];
  size_t decompressed_length = sizeof(decompressed);

  {
    int result;
    brotli::BrotliParams params;
    params.quality = 11;
    params.mode = brotli::BrotliParams::MODE_GENERIC;
    params.enable_transforms = false;

    result = brotli::BrotliCompressBuffer (params,
                       sizeof(uncompressed), uncompressed,
                       &compressed_length, compressed);

    assert (result == 1);

    fprintf (stderr, "Compressed %zu bytes to %zu bytes\n", sizeof(uncompressed), compressed_length);
  }

  {
    BrotliResult result;

    result = BrotliDecompressBuffer (compressed_length, compressed,
                     &decompressed_length, decompressed);

    assert (result == BROTLI_RESULT_SUCCESS);

    fprintf (stderr, "Decompressed %zu bytes to %zu bytes\n", compressed_length, decompressed_length);
  }

  return 0;
}

Clarify unordered_map file warning

I was trying to use google/woff2 on a machine running OS X 10.8 and got the following error:

./brotli/enc/./static_dict.h:21:10: fatal error: 'unordered_map' file not found

It turns out that the version of gcc shipped with Xcode 5 is too old; I had to upgrade it manually from 4.2 to 4.7. If it’s relatively easy to clarify what that error message is about, I think it could be helpful. To be fair, I don’t really know anything about this stuff though, so feel free to disregard as well.

Thanks!

Compiler warnings

I am writing bindings which should be portable across platforms and work with older compilers. Is there a way to fix the warnings below (without suppressing them)?

clang++ -I/Library/Frameworks/R.framework/Resources/include -DNDEBUG -Ienc -Idec -I/usr/local/include -I/usr/local/include/freetype2 -I/opt/X11/include   -std=c++11 -fPIC  -Wall -mtune=core2 -g -O2  -c enc/metablock.cc -o enc/metablock.o
In file included from enc/metablock.cc:18:
enc/./metablock.h:28:1: warning: 'BlockSplit' defined as a struct here but previously declared as a class [-Wmismatched-tags]
struct BlockSplit {
^
enc/./histogram.h:30:1: note: did you mean struct here?
class BlockSplit;
^~~~~
struct

Also a few warnings about c++11 extensions. Is there a way to work around these for older compilers?

In file included from enc/encode_parallel.cc:17:
In file included from enc/./encode_parallel.h:25:
In file included from enc/./encode.h:28:
enc/./streams.h:59:46: warning: 'override' keyword is a C++11 extension [-Wc++11-extensions]
  const void* Read(size_t n, size_t* OUTPUT) override;
                                             ^
enc/./streams.h:77:41: warning: 'override' keyword is a C++11 extension [-Wc++11-extensions]
  bool Write(const void* buf, size_t n) override;
                                        ^
enc/./streams.h:95:41: warning: 'override' keyword is a C++11 extension [-Wc++11-extensions]
  bool Write(const void* buf, size_t n) override;
                                        ^
enc/./streams.h:108:50: warning: 'override' keyword is a C++11 extension [-Wc++11-extensions]
  const void* Read(size_t n, size_t* bytes_read) override;
                                                 ^
enc/./streams.h:121:41: warning: 'override' keyword is a C++11 extension [-Wc++11-extensions]
  bool Write(const void* buf, size_t n) override;
                                        ^

On Ubuntu 14.04:

g++ -I/usr/share/R/include -DNDEBUG -Ienc -Idec     -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -c enc/backward_references.cc -o enc/backward_references.o
In file included from enc/./backward_references.h:23:0,
                from enc/backward_references.cc:17:
enc/././hash.h:673:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
  std::unique_ptr<H1> hash_h1;
  ^
enc/././hash.h:674:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
  std::unique_ptr<H2> hash_h2;
  ^
enc/././hash.h:675:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
  std::unique_ptr<H3> hash_h3;
  ^
enc/././hash.h:676:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
  std::unique_ptr<H4> hash_h4;
  ^
enc/././hash.h:677:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
  std::unique_ptr<H5> hash_h5;
  ^
enc/././hash.h:678:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
  std::unique_ptr<H6> hash_h6;
  ^
enc/././hash.h:679:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
  std::unique_ptr<H7> hash_h7;
  ^
enc/././hash.h:680:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
  std::unique_ptr<H8> hash_h8;
  ^
enc/././hash.h:681:3: error: ‘unique_ptr’ in namespace ‘std’ does not name a type
  std::unique_ptr<H9> hash_h9;
  ^
enc/././hash.h: In member function ‘void brotli::Hashers::Init(int)’:
enc/././hash.h:636:15: error: ‘hash_h1’ was not declared in this scope
      case 1: hash_h1.reset(new H1); break;
              ^
enc/././hash.h:637:15: error: ‘hash_h2’ was not declared in this scope
      case 2: hash_h2.reset(new H2); break;
              ^
enc/././hash.h:638:15: error: ‘hash_h3’ was not declared in this scope
      case 3: hash_h3.reset(new H3); break;
              ^
enc/././hash.h:639:15: error: ‘hash_h4’ was not declared in this scope
      case 4: hash_h4.reset(new H4); break;
              ^
enc/././hash.h:640:15: error: ‘hash_h5’ was not declared in this scope
      case 5: hash_h5.reset(new H5); break;
              ^
enc/././hash.h:641:15: error: ‘hash_h6’ was not declared in this scope
      case 6: hash_h6.reset(new H6); break;
              ^
enc/././hash.h:642:15: error: ‘hash_h7’ was not declared in this scope
      case 7: hash_h7.reset(new H7); break;
              ^
enc/././hash.h:643:15: error: ‘hash_h8’ was not declared in this scope
      case 8: hash_h8.reset(new H8); break;
              ^
enc/././hash.h:644:15: error: ‘hash_h9’ was not declared in this scope
      case 9: hash_h9.reset(new H9); break;
              ^

Inconsistent range of lgwin values

In the Python version, the allowed range of the window size parameter (lgwin) is 16 to 24:
PyErr_SetString(BrotliError, "Invalid lgwin. Range is 16 to 24.");

params.add_argument('--lgwin', metavar="LGWIN", type=int, choices=list(range(16, 25)), help='Base 2 logarithm of the sliding window size. Range is ' '16 to 24. Defaults to 22.')

But elsewhere, we see a different limit:

// Base 2 logarithm of the sliding window size. Range is 10 to 24.
int lgwin;

if (*lgwin < 10 || *lgwin >= 25) {
  goto error;

Is this expected?
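
To make the mismatch concrete, here is a small Python sketch that enumerates both ranges as quoted above. The names `encoder_accepts`, `python_cli_range` and `encoder_range` are made up for illustration; only the two conditions come from the quoted code.

```python
# Range accepted by the Python command-line wrapper: choices=list(range(16, 25))
python_cli_range = set(range(16, 25))


def encoder_accepts(lgwin):
    """Mirrors the quoted check: values with lgwin < 10 or lgwin >= 25 are rejected."""
    return not (lgwin < 10 or lgwin >= 25)


encoder_range = {v for v in range(0, 32) if encoder_accepts(v)}

# Values the encoder-side check allows but the Python wrapper refuses.
only_in_encoder = sorted(encoder_range - python_cli_range)
```

`only_in_encoder` comes out as 10 through 15, so the Python wrapper is the stricter of the two.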

Separate the public headers

IMO, it would be cleaner to move the public headers to a separate include directory, and/or give them less generic names like brotli-decode.h and brotli-encode.h, to streamline the use of the library.

Crashing with std::bad_alloc

I was trying to compress a 30GB sqlite file with data from Wikipedia on a machine with 1GB of memory.
The command crashed with std::bad_alloc at quality 11, 10 and 9.

The command was running on Debian 8.1 x64, compiled with G++ 4.9.2.

Decode MetaBlock Multithreading

More of an idea than an issue:
every meta-block could be decoded standalone in its own thread.

Now the issue.
The block size is stored in the meta-block header (see section 9.2 of the spec), but only the final uncompressed size of the block. The compressed length isn't stored. That is a problem, because the next thread cannot know the position at which to start decoding the next block in parallel.

Solution:
Store the compressed size of each block as well.
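
As a rough illustration of the proposal, the sketch below prefixes each compressed block with its compressed length so that a reader can locate every block without decoding the previous one. It is a toy container, not the Brotli format: `zlib` stands in for the codec, the 4-byte little-endian length prefix and the function names are invented here, and it assumes blocks are fully independent. Real Brotli meta-blocks can contain backward references into earlier data, so independence would have to be enforced by the encoder.

```python
import struct
import zlib
from concurrent.futures import ThreadPoolExecutor


def pack_blocks(chunks):
    """Compress each chunk separately and prefix it with its compressed size."""
    out = bytearray()
    for chunk in chunks:
        comp = zlib.compress(chunk)
        out += struct.pack("<I", len(comp)) + comp
    return bytes(out)


def split_blocks(stream):
    """Walk the length prefixes to find block boundaries without decoding."""
    blocks, pos = [], 0
    while pos < len(stream):
        (size,) = struct.unpack_from("<I", stream, pos)
        pos += 4
        blocks.append(stream[pos:pos + size])
        pos += size
    return blocks


def decode_parallel(stream, workers=4):
    """Decode all blocks concurrently; map() preserves the original order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return b"".join(pool.map(zlib.decompress, split_blocks(stream)))


chunks = [bytes([i]) * 1000 for i in range(8)]
stream = pack_blocks(chunks)
restored = decode_parallel(stream)
```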

Build Failure on Linux with Python 3.3 and gcc 4.6.3

Is this due to having an older version of GCC?

~/src/brotli$ python3 setup.py build
running build
running build_ext
creating build
creating build/temp.linux-x86_64-3.3
creating build/temp.linux-x86_64-3.3/dec
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python3.3m -c dec/bit_reader.c -o build/temp.linux-x86_64-3.3/dec/bit_reader.o
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python3.3m -c dec/decode.c -o build/temp.linux-x86_64-3.3/dec/decode.o
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python3.3m -c dec/huffman.c -o build/temp.linux-x86_64-3.3/dec/huffman.o
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python3.3m -c dec/streams.c -o build/temp.linux-x86_64-3.3/dec/streams.o
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python3.3m -c dec/state.c -o build/temp.linux-x86_64-3.3/dec/state.o
creating build/temp.linux-x86_64-3.3/python
creating build/temp.linux-x86_64-3.3/enc
gcc -pthread -Wno-unused-result -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/local/include/python3.3m -c python/brotlimodule.cc -o build/temp.linux-x86_64-3.3/python/brotlimodule.o -std=c++0x
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for Ada/C/ObjC but not for C++ [enabled by default]
In file included from python/../enc/encode.h:28:0,
                 from python/brotlimodule.cc:4:
python/../enc/./streams.h:59:44: error: expected ‘;’ at end of member declaration
python/../enc/./streams.h:59:46: error: ‘override’ does not name a type
python/../enc/./streams.h:77:39: error: expected ‘;’ at end of member declaration
python/../enc/./streams.h:77:41: error: ‘override’ does not name a type
python/../enc/./streams.h:95:39: error: expected ‘;’ at end of member declaration
python/../enc/./streams.h:95:41: error: ‘override’ does not name a type
python/../enc/./streams.h:108:48: error: expected ‘;’ at end of member declaration
python/../enc/./streams.h:108:50: error: ‘override’ does not name a type
python/../enc/./streams.h:121:39: error: expected ‘;’ at end of member declaration
python/../enc/./streams.h:121:41: error: ‘override’ does not name a type
error: command 'gcc' failed with exit status 1
~/src/brotli$ gcc --version
gcc (Ubuntu/Linaro 4.6.3-1ubuntu5) 4.6.3
Copyright (C) 2011 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

Issues uncovered by Coverity Scan

This is with revision d811b18. I can't really reproduce the full report here, but here is a summary of each item it found.

  • enc/histogram.h:37: Non-static class member bit_cost_ is not initialized in this constructor nor in any functions that it calls.
  • enc/command.h:106: Non-static class members insert_len_, copy_len_, cmd_prefix_, dist_prefix_, cmd_extra_, and dist_extra_ are not initialized in this constructor nor in any functions that it calls.
  • enc/brotli_bit_stream.cc:343: index is passed to a parameter that cannot be negative. IndexOf can return -1
  • enc/literal_cost.cc:102: Execution cannot reach the expression 0 inside this statement: last_c = ((i + window_half …
    • at_least: At condition i + window_half - 2 < 0, the value of i must be at least 0.
    • const: At condition i + window_half - 2 < 0, the value of window_half must be equal to 495.
    • dead_error_condition: The condition i + window_half - 2 < 0 cannot be true.
  • enc/prefix.h:75: In expression distance_code >> bucket, shifting by a negative amount has undefined behavior. The shift amount, bucket, is -2.
    • return_constant: Function call brotli::Log2Floor(distance_code) returns -1.
    • assignment: Assigning: bucket = brotli::Log2Floor(distance_code) - 1. The value of bucket is now -2.
  • enc/entropy_encode.cc:45: Non-static class members total_count_, index_left_, and index_right_or_value_ are not initialized in this constructor nor in any functions that it calls.
  • enc/backward_references.cc:157: The compiler-generated constructor for this class does not initialize min_cost_cmd_
  • enc/encode.cc:207: Non-static class member literal_cost_mask_ is not initialized in this constructor nor in any functions that it calls.
  • dec/huffman.c:142: Using uninitialized value sorted[symbol++] dec/huffman.c:142
  • dec/decode.c:871: Using uninitialized value s.loop_counter when calling BrotliDecompressStreaming
  • enc/metablock.cc:496: Using tainted variable context as an index to pointer static_context_map
  • enc/static_dict.cc:391: data[0] == '\xc2' is always false regardless of the values of its operands. This occurs as the logical operand of if.

If you need more details about an item, let me know. I only provided the full information on one item because it is a bit difficult to follow without it, but I think the others should be pretty easy to figure out.
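
For the enc/prefix.h item, the arithmetic behind the negative shift can be traced with a short Python sketch. It assumes, as the report's trace implies, that Log2Floor returns -1 for an input of 0. The helper names are illustrative and are not the encoder's code.

```python
def log2_floor(n):
    """Floor of log2(n); returns -1 for n == 0 (assumed to match the report)."""
    return n.bit_length() - 1


def bucket_for(distance_code):
    """Same arithmetic as in the report: bucket = Log2Floor(distance_code) - 1."""
    return log2_floor(distance_code) - 1


def shift_is_defined(distance_code):
    """distance_code >> bucket is only well defined for a non-negative bucket."""
    return bucket_for(distance_code) >= 0
```

With these definitions `bucket_for(0)` is -2, matching the value Coverity reports. Whether a distance code that small can actually reach this line is a separate question for the maintainers.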

Proper build system

Can you guys use a proper build system, cmake or even a plain makefile?

At least so that we can get a single .so/.dylib and be able to easily link against it via -lbrotli or something.

brotli extension not linked with libstdc++ on pypy3 for Linux

There is an issue with the Python 3-compatible pypy (aka pypy3) whereby distutils fails to use the proper C++ compiler/linker while compiling C++ extension modules (such as Brotli).

https://bitbucket.org/pypy/pypy/issues/1763/not-using-proper-c-compilers-linker-while

In the Python2.7-compatible pypy, they have fixed this by patching distutils/unixccompiler.py so that it uses c++ as default C++ compiler instead of cc:
https://bitbucket.org/pypy/pypy/commits/c6e45dfbda905fa9e626782c8d2dd313ff3f54cf

However, they haven't ported this patch to pypy3 yet.

At behdad/fonttools, we use Brotli for WOFF2 and we test it on Travis under different python versions, including pypy3.

Because the C++ runtime library isn't being linked when compiling Brotli for pypy3, the module fails to be loaded with undefined symbol error.
For details, see: fonttools/fonttools#339

As far as I could test, this problem only occurs on pypy3 for Linux (the Travis python setup runs on Ubuntu 12.04). On OS X, where distutils.unixccompiler is also used, I verified that brotli is compiled and imported without problems when using the same pypy3 version (2.4.0) as the one used on Travis -- but I guess it's because on OSX the name cc is just a symbolic link to clang, and the latter does the right thing.

As a workaround, we currently do something like this:

python setup.py build_ext --libraries=stdc++

Until the pypy developers fix the issue, Brotli's setup.py could be modified to link with libstdc++ by default whenever platform.python_implementation() == "PyPy" and sys.version_info[0] == 3. (I wonder if there would be portability issues if libstdc++ were linked all the time, on any python platform/version/implementation?)
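
A minimal sketch of what that check could look like follows. `extra_libraries` is a hypothetical helper, not something in Brotli's setup.py; its two parameters exist only so the logic can be exercised without running under pypy3.

```python
import platform
import sys


def extra_libraries(implementation=None, major=None):
    """Return extra libraries to link, adding libstdc++ only on PyPy 3."""
    if implementation is None:
        implementation = platform.python_implementation()
    if major is None:
        major = sys.version_info[0]
    if implementation == "PyPy" and major == 3:
        return ["stdc++"]
    return []
```

The returned list could then be passed as the `libraries` argument of the `Extension` object in setup.py, which would have the same effect as the `--libraries=stdc++` workaround above.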

A second alternative would be to change the value of UnixCCompiler.compiler_cxx variable from the current ['cc'] to ['c++'] if running on pypy3 (exactly like in the above mentioned pypy2 patch). This is what they do in spaCy for example (I found this by chance).

Now, both approaches seem to work, though I'm not sure which one is preferable.

A third approach, of course, is simply to take note and wait.

Anyway, I have mentioned this on pypy's bug tracker, where I found at least two duplicate issues (#1099 and #1763) referencing the same problem.

Please let me know if anyone else has encountered the same problem.
Thanks,

Cosimo

should the Python module's version always match the library version?

v0.2.0 has been tagged, but the Python package version as stored in python/brotlimodule.cc still says "0.1.0" (see here).
This version string is read by the setup.py, and stored in the package metadata upon installing.

I wonder if the Python module's version should also increase every time a Brotli update is tagged, or if language bindings should have their own version numbers, independent from the core library.

WDYT?

golang support

It would be nice to have some wrapper or native implementation for golang.

vector.data() not supported on MS VC++ 9.0

std::vector::data is defined in C++11, so Microsoft Visual C++ 9.0 (aka Visual Studio 2008) does not support it and raises the following error:

python/brotlimodule.cc(194) : error C2039: 'data' : is not a member of 'std::vector<_Ty>'
        with
        [
            _Ty=uint8_t
        ]

The (uglier) method &vector[0] could be used instead of vector.data() to get the pointer to the underlying array (provided it isn't empty).

As you know, Python 2.7 for Windows is still compiled with Visual Studio 2008, so all extension modules should in theory be compiled using the same MSVC compiler version. After VS2008 was discontinued, Microsoft released an ad-hoc "Visual C++ Compiler for Python 2.7" (http://aka.ms/vcpython27) meant to be used for compiling extensions for Windows Python 2.7.

In the current Brotli's setup.py, I added a patch to force distutils to use Visual Studio 2010 for Python 2.7. This seems to work well so far, although many people warn against mixing different C runtime versions between the interpreter and the extension modules.

Since you recently added support for C++98, it'd be nice to also try to support the old Microsoft compiler.

Thank you.

Different dictionary.h files?

Shouldn't \dec\dictionary.h and \enc\dictionary.h be identical?

https://github.com/google/brotli/blob/master/enc/dictionary.h looks like it may be out-of-date; it doesn't include, e.g. the

#if defined(__cplusplus) || defined(c_plusplus)
}    /* extern "C" */
#endif

...block at the end, for instance.

(Incidentally, declaration of the constant in a header file is apparently frowned upon per people smarter than me. Naively building brotli.exe with VS2015 results in 6 copies of the constant array in the resulting executable.)

Very poor compression ratio on TriMesh binary streams compared to LZMA

I've long been a proponent of integrating LZMA2/LZMA into the browsers because of its incredible effectiveness for compressing binary data streams. When I saw Brotli I thought that this was likely going to be just as good. It isn't actually great.

I am a frequent contributor to both http://ThreeJS.org as well as the http://Clara.io online 3D editor. One of the biggest issues we run into is the size of mesh downloads. Right now we are using LZMA.js scripts to do the decompression in worker threads, but this isn't optimal, especially on mobile.

For example, this real-world large-ish binary trimesh stream, very typical:

https://d3ijcvgxwtkjmf.cloudfront.net/a4c3c7313b7bdeb68ad46a7e1b761f38z?filename=object-53-batman-tumbler-lw8-12.bingeom

The original size once downloaded is 6,779,000 bytes (be careful, this stream may be delivered with "Content-Encoding: gzip".)

Here are the compression results:

  • LZMA
    • Normal: 921,600 bytes
    • Ultra: 920,147 bytes
  • GZip
    • Normal: 2,296,362 bytes
    • Ultra: 2,258,967 bytes
  • Brotli
    • Normal and Ultra: 1,513,459 bytes. (source)

Brotli is significantly less effective than LZMA in this case -- not just a little but by a huge margin.

What this means is that we cannot replace our LZMA.js scripts with Brotli support. This is pretty bad for us in the 3D community as we are still stuck with JavaScript-based decompression.

Best values for lgwin and lgblock?

Hi,

How can I calculate the best values for lgwin and lgblock?
If I use the default values to compress a buffer of only a few KB, brotli allocates 8 MB for the ring buffer.
I think this makes no sense, so wouldn't it be a good idea to calculate the two values depending on the size of the data that should be compressed?

thx
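
One possible heuristic for lgwin, sketched in Python: pick the smallest window exponent whose window of roughly 2^lgwin bytes covers the whole input, clamped to an allowed range. `suggest_lgwin` is a hypothetical helper, not part of the library; the default bounds of 10 and 24 are an assumption taken from the encoder comment quoted in another issue, and lgblock is not covered.

```python
def suggest_lgwin(input_size, lo=10, hi=24):
    """Smallest lgwin with 2**lgwin >= input_size, clamped to [lo, hi]."""
    # (input_size - 1).bit_length() is ceil(log2(input_size)) for input_size >= 1.
    needed = max(input_size - 1, 0).bit_length()
    return min(max(needed, lo), hi)
```

For a buffer of a few KB this yields 12 or 13 rather than the default of 22, which would shrink the ring buffer accordingly if the encoder sizes it from lgwin.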

Compression ratio is better after tweaking quality 6 -> 1 for big files

I have a ~30GB text file filled with ASCII numbers.
If I truncate it to the first 1GB, brotli beats gzip on size by about 20%.
However, if I compress the whole file with quality 1, the brotli-compressed file is only 8% smaller than gzip's.
Stranger still, if I compress the whole file with quality 6, the brotli-compressed file is actually 11% bigger than gzip's.

Any theory what is going on? Thanks.

Identical header names, different contents

Hey, issue #151 mentions how dictionary.h exists in two versions in enc/ and dec/ while they perhaps should be identical.

What about other headers like streams.h that exist in both directories but are very different?

I've been putting together a little build setup that creates a libbrotli so that we can more easily write test applications against the brotli code. When we install the public headers, I would like to put them all into the same directory (under brotli/), mostly because using more than one slash in the include path of a library's public header is quite unusual for C and C++ programs.

As I suspect you too might want to do this at some point, or just help my project function, I figure it could be an idea to consider having different headers use different file names even when they are in a different directory in your source tree.

Decompress segfaults on Solaris Sparc

Possibly related to #203

> brotli_decompress(y)

 *** caught bus error ***
address ffbfc093, cause 'invalid alignment'

Traceback:
 1: .Call(R_brotli_decompress, buf)
 2: brotli_decompress(y)

add brotli to PyPI repository

It would be nice to add Brotli to the official Python Package Index, so that users can download it with a simple pip install brotli.

We could add just the sdist tarball, or also some pre-compiled wheel packages for Windows and Mac platforms, maybe built automatically via Travis and/or AppVeyor -- like here

/cc @khaledhosny

support pip installation

There is a problem if one tries to build the Python extension using the pip installer: setup.py is not located in the root of the repository, but in the python binding subfolder. The way pip works is to copy the source files to a temporary folder and try to build from there. But since the C/C++ source files are located higher in the repository tree (../enc, ../dec) relative to setup.py, pip does not (cannot?) copy these over to its temporary build folder, and therefore it fails to build the extension.

A solution would be to make a hard copy (instead of a symlink) of the enc and dec folders inside the python subfolder. I believe git can efficiently handle such duplicate files and store them under the same object, as long as they have the same content.

That means one would have to synchronise it every time there is a change. I don't know if anyone has a better solution...

The reason I want to use pip is to allow publishing the Brotli extension to the official Python Package Index (PyPI) repository. Once that is done, one could simply do pip install brotli to download, build and install the extension. Besides, one could add brotli to the list of dependencies for other packages (e.g. fontTools, etc.).

I wonder whether @khaledhosny has already thought about publishing the Brotli extension to PyPI?

I'm already experimenting in this direction in https://github.com/anthrotype/brotli-wheels
I'm trying to use Travis and Appveyor to automatically build pre-compiled Python wheel packages for Windows and OS X.

Please let me know what you think.
Thank you,

Cosimo

Improved matchfinder for high quality compression

Brotli currently always finds matches using "cache tables": each 4-byte (or 5-byte for lower quality compression) sequence is hashed and placed in a limited-size array of sequences which share the same hash code. This works great in many cases but is not really well suited for large windows, especially in high compression modes.

An alternative which is used in LZMA and some other compressors is to have each hash bucket store a binary tree of sequences which share that hash code. The tree is sorted in two ways: by sequences in lexicographic order, and as a minheap for distance (the shortest distances are at the top). Matches can be found by searching for the current sequence in the tree, while re-rooting the tree.

There are normally two binary tree nodes allocated for each position in the sliding window, so this does require additional memory (8 times the sliding window size in bytes).
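
For readers unfamiliar with the "cache table" approach described above, here is a toy sketch of it: every 4-byte sequence is hashed into a bucket that remembers only the most recent few positions. It is not Brotli's actual matchfinder; the bucket size, the hash constant and all names are assumptions chosen for illustration.

```python
from collections import defaultdict, deque

MIN_MATCH = 4    # hash 4-byte sequences, as in the description above
BUCKET_SIZE = 4  # hypothetical cap on positions remembered per hash code


def hash4(data: bytes, pos: int, bits: int = 15) -> int:
    """Multiplicative hash of the 4 bytes at pos (the constant is an arbitrary odd number)."""
    value = int.from_bytes(data[pos:pos + MIN_MATCH], "little")
    return ((value * 0x1E35A7BD) & 0xFFFFFFFF) >> (32 - bits)


def match_length(data: bytes, a: int, b: int) -> int:
    """Number of equal bytes starting at positions a < b."""
    n = 0
    while b + n < len(data) and data[a + n] == data[b + n]:
        n += 1
    return n


def find_matches(data: bytes):
    """Yield (position, distance, length) for the longest candidate found at each position."""
    table = defaultdict(lambda: deque(maxlen=BUCKET_SIZE))
    for pos in range(len(data) - MIN_MATCH + 1):
        bucket = table[hash4(data, pos)]
        best = (0, 0)  # (length, distance)
        for candidate in bucket:
            length = match_length(data, candidate, pos)
            if length >= MIN_MATCH and length > best[0]:
                best = (length, pos - candidate)
        if best[0]:
            yield pos, best[1], best[0]
        bucket.append(pos)  # the oldest entry drops out once the bucket is full


if __name__ == "__main__":
    for pos, distance, length in find_matches(b"abcdefgh_abcdefgh"):
        print(f"pos={pos} distance={distance} length={length}")
```

Because each bucket forgets older positions, distant matches in a large window are easily lost, which is the weakness the binary tree is meant to address. The price is the memory quoted above: at 8 bytes per window byte, a 4 MiB window (about what lgwin 22 gives) needs roughly 32 MiB for the tree nodes.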

I implemented this as a proof of concept to see what would happen.

It does indeed seem to be better; here are some example results with an uncompressed archive of the silesia corpus (211,941,764 bytes) at quality 11:

Current version: compressed to 51,973,280 bytes in 10 mins 10 secs
With binary tree matchfinder: compressed to 51,609,618 bytes in 7 mins 34 secs

The code can be found at: repository https://github.com/ebiggers/brotli, branch "bt_matchfinder". Please feel free to do whatever you want with the code. I've left several TODOs in it.

Pedantic warnings under gnu99

This is Ubuntu 14.04

g++ -I/usr/share/R/include -DNDEBUG     -Wno-sign-compare -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -Wno-unused -pedantic  -c wrapper.cc -o wrapper.o
* installing *source* package ‘brotli’ ...
** libs
In file included from enc/encode.h:23:0,
                 from wrapper.cc:3:
enc/./hash.h:104:14: warning: invoking macro length argument 1: empty macro arguments are undefined in ISO C90 and ISO C++98 [enabled by default]
   int length() const {
              ^
enc/./hash.h:109:33: warning: invoking macro length argument 1: empty macro arguments are undefined in ISO C90 and ISO C++98 [enabled by default]
     return code ? code : length();
                                 ^
In file included from enc/./hash.h:34:0,
                 from enc/encode.h:23,
                 from wrapper.cc:3:
enc/././transform.h:47:23: warning: comma at end of enumerator list [-Wpedantic]
   kOmitFirst9     = 20,
                       ^
In file included from wrapper.cc:3:0:
enc/encode.h:54:18: warning: comma at end of enumerator list [-Wpedantic]
     MODE_FONT = 2,
                  ^
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -pedantic -std=gnu99 -c dec/bit_reader.c -o dec/bit_reader.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -pedantic -std=gnu99 -c dec/decode.c -o dec/decode.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -pedantic -std=gnu99 -c dec/dictionary.c -o dec/dictionary.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -pedantic -std=gnu99 -c dec/huffman.c -o dec/huffman.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -pedantic -std=gnu99 -c dec/state.c -o dec/state.o
gcc -std=gnu99 -I/usr/share/R/include -DNDEBUG      -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -pedantic -std=gnu99 -c dec/streams.c -o dec/streams.o
ar rcs libdec.a dec/bit_reader.o dec/decode.o dec/dictionary.o dec/huffman.o dec/state.o dec/streams.o
g++ -I/usr/share/R/include -DNDEBUG     -Wno-sign-compare -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -Wno-unused -pedantic  -c enc/backward_references.cc -o enc/backward_references.o
In file included from enc/././hash.h:34:0,
                 from enc/./backward_references.h:22,
                 from enc/backward_references.cc:17:
enc/./././transform.h:47:23: warning: comma at end of enumerator list [-Wpedantic]
   kOmitFirst9     = 20,
                       ^
g++ -I/usr/share/R/include -DNDEBUG     -Wno-sign-compare -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -Wno-unused -pedantic  -c enc/block_splitter.cc -o enc/block_splitter.o
g++ -I/usr/share/R/include -DNDEBUG     -Wno-sign-compare -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -Wno-unused -pedantic  -c enc/brotli_bit_stream.cc -o enc/brotli_bit_stream.o
enc/brotli_bit_stream.cc:705:42: warning: use of C++0x long long integer constant [-Wlong-long]
     uint64_t lenextra = cmd.cmd_extra_ & 0xffffffffffffULL;
                                          ^
enc/brotli_bit_stream.cc:811:48: warning: use of C++0x long long integer constant [-Wlong-long]
     const uint64_t lenextra = cmd.cmd_extra_ & 0xffffffffffffULL;
                                                ^
g++ -I/usr/share/R/include -DNDEBUG     -Wno-sign-compare -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -Wno-unused -pedantic  -c enc/encode.cc -o enc/encode.o
In file included from enc/././hash.h:34:0,
                 from enc/./encode.h:23,
                 from enc/encode.cc:17:
enc/./././transform.h:47:23: warning: comma at end of enumerator list [-Wpedantic]
   kOmitFirst9     = 20,
                       ^
In file included from enc/encode.cc:17:0:
enc/./encode.h:54:18: warning: comma at end of enumerator list [-Wpedantic]
     MODE_FONT = 2,
                  ^
g++ -I/usr/share/R/include -DNDEBUG     -Wno-sign-compare -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -Wno-unused -pedantic  -c enc/encode_parallel.cc -o enc/encode_parallel.o
In file included from enc/./././hash.h:34:0,
                 from enc/././encode.h:23,
                 from enc/./encode_parallel.h:23,
                 from enc/encode_parallel.cc:17:
enc/././././transform.h:47:23: warning: comma at end of enumerator list [-Wpedantic]
   kOmitFirst9     = 20,
                       ^
In file included from enc/./encode_parallel.h:23:0,
                 from enc/encode_parallel.cc:17:
enc/././encode.h:54:18: warning: comma at end of enumerator list [-Wpedantic]
     MODE_FONT = 2,
                  ^
g++ -I/usr/share/R/include -DNDEBUG     -Wno-sign-compare -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -Wno-unused -pedantic  -c enc/entropy_encode.cc -o enc/entropy_encode.o
g++ -I/usr/share/R/include -DNDEBUG     -Wno-sign-compare -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -Wno-unused -pedantic  -c enc/streams.cc -o enc/streams.o
g++ -I/usr/share/R/include -DNDEBUG     -Wno-sign-compare -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -Wno-unused -pedantic  -c enc/histogram.cc -o enc/histogram.o
g++ -I/usr/share/R/include -DNDEBUG     -Wno-sign-compare -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -Wno-unused -pedantic  -c enc/literal_cost.cc -o enc/literal_cost.o
g++ -I/usr/share/R/include -DNDEBUG     -Wno-sign-compare -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -Wno-unused -pedantic  -c enc/metablock.cc -o enc/metablock.o
g++ -I/usr/share/R/include -DNDEBUG     -Wno-sign-compare -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -Wno-unused -pedantic  -c enc/static_dict.cc -o enc/static_dict.o
In file included from enc/static_dict.cc:8:0:
enc/./transform.h:47:23: warning: comma at end of enumerator list [-Wpedantic]
   kOmitFirst9     = 20,
                       ^
g++ -I/usr/share/R/include -DNDEBUG     -Wno-sign-compare -fpic  -g -O2 -fstack-protector --param=ssp-buffer-size=4 -Wformat -Werror=format-security -D_FORTIFY_SOURCE=2 -g  -O3 -Wall -pipe -Wno-unused -pedantic  -c enc/utf8_util.cc -o enc/utf8_util.o
ar rcs libenc.a enc/backward_references.o enc/block_splitter.o enc/brotli_bit_stream.o enc/encode.o enc/encode_parallel.o enc/entropy_encode.o enc/streams.o enc/histogram.o enc/literal_cost.o enc/metablock.o enc/static_dict.o enc/utf8_util.o
g++ -shared -L/usr/lib/R/lib -Wl,-Bsymbolic-functions -Wl,-z,relro -o brotli.so wrapper.o -L. -lenc -ldec -L/usr/lib/R/lib -lR
installing to /usr/local/lib/R/site-library/brotli/libs

Incompilable with Intel, few tweaks and options missing

Hi Brotli team,
despite me being one of Google "haters" let me share my 2 cents on current Brotli status.

In next several months I intend to juxtapose several high-performance textual compressors with one goal in mind - showing most balanced ones for high-ratio/high-decompression-speed scenario.

Yesterday I downloaded your 'master' zip and compiled (with several syntactic changes) with Intel v15.0 optimizer.

In my incoming showdown I want to include Brotli wanting to see how it performs in its best environment, I speak textual (mostly English texts) files.

[Question #1:]
Since my goal is to show tightness&decompression-speed top-performers, are following enforced defaults best?

struct BrotliParams {
BrotliParams()
// : mode(MODE_GENERIC),
// quality(11),
// lgwin(22),
// lgblock(0),
// enable_dictionary(true),
// enable_transforms(false),
// greedy_block_split(false),
// enable_context_modeling(true) {}

  : mode(MODE_TEXT),
    quality(11),
    lgwin(24),
    lgblock(24),
    enable_dictionary(true),
    enable_transforms(false),
    greedy_block_split(false),
    enable_context_modeling(true) {}

It would be very good to make these command line toggleable, no?

[Question #2:]
Your little announcement makes the impression Brotli is something special on text, what do I miss to see that? My quick test shows goodness but not greatness?
The below stats are for your yesterday commit compiled with Intel v15.0 (/O3 used), Brotli outperforms Shifune, but in decompression-speed department 3x is no joke, don't tell me if I use a browser or some English texts full-text browser/searcher Brotli will load 'dickens' faster than Zstd or even Shifune.

D:>bro_Intel15.exe -i dickens -o dickens.brotli -v
Brotli compression speed: 0.200944 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f
Brotli decompression speed: 142.945 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f
Brotli decompression speed: 138.861 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f
Brotli decompression speed: 145.079 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f -r 5
Brotli decompression speed: 144.647 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f -r 5
Brotli decompression speed: 145.513 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f -r 20
Brotli decompression speed: 145.841 MB/s

D:>bro_Intel15.exe -i dickens.brotli -o dickens -v -d -f -r 40
Brotli decompression speed: 144.701 MB/s

D:>Nakamichi_Shifune_branchfull.exe dickens.Nakamichi /report
Nakamichi 'Shifune-Totenschiff', written by Kaze, based on Nobuo Ito's LZSS source, babealicious suggestion by m^2 enforced, muffinesque suggestion by Jim Dempsey enforced.
Note: This compile can handle files up to 1711MB.
Decompressing 3740418 bytes ...
RAM-to-RAM performance: 512 MB/s.
Compression Ratio (bigger-the-better): 2.72:1

D:>dir dic*

09/25/2015 03:32 AM 10,192,446 dickens
09/25/2015 03:29 AM 2,962,118 dickens.brotli
09/08/2015 02:33 AM 3,740,418 dickens.Nakamichi

D:>

The above quick run was done on my Core 2 laptop, on Haswell the 3x may jump up to 5x hands down, hate that I don't have Haswell or alike to share the actual stats.

[Question #3:]
Don't you think that your defaults (encode.h) are too low, I do, my big test shows worse ratio than gzip?

D:>zpaq64 add _Deathship_textual_corpus.tar.method58.zpaq _Deathship_textual_corpus.tar -method 58 -threads 1
D:>bsc e _Deathship_textual_corpus.tar _Deathship_textual_corpus.tar.ST6Block256.bsc -b256 -m6 -cp -Tt
D:>xz -z -k -f -9 -e -v -v --threads=1 _Deathship_textual_corpus.tar
D:>lzturbo.exe -39 -b256 -p0 _Deathship_textual_corpus.tar _Deathship_textual_corpus.tar.256MB.lzturbo12-39.lzt
D:>zpaq64 add _Deathship_textual_corpus.tar.method28.zpaq _Deathship_textual_corpus.tar -method 28 -threads 1
D:>7za a -tgzip -mx9 _Deathship_textual_corpus.tar.zip _Deathship_textual_corpus.tar
D:>bro_Intel15.exe -i _Deathship_textual_corpus.tar -o _Deathship_textual_corpus.tar.brotli -v
D:>zstd.exe _Deathship_textual_corpus.tar
D:>LZ4.exe -9 _Deathship_textual_corpus.tar

09/12/2015 12:59 PM 1,125,281,882 _Deathship_textual_corpus.tar.method58.zpaq
09/12/2015 02:34 AM 1,342,098,184 _Deathship_textual_corpus.tar.ST6Block256.bsc
09/11/2015 11:56 AM 1,471,795,768 _Deathship_textual_corpus.tar.xz
09/13/2015 07:31 PM 1,484,820,599 _Deathship_textual_corpus.tar.256MB.lzturbo12-39.lzt
09/14/2015 09:18 AM 1,800,083,824 _Deathship_textual_corpus.tar.method28.zpaq
Here comes Nakamichi 'Shifune' ...
09/13/2015 06:29 AM 2,181,159,237 _Deathship_textual_corpus.tar.zip
09/24/2015 11:36 PM 2,382,646,308 _Deathship_textual_corpus.tar.brotli
09/13/2015 03:04 AM 2,491,454,533 _Deathship_textual_corpus.tar.zst
09/13/2015 07:50 AM 2,626,828,543 _Deathship_textual_corpus.tar.lz4
09/11/2015 06:41 AM 8,090,119,168 _Deathship_textual_corpus.tar

A glimpse at my unfinished latest benchmark:
www.sanmayce.com/Hayabusa/Deathship_showdown.pdf
www.sanmayce.com/Hayabusa/Nakamichi_Shifune.pdf

[Suggestion #1:]
Your time reports seem problematic, I receive 0 MB/s for big files. Please make Brotli with '-b' benchmark or '-t' test (decompression without dump) ability, Zstd&LZ4 have very good report.
Your current speed report includes 'fwrite()' time, I want Brotli's pure RAM-2-RAM performance.
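
The kind of RAM-to-RAM measurement asked for here could be sketched as follows. The compressed input is already in memory and nothing is written out, so only the decoder is timed. zlib from the Python standard library is used purely as a stand-in codec; the function name and the best-of-N policy are assumptions, not anything the bro tool provides.

```python
import time
import zlib


def decode_throughput_mb_s(compressed: bytes, repeats: int = 5) -> float:
    """Best-of-N in-memory decompression speed, in MB of decoded output per second."""
    best = float("inf")
    size = 0
    for _ in range(repeats):
        start = time.perf_counter()
        size = len(zlib.decompress(compressed))  # stand-in for the decoder under test
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading on very small inputs or coarse timers.
    return size / max(best, 1e-9) / 1e6


if __name__ == "__main__":
    payload = zlib.compress(b"It was the best of times, it was the worst of times. " * 200_000)
    print(f"decompression speed: {decode_throughput_mb_s(payload):.1f} MB/s")
```

Taking the best of several runs plays the same role as the '-r' repeat option used in the timings above, without averaging in file-system noise.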

[Suggestion #2:]
Make it compilable with Intel C/C++ optimizer, this will be appreciated by me for one. Current changes in bro.cc (I made) to run it:
#1:

    //#include <unistd.h>
    #include <time.h>
    #include <fcntl.h>
    #include <io.h>

#2:

static FILE* OpenInputFile(const char* input_path) {
//  if (input_path == 0) {
//    return fdopen(STDIN_FILENO, "rb");
//  }
/*
tools\bro.cc(136): error: identifier "STDIN_FILENO" is undefined
      return fdopen(STDIN_FILENO, "rb");
                    ^
*/
  if (input_path == 0) {
    setmode(_fileno( stdin ), O_BINARY);
    return stdin;
  }

// https://msdn.microsoft.com/en-us/library/aa298581%28v=vs.60%29.aspx
/*
   int result;
   // Set "stdin" to have binary mode:
   result = _setmode( _fileno( stdin ), _O_BINARY );
   if( result == -1 )
      perror( "Cannot set mode" );
   else
      printf( "'stdin' successfully changed to binary mode\n" );
*/

  FILE* f = fopen(input_path, "rb");
  if (f == 0) {
    perror("fopen");
    exit(1);
  }
  return f;
}

static FILE *OpenOutputFile(const char *output_path, const int force) {
//  if (output_path == 0) {
//    return fdopen(STDOUT_FILENO, "wb");
//  }
/*
tools\bro.cc(148): error: identifier "STDOUT_FILENO" is undefined
      return fdopen(STDOUT_FILENO, "wb");
                    ^
*/
  if (output_path == 0) {
    setmode(_fileno( stdout ), O_BINARY);
    return stdout;
  }
  if (!force) {
    struct stat statbuf;
    if (stat(output_path, &statbuf) == 0) {
      fprintf(stderr, "output file exists\n");
      exit(1);
    }
  }
//  int fd = open(output_path, O_CREAT | O_WRONLY | O_TRUNC,
//                S_IRUSR | S_IWUSR);
/*
tools\bro.cc(158): error: identifier "S_IRUSR" is undefined
                  S_IRUSR | S_IWUSR);
                  ^

tools\bro.cc(158): error: identifier "S_IWUSR" is undefined
                  S_IRUSR | S_IWUSR);
                            ^
*/
  FILE* f = fopen(output_path, "wb");
/*
  if (fd < 0) {
    perror("open");
    exit(1);
  }
  return fdopen(fd, "wb");
*/
  if (f == 0) {
    perror("fopen");
    exit(1);
  }
  return f;
}

And the actual console dump of how the compilation went:

// The next log/source is modified (for Windows compatibility) Brotli:

/*
D:\brotli-master>type makeEXE.bat
cd dec
icl /O3 /c bit_reader.c decode.c huffman.c state.c streams.c
cd..
cd enc
icl /O3 /c backward_references.cc block_splitter.cc brotli_bit_stream.cc encode.cc encode_parallel.cc entropy_encode.cc histogram.cc literal_cost.cc metablock.cc static_dict.cc streams.cc
cd..
cd tools
icl /O3 bro.cc ..\dec\bit_reader.obj ..\dec\decode.obj ..\dec\huffman.obj ..\dec\state.obj ..\dec\streams.obj ..\enc\backward_references.obj ..\enc\block_splitter.obj ..\enc\brotli_bit_stream.obj ..\enc\encode.obj ..\enc\encode_parallel.obj ..\enc\entropy_encode.obj ..\enc\histogram.obj ..\enc\literal_cost.obj ..\enc\metablock.obj ..\enc\static_dict.obj ..\enc\streams.obj

D:\brotli-master>makeEXE.bat

D:\brotli-master>cd dec

D:\brotli-master\dec>icl /O3 /c bit_reader.c decode.c huffman.c state.c streams.c
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140726
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

bit_reader.c
decode.c
huffman.c
state.c
streams.c

D:\brotli-master\dec>cd..

D:\brotli-master>cd enc

D:\brotli-master\enc>icl /O3 /c backward_references.cc block_splitter.cc brotli_bit_stream.cc encode.cc encode_parallel.cc entropy_encode.cc histogram.cc literal_cost.cc metablock.cc static_dict.cc streams.cc
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140726
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

backward_references.cc
block_splitter.cc
brotli_bit_stream.cc
encode.cc
encode_parallel.cc
entropy_encode.cc
histogram.cc
literal_cost.cc
metablock.cc
static_dict.cc
streams.cc

D:\brotli-master\enc>cd..

D:\brotli-master>cd tools

D:\brotli-master\tools>icl /O3 bro.cc ..\dec\bit_reader.obj ..\dec\decode.obj ..\dec\huffman.obj ..\dec\state.obj ..\dec\streams.obj ..\enc\backward_references.obj ..\enc\block_splitter.obj ..\enc\brotli_bit_stream.obj ..\enc\encode.obj ..\enc\encode_parallel.obj ..\enc\entropy_encode.obj ..\enc\histogram.obj ..\enc\literal_cost.obj ..\enc\metablock.obj ..\enc\static_dict.obj ..\enc\streams.obj
Intel(R) C++ Intel(R) 64 Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140726
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

bro.cc
Microsoft (R) Incremental Linker Version 10.00.30319.01
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:bro.exe
bro.obj
..\dec\bit_reader.obj
..\dec\decode.obj
..\dec\huffman.obj
..\dec\state.obj
..\dec\streams.obj
..\enc\backward_references.obj
..\enc\block_splitter.obj
..\enc\brotli_bit_stream.obj
..\enc\encode.obj
..\enc\encode_parallel.obj
..\enc\entropy_encode.obj
..\enc\histogram.obj
..\enc\literal_cost.obj
..\enc\metablock.obj
..\enc\static_dict.obj
..\enc\streams.obj

D:\brotli-master\tools>dir br*.exe
 Volume in drive D is S640_Vol5
 Volume Serial Number is 5861-9E6C

 Directory of D:\brotli-master\tools

09/24/2015  06:56 AM         1,250,304 bro.exe
               1 File(s)      1,250,304 bytes
               0 Dir(s)   5,917,040,640 bytes free

D:\brotli-master\tools>bro
;
D:\brotli-master\tools>bro /?
Usage: bro [--force] [--quality n] [--decompress] [--input filename] [--output filename] [--repeat iters] [--verbose]

D:\brotli-master\tools>
*/

And a final note, a byte angry, in your promoting paper you say "Decompresses much faster than current LZMA implementations", usually amateurs like me use 2x, 3x or 15x, your much is not good, one would think from 2x to 20x.
Also why don't you mention the current best (IMO) decompressor on INTERNET?! Not mentioning it (LzTurbo) is like disrespecting not only the man behind it but the BEST as a general notion, yes?

Hope you will refine Brotli and make it usable hi-performance console tool.

Regards,
Kaze

Streamable output from the decoder

From https://code.google.com/p/chromium/issues/detail?id=452335 where we are exploring how to support brotli as an HTTP transfer-encoding method

Comment Nb. 20:
"Chrome's networking stack is a single thread event loop. To prevent arbitrary data from being buffered in memory, and to get data to consumers as fast as possible, this will need to be rewritten in a way for the caller to call it repeatedly to get the data out of it."

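The calling pattern requested in that comment, where the caller repeatedly pulls bounded amounts of output instead of the decoder buffering everything, can be sketched like this. zlib's streaming object from the Python standard library is only a stand-in here; it says nothing about what a streaming Brotli decoder API looks like, and `iter_decoded` is a hypothetical name.

```python
import zlib


def iter_decoded(chunks, max_output=4096):
    """Yield decoded pieces of at most max_output bytes as compressed chunks arrive."""
    decoder = zlib.decompressobj()  # stand-in for a streaming Brotli decoder
    for chunk in chunks:
        pending = chunk
        while pending:
            out = decoder.decompress(pending, max_output)
            if out:
                yield out
            # Input the decoder could not consume because the output cap was hit.
            pending = decoder.unconsumed_tail
    # Whatever the decoder still holds internally is released at the end;
    # it is sliced so that callers never see a piece above the cap.
    tail = decoder.flush()
    for i in range(0, len(tail), max_output):
        yield tail[i:i + max_output]


if __name__ == "__main__":
    original = bytes(range(256)) * 100
    compressed = zlib.compress(original)
    network_chunks = [compressed[i:i + 10] for i in range(0, len(compressed), 10)]
    total = sum(len(piece) for piece in iter_decoded(network_chunks, max_output=1000))
    print(f"decoded {total} bytes in bounded pieces")
```

The consumer drives the loop, so memory use stays bounded by the output cap plus the decoder's own state, which is the property the event-loop constraint above calls for.
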
Is there a way to create a custom dictionary?

Most files of our project have data that is not text, but delta-encoded floats and shorts, and I think it would greatly benefit from creating a dictionary with the most common strings.

Related #165
