Comments (27)
I can confirm these results. At least for the example above, brotli is about 10x slower with the default settings and still compresses much worse than xz:
> # Compare size
> length(x)
[1] 6779000
> system.time(print(length(brotli_compress(x))))
[1] 1369587
user system elapsed
26.702 0.672 27.371
> system.time(print(length(brotli_compress(x, mode = "font"))))
[1] 1353634
user system elapsed
26.417 0.624 27.038
> system.time(print(length(memCompress(x, "gzip"))))
[1] 2323522
user system elapsed
0.358 0.001 0.359
> system.time(print(length(memCompress(x, "bzip2"))))
[1] 2289619
user system elapsed
0.661 0.004 0.664
> system.time(print(length(memCompress(x, "xz"))))
[1] 937728
user system elapsed
2.006 0.045 2.050
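For readers without R, the same kind of comparison can be approximated with the Python standard library. This is a rough sketch on synthetic data (the real mesh file from the thread will behave differently, and brotli itself needs the third-party `brotli` package, omitted here):

```python
import bz2
import lzma
import time
import zlib

# Synthetic stand-in for the mesh file discussed above (real data differs).
x = bytes(range(256)) * 4000  # ~1 MB of mildly compressible data

results = {}
for name, compress in [
    ("gzip", lambda d: zlib.compress(d, 9)),
    ("bzip2", lambda d: bz2.compress(d, 9)),
    ("xz", lambda d: lzma.compress(d, preset=6)),
]:
    t0 = time.perf_counter()
    out = compress(x)
    results[name] = len(out)
    print(f"{name}: {len(out)} bytes in {time.perf_counter() - t0:.3f}s")
```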
from brotli.
@bhouston why are you stuck with JavaScript-based compression? There are C implementations of LZMA.
@jeroenooms I want to send LZMA compressed mesh data to our WebGL viewer that runs in JavaScript:
https://clara.io/view/d3b82831-d56b-462f-b30c-500ea1c7f870
Thus I need browser-based client-side decompression. I would love it if LZMA or an equivalent (I was hoping Brotli) were integrated into all browsers, but as it stands, only gzip and deflate are available. Thus you have to use LZMA.js for browser-based LZMA decompression.
Perhaps their main selling point is that decompression is faster than for lzma, to make things easier on the client. But I agree that the difference in compression ratio is very big, given that the proposal states:
This specification defines a lossless compressed data format that compresses data using a combination of the LZ77 algorithm and Huffman coding, with efficiency comparable to the best currently available general-purpose compression methods.
@jeroenooms I think that LZMA also requires significant maximum memory for decompression, which Brotli may be improving upon.
It would be neat if you could re-run your statistics with decompression times and maximum memory instead of compression times, if that is easy. Maybe that will better emphasize Brotli's benefits?
@bhouston yes, it seems like brotli does outperform everyone in terms of decompression speed. It is similar to, and sometimes a bit faster than, gzip. But I have yet to find an example where the compression ratio is better than xz or bzip2.
Where can I read about the TriMesh binary format?
I have a feeling that transposing and demuxing the tables would make the file more compressible, especially for encoders with lower-rank context modeling (zlib and brotli).
I can also confirm that brotli sucks with some binary and text files using the GENERIC / TEXT setting! I can zip up the brotli source code on my Mac down to 8.3MB using the standard archiver utility. If I try to archive the brotli source code using brotli, it increases from 19.8MB to 33.3MB with a quality setting of 6! Even with a quality of 11 it is still 31MB! I have tried both ways: combining the binaries of all files into one file and compressing, or compressing individual files and combining them into one file. Either way it works out to the same amount somehow.
Now, I have to hand it to brotli when it comes to a 5200x2500 image file which is 2.8MB in jpeg form at 100% quality. Brotli is able to compress it down to 900KB with just a quality of 6. With a quality of 11 it gets it down to 813KB roughly. It is a simple graphic of a logo though. However, it is better than a PNG version of the logo. A PNG version of the logo is 1.98MB. So, hey at least that is an improvement!
I have checked lzturbo 1.2 ( https://sites.google.com/site/powturbo/home ) on this file:
- -49 mode: 894,729
- -39 mode: 1,256,602
and LZA 0.82 ( http://encode.ru/threads/1969-LZA-archiver ): 995,014
I think the main issue is that Brotli still uses Huffman coding, which is terrible at the skewed probability distributions characteristic of some types of data. More recent compressors use ANS coding instead (lzturbo, LZA, LZNA, ZSTD, Apple LZFSE), which fixes this while maintaining speed.
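The Huffman limitation is easy to quantify: a symbol with probability p ideally costs -log2(p) bits, but a Huffman code cannot spend less than one whole bit per symbol, while ANS and arithmetic coders approach the entropy. A simplified illustration of the per-symbol bound (mine, not from the thread; Brotli's actual alphabets are larger, which softens the effect):

```python
import math

# Cost per symbol for a binary source with skewed probability p.
# Shannon entropy is the optimum; a binary Huffman code must spend at
# least 1 bit per symbol, while ANS/arithmetic coders approach entropy.
for p in (0.5, 0.9, 0.99):
    entropy = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
    huffman = 1.0  # binary Huffman assigns a whole bit to each symbol
    print(f"p={p}: entropy={entropy:.3f} bits/symbol, Huffman={huffman:.1f}")
```

At p=0.99 the entropy is under 0.1 bits/symbol, so Huffman wastes more than 10x.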
My guess was right. Even without knowing the format, it is easy to improve compression ratio.
Step 1: split odd and even bytes, e.g. with python script
f = open("input", "rb")
K = 2
d = [[] for k in range(K)]
i = 0
b = f.read(1)
while b != b"":          # b"" marks EOF (Python 3; on Python 2 compare to "")
    d[i % K].append(b)
    i = i + 1
    b = f.read(1)
f.close()
f = open("output", "wb")
for k in range(K):
    f.write(b"".join(d[k]))
f.close()
Step 2: compress
Step 3: profit!
- Zopfli: 1,847,682 bytes (~22% smaller)
- Brotli: 1,172,693 bytes (~29% smaller)
I believe that with smarter transform results will be even better.
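For completeness, a decoder needs the inverse transform. A minimal Python 3 sketch (my own, assuming the K strided byte-streams are simply concatenated, as in the script above):

```python
def deinterleave(data: bytes, k: int = 2) -> bytes:
    # Forward transform: split into k strided byte streams, concatenated.
    return b"".join(data[i::k] for i in range(k))

def reinterleave(data: bytes, k: int = 2) -> bytes:
    # Inverse transform: stream i holds ceil((n - i) / k) bytes.
    n = len(data)
    out = bytearray(n)
    pos = 0
    for i in range(k):
        size = (n - i + k - 1) // k
        out[i::k] = data[pos:pos + size]
        pos += size
    return bytes(out)

# Round-trip check on a small buffer.
sample = bytes(range(11))
assert reinterleave(deinterleave(sample)) == sample
```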
My guess was right. Even without knowing the format, it is easy to improve compression ratio. Step 1: split odd and even bytes, e.g. with python script
Weird approach, but it gave good results. The file is a bit of a mish-mash between float and integer data. It will exhibit a 32-bit alignment pattern as it is filled mostly with 32-bit floats and 32-bit integers, with some exceptions, and we align all internal data array starts on 32-bit boundaries for JavaScript ArrayBuffer access efficiency.
As some have asked, a trimesh / polymesh format is for representing piecewise-linear objects in 3D. This format supports both trimeshes and polygon meshes:
https://en.wikipedia.org/wiki/Triangle_mesh
https://en.wikipedia.org/wiki/Polygon_mesh
It is composed of a list of 3D positions (X, Y, Z as 32-bit float triples), then a list of integers stating how those points are connected to form triangles or polygons (one list giving the number of points per face/polygon, and another listing the indices for each face). It then contains normals (more XYZ 32-bit float triples), UVs (UV 32-bit float pairs), and more integers that state how these are used by the faces/polygons (these are called polygon mappings). The format changes the precision of its integers from 8 to 16 to 32 bits depending on their required maximum values, but I believe this file is mostly 32-bit integers because it is on the larger side (more than 65K vertices/points, which exceeds the range of 16-bit integers).
One of the reasons might be that the first 9 bits of a float ( https://en.wikipedia.org/wiki/Single-precision_floating-point_format ) are the sign and exponent, which should be nearly the same for most points. The next 8 bits, the most significant bits of the fraction, should also be similar for neighboring points.
So if you group even or odd positions, you bring these similar bytes closer together, making it simpler for the compressor to exploit their similarity (most compressors work on bytes).
But generally, these formats seem extremely wasteful - you could probably get an order of magnitude better compression with a cheap but dedicated compressor.
Starting with storing differences instead of absolute positions; additionally, an entropy coder should be used for the most significant digits.
Storing the mesh is a graph compression problem, and a mesh is a very specific one - it should be cheaply compressible.
My first thought is to cover all triangles with a tree (like in https://en.wikipedia.org/wiki/File:Icosaedro_desarrollo.gif ) - then go through the tree and add new information for each node: usually one vertex per node - just store the difference of its position to the previous ones.
Or even less as it has to agree with neighbors - one distance per node (difference from equilateral triangle) seems sufficient.
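The "store differences" idea above can be sketched in a few lines; the fixed-point quantization step and the LZMA back-end here are my own illustrative choices, not something from the thread:

```python
import lzma
import struct

def delta_encode(values, step=1e-4):
    # Quantize to fixed point, then store differences between consecutive
    # values; for smooth geometry the deltas are small and repetitive.
    q = [round(v / step) for v in values]
    deltas = [q[0]] + [b - a for a, b in zip(q, q[1:])]
    return struct.pack(f"<{len(deltas)}i", *deltas)

# Toy data: a smooth ramp of coordinates, as a stand-in for vertex positions.
xs = [i * 0.001 for i in range(10000)]
raw = struct.pack(f"<{len(xs)}f", *xs)
print(len(lzma.compress(raw)), "bytes raw vs",
      len(lzma.compress(delta_encode(xs))), "bytes delta-encoded")
```

On real meshes the vertices would first need an ordering that makes neighbors adjacent, as discussed below in the thread.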
But generally, these formats seem extremely wasteful - you could probably get an order of magnitude better compression with a cheap but dedicated compressor.
Our data is representative of a lot of non-video large binary streams -- it is composed of long streams of floats and integers, exactly what should be in a binary data stream. Our case is an easy one because they are even 32-bit boundary aligned. Of course we could write a custom compressor but then it will be in JavaScript again and we lose the speed and memory benefits of having it integrated into the browser. It would also take a lot of time for us and also force that on all other 3D developers -- thus it is not a globally efficient solution nor a locally efficient solution.
I would prefer to just use LZMA.js in that case or a browser-based compression tool that produces similar results. Basically saying that I have to write a custom compression scheme because Brotli is inadequate compared to LZMA is not an acceptable answer. We have good general purpose compression schemes for a reason and right now Brotli isn't one, even though it claims to be.
If decompression in JavaScript is your requirement, LZMA would still be much slower than the dedicated compressor I have sketched.
But generally I also think that it is too early to declare Brotli a new standard - it is still far from state-of-the-art compressors ( http://encode.ru/threads/2313-Brotli?p=44970&viewfull=1#post44970 ) and uses Huffman, making it perform terribly on files with skewed probability distributions.
My first thought is to cover all triangles with a tree (like in https://en.wikipedia.org/wiki/File:Icosaedro_desarrollo.gif ) - then go through the tree and add new information for each node: usually one vertex per node - just store the difference of its position to the previous ones.
Or even less as it has to agree with neighbors - one distance per node (difference from equilateral triangle) seems sufficient.
This sounds like the triangle strip idea: https://en.wikipedia.org/wiki/Triangle_strip
But the standard storage format for Triangle Strips is still a list of vertices as XYZ 32 bit floats and then a long list of 32 bit integers.
I am interested a bit in general purpose compressors, but only if they are easy to implement and ultra fast and give better results than LZMA.
So if I understand correctly, one could easily write a special-purpose compressor for lists of 32-bit floats. Basically you want a reordering of the positions so that neighbors are close by, then an adjustment of their space to be fully positive in XYZ, then a reordering of their internal bytes, and then you feed this into a general-purpose compression stream of some type -- what would you recommend? One can do that fairly simply, I guess. This covers the compression of the 32-bit floats.
What would you do with the 32bit integers to maximize compression? If one does a neighbor reordering of the XYZ positions, that imposes an ordering on the 32bit integers which reference those points. Basically one can only impose an order on either the integers or the XYZ positions, but not both easily.
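One cheap, concrete instance of "reorder their internal bytes, then feed a general-purpose stream" is to regroup packed records byte-column by byte-column before compressing, so the similar sign/exponent bytes sit together. A sketch, assuming tightly packed little-endian 32-bit floats (the toy data and LZMA back-end are my own choices):

```python
import lzma
import math
import struct

def split_planes(data: bytes, record_size: int = 12) -> bytes:
    # Treat the buffer as records of `record_size` bytes (e.g. XYZ float
    # triples) and regroup it column by column, so the sign/exponent byte
    # of every X coordinate ends up adjacent to the others, and so on.
    return b"".join(data[i::record_size] for i in range(record_size))

# Toy mesh: points along a smooth curve, packed as interleaved XYZ floats.
pts = []
for i in range(3000):
    a = i * 0.01
    pts += [math.cos(a), math.sin(a), a * 0.001]
data = struct.pack(f"<{len(pts)}f", *pts)
print(len(lzma.compress(data)), "bytes interleaved vs",
      len(lzma.compress(split_planes(data))), "bytes plane-split")
```

The transform is lossless and trivially invertible, so it costs almost nothing to try on real data.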
Indeed imposing an order is difficult, better is to directly encode parameters of the "triangle strip":
- first you have to encode the structure of the tree. For a triangular lattice you have a full binary tree - the number of such trees is a Catalan number, so you need ~1 bit/node to mark whether it is a leaf. Compared to 3x32-bit vertex numbers, that means 96x compression.
- then you don't need 3 float numbers per vertex - imagine a triangular lattice you want to bend: there are lots of constraints because the lengths of the triangle edges have to agree. It is sufficient to store 1 length per node of the tree - e.g. the length of its edge on the right (the one on the left comes from its neighbor).
Then decoder would need to perform the bending - it is a few trigonometric operations per node - really cheap comparing to what LZMA does.
Regarding using floats - you store mostly very similar numbers: their exponent and sign (9 of 32 bits) are probably identical, as are the most significant bits of the fraction.
Using fixed-precision numbers instead, you should directly get something like 1/2 savings.
And the most significant digits usually come from some localized probability distribution, especially if you encode differences - you can use an entropy coder for them.
@JarekDuda Very interesting. Our case is a bit more complex than what is outlined, and would require a fair bit of work to deduce, but I bet there is a good algorithm to be had in the end.
I have started a general discussion about shape compression ( http://encode.ru/threads/2324-Data-compression-of-a-%283D%29-shape-e-g-of-TriMesh-or-a-chemical-molecule ) and was pointed to a nice paper:
http://www.cc.gatech.edu/~jarek/papers/Compression.pdf
It says we need ~1 byte/triangle, which agrees with what I have written.
In contrast, the standard you described uses 3x4 bytes per vertex plus ~3x4 bytes/triangle for indices - we are talking about a cheap ~20x compression, 3 times better than the best discussed above (which are much more expensive).
It's sad that standards are so wasteful ... this is what happens when engineers don't talk with theoreticians ...
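The ~20x claim survives a back-of-envelope check (the triangles-per-vertex ratio below assumes a typical closed mesh):

```python
# Standard layout cost, amortized per triangle, for a typical closed mesh
# where #triangles ~= 2 * #vertices (by Euler's formula).
vertex_bytes = 3 * 4           # XYZ as 32-bit floats
index_bytes_per_tri = 3 * 4    # three 32-bit vertex indices per triangle
tris_per_vertex = 2
standard = vertex_bytes / tris_per_vertex + index_bytes_per_tri
compressed = 1.0               # ~1 byte/triangle, per the paper above
print(f"standard ~= {standard:.0f} bytes/triangle, "
      f"i.e. ~{standard / compressed:.0f}x compression headroom")
```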
@JarekDuda Just be careful that there are two major different mesh formats, one is purely for playback in a video game -- limited trimesh format with a fixed number of channels, and there is a second more general polygon format with arbitrary named channels -- also used for playback when one needs additional channels or knowledge of the polygonal structure. Both are heavily used for data transfer needs. I have interests in both.
BTW the current most popular open mesh compression format is OpenCTM. I think it is mostly LZMA: http://openctm.sourceforge.net/ And then this for in browser decompression: https://github.com/jcmellado/js-openctm
I'm now seeing 1'236'531 bytes with --quality 11 --window 24. I have some ideas (using more LSB6/MSB6 context modeling and fewer blocks) that we will try out later this year, with an expected 5-10% more compression for this kind of data.
BTW the current most popular open mesh compression format is OpenCTM. I think it is mostly LZMA:
It is LZMA plus vertex and triangle reordering to improve compression:
https://github.com/Danny02/OpenCTM/blob/master/lib/compressMG2.c#L191-L212
This scheme could probably be improved, but there are several constraints to take into account. IMHO the input mesh topology is the biggest one: for example, you cannot perform triangle-strip compression on a triangle soup, where all vertices are disconnected.
I understand that re-ordering the vertices, as well as splitting out the X, Y and Z into linear arrays (X0, X1, X2, X3... Y0, Y1, Y2... Z0, Z1, Z2...), is better.
But maybe you are correct that OpenCTM is the best solution; maybe it is better than I thought. I apologize.
Update: currently brotli at level 11 compresses the example file to 1'236'595. It is still bigger than the LZMA output, but there is space for further encoder improvements.
Those are fairly significant improvements to brotli. That is nice. It doesn't have to hit LZMA's compression level; a 25% improvement over what it was is hugely significant. What was the compression time for this file at level 11?
About 49s (with fixed encoder) on Intel Xeon E5-1650 @ 1200 MHz (powersave mode).
There is a problem in the encoder that makes encoding of this file unexpectedly slow. I'm preparing a fix; it will be published soon.
@bhouston: I wonder if "utf8 webgl-loader" mesh compression with brotli is a better option than gzip? I did a quick test on the utf8 data here: http://webgl-tests.crbs.ucsd.edu/obj-viewer/?model=test
and the brotli ratio seemed comparable to lzma.
60MB obj file (original file)
12MB utf8 file...
4.5MB utf8.gz (using default)
3.4MB utf8.br (using q11, w24)
3.4MB utf8.xz (using default)
The ideas in Won's utf8 webgl-loader do the specialized transform that @eustas (and @JarekDuda) mentioned, but it sounds like you want to avoid any deserializing in JS land?