
Comments (9)

conor42 commented on August 26, 2024

Nice :)
In theory LZMA2 only supports a dictionary up to 1.5 GB, but in practice the decoder can handle more than this. There may be decoders which balk at 2 GB. I'll try to find out if raising it will cause any trouble.

from fast-lzma2.

ApexMods commented on August 26, 2024

Thanks, Conor. 👍🏻

With only 16 GB of RAM to test on, memory pressure was too high for extensive testing of the 2 GB dictionary (although decoding with e.g. standard p7zip 16.02 worked flawlessly). Running with a 1.5 GB dictionary now (I don't know how to set that limit in the source, as it's defined as 2^n). Raising match finder cycles did not improve compression, so my max compression command currently looks like this:

7za a -mx -myx -ms1024t -mqs -m0=flzma2:a3:d1536m:fb273:mc64:mt16

Haven't figured out how exactly compression level (x), compression analysis (yx), and compression mode (a) influence each other in flzma2. Your source comments mention 3 compression modes, each with 10 different compression levels, and a plethora of other parameters which appear to be unalterable via command line switches.
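For anyone scripting comparisons between these settings, here is a minimal sketch that assembles the method switch from its parts. The helper name `flzma2_method` is hypothetical; the sub-parameters (`a`, `d`, `fb`, `mc`, `mt`) and the other switches are simply the ones from the command above, and nothing here checks whether a given build actually accepts them.

```python
def flzma2_method(mode, dict_size, fast_bytes, match_cycles, threads):
    """Build the -m0=flzma2:... switch from the parameters used in this thread.

    Hypothetical helper: it only joins strings and does not validate ranges.
    """
    parts = [
        "flzma2",
        f"a{mode}",           # compression mode / strategy
        f"d{dict_size}",      # dictionary size, e.g. "1536m"
        f"fb{fast_bytes}",    # fast bytes
        f"mc{match_cycles}",  # match finder cycles
        f"mt{threads}",       # thread count
    ]
    return "-m0=" + ":".join(parts)


# Reassemble the max compression command quoted above (archive and file
# arguments are omitted there, so they are omitted here too).
max_cmd = ["7za", "a", "-mx", "-myx", "-ms1024t", "-mqs",
           flzma2_method(3, "1536m", 273, 64, 16)]
print(" ".join(max_cmd))
```

Changing one argument at a time makes it easier to see which parameter is responsible for a difference in ratio or speed.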

While we're at it, what would the absolute fastest compression setting be? I'm currently using

7za a -mx1 -m0=flzma2:a0

(which performs north of 200GB/h, btw) but I have a feeling there's room for further improvement, as CPU load won't nearly reach 100% on that setting, and it's not disk speed bound, either.

conor42 commented on August 26, 2024

The 10 compression levels are made by combining a mode setting with a number of other settings, so there aren't 10 per mode. The problem with levels is that the best combination of settings depends on the type of data, so results may not be consistent when comparing 1-10.

Fast compression will probably never compare well with hashing strategies because the algorithm has no advantage on small dictionaries. Also, the initial step at depth 0 is single-threaded.

ApexMods commented on August 26, 2024

From studying the source and initial trial and error, I've learned that -m0=flzma2:a(0-3) sets compression strategy, dictionary is indeed limited to 1024m (why?!?), fast bytes top out at 273, and matchfinder cycles go up to mc64. Compression levels seem to be set via standard -mx(0-9) switches, analysis levels -myx(0-9) remain original(?), same seems to be the case for literal context, literal position, and position bits. Will study source further for additional parameters to fiddle with.

On a side note: Wow, this thing is fast!!! Minimal memory use and very good compression.

Amazing job, Conor! Thank you for this gift to the world! =)

conor42 commented on August 26, 2024

Thanks for your comments :)

You must be referring to the 7-zip-zstd implementation. Yes, there isn't much documentation, as far as I recall. The Fast LZMA2 library documentation combined with the source for the 7-zip interface should cover everything. The 1024 MB dictionary limit is a legacy of configuration code from Zstandard, which accepts only logarithmic sizes. I have updated the Fast LZMA2 library to fix this, but FL2_DICTSIZE_MAX still limits it to 1024 MB. This needs to be fixed and tested.
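To illustrate why a purely logarithmic setting cannot express 1.5 GB even though the LZMA2 stream format can, here is a small sketch of how the LZMA2 dictionary-size property byte decodes, as I understand it from the xz file format and the 7-Zip decoder. The function names are made up for this example, and this is not code from Fast LZMA2 itself.

```python
MIB = 1 << 20


def lzma2_dict_size(prop):
    """Decode an LZMA2 dictionary-size property byte (0..40) into bytes.

    Even values give 2^n sizes, odd values give 3 * 2^(n-1) sizes;
    40 is the special maximum of 4 GiB - 1.
    """
    if not 0 <= prop <= 40:
        raise ValueError("LZMA2 dictionary property must be in 0..40")
    if prop == 40:
        return 0xFFFFFFFF
    return (2 | (prop & 1)) << (prop // 2 + 11)


def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0


# List the sizes around the limits discussed in this thread.
for prop in range(34, 40):
    size = lzma2_dict_size(prop)
    kind = "2^n" if is_power_of_two(size) else "3 * 2^n"
    print(f"prop {prop}: {size // MIB} MiB ({kind})")
```

A configuration that stores only the exponent n can reach the even entries (1024 MiB, 2048 MiB) but skips the odd ones such as 1536 MiB.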

ApexMods commented on August 26, 2024

I'm using your excellent library inside the p7zip dev branch (https://github.com/szcnick/p7zip). Had to wrangle with the source a bit, but finally got it to compile on macOS. I am truly amazed at the speed gains and almost laughably low memory requirements for multithreading. Beautiful.

ApexMods commented on August 26, 2024

Oh, and yes - the dictionary limit is the only thing holding it back. I'm maxing settings with a 1GB dictionary on 16 threads here on my machine, and it's barely using 7GB of memory. A dictionary size of 2GB would be perfect to compensate for the slightly lower compression ratio (when compared to a 1.5GB dictionary in memory-munching "slow" LZMA).

ApexMods commented on August 26, 2024

So, just for the fun of it, I compiled again with a modified 2GB dictionary limit (no other modifications). On a 20GB corpus, the compression ratio improved significantly: archive size went from 5.45GB (with a 1GB dictionary) to 5.18GB (with 2GB). Compression time went up from 29 to 35 minutes, and memory usage from 7GB to 14GB. Will try increasing radix cycles next. :)
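To put those figures in relative terms, here is a quick sketch that derives the ratios and percentage changes. It uses only the numbers reported above; the variable names are my own.

```python
corpus_gb = 20.0

# Figures reported above for the two dictionary sizes.
runs = {
    "1GB": {"archive_gb": 5.45, "minutes": 29, "memory_gb": 7},
    "2GB": {"archive_gb": 5.18, "minutes": 35, "memory_gb": 14},
}

for name, run in runs.items():
    ratio = corpus_gb / run["archive_gb"]
    speed = corpus_gb / (run["minutes"] / 60)  # GB per hour
    print(f"{name} dictionary: ratio {ratio:.2f}:1, {speed:.1f} GB/h")

small, large = runs["1GB"], runs["2GB"]
size_saving = (small["archive_gb"] - large["archive_gb"]) / small["archive_gb"]
time_cost = (large["minutes"] - small["minutes"]) / small["minutes"]
memory_factor = large["memory_gb"] / small["memory_gb"]

print(f"archive size reduction: {size_saving:.1%}")
print(f"extra compression time: {time_cost:.1%}")
print(f"memory usage factor: {memory_factor:.1f}x")
```

Seen this way, the larger dictionary buys an archive roughly 5% smaller for about 21% more time and twice the memory on this particular corpus.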

ApexMods commented on August 26, 2024

Still unclear on some things. Does mode (a0-3) override level (x0-10), or is it the other way around? Does analysis level (yx) have any effect at all? Getting inconclusive results here, so a bit confused.

