x-compressor's Introduction

x – minimalist data compressor

Why?

Because readable and maintainable code is key. x is an easily verifiable and portable lossless data compressor. The source code totals 700 lines, and the core library is under 400 lines of pure C.

Benchmarks

The benchmark measures compression of the reference enwik8 file. All compressors were compiled with GCC 9.2 on 64-bit Linux. The reference system uses an AMD Ryzen Threadripper 2990WX. All measurements use default settings (no extra arguments). The elapsed compression and decompression times (wall clock) are given in seconds. The compression ratio is given as uncompressed size divided by compressed size (higher is better). SLOC means Source Lines Of Code. Bold font indicates the best result.

Compressor     Ratio  Compression time (s)  Decompression time (s)  SLOC
lz4 1.9.2      1.75   0.29                  0.11                    20 619
lzop 1.04      1.78   0.36                  0.33                    17 123
x              1.88   1.03                  0.91                    700
gzip 1.9       2.74   4.69                  0.63                    48 552
zstd 1.3.7     2.80   0.55                  0.18                    111 948
bzip2 1.0.6    3.45   7.39                  3.36                    8 117
xz 5.2.4       3.79   53.70                 1.40                    43 534
brotli 1.0.7   3.88   185.59                0.34                    35 372

The algorithm

x uses adaptive Golomb-Rice coding driven by context modeling. The context model uses the single preceding byte of the uncompressed stream to predict the next byte. The compressor can switch between a fast compression mode (default) and a multi-pass high-compression mode.
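
As a rough illustration of the idea, the following C sketch pairs a Golomb-Rice coder with a parameter k that adapts separately for each previous-byte context. It is not taken from x's source: the bit buffer, all structure and function names, the running-mean rule for choosing k, and the choice to code raw byte values directly are assumptions made for illustration, and x's actual model and bitstream may differ.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical fixed-size bit buffer; bits are stored MSB-first.
 * No bounds checks: the caller must keep the output under 4096 bytes. */
struct bitbuf {
    uint8_t data[4096];
    size_t pos; /* index of the next bit to write or read */
};

static void bitbuf_init(struct bitbuf *b)
{
    memset(b, 0, sizeof *b); /* put_bit() only sets ones, so start zeroed */
}

static void put_bit(struct bitbuf *b, int bit)
{
    if (bit)
        b->data[b->pos >> 3] |= (uint8_t)(0x80u >> (b->pos & 7));
    b->pos++;
}

static int get_bit(struct bitbuf *b)
{
    int bit = (int)((b->data[b->pos >> 3] >> (7 - (b->pos & 7))) & 1u);
    b->pos++;
    return bit;
}

/* Golomb-Rice code with parameter k (k < 32): the quotient n >> k in unary
 * (that many ones, then a zero), followed by the k low bits of n. */
static void rice_encode(struct bitbuf *b, uint32_t n, unsigned k)
{
    uint32_t q = n >> k;
    while (q--)
        put_bit(b, 1);
    put_bit(b, 0);
    for (unsigned i = k; i-- > 0; )
        put_bit(b, (int)((n >> i) & 1u));
}

static uint32_t rice_decode(struct bitbuf *b, unsigned k)
{
    uint32_t q = 0, r = 0;
    while (get_bit(b))
        q++;
    for (unsigned i = 0; i < k; i++)
        r = (r << 1) | (uint32_t)get_bit(b);
    return (q << k) | r;
}

/* Assumed adaptation rule: per context, track a decaying sum and count of
 * coded values and pick the smallest k with count * 2^k >= sum. */
struct rice_ctx { uint32_t sum, count; };

static unsigned rice_k(const struct rice_ctx *c)
{
    unsigned k = 0;
    while (k < 24 && (c->count << k) < c->sum)
        k++;
    return k;
}

static void rice_update(struct rice_ctx *c, uint8_t v)
{
    c->sum += v;
    if (++c->count == 64) { /* halve periodically so old data fades */
        c->sum >>= 1;
        c->count >>= 1;
    }
}

static void model_init(struct rice_ctx ctx[256])
{
    for (int i = 0; i < 256; i++) {
        ctx[i].sum = 0;
        ctx[i].count = 1;
    }
}

/* Order-1 context: the previous byte selects which rice_ctx is used. */
static void encode_bytes(struct bitbuf *b, const uint8_t *src, size_t n)
{
    struct rice_ctx ctx[256];
    uint8_t prev = 0;
    model_init(ctx);
    for (size_t i = 0; i < n; i++) {
        rice_encode(b, src[i], rice_k(&ctx[prev]));
        rice_update(&ctx[prev], src[i]);
        prev = src[i];
    }
}

static void decode_bytes(struct bitbuf *b, uint8_t *dst, size_t n)
{
    struct rice_ctx ctx[256];
    uint8_t prev = 0;
    model_init(ctx);
    for (size_t i = 0; i < n; i++) {
        dst[i] = (uint8_t)rice_decode(b, rice_k(&ctx[prev]));
        rice_update(&ctx[prev], dst[i]);
        prev = dst[i];
    }
}
```

To try it, call bitbuf_init, then encode_bytes, reset pos to 0, and call decode_bytes with the same length. Because the decoder updates its per-context statistics exactly as the encoder does, both sides derive the same k at every step without transmitting it.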

How to build?

make BUILD=release

or

make build-pgo

How to use?

Compress:

./x INPUT-FILE [OUTPUT-FILE]

Decompress:

./unx INPUT-FILE [OUTPUT-FILE]

Authors

License

This project is licensed under the MIT License. See the LICENSE.md file for details.

x-compressor's People

Contributors

xbarin02

x-compressor's Issues

[Note] I ported the decompressor to 8086 assembly

You can find the source at https://hg.ulukai.org/ecm/inicomp/file/1229f30865d2/x.asm

As noted in the initial commit, it is based on git commit 71afd5b05eb4.

To save memory, I limited the ctFreq field to 16 bits. (A carry out of the top bit is detected as an error.) Even so, the context tables require 256 KiB. I also had to add some layer threshold memory (the word "layout" here is in error) so that there is enough space to decompress subsequent layers. (The 96 kB kernel compresses to 62 kB and uses two layers.)
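
In C terms, the 16-bit limit with carry detection might look like the sketch below. The helper name and its return convention are hypothetical and are not taken from x.asm or from x's C sources. The sketch also leaves the counter unchanged on overflow, which is an assumption about the desired behavior.

```c
#include <stdint.h>

/* Hypothetical helper: bump a 16-bit frequency counter.
 * Returns 0 on success, or -1 if the increment would carry out of bit 15
 * (the counter is then left unchanged so the caller can report an error). */
static int freq16_increment(uint16_t *freq)
{
    uint16_t next = (uint16_t)(*freq + 1u);
    if (next == 0) /* wrapped from 0xFFFF: carry out of the top bit */
        return -1;
    *freq = next;
    return 0;
}
```

A decoder built this way would stop with an error as soon as any counter reached 0xFFFF and was incremented again, instead of silently wrapping to zero.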

I have not yet inlined the functions that are called only once, though I do not expect much speed-up from that. I also have not yet added support for overlapping source and destination buffers, which all four of the other decompressors support.

So far it appears to be the slowest of all five decompressors. I think the reason is that every byte requires calls into bio_read_gr, increment_frequency, and update_model. For all the other (mostly LZ-like) decompressors, I was able to use repeated string instructions to write several literal or match bytes in one step.

It is also the first decompressor to need additional memory (beyond stack frame variables), and the compression ratio is not especially good either. However, the port was interesting to me. Thanks for your work!
