bcrush

About

This is an example using some of the compression algorithms from BriefLZ to produce output in the format of CRUSH by Ilya Muravyov.

Please note: this is just a quick experiment to see how it would work. It is not production quality and has not been properly tested.

Benchmark

Here are some results on the Silesia compression corpus:

File      Original     bcrush --optimal   crush cx     crushx -9
dickens   10.192.446   3.148.963          3.350.093    3.343.930
mozilla   51.220.480   18.037.611         18.760.573   18.281.301
mr        9.970.564    3.367.533          3.532.785    3.428.968
nci       33.553.445   2.407.286          2.624.037    2.750.658
ooffice   6.152.192    2.832.224          2.958.518    2.871.884
osdb      10.085.684   3.424.687          3.545.632    3.457.335
reymont   6.627.202    1.523.547          1.644.701    1.610.306
samba     21.606.400   4.720.964          4.912.141    4.911.613
sao       7.251.944    5.344.713          5.472.035    5.368.466
webster   41.458.703   9.766.251          10.430.228   10.322.130
xml       5.345.280    535.316            563.744      561.118
x-ray     8.474.240    5.717.405          5.958.603    5.747.141

All sizes are in bytes. Here, crush is the original CRUSH v1.00, and crushx is an implementation of CRUSH with optimal parsing posted on Encode's Forum.

Usage

bcrush uses Meson to generate build systems. To create one for the tools on your platform, and build bcrush, use something along the lines of:

mkdir build
cd build
meson ..
ninja

You can also simply compile and link the source files.

bcrush includes the leparse and btparse algorithms from BriefLZ, which provide compression levels -5 to -9 and the very slow --optimal.

Notes

  • The CRUSH format does not store the size of the compressed block, so I copied the way the CRUSH depacker reads one byte at a time from the file to avoid issues with reading the next block into memory.
  • bcrush only hashes 3 bytes to find matches, which makes it slow on files with many small matches. It might benefit from using two hash tables like CRUSH.
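To illustrate the second note, here is a minimal sketch of match finding with a single table keyed on a 3-byte hash. It is not taken from the bcrush sources: the names hash3, match_length, and find_match, the table size, and the multiplicative hash constant are all assumptions made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define HASH_BITS 12                  /* assumed table size: 4096 entries */
#define HASH_SIZE (1u << HASH_BITS)
#define NO_POS    UINT32_MAX          /* marks an empty table slot */

/* Hash the 3 bytes at p down to HASH_BITS bits (multiplicative hashing). */
static uint32_t hash3(const unsigned char *p)
{
    uint32_t v = (uint32_t) p[0] | ((uint32_t) p[1] << 8) | ((uint32_t) p[2] << 16);
    return (v * 2654435761u) >> (32 - HASH_BITS);
}

/* Length of the common prefix of a and b, limited to max_len bytes. */
static size_t match_length(const unsigned char *a, const unsigned char *b, size_t max_len)
{
    size_t len = 0;
    while (len < max_len && a[len] == b[len]) {
        len++;
    }
    return len;
}

/*
 * Look up the most recent earlier position stored under the hash of the
 * 3 bytes at pos, store pos in its place, and return the match length.
 * Returns 0 if there is no candidate or the candidate matches fewer than
 * 3 bytes (a hash collision). On success *match_pos receives the
 * candidate position. Every table entry must start out as NO_POS.
 */
static size_t find_match(const unsigned char *src, size_t size, size_t pos,
                         uint32_t *table, size_t *match_pos)
{
    uint32_t h, cand;
    size_t len;

    if (pos + 3 > size) {
        return 0;                     /* fewer than 3 bytes left to hash */
    }

    h = hash3(src + pos);
    cand = table[h];
    table[h] = (uint32_t) pos;

    if (cand == NO_POS) {
        return 0;
    }

    len = match_length(src + cand, src + pos, size - pos);
    if (len < 3) {
        return 0;
    }

    *match_pos = cand;
    return len;
}
```

Because the key covers only 3 bytes, data with many short repeats sends a lot of positions to the same slots, and each candidate still has to be verified byte by byte. A second table keyed on a longer sequence, as the note suggests, would be one way to filter candidates earlier.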

License

This project is licensed under the zlib License (Zlib).

Contributors

jibsen

bcrush's Issues

Compile feedback with Intel v15.0

--optimal has become very fast, congratulations. More feedback will be posted on your BriefLZ page...
I successfully compiled it with ICL; just letting you know about these 2 warnings:

E:\_TEXTUAL_MADNESS_bare-minimum_2020-Jan-05\Nakamichi_2020-Feb-02_DOC_oldsources\TEXTORAMIC_part\bcrush>icl /TP /O3 bcrush.c parg.c crush.c crush_depack.c crush_depack_file.c
Intel(R) C++ Compiler XE for applications running on Intel(R) 64, Version 15.0.0.108 Build 20140726
Copyright (C) 1985-2014 Intel Corporation.  All rights reserved.

bcrush.c
parg.c
crush.c
E:\_TEXTUAL_MADNESS_bare-minimum_2020-Jan-05\Nakamichi_2020-Feb-02_DOC_oldsources\TEXTORAMIC_part\bcrush\crush_btparse.h(72): warning #589: transfer of control bypasses initialization of:
            variable "cost" (declared at line 75)
            variable "mpos" (declared at line 76)
            variable "mlen" (declared at line 77)
            variable "nodes" (declared at line 78)
            variable "lookup" (declared at line 79)
            variable "next_match_cur" (declared at line 99)
            variable "next_token" (declared at line 252)
            variable "cur" (declared at line 260)
                goto finalize;
                ^

E:\_TEXTUAL_MADNESS_bare-minimum_2020-Jan-05\Nakamichi_2020-Feb-02_DOC_oldsources\TEXTORAMIC_part\bcrush\crush_leparse.h(57): warning #589: transfer of control bypasses initialization of:
            variable "prev" (declared at line 70)
            variable "mpos" (declared at line 71)
            variable "mlen" (declared at line 72)
            variable "cost" (declared at line 73)
            variable "lookup" (declared at line 74)
            variable "bits" (declared at line 77)
                goto finalize;
                ^

crush_depack.c
crush_depack_file.c
Microsoft (R) Incremental Linker Version 10.00.40219.01
Copyright (C) Microsoft Corporation.  All rights reserved.

-out:bcrush.exe
bcrush.obj
parg.obj
crush.obj
crush_depack.obj
crush_depack_file.obj

E:\_TEXTUAL_MADNESS_bare-minimum_2020-Jan-05\Nakamichi_2020-Feb-02_DOC_oldsources\TEXTORAMIC_part\bcrush>
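
A note on warning #589 in the log above: the /TP option makes the compiler treat the sources as C++, where a goto may not jump past a declaration that has an initializer. In C this is allowed (except for variable-length arrays), so a C build is unaffected. The usual way to keep such code clean under both languages is to declare the locals before the first goto. The function below is a hypothetical illustration of that pattern, not code from bcrush:

```c
#include <stddef.h>

/* Hypothetical example: sum a buffer, with an early exit through a label. */
static unsigned long sum_bytes(const unsigned char *src, size_t size)
{
    /* Both locals are declared before the goto, so no jump to
       finalize bypasses an initialization. */
    unsigned long total = 0;
    size_t i;

    if (src == NULL || size == 0) {
        goto finalize;
    }

    for (i = 0; i < size; i++) {
        total += src[i];
    }

finalize:
    return total;
}
```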
