lzd-rs

This library provides a Rust implementation of LZ double-factor factorization, an efficient grammar-based compression algorithm, proposed in the paper:

K Goto, H Bannai, S Inenaga, and M Takeda. LZD Factorization: Simple and Practical Online Grammar Compression with Variable-to-Fixed Encoding. In CPM, 2015.

Examples

Factorization

use lzd::compressor::Compressor;

fn main() {
    // Input text
    let text = "abaaabababaabbabab".as_bytes();

    // Factorization
    let mut factors = Vec::new();
    let defined_factors = Compressor::run(text, |id: usize| {
        factors.push(id);
    });

    // Output factors
    println!("factors: {:?}", factors);

    // Statistics
    println!("defined_factors: {:?}", defined_factors);
}

The output will be

factors: [97, 98, 97, 97, 256, 256, 256, 257, 98, 98, 258]
defined_factors: 261

NOTE: In this implementation, all 256 single-byte characters are predefined as factors (ids 0 to 255), so the reported count is 261: the 256 predefined factors plus the 5 factors (ids 256 to 260) defined from this input.
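To make the numbering concrete, here is a minimal, naive sketch of LZD factorization that does not use this crate. The names `naive_lzd` and `longest_match` are hypothetical, and the hash-map lookup is an assumption made for brevity, not how the library is implemented. Each step takes the longest previously defined factor (or a single byte) twice and defines their concatenation as the next factor.

```rust
use std::collections::HashMap;

/// Longest previously defined factor that is a prefix of `rest`,
/// falling back to the single byte `rest[0]` (ids 0..=255).
/// Returns (factor id, matched length). `rest` must be non-empty.
fn longest_match(dict: &HashMap<Vec<u8>, usize>, max_len: usize, rest: &[u8]) -> (usize, usize) {
    let limit = max_len.min(rest.len());
    for len in (2..=limit).rev() {
        if let Some(&id) = dict.get(&rest[..len]) {
            return (id, len);
        }
    }
    (rest[0] as usize, 1)
}

/// Naive LZD factorization (illustrative only, not the crate's algorithm).
/// Returns the emitted factor ids and the number of defined factors,
/// counting the 256 predefined single-byte factors.
fn naive_lzd(text: &[u8]) -> (Vec<usize>, usize) {
    let mut dict: HashMap<Vec<u8>, usize> = HashMap::new();
    let mut max_len = 1; // length of the longest factor defined so far
    let mut next_id = 256; // ids below 256 are the predefined bytes
    let mut ids = Vec::new();
    let mut pos = 0;

    while pos < text.len() {
        let start = pos;
        let (left, left_len) = longest_match(&dict, max_len, &text[pos..]);
        ids.push(left);
        pos += left_len;
        if pos == text.len() {
            break; // trailing unpaired factor: nothing new is defined
        }
        let (right, right_len) = longest_match(&dict, max_len, &text[pos..]);
        ids.push(right);
        pos += right_len;

        // The pair (left, right) becomes the next factor.
        dict.insert(text[start..pos].to_vec(), next_id);
        next_id += 1;
        max_len = max_len.max(pos - start);
    }
    (ids, next_id)
}
```

Calling `naive_lzd("abaaabababaabbabab".as_bytes())` yields the same id sequence and count as the output above. The sketch scans candidate lengths linearly, so it is only suitable for small inputs.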

Defactorization

use lzd::decompressor::Decompressor;

fn main() {
    // Input text
    let factors = [97, 98, 97, 97, 256, 256, 256, 257, 98, 98, 258];

    // Defactorization
    let mut text = String::new();
    Decompressor::run(&factors, |c: u8| {
        text.push(c as char);
    });

    // Decoded text
    println!("text: {:?}", text);
}

The output will be

text: "abaaabababaabbabab"
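The decoding direction can be sketched the same way, without the crate. The function `expand_factors` below is a hypothetical helper, written under the assumption that ids are consumed in pairs and that each full pair defines the next factor, starting at id 256; it is not the library's `Decompressor`.

```rust
/// Expands a sequence of LZD factor ids back into bytes
/// (illustrative only, not the crate's implementation).
/// Panics if an id refers to a factor that has not been defined yet.
fn expand_factors(ids: &[usize]) -> Vec<u8> {
    // table[k] holds the expansion of factor 256 + k.
    let mut table: Vec<Vec<u8>> = Vec::new();
    let mut out = Vec::new();

    for pair in ids.chunks(2) {
        let start = out.len();
        for &id in pair {
            if id < 256 {
                // Predefined single-byte factor.
                out.push(id as u8);
            } else {
                out.extend_from_slice(&table[id - 256]);
            }
        }
        // Only a full pair defines a new factor; a trailing single id does not.
        if pair.len() == 2 {
            table.push(out[start..].to_vec());
        }
    }
    out
}
```

With the factor list from the example, `expand_factors(&[97, 98, 97, 97, 256, 256, 256, 257, 98, 98, 258])` returns the bytes of the original text.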

Command line tools

This library provides two command line tools, lzd and unlzd, for compression and decompression. Run either tool with -h to print its command line options.

In the tools, LZ factors are serialized into a binary stream, in the same manner as tdc::BitCoder of tudocomp.

lzd command

It compresses input data and writes the result to a file with the extension lzd. In the following case, english.50MB.lzd will be written as the compressed file.

$ lzd english.50MB
Compressed filename will be /home/kampersanda/dataset/pizzachili/text/english/english.50MB.lzd
52428800 bytes were compressed into 16426243 bytes (31.33%)
52428800 characters were factorized into 6354129 LZD-factors (12.12%)
3177320 LZD-factors were defined
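The figures in this sample output can be cross-checked with a little arithmetic. The sketch below assumes, consistently with the note in the factorization example, that the defined count is the 256 predefined factors plus one new factor per full pair of emitted factors. The helper names `expected_defined` and `percent` are hypothetical and not part of the tools.

```rust
/// Expected number of defined factors for a given number of emitted factors,
/// assuming 256 predefined bytes and one definition per full pair.
fn expected_defined(emitted: u64) -> u64 {
    256 + emitted / 2
}

/// Ratio of `part` to `whole` as a percentage.
fn percent(part: u64, whole: u64) -> f64 {
    100.0 * part as f64 / whole as f64
}
```

Under that assumption, `expected_defined(6354129)` gives 3177320, and `percent(16426243, 52428800)` and `percent(6354129, 52428800)` round to 31.33 and 12.12, matching the lines printed above.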

unlzd command

It decompresses a compressed file and writes the original data to a file with the same name minus the lzd extension. In the following case, english.50MB will be written as the decompressed file.

$ ./target/release/unlzd english.50MB.lzd

Licensing

This library is free software provided under the MIT License.
