Git Product home page Git Product logo

lz77's Introduction

LZ77 Compression Algorithm

The most widely used general purpose compressors (gzip, zstd) are based on LZ77. It encodes data by referencing earlier occurrences of the same data within a sliding window.

Key Features

  • Universal Compression: Theoretically, LZ77 achieves the entropy rate of any stationary process, making it a versatile tool for diverse data types.
  • Efficient Match Finding: Utilizes a hash-based indexing approach for quick identification of repeated sequences.
  • Adaptability: Effective for text, binary files, and images.

Usage

  • Building the Project:

    make lz77
  • Compression:

    ./lz77 -encode [input file] [output file]
  • Decompression:

    ./lz77 -decode [input file] [output file]

Implementation Details

Data Structures

  • LZ77Context: Manages the state, including buffers, hash table, and counters.

Core Functions

  • UpdateHash: Updates index references in the hash table.
  • FindMatch: Identifies the longest match in the input data.
  • WriteLiteral: Handles literal (uncompressed) data blocks.
  • WriteCompressedBlock: Manages the encoding of compressed data.
  • LZ77Encode: Core compression function.
  • LZ77Decode: Core decompression function.

Current Lmitations & Future Improvements

  1. Memory Efficiency: The entire output buffer is held in memory during decompression, which could be optimized for large files.
  2. Compression Optimality: The current greedy parsing strategy may not always yield the best compression ratio. Future updates could explore more sophisticated strategies.
  3. Special Case Handling: Enhancements are needed for better handling of long repeating sequences.

Benchmarks

Below is a comparison of LZ77 compression against gzip across various file types:

file raw size lz77 ratio gzip ratio
eff.html 41684 2.37 4.32
the_brothers_karamazov.txt 2004392 1.24 2.78
sev_financial_statement_data_sets_2023Q3_num.txt 281614209 5.63 10.87
bootstrap-3.3.6.min.css 121260 3.10 6.19
jquery-2.1.4.min.js 84344 1.56 2.85
leveldb.tar 3041280 1.30 1.43
random.bin (random bytes) 1000000 0.995 0.999
repeat_As.txt 1000000 84.98 982.32
cat.png 1056320 1.00 1.00

Data for some files were sourced from the Squash Compression Corpus.

All sizes are in bytes.

Resources

For further reading and understanding of the underlying principles of data compression, refer to Stanford EE274: Data Compression Course Notes.

lz77's People

Contributors

jssonx avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.