Git Product home page Git Product logo

huffmancompressor's Introduction

HuffmanCompressor

A program that can losslessly compress and decompress files using Huffman Encoding

Working

Compression

  1. Find frequencies of characters and generate a Huffman Tree
  2. Traverse, and assign bitcodes to each character, and store in map
  3. Using this info, create a bitstring of the input file
  4. Read this bitstring in chunks of 8, using OR and SHIFT operations to get the byte
  5. Write these bytes to file

Decompression

  1. Generate bitstring of compressed file
  2. Go through each bit, comparing with the huffman codes, and appending characters to a vector on match
  3. Write this vector to the file, which would be our uncompressed version

Efficiency

Since we're now storing the table in the compressed file to decouple the compression and decompression phases, the size of our compressed file has increased. So for smaller files, it is actually worse to compress them, since we have to store the table information too. But as file size increases, so does our efficiency, to some extent.

Case in point:

  1. Uncompressed: 4 bytes

  2. Compressed: 128 bytes

  3. Uncompressed: 1100KB

  4. Compressed: 550KB

As we can see, with a higher file size, we have a higher probability of better compression.

Issues

1. There's a bug in which if you have equal frequencies, it doesn't work properly(FIXED)

2. In the implementation, we're appending zeroes to make the bitstring length a multiple of 8. Due to this, during the decompression phase, some unwanted characters are added(At most 7). This can be fixed by storing the amount we padded in the file.

3. It is quite slow. I'm sure there's a better way than having to muck around in converting to bitstrings over and over again.

To Do

  1. Make it faster
  2. Find way to store codes efficiently

1. Store the padded amount in the file

2. Store the table in the file header so that we don't need to depend externally

3. Fix the same frequency bug

huffmancompressor's People

Contributors

kishore-ganesh avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar

huffmancompressor's Issues

bug

the readFileIntoBuffer function returns exit status -1 without any error messages sometimes when the compressFile function is used in a function or if the file was previously opened.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.