Git Product home page Git Product logo

huffmancoding's Introduction

Huffman Encoding

The algorithm is divided into the following steps:

  • Stream in data from a text file and record a map of symbols with their number of occurences (e.g. map<char,int>)
  • Create Binary Tree nodes for each of the symbols as leaves with their respective frequencies and store them in a priority queue where they are arranged with lowest frequency --> highest priority.
  • Pop two nodes from the queue and create an internal node for them with the frequency being the sum of the two nodes. Add the internal node back into the priority queue.
  • Have the two nodes point to the internal node as its children (left and right). Repeat this procedure until there is just one node (root) left.

Building

$ make compress
$ make decompress

Running

$ compress <file_to_compress>
$ decompress <file_to_decompress> <csm_file>

huffmancoding's People

Contributors

rap2363 avatar rparanjpe-tesla avatar

Watchers

 avatar  avatar

huffmancoding's Issues

Compression/Decompression without Memory Allocation

Technically we can write directly from one file to the other. We allocate character buffers in order to make the read/writes faster. But is there a way to maintain the read/write speed while allocating little to no extra memory on the Heap? e.g. like gzip?

Makefile

Currently you have to build the binaries using terrible instructions in the README.

Why isn't BinaryEncoder abstract?

It contains no member variables, is there a reason to not have this class abstract and avoid unnecessary variable declarations: e.g. BinaryEncoder be() ?

Memory Allocation in Binary Encoder

Currently we allocate a ton of memory for the BinaryEncoder object.

It has 4 interface methods:
streamCharacterFileIn (reads in a text files and stores the bits as bools).
streamBinaryFileOut (streams out the currently held bits).
streamBinaryFileIn (streams in bits from a bin file and stores the bits as bools).
streamCharacterFileOut (streams out stored bits into a character file).

Logistically, streamCharacterFileIn and streamBinaryFileOut are called directly with each other in compress.cpp, and the same is true for the latter two methods in decompress.cpp. From an efficiency standpoint, we don't need to actually store any bits in the BinaryEncoder. It could just read from one file, and write to another (using a previously allocated character buffer in between). Storing the bits as bools however would require 8x more memory as bools are stored as bytes anyway!

Merging .bin, .csm

Find a way to merge .bin and .csm so that we don't need two separate files. Then we can have:
bin/compress file -> file.bin, bin/decompress file.bin -> file

Update README

Specifically the description of the algorithm/usage of the binaries. Also include ideas for future development.

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.