heatshrink's Introduction

heatshrink

A data compression/decompression library for embedded/real-time systems.

Key Features:

  • Low memory usage (as low as 50 bytes): It is useful for some cases with less than 50 bytes, and useful for many general cases with < 300 bytes.
  • Incremental, bounded CPU use: You can chew on input data in arbitrarily tiny bites. This is a useful property in hard real-time environments.
  • Can use either static or dynamic memory allocation: The library doesn't impose any constraints on memory management.
  • ISC license: You can use it freely, even for commercial purposes.

Getting Started:

There is a standalone command-line program, heatshrink, but the encoder and decoder can also be used as libraries, independent of each other. To do so, copy heatshrink_common.h, heatshrink_config.h, and either heatshrink_encoder.c or heatshrink_decoder.c (and their respective header) into your project. For projects that use both, static libraries are built that use static and dynamic allocation.

Dynamic allocation is used by default, but in an embedded context, you probably want to statically allocate the encoder/decoder. Set HEATSHRINK_DYNAMIC_ALLOC to 0 in heatshrink_config.h.

Basic Usage

  1. Allocate a heatshrink_encoder or heatshrink_decoder state machine using their alloc function, or statically allocate one and call their reset function to initialize them. (See below for configuration options.)

  2. Use sink to sink an input buffer into the state machine. The input_size pointer argument will be set to indicate how many bytes of the input buffer were actually consumed. (If 0 bytes were consumed, the buffer is full.)

  3. Use poll to move output from the state machine into an output buffer. The output_size pointer argument will be set to indicate how many bytes were output, and the function return value will indicate whether further output is available. (The state machine may not output any data until it has received enough input.)

Repeat steps 2 and 3 to stream data through the state machine. Since it's doing data compression, the input and output sizes can vary significantly. Looping will be necessary to buffer the input and output as the data is processed.

  4. When the end of the input stream is reached, call finish to notify the state machine that no more input is available. The return value from finish will indicate whether any output remains. If so, call poll to get more.

Continue calling finish and polling to flush remaining output until finish indicates that the output has been exhausted.

Sinking more data after finish has been called will not work without calling reset on the state machine.
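The sink, poll, and finish steps above can be sketched as a loop. This is a minimal illustration only: toy_codec and the toy_* functions are hypothetical stand-ins that copy bytes through a 4-byte internal buffer, not the heatshrink API. They mirror its shape so the looping pattern compiles on its own; the real functions and return codes are documented in heatshrink_encoder.h and heatshrink_decoder.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an encoder or decoder: it only copies bytes,
 * but it exposes the same sink/poll/finish shape. Not the heatshrink API. */
typedef struct {
    uint8_t buf[4];  /* deliberately tiny, so the caller has to loop */
    size_t used;
} toy_codec;

enum { TOY_POLL_EMPTY, TOY_POLL_MORE };
enum { TOY_FINISH_DONE, TOY_FINISH_MORE };

/* Accept as much input as fits; *consumed == 0 means the buffer is full. */
void toy_sink(toy_codec *c, const uint8_t *in, size_t n, size_t *consumed) {
    size_t room = sizeof c->buf - c->used;
    size_t take = n < room ? n : room;
    memcpy(c->buf + c->used, in, take);
    c->used += take;
    *consumed = take;
}

/* Move buffered bytes out; the return value says whether more remain. */
int toy_poll(toy_codec *c, uint8_t *out, size_t cap, size_t *produced) {
    size_t give = c->used < cap ? c->used : cap;
    memcpy(out, c->buf, give);
    memmove(c->buf, c->buf + give, c->used - give);
    c->used -= give;
    *produced = give;
    return c->used > 0 ? TOY_POLL_MORE : TOY_POLL_EMPTY;
}

/* Signal end of input; report whether output is still pending. */
int toy_finish(const toy_codec *c) {
    return c->used > 0 ? TOY_FINISH_MORE : TOY_FINISH_DONE;
}

/* Poll until the state machine reports it is empty, appending to dst. */
static size_t drain(toy_codec *c, uint8_t *dst, size_t dst_cap, size_t written) {
    uint8_t out[3];
    size_t n = 0;
    int pres;
    do {
        pres = toy_poll(c, out, sizeof out, &n);
        assert(written + n <= dst_cap);
        memcpy(dst + written, out, n);
        written += n;
    } while (pres == TOY_POLL_MORE);
    return written;
}

/* Sink, poll, repeat; then finish and flush whatever is left. */
size_t pump(toy_codec *c, const uint8_t *in, size_t in_len,
            uint8_t *dst, size_t dst_cap) {
    size_t sunk = 0, written = 0, n = 0;
    while (sunk < in_len) {
        toy_sink(c, in + sunk, in_len - sunk, &n);
        sunk += n;
        written = drain(c, dst, dst_cap, written);
    }
    while (toy_finish(c) == TOY_FINISH_MORE) {
        written = drain(c, dst, dst_cap, written);
    }
    return written;
}
```

Because the toy only copies bytes, output length equals input length. With a real compressor the two differ, which is why both the inner and outer loops are needed.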

Configuration

heatshrink has a couple configuration options, which impact its resource usage and how effectively it can compress data. These are set when dynamically allocating an encoder or decoder, or in heatshrink_config.h if they are statically allocated.

  • window_sz2, -w in the CLI: Set the window size to 2^W bytes.

The window size determines how far back in the input can be searched for repeated patterns. A window_sz2 of 8 will only use 256 bytes (2^8), while a window_sz2 of 10 will use 1024 bytes (2^10). The latter uses more memory, but may also compress more effectively by detecting more repetition.

The window_sz2 setting currently must be between 4 and 15.

  • lookahead_sz2, -l in the CLI: Set the lookahead size to 2^L bytes.

The lookahead size determines the max length for repeated patterns that are found. If the lookahead_sz2 is 4, a 50-byte run of 'a' characters will be represented as several repeated 16-byte patterns (2^4 is 16), whereas a larger lookahead_sz2 may be able to represent it all at once. The number of bits used for the lookahead size is fixed, so an overly large lookahead size can reduce compression by adding unused size bits to small patterns.

The lookahead_sz2 setting currently must be between 3 and the window_sz2 - 1.

  • input_buffer_size - How large an input buffer to use for the decoder. This impacts how much work the decoder can do in a single step, and a larger buffer will use more memory. An extremely small buffer (say, 1 byte) will add overhead due to lots of suspend/resume function calls, but should not change how well data compresses.
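The limits and sizes above come down to a few lines of arithmetic. The sketch below is illustrative: the function names are made up for this example and are not part of the library, and the run-length count ignores encoding details such as an initial literal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Ranges quoted in the text: 4 <= window_sz2 <= 15 and
 * 3 <= lookahead_sz2 <= window_sz2 - 1. */
int params_valid(uint8_t window_sz2, uint8_t lookahead_sz2) {
    return window_sz2 >= 4 && window_sz2 <= 15 &&
           lookahead_sz2 >= 3 && lookahead_sz2 < window_sz2;
}

/* Size in bytes for a power-of-two setting, e.g. 8 -> 256. */
size_t size_from_sz2(uint8_t sz2) {
    assert(sz2 <= 15);
    return (size_t)1 << sz2;
}

/* Rough count of repeated patterns needed to cover a run of identical
 * bytes when each pattern can span at most 2^lookahead_sz2 bytes. */
size_t patterns_for_run(size_t run_len, uint8_t lookahead_sz2) {
    size_t max_len = size_from_sz2(lookahead_sz2);
    return (run_len + max_len - 1) / max_len;
}
```

For the 50-byte run mentioned above, a lookahead_sz2 of 4 gives four patterns (16 + 16 + 16 + 2 bytes), while a lookahead_sz2 of 6 covers it with one.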

Recommended Defaults

For embedded/low memory contexts, a window_sz2 in the 8 to 10 range is probably a good default, depending on how tight memory is. Smaller or larger window sizes may make better trade-offs in specific circumstances, but should be checked with representative data.

The lookahead_sz2 should probably start near the window_sz2/2, e.g. -w 8 -l 4 or -w 10 -l 5. The command-line program can be used to measure how well test data works with different settings.

More Information and Benchmarks:

heatshrink is based on LZSS, since it's particularly suitable for compression in small amounts of memory. It can use an optional, small index to make compression significantly faster, but otherwise can run in under 100 bytes of memory. The index currently adds 2^(window size+1) bytes to memory usage for compression, and temporarily allocates 512 bytes on the stack during index construction (if the index is enabled).
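As a quick check of the index cost quoted above, the helper below evaluates 2^(window size + 1) for a given window_sz2. The function name is hypothetical, and the formula is taken from the paragraph above rather than read out of the library source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Extra bytes the optional index adds for compression, per the text:
 * 2^(window_sz2 + 1). */
size_t index_overhead_bytes(uint8_t window_sz2) {
    assert(window_sz2 >= 4 && window_sz2 <= 15);
    return (size_t)1 << (window_sz2 + 1);
}
```

With window_sz2 = 8 the index adds 512 bytes on top of the 256-byte window; with window_sz2 = 15 it adds 65536 bytes.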

For more information, see the blog post for an overview, and the heatshrink_encoder.h / heatshrink_decoder.h header files for API documentation.

heatshrink's People

Contributors

aredridel, far-bulogics, jvranish, philpem, silentbicycle, unixdj

heatshrink's Issues

heatshrink with -w 15 and HEATSHRINK_USE_INDEX=1 loops infinitely for >= 32768 byte file

This is with revision v0.4.1-1-g7d419e1.

$ make clean
[...]
$ make OPTIMIZE="-O0"
[...]
$ ii=32768; while ((ii--)); do printf "\x90"; done > 90.bin
$ ./heatshrink -w 15 90.bin tmp
^C

Using -w 14 or lower, or building with HEATSHRINK_USE_INDEX set to zero, works around this bug.

It seems that the while loop in the function find_longest_match loops forever because pos = 0, match_maxlen = 0, pospoint[match_maxlen] = pospoint[0] = 0, needlepoint[match_maxlen] = needlepoint[0] = 0x90 (presumably a byte, perhaps the first one, from the input file), and hsi->index[pos] = hsi->index[0] = 0. So the first if in that while loop always evaluates as true, pos gets reset to hsi->index[pos] (both are zero), and the loop never terminates.

Here's a gdb session showing the behaviour:

$ gdb --args ./heatshrink -w 15 90.bin tmp
GNU gdb (Debian 8.3.1-1) 8.3.1
Copyright (C) 2019 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from ./heatshrink...
(gdb) break heatshrink_encoder.c:466
Breakpoint 1 at 0x2dee: file heatshrink_encoder.c, line 466.
(gdb) run
Starting program: /media/ssd-data/coding/proj/heatshrink/broken/heatshrink -w 15 90.bin tmp

Breakpoint 1, find_longest_match (hse=0x55555557c300, start=0, end=32768, 
    maxlen=16, match_length=0x7fffffffcbdc) at heatshrink_encoder.c:466
466	    while (pos - (int16_t)start >= 0) {
(gdb) step
467	        uint8_t * const pospoint = &buf[pos];
(gdb) 
468	        len = 0;
(gdb) 
473	        if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
(gdb) print/x pospoint[match_maxlen]
$1 = 0x0
(gdb) print/x needlepoint[match_maxlen]
$2 = 0x90
(gdb) step
474	            pos = hsi->index[pos];
(gdb) 
475	            continue;
(gdb) print/x pos
$3 = 0x0
(gdb) display/x pos
1: /x pos = 0x0
(gdb) step
466	    while (pos - (int16_t)start >= 0) {
1: /x pos = 0x0
(gdb) 
467	        uint8_t * const pospoint = &buf[pos];
1: /x pos = 0x0
(gdb) 
468	        len = 0;
1: /x pos = 0x0
(gdb) 
473	        if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
1: /x pos = 0x0
(gdb) print/x pospoint[match_maxlen]
$4 = 0x0
(gdb) print/x needlepoint[match_maxlen]
$5 = 0x90
(gdb) step
474	            pos = hsi->index[pos];
1: /x pos = 0x0
(gdb) display/x hsi->index[pos]
2: /x hsi->index[pos] = 0x0
(gdb) step
475	            continue;
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
466	    while (pos - (int16_t)start >= 0) {
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
467	        uint8_t * const pospoint = &buf[pos];
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
468	        len = 0;
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
473	        if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
474	            pos = hsi->index[pos];
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
475	            continue;
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
466	    while (pos - (int16_t)start >= 0) {
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
467	        uint8_t * const pospoint = &buf[pos];
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
468	        len = 0;
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) 
473	        if (pospoint[match_maxlen] != needlepoint[match_maxlen]) {
1: /x pos = 0x0
2: /x hsi->index[pos] = 0x0
(gdb) display/x match_maxlen
3: /x match_maxlen = 0x0
(gdb) 
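The stuck state in the session above is a chain link that points back at itself (index[0] == 0 while pos == 0). The toy model below reproduces that shape in isolation and shows one possible guard. It is not the library's code and not a proposed patch: walk_chain and its termination rule are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: follow a chain of "previous position" links, where a
 * negative value ends the chain. Returns the number of hops taken, or -1
 * if a link fails to move strictly backwards (the stuck case). */
int walk_chain(const int16_t *index, size_t index_len, int16_t pos) {
    int hops = 0;
    while (pos >= 0) {
        assert((size_t)pos < index_len);
        int16_t next = index[pos];
        if (next >= pos) { return -1; }  /* no progress: would spin forever */
        pos = next;
        hops++;
    }
    return hops;
}
```

A well-formed chain such as {-1, 0, 1} terminates after three hops from position 2, whereas a chain whose first entry is 0 is rejected instead of looping.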

Reducing break even point may lower compression ratio

Reducing the break even point, as done in f533992, should theoretically improve compression ratio. However, in some cases emitting a literal or two followed by a backref would make the encoding overall shorter than two shorter backrefs that the (naturally greedy) encoder finds. E.g., when compressing kennedy.xls from the Canterbury Corpus with -w 8 -l 7 with different break even points, one can observe many such occurrences (- is break even point 3, + is 2):

 ## step_search, scan @ +15 (271/512), input size 256
 ss Match not found
 ## step_search, scan @ +16 (272/512), input size 256
-ss Match not found
-## step_search, scan @ +17 (273/512), input size 256
-ss Found match of 6 bytes at 8
+ss Found match of 2 bytes at 1
+## step_search, scan @ +18 (274/512), input size 256
+ss Found match of 5 bytes at 8
 ## step_search, scan @ +23 (279/512), input size 256
 ss Match not found
 ## step_search, scan @ +24 (280/512), input size 256

It would be nice to try to find a way to optimize this within the constraints of an embedded environment. If possible, without backtracking.
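The cost difference in the excerpt above can be made concrete with a little arithmetic. The sketch assumes that a literal costs one tag bit plus 8 data bits and that a backref costs one tag bit plus window_sz2 index bits plus lookahead_sz2 length bits. That layout is an assumption for illustration, and the helper names are hypothetical.

```c
#include <assert.h>

/* Assumed cost model: 1 tag bit + 8 data bits per literal. */
unsigned literal_bits(void) {
    return 1u + 8u;
}

/* Assumed cost model: 1 tag bit + index bits + length bits per backref. */
unsigned backref_bits(unsigned window_sz2, unsigned lookahead_sz2) {
    assert(lookahead_sz2 < window_sz2);
    return 1u + window_sz2 + lookahead_sz2;
}
```

Under these assumptions, with -w 8 -l 7 the seven bytes at scan positions +16 to +22 cost 9 + 16 = 25 bits as a literal followed by a 6-byte match, but 16 + 16 = 32 bits as a 2-byte match followed by a 5-byte match.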

Usage simplified if heatshrink_encoder_poll() returns HSER_POLL_MORE when finish flag is set and output buffer is full

heatshrink_encoder_poll() is returning 'HSER_POLL_EMPTY' rather than 'HSER_POLL_MORE' when finishing and the output buffer is full. Is a break missing from the 'HSES_FLUSH_BITS' case? As implemented, it is dropping through to the 'HSES_DONE' case and returning 'HSER_POLL_EMPTY' rather than running through the 'HSER_POLL_MORE' check?

    case HSES_FLUSH_BITS:
        hse->state = st_flush_bit_buffer(hse, &oi);
    case HSES_DONE:
        return HSER_POLL_EMPTY;
    default:
        LOG("-- bad state %s\n", state_names[hse->state]);
        return HSER_POLL_ERROR_MISUSE;
    }

    if (hse->state == in_state) {
        /* Check if output buffer is exhausted. */
        if (*output_size == out_buf_size) return HSER_POLL_MORE;
    }

Logging without break:

heatshrink_encoder_poll()
(12:59:37.602) (>>) -- polling, state 1 (filled), flags 0x01<LF>
...
(13:00:37.174) (>>) -- polling, state 2 (search), flags 0x01<LF>
(13:00:37.174) (>>) ## step_search, scan @ +963 (1927/2048), input size 964<LF>
(13:00:37.199) (>>) -- scanning for match of buf[1987:1988] between buf[963:1987] (max 1 bytes)<LF>
(13:00:37.199) (>>) -- none found<LF>
(13:00:37.230) (>>) ss Match not found<LF>
(13:00:37.230) (>>) -- polling, state 3 (yield_tag_bit), flags 0x01<LF>
(13:00:37.230) (>>) -- adding tag bit: 1<LF>
(13:00:37.230) (>>) ++ push_bits: 1 bits, input of 0x01<LF>
(13:00:37.293) (>>) -- polling, state 4 (yield_literal), flags 0x01<LF>
(13:00:37.293) (>>) -- yielded literal byte 0x44 ('D') from +1987<LF>
(13:00:37.293) (>>) ++ push_bits: 8 bits, input of 0x44<LF>
(13:00:37.293) (>>)  > pushing byte 0x75<LF>
(13:00:37.293) (>>) -- polling, state 2 (search), flags 0x01<LF>
(13:00:37.293) (>>) ## step_search, scan @ +964 (1928/2048), input size 964<LF>
(13:00:37.323) (>>) -- end of search @ 964<LF>
(13:00:37.323) (>>) -- polling, state 8 (flush_bits), flags 0x01<LF>
returns HSER_POLL_EMPTY

I observe the desired behaviour with:

    case HSES_FLUSH_BITS:
        hse->state = st_flush_bit_buffer(hse, &oi);
        break;
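The effect of the missing break can be reproduced in a toy state machine. Everything below (the demo_* names, the one-byte flush rule) is made up for illustration and only mirrors the control flow of the snippet quoted earlier. It is not the encoder's implementation.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { DEMO_FLUSH_BITS, DEMO_DONE } demo_state;
typedef enum { DEMO_POLL_EMPTY, DEMO_POLL_MORE } demo_poll_res;

/* Pretend flush step: it needs room for one byte, otherwise it stays put. */
static demo_state demo_flush(size_t *out_used, size_t out_cap) {
    if (*out_used == out_cap) { return DEMO_FLUSH_BITS; }
    (*out_used)++;
    return DEMO_DONE;
}

/* use_break selects between the fall-through and the break variant. */
demo_poll_res demo_poll(demo_state *state, size_t *out_used, size_t out_cap,
                        int use_break) {
    assert(*out_used <= out_cap);
    for (;;) {
        demo_state in_state = *state;
        switch (in_state) {
        case DEMO_FLUSH_BITS:
            *state = demo_flush(out_used, out_cap);
            if (use_break) { break; }
            /* fall through: a full output buffer gets reported as empty */
        case DEMO_DONE:
            return DEMO_POLL_EMPTY;
        }
        /* Only reached via break: report pending output if nothing moved. */
        if (*state == in_state && *out_used == out_cap) {
            return DEMO_POLL_MORE;
        }
    }
}
```

With a full output buffer, the fall-through variant returns DEMO_POLL_EMPTY while still in the flush state, whereas the break variant returns DEMO_POLL_MORE and completes on the next call once the caller supplies room.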

Logging with break:

heatshrink_encoder_poll()
(13:15:58.567) (>>) -- polling, state 1 (filled), flags 0x01<LF>
...
(13:17:46.218) (>>) -- polling, state 2 (search), flags 0x01<LF>
(13:17:46.218) (>>) ## step_search, scan @ +963 (1927/2048), input size 964<LF>
(13:17:46.218) (>>) -- scanning for match of buf[1987:1988] between buf[963:1987] (max 1 bytes)<LF>
(13:17:46.227) (>>) -- none found<LF>
(13:17:46.227) (>>) ss Match not found<LF>
(13:17:46.227) (>>) -- polling, state 3 (yield_tag_bit), flags 0x01<LF>
(13:17:46.227) (>>) -- adding tag bit: 1<LF>
(13:17:46.258) (>>) ++ push_bits: 1 bits, input of 0x01<LF>
(13:17:46.258) (>>) -- polling, state 4 (yield_literal), flags 0x01<LF>
(13:17:46.258) (>>) -- yielded literal byte 0x44 ('D') from +1987<LF>
(13:17:46.289) (>>) ++ push_bits: 8 bits, input of 0x44<LF>
(13:17:46.289) (>>)  > pushing byte 0x75<LF>
(13:17:46.289) (>>) -- polling, state 2 (search), flags 0x01<LF>
(13:17:46.289) (>>) ## step_search, scan @ +964 (1928/2048), input size 964<LF>
(13:17:46.362) (>>) -- end of search @ 964<LF>
(13:17:46.362) (>>) -- polling, state 8 (flush_bits), flags 0x01<LF>
returns HSER_POLL_MORE

heatshrink_encoder_poll()
(13:17:46.362) (>>) -- polling, state 8 (flush_bits), flags 0x01<LF>
(13:17:46.362) (>>) -- flushing remaining byte (bit_index == 0x02)<LF>
(13:17:46.382) (>>) -- done!<LF>
(13:17:46.382) (>>) -- polling, state 9 (done), flags 0x01<LF>
returns HSER_POLL_EMPTY

This behaviour is not seen in the example usage, since heatshrink_decoder_poll is also called until 'poll_sz == 0', rather than just until pres != HSDR_POLL_MORE.

If heatshrink_encoder_poll() returns 'HSER_POLL_MORE' when finishing and the output buffer is full, the usage can be simplified: heatshrink_encoder_finish() only needs to be called once to set the finish flag. If it returns 'HSER_FINISH_MORE', keep calling heatshrink_encoder_poll() until it returns 'HSER_POLL_EMPTY', i.e. the same as when sinking.

Pseudo code without error handling:

do
{
	if (input remaining)
	{
		heatshrink_encoder_sink()
		update input remaining
	}
	
	if (no input remaining)
	{
		fres = heatshrink_encoder_finish()
		if (fres == HSER_FINISH_DONE)
		{
			return
		}
	}
	
	do 
	{
		pres = heatshrink_encoder_poll()
		write output
	} while (pres == HSER_POLL_MORE);
} while (input remaining)

I believe the encoder example in the following pull request has an issue without changing the fall-through to a break in the 'HSES_FLUSH_BITS' case, as this is what I based my implementation on i.e. a single call to heatshrink_encoder_finish():
https://github.com/atomicobject/heatshrink/pull/54/commits

off-by-one in assert in st_step_search / st_yield_backref ?

When running with asserts enabled I hit one of them during encode:

https://github.com/atomicobject/heatshrink/blob/master/heatshrink_encoder.c#L297

matching assert in the decoder here:

https://github.com/atomicobject/heatshrink/blob/master/heatshrink_decoder.c#L272

Assertion failed: (match_pos < 1 << ((hse)->window_sz2)), function st_step_search

Should the encoder assertion be "(match_pos <= 1 << ((hse)->window_sz2))" instead?

And should the decoder assertion be "ASSERT(neg_offset <= mask + 1)"?
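Whether the inclusive bound is right depends on how the offset is stored. The check below assumes, purely for illustration, that an offset in the range 1 to 2^window_sz2 is stored as offset - 1 in window_sz2 bits. Under that assumption an offset equal to the full window size is representable. offset_fits is a hypothetical helper, not a library function.

```c
#include <assert.h>

/* Assumption: offsets run from 1 to 2^window_sz2 and are stored as
 * (offset - 1) in window_sz2 bits. */
int offset_fits(unsigned offset, unsigned window_sz2) {
    assert(window_sz2 >= 4 && window_sz2 <= 15);
    return offset >= 1u && (offset - 1u) < (1u << window_sz2);
}
```

For window_sz2 = 8 this accepts 256 and rejects 257, which corresponds to the "<=" form of the assertion rather than the strict "<" form.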

[Question] What should `heatshrink_encoder_poll` and `heatshrink_encoder_finish` return if nothing has been sunk?

New to heatshrink, and I'm running into an issue where I'm getting stuck in a while loop while polling on what I imagine is an empty encoder.

uint8_t buf[32];
heatshrink_encoder_reset(&hse);

// some logic here that reads some file contents and sinks it into the encoder - it's possible that `heatshrink_encoder_sink` is not called at all here

while (heatshrink_encoder_finish(&hse)==HSER_FINISH_MORE) {
    heatshrink_encoder_poll(&hse, buf, 32, &num_compressed_bytes_output);
    // do something with buf
}

In cases where the "some logic" part calls heatshrink_encoder_sink at least once, this works just fine.
When heatshrink_encoder_sink never gets called, that while loop at the end continues forever and seems to be filling the buf with stuff from memory. I had expected heatshrink_encoder_finish(&hse)==HSER_FINISH_MORE to be true if nothing was sunk into the encoder, but clearly I'm missing something.

My current fix is just to sink a blank string into the encoder before starting to poll, and that seems to work out:

        size_t num_bytes_sunk = 0;
        uint8_t dummyBuf[1] = "";
        heatshrink_encoder_sink(&hse, dummyBuf, 1,&num_bytes_sunk);

Any advice on the proper way to poll when there might not be anything sunk into the encoder?

Possible improvement to compression ratio

Currently, the match length is encoded as the length - 1 (since a match of 0 would be useless), but the current rule for whether a match is worth using clamps the minimum length to 3. Storing the match length as length - 3 would allow matches of a few more bytes to be represented in the same number of bits, improving the compression ratio in some cases.
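The gain from changing the bias is small but easy to quantify. The helper below is a hypothetical illustration of the arithmetic: with a fixed number of length bits, the largest stored value is 2^bits - 1, so the longest representable match is that value plus the bias.

```c
#include <assert.h>

/* Longest match representable when (length - bias) is stored in
 * lookahead_sz2 bits. */
unsigned max_match_len(unsigned lookahead_sz2, unsigned bias) {
    assert(lookahead_sz2 >= 3 && lookahead_sz2 <= 14);
    return ((1u << lookahead_sz2) - 1u) + bias;
}
```

With lookahead_sz2 = 4, storing length - 1 tops out at 16 bytes, while storing length - 3 would reach 18 bytes in the same four bits.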

On quick review, it also appears the value for max_possible in st_step_search could be increased by this amount. This doesn't affect correctness, but again could make compression more effective.

These changes will break reverse compatibility for old compressed bytestreams, so they should not be integrated until an appropriate version number increase.

"Implicit declaration of function getopt" when building on Ubuntu 14.10

Building Heatshrink on Ubuntu 14.10 fails with the following errors:

cc -std=c99 -g -Wall -Wextra -pedantic  -O3 -Wmissing-prototypes -Wstrict-prototypes -Wmissing-declarations    heatshrink.c heatshrink_encoder.o heatshrink_decoder.o   -o heatshrink
heatshrink.c: In function ‘proc_args’:
heatshrink.c:407:5: warning: implicit declaration of function ‘getopt’ [-Wimplicit-function-declaration]
     while ((a = getopt(argc, argv, "hedi:w:l:v")) != -1) {
     ^
heatshrink.c:416:51: error: ‘optarg’ undeclared (first use in this function)
             cfg->decoder_input_buffer_size = atoi(optarg);
                                                   ^
heatshrink.c:416:51: note: each undeclared identifier is reported only once for each function it appears in
heatshrink.c:432:13: error: ‘optind’ undeclared (first use in this function)
     argc -= optind;
             ^
<builtin>: recipe for target 'heatshrink' failed
make: *** [heatshrink] Error 1

This is because heatshrink.c fails to #include <getopt.h>.
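As general POSIX background (not verified against this particular build): getopt, optarg, and optind are declared in <unistd.h>, but with -std=c99 glibc hides POSIX declarations unless a feature-test macro requests them. The sketch below shows that alternative. parse_window_bits is a hypothetical helper and is not taken from heatshrink.c.

```c
#define _POSIX_C_SOURCE 200809L  /* must come before every #include */
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Return the value given to -w, or -1 if the option is absent. */
int parse_window_bits(int argc, char **argv) {
    int w = -1;
    int opt;
    assert(argv != NULL);
    optind = 1;  /* restart scanning, so the function can be called twice */
    while ((opt = getopt(argc, argv, "w:")) != -1) {
        if (opt == 'w') { w = atoi(optarg); }
    }
    return w;
}
```

Either approach makes the declarations visible: including <getopt.h> as suggested above, or defining the feature-test macro before the first include.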

Failure when compressing no data

When calling heatshrink_encoder_finish directly after heatshrink_encoder_reset the subsequent call to heatshrink_encoder_poll indicated by the return value of the heatshrink_encoder_finish call fails with HSER_ERROR_MISUSE. A more robust API should instead either return HSER_FINISH_DONE in heatshrink_encoder_finish directly (there's no pending data to be written) or detect this situation in heatshrink_encoder_poll and return without writing any output to the buffer.

To demonstrate the issue:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#include "heatshrink_encoder.h"

__attribute__((nonnull))
static bool run_compress(const char* buf, size_t bufsize)
{
    size_t destsize = 256;
    char* dest = calloc(destsize, sizeof(char));
  
    uint8_t *ptr = (uint8_t *)buf;
    uint8_t *dst = (uint8_t *)dest;

    HSE_poll_res rsepr;

    heatshrink_encoder hse = {0};
    heatshrink_encoder_reset(&hse);

    size_t processed = 0;

    while(bufsize) {
        switch(heatshrink_encoder_sink(&hse, ptr, bufsize, &processed)) {
        case HSER_SINK_OK:
            break;
        case HSER_SINK_ERROR_NULL:
        case HSER_SINK_ERROR_MISUSE:
        default:
            return false;
        }

        ptr += processed;
        bufsize -= processed;

pollagain:
        switch(heatshrink_encoder_poll(&hse, dst, destsize, &processed)) {
        case HSER_POLL_MORE:
            dst += processed;
            destsize -= processed;
            goto pollagain;
        case HSER_POLL_EMPTY:
            dst += processed;
            destsize -= processed;
            break;
        case HSER_POLL_ERROR_NULL:
        case HSER_POLL_ERROR_MISUSE:
        default:
            return false;
        }
    }

    heatshrink_encoder_finish(&hse);

drainagain:
    switch(rsepr = heatshrink_encoder_poll(&hse, dst, destsize, &processed)) {
    case HSER_POLL_MORE:
        dst += processed;
        destsize -= processed;
        goto drainagain;
    case HSER_POLL_EMPTY:
        dst += processed;
        destsize -= processed;
        break;
    case HSER_POLL_ERROR_NULL:
    case HSER_POLL_ERROR_MISUSE:
    default:
        free(dest);
        printf("Failed encoding with %zu bytes remaining at %zu bytes done (%d).\n", bufsize, destsize, rsepr);
        return false;
    }

    destsize = (size_t)((char*)dst - dest);

    free(dest);

    printf("Successfully encoded into %zu bytes.\n", destsize);

    return true;
}

int main (int argc, char** argv)
{
    (void)argc;
    (void)argv;

    const char data[] = "SAMPLEDATA";

    bool overall = true;

    // Succeeds
    overall &= run_compress(data, strlen(data));

    // Fails
    overall &= run_compress(data, 0);
    
    return overall ? 0 : 1;
}

Expected output:

Successfully encoded into 12 bytes.
Successfully encoded into 0 bytes.

Actual output:

Successfully encoded into 12 bytes.
Failed encoding with 0 bytes remaining at 0 bytes done (-2).

The original code I used for compression (when I noticed the underlying issue) is like this:

typedef struct compression_state {
    bool active;
    message_t *msg;
    size_t parts;
    size_t bufsize;
    size_t bufused;
    char *buffer;
    heatshrink_encoder *encoder;
} compression_state_t;

static char compress_buffer[1536] = { 0 };

static heatshrink_encoder compress_encoder = { 0 };

static compression_state_t compress_state = {
    .active = false,
    .msg = NULL,
    .parts = 0,
    .bufsize = sizeof(compress_buffer),
    .bufused = 0,
    .buffer = &compress_buffer[0],
    .encoder = &compress_encoder
};

bool compress_init(message_t *msg)
{
    if(compress_state.active) {
        // Trying to start compressor while another one already active
        return false;
    }

    compress_state.msg = msg;
    compress_state.parts = 25;
    compress_state.bufused = 0;

    memset(compress_state.buffer, 0, compress_state.bufsize);

    heatshrink_encoder_reset(compress_state.encoder);

    compress_state.active = true;

    return true;
}

static void compress_crank()
{
    HSE_poll_res res;
    do {
        size_t processed = 0;

        res = heatshrink_encoder_poll(
            compress_state.encoder,
            (uint8_t *)(compress_state.buffer + compress_state.bufused),
            compress_state.bufsize - compress_state.bufused,
            &processed);

        if(HSER_POLL_MORE != res && HSER_POLL_EMPTY != res) {
            break;
        }

        compress_state.bufused += processed;

        if(compress_state.bufused == compress_state.bufsize || !processed) {
            process_buf(compress_state, compress_state.buffer, compress_state.bufused);

            compress_state.parts--;
            compress_state.bufused = 0;
        }
    } while(HSER_POLL_MORE == res);
}

bool compress_push(char* buf, size_t bufsize)
{
    if(!compress_state.active) {
        // Trying to compress data while no compressor active
        return false;
    }

    while(bufsize) {
        size_t processed = 0;

        HSE_sink_res res = heatshrink_encoder_sink(compress_state.encoder, (uint8_t *)buf, bufsize, &processed);
        if(res != HSER_SINK_OK) {
            return false;
        }

        if(processed > bufsize) {
            return false;
        }

        bufsize -= processed;
        buf += processed;

        compress_crank();
    }

    return !!compress_state.parts;
}

bool compress_finish()
{
    if(!compress_state.active) {
        // Trying to finalize compression while no compressor active
        return false;
    }

    HSE_finish_res res = HSER_FINISH_DONE;
    do {
        res = heatshrink_encoder_finish(compress_state.encoder);

        compress_crank();
    } while(HSER_FINISH_MORE == res);

    if(compress_state.bufused) {
        process_buf(compress_state, compress_state.buffer, compress_state.bufused);
        compress_state.bufused = 0;
    }

    return HSER_FINISH_DONE == res;
}

Which ran into a buffer overrun when no call to compress_init and compress_finish took place due to this unexpected behaviour of the heatshrink API.

Decoder for JavaScript?

I would like to encode bitmap image data with heatshrink on a microcontroller and display it in the browser.
Therefore my question is: is there a JavaScript implementation of the decoder?

Java library

Hi, thanks for your great work!
I would like to decompress, in Java, the data generated by an embedded system.
Do you have a ready-to-use solution?
Should I use a standard LZSS Java class?

Thanks

Inlining static functions

Is there a rationale for not declaring static functions such as st_backref_index_msb to be __inline? It might be worth giving the extra hint to the compiler.

Fail to decompress buffer

This is really strange: I am able to compress, but fail to decompress, one specific data buffer. I am using static linking with the following configuration:

#define HEATSHRINK_STATIC_INPUT_BUFFER_SIZE 512
#define HEATSHRINK_STATIC_WINDOW_BITS 10
#define HEATSHRINK_STATIC_LOOKAHEAD_BITS 3

uint8_t compressed[] = {
0x80, 0x80, 0x35, 0x45, 0x01, 0x7B, 0xA9, 0x07, 0x01, 0xF4, 0x01, 0xB0, 0x00, 0xE0, 0xA0, 0x0D, 
0x45, 0x80, 0x5E, 0x82, 0x00, 0x35, 0xA8, 0xB7, 0x4F, 0x7E, 0x10, 0x89, 0x10, 0x03, 0x33, 0x3A, 
0x2C, 0xBA, 0x21, 0x7C, 0x04, 0x87, 0xE1, 0x42, 0xB2, 0x7C, 0xE7, 0xFF, 0x88, 0x70, 0x26, 0x20, 
0x00, 0xC2, 0xE0, 0x70, 0x10, 0x16, 0x82, 0x9F, 0x00, 0x1C, 0x00, 0x70
};

/* Actual input */
uint8_t decompressed[] = {
0x01, 0x00, 0x00, 0x00, 0x45, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xD4, 0x07, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x05, 0x00, 0x00, 0x00, 0x16, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x04, 0x00, 0x00, 0x00, 0xA8, 0x6E, 0x3D, 0xF0, 0x08, 0x22, 0x3D, 0xF0, 0x33, 0x45, 0x2E, 0x10, 0x7C, 0x01, 0x00, 0xF0, 0x42, 0x64, 0xF3, 0x3F, 0xF8, 0x0E, 0x00, 0x05, 0x05, 0x05, 0x0B, 0x03, 0x01, 0x64, 0xF3, 0x3F, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

I have also attached my test application. The problem is:

If I try to only decompress the compressed buffer, I run into a never-ending poll loop. However, if I first compress the "decompressed" buffer and then decompress it in the same application, everything works. I am not able to understand why.

To check both use cases, change the "if 0" at line 202 to 1.

test_heatshrink.txt

PS: Rename file to .c

Documentation

The documentation has some gaps:

  • How to use the compression / uncompression state machines, esp. "finish" usage
  • Recommended window, lookahead settings
  • Note static libraries and other additions to the makefile

anyone know of an encoder for js?

I have a backend that uses this to decode and no options on my js frontend for encoding. Just wondering if there's something for js that will encode.

Standardized benchmarking

The benchmarking so far has been a bit ad hoc, and there isn't really enough in place to objectively compare changes such as 6dc62d4 (clearly correct) vs. 5ae5977 (likely to be more effective, though a bit more complicated and potentially more work on the hot path).

Before the next release, there should be a script and/or make bench target that compresses & uncompresses a standard corpus, such as the Canterbury Corpus, and reports on compression ratios and timing at several combinations of window and lookahead sizes.

Compression parameters not stored in compressed data??

I built an application using heatshrink as a library to compress data on an embedded board, with the following build parameters:

DYNAMIC_ALLOC 0
WINDOW_BITS 8
LOOKAHEAD_BITS 4
USE_INDEX 0

I compress say 'seq 10000' (the Unix seq command), which is 48894 bytes, and get 27797 bytes.

I then take that data over to x86 and decode it, using the heatshrink binary as provided in the distro. I do not get 48894 bytes; instead I get 60k+. With the -v option, I can see the decoder is using -w 11 -l 4.

Aren't the parameters used to compress the data stored in the data itself? It appears not. Does the decoding side really have to know the encoder's parameters?

Hopefully I made some glaring error in my workflow...
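If the parameters do have to travel with the data, one option is for the application to prepend them itself. The one-byte header below is a hypothetical container convention, not something heatshrink defines or reads. It relies only on the documented ranges (window 4 to 15, lookahead 3 to window - 1), so each value fits in four bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical application-level header: high nibble = window_sz2,
 * low nibble = lookahead_sz2. */
uint8_t pack_params(uint8_t window_sz2, uint8_t lookahead_sz2) {
    assert(window_sz2 >= 4 && window_sz2 <= 15);
    assert(lookahead_sz2 >= 3 && lookahead_sz2 < window_sz2);
    return (uint8_t)((window_sz2 << 4) | lookahead_sz2);
}

uint8_t header_window(uint8_t header) {
    return (uint8_t)(header >> 4);
}

uint8_t header_lookahead(uint8_t header) {
    return (uint8_t)(header & 0x0F);
}
```

The sender would write pack_params(8, 4), i.e. 0x84, ahead of the compressed bytes. The receiver would strip that byte and pass the two values to the decoder, for example as -w 8 -l 4 on the command line.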

Confused by a code snippet in heatshrink.c

static int encoder_sink_read(config *cfg, heatshrink_encoder *hse,
        uint8_t *data, size_t data_sz) {
    size_t out_sz = 4096;
    uint8_t out_buf[out_sz];
    memset(out_buf, 0, out_sz);
    size_t sink_sz = 0;
    size_t poll_sz = 0;
    HSE_sink_res sres;
    HSE_poll_res pres;
    HSE_finish_res fres;
    io_handle *out = cfg->out;

    size_t sunk = 0;
    do {
        if (data_sz > 0) {
            sres = heatshrink_encoder_sink(hse, &data[sunk], data_sz - sunk, &sink_sz);
            if (sres < 0) { die("sink"); }
            sunk += sink_sz;
        }

        do {
            pres = heatshrink_encoder_poll(hse, out_buf, out_sz, &poll_sz);
            if (pres < 0) { die("poll"); }
            if (handle_sink(out, poll_sz, out_buf) < 0) die("handle_sink");
        } while (pres == HSER_POLL_MORE);

        if (poll_sz == 0 && data_sz == 0) {
            fres = heatshrink_encoder_finish(hse);
            if (fres < 0) { die("finish"); }
            if (fres == HSER_FINISH_DONE) { return 1; }
        }
    } while (sunk < data_sz);
    return 0;
}

I noticed that "data_sz" is never changed, so the branch "if (poll_sz == 0 && data_sz == 0)" won't execute unless data_sz is zero to begin with. I think data_sz here might be meant to be sink_sz, by symmetry.

Fix compilation on MinGW

Hi there,

great bunch of code!

However, when I was testing it I was stuck with MinGW, which has no "err.h".

Workaround: replace

#include <err.h>

by

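#if !defined(__MINGW32__)
#include <err.h>
#else
/* The fallback macro below needs errno, fprintf and exit. */
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

#define err(retval, ...) do { \
    fprintf(stderr, __VA_ARGS__); \
    fprintf(stderr, "Undefined error: %d\n", errno); \
    exit(retval); \
} while(0)

#endif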

in heatshrink.c. After that, it compiles and runs flawlessly.

Enable theft tests in Travis-CI

It would be great to get the theft-based tests running on Travis-CI. Currently, they aren't active because we can't assume libtheft will be present and aren't building it. (greatest is used for unit tests, but theft is used for additional property-based testing / stress tests.)

Vendoring theft in the heatshrink repo would work, but I would prefer to set it up as a git subtree (or otherwise reference it externally) so upstream changes would be incorporated. (It's probably not worth setting up greatest as a sub-repo, since it's just one file. It's already included.)

Compressing a stream of array data involving negative values

Thank you for the good library.
I have a large amount of uint8 data (sound data) that I want to compress with heatshrink.
For example, assume I want to compress these 240 bytes:
const unsigned char data[] =
{136, 133, 128, 117, 114, 144, 136, 130, 122, 107, 138, 139, 130, 128, 104, 128, 139, 132, 133, 104,
120, 135, 132, 141, 110, 114, 136, 129, 145, 119, 104, 132, 130, 150, 130, 98 , 120, 122, 151, 145,
99 , 114, 110, 141, 162, 108, 111, 107, 126, 168, 120, 111, 110, 116, 171, 136, 111, 114, 110, 162,
150, 108, 111, 107, 151, 165, 119, 110, 107, 138, 169, 130, 108, 105, 122, 168, 142, 108, 113, 110,
153, 154, 108, 111, 111, 144, 169, 123, 110, 105, 126, 174, 139, 111, 105, 108, 160, 153, 111, 99 ,
99 , 150, 169, 125, 98 , 89 , 125, 174, 142, 101, 92 , 105, 160, 162, 108, 98 , 105, 147, 174, 120,
93 , 122, 174, 148, 102, 98 , 107, 150, 168, 122, 98 , 107, 126, 165, 150, 104, 107, 116, 151, 174,
101, 102, 119, 168, 150, 105, 102, 110, 150, 172, 125, 95 , 104, 129, 171, 148, 93 , 92 , 113, 156,
119, 90 , 105, 129, 169, 148, 98 , 99 , 113, 136, 159, 122, 99 , 116, 114, 139, 145, 104, 111, 117,
156, 138, 107, 113, 104, 132, 166, 129, 104, 99 , 107, 163, 171, 119, 101, 90 , 126, 181, 144, 102,
104, 160, 181, 126, 98 , 101, 116, 165, 165, 114, 107, 107, 125, 169, 147, 117, 117, 104, 139, 163,
113, 105, 113, 156, 168, 128, 108, 110, 120, 166, 156, 104, 104, 102, 132, 174, 139, 108, 101, 102};
The compress function does not give a good result with this raw data.
The delta of the data, which also has negative values, does not compress well either.

signed char test_data[256];
for (int i = 0; i < 239; i++) { test_data[i] = data[i + 1] - data[i]; }

Is heatshrink a good choice for my purpose?
Could you give me an example using values like the ones above?
Thanks in advance.
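
A delta transform for 8-bit samples can stay entirely within unsigned bytes by working modulo 256, which avoids both the signed intermediate array and the read past the end of the buffer. The sketch below is only an illustration of that idea; `delta_encode` and `delta_decode` are hypothetical helper names, and whether the transformed data compresses better depends on the signal.

```c
#include <stddef.h>
#include <stdint.h>

/* Replace each sample with its difference from the previous one, modulo 256.
 * The first sample is kept unchanged (the previous value is taken as 0). */
void delta_encode(uint8_t *buf, size_t len) {
    uint8_t prev = 0;
    for (size_t i = 0; i < len; i++) {
        uint8_t cur = buf[i];
        buf[i] = (uint8_t)(cur - prev);
        prev = cur;
    }
}

/* Inverse of delta_encode: a running sum modulo 256 restores the samples. */
void delta_decode(uint8_t *buf, size_t len) {
    uint8_t prev = 0;
    for (size_t i = 0; i < len; i++) {
        prev = (uint8_t)(prev + buf[i]);
        buf[i] = prev;
    }
}
```

The encoded buffer would be passed to the compressor in place of the raw samples, and `delta_decode` would be applied to the decompressed output.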

Possible data loss at stream end in -w4 -l2

I got an excellent bug report from @unixdj with detailed, minimal steps to reproduce data loss at the end of the stream:

$ echo -n aaaa | ./heatshrink -e -w4 -l2 | ./heatshrink -d -w4 -l2
a   # should be "aaaa"

A regression test has been added.

This is not currently known to occur under any other combinations of settings except -w4 -l2, the absolute minimum.

A change that fixes this without breaking backward compatibility would be strongly preferred.

Contribution notes

There should be a CONTRIBUTING.md with notes on contributing to the project.

heatshrink binary only decompresses the first block

After patching Heatshrink with the changes in #7 and building with MinGW (gcc 4.7.2), a valid "heatshrink.exe" executable is produced, but this binary fails to decompress data correctly.

$ ./heatshrink.exe -ev LICENSE LICENSE.enc

$ ./heatshrink.exe -dv LICENSE.enc LICENSE.dec

$ md5sum LICENSE*
cd12e61f206f9ee1a6622d0df2f773d1 *LICENSE
e7e84c0c9268751994a1830f9b98d399 *LICENSE.dec
81239e24ddf1a6dde9d5cff95d186911 *LICENSE.enc

$ ls -l LICENSE*
-rw-r--r-- 1 ppemberton Administrators 784 Dec 12 15:11 LICENSE
-rw-r--r-- 1 ppemberton Administrators 387 Dec 22 11:41 LICENSE.dec
-rw-r--r-- 1 ppemberton Administrators 681 Dec 22 11:41 LICENSE.enc

It appears that the decode step is only processing the first block of data in the file.

Interestingly the byte counts in verbose mode are wrong too:

$ ./heatshrink.exe -ev LICENSE LICENSE.enc
LICENSE 14.29 %  784 -> 672 (-w 11 -l 4)

$ ./heatshrink.exe -dv LICENSE.enc LICENSE.dec
LICENSE.enc -3.27 %      367 -> 379 (-w 11 -l 4)

Likely bug in get_bits at end of stream when count > 8

Introduction

I'm in the process of porting heatshrink to typescript for use with a mobile app that deals with heatshrink data from an embedded device. As I was porting and testing the get_bits function I noticed what appears to be an edge case bug.

The comment in get_bits indicates that if there are not enough bits in the input buffer to satisfy count it should suspend immediately and not consume any bits, however the check for this condition linked below appears to miss a case when count > 8 since it only performs the check if there are no more whole bytes left.

Example situation

input_buffer: [1, 2]
input_index: 1
input_size: 2
bit_index: 1
current_byte: 1
count: 10

In this situation the user is asking for 10 bits but there are only 9 bits remaining in the input buffer (8 from the last byte and one remaining in the current byte). As I understand the code, it will consume the current byte before suspending, which I suspect is incorrect.

Could you please clarify if this was the intended behavior or a latent bug? It appears it could only affect situations with a window size > 8 bits based on a preliminary reading of the usage of get_bits.

Link to Relevant Code

https://github.com/atomicobject/heatshrink/blob/master/heatshrink_decoder.c#L298
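
The check described in the report can be illustrated by counting the unread bits before consuming anything. This is a sketch only: `bit_reader_state` is a hypothetical stand-in for the decoder fields named above, and it assumes that `bit_index` is a single-bit mask (so a value of 1 means one bit is left in the current byte, as in the example situation).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the decoder fields named in the report. */
typedef struct {
    size_t input_size;   /* bytes held in the input buffer */
    size_t input_index;  /* index of the next unread byte */
    uint8_t bit_index;   /* mask of the next bit in the current byte; 0 = none */
} bit_reader_state;

/* Unread bits: those left in the current byte plus all whole unread bytes. */
size_t bits_remaining(const bit_reader_state *s) {
    size_t in_current = 0;
    for (uint8_t mask = s->bit_index; mask != 0; mask >>= 1) {
        in_current++;
    }
    return in_current + 8 * (s->input_size - s->input_index);
}

/* True if `count` bits can be read without running out of input. */
bool can_read_bits(const bit_reader_state *s, uint8_t count) {
    return count <= bits_remaining(s);
}
```

With the values from the example situation (`input_size` 2, `input_index` 1, `bit_index` 1), `bits_remaining` gives 9, so a request for 10 bits would be refused before any state is modified.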

My 8086 DOS heatshrink uses (segmented inicomp depacker, single-segment help page depacker, streaming ELD library depacker)

Hi, I wanted to let you know I am using the heatshrink compression format for several parts of my 86 DOS debugger project, lDebug (that's an L).

The first use was in inicomp, my executable depacker for triple-mode executables (DOS kernel, DOS device driver, DOS application). The heatshrink format is used as one of many options. This depacker supports compressed as well as uncompressed data sizes beyond 64 KiB, using 8086 segmented addressing in Real/Virtual 86 Mode. Other than that, it is special in that the destination buffer is always below the source and it is valid for the destination to partially overwrite the source if the source pointer is always above-or-equal the destination pointer. That means the entire data must be stored in memory, but less memory than the full source + full destination is needed.

The second use is for lDebug help pages. This is ready, but not yet used by default. The help pages always fit within less than 64 KiB so most of the segmentation things have been taken out of this one. It comes with a stand alone test program which uses a 256-byte file buffer to hold parts of the source file.

The third use is for the Extensions for lDebug packed library executable.
I wrote some about the latest use on my blog. Like the help page depacker this uses a 256-byte file buffer for the compressed input. It also has a stand alone test program; this one supports input and output files > 64 KiB too.

Unlike the other two depackers, this one uses a 4 KiB circular decompression buffer (thus window size must not be > -w 12), and the implementation of its put_file_data will grab data after a certain depackskip counter reaches zero. The compressed data stream is much larger than 64 KiB, but only the output data of interest is grabbed by put_file_data. If that function has filled its output buffer, it will pause the current depack call. A paused depack call can be resumed when more data is needed from later on in the decompressed data stream. To implement the pausing and resumption, I run depack on its own stack separate from the main application's, and I save all needed working registers on either stack when switching stacks.

Compatibility on PIC32

Hi,

Does this code work on PIC32 microcontrollers?

I found that some of the dependency files are missing (like getopt.h, err.h, etc.).
Could you please let me know whether it works on PIC32?

Unable to decode the compressed file with the command-line tool

./test -e -v acc1.txt acc1.z
acc1.txt 70.92 % 27301 -> 7940 (-w 11 -l 4)

./test -d -v acc1.z acc1.unz
...
-- popping 4 bit(s)
-- pulled byte 0x49
-- accumulated 00000004
-- backref count (lsb), got 0x0004 (+1)
Segmentation fault: 11

heatshrink_decoder_poll returns HSDR_POLL_MORE if the output buffer is sized exactly as the decoded result

In this example the output buffer is sized exactly as the input buffer, and the program remains in the loop.
If I resize the output buffer to sizeof(data)+1 it works, with 1 byte left over.

`
const char data[] = "Cappuccetto Rosso, chiamata anche Cappuccetto, e' una bambina che vive con la sua mamma in una casetta vicino al bosco. Un giorno la mamma le consegna un cestino pieno di cose buone da portare alla nonna malata, che vive al di la' della foresta. La mamma "; //256byte data

char compress[1500];
char out[sizeof(data)];

int Encode() {
heatshrink_encoder hse; //512byte in stack
heatshrink_encoder_reset(&hse);
size_t copied;
size_t tot_copied = 0;
const char* in=data;
size_t remaining = sizeof(data);
size_t srclen = remaining;
size_t readed;
do
{
heatshrink_encoder_sink(&hse, (uint8_t*)in, remaining, &readed);
in += readed;
remaining -= readed;
//printf("readed:%lu\n",readed);
heatshrink_encoder_poll(&hse, (uint8_t*)compress + tot_copied, sizeof(compress)- tot_copied,&copied);
tot_copied += copied;
//printf("copied:%lu tot:%lu\n",copied,tot_copied);
} while (remaining);
heatshrink_encoder_finish(&hse);
HSE_poll_res pres;
do
{
pres = heatshrink_encoder_poll(&hse, (uint8_t*)compress + tot_copied, sizeof(compress)- tot_copied,&copied);
tot_copied += copied;
} while (pres != HSER_POLL_EMPTY);
Uart_Printf(">>>>>>>>>>>>src:%lu dst:%lu\n", srclen, tot_copied);
return tot_copied;
}

int Decode(size_t remaining) {
HSD_sink_res sres;
HSD_poll_res pres;
HSD_finish_res fres;
heatshrink_decoder hsd;
heatshrink_decoder_reset(&hsd);
size_t tot_copied = 0;
size_t readed;
size_t copied;
const uint8_t* in = (const uint8_t*)compress;
do
{
sres =
heatshrink_decoder_sink(&hsd, in, remaining, &readed);
in += readed;
remaining -= readed;
//printf("readed:%lu\n",readed);
pres =heatshrink_decoder_poll(&hsd, (uint8_t*)out + tot_copied,sizeof(out)- tot_copied, &copied);
tot_copied += copied;
//printf("copied:%lu tot:%lu\n",copied,tot_copied);
} while (remaining);
fres = heatshrink_decoder_finish(&hsd);
do //REMAIN ON THIS LOOP FOREVER !!!
{
pres =heatshrink_decoder_poll(&hsd, (uint8_t*)out + tot_copied, sizeof(out) - tot_copied, &copied);
tot_copied += copied;
//printf("src:%lu dst:%lu\n",srclen,tot_copied);
} while (pres != HSDR_POLL_EMPTY);
Uart_Printf(">>>>>>>>>>>>decompress:%lu\n", tot_copied);
return tot_copied;
}

void main() {
SysTimer_Init();
Uart_Init(Uart_STDOUT, 0); //console
int n=Encode();
Decode(n);
Uart_Println(out);
Uart_Println("end");
while (1);
}
`
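
One defensive pattern for a drain loop like the one above is to stop as soon as the output buffer has no room left, instead of waiting for an "empty" result that may never arrive. The sketch below illustrates the idea with a hypothetical `poll_fn` callback standing in for the real poll call; `fake_source` and `fake_poll` exist only so the example is self-contained, and none of these names come from heatshrink.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical poll callback: writes up to out_sz bytes, stores the count in
 * *copied, and returns 1 if output is still pending, 0 if empty, <0 on error. */
typedef int (*poll_fn)(void *ctx, uint8_t *out, size_t out_sz, size_t *copied);

/* Drain until the source is empty, an error occurs, or the buffer is full.
 * Returns the bytes written; *maybe_more is set if we stopped on a full buffer. */
size_t drain(poll_fn poll, void *ctx, uint8_t *out, size_t out_sz, int *maybe_more) {
    size_t total = 0;
    *maybe_more = 0;
    for (;;) {
        size_t copied = 0;
        int res = poll(ctx, out + total, out_sz - total, &copied);
        total += copied;
        if (res <= 0) { break; }                          /* empty or error */
        if (total == out_sz) { *maybe_more = 1; break; }  /* no room left */
    }
    return total;
}

/* Toy source used only to exercise drain(). */
typedef struct { const uint8_t *data; size_t len; size_t pos; } fake_source;

int fake_poll(void *ctx, uint8_t *out, size_t out_sz, size_t *copied) {
    fake_source *src = (fake_source *)ctx;
    size_t n = src->len - src->pos;
    if (n > out_sz) { n = out_sz; }
    memcpy(out, src->data + src->pos, n);
    src->pos += n;
    *copied = n;
    return src->pos < src->len ? 1 : 0;
}
```

The caller can then treat `maybe_more` as a hint that the buffer was too small, or, if the expected output length is known, ignore it once `total` matches that length.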

Add blocking example implementation of LZSS

This would document some implementation details, and help people who want to make output-compatible implementations in other languages such as Java ( See issue #33 ).

LZSS is quite a bit simpler if the implementation can run in one pass -- much of heatshrink's code is managing the suspending/resuming.

Backlog size calculation

The calculation of the backlog size using FLAG_BACKLOG_IS_PARTIAL and FLAG_BACKLOG_IS_FILLED seems incorrect. The number of bytes shifted in save_backlog() is equal to match_scan_index, but st_step_search() presumes that the number of bytes after the first shift is (window_length - lookahead_sz) after the first shift and (window_length - 1) after subsequent ones.

  • The lines start = end - window_length + 1; in st_step_search() should drop the + 1.
  • The number of bytes shifted the first time may be greater than window_length - lookahead_size if the last token pushed was a backref.
  • When window length equals lookahead size, the formula (window_length - lookahead_sz) gives us 0, when the number shifted obviously shouldn't be that (see also #20). And the backlog is unlikely to be filled in two shifts.
Proposed solution:

Declare that the backlog is always full, initially full of zeroes. This will make the backlog size calculation simple (equals window length!), and is compatible with the decoder (all buffers are initially memset to 0).

[Question] Output compatibility of static vs dynamic version

Is the output of a stream of data encoded with the static implementation compatible with the same data stream using the encoder with dynamic allocation (while using same parameters)?

In particular, when compressing a stream with the static version using parameters for HEATSHRINK_STATIC_WINDOW_BITS and HEATSHRINK_STATIC_LOOKAHEAD_BITS do I simply need to mirror those exact same settings with the dynamic version and decompression should be fine or is there anything else to take care of? What is the process in the other direction when compressing data with the dynamic version: What do I need to take care of such that a given static configuration can decompress the data?

Are there in general any overall guarantees about stream compatibility based on the settings done in the configuration header?

Compiling for ESP8266 fails

Hey,

I've just tried to use the library on an ESP8266 platform. Sadly, several errors are displayed in heatshrink.c:

Error GCC invalid conversion from 'void*' to 'io_handle*' [-fpermissive] 121:37
Error GCC 'heatshrink_encoder_alloc' was not declared in this scope 282:86
Error GCC 'heatshrink_encoder_free' was not declared in this scope 305:32
Error GCC 'heatshrink_decoder_alloc' was not declared in this scope 352:39
Error GCC 'heatshrink_decoder_free' was not declared in this scope 382:32

Any help is appreciated :)

Indexing for a 15-bit window encoder does not work

Hello,

Just thought I should report something I noticed.

The index initialization end exceeds the maximum value of a 16-bit signed integer when the window size is 15 bits. The input offset is 32768 and cannot be represented in the index table.

The cast from unsigned to signed in the array init loop makes it negative (?) so I don't think it runs.
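
The arithmetic behind this report can be checked in isolation. The sketch below assumes, as the report states, that index entries are 16-bit signed integers and that the offset in question is 2^15 = 32768; the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* True if `offset` can be stored in an int16_t index entry without wrapping. */
bool offset_fits_int16(uint32_t offset) {
    return offset <= (uint32_t)INT16_MAX;
}

/* Largest window_bits for which an offset of 2^window_bits still fits. */
uint8_t max_indexable_window_bits(void) {
    uint8_t bits = 0;
    while (offset_fits_int16((uint32_t)1 << (bits + 1))) {
        bits++;
    }
    return bits;
}
```

Since `INT16_MAX` is 32767, an offset of 32768 does not fit, and under these assumptions a 14-bit window is the largest whose 2^window_bits offset is representable.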

Restart decompression

Hello,

I am using heatshrink to decompress a file on the fly. I am trying to implement a progressive download, but if the device reboots (it is an embedded device), I lose the decompression buffer. I did some tests and found out that I need to store head_index, state, current_byte, bit_index and buffers (from the heatshrink_decoder structure) to restart where I left off.

However I don’t really understand how buffers is built and I would like to know if there is a way to rebuild it instead of having to store it.

If it is not possible, is there some sync point where I can restart the decompression? I am using a window size of 9 and a look ahead of 3.

Clarification

Hello,

Is it possible for the expanded size to be smaller than the size of the data that is sunk?

The compressed file is huge (32MB to 9MB), I will receive the compressed file over a network in 1KB chunks and expand to save the file.

Best regards,
Karthik

Fallthrough in usage should be `break`

While looking through my local changes I noticed a small difference between the local changes and the upstream develop branch:

heatshrink/src/heatshrink.c

Lines 410 to 412 in ffd9505

case 'h': /* help */
usage();
/* FALLTHROUGH */

In my local version I imported the fix for #46 using a break in that place, which is IMHO the more natural choice, as you normally don't expect the usage() display to cause any side-effects on your program configuration, thus resulting in the following code locally instead:

        case 'h':               /* help */
            usage();
            break;

Given that usage() internally calls exit(1); this is even stranger, as this comment causes the expectation of the code continuing after the call to usage(); returns.

Encoding with options -w 10 -l 5 and then decoding leads to corrupted file

I did a quick test to evaluate heatshrink by compressing a GPX file with parameters -w 10 -l 5, but the decoded file is smaller than the original and has several zeroed-out parts.

I got this result both by compiling heatshrink under Linux and by using pip install heatshrink2 under Windows:
./heatshrink -e -w 10 -l 5 test.gpx test.hs
./heatshrink -d test.hs test_decoded.gpx

and

python -m heatshrink2 compress -w 10 -l 5 test.gpx test.gpx_hspython_w10_l5
python -m heatshrink2 decompress test.gpx_hspython_w10_l5 test.gpx_decompress_hspython_w10_l5

If I do not specify the -w/-l parameters under python then the result is correct.
Unless I am missing something in the documentation, -w 10 / -l 5 should be a valid parameter combination (w between 4 and 15, and l between 3 and 10-1), so I assume I am running into a bug?
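
The validity rule quoted above can be written down as a small check. This is a sketch that takes the bounds exactly as stated in this report (window between 4 and 15, lookahead at least 3 and strictly less than the window); the constant and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bounds as quoted in the report above; treat them as assumptions. */
enum { MIN_WINDOW_BITS = 4, MAX_WINDOW_BITS = 15, MIN_LOOKAHEAD_BITS = 3 };

/* True if the pair satisfies the rule as described in the report. */
bool params_valid(uint8_t window_bits, uint8_t lookahead_bits) {
    if (window_bits < MIN_WINDOW_BITS || window_bits > MAX_WINDOW_BITS) {
        return false;
    }
    if (lookahead_bits < MIN_LOOKAHEAD_BITS) {
        return false;
    }
    return lookahead_bits < window_bits;
}
```

By this rule `params_valid(10, 5)` holds. Note that the decode commands shown above do not pass -w or -l; an earlier report in this list suggests the decoder falls back to its defaults in that case, so passing the same values on both sides may be worth trying before treating this as an encoder bug.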

Test files in attachment:
test.zip

Question on how to use encoder in this specific example

So, we have this use case:

  • There is a fixed buffer of data to be sent, let's say 300 bytes
  • The buffer can be filled with entries of varying sizes until the buffer is full. Only complete entries should be contained in the buffer. An entry is between 30 and 60 bytes in length.

Now the idea is to compress this buffer. But here I seem to run into the issue that the poll does not actually append data every time, so I have no way of knowing if the entry I compress actually fits in the buffer.

Is there a simple way to solve this?
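
One conservative approach is to budget for the worst case before adding an entry, rather than relying on what poll has emitted so far. The sketch below assumes that an incompressible input byte costs 9 bits of output (a 1-bit tag plus the 8-bit literal) and that nothing in the stream costs more per input byte; that is an assumption about the format that should be verified against the encoder, and both function names are hypothetical.

```c
#include <stddef.h>

/* Upper bound on output bytes, assuming every input byte costs 9 bits,
 * rounded up to a whole byte for the final padding. */
size_t worst_case_compressed_size(size_t input_len) {
    return (input_len * 9 + 7) / 8;
}

/* Largest input length whose worst-case output still fits in `capacity`. */
size_t max_input_for_capacity(size_t capacity) {
    return capacity * 8 / 9;
}
```

Under that assumption, a 300-byte output buffer is guaranteed to hold up to 266 input bytes, so entries could be accepted while the running input total stays at or below that figure. Real data will usually compress better, so this trades some capacity for certainty.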

zero-copy rom source data possible ?

A use case for heatshrink is storing compressed data in flash and decompressing it to a RAM buffer as needed (the original blog post seems to show the library came from this very need). However, the current design implies copying the data to a RAM input buffer. This is necessary if the data is in a file, but not if it is in ROM or MCU flash: in that case the data is readily available, yet the implementation stores the input buffer right next to the decompression window buffer.

Would it be possible to provide the source data separately, as a pointer to ROM plus decompression scratch data, avoiding sink interruptions, unnecessary copies and RAM consumption? Or are the copying and the proximity of those buffers required by the algorithm?

Thanks, makapuf

Compilation fails on Centos 7 / Python3.6 (c compiler std definition)

I'm running into an issue building heatshrink2 on CentOS 7 / Python 3.6 with GCC 4.8.5:

heatshrink2/_heatshrink/heatshrink_encoder.c:430:10: note: use option -std=c99 or -std=gnu99 to compile your code

This looks exactly the same as this issue and the same solution resolved it for me:

diff --git a/setup.py b/setup.py
index f7d2c10..7cf6967 100644
--- a/setup.py
+++ b/setup.py
@@ -21,6 +21,7 @@ def find_version():
 EXT = '.pyx' if USE_CYTHON else '.c'

 heatshrink_module = Extension('heatshrink2.core',
+                              extra_compile_args=['-std=gnu99'],
                               include_dirs=[
                                   'heatshrink2/_heatshrink'
                               ],

Should I open a PR with this change?
