
Comments (6)

MiloszKrajewski avatar MiloszKrajewski commented on July 18, 2024

So this touches on a relatively common problem.

Let me start with an analogy. Imagine two systems that need to exchange a lot of numbers. The problem is that one produces a text file with the numbers separated by commas, while the other expects them separated by tabs.
The numbers are encoded exactly the same way, but the separators are different.

There is a BLOCK format and STREAM format.
BLOCKs are the data, the meat, the numbers, while the STREAM wraps all the blocks together with all the headers, commas and/or tabs.

A STREAM contains many BLOCKs: a stream header, then block header, BLOCK, block header, BLOCK, block header, BLOCK, etc...

Both stream and block headers are quite small (let's say 8 bytes), while blocks carry the data and are large (64K - 4MB). Make sense?

SH BH BLOCK BH BLOCK BH BLOCK BH BLOCK BH BLOCK

The stream header carries information such as the overall length of the stream, which compression method was used, the default block size, etc. A block header says something about its block: how many bytes it actually had before compression, how many bytes after compression, etc. And then there is a BLOCK of compressed LZ4 data.
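For illustration, here is a minimal Rust sketch of what such block-header framing looks like. The field layout, sizes, and endianness are assumptions for the example, not the exact lz4net header format:

```rust
// Illustrative sketch only: a toy 8-byte block header (NOT the exact lz4net
// layout) showing the kind of framing a custom STREAM format adds around
// raw BLOCK data. Field order and endianness are assumptions.

fn write_block_header(original_len: u32, compressed_len: u32) -> [u8; 8] {
    let mut header = [0u8; 8];
    header[0..4].copy_from_slice(&original_len.to_le_bytes());
    header[4..8].copy_from_slice(&compressed_len.to_le_bytes());
    header
}

fn read_block_header(header: &[u8; 8]) -> (u32, u32) {
    let original = u32::from_le_bytes(header[0..4].try_into().unwrap());
    let compressed = u32::from_le_bytes(header[4..8].try_into().unwrap());
    (original, compressed)
}

fn main() {
    // Frame a hypothetical 64K block that compressed down to 12345 bytes.
    let header = write_block_header(65536, 12345);
    let (original, compressed) = read_block_header(&header);
    println!("original={} compressed={}", original, compressed);
}
```

The point is that the compressed BLOCK bytes themselves are untouched; only these few header bytes around them differ between implementations.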

The BLOCK format in lz4net is exactly the same as in any other implementation of LZ4, but the STREAM format is not: it is a custom wrapping of LZ4 blocks.

You did not give me enough information about how lz4net is used; lz4net has two APIs: BLOCK and (custom) STREAM. If it expects just BLOCK, then it is almost trivial. If it expects STREAM, you would need to implement your own code for the headers (~8 bytes), but you can copy the blocks without any modification.

You will need to implement this file in Rust:
https://github.com/MiloszKrajewski/lz4net/blob/master/src/LZ4/LZ4Stream.cs

(If you need to implement more than that, it means you are doing something wrong.)

Also, it will be even less work if Rust only needs to write, or only needs to read, since then you have to implement only half of this custom stream handling.

from k4os.compression.lz4.

MiloszKrajewski avatar MiloszKrajewski commented on July 18, 2024

I assume there are no further questions.


Michaelschnabel-DM avatar Michaelschnabel-DM commented on July 18, 2024

Hi,
sorry for not responding sooner. I was at a festival and had no internet connection for the last few days.

Thank you very much for your response!

This is the code to compress and decompress my byte arrays with the legacy lz4net nuget package:
[Screenshot: C# compress/decompress code using the legacy lz4net package]

Is the STREAM format involved here, or should it be only blocks with block headers?

Please let me know if you need any further information.


Tarcontar avatar Tarcontar commented on July 18, 2024

Ahh, I used the wrong account to respond, sorry.
And this is how I encode it in Rust using the lz4 crate:
[Screenshot: Rust encode/decode code using the lz4 crate]

When I try to decode this with the C# code above, I get "LZ4 block is corrupted or invalid length has been given", but the length is correct; I just checked that.

Regards
Tarcontar


MiloszKrajewski avatar MiloszKrajewski commented on July 18, 2024
  • Good news first: since your C# code was using BLOCK mode only (the one with raw data), which is identical across implementations, that should not be a problem.
  • Bad news: can you change the C# code? Your Encode is killing all potential performance benefits with Concat. Are you able to modify this code, or has it been shipped and forgotten?
  • Bad news: you are using STREAM mode (the one with headers) in Rust.
  • Good news: although I don't know the details, BLOCK mode must be available in Rust (because it is used by STREAM mode anyway); you just used the wrong set of methods.

Anyway, it should be simple and not too much code. If you care about the performance of the C# Encode (Decode is fine), you should tweak the C# code a little, and in Rust you need to find the BLOCK mode API (not STREAM).
If you need help I can definitely improve it; I'd guess roughly 10x (byte array allocation with GetBytes and Concat is an absolute killer).

Looking at this:
https://docs.rs/crate/lz4/latest/source/src/block/mod.rs

it seems you need: compress_to_buffer and decompress_to_buffer

DO NOT use prepend_size: true, because it is not part of the BLOCK format specification; it seems to be an extension specific to this library.
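To illustrate why that flag conflicts with the custom stream: a prepend-size option effectively glues its own 4-byte length prefix onto the raw block, duplicating the framing the lz4net-style stream already provides, so the C# side ends up reading those extra bytes as part of the compressed data. A pure-std sketch with hypothetical helper names:

```rust
// Hypothetical helpers (not lz4 crate APIs) showing what a library-specific
// "prepend size" option does to a raw block: it adds a 4-byte little-endian
// length prefix that a custom stream format does not expect. The payload
// bytes below are dummy data standing in for compressed block bytes.

fn prepend_size(block: &[u8]) -> Vec<u8> {
    let mut out = (block.len() as u32).to_le_bytes().to_vec();
    out.extend_from_slice(block);
    out
}

fn strip_size(data: &[u8]) -> (u32, &[u8]) {
    let len = u32::from_le_bytes(data[0..4].try_into().unwrap());
    (len, &data[4..])
}

fn main() {
    let block = [0xAA_u8, 0xBB, 0xCC];
    let framed = prepend_size(&block);
    let (len, payload) = strip_size(&framed);
    println!("len={} payload={:?}", len, payload);
}
```

If the reader of the stream is not aware of this extra prefix, the first 4 payload bytes it sees are a length field, which matches the "corrupted or invalid length" error above.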


Tarcontar avatar Tarcontar commented on July 18, 2024

Hi,

thank you so much for your help!
I got it working using only the block compression from the lz4 Rust crate, as follows:

```rust
use lz4::block::{compress, decompress, CompressionMode};

const BYTE_LENGTH_OF_UINT32: usize = 4;

fn decode(&self, in_data: Vec<u8>) -> Vec<u8> {
    // Header layout: [compressed_size: u32 LE][uncompressed_size: u32 LE]
    let mut array = [0u8; BYTE_LENGTH_OF_UINT32];
    array.copy_from_slice(&in_data[0..BYTE_LENGTH_OF_UINT32]);
    let _compressed_size = u32::from_le_bytes(array);
    array.copy_from_slice(&in_data[BYTE_LENGTH_OF_UINT32..2 * BYTE_LENGTH_OF_UINT32]);
    let uncompressed_size = u32::from_le_bytes(array);

    decompress(
        &in_data[2 * BYTE_LENGTH_OF_UINT32..],
        Some(uncompressed_size as i32),
    )
    .unwrap()
}

fn encode(&self, in_data: Vec<u8>) -> Vec<u8> {
    let original_size = in_data.len() as u32;
    let compressed_data = compress(&in_data, Some(CompressionMode::DEFAULT), false).unwrap();
    // The +4 accounts for the second length field, matching what the C# side expects.
    let compressed_size = (compressed_data.len() + 4) as u32;

    let compressed = compressed_size.to_le_bytes();
    let original = original_size.to_le_bytes();

    [[compressed, original].concat(), compressed_data].concat()
}
```

Thank you very much for your time and effort!

Best Regards
Tarcontar

