
Comments (5)

MiloszKrajewski avatar MiloszKrajewski commented on August 15, 2024

I'll take a look at this next week. Can you check reading blocks instead of single bytes? GetByte is the least tested API.

from k4os.compression.lz4.

RLashofRegas avatar RLashofRegas commented on August 15, 2024

Interesting. When I simply read the streams as blocks using Stream.Read(), no exception is thrown (the streams are equivalent). This is the code:

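// Requires: using System; using System.IO;
private static byte[] ReadStream(Stream stream, int length, int blockSize = 1024)
{
    byte[] bytes = new byte[length];
    int numBytesRead = 0;
    while (numBytesRead < length)
    {
        // Never request more than what is still missing, so the buffer needs no padding.
        int numBytesToRead = Math.Min(blockSize, length - numBytesRead);
        int n = stream.Read(bytes, numBytesRead, numBytesToRead);
        if (n == 0)
        {
            // Stream.Read returns 0 at end of stream; without this check the loop would never exit.
            throw new EndOfStreamException($"Stream ended after {numBytesRead} of {length} bytes.");
        }

        numBytesRead += n;
    }

    return bytes;
}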

private static void ReadBlocks(Stream decompressionStream, Stream intermediateStream)
{
    byte[] decompressionBytes = ReadStream(decompressionStream, (int)intermediateStream.Length);
    byte[] intermediateBytes = ReadStream(intermediateStream, (int)intermediateStream.Length);

    for (int i = 0; i < decompressionBytes.Length; i++)
    {
        if (decompressionBytes[i] != intermediateBytes[i])
        {
            throw new Exception("Bytes not equal.");
        }
    }
}
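For narrowing down where two streams diverge, a comparison that reports the first differing offset may be more useful than a bare "Bytes not equal." exception. The following is only a sketch: StreamComparer, ReadFully and FirstDifference are hypothetical names, not part of K4os.Compression.LZ4 or SharpZipLib, and the code relies only on Stream.Read.

```csharp
using System;
using System.IO;

public static class StreamComparer
{
    // Fills the buffer as far as possible. Returns fewer bytes than
    // buffer.Length only when the end of the stream has been reached.
    private static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = stream.Read(buffer, total, buffer.Length - total);
            if (n == 0)
            {
                break; // end of stream
            }

            total += n;
        }

        return total;
    }

    // Returns the offset of the first difference, or -1 if both streams
    // hold identical content. A length mismatch is reported at the offset
    // where the shorter stream ends.
    public static long FirstDifference(Stream a, Stream b, int blockSize = 1024)
    {
        byte[] bufferA = new byte[blockSize];
        byte[] bufferB = new byte[blockSize];
        long offset = 0;
        while (true)
        {
            int readA = ReadFully(a, bufferA);
            int readB = ReadFully(b, bufferB);
            int common = Math.Min(readA, readB);
            for (int i = 0; i < common; i++)
            {
                if (bufferA[i] != bufferB[i])
                {
                    return offset + i;
                }
            }

            if (readA != readB)
            {
                return offset + common; // one stream ended early
            }

            if (readA == 0)
            {
                return -1; // both streams ended without a difference
            }

            offset += common;
        }
    }
}
```

Because it does not need Stream.Length, this would also work on a stream that cannot report its length up front.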

However, when I read them using the TarInputStream from the SharpZipLib library, they are not equivalent: tarStream.GetNextEntry() throws "Header checksum invalid" here. This is the original error that led me down the rabbit hole of comparing the streams. Interestingly, TarInputStream calls Read(), not ReadByte(), but it still causes problems. The SharpZipLib code for that is here. Note the comment there, "We have found EOF, and the record is not full!": that is the problem I referenced in the original post. It causes the garbage data at the end of the stream, because SharpZipLib just returns the same bytes that were read on the previous call to ReadBlock.

My code for reading the tar archives is as follows. Again, it fails on tarStream.GetNextEntry(), which comes after intermediateTarStream.GetNextEntry(), so the intermediate stream did not throw the same "Header checksum invalid" error:

private static void ReadTar(LZ4DecoderStream decompressionStream, MemoryStream intermediateStream)
{
    using (var tarStream = new TarInputStream(decompressionStream, Encoding.UTF8))
    using (var intermediateTarStream = new TarInputStream(intermediateStream, Encoding.UTF8))
    {
        TarEntry tarEntry, intermediateTarEntry = null;
        while (true)
        {
            intermediateTarEntry = intermediateTarStream.GetNextEntry();
            tarEntry = tarStream.GetNextEntry();
            if (tarEntry == null || intermediateTarEntry == null)
            {
                if (tarEntry == null && intermediateTarEntry == null)
                {
                    break;
                }
                else 
                {
                    throw new Exception("One stream ended, the other still has data.");
                }
            }

            if (tarEntry.IsDirectory || intermediateTarEntry.IsDirectory)
            {
                if (tarEntry.IsDirectory && intermediateTarEntry.IsDirectory)
                {
                    continue;
                }
                else
                {
                    throw new Exception("One stream found a directory and the other didn't");
                }
            }

            using (var originalEntryContents = new MemoryStream())
            using (var intermediateEntryContents = new MemoryStream())
            {
                tarStream.CopyEntryContents(originalEntryContents);
                intermediateTarStream.CopyEntryContents(intermediateEntryContents);

                ReadByteByByte(originalEntryContents, intermediateEntryContents);
            }

        }
    }
}


MiloszKrajewski avatar MiloszKrajewski commented on August 15, 2024

I did some testing with tar streams and it works fine.

I understand that it might be a bug, but it is not a general problem; there must be something specific to the data. My guess is that it has to do with how SharpZipLib calls LZ4Stream (for example, it tries to read -1 bytes), because, as I understand it, if you decompress first and then use TarInputStream, it is all fine? It fails ONLY if TarInputStream reads directly from LZ4Stream?

Anyway, I think that without the actual files this becomes a wild goose chase. Maybe you can generate fake data that has the same problem?
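One way to check how TarInputStream actually calls the decoder, without sharing the files, could be a pass-through stream that records every Read call. This is a minimal sketch under assumptions: ReadLoggingStream and its Calls list are hypothetical and not part of either library, and only the array-based Read overload is intercepted (the base Stream.ReadByte falls back to it).

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical diagnostic wrapper: forwards reads to the inner stream and
// records what each Read call requested and what it returned.
public sealed class ReadLoggingStream : Stream
{
    private readonly Stream _inner;

    public ReadLoggingStream(Stream inner)
    {
        _inner = inner;
    }

    // One entry per Read call. Returned is -1 if the inner stream threw.
    public List<(int Requested, int Returned)> Calls { get; } = new List<(int Requested, int Returned)>();

    public override int Read(byte[] buffer, int offset, int count)
    {
        int n = -1; // stays -1 if the inner stream throws
        try
        {
            n = _inner.Read(buffer, offset, count);
            return n;
        }
        finally
        {
            Calls.Add((count, n));
        }
    }

    public override bool CanRead => _inner.CanRead;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => throw new NotSupportedException();

    public override long Position
    {
        get => throw new NotSupportedException();
        set => throw new NotSupportedException();
    }

    public override void Flush() { }
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
    public override void Write(byte[] buffer, int offset, int count) => throw new NotSupportedException();

    protected override void Dispose(bool disposing)
    {
        if (disposing)
        {
            _inner.Dispose();
        }

        base.Dispose(disposing);
    }
}
```

Passing new ReadLoggingStream(decompressionStream) to the TarInputStream constructor and inspecting Calls after the failure would show whether a negative count was requested, or whether a read returned 0 before the expected end of the data.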


MiloszKrajewski avatar MiloszKrajewski commented on August 15, 2024

@RLashofRegas any luck reproducing it?


RLashofRegas avatar RLashofRegas commented on August 15, 2024

Sorry, I've been super busy with other things, but no. I will try the next chance I get, but I had tried previously and was not able to reproduce it with other files.

