
Comments (12)

MiloszKrajewski commented on July 18, 2024

ReadAllToArray implementation is bad, actually very bad and prone to corruption.

You rent an array (Rent), load data into it (Read), return a slice of it (AsSpan.Slice), and then immediately return the array to the pool (Return) so it can be rented by another thread (or not even that, just the next Rent could return it). This means two threads would assume they exclusively own it, but in fact they would both read/write from/to the same array.
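In other words, the pattern is roughly this (just a sketch, not the actual implementation; the signature and the length parameter are assumed):

// sketch of the described anti-pattern, not the real ReadAllToArray
static ReadOnlySpan<byte> ReadAllToArray(Stream source, int length)
{
  var buffer = ArrayPool<byte>.Shared.Rent(length);
  source.Read(buffer, 0, length);
  var result = buffer.AsSpan().Slice(0, length);
  ArrayPool<byte>.Shared.Return(buffer); // buffer is back in the pool and may be rented again...
  return result;                         // ...yet the caller still reads through this span
}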

My problem is that allocating memory is costly

The whole thing can be implemented with ArrayPool (although not the way you did it) so with "zero" memory pressure (not really zero, but "amortized zero" over long time).

Then it stays like this for now.

Oh, it should not. The current implementation of ReadAllToArray is plain wrong and dangerous (i.e. it can start saving garbage into the destination file).


MiloszKrajewski commented on July 18, 2024

You mean that I rent a minimum size = compressed.Length + uncompressed size, to have one array that initially contains the compressed data and later, at the end of the array, the uncompressed data?

Renting one array with space for both uncompressed and compressed data is actually not a bad idea, but not what I meant.

By "not the way you did it" I just meant the you were using ArrayPool wrong by renting it and returning it before finishing processing.

I fixed that after posting this image; now Slice().ToArray() is invoked.

Sticking .ToArray() on it works but negates all the benefit of the pool, as .ToArray() is a memory allocation, so you rent first only to immediately allocate anyway, making renting completely useless.
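Which presumably means something like this now (again just a guess at the shape):

static byte[] ReadAllToArray(Stream source, int length)
{
  var buffer = ArrayPool<byte>.Shared.Rent(length);
  try
  {
    source.Read(buffer, 0, length);
    return buffer.AsSpan(0, length).ToArray(); // fresh allocation on every call - the pool buys nothing
  }
  finally
  {
    ArrayPool<byte>.Shared.Return(buffer);
  }
}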


MiloszKrajewski commented on July 18, 2024

I don't understand what you are trying to do.
Could you give me an example using the existing API and I can help you simplify it.


Maruhl commented on July 18, 2024

I have a file in a special format that contains several pieces of compressed data. The format also contains various metadata.

This metadata says what kind of compression is used, where the data can be found, how big it is, and how big it is once decompressed.

So I have a FileStream, and I take these compressed data chunks out as streams as well (via a stream wrapper). These I then want to simply decode using LZ4Codec.Decode(stream).

LZ4Stream.Decode(..) does not work here, because no LZ4 header etc. exists.
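Roughly, the metadata for each chunk could be modeled like this (hypothetical names, just for illustration; the real format is not shown here):

// hypothetical per-chunk metadata, as described above
record ChunkInfo(
  int CompressionKind,    // what kind of compression is used
  long Offset,            // where the compressed data can be found
  int CompressedSize,     // how big it is
  int DecompressedSize);  // how big it is once decompressed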


MiloszKrajewski commented on July 18, 2024

I still don't understand how the thing you are describing works.

But I can make some guesses:

  • It is a custom stream, not an LZ4 stream, so LZ4Stream won't help
  • There is a chance then that the LZ4 chunks are just LZ4 BLOCKs (not STREAMs), so you can try to use LZ4Codec.Decode(...) (Codec, not Stream), see the sketch below
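If that guess is right, decoding one chunk would look roughly like this (a sketch; chunkStream, compressedSize and decompressedSize are assumed names coming from your wrapper and metadata):

// decode one raw LZ4 block with sizes taken from the file's metadata
var compressed = new byte[compressedSize];
chunkStream.Read(compressed, 0, compressedSize);

var decompressed = new byte[decompressedSize];
var written = LZ4Codec.Decode(compressed, 0, compressedSize, decompressed, 0, decompressedSize);
// 'written' should equal decompressedSize if the metadata is correct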


Maruhl commented on July 18, 2024

Thanks for the hints.

At the moment I am doing it this way as well:
(screenshot of the decompression code; the individual lines are quoted in the reply below)

As I mentioned before, I have no idea about compression algorithms.

Is there a special reason why only arrays can be used with LZ4Codec and no stream is supported?


MiloszKrajewski commented on July 18, 2024
var decompressed = new byte[decompressedSize];

Depends on the size, but most of the time ArrayPool<byte> would be better.

LZ4Codec.Decode(source.ReadAllToArray(), decompressed.AsSpan());

No idea what source.ReadAllToArray() does, I assume it is reading one chunk? Not really all? Right? At the same time I guess ReadAllToArray() allocates an array on every call, so you immediately put pressure on the GC (and slow things down).

using MemoryStream decompressedMemoryStream = new(decompressed);
decompressedMemoryStream.CopyTo(destination);

This looks weird. Why create a memory stream when you could just write to destination?

destination.Write(decompressed, 0, decompressedSize);

Is there a special reason why only arrays can be used with LZ4Codec and no stream is supported?

Because it is a simple thing to do. Reading and writing arrays from and to streams is trivial. Convert.ToBase64String(...) does not support streams either, and focuses on the thing it does best: converting byte[] to string, nothing more. Nobody asks "why can't I just do Convert.ToBase64(Stream, Stream)?" though.


Maruhl commented on July 18, 2024

ReadAllToArray() exists only so that I can use LZ4Codec.Decode.
source.ReadAllToArray() already uses ArrayPool.

(screenshot of the ReadAllToArray implementation)

The decompression code I first kind of cobbled together to get the old code from the .NET Framework 4.5 days to compile. At least the tests are green. Before, only byte[] was used.

My problem is that allocating memory is costly, at least in my case, since all I have is a stream and it also ends up in a FileStream. If I call this stuff 100k times the performance is not so good; ArrayPool will help here, but it is not the whole solution either. Since I don't know how big the data is, the memory might not be enough either.
And that's not to mention multithreading.

OK, thank you. Then it stays like this for now.


Maruhl commented on July 18, 2024

Yes, you are right.
I fixed that after posting this image; now Slice().ToArray() is invoked.
I believe ArrayPool.Shared is thread-safe, but the rented array of course is not.

But thanks for the hint.

*But if I could pass a stream right away, I wouldn't have to allocate anything. :P


Maruhl commented on July 18, 2024

The whole thing can be implemented with ArrayPool (although not the way you did it) so with "zero" memory pressure (not really zero, but "amortized zero" over long time).

You mean that I rent a minimum size = compressed.Length + uncompressed size, to have one array that initially contains the compressed data and later, at the end of the array, the uncompressed data?


MiloszKrajewski commented on July 18, 2024

Now Slice().ToArray() is invoked.

This is an allocation, unnecessary if you care about performance.

I believe ArrayPool.Shared is thread-safe,

Yes. It is.

but the rented array of course is not.

Exactly.

*But if I could pass a stream right away, I wouldn't have to allocate anything. :P

There is some truth to it. If someone else writes all your code you don't need to write any.

Not sure how ChunkReaderStreamWrapper really works, but assuming it is a pretend-stream over a fragment of a real stream, I would say the implementation is trivial:

void CopyLz4Block(Stream source, Stream target, int chunkLength, int decompressedSize)
{
  // rented arrays may be larger than requested, so pass explicit lengths below
  var sourceBuffer = ArrayPool<byte>.Shared.Rent(chunkLength);
  var targetBuffer = ArrayPool<byte>.Shared.Rent(decompressedSize);

  try
  {
    // note: Stream.Read may return fewer bytes than requested; loop if your stream can do partial reads
    source.Read(sourceBuffer, 0, chunkLength);
    LZ4Codec.Decode(sourceBuffer, 0, chunkLength, targetBuffer, 0, decompressedSize);
    target.Write(targetBuffer, 0, decompressedSize);
  }
  finally
  {
    ArrayPool<byte>.Shared.Return(sourceBuffer);
    ArrayPool<byte>.Shared.Return(targetBuffer);
  }
}

No allocations, and the logic is just 3 lines: Read, Decode, Write
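Called once per chunk and driven by your metadata, it could look roughly like this (a sketch with hypothetical names: the file paths and the chunk entries, e.g. the ChunkInfo shape from above, are assumptions):

// hypothetical driver: one CopyLz4Block call per compressed chunk described by the metadata
using var source = File.OpenRead("container.bin");
using var destination = File.Create("output.bin");

foreach (var chunk in chunks) // chunk entries parsed from the file's metadata
{
  source.Position = chunk.Offset;
  CopyLz4Block(source, destination, chunk.CompressedSize, chunk.DecompressedSize);
}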


Maruhl commented on July 18, 2024

If someone else writes all your code you don't need to write any.

🎉

Not sure how ChunkReaderStreamWrapper really works, but assuming it is a pretend-stream over a fragment of a real stream...

Yes, it only gives access to the actual chunk data of the stream - it knows the position and the size of the compressed data.

By "not the way you did it" I just meant the you were using ArrayPool wrong by renting it and returning it before finishing processing.

Sticking .ToArray() on it works but negates all the benefit of the pool, as .ToArray() is a memory allocation, so you rent first only to immediately allocate anyway, making renting completely useless.

I am right there with you

Your solution is good - "I can't see the forest for the trees" (I mean, I have too many code issues in the solution to think clearly and I can't see the simplest solutions).

