Comments (12)
`ReadAllToArray`'s implementation is bad, actually very bad, and prone to corruption.
You rent an array (`Rent`), load data into it (`Read`), return a slice of it (`AsSpan().Slice(...)`), and then immediately return the array to the pool (`Return`) so it can be rented by another thread (or not even that: just the next `Rent` would return it). This means two threads would assume they exclusively own the array, but in fact they would both read/write from/to the same one.
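The safe pattern is to keep the rented array until everything that reads from it has finished, and only then return it. A minimal BCL-only sketch (the `ReadWithPool` helper name is mine, for illustration, not part of any library):

```csharp
using System;
using System.Buffers;
using System.IO;

static class PoolExample
{
    // Reads exactly `count` bytes from `source` and hands them to `consume`
    // while the rented array is still owned by this method. The array is
    // returned to the pool only after `consume` has finished with it.
    public static void ReadWithPool(Stream source, int count, Action<ReadOnlyMemory<byte>> consume)
    {
        var buffer = ArrayPool<byte>.Shared.Rent(count);
        try
        {
            var read = 0;
            while (read < count)
            {
                var n = source.Read(buffer, read, count - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
            consume(buffer.AsMemory(0, count)); // safe: array is still rented here
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer); // only now may others rent it
        }
    }
}
```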
> My problem is that allocating memory is costly

The whole thing can be implemented with `ArrayPool` (although not the way you did it), so with "zero" memory pressure (not really zero, but "amortized zero" over a long time).

> Then it stays like this for now.

Oh, it should not. The current implementation of `ReadAllToArray` is plain wrong and dangerous (i.e., it can start saving garbage into the destination file).
from k4os.compression.lz4.
> You mean that I rent a minimum size = compressed.Length + uncompressed size, to have one array that initially contains the compressed data and later, at the end of the array, the uncompressed data?

Renting one array with space for both uncompressed and compressed data is actually not a bad idea, but not what I meant. By "not the way you did it" I just meant that you were using `ArrayPool` wrong, by renting it and returning it before finishing processing.

> I fixed that after posting this image. Now `Slice().ToArray()` is invoked.

Sticking `.ToArray()` on it works but negates all the benefit of the pool, as `.ToArray()` is a memory allocation, so you rent first only to immediately allocate anyway, making renting completely useless.
I don't understand what you are trying to do.
Could you give me an example using the existing API, and I can help you simplify it?
I have a file in a special format that contains several chunks of compressed data. The format also contains various metadata:
the metadata says what kind of compression is used, where each chunk can be found, how big it is, and how big it is when decompressed.
So I have a `FileStream`, and I pull these compressed data chunks out as streams as well (via a stream wrapper). I then want to simply decode these using `LZ4Codec.Decode(stream)`.
`LZ4Stream.Decode(..)` does not work here, because no LZ4 header etc. exists.
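For illustration, per-chunk metadata of the kind described above might be modeled and read like this; the `ChunkInfo` layout and field order here are entirely hypothetical, since the real file format defines its own:

```csharp
using System;
using System.IO;

// Hypothetical metadata record - the real format defines its own layout.
record ChunkInfo(byte Method, long Offset, int CompressedLength, int DecompressedLength);

static class ChunkTable
{
    // Reads `count` fixed-size metadata entries from the current stream position.
    public static ChunkInfo[] Read(Stream stream, int count)
    {
        using var reader = new BinaryReader(stream, System.Text.Encoding.UTF8, leaveOpen: true);
        var result = new ChunkInfo[count];
        for (var i = 0; i < count; i++)
        {
            result[i] = new ChunkInfo(
                reader.ReadByte(),   // compression method id
                reader.ReadInt64(),  // where the chunk starts in the file
                reader.ReadInt32(),  // compressed size
                reader.ReadInt32()); // decompressed size
        }
        return result;
    }
}
```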
I still don't understand how the thing you are describing works.
But I can make some guesses:
- It is a custom stream, not an LZ4 stream, so `LZ4Stream` won't help.
- There is a chance then that the LZ4 chunks are just LZ4 BLOCKs (not STREAMs), so you can try to use `LZ4Codec.Decode(...)` (`Codec`, not `Stream`).
Thanks for the hints.
At the moment I am doing it like this:
As I mentioned before, I have no idea about compression algorithms.
Is there a special reason why only arrays can be used with `LZ4Codec` and no stream is supported?
> `var decompressed = new byte[decompressedSize];`

Depends on size, but most of the time `ArrayPool<byte>` would be better.

> `LZ4Codec.Decode(source.ReadAllToArray(), decompressed.AsSpan());`

No idea what `source.ReadAllToArray()` does; I assume it is reading one chunk? Not really *all*? Right? At the same time, I guess `ReadAllToArray()` seems to allocate an array on every call, so you immediately put pressure on the GC (and slow things down).

> `using MemoryStream decompressedMemoryStream = new(decompressed);`
> `decompressedMemoryStream.CopyTo(destination);`

This looks weird. Why create a memory stream when you could just write to `destination`?

`destination.Write(decompressed, 0, decompressedSize);`

> Is there a special reason why only arrays can be used with LZ4Codec and no stream is supported?

Because it is a simple thing to do. Reading and writing arrays from and to streams is trivial. `Convert.ToBase64String(...)` does not support streams either, and focuses on the thing it does best: converting `byte[]` to `string`, nothing more. Nobody asks "why can't I just `Convert.ToBase64(Stream, Stream)`" though.
`ReadAllToArray()` exists only so that I can use `LZ4Codec.Decode`.
`source.ReadAllToArray()` already uses `ArrayPool`.
The decompression code I initially cobbled together just to get the old code from the .NET Framework 4.5 days to compile. At least the tests are green. Before, only `byte[]` was used.
My problem is that allocating memory is costly, at least in my case, since all I have is a stream and it also ends up in a `FileStream`. If I call this stuff 100k times, the performance is not so good. `ArrayPool` will help here, but it is not the whole solution: since I don't know in advance how big the data is, a pooled buffer might not be big enough either.
And that's not to mention multithreading.
OK, thank you. Then it stays like this for now.
Yes, you are right.
I fixed that after posting this image. Now `Slice().ToArray()` is invoked.
I believe `ArrayPool<byte>.Shared` is thread-safe, but the resulting rented array of course is not.
But thanks for the hint.
*But if I could pass a stream right away, I wouldn't have to allocate anything. :P
> The whole thing can be implemented with ArrayPool (although not the way you did it) so with "zero" memory pressure (not really zero, but "amortized zero" over a long time).

You mean that I rent a minimum size = compressed.Length + uncompressed size, to have one array that initially contains the compressed data and later, at the end of the array, the uncompressed data?
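For what it's worth, the single-buffer layout described above can work: rent one array of `compressedLength + decompressedSize`, read the compressed chunk into the tail region, and decode into the head region; the two regions do not overlap, so decoding is safe. A sketch with the block decoder injected as a delegate, so it is not tied to a particular library call (the `DecodeBlock` delegate and `OneBufferDecode` type are made-up names; with K4os you would plug in an `LZ4Codec.Decode` span overload):

```csharp
using System;
using System.Buffers;
using System.IO;

// Stands in for a block decoder such as LZ4Codec.Decode(ReadOnlySpan<byte>, Span<byte>).
delegate int DecodeBlock(ReadOnlySpan<byte> source, Span<byte> target);

static class OneBufferDecode
{
    // One rented array holds both regions:
    //   [0, decompressedSize)                         -> decode target (head)
    //   [decompressedSize, decompressedSize + length) -> compressed input (tail)
    public static void Copy(
        Stream source, Stream target,
        int compressedLength, int decompressedSize,
        DecodeBlock decode)
    {
        var buffer = ArrayPool<byte>.Shared.Rent(compressedLength + decompressedSize);
        try
        {
            // read the compressed chunk into the tail region
            var read = 0;
            while (read < compressedLength)
            {
                var n = source.Read(buffer, decompressedSize + read, compressedLength - read);
                if (n == 0) throw new EndOfStreamException();
                read += n;
            }
            // decode tail -> head; the regions do not overlap
            decode(
                buffer.AsSpan(decompressedSize, compressedLength),
                buffer.AsSpan(0, decompressedSize));
            target.Write(buffer, 0, decompressedSize);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```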
> Now Slice().ToArray() is invoked.

This is an allocation, unnecessary if you care about performance.

> I believe the ArrayPool.Shared is thread-safe,

Yes. It is.

> but the resulting rented array of course is not.

Exactly.

> *But if I could pass a stream right away, I wouldn't have to allocate anything. :P

There is some truth to it. If someone else writes all your code, you don't need to write any.
Not sure how `ChunkReaderStreamWrapper` really works, but assuming it is a pretend-stream over a fragment of a real stream, I would say the implementation is trivial:
// requires: System.Buffers, System.IO, K4os.Compression.LZ4
void CopyLz4Block(Stream source, Stream target, int chunkLength, int decompressedSize)
{
    var sourceBuffer = ArrayPool<byte>.Shared.Rent(chunkLength);
    var targetBuffer = ArrayPool<byte>.Shared.Rent(decompressedSize);
    try
    {
        // Stream.Read may return fewer bytes than requested, so loop until done
        var read = 0;
        while (read < chunkLength)
        {
            var n = source.Read(sourceBuffer, read, chunkLength - read);
            if (n == 0) throw new EndOfStreamException();
            read += n;
        }
        // slice to the actual lengths: rented arrays may be larger than requested
        LZ4Codec.Decode(
            sourceBuffer.AsSpan(0, chunkLength),
            targetBuffer.AsSpan(0, decompressedSize));
        target.Write(targetBuffer, 0, decompressedSize);
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(sourceBuffer);
        ArrayPool<byte>.Shared.Return(targetBuffer);
    }
}
No allocations, and the logic is essentially just three steps: `Read`, `Decode`, `Write`.
> If someone else writes all your code you don't need to write any.

🎉

> Not sure how ChunkReaderStreamWrapper really works, but assuming it is a pretend-stream over a fragment of a real stream...

Yes, it only supports access to the real chunk data of the stream - it knows the position and the size of the compressed data.
> By "not the way you did it" I just meant that you were using ArrayPool wrong by renting it and returning it before finishing processing.
> Sticking .ToArray() on it works but negates all the benefit of the pool as .ToArray() is a memory allocation, so you rent to immediately allocate anyway, making renting completely useless.

I am right there with you.
Your solution is good - "I can't see the forest for the trees" (I mean, I have too many code issues in the solution to think clearly, so I couldn't see the simplest solution).