
Microsoft.IO.RecyclableMemoryStream

A library to provide pooling for .NET MemoryStream objects to improve application performance, especially in the area of garbage collection.

Get Started

Install the latest version from NuGet

Install-Package Microsoft.IO.RecyclableMemoryStream

Purpose

Microsoft.IO.RecyclableMemoryStream is a MemoryStream replacement that offers superior behavior for performance-critical systems. In particular it is optimized to do the following:

  • Eliminate Large Object Heap allocations by using pooled buffers
  • Incur far fewer gen 2 GCs, and spend far less time paused due to GC
  • Avoid memory leaks by having a bounded pool size
  • Avoid memory fragmentation
  • Allow for multiple ways to read and write data that will avoid extraneous allocations
  • Provide excellent debuggability and logging
  • Provide metrics for performance tracking

Features

  • The semantics are close to the original System.IO.MemoryStream implementation, and it is intended to be a drop-in replacement as much as possible.
  • Rather than pooling the streams themselves, the underlying buffers are pooled. This allows you to use the simple Dispose pattern to release the buffers back to the pool, as well as detect invalid usage patterns (such as reusing a stream after it’s been disposed).
  • RecyclableMemoryStreamManager is thread-safe (streams themselves are inherently NOT thread safe).
  • Implementation of IBufferWriter<byte>.
  • Support for enormous streams through abstracted buffer chaining.
  • Extensive support for newer memory-related types like Span<byte>, ReadOnlySpan<byte>, ReadOnlySequence<byte>, and Memory<byte>.
  • Each stream can be tagged with an identifying string that is used in logging - helpful when finding bugs and memory leaks relating to incorrect pool use.
  • Debug features like recording the call stack of the stream allocation to track down pool leaks.
  • Maximum free pool size to handle spikes in usage without using too much memory.
  • Flexible and adjustable limits to the pooling algorithm.
  • Metrics tracking and events so that you can see the impact on the system.

Build Targets

At least MSBuild 16.8 is required to build the code. You get this with Visual Studio 2019.

Supported build targets in v2.0 are net462, netstandard2.0, netstandard2.1, and netcoreapp2.1 (net40, net45, net46, and netstandard1.4 were deprecated). Starting with v2.1, a net5.0 target has been added.

Testing

A minimum of .NET 5.0 is required for executing the unit tests. Requirements:

  • NUnit test adapter (VS Extension)
  • Be sure to set the default processor architecture for tests to x64 (or the giant allocation test will fail)

Benchmark tests

The results are available here

Change Log

Read the change log here.

How It Works

RecyclableMemoryStream improves GC performance by ensuring that the larger buffers used for the streams are put into the gen 2 heap and stay there forever. This should cause full collections to happen less frequently. If you pick buffer sizes above 85,000 bytes, then you will ensure these are placed on the large object heap, which is touched even less frequently by the garbage collector.

The RecyclableMemoryStreamManager class maintains two separate pools of objects:

  1. Small Pool - Holds small buffers (of configurable size). Used by default for all normal read/write operations. Multiple small buffers are chained together in the RecyclableMemoryStream class and abstracted into a single stream.
  2. Large Pool - Holds large buffers, which are only used when you must have a single, contiguous buffer, such as when you plan to call GetBuffer(). It is possible to create streams larger than can be represented by any single buffer, due to .NET's array size limits.

A RecyclableMemoryStream starts out using a small buffer, chaining additional ones as the stream capacity grows. Should you ever call GetBuffer() when the length is greater than a single small buffer's capacity, the small buffers are converted to a single large buffer. You can also request a stream with an initial capacity; if that capacity is larger than the small pool block size, multiple blocks will be chained, unless you call an overload with asContiguousBuffer set to true, in which case a single large buffer will be assigned from the start. If you request a capacity larger than the maximum poolable size, you will still get a stream back, but its buffers will not be pooled. (Note: this refers to the maximum poolable buffer size configured in RecyclableMemoryStreamManager, not the maximum array size.)
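As a sketch of the contiguous option described above (sizes are illustrative; this assumes the GetStream overload with an asContiguousBuffer parameter and a manager instance declared as in the Usage section below):

```csharp
// Requesting 256 KB up front. Without asContiguousBuffer, this capacity would be
// satisfied by chaining multiple small blocks; with it, one large-pool buffer is
// assigned from the start, so a later GetBuffer() needs no conversion or copy.
using (var stream = manager.GetStream("Program.Main", 256 * 1024, asContiguousBuffer: true))
{
    // Write as usual; the backing store is already a single contiguous buffer.
}
```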

There are two versions of the large pool:

  • Linear (default) - You specify a multiple and a maximum size, and an array of buffer slots is created with sizes (1 * multiple), (2 * multiple), (3 * multiple), and so on up to the maximum. For example, if you specify a multiple of 1 MB and a maximum size of 8 MB, then you will have an array of length 8. The first slot will contain 1 MB buffers, the second slot 2 MB buffers, and so on.
  • Exponential - Instead of growing linearly, the buffers double in size for each slot. For example, if you specify a multiple of 256 KB and a maximum size of 8 MB, you will have an array of length 6, the slots containing buffers of size 256 KB, 512 KB, 1 MB, 2 MB, 4 MB, and 8 MB.
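The slot arithmetic above can be checked with a standalone snippet (plain C#, using the example sizes from the text; no library code involved):

```csharp
using System;

// Linear pool: multiple = 1 MB, maximum = 8 MB -> one slot per multiple.
long multiple = 1024 * 1024;
long maximum = 8 * multiple;
long linearSlots = maximum / multiple; // 8 slots: 1 MB, 2 MB, ..., 8 MB

// Exponential pool: multiple = 256 KB, maximum = 8 MB -> slots double in size.
long expMultiple = 256 * 1024;
int exponentialSlots = 0;
for (long size = expMultiple; size <= maximum; size *= 2)
{
    exponentialSlots++; // 256 KB, 512 KB, 1 MB, 2 MB, 4 MB, 8 MB
}

Console.WriteLine($"{linearSlots} linear slots, {exponentialSlots} exponential slots");
// prints: 8 linear slots, 6 exponential slots
```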

Pool Image Comparison

Which one should you use? That depends on your usage pattern. If you have an unpredictable large buffer size, perhaps the linear one will be more suitable. If you know that a longer stream length is unlikely, but you may have a lot of streams in the smaller size, picking the exponential version could lead to less overall memory usage (which was the reason this form was added).

Buffers are created on demand, the first time they are requested and nothing suitable already exists in the pool. After use, these buffers will be returned to the pool through the RecyclableMemoryStream's Dispose method. When that return happens, the RecyclableMemoryStreamManager will use the MaximumSmallPoolFreeBytes and MaximumLargePoolFreeBytes settings to determine whether to put those buffers back in the pool or let them go (and thus be garbage collected). It is through these settings that you determine how large your pool can grow. If you set these to 0, you can have unbounded pool growth, which is essentially indistinguishable from a memory leak. For every application, you must determine through analysis and experimentation the appropriate balance between pool size and garbage collection.

If you forget to call a stream's Dispose method, this could cause a memory leak. To help you prevent this, each stream has a finalizer that will be called by the CLR once there are no more references to the stream. This finalizer will raise an event or log a message about the leaked stream.

Note that for performance reasons, buffers are never pre-initialized or zeroed out, so a recycled buffer may contain stale data; it is your responsibility to treat the initial contents as undefined. If you want to avoid accidental data leakage, you can set ZeroOutBuffer to true. This will zero out the buffers on allocation and before returning them to the pool. Be aware of the performance implications.
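If you do opt into zeroing, it is configured up front when building the manager. A minimal sketch, assuming ZeroOutBuffer is a member of the Options type shown in the Usage section below:

```csharp
var options = new RecyclableMemoryStreamManager.Options
{
    ZeroOutBuffer = true // clear buffers on allocation and again before re-pooling
};
var manager = new RecyclableMemoryStreamManager(options);
```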

Usage

You can jump right in with no fuss by just doing a simple replacement of MemoryStream with something like this:

class Program
{
    private static readonly RecyclableMemoryStreamManager manager = new RecyclableMemoryStreamManager();

    static void Main(string[] args)
    {
        var sourceBuffer = new byte[] { 0, 1, 2, 3, 4, 5, 6, 7 };
        
        using (var stream = manager.GetStream())
        {
            stream.Write(sourceBuffer, 0, sourceBuffer.Length);
        }
    }
}
IMPORTANT: RecyclableMemoryStreamManager should be declared once, and it will live for the entire process lifetime. It is perfectly fine to use multiple pools if you desire, especially if you want to configure them differently.

To facilitate easier debugging, you can optionally provide a string tag, which serves as a human-readable identifier for the stream. This can be something like “ClassName.MethodName”, but it can be whatever you want. Each stream also has a GUID to provide absolute identity if needed, but the tag is usually sufficient.

using (var stream = manager.GetStream("Program.Main"))
{
    stream.Write(sourceBuffer, 0, sourceBuffer.Length);
}

You can also provide an existing buffer. It’s important to note that the data from this buffer will be copied into a buffer owned by the pool:

var stream = manager.GetStream("Program.Main", sourceBuffer, 
                                    0, sourceBuffer.Length);

You can also change the parameters of the pool itself:

var options = new RecyclableMemoryStreamManager.Options()
{
    BlockSize = 1024,
    LargeBufferMultiple = 1024 * 1024,
    MaximumBufferSize = 16 * 1024 * 1024,
    GenerateCallStacks = true,
    AggressiveBufferReturn = true,
    MaximumLargePoolFreeBytes = 16 * 1024 * 1024 * 4,
    MaximumSmallPoolFreeBytes = 100 * 1024,
};

var manager = new RecyclableMemoryStreamManager(options);

You should usually set at least BlockSize, LargeBufferMultiple, MaximumBufferSize, MaximumLargePoolFreeBytes, and MaximumSmallPoolFreeBytes because their appropriate values are highly dependent on the application.

Usage Guidelines

While this library strives to be very general and not impose too many restraints on how you use it, its purpose is to reduce the cost of garbage collections incurred by frequent large allocations. Thus, there are some general guidelines for usage that may be useful to you:

  1. Set the BlockSize, LargeBufferMultiple, MaximumBufferSize, MaximumLargePoolFreeBytes, and MaximumSmallPoolFreeBytes properties to reasonable values for your application and resource requirements. Important: if you do not set MaximumLargePoolFreeBytes and MaximumSmallPoolFreeBytes, the pool can grow without bound!
  2. Always dispose of each stream exactly once.
  3. Most applications should not call ToArray and should avoid calling GetBuffer if possible. Instead, use GetReadOnlySequence for reading and the IBufferWriter methods GetSpan/GetMemory with Advance for writing. There are also miscellaneous CopyTo and WriteTo methods that may be convenient. The point is to avoid creating unnecessary GC pressure where possible.
  4. Experiment to find the appropriate settings for your scenario.

A working knowledge of the garbage collector is a very good idea before you try to optimize your scenario with this library. An article such as Garbage Collection, or a book like Writing High-Performance .NET Code will help you understand the design principles of this library.

When configuring the options, consider questions such as these:

  • What is the distribution of stream lengths that I expect?
  • How many streams will be in use at one time?
  • Is GetBuffer called a lot? How much use of large pool buffers will I need?
  • How resilient to spikes in activity do I need to be? That is, how many free bytes should I keep pooled to absorb them?
  • What are my physical memory limitations on the machines where this will be used?

IBufferWriter<byte>: GetMemory, GetSpan, and Advance

RecyclableMemoryStream implements IBufferWriter<byte>, so it can be used for zero-copy encoding and formatting. You can also directly modify the stream contents using GetSpan/GetMemory with Advance. For instance, writing a BigInteger to a stream:

var bigInt = BigInteger.Parse("123456789013374299100987654321");

using (var stream = manager.GetStream())
{
    Span<byte> buffer = stream.GetSpan(bigInt.GetByteCount());
    bigInt.TryWriteBytes(buffer, out int bytesWritten);
    stream.Advance(bytesWritten);
}

GetReadOnlySequence

GetReadOnlySequence returns a ReadOnlySequence that can be used for zero-copy stream processing. For example, hashing the contents of a stream:

using (var stream = manager.GetStream())
using (var sha256Hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256))
{
    foreach (var memory in stream.GetReadOnlySequence())
    {
        sha256Hasher.AppendData(memory.Span);
    }
    
    sha256Hasher.GetHashAndReset();
}

GetBuffer and ToArray

RecyclableMemoryStream is designed to operate primarily on chained small pool blocks. To access these blocks, use GetReadOnlySequence for reading and GetSpan/GetMemory with Advance for writing. However, if you still want a contiguous buffer for the whole stream, there are two APIs which RecyclableMemoryStream overrides from its parent MemoryStream class:

  • GetBuffer - If possible, a reference to the single block will be returned to the caller. If multiple blocks are in use, they will be converted into a single large pool buffer and the data copied into it. In all cases, the caller must use the Length property to determine how much usable data is actually in the returned buffer. If the stream length is larger than the maximum poolable buffer size, a single buffer will still be returned, but it will not be pooled. If no contiguous buffer can be returned due to .NET array-size limitations, an OutOfMemoryException will be thrown.
  • ToArray - It looks similar to GetBuffer on the surface, but is actually significantly different. In ToArray the data is always copied into a new array that is exactly the right length for the full contents of the stream. This new buffer is never pooled. Users of this library should consider any call to ToArray to be a bug, as it wipes out many of the benefits of RecyclableMemoryStream completely. However, the method is included for completeness, especially if you are calling other APIs that only take a byte array with no length parameter. An event is logged on all ToArray calls.

You can optionally configure the RecyclableMemoryStreamManager.ThrowExceptionOnToArray property to disallow calls to RecyclableMemoryStream.ToArray. If this value is set to true, then any calls to ToArray will result in a NotSupportedException.
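A minimal sketch of enabling this guard (property name as given above, on the manager instance from the Usage section):

```csharp
manager.ThrowExceptionOnToArray = true;

// Any subsequent call to stream.ToArray() on streams from this manager
// will now throw NotSupportedException instead of allocating a copy.
```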

Metrics and Hooks

ETW Events

RecyclableMemoryStream has an EventSource provider that produces a number of events for tracking behavior and performance. You can use events to debug leaks or subtle problems with pooled stream usage.

  • MemoryStreamCreated (Verbose): Logged every time a stream object is allocated. Fields: guid, tag, requestedSize, actualSize.
  • MemoryStreamDisposed (Verbose): Logged every time a stream object is disposed. Fields: guid, tag, allocationStack, disposeStack.
  • MemoryStreamDoubleDispose (Critical): Logged if a stream is disposed more than once. This indicates a logic error by the user of the stream. Dispose should happen exactly once per stream to avoid resource usage bugs. Fields: guid, tag, allocationStack, disposeStack1, disposeStack2.
  • MemoryStreamFinalized (Error): Logged if a stream has gone out of scope without being disposed. This indicates a resource leak. Fields: guid, tag, allocationStack.
  • MemoryStreamToArray (Verbose): Logged whenever ToArray is called. This indicates a potential problem, as calling ToArray goes against the concepts of good memory practice which RecyclableMemoryStream is trying to solve. Fields: guid, tag, stack, size.
  • MemoryStreamManagerInitialized (Informational): Logged when the RecyclableMemoryStreamManager is initialized. Fields: blockSize, largeBufferMultiple, maximumBufferSize.
  • MemoryStreamNewBlockCreated (Verbose): Logged whenever a block for the small pool is created. Fields: smallPoolInUseBytes.
  • MemoryStreamNewLargeBufferCreated (Verbose): Logged whenever a large buffer is allocated. Fields: requiredSize, largePoolInUseBytes.
  • MemoryStreamNonPooledLargeBufferCreated (Verbose): Logged whenever a buffer is requested that is larger than the maximum pooled size. The buffer is still created and returned to the user, but it cannot be re-pooled. Fields: guid, tag, requiredSize, allocationStack.
  • MemoryStreamDiscardBuffer (Warning): Logged whenever a buffer is discarded rather than put back in the pool. Fields: guid, tag, bufferType (Small, Large), reason (TooLarge, EnoughFree).
  • MemoryStreamOverCapacity (Error): Logged whenever an attempt is made to set the capacity of the stream beyond the limits of RecyclableMemoryStreamManager.MaximumStreamCapacity, if such a limit is set. Fields: guid, tag, requestedCapacity, maxCapacity, allocationStack.

Event Hooks

In addition to the logged ETW events, there are a number of .NET event hooks on RecyclableMemoryStreamManager that you can use as triggers for your own custom actions:

  • BlockCreated: A new small pool block has been allocated.
  • BufferDiscarded: A buffer has been refused re-entry to the pool and given over to the garbage collector.
  • LargeBufferCreated: A large buffer has been allocated.
  • StreamCreated: A new stream has been created.
  • StreamDisposed: A stream has been disposed.
  • StreamDoubleDisposed: A stream has been disposed twice, indicating an error.
  • StreamFinalized: A stream has been finalized, which means it was never disposed before it went out of scope.
  • StreamLength: Reports the stream's length upon disposal. Can allow you to track stream metrics.
  • StreamConvertedToArray: Someone called ToArray on a stream.
  • StreamOverCapacity: An attempt was made to expand beyond the maximum capacity allowed by the pool manager.
  • UsageReport: Provides stats on pool usage for metrics tracking.
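For example, leak detection can be wired up through these hooks. A minimal sketch (the exact event-args types vary by library version, so the handlers deliberately ignore their arguments):

```csharp
// Assumes `manager` is your process-wide RecyclableMemoryStreamManager.
manager.StreamFinalized += (sender, args) =>
{
    // A stream was garbage-collected without being disposed: a pool leak.
    Console.Error.WriteLine("RecyclableMemoryStream finalized without Dispose; check for missing using blocks.");
};

manager.StreamDoubleDisposed += (sender, args) =>
{
    Console.Error.WriteLine("RecyclableMemoryStream disposed twice; check ownership of the stream.");
};
```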

Debugging Problems

Once you start introducing re-usable resources like the pooled buffers in RecyclableMemoryStream, you are taking some of the duties of the CLR away from it and reserving them for yourself. This can be error-prone. See the Usage section above for some guidelines on making your usage of this library successful.

There are a number of features that will help you debug usage of these streams.

Stream Identification

Each stream is assigned a unique GUID and, optionally, a tag.

The GUID is unique for each stream object and serves to identify that stream throughout its lifetime.

A tag is an optional, arbitrary string assigned by the caller when a stream is requested. This can be a class name, function name, or some other meaningful string that can help you identify the source of the stream's usage. Note that multiple streams can share the same tag: tags identify where in your code the stream originated; they are not unique stream identifiers.

Callstack Recording

If you set the GenerateCallStacks property on RecyclableMemoryStreamManager to true, then major operations on the stream, such as allocation and disposal, will record the call stack of those method calls. These will be reported in ETW events in the event of detected programming errors such as double-dispose or finalization.

Turning this feature on causes a very significant negative performance impact, so it should only be enabled while actively investigating a problem.

Double-Dispose Protection

If Dispose is called twice on the same stream, an event is logged with the relevant stream's information. If GenerateCallStacks is turned on, this will include the call stacks for allocation and both disposals.

Non-Dispose Detection

If Dispose is never called for a stream, the finalizer will eventually be called by the CLR, and an event will be logged with relevant stream information, including the allocation stack, if enabled. Buffers for finalized streams are lost to the pool, and this should be considered a bug.

Concurrency

Concurrent use of a RecyclableMemoryStream object is not supported under any circumstances. However, RecyclableMemoryStreamManager is thread-safe and can be used to retrieve streams from multiple threads.
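A sketch of the supported pattern (sharedManager is a hypothetical process-wide manager instance): share the manager across threads, but give each unit of work its own stream:

```csharp
using System;
using System.Threading.Tasks;

// sharedManager: one RecyclableMemoryStreamManager for the whole process (thread-safe).
Parallel.For(0, Environment.ProcessorCount, _ =>
{
    // Each iteration gets its own stream; a stream is never shared across threads.
    using (var stream = sharedManager.GetStream("Worker"))
    {
        stream.Write(new byte[] { 1, 2, 3 }, 0, 3);
    }
});
```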

ETW Events

Use an ETW event monitor such as PerfView to collect and analyze ETW events.

Many of these events contain helpful clues about the stream in question, including its tag, guid, and stacks (if enabled).

Reference

Read the API documentation here.

License

This library is released under the MIT license.

Support

Check the support policy here

Contributors

aleks-ivanov, andrepostiga, arikt-ms, benmwatson, brantburnett, dependabot[bot], doubleyewdee, faustodavid, grbell-ms, hankovich, injun-lee, jamesqo, jaxelr, kerams, lbargaoanu, lechu445, lifefreedom, magicpanda0618, michens, mosdav, ndrwrbgs, ninedan, paulomorgado, selvasingh, shiftkey, stakx, sungam3r, tgnm, tylerdm, vbfox

Issues

RecyclableMemoryStream class should override CopyToAsync

Because RecyclableMemoryStream does not override CopyToAsync, it uses the default implementation in the Stream class, which allocates a byte[] buffer each time it's called.

This kind of defeats the purpose of using RecyclableMemoryStream.

Not sure why Stream doesn't support overriding CopyTo as well, as this would also benefit from being overridden.

Nuget package contains debugging symbols without source

In the [Publish Symbols] step during a build on our build server, the following error occurs:

##[error]Indexed source information could not be retrieved from 'F:\tfsagent_work\13\s\src[project]\bin\Microsoft.IO.RecyclableMemoryStream.pdb'. Symbol indexes could not be retrieved.

This seems to be caused by the presence of a .pdb file in the Nuget package and the absence of the corresponding source files.

Assembly on NuGet in Debug release?

I'm testing something with Benchmark.NET and when I attempt to benchmark some code referencing this library, I get a message that Microsoft.IO.RecyclableMemoryStream is non-optimized. Is it possible that the public NuGet package is accidentally including the debug config?

Questions about defaults

I'm a little surprised that the default block size is 128 KB, effectively allocating these directly inside the large object heap. Isn't one of the purposes of this library avoiding LOH allocations?

I know it's configurable so that's not a big deal. :) I'm just curious to know how these defaults were chosen.

Incorrect/superfluous MaxStreamLength check in Write(byte[] buffer, int offset, int count) method

The lines 553-558 in the method RecyclableMemoryStream.Write(byte[] buffer, int offset, int count) implement a check whether the required capacity for the write operation exceeds MaxStreamLength:

long requiredBuffers = (end + blockSize - 1) / blockSize;

if (requiredBuffers * blockSize > MaxStreamLength)
{
    throw new IOException("Maximum capacity exceeded");
}

This check is not only broken, but also superfluous.

It is broken for any scenario where MaxStreamLength is not a multiple of blockSize and a stream's capacity is at or close to MaxStreamLength. When a write operation involves the last block, an IOException will be incorrectly thrown, since the expression requiredBuffers * blockSize will necessarily become greater than MaxStreamLength in such cases.

It is also superfluous, and it should be safe to simply delete it. The code lines just above it are doing the same check about MaxStreamLength being exceeded (and this one seems to be correct).
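A standalone numeric illustration of the reported off-by-one (hypothetical sizes, not the library's defaults):

```csharp
using System;

// Suppose blockSize = 1024 and MaxStreamLength = 1500 (not a multiple of blockSize).
long blockSize = 1024;
long maxStreamLength = 1500;
long end = 1500; // a write ending exactly at MaxStreamLength, which should be legal

long requiredBuffers = (end + blockSize - 1) / blockSize;        // rounds up to 2
bool throws = requiredBuffers * blockSize > maxStreamLength;     // 2048 > 1500 -> true
// The check rejects a write that does not actually exceed MaxStreamLength.
Console.WriteLine(throws); // True
```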

Is the Dispose(false) implementation safe?

Hey implementors, great job on this project!

I had one question while looking at the code: I notice that you are "touching" another managed object (the manager) from the Dispose method on the stream, even when it is invoked from the finalizer. I was under the impression that was verboten, since the manager could have been gc'ed by the time the finalizer runs.

Is that no longer the case, generally?

CLS-compliance

Can RecyclableMemoryStreamManager be made CLS-Compliant?

Guid generation is expensive

Hello

I've recently started using this library to optimise a hot path in order to reduce allocations, but I've found that creating a new RecyclableMemoryStream is quite expensive.

The id field (Guid) on the RecyclableMemoryStream is being initialized on the constructor regardless of whether ETW tracing is enabled or not and generating a new Guid is expensive (interop call + 16 bytes allocated).

I've changed the constructor code to only initialize the field if ETW tracing is enabled.

I've pushed my changes in case you find this solution sensible.

Documentation on library beyond blog post?

The work I am doing for my project deals with decompressing a file (say, 10 MB into a 40 MB stream) and then running those streams through a patch program, which in turn may output a 40 MB stream that is then fed once again into a patch program many times. In this instance it means I am using 5 or 6 40 MB streams within a few seconds.

I've found this library significantly reduces memory usage, but I can't really figure out what the options do. The only documentation is a blog post, and it doesn't really explain what any of the options actually do (I don't deal with a lot of memory-related things). I have also found that the memory allocated for the pools doesn't seem to be returned, or returnable, unless I'm missing something. For example, the app seems to allocate about 600 MB of data (on top of 200 MB idle), but after the task ends the app still sits at 800 MB used. I understand you want to keep these pools around and allocated, but is there a way to get rid of them? I only use them for a certain task, so once that task has finished, keeping them around is not beneficial. But the documentation has nothing that addresses this kind of scenario.

The lack of intellisense makes using this library extremely difficult as I have almost no idea what some of the options do.

Stream could track current buffer / offset to improve performance

Every time a write happens, we convert the stream's position to a block/offset tuple to know where we need to start writing.

For few, long writes, this is fine. For many short writes the overhead could add up.

Instead, we could track the current block and offset index so we already know where we need to start writing.

Unit tests: Replace `[ExpectedException]` with `Assert.Throws`

There are several unit tests in the UnitTests project that check whether exceptions are thrown when they should be. This is done in two different ways:

  • By placing an [ExpectedException(typeof(TException))] custom attribute on the test method, or
  • by wrapping a method call or short code block with Assert.Throws<TException>(() => …);.

Using [ExpectedException] is problematic for at least two reasons:

  • It is too coarse-grained. It doesn't matter which part of a test method throws. However, speaking in terms of Arrange-Act-Assert, only the Act part of a test method should be tested for exceptions.
  • This attribute is no longer supported starting with NUnit version 3, so it would stand in your way if you wanted to migrate to a more recent version of NUnit.

Suggestion: For these two reasons (and, to a minor degree, for consistency's sake) all test methods using [ExpectedException] should be converted to Assert.Throws.

Plan to support more than 2GB?

Hello,

I'm from Dicom Server , and our service is heavily using RecyclableMemoryStream for performance improvement.

One issue we are facing is handling with >2GB file. Our service deals with medical images, and it happens some time the file size could be super big and exceed 2 GB.

I see the 2GB limitation from here , so wondering if there is a plan to support more than 2GB ?

Thanks
Peng Chen

Allow multiple Dispose calls OR expose Disposed state

A MemoryStream can be disposed twice or multiple times, but the RecyclableMemoryStream throws an exception.

Consider this example:

var stream = storage.MemoryStreamManager.GetStream();
blob.DownloadToStream(stream); // may throw a StorageException
using (var reader = new StreamReader(stream, Encoding.UTF8)) {
    // read data
}

Here, the stream should be disposed if DownloadToStream fails. However, when put in a using block, an exception will be raised when the reader is disposed (and configured to close the underlying stream, which is the default).

According to the Dispose docs disposing multiple times should be allowed:

If an object's Dispose method is called more than once, the object must ignore all calls after the first one. The object must not throw an exception if its Dispose method is called multiple times. Instance methods other than Dispose can throw an ObjectDisposedException when resources are already disposed.

Alternatively the disposed field could be exposed with a property to allow for explicit checks:

var stream = storage.MemoryStreamManager.GetStream();
try { ... } finally { if (!stream.Disposed) stream.Dispose(); }

... although supporting multiple Dispose calls would be more convenient and less clunky.

Add ReadFully() method

Can you add a ReadFully(Stream stream) method to RecyclableMemoryStream?

This could save an intermediate buffer allocation and a lot of memory copying.

I took a stab at it with the caveat that the RecyclableMemoryStream can't be using a large buffer. You may want to remove this constraint.

You'll also probably want to have RecyclableMemoryStreamManager.GetStream() return RecyclableMemoryStream so that the method is accessible without a cast.

public void ReadFully(Stream stream)
{
    if (this.largeBuffer != null)
    {
        throw new InvalidOperationException();
    }

    while (true)
    {
        EnsureCapacity(this.length + 1);
        var blockAndOffset = GetBlockAndRelativeOffset(this.position);
        var block = this.blocks[blockAndOffset.Block];

        int count = stream.Read(block, blockAndOffset.Offset, block.Length - blockAndOffset.Offset);
        if (count == 0)
        {
            break;
        }

        long end = (long)this.position + count;
        this.position = (int)end;
        this.length = Math.Max(this.position, this.length);
    }
}

The following piece of code hangs the write call

RecyclableMemoryStreamManager manager = new RecyclableMemoryStreamManager();
manager.GenerateCallStacks = true;
RecyclableMemoryStream stream = new RecyclableMemoryStream(manager, "Tag1");
StreamWriter writer = new StreamWriter(stream);
for (long i = 0; i < 1024 * 1024 * 1024; i++)
{
    writer.Write('c');
}
for (long i = 0; i < 1024 * 1024 * 1024; i++)
{
    writer.Write('c');
}
writer.Flush();

The second loop hangs at i == 1073611776.

Are the event invocations inside RecyclableMemoryStreamManager susceptible to race conditions?

I noticed that RecyclableMemoryStreamManager raises events in the following manner:

if (this.BlockCreated != null)
{
    this.BlockCreated();
}

which appears to be susceptible to a race condition: what if another thread unsubscribes the last handler from the event just before the invocation, but after the null check? Since the buffer manager class is supposed to be thread-safe, this should probably be fixed.

In his book "CLR via C#" (see pp. 264-265 in the 3rd edition), Jeffrey Richter recommends the following pattern for safely raising an event:

EventHandler blockCreated = Interlocked.CompareExchange(ref this.BlockCreated, null, null);
if (blockCreated != null)
{
    blockCreated();
}

Alternatively, if C# 6 syntax can be used in this project:

this.BlockCreated?.Invoke();

IntelliSense Documentation Not Shown

Hi,

When installing RMS from NuGet in VS, IntelliSense doesn't show the comments/documentation. I think this is just a case of enabling the XML documentation file in the project settings.

Cheers,
Indy

Clarification needed

The RecyclableMemoryStreamManager exposes an event called "StreamCreated". Is that triggered when a stream from the pool is used or only when a new stream is actually allocated?

Questions on memory usage and configuration

After spending some time reading the code, it appears default behavior is to pool everything with no upper limit. I wasn't able to find that information in the documentation.

Can you check my understanding?

MaximumFreeLargePoolBytes is never set, so the buffer gets added to the pool on dispose here.

The max potential memory usage of the large pool is

(maximumBufferSize/largeBufferMultiple) * MaximumFreeLargePoolBytes

or, stated another way:

the number of pools * MaximumFreeLargePoolBytes

If I don't call GetBuffer() and I don't call GetStream(asContiguousBuffer=true), I will only ever use small blocks.

So, if you rarely call GetBuffer(), a valid sizing strategy would be to create medium-sized small blocks (say, 1/4 of your expected common stream size), a large MaximumFreeSmallPoolBytes, and a MaximumFreeLargePoolBytes of 1 byte to force unpooled large buffer allocation in the rare case you need it (if it were set to 0, large buffers would be pooled and retained indefinitely).
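If my reading is right, that strategy could be sketched like this (constructor parameter and property names as they appear elsewhere in these issues; the concrete sizes are made-up examples, not recommendations):

```csharp
using Microsoft.IO;

// Hypothetical sizing for an expected common stream size of ~256 KB.
var manager = new RecyclableMemoryStreamManager(
    blockSize: 64 * 1024,                 // 1/4 of the common stream size
    largeBufferMultiple: 1024 * 1024,
    maximumBufferSize: 16 * 1024 * 1024);

// Keep a generous small-block pool around.
manager.MaximumFreeSmallPoolBytes = 128 * 1024 * 1024;

// 1 byte: no returned large buffer ever fits under this limit, so large
// buffers are never retained (0 would mean "unlimited", i.e. retain all).
manager.MaximumFreeLargePoolBytes = 1;
```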

Copying part of a RecyclableMemoryStream to another Stream

Hello,

The WriteTo method allows me to copy an entire RecyclableMemoryStream to another Stream without any buffer allocation. However, if I need to copy only part of a RecyclableMemoryStream to another Stream, I have to use a buffer.

What is your opinion about creating an overload of WriteTo that takes an offset and length?
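Until such an overload exists, here is a workaround sketch that copies a range through a small temporary buffer; `CopyRange` is a hypothetical helper, not part of the library:

```csharp
using System;
using System.IO;

// Copy count bytes starting at offset from source into destination,
// restoring the source position afterwards.
static void CopyRange(Stream source, Stream destination, long offset, long count)
{
    long savedPosition = source.Position;
    source.Position = offset;
    byte[] buffer = new byte[81920]; // same chunk size Stream.CopyTo defaults to
    while (count > 0)
    {
        int read = source.Read(buffer, 0, (int)Math.Min(buffer.Length, count));
        if (read == 0)
        {
            break; // source exhausted early
        }
        destination.Write(buffer, 0, read);
        count -= read;
    }
    source.Position = savedPosition;
}

// Copy bytes [2, 5) of a 10-byte stream.
var src = new MemoryStream(new byte[] { 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 });
var dst = new MemoryStream();
CopyRange(src, dst, 2, 3);
Console.WriteLine(BitConverter.ToString(dst.ToArray())); // 02-03-04
```

A built-in WriteTo(Stream, offset, count) overload could do the same block-by-block, with no temporary buffer at all.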

ETW is not supported on all platforms

For example, Unity on some platforms (e.g. Android) replaces the body of methods marked with EventAttribute with throw new NotSupportedException("linked away"), which results in runtime exceptions.

It would be nice to have automatic detection of ETW support (or, at worst, the ability to disable ETW from application code, e.g. by setting Events.Writer to null).

Typo in NuGet package title

According to the .nuspec file, this project's <title> is Micrisift.IO.RecyclableMemoryStream. This typo won't affect an Install-Package (since only the <id> is relevant for that), but the incorrect title can be seen e.g. on nuget.org:

(screenshot of the nuget.org page showing the misspelled title)

RecyclableMemoryStream exhausts all available RAM if requestedSize > (int.MaxValue - BlockSize)

When a RecyclableMemoryStream is created where requestedSize satisfies the condition:

(numberOfRequiredBlocks * BlockSize) > int.MaxValue
where
numberOfRequiredBlocks = Ceil( requestedSize / blockSize )

then the method RecyclableMemoryStream.EnsureCapacity loops until all available virtual address space is exhausted (running x64 code with much more than 10GB RAM available; thus it is not a problem with regard to available RAM or 32bit virtual address space).

The problem occurs at the execution of the following while loop:

while (this.Capacity < newCapacity)
{
    blocks.Add(this.memoryManager.GetBlock());
}

When given a requestedSize that satisfies the condition shown at the beginning of my post, the Capacity property will eventually overflow, thus not allowing the while-loop to exit. (1)

Proposed fix: Either make the Capacity property a long type (2), or add a sanity check that prevents Capacity from overflowing...


Footnotes:

(1) Strictly speaking, whether the while-loop eventually exits depends -- aside from the virtual memory size -- on the chosen block size and an "appropriate" requestedSize value. With the default block size of 128K, the Capacity property will overflow and eventually reach the value 0 again, effectively turning the while-loop into an infinite loop. With some other 'odd' block size, the Capacity property will still overflow but not necessarily hit precisely 0 again, possibly allowing the while-loop to eventually exit -- but chances are good that all RAM has been exhausted before that happens.

(2) It would be nice if RecyclableMemoryStream supported stream sizes larger than 2GB (i.e. using long instead of int for all concerned method arguments/variables/fields/properties). It is not really a show-stopper though, as one can split large data blobs into multiple MemoryStream objects wrapped in a custom Stream class that presents them as one continuous stream...
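Footnote (1) can be checked without the library at all; a small stand-alone sketch of the unchecked int arithmetic, using the default block size:

```csharp
using System;

// Simulate the capacity counter from the loop above: an int that grows by
// one 128 KB block per iteration and is allowed to overflow unchecked.
const int blockSize = 128 * 1024;          // 2^17, the default block size
int capacity = 0;
long blocks = (1L << 32) / blockSize;      // 32768 blocks == 4 GB of blocks
for (long i = 0; i < blocks; i++)
{
    unchecked { capacity += blockSize; }   // silently wraps past int.MaxValue
}
Console.WriteLine(capacity);               // 0: capacity reads as "empty" again
```

Since 2^32 is an exact multiple of the 128 KB block size, the counter lands exactly on 0, so `Capacity < newCapacity` holds forever, matching the infinite-loop behavior described above.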

Why is there no default static pool manager?

I do not know if this is just a silly question or a feature request. Suppose there are multiple separately maintained assemblies allocating RecyclableMemoryStreams. Should each assembly declare its own static RecyclableMemoryStreamManager? This goes somewhat against the grain of the idea of pooling, although I understand that block size requirements might be different. But still I assume that the defaults may be quite sane for a sizable range of applications.

So I wonder why there is no default, per-AppDomain RecyclableMemoryStreamManager. It would be accessed through a static property (RecyclableMemoryStreamManager.Default), and RecyclableMemoryStream could then have a parameterless constructor (and other managerless constructors) that uses the default manager. As with many other things in the BCL, the default pool could be made configurable through app.config as well.

Possible memory leak

I am using RecyclableMemoryStream inside a using block.
When I profile a long-running process, I see ~100K instances of RecyclableMemoryStream and ~200K instances of byte[].
Is it possible this is due to a memory leak? I am not using more than a few instances at a time.

NetStandard Support?

It would be helpful if this library were re-targeted to build for .NET Standard, so we could use it in .NET Core projects and other places.

Thanks!

Use in HttpClient

Is there any way of making HttpClient use this?

I'm processing files from blob storage of up to 100 MB in a WebJob, and I'm trying to minimise the amount of disk I/O and also memory churn.

Any other suggestions gratefully received.
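HttpClient itself won't allocate its internal buffers from the pool, but you can avoid HttpContent buffering the whole body and stream it straight into a pooled stream instead. A sketch (the URL and the "download" tag are placeholders):

```csharp
using System.IO;
using System.Net.Http;
using System.Threading.Tasks;
using Microsoft.IO;

static class BlobDownloader
{
    public static async Task<MemoryStream> DownloadAsync(
        HttpClient client, RecyclableMemoryStreamManager manager, string url)
    {
        // ResponseHeadersRead stops HttpClient from pre-buffering the body.
        using (HttpResponseMessage response = await client.GetAsync(
            url, HttpCompletionOption.ResponseHeadersRead))
        {
            response.EnsureSuccessStatusCode();
            MemoryStream stream = manager.GetStream("download");
            await response.Content.CopyToAsync(stream);
            stream.Position = 0;
            return stream; // caller disposes to return blocks to the pool
        }
    }
}
```

For a 100 MB body this keeps the large allocations inside the pooled blocks rather than in HttpContent's own growing buffer.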

Add a setting to disallow ToArray

Calling ToArray on a RecyclableMemoryStream should be considered a bug because it wipes out all the benefits of using the library. While the method does work as intended, we could add a setting on RecyclableMemoryStreamManager to cause an exception to be thrown when called.

GetBuffer() throws System.UnauthorizedAccessException on .NET Core

When calling GetBuffer() in a .Net Core app, an UnauthorizedAccessException ("MemoryStream's internal buffer cannot be accessed.") is thrown.

The reason is that GetBuffer() is not marked as override when compiled for netstandard. Therefore, the original implementation of MemoryStream.GetBuffer() will be called, unless the call is explicitly made on a variable of type RecyclableMemoryStream.

#if NETSTANDARD1_4
   public byte[] GetBuffer()
#else
   public override byte[] GetBuffer()
#endif

I don't know what the motivation for this conditional compilation was, since MemoryStream.GetBuffer() has always been virtual.

Make the library portable

Unless there is a strong dependency on the desktop CLR, can this library be made portable? In particular, could a CoreCLR target be added?

Make ToArray Obsolete

As the ToArray method is not recommended, because it defeats the purpose of using this library, it should be marked with the Obsolete attribute.

Buffer size absurdly large

I was benchmarking this library when I noticed that the buffer length from GetBuffer was very large. E.g., a 2056-character string resulted in a buffer with a length of 131072. Moreover, my serialization of several floats (length 146) also resulted in a buffer with a length of 131072.

I really would not like memory chunks of 128 KB. That scares me.

I feel like I am missing a very basic implementation detail. Am I possibly doing it wrong or is this a bug ?

Why does Close call Dispose?

A closed MemoryStream still has uses such as GetBuffer which are now throwing NullReferenceException when migrating to RecyclableMemoryStream. Is it possible not to dispose on close?

Performance opportunities

I have just played with a profiler and massively increased the performance of my RMS fork for my primary use case of writing/reading small buffers of several KB: Spreads/Spreads@cbeac8f

  • Aggressively inlining all methods that are used internally. E.g. Capacity.
  • Making all methods related to event tracing conditional to a compiler symbol to avoid needless method calls.
  • Using System.Buffers array pool (modified to optionally return exact buffer sizes when requested) instead of ConcurrentBag.
  • Using vectorized memory copy instead of Buffer.BlockCopy.
  • Pooling of RMS instances using Roslyn's ObjectPool implementation.
  • Manual implementation of WriteByte instead of redirecting to Write via a temporary byte[1]. I also added non-virtual SafeWriteByte.
  • Using ThrowHelper for better inlining and less code size.

Some of this is directly applicable to the original implementation. My fork is now incompatible with upstream so I cannot create a PR, just put it there FYI & discussion.

I have had the idea of integrating RMS with the System.Buffers shared pool for a long time - not only is this faster, but it also reduces memory by avoiding a separate pool. But the default shared pool implementation could return a buffer larger than the requested size, so I created a custom implementation with an exactSize parameter. Without shared pool modifications, RMS could work with ArraySegments internally - that shouldn't be slower. And given that the shared pool returns a buffer that could be larger only by a power of two, that larger buffer could be split into several blocks and RMS would just increase capacity by more than one block. In Dispose() we just need to check whether the currently returned ArraySegment has the same buffer as the previous one, and not return it twice.

Additional Memory<byte> support

#68 added some support for RecyclableMemoryStream to read and write Span<byte>/Memory<byte>. It would be good to also add support for creating a RecyclableMemoryStream from an existing Memory<byte>.

RecyclableMemoryStreamManager.GetStream already has an overload that takes an existing byte[] buffer:

MemoryStream GetStream(string tag, byte[] buffer, int offset, int count)

So the suggestion here is to add a new overload that takes an existing Memory<byte> buffer:

MemoryStream GetStream(string tag, Memory<byte> buffer)

Trying to read a huge file

Hi all,

I am trying to read a huge (2.5 GB) file, but it always ends with "Unhandled Exception: System.IO.IOException: Maximum capacity exceeded", and RAM usage is always around 100%. What am I doing wrong?

int blockSize = 10;
int largeBufferMultiple = 1024 * 1024;
int maxBufferSize = 16 * largeBufferMultiple;

var manager = new RecyclableMemoryStreamManager(blockSize,
                                                largeBufferMultiple,
                                                maxBufferSize);

manager.GenerateCallStacks = true;
manager.AggressiveBufferReturn = true;
manager.MaximumFreeLargePoolBytes = maxBufferSize * 4;
manager.MaximumFreeSmallPoolBytes = 100 * blockSize;
RecyclableMemoryStream memoryStream = new RecyclableMemoryStream(manager);
using (FileStream fileStream = File.OpenRead(@"C:\Temp\test.bin"))
{
    // MemoryStream memoryStream = new MemoryStream();
    fileStream.CopyTo(memoryStream);
}

Stream disposed too soon

using (var ms = memoryStreamManager.GetStream(nameof(ZipOutputStreamHelper)))
{
    using (StreamWriter writer = new StreamWriter(ms))
    using (JsonTextWriter jsonWriter = new JsonTextWriter(writer))
    {
        var ser = new JsonSerializer();
        ser.Serialize(jsonWriter, sequence);
        jsonWriter.Flush();
    }
    return ms.ToArray();
}

We use a pattern like this in our code. Basically, the implementation of JsonTextWriter is such that it doesn't finish writing to the stream until it is disposed; calling Flush doesn't seem to finalise it. This worked fine with a MemoryStream, but not with a RecyclableMemoryStream. Because stream writers call Dispose on their underlying streams, we end up disposing the RecyclableMemoryStream before we access the array.

Is there a way to change this at all? Or is it going to be a case of finding a way to correct the behaviour of the JsonTextWriter so that we can finalise the stream before calling ToArray?
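One possible fix, assuming you can change how the writers are constructed: StreamWriter has a leaveOpen constructor parameter (and JsonTextWriter has a CloseOutput property) so that disposing the writers flushes and finalizes the output without disposing the underlying stream. A minimal sketch, using a plain MemoryStream as a stand-in for the pooled stream:

```csharp
using System.IO;
using System.Text;

// A plain MemoryStream stands in for memoryStreamManager.GetStream(...).
var ms = new MemoryStream();

// leaveOpen: true -> disposing the writer flushes, but ms stays usable.
using (var writer = new StreamWriter(ms, new UTF8Encoding(false), 1024, leaveOpen: true))
{
    writer.Write("{\"ok\":true}");
}

byte[] payload = ms.ToArray(); // safe: ms has not been disposed
```

With a RecyclableMemoryStream you would additionally want to avoid ToArray itself (see the "Add a setting to disallow ToArray" issue above) and use WriteTo or GetBuffer instead.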
