
Comments (18)

malcolmsparks commented on September 24, 2024

Thanks Mike. I have seen the java.lang.OutOfMemoryError: Direct buffer memory on Linux too recently. I did some investigation by printing out the refCnt on each buffer and it seemed a little strange: buffers had high refCnt values (e.g. 5 or 6) which would often descend monotonically. I can replicate the bug in much the same way, by streaming a large local file.

My current approach to this is to consider the bug is either in yada (bY), manifold (bM), aleph (bA) or netty (bN).

It could be in bY, but the issue also manifests itself outside of multipart (iirc), since I've seen the same thing with large files streaming directly to a temp dir, with no multipart involved. The only code in yada involved here is minimal. See https://github.com/juxt/yada/blob/master/dev/src/yada/dev/upload.clj which adds a :consumer that yada calls directly with the stream of Netty ByteBuf instances it gets from Aleph. So I'm not sure how the bug could be in yada.
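
For context, a minimal sketch of the shape such a consumer might take, assuming it receives a manifold stream of ByteBuf instances and must release each buffer after writing it; the function name and calling convention here are illustrative rather than yada's actual API (see upload.clj above for the real code):

(require '[manifold.stream :as s]
         '[manifold.deferred :as d]
         '[clojure.java.io :as io])
(import '(io.netty.buffer ByteBuf)
        '(io.netty.util ReferenceCountUtil)
        '(java.io FileOutputStream))

;; Hypothetical consumer: write every ByteBuf to a file, logging refCnt as
;; described above, and release each buffer once it has been written.
(defn bytebuf-file-consumer [dest-file]
  (fn [body-stream]
    (let [out (FileOutputStream. (io/file dest-file))
          ch  (.getChannel out)]
      (d/chain
        (s/consume
          (fn [^ByteBuf buf]
            (try
              (println "refCnt on arrival:" (.refCnt buf))
              (.write ch (.nioBuffer buf))
              (finally
                ;; without this release, pooled direct memory is never returned
                (ReferenceCountUtil/release buf))))
          body-stream)
        (fn [_] (.close out) dest-file)))))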

At this stage I'd like to reach out to @ztellman for any wisdom he can impart but since this is OSS I can't exactly escalate this through management ;)

So let's assume it's a bug in manifold, aleph or netty. In order to discount manifold I'm working on a replacement for aleph and netty (undertow and xnio) on the undertow branch. This is nearly done and should point to whether it's bM or (bA or bN). In any case, it should be relatively straightforward to create a failing test for manifold+aleph to present to Zach. Or he might indicate that yada is not calling manifold correctly.
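
For reference, a minimal sketch of what an aleph+manifold-only reproduction could look like, using aleph's :raw-stream? option so the request body arrives as a manifold stream of ByteBufs with no yada involved; the port and handler shape are illustrative:

(require '[aleph.http :as http]
         '[manifold.stream :as s]
         '[manifold.deferred :as d])
(import '(io.netty.buffer ByteBuf)
        '(io.netty.util ReferenceCountUtil))

;; Drain the raw ByteBuf stream of a large upload, releasing each buffer,
;; and reply with the total byte count; direct memory should stay flat.
(defn count-bytes-handler [req]
  (d/chain
    (s/reduce
      (fn [total ^ByteBuf buf]
        (let [n (.readableBytes buf)]
          (ReferenceCountUtil/release buf)
          (+ total n)))
      0
      (:body req))
    (fn [total]
      {:status 200
       :headers {"content-type" "text/plain"}
       :body (str total)})))

(def server
  (http/start-server count-bytes-handler {:port 10000 :raw-stream? true}))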

I will prioritise this issue for work over the Easter break.

malcolmsparks commented on September 24, 2024

Also see clj-commons/aleph#214

malcolmsparks commented on September 24, 2024

I have raised clj-commons/aleph#224 and will follow up in due course with a smaller failing test that doesn't involve yada. @mfikes thanks for this and sorry I haven't got a better answer for you yet.

mfikes commented on September 24, 2024

Thanks @malcolmsparks. I appreciate your help!

ztellman commented on September 24, 2024

Can you try -Dio.netty.allocator.numDirectArenas=0 as a JVM flag, and see if the issue disappears?
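
For anyone following along, a sketch of where that flag can go in a Leiningen project (project name illustrative); it can equally be passed straight to java on the command line:

;; project.clj sketch: pass the flag to the JVM via Leiningen's :jvm-opts
(defproject upload-test "0.1.0-SNAPSHOT"
  :jvm-opts ["-Dio.netty.allocator.numDirectArenas=0"])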

mfikes commented on September 24, 2024

@ztellman Yes, I can confirm that that JVM flag causes the issue to disappear. (I tried it on OS X.)

ztellman commented on September 24, 2024

Interestingly, I've seen that exact error on OS X, but never on Linux. I've used that flag to work around it on my dev box. Honestly, I never dug into it too much, because it didn't affect production. You might try -XX:MaxDirectMemorySize=1g (or whatever is appropriate for your machine) and see what happens. If that doesn't help, it might be worth talking to the Netty folks to see if there's an official workaround.

malcolmsparks commented on September 24, 2024

I have some failing tests that I'll try over the weekend which stress the stack quite hard. I'll report on Sunday.

malcolmsparks commented on September 24, 2024

I've tried MaxDirectMemorySize=1g before; it doesn't help (or at least, it only delays the problem).

ztellman commented on September 24, 2024

And just to be clear, I've seen this issue with a vanilla Java Netty application, so I'm fairly sure the issue is not with the Clojure elements of the stack.

EDIT: Though it is possible that the Clojure stuff is exacerbating the problem, somehow.

malcolmsparks commented on September 24, 2024

That's good to know. Zach, thanks for your help here - netty still feels a bit like the 'dark arts' to me.

ztellman commented on September 24, 2024

When you do your tests, you should add (aleph.netty/leak-detector-level! :paranoid) and see what sort of "I got GCed before my reference count went to zero" warnings you get.
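
For example, a sketch of where that call might go, before any server or channel is created; the handler and port here are illustrative:

(require '[aleph.netty :as netty]
         '[aleph.http :as http])

;; Turn on Netty's PARANOID leak detection before any channels exist, then
;; exercise the upload path and watch the logs for leak warnings.
(netty/leak-detector-level! :paranoid)

(defn ok-handler [req]
  {:status 200 :headers {"content-type" "text/plain"} :body "ok"})

(def server (http/start-server ok-handler {:port 8080}))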

d-t-w commented on September 24, 2024

I found this issue while working through my own unrelated problem with netty direct buffer pooling.

I don't use yada, and have only a passing familiarity with aleph, but I have been using clojure and netty for years, so I thought I might chip in some info.

My first observation is that the issue is intermittent. Sometimes memory usage would blow out almost immediately, sometimes after a minute or two of execution, and sometimes the message would be received fully with zero issue.

My second observation is that message consumption is very slow. Consuming a 1GB file fully takes about 10 minutes. In comparison, a simple Netty server with only the HTTP codec that drops all chunks on the floor will receive that same 1GB file in about 3 seconds (a sketch of that baseline follows below).

Thirdly, I tried briefly reproducing this error with the latest versions of bidi/yada/aleph and all three trial runs completed without issue. Maybe I was lucky, or maybe this error has been scrubbed out somewhere recently.
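
For reference on the bare-Netty baseline mentioned in the second observation above, the essential part is just an inbound handler, sitting behind an HttpServerCodec, that releases every chunk as soon as it is read; a minimal interop sketch (bootstrap and pipeline setup omitted, names illustrative):

(import '(io.netty.channel ChannelInboundHandlerAdapter)
        '(io.netty.util ReferenceCountUtil))

;; Drops every inbound message on the floor: each HttpContent is released
;; immediately, so its pooled buffer goes straight back to the arena.
;; A real server would also write a response on LastHttpContent.
(defn drop-chunks-handler []
  (proxy [ChannelInboundHandlerAdapter] []
    (channelRead [ctx msg]
      (ReferenceCountUtil/release msg))))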

Diagnosing:

There is some background info on Netty buffer pooling in my link above.

If we start a new REPL we get the default max direct memory of 2GB, which results in 16 PoolArenas, each with zero PoolChunks to begin with.

(.directArenas (PooledByteBufAllocator/DEFAULT))
=>
[#object[io.netty.buffer.PoolArena$DirectArena
         0x157c4c39
         "Chunk(s) at 0~25%:
          none
          Chunk(s) at 0~50%:
          none
          Chunk(s) at 25~75%:
          none
          Chunk(s) at 50~100%:
          none
          Chunk(s) at 75~100%:
          none
          Chunk(s) at 100%:
          none
          tiny subpages:
          small subpages:
          "]
 #object[io.netty.buffer.PoolArena$DirectArena
         0x45cbb12c
         "Chunk(s) at 0~25%:
          none
          Chunk(s) at 0~50%:
          none
          Chunk(s) at 25~75%:
          none
          Chunk(s) at 50~100%:
          none
          Chunk(s) at 75~100%:
          none
          Chunk(s) at 100%:
          none
          tiny subpages:
          small subpages:
          "]
...
...

At this point it's also a good idea to fire up jvisualvm with the Buffer Pools plugin.

We expect netty to allocate all the memory required for a single message to a single arena, since there's an event-loop thread -> pool arena ThreadLocal cache, so one event-loop thread always uses the same arena.

We also expect that we will likely only allocate a single 16MB (default) PoolChunk to that arena, since Netty caches buffers in the same ThreadLocal cache, and we intend to receive an http-chunk, drop it, release it, receive another http-chunk, and so on. We should only really use a single ByteBuffer (and this is the behaviour we see with a bare Netty server with HttpServerCodec dropping all http-chunks received for a single full http request).
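
The figures above (16 arenas, 16MB chunks) come straight from the allocator defaults, which can be checked from the REPL; a quick sketch using static accessors from PooledByteBufAllocator's public API in Netty 4.x:

(import 'io.netty.buffer.PooledByteBufAllocator)

(PooledByteBufAllocator/defaultNumDirectArena) ;; number of direct arenas (16 here)
(PooledByteBufAllocator/defaultPageSize)       ;; 8192 bytes per page
(PooledByteBufAllocator/defaultMaxOrder)       ;; 11, so chunk size = 8192 << 11 = 16MB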

If I then start a 1GB upload I see my direct memory usage (via visualvm) jump to 16MB as the PoolChunk is allocated, and for quite some time (several minutes generally) this remains the case. At any point we can check the state of the PoolArenas, and we see a single arena with a single PoolChunk, with a small amount of memory allocated (basically enough for one 64k buffer and a couple of smaller ones).

(.directArenas (PooledByteBufAllocator/DEFAULT))
=>
[#object[io.netty.buffer.PoolArena$DirectArena
         0x31bd8884
         "Chunk(s) at 0~25%:
          Chunk(46d7fd67: 1%, 114688/16777216)
          Chunk(s) at 0~50%:
          none
          Chunk(s) at 25~75%:
          none
          Chunk(s) at 50~100%:
          none
          Chunk(s) at 75~100%:
          none
          Chunk(s) at 100%:
          none
          tiny subpages:
          16: (2049: 1/32, offset: 8192, length: 8192, elemSize: 256)
          small subpages:
          1: (2051: 1/8, offset: 24576, length: 8192, elemSize: 1024)
          "]

Now, at some point the issue occurs, and all of the remaining http-chunks are read into memory within a few seconds, with none of the allocated buffers being released.

I've no idea what causes this, but effectively it looks as if each remaining http-chunk in the request is read by netty and allocated to a bytebuffer, that buffer is not released, the next http-chunk is read and allocated, and so on. Remember netty is capable of performing that read for the entire 1GB in about 3 seconds. This allocation without release is the large vertical spike in the image below.

[image: visualvm direct buffer memory graph showing the allocation spike]

This can also be seen by inspecting the PoolArenas directly; we see a slew of allocated PoolChunks:

(.directArenas (PooledByteBufAllocator/DEFAULT))
=>
[#object[io.netty.buffer.PoolArena$DirectArena
         0x19bda70e
         "Chunk(s) at 0~25%:
          Chunk(81c0942: 5%, 688128/16777216)
          Chunk(s) at 0~50%:
          none
          Chunk(s) at 25~75%:
          none
          Chunk(s) at 50~100%:
          Chunk(21913aed: 54%, 9027584/16777216)
          Chunk(44bc617a: 99%, 16760832/16777216)
          Chunk(s) at 75~100%:
          Chunk(3e5b6a9d: 99%, 16760832/16777216)
          Chunk(s) at 100%:
          Chunk(56e5320d: 100%, 16777216/16777216)
          Chunk(7e8b2c6e: 100%, 16777216/16777216)
          Chunk(599b2574: 100%, 16777216/16777216)
          Chunk(3cf1b358: 100%, 16777216/16777216)
          Chunk(43fb8b91: 100%, 16777216/16777216)
          Chunk(643427dc: 100%, 16777216/16777216)
          Chunk(64840eab: 100%, 16777216/16777216)
          Chunk(55de0a10: 100%, 16777216/16777216)
          Chunk(4a66b439: 100%, 16777216/16777216)
          Chunk(f29606e: 100%, 16777216/16777216)
          Chunk(4b1cd676: 100%, 16777216/16777216)
          Chunk(6eb7db8f: 100%, 16777216/16777216)
          Chunk(118f6d1c: 100%, 16777216/16777216)
          Chunk(55614553: 100%, 16777216/16777216)
          Chunk(786386ef: 100%, 16777216/16777216)
          Chunk(28d851cf: 100%, 16777216/16777216)
          Chunk(42d53e45: 100%, 16777216/16777216)
          Chunk(1dc1b091: 100%, 16777216/16777216)
          Chunk(79710938: 100%, 16777216/16777216)
          Chunk(13cd79f0: 100%, 16777216/16777216)
...
...
...

At this point, if you have sent a file larger than the max direct memory size, you will see an OOM.
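
One way to catch the moment this happens without a profiler is to poll the same arena dump as above from a background thread while the upload is in flight; a rough sketch (interval and iteration count arbitrary):

(import 'io.netty.buffer.PooledByteBufAllocator)

;; Print each direct arena's chunk summary every 5 seconds; the sudden jump
;; from one partly-used chunk to a pile of 100% chunks marks the spike above.
(future
  (dotimes [_ 120]
    (doseq [arena (.directArenas (PooledByteBufAllocator/DEFAULT))]
      (println (str arena)))
    (Thread/sleep 5000)))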

In this test case I have sent a 1GB file (which somehow corresponds to almost 2GB of memory usage; I'm not entirely sure how that's doubled up - is @ztellman's byte-streams smart enough to convert a direct ByteBuf to a direct ByteBuffer?).
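
A rough way to answer that question from the REPL might be something like the following, assuming the ByteBuf conversions that aleph registers with byte-streams are loaded (I believe they live in aleph.netty, but that's from memory):

(require '[byte-streams :as bs]
         '[aleph.netty]) ;; loads aleph's ByteBuf conversions, if memory serves
(import '(io.netty.buffer Unpooled)
        '(java.nio ByteBuffer))

;; If the converted ByteBuffer is direct, byte-streams is not copying the
;; direct ByteBuf onto the heap as part of the conversion.
(let [buf (Unpooled/directBuffer 16)]
  (.writeBytes buf (byte-array 16))
  (.isDirect ^ByteBuffer (bs/convert buf ByteBuffer)))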

Anyway, since I have already processed some small amount of the message (pre-spike), I can fit the remaining amount in memory. You'll see from the image above that the yada listener continues to consume these allocated bytebuffers, unaware that the chunk-by-chunk processing has been overwhelmed and the entire http request has been read into memory. It completes successfully within about ten minutes.

Hope some of this has been of use. As I said, I'm not sure if this issue persists, since I couldn't reproduce it with the latest versions; if it does, then you have some more info to go forward with.

malcolmsparks commented on September 24, 2024

@d-t-w, many thanks for this very thorough write-up. Your description of the problem matches what I was seeing when testing with large files, and I failed in my attempt to get to the cause of the issue. Since then some dependencies have changed, but I still have the original file-upload code and some large files that I can test with, to see if the issue remains.

d-t-w commented on September 24, 2024

No problem @malcolmsparks. If you find the issue remains and you want help isolating it to netty or beyond, just ping me - I'm happy to look at it further; I was just passing by yesterday.

malcolmsparks commented on September 24, 2024

I've done some testing with some large 4GB files today.

With the current yada master 3963d26, which uses aleph 0.1.4, I am not seeing the issue any more, either with or without -Dio.netty.allocator.numDirectArenas=0.

[image: heap/memory usage graph during the 4GB upload test]

I'm only seeing about 10MB/s throughput, however, which may be because I'm writing the file to disk. I've seen yada stream an upload at about 1GB/s before, so I need to keep investigating. Still, at least it's reliably working now, even if performance could be improved.

malcolmsparks commented on September 24, 2024

Ah, I discovered I was still printing to the console on every buffer. Now that I've removed that, I'm up to about 80MB/s. I'm streaming a 4GB file, without chunked transfer encoding, and writing it to an NVMe SSD. I can stream the whole file this way in 52 seconds. For comparison, a cp manages the same thing in 7 seconds on my system, so yada seems pretty fast now.

malcolmsparks commented on September 24, 2024

Closing as per above comments.
