
Comments (11)

zvi-code commented on July 21, 2024

> Am I wrong in assuming we get more fragmentation when I/O threads are used?

I don't think this is true in most cases. By default we don't use per-CPU or per-thread arenas; jemalloc does create multiple arenas by default (depending on the number of CPUs), but usually all allocations come from arena 0. Allocations do go through the thread cache, which amortizes access to the arena and reduces contention between threads by allocating batches of entries. But if the system is not already fragmented, allocations that are close in time, even from different threads, will likely be served from nearby slabs/extents/mmaps. The thread cache also avoids going down to the arena at all for short-lived alloc/free pairs.

In any case, since the number of threads is not huge, I expect them to have very little effect on fragmentation either way. As @madolson mentioned, the customer workload is a bigger factor in fragmentation (though not the only one).
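If someone wants to measure how much the arena layout actually matters, jemalloc's behavior can be controlled at startup through the `MALLOC_CONF` environment variable (`narenas` and `tcache` are standard jemalloc options; the `valkey-server` invocation is just an illustrative example):

```shell
# Force a single arena (keeping the per-thread cache) to see whether
# multi-arena placement contributes to fragmentation at all.
MALLOC_CONF="narenas:1,tcache:true" ./valkey-server valkey.conf

# For comparison: one arena per CPU, which spreads allocations out more.
MALLOC_CONF="narenas:$(nproc)" ./valkey-server valkey.conf
```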

from valkey.

zuiderkwast commented on July 21, 2024

Regarding defrag and the need for it, I think we can try to think of ways to avoid the fragmentation in the first place. Few other projects seem to need defrag.

Am I wrong in assuming we get more fragmentation when I/O threads are used?

A request is parsed into string objects which are later stored in the database as keys and values. When this is done in different threads, they get allocated in different arenas. Then they are all stored in the database. Maybe we can avoid it if we let the main thread allocate the keys and values that will be stored. Then, all the allocations done by I/O threads would be short-lived and not cause fragmentation.

An idea I saw somewhere is that the request parser could return pointers into the query buffer, rather than duplicating the data as sds strings. A RESP request is always an array of strings (AKA multibulk) so we'd just need to store an offset and a length for each argument temporarily until the command returns.


madolson commented on July 21, 2024

At AWS, we very rarely see places where defragmentation is actually useful. Most of the time, new data is getting added and removed at about the same rate and it naturally defragments itself. The only time it's really useful is when there is a fundamental shift in the workload and the data becomes permanently fragmented, and a one-time defrag will help free up a bunch of memory.

> A request is parsed into string objects which are later stored in the database as keys and values. When this is done in different threads, they get allocated in different arenas. Then they are all stored in the database. Maybe we can avoid it if we let the main thread allocate the keys and values that will be stored. Then, all the allocations done by I/O threads would be short-lived and not cause fragmentation.

Reminds me a bit of the segcache architecture, which allocates in large segments with a lot of key/value pairs that are all freed as a bunch.

> An idea I saw somewhere is that the request parser could return pointers into the query buffer, rather than duplicating the data as sds strings. A RESP request is always an array of strings (AKA multibulk) so we'd just need to store an offset and a length for each argument temporarily until the command returns.

@hpatro ^


PingXie commented on July 21, 2024

> The only time it's really useful is when there is a fundamental shift in the workload and the data becomes permanently fragmented, and a one-time defrag will help free up a bunch of memory.

This ^

This is why I don't think active defrag justifies the overhead/complexity of vendoring jemalloc. Linking #15 (comment)


zuiderkwast commented on July 21, 2024

> The only time it's really useful is when there is a fundamental shift in the workload and the data becomes permanently fragmented, and a one-time defrag will help free up a bunch of memory.

Then I think we should gradually get rid of the get_defrag_hint() patch.

The first thing we can do is make it optional, i.e. add a dumb variant of defrag: if we're running unpatched jemalloc, we can just realloc everything.
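The "dumb" variant could look roughly like this sketch (not Valkey's actual defrag code, which is driven by je_get_defrag_hint()). Note that plain realloc() with an unchanged size typically returns the same pointer, so the block has to be moved explicitly:

```c
/* Sketch of defrag without allocator hints: unconditionally move an
 * allocation to a fresh block so the allocator can place the copy in a
 * denser region of the heap. */
#include <stdlib.h>
#include <string.h>

/* Move `old` (of `size` bytes) into a new allocation and free the old one.
 * Returns the new pointer, or the old one if allocation fails. */
void *defrag_move(void *old, size_t size) {
    void *fresh = malloc(size);
    if (!fresh) return old;        /* out of memory: keep the old block */
    memcpy(fresh, old, size);
    free(old);
    return fresh;
}
```

With unpatched jemalloc there is no way to know whether a given move helps, so this does strictly more copying than hint-based defrag; the trade-off is that it works with any allocator.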

> Most of the time, new data is getting added and removed at about the same rate and it naturally defragments itself.

So to avoid excessive defrag work, we can measure fragmentation but wait an hour or so before we start the defrag to see if it resolves itself?

The third thing would be to try to avoid creating fragmentation in the first place (if my speculation about I/O threads is right).

If these work well in practice (live, not just in a release candidate), then in another release we can make it permanent and unvendor jemalloc. How about this plan?
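The "wait an hour to see if it resolves itself" policy above can be sketched as a simple hysteresis check (the threshold and grace period are illustrative, not Valkey's actual values):

```c
/* Sketch: only start defrag if the fragmentation ratio has stayed above a
 * threshold for a full grace period, since churn often resolves it. */
#include <stdbool.h>
#include <time.h>

#define FRAG_THRESHOLD 1.5
#define GRACE_PERIOD_SECS (60 * 60)   /* "wait an hour or so" */

static time_t frag_since = 0;         /* when high fragmentation was first seen */

bool should_start_defrag(double frag_ratio, time_t now) {
    if (frag_ratio < FRAG_THRESHOLD) {
        frag_since = 0;               /* resolved itself; reset the timer */
        return false;
    }
    if (frag_since == 0) frag_since = now;
    return now - frag_since >= GRACE_PERIOD_SECS;
}
```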


hpatro commented on July 21, 2024

> An idea I saw somewhere is that the request parser could return pointers into the query buffer, rather than duplicating the data as sds strings. A RESP request is always an array of strings (AKA multibulk) so we'd just need to store an offset and a length for each argument temporarily until the command returns.

I came across this idea in a Rust rewrite of Redis: https://github.com/seppo0010/rsedis/blob/master/parser/src/lib.rs#L13-L28
I did try a dirty hack around this, and the problem I observed was that we would still need to create those objects further down the layers to store the data. So object creation is inevitable, and with the workload Madelyn mentioned above, fragmentation is bound to happen.
The other issue is that we would need to significantly change our db layer and plenty of other APIs that require the keys/values as robj. So we would need to touch most of our code flows dealing with robj.
Overall, I didn't see much performance gain in SET/GET operations either.


zuiderkwast commented on July 21, 2024

@hpatro Yes, we still need to allocate those objects, but my point is that if it's done in the main thread (rather than in the I/O threads), they will be in the same jemalloc arena as the other allocations in the main thread. Jemalloc allocates from one arena per thread, and if we have objects in lots of different arenas, that could cause more fragmentation. Does that make sense?

We can still get fragmentation though, just a bit less, if this is true.


hpatro commented on July 21, 2024

> @hpatro Yes, we still need to allocate those objects, but my point is that if it's done in the main thread (rather than in the I/O threads), they will be in the same jemalloc arena as the other allocations in the main thread. Jemalloc allocates from one arena per thread, and if we have objects in lots of different arenas, that could cause more fragmentation. Does that make sense?
>
> We can still get fragmentation though, just a bit less, if this is true.

That makes sense. I was trying to highlight that delayed robj creation for the storage layer can get really messy given the overall code structure in place.


zuiderkwast commented on July 21, 2024

> I was trying to highlight that delayed robj creation for the storage layer can get really messy given the overall code structure in place.

I see. The easiest solution is probably that the main thread allocates robj for all arguments immediately when it starts executing the command. It's not optimal but maybe it fixes the fragmentation issue.
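A minimal sketch of that idea (the helper name is hypothetical, not Valkey's API): when command execution starts, the main thread copies each argument into a fresh allocation, so the long-lived memory that ends up in the database all comes from the main thread's arena/tcache, while the I/O-thread buffers stay short-lived:

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate an argument buffer on the calling (main)
 * thread so the copy that gets stored in the database is allocated from
 * the main thread's arena rather than an I/O thread's. */
char *adopt_on_main_thread(const char *iothread_buf, size_t len) {
    char *copy = malloc(len + 1);   /* served by this thread's cache/arena */
    if (!copy) return NULL;
    memcpy(copy, iothread_buf, len);
    copy[len] = '\0';
    return copy;
}
```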


zvi-code commented on July 21, 2024

> At AWS, we very rarely see places where defragmentation is actually useful. Most of the time, new data is getting added and removed at about the same rate and it naturally defragments itself. The only time it's really useful is when there is a fundamental shift in the workload and the data becomes permanently fragmented, and a one-time defrag will help free up a bunch of memory.

@madolson I agree the current defrag process is not so useful (I'm talking purely about the logic in jemalloc; it was improved in jemalloc 5.3, but it's still not guaranteed to even converge), but I disagree with the statement that it "naturally defragments itself". I think the current (pre-serverless) AWS experience hides the issue of memory being "locked" and never freed back to the OS. Another factor when evaluating defrag issues is utilization: thanks to jemalloc's bucket (size-class) allocator, if you are at low utilization to begin with, fragmented memory will not cause major customer pain (besides visibility). Looking forward, when considering things like multi-tenancy, I believe fragmented memory will become a much bigger issue.
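For the "memory not freed back to the OS" part specifically, jemalloc's page-return behavior is tunable independently of defrag (`dirty_decay_ms`, `muzzy_decay_ms`, and `background_thread` are standard jemalloc options; the `valkey-server` invocation is just an illustrative example):

```shell
# Return dirty pages to the OS more aggressively instead of letting them sit;
# background_thread purges pages without blocking application threads.
MALLOC_CONF="dirty_decay_ms:1000,muzzy_decay_ms:0,background_thread:true" \
    ./valkey-server valkey.conf
```

This doesn't reduce fragmentation inside partially used slabs, but it does shrink RSS when whole pages become free.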


PingXie commented on July 21, 2024

Thanks for the insight, @zvi-code! Appreciate it.

> Looking forward, when considering things like multi-tenancy, I believe fragmented memory will become a much bigger issue.

I understand that you are providing a counterargument to the statement that it "naturally defragments itself", which makes sense to me. I just wanted to clarify that my main rationale for de-vendoring jemalloc is that the overhead of vendoring any allocator for the sake of active_defrag outweighs the perceived benefit by a large margin, given that the current use cases are exclusively single-tenant.

