Comments (7)
I think we are recommending just option 1
from valkey.
@PingXie Yes, I looked at the PR, but it isn't currently implemented as in the #308 (comment) suggestion, right? I think that approach looks good, and it is simpler than the current array of per-thread counters.
@lipzhu Do I understand correctly? Is the suggestion to accumulate the diff locally and report back to the main atomic `used_memory` only when it has changed by more than 100KB or so, like the code below? If yes, then I think there's no point in using jemalloc's stats.
```c
static _Atomic size_t used_memory = 0;
static _Thread_local int64_t thread_used_memory_delta = 0;
#define THREAD_MEM_MAX_DELTA (100 * 1024)

static inline void update_zmalloc_stat_alloc(size_t size) {
    thread_used_memory_delta += size;
    if (thread_used_memory_delta >= THREAD_MEM_MAX_DELTA) {
        atomic_fetch_add_explicit(&used_memory, thread_used_memory_delta, memory_order_relaxed);
        thread_used_memory_delta = 0;
    }
}

static inline void update_zmalloc_stat_free(size_t size) {
    thread_used_memory_delta -= size;
    if (thread_used_memory_delta <= -THREAD_MEM_MAX_DELTA) {
        atomic_fetch_sub_explicit(&used_memory, -thread_used_memory_delta, memory_order_relaxed);
        thread_used_memory_delta = 0;
    }
}
```
So maybe there's no point in using jemalloc's stats, but I'm answering the questions anyway, Ping:
> I have a few questions about using `mallctl("stats.allocated")` or a variant of it as you described above
>
> - What is the overhead?

I think `mallctl("stats.allocated")` has the overhead of at least one function call, which is maybe too much here. But if we use `mallctl("thread.allocatedp")` and `mallctl("thread.deallocatedp")` instead, they are called only during initialization to get pointers to jemalloc's own counters. These are just thread-local `uint64_t` variables that we can then read directly, one for allocated and one for deallocated memory.
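As a self-contained illustration of the pointer-caching pattern: plain thread-local variables stand in here for the counters that `mallctl("thread.allocatedp")`/`mallctl("thread.deallocatedp")` would actually hand back, and `init_thread_mem_counters`/`thread_net_allocated` are hypothetical names, not from the Valkey code base.

```c
#include <stdint.h>
#include <stddef.h>

/* Stand-ins for jemalloc's per-thread counters. In real code, the pointers
 * below would come from one mallctl("thread.allocatedp", &p, &len, NULL, 0)
 * call (and the deallocatedp equivalent) at thread start. */
static _Thread_local uint64_t fake_thread_allocated = 0;   /* stand-in */
static _Thread_local uint64_t fake_thread_deallocated = 0; /* stand-in */

static _Thread_local uint64_t *allocated_ptr = NULL;
static _Thread_local uint64_t *deallocated_ptr = NULL;

/* One-time lookup per thread; afterwards reads are plain loads,
 * with no function-call overhead on the allocation path. */
static void init_thread_mem_counters(void) {
    allocated_ptr = &fake_thread_allocated;
    deallocated_ptr = &fake_thread_deallocated;
}

/* Net bytes currently allocated by this thread. */
static inline uint64_t thread_net_allocated(void) {
    return *allocated_ptr - *deallocated_ptr;
}
```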
> - What about other allocators?

We would need a fallback that counts this ourselves, but it can be conditional (`#ifdef`), so there is no cost when jemalloc is used.
from valkey.
> @PingXie Yes, I looked at the PR but it isn't currently implemented as in the #308 (comment) comment, right? I think this comment looks good and it is more simple than the current array of counters per thread.
No, I don't think @lipzhu has implemented it yet. But yes, your implementation is what I'd like to see, and 100KB (or 128KB?) seems good to me too.
from valkey.
Actually I wonder if we should go even higher, like 1MB.
from valkey.
I think it's safe to remove this exact counting. It's already not exact anyway (see discussion in the PR).
At the contributor summit, someone mentioned that we should remove the memory counter in zmalloc and instead rely on the metrics from jemalloc's `mallctl()`. It does the same accounting as we do anyway, except that it also includes allocations made without zmalloc, which is a better metric to use when checking the maxmemory limit, IMHO.
- We can get allocated memory using `mallctl("stats.allocated")`.
- If that's too expensive, it's possible to access a pointer (`uint64_t *`) to jemalloc's own thread-local counters using `mallctl("thread.allocatedp")` and `mallctl("thread.deallocatedp")`.
We can optimize the balance between exactness and performance using some heuristics.
The worst-case scenario is that two threads allocate a huge amount of memory almost simultaneously while we're close to the maxmemory limit. To avoid that, we could fetch the metric more frequently when we're closer to maxmemory, say when we're over 90% of it, and less often otherwise.
Another possible heuristic is to count large allocations (say over 10MB) immediately, incrementing the global counter directly from zmalloc in that case, and otherwise rely on the value we fetch from jemalloc less often, e.g. in cron.
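A rough sketch of both heuristics combined. All names and thresholds here are illustrative, not from the Valkey code base; `note_alloc` would be called from zmalloc, and `fetch_interval_ms` would drive how often cron refreshes from jemalloc.

```c
#include <stdatomic.h>
#include <stddef.h>

#define LARGE_ALLOC_THRESHOLD (10 * 1024 * 1024) /* 10MB, per the comment above */

/* Hypothetical globals for illustration. */
static _Atomic size_t used_memory = 0;
static size_t maxmemory = 0; /* 0 = no limit */

/* Heuristic 1: large allocations are counted immediately; small ones are
 * picked up by the periodic fetch from jemalloc (e.g. in cron). */
static inline void note_alloc(size_t size) {
    if (size >= LARGE_ALLOC_THRESHOLD)
        atomic_fetch_add_explicit(&used_memory, size, memory_order_relaxed);
}

/* Heuristic 2: fetch the jemalloc metric more often when close to the limit. */
static inline int fetch_interval_ms(void) {
    if (maxmemory == 0) return 1000;
    size_t used = atomic_load_explicit(&used_memory, memory_order_relaxed);
    /* Over 90% of maxmemory: refresh every 10 ms instead of every second. */
    return (used * 10 >= maxmemory * 9) ? 10 : 1000;
}
```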
from valkey.
@zuiderkwast, did you get a chance to look at @lipzhu's proposal at #308 (comment)? The idea is to cache the small delta locally and, only when the accumulated changes exceed some threshold, commit them to the global variable atomically. On the reporting path, we could sum up the global value and the deltas from all threads to get a close-enough reading. Note that the local delta would need to be a signed number.
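The reporting path described here could look roughly like the sketch below. The names are illustrative, and a fixed-size array of per-thread signed deltas stands in for whatever thread-registration scheme the real patch would use.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <stddef.h>

#define MAX_THREADS 16 /* illustrative bound */

/* Committed global total, plus each thread's signed, not-yet-committed delta.
 * Hypothetical layout, not the actual Valkey implementation. */
static _Atomic size_t used_memory = 0;
static _Atomic int64_t thread_deltas[MAX_THREADS];

/* Reporting path: a close-enough reading is the committed global value
 * plus the sum of every thread's pending signed delta. */
static size_t used_memory_estimate(int nthreads) {
    int64_t total = (int64_t)atomic_load_explicit(&used_memory, memory_order_relaxed);
    for (int i = 0; i < nthreads; i++)
        total += atomic_load_explicit(&thread_deltas[i], memory_order_relaxed);
    return total > 0 ? (size_t)total : 0;
}
```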
I have a few questions about using `mallctl("stats.allocated")` or a variant of it as you described above:
- What is the overhead?
- What about other allocators?
from valkey.
@zuiderkwast @PingXie Thanks for your comments.
Let me summarize the decision according to your comments in this issue and the PR. Please correct me if I misunderstood.
- If jemalloc is not defined, prefer the proposal in #308 (comment) to track `used_memory`, with a 1MB threshold.
- If jemalloc is defined, just return `used_memory` from `je_mallctl("stats.allocated")`, so we can save the cost of `update_zmalloc_stat_alloc/free` and `je_malloc_usable_size` on every zmalloc/zfree call.
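That decision could be sketched as a conditional-compilation split like the one below. The `HAVE_JEMALLOC` guard and function names are illustrative (Valkey's build uses its own macros), and the jemalloc path refreshes the `epoch` mallctl so that `stats.allocated` is current before reading it.

```c
#include <stddef.h>
#include <stdatomic.h>
#include <stdint.h>

#ifdef HAVE_JEMALLOC
#include <jemalloc/jemalloc.h>
/* jemalloc path: no per-allocation bookkeeping; ask jemalloc directly. */
size_t zmalloc_used_memory(void) {
    uint64_t epoch = 1;
    size_t sz = sizeof(epoch);
    mallctl("epoch", &epoch, &sz, &epoch, sz); /* refresh cached stats */
    size_t allocated = 0;
    sz = sizeof(allocated);
    mallctl("stats.allocated", &allocated, &sz, NULL, 0);
    return allocated;
}
#else
/* Fallback path: signed thread-local delta, committed to the global
 * atomic counter once it exceeds a 1MB threshold in either direction. */
#define THREAD_MEM_MAX_DELTA (1024 * 1024)
static _Atomic size_t used_memory = 0;
static _Thread_local int64_t thread_delta = 0;

static inline void flush_delta_if_needed(void) {
    if (thread_delta >= THREAD_MEM_MAX_DELTA || thread_delta <= -THREAD_MEM_MAX_DELTA) {
        /* Unsigned wraparound makes adding a negative delta a subtraction. */
        atomic_fetch_add_explicit(&used_memory, (size_t)thread_delta, memory_order_relaxed);
        thread_delta = 0;
    }
}
static inline void zmalloc_stat_alloc(size_t size) { thread_delta += (int64_t)size; flush_delta_if_needed(); }
static inline void zmalloc_stat_free(size_t size)  { thread_delta -= (int64_t)size; flush_delta_if_needed(); }

size_t zmalloc_used_memory(void) {
    return atomic_load_explicit(&used_memory, memory_order_relaxed);
}
#endif
```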
from valkey.