
Comments (9)

jodydonetti commented on June 12, 2024

Hi @neon-sunset, I just finished a big push to reduce CPU usage and memory allocations, and I pushed all the changes to the release/v1_0_0 branch.

First of all I'd like to thank you again for opening this issue and pointing out some perf shortcomings that FusionCache had: lately I've been very busy completing the last big features (the new Auto-Recovery, OpenTelemetry support, etc.), preparing for an On .NET episode, and finishing the last touches for the big v1.0 release, culminating in preview1 last week.

Having said that, what you pointed out is important, so I finally took some time to go allocation hunting and optimize whatever I could in the common happy paths: I'm happy to report that I got some really nice results, which I'd like to show (again using your own benchmark code).

Set

Before:

| Method | Mean | Error | StdDev | P95 | Ratio | Rank | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|
| SetFastCache | 17.22 ns | 0.283 ns | 0.265 ns | 17.65 ns | 0.13 | 1 | 40 B | 0.12 |
| SetMemoryCache | 54.79 ns | 0.481 ns | 0.450 ns | 55.55 ns | 0.42 | 2 | 104 B | 0.31 |
| SetLazyCache | 95.75 ns | 1.070 ns | 1.001 ns | 97.07 ns | 0.74 | 3 | 280 B | 0.83 |
| SetFusionCache | 128.94 ns | 1.373 ns | 1.284 ns | 130.83 ns | 1.00 | 4 | 336 B | 1.00 |

After:

| Method | Mean | Error | StdDev | P95 | Ratio | Rank | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|
| SetFastCache | 17.55 ns | 0.204 ns | 0.191 ns | 17.84 ns | 0.16 | 1 | 40 B | 0.25 |
| SetMemoryCache | 54.51 ns | 0.397 ns | 0.372 ns | 55.06 ns | 0.50 | 2 | 104 B | 0.65 |
| SetLazyCache | 94.94 ns | 0.892 ns | 0.834 ns | 96.08 ns | 0.87 | 3 | 280 B | 1.75 |
| SetFusionCache | 109.46 ns | 1.102 ns | 1.030 ns | 111.10 ns | 1.00 | 4 | 160 B | 1.00 |

As you can see I've been able to cut CPU usage by about 15%, landing barely above LazyCache while having more features, and we're still talking about nanoseconds.

Allocations, instead, dropped by 52%, from 336 bytes to 160 bytes: LazyCache now allocates 75% more, again while having fewer features.

Get

Before:

| Method | Mean | Error | StdDev | P95 | Ratio | Rank | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|
| GetFastCache | 6.474 ns | 0.1256 ns | 0.1174 ns | 6.683 ns | 0.15 | 1 | - | 0.00 |
| GetMemoryCache | 26.896 ns | 0.3287 ns | 0.2914 ns | 27.140 ns | 0.62 | 2 | - | 0.00 |
| GetLazyCache | 34.369 ns | 0.4198 ns | 0.3927 ns | 34.895 ns | 0.79 | 3 | - | 0.00 |
| GetFusionCache | 43.267 ns | 0.2805 ns | 0.2190 ns | 43.494 ns | 1.00 | 4 | 48 B | 1.00 |

After:

| Method | Mean | Error | StdDev | P95 | Ratio | Rank | Allocated | Alloc Ratio |
|---|---|---|---|---|---|---|---|---|
| GetFastCache | 6.511 ns | 0.0934 ns | 0.0874 ns | 6.633 ns | 0.17 | 1 | - | NA |
| GetMemoryCache | 26.503 ns | 0.2110 ns | 0.1974 ns | 26.814 ns | 0.68 | 2 | - | NA |
| GetLazyCache | 33.986 ns | 0.2404 ns | 0.2131 ns | 34.317 ns | 0.87 | 3 | - | NA |
| GetFusionCache | 38.936 ns | 0.2911 ns | 0.2581 ns | 39.367 ns | 1.00 | 4 | - | NA |

Here CPU time went down by about 10% (from 43.27 ns to 38.94 ns), which is nice.

Allocations instead became awesome: they dropped to a beautiful zero, so no more allocations (damn closures).
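To illustrate the "damn closures" point above, here's a minimal sketch (not FusionCache's actual code, all names invented) of why a capturing lambda on a cache-read hot path allocates, and the explicit-state pattern that avoids it:

```csharp
using System;

public static class ClosureDemo
{
    // Capturing lambda: `factory` and `key` are lifted into a
    // compiler-generated closure class, heap-allocated on every call.
    public static int GetCapturing(Func<string, int> factory, string key)
        => Run(() => factory(key));

    // `static` lambda with explicit state: nothing is captured, so the
    // compiler caches a single delegate instance; zero allocations per call.
    public static int GetWithState(Func<string, int> factory, string key)
        => Run(static s => s.factory(s.key), (factory, key));

    private static int Run(Func<int> action) => action();

    private static int Run<TState>(Func<TState, int> action, TState state)
        => action(state);
}
```

This is the same state-passing shape used by BCL APIs such as `ConcurrentDictionary<TKey, TValue>.GetOrAdd(key, valueFactory, factoryArgument)`.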

Conclusions

This is not the end of the optimization story, and now that all the main features (and then some) are there and v1.0 is around the corner, I plan on working even more on optimizing cpu usage and allocations.

Of course, as you pointed out yourself, while the other caches are memory (L1) caches with feature sets ranging from barebones (so easy to micro-optimize) to relatively small, FusionCache is a hybrid cache (transparently L1 or L1+L2) with a lot of features available: resiliency, timeouts, an optional 2nd level with a backplane, extensive logging, full observability (OpenTelemetry) and more.

All of this is to say that, while I absolutely agree with the "pay for play" policy, it's also fair to say that with all these features, even just the various "is null/enabled" checks here and there for each feature, plus a minimum of extra memory for the metadata needed to support them, it will always be something of an apples-to-oranges comparison.
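The per-feature cost being described might be sketched like this (a hypothetical illustration, not FusionCache's actual code; all names are invented): each optional feature, when unused, costs only a null/flag check on the hot path, which is cheap but never entirely free.

```csharp
using System;
using System.Collections.Concurrent;

public sealed class SketchCache<TValue>
{
    private readonly ConcurrentDictionary<string, TValue> _l1 = new();
    private readonly Action<string, TValue>? _l2Writer; // optional 2nd level
    private readonly Action<string>? _log;              // optional logging

    public SketchCache(Action<string, TValue>? l2Writer = null, Action<string>? log = null)
        => (_l2Writer, _log) = (l2Writer, log);

    public void Set(string key, TValue value)
    {
        _l1[key] = value;               // the core work everyone pays for

        _l2Writer?.Invoke(key, value);  // a single branch when disabled
        _log?.Invoke($"SET {key}");     // ditto
    }
}
```

When both optional features are left null, the extra cost per `Set` is just two branch instructions, yet it can never reach the floor of a cache that has no such features at all.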

I'm interested in knowing what you think, if you'd like.

Meanwhile, again thanks!

from fusioncache.

jodydonetti commented on June 12, 2024

> I'm mostly worried about avoidable cost caused by some of the internal design decisions that are not necessary to achieve the desired functionality. This library is fairly popular so it is important to ensure that it does not suffer from common cache implementation issues.

I agree, and I'll try to see if I can improve it.

> See dotnet/extensions#4766

Eh, I know about this new discussion: I was part of the old discussion some years ago, so I'm writing up my thoughts on this new one and will publish them later today.

> One of the key concepts promoted in .NET in the last few years is "pay for play" policy - users should only pay for what they use whenever possible.

Totally agree with the concept: on a pragmatic note though, with the time and resources available to me (it's OSS work done in my spare time), I've personally tried to strike the right balance between adding useful features that cover various scenarios and common production problems, and keeping performance and resource usage in check. It's not always easy, as you can imagine, but I'll strive to make it even better in the future.

> Thank you for your work.

Thanks to you for sharing your thoughts, it's good to see participation!


neon-sunset commented on June 12, 2024

I'm mostly worried about avoidable cost caused by some of the internal design decisions that are not necessary to achieve the desired functionality. This library is fairly popular so it is important to ensure that it does not suffer from common cache implementation issues.

In particular, Microsoft libraries like MemoryCache and FASTER do support many features: the former fully integrates with logging/tracing/ETW, supports flexible eviction behavior and callbacks, can be integrated with a second-level distributed cache and much more, all without allocating nearly as much; and that despite its own design also being considered flawed/outdated for certain scenarios, with a memory footprint higher than it could have been. See dotnet/extensions#4766
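For reference, the eviction callbacks mentioned above are part of the standard `Microsoft.Extensions.Caching.Memory` API; a minimal usage sketch (the key/value here are arbitrary examples):

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

var cache = new MemoryCache(new MemoryCacheOptions());

var entryOptions = new MemoryCacheEntryOptions
{
    AbsoluteExpirationRelativeToNow = TimeSpan.FromMinutes(5)
};

// Invoked after the entry is removed (expired, evicted, or replaced).
entryOptions.RegisterPostEvictionCallback(
    (key, value, reason, state) =>
        Console.WriteLine($"Evicted {key}: {reason}"));

cache.Set("answer", 42, entryOptions);
```

The callback fires off the caller's hot path, after removal, which is part of how MemoryCache keeps the per-operation cost of this feature low for entries that don't use it.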

One of the key concepts promoted in .NET in the last few years is "pay for play" policy - users should only pay for what they use whenever possible.

Thank you for your work.


neon-sunset commented on June 12, 2024

Awesome! The reductions in memory usage are welcome and will help with reducing the churn on load spikes.

As for your question about whether FastCache is usable within FusionCache: at this point I'm regretting the quite aggressive design decision to make it a global static cache; it definitely does not fit well with FusionCache, which is instanced and isolated in a more, you could say, reasonable way. Maybe this will change in some future 2.0, but I just don't have time for it at the moment, unfortunately. The idea of a very thin layer on top of NonBlocking.ConcurrentDictionary worked really well in production, with eviction not being "front-loaded" like in other cache libraries: another thread deals with it. But it needs work. There's another library, JitBit.FastCache, which copied the approach from FastCache.Cached but does not do any of the necessary eviction heavy lifting (you have to do some form of parallelization, and you can't run the thing every 10 s or so), does not use NonBlocking, and boxes the cached value in another class; all three become big limitations when scaling to >1-10M items. Just letting you know so you don't repeat its mistakes.

Therefore, if you think it's worth investing the effort, FusionCache could either get its own FastCache-like memory cache implementation, or wait for a better first-party alternative to MemoryCache to be provided. The third, extra-difficult option, which is what I wanted to do back then for FastCache 2.0, was writing a Segcache-like implementation envisioned after Twitter's take on Segcache (https://github.com/twitter/pelikan/tree/master/src/storage/seg), but version 1.x turned out to be sufficient in terms of both memory and eviction-throughput scaling for the tasks I was solving back then, so such a 6-12 month undertaking lost its attractiveness 😅

As for your note regarding the inherent cost of internal and external abstractions, there are a couple of points I wanted to share which may be of interest to you:

  • As of .NET 8, the actual cost will be hidden in microbenchmarks because DynamicPGO will devirtualize calls to the most frequently observed target of a virtual/interface call or delegate. This makes estimating performance more difficult: should the user's code not have such a monomorphic distribution of used instances, the cost of the abstractions will resurface. You may also want to use [SimpleJob(RuntimeMoniker.NativeAot80)] on occasion and see how much performance differs from the JIT, which can partially show how the code would perform should DynamicPGO not be as successful

  • Internal abstractions can be made zero-cost by using struct generics constrained to an interface type instead of accepting the interface type itself: .NET will monomorphize such calls to the exact implementation, e.g. CoreLib uses this technique heavily in performance-sensitive code. However, this is one of the "last resort" optimizations, useful once the other low-hanging fruit has been picked. So if you feel like continuing the performance work, you may want to start with a profiler instead; I'm mostly saying this because I'm guilty too of skipping it in favour of blind microbenchmarks 😀
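The struct-generic technique in the second bullet can be sketched like this (the serializer interface and implementation are invented for illustration): constraining the type parameter to `struct` plus the interface makes the JIT specialize the method per implementation, so the call is direct and inlinable instead of a virtual dispatch.

```csharp
using System;
using System.Text;

// Monomorphized per TSerializer: no virtual dispatch, eligible for inlining.
Console.WriteLine(Save(new Utf8Serializer(), 42).Length);

static byte[] Save<TSerializer>(TSerializer serializer, int value)
    where TSerializer : struct, ISerializer
    => serializer.Serialize(value);

public interface ISerializer
{
    byte[] Serialize<T>(T value);
}

public readonly struct Utf8Serializer : ISerializer
{
    public byte[] Serialize<T>(T value)
        => Encoding.UTF8.GetBytes(value?.ToString() ?? "");
}
```

Had `Save` taken an `ISerializer` parameter instead, the struct would be boxed and every `Serialize` call would go through the interface vtable.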

Watching the episode right now :)


jodydonetti commented on June 12, 2024

Hi all, I just released preview2 🥳

Please give it a try, thanks!


jodydonetti commented on June 12, 2024

Hi @neon-sunset and thanks for your report, I appreciate the time it took you.

As you have pointed out, other libs in fact do more than FastCache, and FusionCache in particular supports a lot of features and options, so it's kinda expected, at least up to a certain point.

Having said that, it's important to make sure the perf impact is as small as possible: in the last months I've been quite busy finishing the new Auto-Recovery feature, which took a lot of time, and it may be that I haven't paid the needed attention to some extra allocations or CPU cycles spent here and there.

I'll try to make sure that the resources needed are as low as possible, and will let you know as soon as I have updates: keep in mind though that, to support all the extra features FusionCache provides (2nd level, fail-safe, auto-recovery, etc.), some extra resources are to be expected.

Meanwhile thanks again for your report!


jodydonetti commented on June 12, 2024

Oh, one thing I'd like to add that may have been missing here is that FusionCache is currently based on the two main caching abstractions in .NET, IMemoryCache and IDistributedCache.
This is to say that, as of today, the performance floor is MemoryCache plus a delta: it can never cost less than MemoryCache itself.

Having said that, while there's huge value in being able to use any implementation of IDistributedCache, there's less value in supporting any implementation of IMemoryCache (which, basically, means MemoryCache). Because of this I've been thinking for some time about detaching from the hard dependency on IMemoryCache to bypass its limits, and adding support for more performant and lightweight implementations. This will be doable because, from the start, I decided to support the two core interfaces as "inputs" but never expose them as "outputs": so I can add support without breaking anything.
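The "inputs, not outputs" idea might look something like this (a hypothetical sketch, not FusionCache's actual design; the `IMemoryLevel` interface and names are invented): the cache depends on its own minimal L1 abstraction and merely adapts IMemoryCache to it, so a leaner store can be swapped in later without a breaking change.

```csharp
using System;
using Microsoft.Extensions.Caching.Memory;

// Invented minimal level-1 abstraction: nothing here leaks IMemoryCache
// to consumers, so the backing store can change freely.
internal interface IMemoryLevel
{
    bool TryGet<TValue>(string key, out TValue? value);
    void Set<TValue>(string key, TValue value, TimeSpan duration);
}

// Adapter: IMemoryCache is accepted as an "input" but never exposed.
internal sealed class MemoryCacheLevel : IMemoryLevel
{
    private readonly IMemoryCache _cache;

    public MemoryCacheLevel(IMemoryCache cache) => _cache = cache;

    public bool TryGet<TValue>(string key, out TValue? value)
        => _cache.TryGetValue(key, out value);

    public void Set<TValue>(string key, TValue value, TimeSpan duration)
        => _cache.Set(key, value, duration);
}
```

A second, IMemoryCache-free implementation of `IMemoryLevel` could then be added purely as a new option.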

As said, I've been experimenting with this approach for some time, but currently there's nothing to show yet; will update.

PS: is FastCache usable for heterogeneous data, or just homogeneous? Meaning, can I use it for different types of data in the same cache (maybe by just using it with `<object>`)?

Thanks.


jodydonetti commented on June 12, 2024

Thanks for all the info, really damn interesting.

Will definitely try the AOT benchmark: I was already curious and now even more 😬

About the second point: do you maybe have some material about that? A blog post, a docs page, examples, anything really.

Thanks again, and if you like, let me know what you think about the episode.


jodydonetti commented on June 12, 2024

Hi all, I've finally released v1.0 🥳

Feels a little unreal tbh, but yep: v1.0 is out.

