Comments (10)

bartelink commented:

Thanks!

Have not felt the need to generalize in this direction but can appreciate your desire to have only one set of cache stuff to reason about.

One thing to bear in mind is that a distributed cache is questionable: SRCMC holds object refs and does not require serialization. A distributed cache in front of GES is pretty pointless, as GES already has good caching built in, and a distributed cache in front of Cosmos will likely cost as much in management overhead as it saves in RU cost.

Assuming it doesn't cause too much mess and achieves something worthwhile, I would take a PR to make the wiring-in of the cache provider pluggable via an optional arg on the resolver etc. (I'd prefer not to have two concrete impls in the actual EventStore.fs and Cosmos.fs files though; if anything, it'd be nice to define an interface in a shared place and then have your impl and the stock one). At present there are small variances in the key strategy between Cosmos and ES; the code has enough tests to make doing a spike feasible, but I won't suggest it's going to be trivial to do.
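
To make that concrete, here is a rough sketch of what such pluggable wiring could look like; the ICacheProvider/Resolver shapes below are purely illustrative, not the actual Equinox surface:

open System.Collections.Concurrent

// Illustrative shared contract; per the above, the real one would live in a shared place
type ICacheProvider =
    abstract member TryGet: key: string -> obj option
    abstract member UpdateIfNewer: key: string * entry: obj -> unit

// Stand-in for the stock in-memory implementation
type DictionaryCache() =
    let entries = ConcurrentDictionary<string, obj>()
    interface ICacheProvider with
        member __.TryGet key = match entries.TryGetValue key with | true, v -> Some v | _ -> None
        member __.UpdateIfNewer(key, entry) = entries.[key] <- entry

// The resolver accepts the provider as an optional arg, defaulting to the stock impl
type Resolver(?cache: ICacheProvider) =
    let cache = defaultArg cache (DictionaryCache() :> ICacheProvider)
    member __.Cache = cache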

Can you explain a little about how your MemoryCache operates - does it hold a ref to an object and self-prune like SRCMC?

DSilence commented:

I do agree that the benefits of distributed caching are very questionable (and you have likely investigated it way more than I have). That said, I believe building a common abstraction may be handy as more providers are added.

As for the MemoryCache, overall it works in a similar manner to SRCMC:

bartelink commented:

I wouldn't say we've investigated it as such; the penalty of an extra roundtrip in terms of latency etc., and having to care for and feed a server and/or its state, are the main reasons why it's not been done.
(edited to add: an interesting paper on distributed caches: "Fast key-value stores: An idea whose time has come and gone")

The primary reason of all, of course, is that doing it in memory means that blue/green deploys are not relevant (the schema does not require versioning), and there is no burden of the fold state requiring a codec.

A common abstraction (likely an interface) would clarify the interconnect between the Resolver and the Store, so it would likely be useful regardless.

While the perf counters in SRCMC are nice, they're definitely not a decider given the importance of netcore. Similarly, while there are some funky bits in the API, that's not a major concern (I'm not aware of any use cases where making adjustments at runtime and/or varying much beyond the max allocation would make sense).

The largest stumbling block, however, seems to be computing the "cost": the GC-derived heuristic in SRCMC fits the need perfectly, in that we're dealing with a third-party object 'tree'. Being able to rely on an upper limit to the memory consumption (not risking OOM) is pretty key in typical runtime environments.
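
For reference, that upper limit is what SRCMC's cacheMemoryLimitMegabytes setting provides; a minimal configuration sketch (the 50 MB figure is arbitrary):

open System.Collections.Specialized
open System.Runtime.Caching

// SRCMC enforces an upper bound on memory consumption, trimming entries
// based on its GC-derived size estimate as the ceiling is approached
let config = NameValueCollection()
config.Add("cacheMemoryLimitMegabytes", "50")
let cache = new MemoryCache("folded-state", config)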

bartelink commented:

@DSilence Did you get any time to investigate or think further on this? If there are to be any minor changes to the contracts (I can't picture any but...), it'd be nice to get them into the 2.0 final release...

DSilence commented:

Hey @bartelink. I did some initial prototyping by introducing the interface with the following signature:

type CacheItemOptions =
    | AbsoluteExpiration of DateTimeOffset
    | RelativeExpiration of TimeSpan

type ICache =
    abstract member UpdateIfNewer<'entry> : policy: CacheItemOptions -> key: string -> entry: 'entry -> Async<unit>
    abstract member TryGet<'entry> : key: string -> Async<'entry option>

The breaking change here is the use of Async, to enable scenarios beyond an in-memory implementation (a sketch of a possible in-memory implementation follows the questions below).

  1. What is your opinion on this?
  2. I didn't have a lot of time to properly evaluate and test this, and it does seem like a breaking change, so if you agree with the approach I'd keep it out of the 2.0 final release.
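
For illustration, a minimal in-memory implementation of the interface above over System.Runtime.Caching.MemoryCache might look like the following; note that the version-comparison aspect of UpdateIfNewer is elided (this sketch just stores unconditionally):

open System
open System.Runtime.Caching

type MemoryCacheAdapter(inner: MemoryCache) =
    let toPolicy = function
        | AbsoluteExpiration dto -> CacheItemPolicy(AbsoluteExpiration = dto)
        | RelativeExpiration ts -> CacheItemPolicy(SlidingExpiration = ts)
    interface ICache with
        member __.UpdateIfNewer<'entry> policy key (entry: 'entry) = async {
            // NB a real impl would compare versions rather than overwrite blindly
            inner.Set(key, box entry, toPolicy policy) }
        member __.TryGet<'entry> key = async {
            return match inner.Get key with null -> None | v -> Some (unbox<'entry> v) }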

bartelink commented:

Assuming the actual interface works and doesn't make a mess (it looks fine), I won't lose sleep over the cost of the Async (or the builder could internally wrap it in a DU, e.g. type Cache = AsyncCache of ICache | NotAsync of Cache) - it may well be possible to address it by adding a builder overload etc.
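
Spelling that out as a sketch (SyncCache below is a hypothetical stand-in for the existing synchronous cache type):

open System.Collections.Concurrent

// Stand-in for the existing synchronous cache
type SyncCache() =
    let entries = ConcurrentDictionary<string, obj>()
    member __.TryGet key = match entries.TryGetValue key with | true, v -> Some v | _ -> None

type Cache =
    | AsyncCache of ICache
    | NotAsync of SyncCache

// Store internals can pattern-match and special-case the sync provider
// (shown uniformly Async here for brevity; a builder overload could expose a sync path)
let tryGet<'entry> cache key : Async<'entry option> =
    match cache with
    | AsyncCache c -> c.TryGet<'entry> key
    | NotAsync c -> async { return c.TryGet key |> Option.map unbox<'entry> }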

Have you considered how you're going to compute a weight for an arbitrary folded-state value given the interface you described? Will you instead do a max item count or something else? The reality of how that's addressed will have big effects on the viability on your side. And an idea from left field: what are the chances of migrating your other usages to MemoryCache instead - do you e.g. use the Distributed Cache?

I'm happy to keep the issue open and let you drill in at a point that works for you. I've tagged some final naming-tweak items as 2.0, but there is no actual date set, so who knows: you might have reached a point before then.

DSilence commented:

Regarding the perf impact - I actually did the numbers. This is for System.Runtime.Caching.MemoryCache.
Code:

open System
open System.Collections.Specialized
open System.Runtime.Caching
open BenchmarkDotNet.Attributes
open FSharp.Control.Tasks // TaskBuilder.fs; the namespace varies by version

[<MemoryDiagnoser>]
type BenchmarkAsync () =
    let mutable cache : MemoryCache option = None

    [<GlobalSetup>]
    member self.InitCache () =
        let config = NameValueCollection(1)
        config.Add("cacheMemoryLimitMegabytes", "50")
        cache <- Some(new MemoryCache("Test", config))
        cache.Value.Add("test", new Object(), new DateTimeOffset(2020, 12, 12, 10, 20, 30, TimeSpan.Zero)) |> ignore

    [<GlobalCleanup>]
    member self.CleanupCache () =
        cache.Value.Dispose()

    [<Benchmark>]
    member self.GetWithAsyncReturn () =
        async {
            return cache.Value.Get "test"
        } |> Async.RunSynchronously

    [<Benchmark>]
    member self.GetWithTaskBuilder () =
        let task = ContextInsensitive.task {
            return cache.Value.Get "test"
        }
        task.Result

    [<Benchmark(Baseline = true)>]
    member self.GetDirectly () =
        cache.Value.Get "test"

Results:

BenchmarkDotNet=v0.11.5, OS=Windows 10.0.18362
Intel Core i7-7700HQ CPU 2.80GHz (Kaby Lake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=2.2.301
  [Host]     : .NET Core 2.2.6 (CoreCLR 4.6.27817.03, CoreFX 4.6.27818.02), 64bit RyuJIT DEBUG
  DefaultJob : .NET Core 2.2.6 (CoreCLR 4.6.27817.03, CoreFX 4.6.27818.02), 64bit RyuJIT

| Method             | Mean     | Error     | StdDev    | Median   | Ratio | RatioSD | Gen 0  | Gen 1 | Gen 2 | Allocated |
|------------------- |---------:|----------:|----------:|---------:|------:|--------:|-------:|------:|------:|----------:|
| GetWithAsyncReturn | 606.7 ns | 18.563 ns | 52.961 ns | 578.3 ns |  3.73 |    0.36 | 0.2537 |     - |     - |     800 B |
| GetWithTaskBuilder | 224.5 ns |  4.677 ns |  9.658 ns | 221.3 ns |  1.42 |    0.06 | 0.0482 |     - |     - |     152 B |
| GetDirectly        | 163.4 ns |  1.653 ns |  1.546 ns | 163.2 ns |  1.00 |    0.00 | 0.0100 |     - |     - |      32 B |

Regarding your questions:

  • At the moment, I've used System.Runtime.Caching.MemoryCache as the primary implementation. The implementation should port quite easily to Microsoft.Extensions.Caching, using something like GC.GetTotalMemory as a weight, but I haven't given it too much thought yet (see the sketch after this list).

  • About using a DU for distinguishing between sync/async: I thought about that, but after a brief look through the codebase decided against it, since the async would still have to go all the way through.

  • We've used a distributed cache for CosmosDb, as Azure Redis Cache has been much cheaper and saved a ton of RU for us. This was for a module which was NOT event-sourced and didn't use the change feed feature at all. Right now we're using EventStore for all event-sourced parts, with no additional caching on top - atm the performance is enough for our workloads, and any perf issues encountered were related to serialization perf and were fixed by using https://github.com/neuecc/Utf8Json/.
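
On the first point, a sketch of what a size-bounded port to Microsoft.Extensions.Caching.Memory could look like; unlike SRCMC there is no GC-derived heuristic, so the weight question resurfaces as an explicit per-entry Size (using 1 unit per entry, as below, makes the limit behave as a max item count):

open System
open Microsoft.Extensions.Caching.Memory

// Total budget of 1024 'units'; what a unit means is up to the caller
let cache = new MemoryCache(MemoryCacheOptions(SizeLimit = Nullable 1024L))

let set (key: string) (entry: obj) =
    // Every entry must declare a Size once SizeLimit is set
    let options = MemoryCacheEntryOptions(Size = Nullable 1L)
    cache.Set(key, entry, options) |> ignore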

My free time has been kinda sporadic lately but I will try to get back to you with some prototype done.

bartelink commented:

Really appreciate the detailed response; feels like we should be able to get something sorted quickly when the time comes.

Your benchmarks suggest that just making the interface async, keeping the source minimal and idiomatic, is the right choice.

Given the nature of EventStore, yes, caching is rarely going to make a significant difference to overall throughput unless you have lots of nodes in the cluster etc., as it internally caches recently hit streams.

Would love to know the perf cost of a GC.GetTotalMemory on a well-loaded process, but that does seem to answer the question.

Regarding Utf8Json, @Szer did a spike which we ultimately didn't merge (a copy is retained in https://github.com/jet/equinox/tree/Szer-utf8json). If I or someone else gets around to doing System.Text.Json support as discussed in https://github.com/jet/equinox/issues/79, and/or you're interested in using it, it may make sense to add an Equinox.Codec.Utf8Json.

bartelink commented:

Some minor diffs coming through in rc3 - I have no other plans for big changes before baking v2.

And an aside re caching: PR #151 is slightly relevant to the above - it provides a way to maintain a rolling state with caching and etag optimizations to e.g. avoid null writes and the associated cache invalidation (we'll be using it to store some denormalized state that we currently index in ElasticSearch).

Also, if you do anything on this, it might be nice if we could avoid clients needing to take a reference to System.Runtime.Caching as part of the work (the integration tests require a <Reference> atm, same for apps, and avoiding that would be a nice touch).

bartelink commented:

This is addressed by opening up an extensibility point in #161 by @DSilence 🙏
