maypok86 / otter
A high performance lockless cache for Go.
Home Page: https://maypok86.github.io/otter/
License: Apache License 2.0
We need a specialized hash table that ignores hash collisions and supports node swapping.
For now we will probably use a modification of the hash table from this repository, but in the future we should implement something like ConcurrentHashMap from the Java standard library, or some other implementation.
Added a hash table implementation with the desired characteristics, covered by tests.
My go-to cache library for Go so far has been bluele/gcache. It is probably much less performant than many alternatives, but it has a very handy helper that coordinates loads to avoid a thundering herd.
https://pkg.go.dev/github.com/bluele/gcache#readme-loading-cache
The builder could get a new WithLoaderFunc(func(K) V) option.
Calling Get(key_not_loaded_yet) from multiple goroutines concurrently would then call the optional LoaderFunc once and return the result to all goroutines at once.
If my understanding is correct, the current implementation has a bug that may evict more objects than needed. I believe it should return at this line instead of continuing to evict, because an object has already been evicted from the main queue: https://github.com/maypok86/otter/blob/main/internal/s3fifo/small.go#L62
As you mentioned (Yiling-J/theine-go#29), it is better to store a fingerprint instead of the full object in the ghost queue. Storing data in the ghost defeats the purpose of having a ghost queue.
The size of the ghost queue does not need to be exact, so you can estimate it as the total cache size divided by the mean object size. VMware found that the ghost queue size has little impact anywhere between 40% and 200% of the main queue size (in terms of the number of objects).
Also, some of the if statements seem unnecessary, e.g. https://github.com/maypok86/otter/blob/main/internal/s3fifo/small.go#L55
We need to implement a basic version of the cache with TTL support in the API.
A basic version of the cache is implemented.
Right now otter uses quite a few extra bytes per node (at least 48).
I have an idea for how to easily reduce this to 40 bytes.
Currently the expiry time is stored as the number of seconds since January 1, 1970 UTC in a uint64, which takes 8 bytes. But if we store only the difference between the program start time and the node's expiry time, a uint32 suffices. I doubt a program using otter will ever run continuously for 4294967295 seconds, i.e. 136+ years.
Also, otter stores a mutex per node, which takes 8 bytes, but it seems we only need a simple spinlock, which takes 4 bytes.
A cache system needs statistics to optimize its performance and improve the efficiency of data retrieval. Statistics provide valuable information about access patterns and data usage, which can be used to make informed decisions about caching strategies and resource allocation. Here are a few reasons why statistics are important for a cache system:
Access patterns: statistics help identify the most frequently accessed data items or resources. By analyzing access patterns, the cache system can prioritize caching the items that are accessed most often, reducing the cache miss rate and improving overall performance.
Cache hit rate: statistics allow the cache system to measure its hit rate, i.e. the percentage of requests that are successfully served from the cache.
Added a fast, test-covered statistics package.
We need benchmarks to compare the performance of our cache with the other caches available in Go.
Implemented basic benchmarks for this project.
Thank you for this library @maypok86
Creating a cache causes one core to be utilized at 100%. Each additional cache pushes this up by one more core.
Steps to reproduce the behavior:
package main

import "github.com/maypok86/otter"

// We assign this so that we're sure it doesn't get garbage collected
var Cache *otter.Cache[string, string]

func main() {
	b, err := otter.NewBuilder[string, string](1_000)
	if err != nil {
		panic(err)
	}
	c, err := b.Build()
	if err != nil {
		panic(err)
	}
	Cache = c

	// Block forever
	select {}
}
CPU usage is negligible.
I believe the issue lies in the goroutine spawned here https://github.com/maypok86/otter/blob/main/cache.go#L74
The expiration algorithm of a cache determines when and how items should be evicted or removed. The goal is to remove items that are no longer needed or have expired, in order to make space for new items.
An expiry algorithm for variable TTL is implemented.
The eviction policy of a cache determines which items should be evicted (removed) when the cache becomes full and a new item needs to be added. The goal of an eviction policy is to maximize the cache hit rate and minimize cache misses by making informed decisions about which items to remove.
We will use an eviction policy called S3-FIFO, based on the paper described at https://blog.jasony.me/system/cache/2023/08/01/s3fifo
I'll add a basic implementation for now, without tests and possibly with bugs.
Currently, otter uses ring-buffer-based queues, but this type of queue does not allow easy deletion of an arbitrary element, which we need because of the expiration policy and arbitrary calls to the user-facing Delete function.
Queues built on a doubly-linked list solve this problem, but require an extra 16 bytes per key-value pair (prev and next pointers) and put additional pressure on the GC. So we would like to find a workaround.
The suggested workaround is pretty simple: let each key-value pair store the index of the cell where it is stored in the buffer. For example, like this:
type Node[K comparable, V any] struct {
	...
	policyIndex uint32
}
When deleting a node, we just put nil in the corresponding cell. This solution costs an extra 12 bytes (a pointer to the node plus the index). The only problem is that we need to maintain the indexes inside the nodes when the buffer grows. Also, after deleting a node its cell still takes up space, so when adding new nodes the buffer may have to be expanded again even though half of its cells are unused.
That's why we want to figure out which type of queue suits us best and implement it.