Git Product home page Git Product logo

otter's People

Contributors

dependabot[bot] avatar maypok86 avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar

otter's Issues

Shard

Clarification and motivation

We need a specific hash table that ignores hash collisions and supports node swapping

For the moment we will probably use a modification of the hash table from this repository, but in the future we should implement ConcurrentHashMap from the standard Java library or some other implementation.

Acceptance criteria

Added hash table implementation with desired characteristics and covered by tests

Thundering herd avoiding LoaderFunc on misses

Clarification and motivation

My go-to cache library for Go so far has been bluele/gcache - probably much less performant than many other alternatives but it has a very handy helper function that handles coordinating loads to avoid thundering herd.
https://pkg.go.dev/github.com/bluele/gcache#readme-loading-cache

Acceptance criteria

Builder could have a new WithLoaderFunc(func(K) V).
Calling Get(key_not_loaded_yet) in multiple goroutines concurrently will call the optional LoaderFunc once and then return to all goroutines all at once.

Some potential bug and improvements

  1. If my understanding is correct, the current implementation has a bug that may evict more objects than needed. I believe it should return at this line instead of continuing to evict because an object has already been evicted from the main queue https://github.com/maypok86/otter/blob/main/internal/s3fifo/small.go#L62

  2. As you have mentioned (Yiling-J/theine-go#29), it is better to store a fingerprint instead of the full object in the ghost. Storing data in the ghost defeats the purpose of using a ghost queue...

  3. the size of the ghost queue does not need to be accurate, so you can estimate using the total cache size divided by the mean object size. VMware found that the ghost queue size has little impact between 40% and 200% of the main queue size (in terms of #objects).

  4. there are multiple if statement seems not necessary, e.g., https://github.com/maypok86/otter/blob/main/internal/s3fifo/small.go#L55

Cache implementation

Clarification and motivation

We need to implement a basic version of the cache with ttl support in the api

Acceptance criteria

Basic version of the cache is implemented

Reduce memory footprint per node

Clarification and motivation

Right now otter uses quite a few extra bytes per node (at least 48).

I have an idea how to easily reduce this to 40 bytes.

Right now the expiry time is stored as the number of seconds since January 1, 1970 UTC in uint64 and takes 8 bytes. But we can only store the time difference between the start and the node's expiry time, then we can store in uint32. I doubt a program using otter will ever run continuously for 4294967295 seconds or 136+ years. ๐Ÿ™‚

Also otter stores a mutex per node which takes 8 bytes, but it seems we just need a simple spinlock which will take 4 bytes

Acceptance criteria

  1. New version of storing expiry time and spinlock has been implemented
  2. Added implementation tests

Add statistics

Clarification and motivation

A cache system needs statistics to optimize its performance and improve the efficiency of data retrieval. Statistics provide valuable information about the access patterns and usage of data, which can be used to make informed decisions regarding caching strategies and resource allocation. Here are a few reasons why statistics are important for a cache system:

  1. Access patterns: Statistics help identify the most frequently accessed data items or resources. By analyzing access patterns, the cache system can prioritize caching those items that are accessed more frequently, reducing the cache miss rate and improving overall performance.

  2. Cache hit rate: Statistics allow the cache system to measure the cache hit rate, which is the percentage of requests that are successfully served from the cache

Acceptance criteria

Added fast and test covered statistics package

Benchmarks

Clarification and motivation

We need benchmarks to compare the performance of ours with the other caches available in go

Acceptance criteria

Implemented basic benchmarks for this project

[BUG] 100% CPU usage with idle cache

Thank you for this library @maypok86

Description

Creating a cache leads to one core being utilized at 100%. Creating multiple caches bumps this by one core at a time.

To Reproduce

Steps to reproduce the behavior:

package main

import "github.com/maypok86/otter"

// We assign this so that we're sure it doesn't get garbage collected
var Cache *otter.Cache[string, string]

func main() {
	b, err := otter.NewBuilder[string, string](1_000)
	if err != nil {
		panic(err)
	}

	c, err := b.Build()
	if err != nil {
		panic(err)
	}

	Cache = c

	// Block forever
	select {}
}

Screenshot 2023-12-12 at 19 33 18

Expected behavior

CPU usage is negligible.

Environment

  • OS: Both in Linux/Ubuntu and MacOS

Additional context

I believe the issue lies in the goroutine spawned here https://github.com/maypok86/otter/blob/main/cache.go#L74

Expiration algorithm

Clarification and motivation

The necessary expiration algorithm for a cache determines when and how items in the cache should be evicted or removed. The goal is to remove items that are no longer needed or have expired in order to make space for new items.

Acceptance criteria

Expiry algorithm for variable ttl is implemented

Eviction policy

Clarification and motivation

The eviction policy for a cache determines which items should be evicted (removed) from the cache when it becomes full and a new item needs to be added. The goal of an eviction policy is to maximize cache hit rate and minimize cache misses by making informed decisions about which items to remove.

We will use an eviction policy called S3-FIFO based on an paper from https://blog.jasony.me/system/cache/2023/08/01/s3fifo

Acceptance criteria

I'll add a basic implementation, but no tests and possibly with bugs for now

Rethinking queues

Clarification and motivation

Currently, Otter uses ring buffer based queues, but this type of queue does not allow easy deletion of an arbitrary element, and we need it because of the expiration policy and arbitrary calls to the Delete function by the user.

Queues on a doubly-linked list can solve this problem, but instead require 16 bytes for each key-value pair (prev and next pointers), and put additional pressure on gc. So one would like to come up with a workaround for this.

And the workaround suggestion is pretty simple. Let's store in each key-value pair store an index to the location where it is stored in the buffer. For example, like this:

type Node[K comparable, V any] struct {
    ...
    policyIndex uint32
}

And when deleting it, just put nil in the right cell. Such a solution will additionally waste 12 bytes (pointer to node and index). The only problem is that we need to maintain indexes in nodes when expanding the buffer. Also, after deleting a node we still occupy space for the cell and when adding new nodes the buffer will have to be expanded twice, but we will not use half of these cells.

That's why we want to understand what type of queues suits us better and implement it

Acceptance criteria

  1. The type of queues to be used in the future has been selected.
  2. Implemented a new version of eviction policy on these queues.
  3. Implementation tests are written

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.