yiling-j / theine
High performance in-memory cache
License: BSD 3-Clause "New" or "Revised" License
With the help of type hints it's possible to constrain the key function's signature, just the same way Cacheme does. Using a separate function avoids the awkward lambda or f-string in the original decorator:
from typing import Dict

from theine import Cache

@Cache("tlfu", 10000)
def get_user_info(user_id: int) -> Dict:
    return {}

# or use an existing cache
cache = Cache("tlfu", 10000)

@cache
def get_user_info(user_id: int) -> Dict:
    return {}

# register the key function: its name is not important, so just use _ here
@get_user_info.key
def _(user_id: int) -> str:
    return f"user:{user_id}"
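As a sketch of how such a decorator pair could work, here is a minimal stand-in (not theine's actual implementation; the Memo class and its store and key_fn attributes are made-up names) showing the separate-key-function pattern:

```python
from typing import Any, Callable, Dict

class Memo:
    """Made-up memoizing decorator illustrating the proposed pattern:
    the wrapped function caches results, and .key registers a typed
    key builder instead of an inline lambda."""

    def __init__(self, fn: Callable) -> None:
        self.fn = fn
        # Default key: repr of the arguments, until a key function is registered.
        self.key_fn: Callable[..., str] = lambda *a, **k: repr((a, k))
        self.store: Dict[str, Any] = {}

    def key(self, key_fn: Callable[..., str]) -> Callable[..., str]:
        # Register the typed key builder; returning it keeps the decorator usable.
        self.key_fn = key_fn
        return key_fn

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        k = self.key_fn(*args, **kwargs)
        if k not in self.store:
            self.store[k] = self.fn(*args, **kwargs)
        return self.store[k]

@Memo
def get_user(user_id: int) -> dict:
    return {"id": user_id}

@get_user.key
def _(user_id: int) -> str:
    return f"user:{user_id}"

first = get_user(1)
second = get_user(1)  # cache hit: same object comes back
```

Because the key function is a plain named function with annotations, a type checker can verify that its signature matches the cached function's, which an inline lambda cannot offer.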
When a new Cache is initialised, a new Thread is spawned from the following code in the init method:
self._maintainer = Thread(target=self.maintenance, daemon=True)
self._maintainer.start()
The Thread is never stopped, so even after the Cache object dies, the thread remains alive and holds onto the cached data. Even if the data is cleared, the thread continues to persist.
Minimal Reproducible Example
from theine import Cache

# Loop endlessly - this will crash after a few minutes
while True:
    # Create a new cache - which creates a new Thread
    c = Cache("tlfu", 10000)
    # Add some fake data. Not necessary for the crash, but it takes up memory
    c.set("data", ["data"] * 2048)
    # Remove the cache, just to be explicit.
    del c
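The leak can be observed without theine installed. The following stand-in (the LeakyCache class is made up) mirrors the reported pattern: a daemon maintenance thread started in __init__ with nothing that ever stops it:

```python
import threading
import time

class LeakyCache:
    """Made-up minimal stand-in for a cache that spawns a daemon
    maintenance thread in __init__ and never stops it."""

    def __init__(self) -> None:
        self._maintainer = threading.Thread(target=self._maintenance, daemon=True)
        self._maintainer.start()

    def _maintenance(self) -> None:
        # Nothing ever signals this loop to exit, so the thread
        # outlives the object that created it.
        while True:
            time.sleep(0.05)

before = threading.active_count()
caches = [LeakyCache() for _ in range(10)]
del caches  # dropping the objects does NOT stop their daemon threads
leaked = threading.active_count() - before
```

After `del caches` the ten maintenance threads are still alive: the threading machinery keeps a reference to every running thread, so garbage collection cannot reclaim them, which is exactly why repeated Cache construction accumulates threads.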
Hi there,
Our team (@1a1a11a) has developed a new cache eviction algorithm called SIEVE. It's simple, efficient, and scalable.
Why SIEVE could be a great addition: it is simple to implement, efficient, and scalable. You are welcome to dive into the details on our website sievecache.com and on our SIEVE blog.
We would love to explore the possibility of integrating SIEVE into theine. We believe it could be a beneficial addition to the library and the community.
Looking forward to your feedback!
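For a feel of the algorithm's simplicity, here is a hedged sketch based on SIEVE's public description (a FIFO list, one visited bit per entry, and a moving hand), not theine's or the authors' code; the SieveCache class name is made up:

```python
class _Node:
    __slots__ = ("key", "value", "visited", "prev", "next")

    def __init__(self, key, value):
        self.key, self.value = key, value
        self.visited = False
        self.prev = self.next = None

class SieveCache:
    """Illustrative SIEVE sketch: head = newest, tail = oldest."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.table = {}
        self.head = self.tail = None
        self.hand = None  # eviction hand; starts at the tail

    def get(self, key):
        node = self.table.get(key)
        if node is None:
            return None
        node.visited = True  # lazy promotion: just flip a bit, no list moves
        return node.value

    def set(self, key, value):
        node = self.table.get(key)
        if node is not None:
            node.value = value
            node.visited = True
            return
        if len(self.table) >= self.capacity:
            self._evict()
        node = _Node(key, value)
        node.next = self.head
        if self.head:
            self.head.prev = node
        self.head = node
        if self.tail is None:
            self.tail = node
        self.table[key] = node

    def _evict(self):
        obj = self.hand or self.tail
        while obj.visited:
            obj.visited = False           # second chance: clear the bit
            obj = obj.prev or self.tail   # move toward head, wrap to tail
        self.hand = obj.prev              # hand resumes just past the victim
        self._unlink(obj)
        del self.table[obj.key]

    def _unlink(self, node):
        if node.prev:
            node.prev.next = node.next
        else:
            self.head = node.next
        if node.next:
            node.next.prev = node.prev
        else:
            self.tail = node.prev
```

The appeal is that a hit only sets a bit, so reads need no locking of the queue, while eviction does all reordering work with a single scanning hand.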
AFAIK nothing like this exists in Python. Maybe you can configure a shared address space or something so that multiple processes are able to share the same cache.
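As an illustration only (this is the standard library's multiprocessing, not anything theine provides; _worker and demo_shared_cache are made-up names), a Manager-backed dict is one way to let processes share a mapping:

```python
import multiprocessing as mp

def _worker(shared, key):
    # Every process mutates the same Manager-backed proxy dict.
    shared[key] = key.upper()

def demo_shared_cache():
    # "fork" keeps this runnable without an import guard (POSIX only).
    ctx = mp.get_context("fork")
    with ctx.Manager() as m:
        shared = m.dict()
        procs = [ctx.Process(target=_worker, args=(shared, k)) for k in ("a", "b")]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return dict(shared)

result = demo_shared_cache()
```

Note that every access to a Manager proxy crosses a process boundary over IPC, so this is orders of magnitude slower than an in-process cache and not a drop-in substitute.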
Here is the minimal test case:
from theine import Cache

cache: Cache = Cache(policy="tlfu", size=10000)

def test1() -> None:
    print(len(cache))
    cache.clear()

def test2() -> None:
    print(len(cache))
    cache.clear()
I am running this test file with pytest. The second test fails with an OverflowError: cannot fit 'int' into an index-sized integer. If I remove the cache.clear() call, it works. If I change the policy to "clockpro", it also works.
Environment: macOS 13.5.1, Python 3.8.16.
Already done in Go: https://github.com/Yiling-J/theine-go#cache-persistence
Because Theine Python is a combination of Rust and CPython, it would be a little difficult to implement. But if you really need this feature, please leave a comment and I will consider adding it.
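Until then, one user-side workaround is to persist entries your application can enumerate itself and replay them on startup. This sketch uses a plain dict as a stand-in for the cache (the file path and the entries are made up), since no iteration API is documented here; with theine you would replay each pair via set():

```python
import os
import pickle
import tempfile

# Stand-in for entries the application itself tracks alongside the cache.
entries = {"user:1": {"name": "alice"}, "user:2": {"name": "bob"}}

path = os.path.join(tempfile.mkdtemp(), "cache-snapshot.pkl")
with open(path, "wb") as f:
    pickle.dump(entries, f)  # snapshot to disk on shutdown

with open(path, "rb") as f:
    restored = pickle.load(f)  # reload and replay on startup
```

This loses TTL and frequency metadata that a native persistence feature (like theine-go's) would keep, which is why built-in support is the better long-term answer.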
@Memoize(Cache("unlimited"), timedelta(seconds=10 * 60))
def get():
    ...

@Memoize(Cache("empty"), timedelta(seconds=10 * 60))
def update_global_dict():
    value = query(..)
    global_dict[value.field2] = value.field1
    global_dict[value.field3] = value.field1
Both code and readme
In a Django backend, one of our request handlers (wrongly) triggered the instantiation of a new Cache instance, which was then garbage collected immediately after the request finished.
With this pattern, we noticed the CPU usage of our backend increase linearly with the number of past requests served. We ultimately traced it down to this erroneous creation of Cache instances.
I guess a new background thread is started for each instance, and these threads are not killed by garbage collection, hence resulting in millions of threads after some time.
The theine library has multiple interfaces. Could you document which patterns start new threads in the background? Would this affect the @Memoize decorator or the Django adapter? For instance, if inside a request handler we have a dynamic function decorated with @Memoize, wouldn't that also instantiate a new cache on each request without ever killing the background threads?