
emcache's Introduction

emcache

A high-performance asynchronous Python client for Memcached, batteries fully included


Emcache stands on the shoulders of giants: it implements most of the characteristics desired of a Memcached client, based on the experience of other Memcached clients. Its main characteristics are:

  • Support for multiple Memcached hosts, distributing traffic among them using the Rendezvous hashing algorithm.
  • Support for different commands and different behavior flags, such as noreply, exptime, or flags.
  • Support for the SSL/TLS protocol.
  • Support for SASL authentication over the ASCII protocol.
  • Support for autodiscovery, which should work with AWS and GCP Memcached clusters.
  • Adaptive connection pool, which increases the number of connections per Memcached host depending on the traffic.
  • Node healthiness traceability and an optional flag for excluding unhealthy nodes from participating in commands.
  • Metrics for operations and connections; send them to your favourite time-series database to see how the Emcache driver is behaving.
  • Listening to the most significant cluster events, for example to know when a node has been marked as unhealthy.
  • Speed: Emcache is fast. See the benchmark section.

Usage

To install:

pip install emcache

The following snippet shows the minimum needed to create a new client, save a key, and retrieve its value later.

import asyncio
import emcache

async def main():
    client = await emcache.create_client([emcache.MemcachedHostAddress('localhost', 11211)])
    await client.set(b'key', b'value')
    item = await client.get(b'key')
    print(item.value)
    await client.close()

asyncio.run(main())

Emcache currently has support for the following commands, among others:

  • get Retrieves a specific key.
  • gets CAS version of get that also returns the CAS token of a specific key.
  • get_many Multi-key version of get.
  • gets_many Multi-key version of gets, also returning CAS tokens.
  • gat Retrieves a specific key, if it exists, and updates its expiration time (Get And Touch).
  • gats CAS version of gat that also returns the CAS token (Get And Touch with CAS).
  • gat_many Multi-key version of gat.
  • gats_many Multi-key version of gats, also returning CAS tokens.
  • set Sets a new key and value.
  • add Adds a new key and value, if and only if the key does not exist.
  • replace Updates the value of a key, if and only if the key does exist.
  • append Appends a value to the current one for a specific key, if and only if the key does exist.
  • prepend Prepends a value to the current one for a specific key, if and only if the key does exist.
  • cas Updates the value of a key, if and only if the provided CAS token matches the one stored on the Memcached server.
  • version Returns the version string of the server.
  • flush_all Invalidates all existing items, either immediately (by default) or after the specified delay.
  • delete Explicitly deletes an item.
  • touch Updates the expiration time of an existing item without fetching it.
  • increment/decrement Changes the data of an item in place, incrementing or decrementing it.
  • cache_memlimit Sets the cache memory limit at runtime.
  • stats Shows a list of statistics about the server, depending on the arguments.
  • verbosity Sets the level of detail of the Memcached server's STDOUT/STDERR logging.

Take a look at the documentation for a list of all of the operations that are currently supported.
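
As an illustration of how the fetch and CAS commands fit together, here is a minimal optimistic-update sketch; it assumes a local Memcached server and that a failed cas surfaces as emcache.StorageCommandError (check the documentation for the exact exception):

import asyncio
import emcache

async def main():
    client = await emcache.create_client([emcache.MemcachedHostAddress('localhost', 11211)])
    await client.set(b'counter', b'1')

    # gets returns the item together with its CAS token
    item = await client.gets(b'counter')

    try:
        # cas succeeds only if nobody modified the key since our gets
        await client.cas(b'counter', b'2', item.cas)
    except emcache.StorageCommandError:
        print('someone else updated the key first')

    await client.close()

asyncio.run(main())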

Some of the commands have support for the following behavior flags:

  • noreply for storage commands like set, do not wait for an explicit response from the Memcached server, sacrificing the explicit ack for speed.
  • flags for storage commands, store an int16 value alongside the item that can later be retrieved by fetch commands.
  • exptime for storage commands, configure an expiration time; once that time is reached, keys will be automatically evicted by the Memcached server (see the sketch below).
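
As a hedged sketch of how these flags combine on a single storage command (parameter names as in the snippets above; the flags value 7 is arbitrary):

import asyncio
import emcache

async def main():
    client = await emcache.create_client([emcache.MemcachedHostAddress('localhost', 11211)])

    # store for 60 seconds, tag with an application flag, skip the server ack
    await client.set(b'session', b'payload', flags=7, exptime=60, noreply=True)

    # ask the fetch command to also return the stored flags
    item = await client.get(b'session', return_flags=True)
    if item is not None:
        print(item.value, item.flags)

    await client.close()

asyncio.run(main())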

For more information about usage, read the docs.

Benchmarks

The following table shows how fast - in operations per second - Emcache is compared to two other Memcached Python clients, aiomcache and pymemcache. For this benchmark two nodes were used, one for the client and one for the Memcached server, with 32 TCP connections and 32 concurrent asyncio tasks - threads in the case of pymemcache. For Emcache and aiomcache, uvloop was used as the event loop.

In the first part of the benchmark, the client ran as many set operations as it could; in a second step the same was done using get operations.

Client                  Concurrency  Sets ops/sec  Sets latency AVG  Gets ops/sec  Gets latency AVG
aiomcache               32           33872         0.00094           34183         0.00093
pymemcache              32           32792         0.00097           32961         0.00096
emcache                 32           49410         0.00064           49212         0.00064
emcache (autobatching)  32           49410         0.00064           89052         0.00035

Emcache performed better than the other two implementations, reaching almost 50K ops/sec for get and set operations. Once autobatching is used, it can double the throughput of gets (more about autobatching below).

Another benchmark compared how each implementation behaves when dealing with more than one node. It used different cluster sizes but the same methodology as the previous test: first performing as many set operations as possible, then as many get operations as possible. aiomcache could not be used for this test since it does not support multiple nodes.

Client      Concurrency  Memcached Nodes  Sets ops/sec  Sets latency AVG  Gets ops/sec  Gets latency AVG
pymemcache  32           2                21260         0.00150           21583         0.00148
emcache     32           2                42245         0.00075           48079         0.00066
pymemcache  32           4                15334         0.00208           15458         0.00207
emcache     32           4                39786         0.00080           47603         0.00067
pymemcache  32           8                9903          0.00323           9970          0.00322
emcache     32           8                42167         0.00075           46472         0.00068

Adding nodes caused almost no degradation for Emcache; in the last test with 8 nodes, Emcache reached 42K set ops/sec and 46K get ops/sec. Pymemcache, on the other hand, suffered substantial degradation, making Emcache roughly 5x faster.

Autobatching

Autobatching provides a way of fetching multiple keys using a single command; batching happens transparently behind the scenes without bothering the caller.

To start using the autobatching feature, pass the parameter autobatching as True; all usages of the get and gets commands will then send batched requests behind the scenes.

Gets are piled up until the next loop iteration. Once the next loop iteration is reached, all of them are transmitted using the same Memcached operation.

Autobatching can boost the throughput of your application by 2x-3x.
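
A minimal sketch of enabling autobatching, using the create_client parameter described above; the three concurrent gets below are issued in the same loop iteration, so they should be coalesced into one batched operation behind the scenes:

import asyncio
import emcache

async def main():
    client = await emcache.create_client(
        [emcache.MemcachedHostAddress('localhost', 11211)],
        autobatching=True,
    )

    # issued in the same loop iteration, so batched into a single command
    items = await asyncio.gather(
        client.get(b'key1'),
        client.get(b'key2'),
        client.get(b'key3'),
    )
    print([item.value if item is not None else None for item in items])

    await client.close()

asyncio.run(main())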

Development

Clone the repository and its murmur3 submodule

git clone --recurse-submodules git@github.com:emcache/emcache

Compile murmur3

pushd vendor/murmur3
make static
popd

Install emcache with dev dependencies

make install-dev

Testing

Run the Docker containers and add read/write privileges

docker compose up -d
docker exec memcached_unix1 sh -c "chmod a+rw /tmp/emcache.test1.sock"
docker exec memcached_unix2 sh -c "chmod a+rw /tmp/emcache.test2.sock"

Run tests

make test

emcache's People

Contributors

artemismagilov, chs2, elbaro, jgibo, jhominal, kramar11, lferran, mgorven, nebu1eto, pfreixes, squat, takeda


emcache's Issues

UNIX Socket Support

It would be nice if this library could also connect to memcached via UNIX sockets; that way I do not need it listening on an IP address.

Thanks!

Ryan

[Idea] Support for auto pipelining

Based on the research done by @mcollina here [1], Emcache could also provide an auto pipelining execution mode where gets would be sent and received at each loop iteration, implementing a kind of micro-batching pattern.

The only operation supported for now would be get, which is supported in batched mode by Memcached.

We must create as many batches as there are nodes participating in the cluster; behind the scenes, Emcache will be responsible for grouping the keys into each micro-batch and sending each of them at each loop iteration. A sketch of the pattern is shown below.

Some configuration values that could be exposed to the user:

  • Batch size.
  • Batch timeout.
  • Max batch concurrency per loop iteration.

[1] https://github.com/mcollina/ioredis-auto-pipeline
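
For illustration only, a minimal, hypothetical sketch of the micro-batching pattern described above; none of these names are emcache APIs:

import asyncio

class AutoPipeliner:
    """Hypothetical sketch: coalesce gets issued within one loop iteration."""

    def __init__(self, fetch_many):
        # fetch_many: coroutine taking a list of keys, returning {key: value}
        self._fetch_many = fetch_many
        self._pending = {}
        self._flush_scheduled = False

    def get(self, key):
        loop = asyncio.get_running_loop()
        if key not in self._pending:
            self._pending[key] = loop.create_future()
        if not self._flush_scheduled:
            # flush at the next loop iteration, batching everything piled up
            self._flush_scheduled = True
            loop.call_soon(lambda: loop.create_task(self._flush()))
        return self._pending[key]

    async def _flush(self):
        pending, self._pending = self._pending, {}
        self._flush_scheduled = False
        results = await self._fetch_many(list(pending))
        for key, future in pending.items():
            future.set_result(results.get(key))

Callers would simply do value = await pipeliner.get(b'key'); duplicate keys within one iteration share a future, and the batch size, batch timeout, and concurrency limits would be layered on top.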

unix client raises TimeoutError with client.get("key")

I'm trying to get the value of a non-existent key: I should get None, but I get a TimeoutError.
And sometimes it works and sometimes it doesn't.

@pytest.fixture(
    params=[
        pytest.param(
            [MemcachedHostAddress("localhost", 11211), MemcachedHostAddress("localhost", 11212)], id="tcp_client"
        ),
        pytest.param(
            [MemcachedUnixSocketPath("/tmp/emcache.test1.sock"), MemcachedUnixSocketPath("/tmp/emcache.test2.sock")],
            id="unix_client",
        ),
    ]
)
def node_addresses(request):
    return request.param


@pytest.fixture()
async def client(node_addresses, event_loop):
    client = await create_client(node_addresses, timeout=2.0)
    try:
        yield client
    finally:
        await client.close()

# this code raises the error for the unix client
async def test_available_clients(client):
    assert (await client.get(b"key")) is None

try mmh3?

To use the hash function, we currently compile a custom library, for example for each test. Using a ready-made, supported solution would also simplify the production of wheels: you could try https://pypi.org/project/mmh3/
The only things you need to check:

  1. That the hash function has not become slower (see the sketch below).
  2. That nothing broke after the switch.
  3. That using a ready-made library is really convenient and fits well with this project.
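
For point 1, a quick micro-benchmark sketch, assuming the mmh3 package is installed; the corresponding call into the current custom library would need to be timed the same way for comparison:

import timeit

import mmh3

KEY = b'some:memcached:key'

# time one million murmur3 32-bit hashes via the ready-made package
elapsed = timeit.timeit(lambda: mmh3.hash(KEY), number=1_000_000)
print(f'mmh3.hash: {elapsed:.3f}s for 1M calls')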

unable to install emcache on alpine linux

hi @pfreixes

amazing work on emcache!
I'm setting up a Python project with emcache, but it seems like I can't install emcache on Alpine.
version of emcache: 0.3.0b0

to reproduce (Docker must be installed):

# launch an alpine linux container
docker run -it --rm python:3-alpine /bin/sh
# try installing in the container console
pip install --upgrade -v emcache

error message:

Collecting emcache
ERROR: Could not find a version that satisfies the requirement emcache (from versions: none)
ERROR: No matching distribution found for emcache

while on the other hand, using a different Linux OS works:

docker run -it --rm python:3-slim-buster /bin/sh
pip install --upgrade -v emcache

which gives me:

Installing collected packages: emcache
Successfully installed emcache-0.3.0b0

It seems like the manylinux wheel doesn't work with musl-based distributions like Alpine (reference), and I was wondering if it's a lot of work for you to build a package for Alpine. It's the go-to option for containers, and I believe it would make emcache easier to adopt.

Thanks in advance!

abstract hashing algorithm to cluster initialization

As of version 0.3.0b0, it seems like the hashing algorithm is not configurable (reference).

I think it would help adoption if the hash algorithm could be configured. I'm writing a new application that needs to read from an existing Memcached cluster (written to by a component that uses a CRC hash algorithm). Not being able to configure the hash algorithm basically killed emcache for the application I'm working on, and this can easily become a blocker for other people working on existing applications too.

emcache is very well written, with extreme performance, and I think improvements in compatibility would significantly improve adoption.

migrate setup.py to pyproject.toml

During development, I found that it is very difficult to keep dependency versions and wheel builds up to date: in the pipeline you need to install setuptools every time, and the code seems harder to update and configure than it should be. It would be great to standardize on pyproject.toml, for example using poetry or something else.
[1] - https://packaging.python.org/en/latest/discussions/setup-py-deprecated/#setup-py-deprecated
[2] - https://packaging.python.org/en/latest/guides/modernize-setup-py-project/#modernize-setup-py-project

No manylinux aarch64 wheels

This is required to install emcache in Docker containers on M1 Macs or other ARM64-based Linux machines, like AWS Graviton 3 instances.

async with is not supported

This code:

    async with emcache.create_client([emcache.MemcachedHostAddress('localhost', 11211)]) as client:
        await client.set(b'key', b'value')
        item = await client.get(b'key')
        print(item.value)

The code above raises:

RuntimeWarning: coroutine 'create_client' was never awaited
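
Until native support lands, a minimal workaround sketch using contextlib (emcache_client is a hypothetical helper name):

import contextlib
import emcache

@contextlib.asynccontextmanager
async def emcache_client(*args, **kwargs):
    # await the create_client coroutine first, then manage the client lifetime
    client = await emcache.create_client(*args, **kwargs)
    try:
        yield client
    finally:
        await client.close()

With this helper, the snippet above works by replacing emcache.create_client(...) with emcache_client(...).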

Publish a stable release

Emcache is currently being battle-tested; once the battle testing is done and any critical bugs are fixed, a new stable release will be published.

Support for SSL

Provide support for connecting to Memcached servers that use SSL.

Improve Continuous Integrations

Since I was invited as a maintainer of emcache, I have been thinking about how to improve the CI pipeline. I suggest these tasks to make the release pipeline more stable:

  • macOS: use a strategy matrix over Python versions and parallelize wheel builds.
    • Honestly, building wheel files on macOS is extremely slow, because cibuildwheel builds wheels for each Python version and CPU architecture sequentially.
  • Linux: adopt cibuildwheel and use the same tasks as macOS to build wheel files.
    • Right now, only macOS uses cibuildwheel to build wheels, so the macOS and Linux build steps are different.
    • I hope the build steps can be the same on both OSes.
  • Documentation: it would be good to document how this pipeline works and how to reproduce it.

Is it possible to significantly speed up the parser?

Hi, I benchmarked bench_parser_line:

python bench_parser_line.py

and get result

One line Python total time 1.3429830074310303
One line Cython total time 1.000786542892456
Multi line Cython total time 1.6877360343933105

Having looked at the Cython code, I think the performance difference could become significant with full coverage of C types. I will say right away that I am not an expert in this matter, so I might be wrong. The tests need to be changed a little based on recent changes.

Proposing Haze Lee as a maintainer

I would like to propose @Hazealign as a maintainer. I already gave him write permissions for pushing to the repository, but considering the work done improving the CI, I would like to also give him the maintainer role.

WDYT?

Connection purge is too aggressive

The current connection purge system is too aggressive: it downscales all of the connections that need to be purged in one iteration. To give the user more flexibility, we would need an algorithm that can downscale connections in a smoother way, or at least wait until the next schedule - by default every 60 seconds.

Timeout not working as expected

client = await emcache.create_client(
    [node], autobatching=True, autobatching_max_keys=128, min_connections=0, max_connections=32
)

import time

t = time.time()
try:
    async with asyncio.timeout(3):
        item = await client.gets(key, return_flags=return_flags)
except asyncio.TimeoutError:
    print(time.time() - t, ' seconds elapsed')
    print('timeout mc key: ', key)
    mg = client.cluster_managment()
    print(mg.unhealthy_nodes())
    print(mg.connection_pool_metrics())

The client is supposed to have the default timeout of 1 second; however, the exception is raised after 3 seconds.

3.001142740249634  seconds elapsed
timeout mc key:  b'..'
[]
{MemcachedHostAddress(address='mc276', port=11211): ConnectionPoolMetrics(cur_connections=1, connections_created=1, connections_created_with_error=0, connections_purged=0, connections_closed=0, operations_executed=16, operations_executed_with_error=0, operations_waited=1, create_connection_avg=0.0011269636452198029, create_connection_p50=0.0011269636452198029, create_connection_p99=0.0011269636452198029, create_connection_upper=0.0011269636452198029)}

Also, the connection time is fast and the key is not large. It looks like a deadlock somewhere in emcache.

Drop pytest-mock and use std python mock

There is excessive dependence on pytest-mock. The Python stdlib has unittest.mock, which pytest-mock merely wraps in its mocker fixture; I think this dependency does not in fact add anything new for mock testing. Sometimes you want to build additional fixtures based on unittest.mock, but that would split our tests into two ways of creating mocks; otherwise you need to use mocker, which, beyond wrapping the standard library, does not do anything useful. I do not deny that pytest-mock allows advanced mock testing, but this project only uses the basic functionality. So what is the main motivation behind this issue? I just want to reduce the number of dependencies and keep the code simpler going forward.

Note that pytest-mock is itself based on the mock package.

[bug] handle remove_waiter that is not there

I'm seeing this in logs sometimes:

CancelledError: null
  File "emcache/connection_pool.py", line 367, in __aenter__
    self._connection = await self._waiter

ValueError: deque.remove(x): x not in deque
...

  File "emcache/client.py", line 319, in set
    result = await self._storage_command(b"set", key, value, flags, exptime, noreply)
  File "emcache/client.py", line 114, in _storage_command
    async with node.connection() as connection:
  File "emcache/connection_pool.py", line 369, in __aenter__
    self._connection_pool.remove_waiter(self._waiter)
  File "emcache/connection_pool.py", line 309, in remove_waiter
    self._waiters.remove(waiter)
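
A possible defensive fix, sketched against the traceback above (the real emcache internals may differ):

def remove_waiter(self, waiter):
    # the waiter may already have been handed a connection and removed by
    # the time a cancelled task runs its cleanup, so tolerate its absence
    try:
        self._waiters.remove(waiter)
    except ValueError:
        pass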

autobatching_max_keys typed as bool in create_client

As the title says, in the function create_client the parameter autobatching_max_keys is wrongly typed as bool instead of int (client.py:635).

So mypy shows an error, e.g.
error: Argument "autobatching_max_keys" to "create_client" has incompatible type "int"; expected "bool" [arg-type]
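
Until the annotation is fixed upstream, callers can silence the false positive at the call site, for example:

import asyncio
import emcache

async def main():
    client = await emcache.create_client(
        [emcache.MemcachedHostAddress('localhost', 11211)],
        autobatching=True,
        autobatching_max_keys=128,  # type: ignore[arg-type]  # annotated as bool upstream
    )
    await client.close()

asyncio.run(main())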

support sync client?

I remember being a beginner: it was difficult for me to work with asynchronous code, so I think we need a synchronous client. It is also known that asynchronous code is not thread-safe - correct me if I made a mistake in my interpretation. In any case, this would be a good starting point for a smooth transition from a synchronous to an asynchronous implementation for beginners.
The big problem I see is that there would be twice as many tests and a lot of copy-paste.
This is just an idea and does not claim to be correct.

Python 3.12 binary wheel

Description

Since Python 3.12 stable was released in October, would it be possible to do a minor release that includes prebuilt wheels?

Don't log when connection lost as part of purge of unused connections

I don't think a connection being lost (for example, as unused connections are purged) is necessarily serious enough to emit a WARNING-level log - it's completely routine and expected behavior. As it stands, this log creates unwanted noise, and I currently filter it.

Perhaps it could be INFO or DEBUG level?

logger.warning(f"Connection lost: {exc}")
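
In the meantime, a client-side filter sketch (the emcache logger-name prefix is an assumption):

import logging

class DropConnectionLost(logging.Filter):
    def filter(self, record: logging.LogRecord) -> bool:
        # keep every record except the routine "Connection lost: ..." warning
        return not (record.name.startswith('emcache')
                    and record.getMessage().startswith('Connection lost'))

# attach to the handlers so the filter applies to all emcache child loggers
for handler in logging.getLogger().handlers:
    handler.addFilter(DropConnectionLost())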

Thanks for open sourcing this! Very useful for a side-project of mine

Makefile for local development is not built locally

I can't properly build the library locally and test it using the make command. It would be nice to rewrite the build for local development somehow. I can only get it to build when I push to GitHub Actions, and that's sad.

Key validation disallows allowed characters

It appears that some characters that should be allowed are being disallowed in key validation. For example, ñ (\xd1) fails the check in is_key_valid:

>>> from emcache.client import cyemcache
>>> cyemcache.is_key_valid(b'\xd1')
False

But the memcache protocol documentation seems to indicate that this character should be fine:

Data stored by memcached is identified with the help of a key. A key
is a text string which should uniquely identify the data for clients
that are interested in storing and retrieving it. Currently the
length limit of a key is set at 250 characters (of course, normally
clients wouldn't need to use such long keys); the key must not include
control characters or whitespace

(From https://github.com/memcached/memcached/blob/master/doc/protocol.txt)

ñ is neither a control character nor whitespace, which is what leads me to believe it should be acceptable. Telnetting to an instance of memcached and bypassing emcache successfully allows this as well:

set \xd1 0 100 4
test
STORED
get \xd1
test

Of course, let me know if I'm misunderstanding something here. Also happy to contribute an attempt at a fix if that's welcome!
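
For reference, a validation sketch that matches the quoted protocol text - at most 250 bytes, no whitespace, no control characters - under which b'\xd1' would be valid (illustrative only, not emcache's actual implementation):

def is_key_valid(key: bytes) -> bool:
    # protocol.txt: keys are limited to 250 characters and must not
    # include control characters or whitespace
    if not 0 < len(key) <= 250:
        return False
    # 0x00-0x20 covers the ASCII control characters plus space; 0x7f is DEL
    return all(0x20 < b != 0x7f for b in key)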

the code needs to be restructured

The code needs to be restructured; there is too much copy-paste (DRY principle).

The most common repetition is in some of the client's Memcached commands:

try:
    future = self._loop.create_future()
    parser = cyemcache.AsciiOneLineParser(future)
    self._parser = parser
    self._transport.write(data)
    await future
    result = parser.value()
    return result
finally:
    self._parser = None
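
One way to apply DRY here would be to extract the repeated block into a shared helper, roughly like the sketch below (the method name is hypothetical):

async def _one_line_command(self, data: bytes) -> bytes:
    # shared helper: install the parser, send the request, await the reply
    try:
        future = self._loop.create_future()
        parser = cyemcache.AsciiOneLineParser(future)
        self._parser = parser
        self._transport.write(data)
        await future
        return parser.value()
    finally:
        self._parser = None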

PyPI release for Python 3.10

I'd like to request a release for Python 3.10 on PyPI.
I was able to compile and install the package, at least, and do some light testing on Python 3.10.1 + Linux.

ValueError when using `increment`/`decrement` on a non-numeric value

    client = await emcache.create_client([emcache.MemcachedHostAddress("localhost", 11211)])
    await client.set(b"foo", b"bar")
    try:
        await client.increment(b"foo", 1)
    except ValueError as ex:
        print(ex)

This will print ValueError: invalid literal for int() with base 10: b'CLIENT_ERROR' cannot increment or decrement non-numeric value

I think that, instead of raising an accidental ValueError at that point, we should check for that kind of error and raise (a subclass of) emcache.CommandError instead.

I would be happy to write the PR to test and fix increment and decrement, but I would like your advice on two points:

  1. For that need, should we raise an instance of the emcache.CommandError base class, of an existing subclass (e.g. emcache.StorageCommandError), or create a new subclass?
  2. In order to check for that error condition, would you prefer checking the bytes object (by calling bytes.isdigit) beforehand, or would you rather that the ValueError exception be caught?
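
In the meantime, a hypothetical caller-side wrapper can convert the accidental ValueError; this sketch assumes emcache.CommandError accepts a message argument:

import emcache

async def safe_increment(client, key: bytes, value: int):
    # convert the accidental ValueError until emcache raises a proper error
    try:
        return await client.increment(key, value)
    except ValueError:
        raise emcache.CommandError(
            f'cannot increment or decrement non-numeric value at {key!r}'
        ) from None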

Support for Python 3.9

We would need to provide support for Python 3.9: CI would need to run tests under this version, and a new release would need to be uploaded.

Request to clarify license

The only indication of the license this code is released under is "Apache Software License" and "MIT License" listed in the PyPI metadata in setup.py. Would it be possible to add a LICENSE file with the exact license being used?

Support for meta commands

The new meta commands are finally GA; more info here: https://github.com/memcached/memcached/wiki/MetaCommands

Meta commands come with semantics that make it easier to implement some patterns, like handling the dogpile effect when a miss happens.

Also, they come with an interesting feature that allows adding an opaque value to each mg, md, and ms command, for implementing multiplexing and avoiding the connection stampede problem.

Release next version to support arm64 on macOS

Can we release the next version with support for Apple Silicon?
If the changes are insufficient, I think we can build v0.6.1 again with just the macOS build.

P.S. I'll try to use cibuildwheel for Linux too, which would also bring support for arm64 (aarch64) wheels on Linux.

Thanks. :)

Event processor should not mask exceptions raised by the message queue extraction call

Events are processed in series using a background asyncio task; the task continuously polls a queue and tries to be resilient to generic exceptions raised by the events handler, logging some information.

The current implementation masks exceptions raised by the queue.get call [1]. We must not do that: if any exception happens during this call, we should react to it and stop processing messages.

[1] https://github.com/pfreixes/emcache/blob/master/emcache/cluster.py#L104
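
A sketch of the intended behavior - names approximate, see [1] for the real code - where only the handler call is guarded:

import asyncio
import logging

logger = logging.getLogger(__name__)

async def process_events(queue: asyncio.Queue, on_event) -> None:
    while True:
        # do not mask errors from this call: if the queue itself breaks,
        # the exception should propagate and stop the processing task
        event = await queue.get()
        try:
            await on_event(event)
        except Exception:
            logger.exception('Events handler failed, ignoring event')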

Pipeline release with some commands

It would be useful to implement a pipeline mechanism for sending several different commands in one request to the server. This would significantly improve client performance and eliminate IO-bound waiting after each command. The Redis client has an excellent implementation of this.

For example, I want to send this request:

set city 0 0 9  
Bangalore
get city
set city2 0 0 9  
Bangalore
get city2

After the batch is sent, the responses must be properly processed.
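
For discussion, a hypothetical API sketch; nothing like this exists in emcache today:

# hypothetical pipeline API: queue commands locally, flush in one request
pipe = client.pipeline()
pipe.set(b'city', b'Bangalore')
pipe.get(b'city')
pipe.set(b'city2', b'Bangalore')
pipe.get(b'city2')
results = await pipe.execute()  # single round trip, responses in order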

Unable to install emcache on Windows 11

I'm trying to install emcache on Windows 11, but it returns an error saying no matching distribution was found.

PS C:\> pip install emcache
ERROR: Could not find a version that satisfies the requirement emcache (from versions: none)
ERROR: No matching distribution found for emcache

Am I missing something?
