
redisbloom-py's Introduction


RedisBloom: Probabilistic Data Structures for Redis


Overview

RedisBloom adds a set of probabilistic data structures to Redis, including Bloom filter, Cuckoo filter, Count-min sketch, Top-K, and t-digest. With these structures, you can query streaming data without storing every element of the stream. Each data structure answers a different kind of question:

  • Bloom filter and Cuckoo filter:
    • Did value v already appear in the data stream?
  • Count-min sketch:
    • How many times did value v appear in the data stream?
  • Top-K:
    • What are the k most frequent values in the data stream?
  • t-digest:
    • Which fraction of the values in the data stream are smaller than a given value?
    • How many values in the data stream are smaller than a given value?
    • Below which value do p percent of the values in the data stream fall? (What is the p-percentile value?)
    • What is the mean value between the p1-percentile value and the p2-percentile value?
    • What is the value of the nᵗʰ smallest/largest value in the data stream? (What is the value with [reverse] rank n?)

Answering each of these questions exactly can require a huge amount of memory, but you can lower the memory requirements drastically at the cost of reduced accuracy. Each of these data structures lets you control the trade-off between accuracy and memory consumption. In addition to having a smaller memory footprint, probabilistic data structures are generally much faster than exact algorithms.
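To make the accuracy versus memory trade-off concrete, the sketch below applies the textbook sizing formulas for a classic Bloom filter (bits = -n ln p / (ln 2)^2, hash count = (bits / n) ln 2). The function name bloom_size is made up for this illustration, and RedisBloom's internal sizing may differ from these textbook values.

```python
import math


def bloom_size(capacity, error_rate):
    """Return (bits, hash_count) for a textbook Bloom filter.

    capacity   -- expected number of distinct items (n)
    error_rate -- target false-positive probability (p), 0 < p < 1
    """
    bits = math.ceil(-capacity * math.log(error_rate) / (math.log(2) ** 2))
    hashes = max(1, round(bits / capacity * math.log(2)))
    return bits, hashes


if __name__ == "__main__":
    # A lower error rate costs more memory for the same capacity.
    for p in (0.1, 0.01, 0.001):
        bits, hashes = bloom_size(1_000_000, p)
        print(f"p={p}: {bits / 8 / 1024:.0f} KiB, {hashes} hash functions")
```

Each tenfold reduction in the error rate adds roughly the same number of bits per item, which is why the memory cost grows slowly compared with storing the items themselves.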

RedisBloom is part of Redis Stack.

How do I Redis?

Learn for free at Redis University

Build faster with the Redis Launchpad

Try the Redis Cloud

Dive in developer tutorials

Join the Redis community

Work at Redis

Setup

You can set up RedisBloom either in a Docker container or on your own machine.

Docker

To quickly try out RedisBloom, launch an instance using Docker:

docker run -p 6379:6379 -it --rm redis/redis-stack-server:latest

Build it yourself

You can also build RedisBloom on your own machine. Major Linux distributions as well as macOS are supported.

The first step is to install Redis. The following, for example, builds Redis on a clean Ubuntu Docker image (docker pull ubuntu):

mkdir ~/Redis
cd ~/Redis
apt-get update -y && apt-get upgrade -y
apt-get install -y wget make pkg-config build-essential
wget https://download.redis.io/redis-stable.tar.gz
tar -xzvf redis-stable.tar.gz
cd redis-stable
make distclean
make
make install

Next, you should get the RedisBloom repository from git and build it:

apt-get install -y git
cd ~/Redis
git clone --recursive https://github.com/RedisBloom/RedisBloom.git
cd RedisBloom
./sbin/setup
bash -l
make

Then run exit to leave the login shell started by bash -l.

Note: to get a specific version of RedisBloom, e.g. 2.4.5, add -b v2.4.5 to the git clone command above.

Next, run make run -n and copy the full path of the RedisBloom executable (e.g., /root/Redis/RedisBloom/bin/linux-x64-release/redisbloom.so).

Next, add the RedisBloom module to redis.conf, so that Redis loads it at startup:

apt-get install -y vim
cd ~/Redis/redis-stable
vim redis.conf

Add: loadmodule /root/Redis/RedisBloom/bin/linux-x64-release/redisbloom.so under the MODULES section (use the full path copied above).

Save and exit vim (ESC :wq ENTER)

For more information about modules, go to the Redis official documentation.

Run

Run redis-server in the background and then redis-cli:

cd ~/Redis/redis-stable
redis-server redis.conf &
redis-cli

Give it a try

After you set up RedisBloom, you can interact with it using redis-cli.

Create a new bloom filter by adding a new item:

# 127.0.0.1:6379> BF.ADD newFilter foo
(integer) 1

Find out whether an item exists in the filter:

# 127.0.0.1:6379> BF.EXISTS newFilter foo
(integer) 1

In this case, 1 means that foo is most likely in the set represented by newFilter. But recall that false positives are possible with Bloom filters.

# 127.0.0.1:6379> BF.EXISTS newFilter bar
(integer) 0

A value of 0 means that bar is definitely not in the set. Bloom filters do not allow for false negatives.
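The reply semantics above can be illustrated with a toy Bloom filter in pure Python. This is only a sketch of the general technique, not RedisBloom's implementation: the class name ToyBloomFilter, the SHA-256 based hashing, and the default sizes are all assumptions chosen for readability.

```python
import hashlib


class ToyBloomFilter:
    """Minimal Bloom filter for illustration (not RedisBloom's implementation)."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive num_hashes bit positions by salting the item with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        """Set the item's bits; return 1 if any bit was newly set, else 0."""
        newly_set = 0
        for pos in self._positions(item):
            if not self.bits[pos]:
                self.bits[pos] = 1
                newly_set = 1
        return newly_set

    def exists(self, item):
        """Return 1 if every bit for the item is set (possibly in the set), else 0."""
        return int(all(self.bits[pos] for pos in self._positions(item)))
```

Because add sets every bit that exists later checks, an added item can never be reported as absent. A false positive happens only when other items have already set all of the bits for an item that was never added.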

Client libraries

Project              Language     License              Author                 Package
jedis                Java         MIT                  Redis                  Maven
redis-py             Python       MIT                  Redis                  pypi
node-redis           Node.JS      MIT                  Redis                  npm
nredisstack          .NET         MIT                  Redis                  nuget
redisbloom-go        Go           BSD                  Redis                  GitHub
rueidis              Go           Apache License 2.0   Rueian                 GitHub
rebloom              JavaScript   MIT                  Albert Team            GitHub
phpredis-bloom       PHP          MIT                  Rafa Campoy            GitHub
phpRebloom           PHP          MIT                  Alessandro Balasco     GitHub
vertx-redis-client   Java         Apache License 2.0   Eclipse Vert.x         GitHub
rustis               Rust         MIT                  Dahomey Technologies   GitHub

Documentation

Documentation and full command reference at redisbloom.io.

Mailing List / Forum

Got questions? Feel free to ask at the RedisBloom mailing list.

License

RedisBloom is licensed under the Redis Source Available License 2.0 (RSALv2) or the Server Side Public License v1 (SSPLv1).

redisbloom-py's People

Contributors

631086083, ashtul, avitalfineredis, chayim, dengliming, dvirdukhan, dvora-h, gkorland, guykorlandredis, guyroyse, mnunberg, sachin-kottarathodi


redisbloom-py's Issues

Why does 'cfAddNX' misjudge so many items?

I am testing the Cuckoo filter with a data set of range(1, 1e8):

redis_cuckoo.cfCreate(CK_FID, 100000000, bucket_size=10)
cnt = 0
duplicate_cnt = 0
for i in range(100000000):
    exist = not bool(redis_cuckoo.cfAddNX(CK_FID, i))
    cnt += 1
    if cnt % 10000 == 0:
        print(cnt, duplicate_cnt)
    if exist:
        duplicate_cnt += 1

After nearly 1000k elements were added, the error count was about 22k, which makes no sense to me.
Can anybody give me some guidance? TYVM!

Support for T-Digest data structure

Given RedisBloom/RedisBloom#285 there is the following set of commands we should enable:

  • TDIGEST.CREATE: Allocate a new histogram
  • TDIGEST.RESET: Empty out a histogram and re-initialize it
  • TDIGEST.ADD: Add a value to the t-Digest with the specified count
  • TDIGEST.MERGE: Merge one t-Digest into another
  • TDIGEST.CDF: Returns the fraction of all points added which are ≤ x.
  • TDIGEST.QUANTILE: Returns an estimate of the cutoff such that a specified fraction of the data added to the t-Digest would be less than or equal to the cutoff.
  • TDIGEST.MIN: Get the minimum value from the histogram. Will return DBL_MAX if the histogram is empty
  • TDIGEST.MAX: Get the maximum value from the histogram. Will return DBL_MIN if the histogram is empty
  • TDIGEST.INFO: Returns compression, capacity, total merged and unmerged nodes, the total compressions made up to date on that key, and merged and unmerged weight.

The parameters are described in depth at https://oss.redislabs.com/redisbloom/master/TDigest_Commands/
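For anyone wiring up these commands, it may help to pin down what TDIGEST.CDF and TDIGEST.QUANTILE approximate. The sketch below computes the exact answers on a small sorted list; the function names exact_cdf and exact_quantile are hypothetical, and a real t-digest returns estimates from a compressed summary rather than these exact values.

```python
import bisect
import math


def exact_cdf(sorted_values, x):
    """Exact fraction of values <= x (the quantity TDIGEST.CDF estimates)."""
    return bisect.bisect_right(sorted_values, x) / len(sorted_values)


def exact_quantile(sorted_values, q):
    """Smallest value v such that at least a fraction q of the values are <= v
    (the kind of cutoff TDIGEST.QUANTILE estimates)."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    index = max(0, math.ceil(q * len(sorted_values)) - 1)
    return sorted_values[index]


if __name__ == "__main__":
    values = sorted([5, 1, 9, 3, 7, 2, 8, 4, 10, 6])
    print(exact_cdf(values, 3))         # fraction of values <= 3
    print(exact_quantile(values, 0.5))  # cutoff covering half of the values
```

A client test suite could compare the estimates returned by the server against exact values like these, within a tolerance that depends on the compression setting.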

Enhancement to the redisbloom

@gkorland
Users should be able to create the object once and configure the application later to support it using an init_app() method, as other libraries such as SQLAlchemy and Bcrypt do.

from redisbloom.client import Client

rb = Client()

def create_app():
     app = Flask(__name__)
     rb.init_app(app)
     return app

redisbloom should support configuration of Client object using 'app' configuration.
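One way the requested behaviour could look is sketched below. Everything here is hypothetical: the DeferredClient wrapper, the REDISBLOOM_HOST and REDISBLOOM_PORT config keys, and the idea of passing in a client factory are assumptions for illustration, not an existing redisbloom-py API.

```python
class DeferredClient:
    """Hypothetical wrapper that builds the real client only in init_app()."""

    def __init__(self, client_factory):
        # client_factory is any callable accepting host= and port=,
        # for example redisbloom.client.Client.
        self._client_factory = client_factory
        self._client = None

    def init_app(self, app):
        # REDISBLOOM_HOST and REDISBLOOM_PORT are made-up config keys.
        host = app.config.get("REDISBLOOM_HOST", "localhost")
        port = app.config.get("REDISBLOOM_PORT", 6379)
        self._client = self._client_factory(host=host, port=port)

    def __getattr__(self, name):
        # Only called for attributes not found on the wrapper itself,
        # so every client method is forwarded to the real client.
        if self._client is None:
            raise RuntimeError("call init_app(app) before using the client")
        return getattr(self._client, name)
```

With such a wrapper, the module-level object would be created as rb = DeferredClient(Client), and create_app() would call rb.init_app(app) exactly as in the snippet above.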

Support redis-py 4.1.0

It would be useful to use CMS in cluster-mode to count very large sets. That is only supported in redis-py 4.1.0 or greater.

Add and expose at runtime in a standard format the __version__ attribute

We should be able to expose the package version at runtime via a __version__ attribute while ensuring a single source of truth for the version number.

Here's a discussion on how to standardize this info: https://stackoverflow.com/questions/458550/standard-way-to-embed-version-into-python-package

Example of redistimeseries-py:

The __version__ attribute:
https://github.com/RedisTimeSeries/redistimeseries-py/blob/master/redistimeseries/_version.py

setup.py way of reading it:
https://github.com/RedisTimeSeries/redistimeseries-py/blob/master/setup.py#L8
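A minimal sketch of the single-source approach, assuming the version lives in a _version.py file that contains a line such as __version__ = "x.y.z". The helper name read_version and the file layout are assumptions; the linked redistimeseries-py files may do this differently.

```python
import re


def read_version(version_file):
    """Extract the __version__ string from a file without importing the package."""
    with open(version_file, encoding="utf-8") as handle:
        match = re.search(
            r'^__version__\s*=\s*["\']([^"\']+)["\']',
            handle.read(),
            re.MULTILINE,
        )
    if match is None:
        raise RuntimeError(f"no __version__ found in {version_file}")
    return match.group(1)
```

setup.py could then pass version=read_version("redisbloom/_version.py") to setup(), while the package's __init__.py imports __version__ from the same file, so the number is written in only one place. The path shown is illustrative.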

How to use bfInfo from redisbloom-py

Hello folks. There is a bfInfo function inside the Client class from redisbloom, but whenever I try to use it I get results like <redisbloom.client.BFInfo object at 0x7ff5e00d3290>.
Can you please give me an example of how to print the entire information from the real BF.INFO command using that function?
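The output above is Python's default representation of an object, so the data is most likely stored as attributes on it. A generic way to look inside, sketched below, is the built-in vars(). The stand-in class and its attribute names (capacity, size) are assumptions for illustration; check the actual BFInfo class for the real names.

```python
def describe(info):
    """Return the public instance attributes of an object as a dict."""
    return {
        name: value
        for name, value in vars(info).items()
        if not name.startswith("_")
    }


class FakeInfo:
    """Stand-in for the object returned by bfInfo (attribute names are assumed)."""

    def __init__(self):
        self.capacity = 1000
        self.size = 1576


if __name__ == "__main__":
    print(describe(FakeInfo()))
```

With a real client this would be called as print(describe(rb.bfInfo('bloom'))). Note that vars() only works if the object stores its fields as ordinary instance attributes.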

Unknown command `BF.RESERVE`

I am getting the error:

redis.exceptions.ResponseError: unknown command `BF.RESERVE`, with args beginning with: `bloom`, `0.01`, `1000`,

This happens when running the #Using Bloom Filter example from the documentation - I get the same error with other scripts as well.

Redis Server is up and receives the request:

57121:M 12 Sep 2020 11:08:12.964 - Accepted ::1:53467
57121:M 12 Sep 2020 11:08:12.970 - Client closed connection

Full stack trace:

Traceback (most recent call last):
  File "example.py", line 5, in <module>
    rb.bfCreate('bloom', 0.01, 1000)
  File "/Users/henrik/tmp/bloom/venv/lib/python3.7/site-packages/redisbloom/client.py", line 242, in bfCreate
    return self.execute_command(self.BF_RESERVE, *params)
  File "/Users/henrik/tmp/bloom/venv/lib/python3.7/site-packages/redis/client.py", line 901, in execute_command
    return self.parse_response(conn, command_name, **options)
  File "/Users/henrik/tmp/bloom/venv/lib/python3.7/site-packages/redis/client.py", line 915, in parse_response
    response = connection.read_response()
  File "/Users/henrik/tmp/bloom/venv/lib/python3.7/site-packages/redis/connection.py", line 756, in read_response
    raise response
redis.exceptions.ResponseError: unknown command `BF.RESERVE`, with args beginning with: `bloom`, `0.01`, `1000`,

pip freeze:

hiredis==1.1.0
redis==3.5.3
redisbloom==0.4.0
rmtest==0.7.0
six==1.15.0

Python 3.7.3.
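One plausible cause of an unknown command error is that the server was started without the RedisBloom module loaded. The sketch below checks a MODULE LIST reply for it. Two assumptions are labeled in the code: that the client parses the reply into a list of dicts with a name key, and that the module registers itself under the name bf.

```python
def bloom_module_loaded(module_list):
    """Return True if a parsed MODULE LIST reply contains the RedisBloom module.

    Assumes each entry is a dict with a 'name' key (str or bytes) and that
    RedisBloom registers under the module name 'bf'.
    """
    for module in module_list:
        name = module.get("name") or module.get(b"name")
        if isinstance(name, bytes):
            name = name.decode()
        if name == "bf":
            return True
    return False
```

With a connected client, the reply could be obtained with rb.execute_command("MODULE LIST") and passed to this function before calling bfCreate. If it returns False, the module still needs to be loaded, for example with the loadmodule line described in the setup section.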

"invalid offset - no link found" on bfLoadChunk (Python 3.7.3)

When I try to dump and restore a Bloom filter using bfScandump and bfLoadChunk I get the following error:

>>> from redisbloom.client import Client
>>> r = Client()
>>> i, d = r.bfScandump('bloom', 0)
>>> r.bfLoadChunk('bloom', 0, d)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/dist-packages/redisbloom/client.py", line 257, in bfLoadChunk
    return self.execute_command(self.BF_LOADCHUNK, *params)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 775, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/usr/local/lib/python3.7/dist-packages/redis/client.py", line 789, in parse_response
    response = connection.read_response()
  File "/usr/local/lib/python3.7/dist-packages/redis/connection.py", line 642, in read_response
    raise response
redis.exceptions.ResponseError: invalid offset - no link found
>>>

'bloom' is an existing Bloom filter.

I'm using Python 3.7.3 (2.7.16 gives the same error)
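In the session above, bfLoadChunk is called with offset 0 rather than the iterator i that bfScandump returned. The RedisBloom command reference describes dumping in a loop until the iterator comes back as 0, then replaying each chunk with the iterator it was returned with. A sketch of that pattern follows; the helper names dump_filter and load_filter are made up, and the client methods are the ones used in this report.

```python
def dump_filter(client, key):
    """Collect every (iterator, data) chunk of a Bloom filter."""
    chunks = []
    cursor = 0
    while True:
        cursor, data = client.bfScandump(key, cursor)
        if cursor == 0:
            # An iterator of 0 signals that the dump is complete.
            break
        chunks.append((cursor, data))
    return chunks


def load_filter(client, key, chunks):
    """Replay chunks in order, passing each one the iterator it came with."""
    for cursor, data in chunks:
        client.bfLoadChunk(key, cursor, data)
```

Usage would look like chunks = dump_filter(r, 'bloom') followed by load_filter(r, 'bloom_copy', chunks), where bloom_copy is an illustrative destination key.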
