Git Product home page Git Product logo

sewiahho / bloompy Goto Github PK

View Code? Open in Web Editor NEW

This project forked from kingking888/bloompy

0.0 0.0 0.0 77 KB

An implementation of 4 kinds of Bloom Filter in Python3.Includes the standard BloomFilter,CountingBloomFilter,ScalableBloomFilter,ScalableCountingBloomFilter.Update from pybloom.(4种布隆过滤器的Python实现(标准、计数、标准扩容、计数扩容)

License: BSD 3-Clause "New" or "Revised" License

Python 100.00%

bloompy's Introduction

An implementation of 4 kinds of Bloom Filter in Python3.中文

bloompy includes the standard BloomFilter,CountingBloomFilter,ScalableBloomFilter,ScalableCountingBloomFilter. It's Update from pybloom.

Installation

pip install bloompy

Usage

There's 4 kinds of BloomFilter you can use by bloompy.

  • standard bloom filter

the standard one can only query or add elements in it.

>>> import bloompy
>>> bf = bloompy.BloomFilter(error_rate=0.001,element_num=10**3)

# query the status of the element inside the bf immediately 
# and add it into the bf.False returned indicates the element
# does not inside the filter.
>>> bf.add(1) 
False
>>> bf.add(1)
True
>>> 1 in bf
True
>>> bf.exists(1)
True
>>> bf.add([1,2,3])
False
>>> bf.add([1,2,3])
True
>>> [1,2,3] in bf
True
>>> bf.exists([1,2,3])
True

# store the bf into a file.
>>> bf.tofile('filename.suffix')

# recover a bf from a file.Auto recognize which kind of filters it is.
>>> recovered_bf = bloompy.get_filter_fromfile('filename.suffix')

# or you can use a classmethod 'fromfile' of the BloomFilter Class to get
# a BloomFilter instance from a file.Same as other kind of filter Classes .
>>> recovered_bf = bloompy.BloomFilter.fromfile('filename.suffix')

# return the total number of the elements inside the bf.
>>> bf.count
2

# the capacity of the current filter.
>>> bf.capacity
1000

# the bits array of the current filter. 
>>> bf.bit_array
bitarray('00....')

# the total length of the bitarray.
>>> bf.bit_num
14400

# the hash seeds inside the filter.
# they are prime numbers by default.It's modifiable.
>>> bf.seeds
[2, 3, 5, 7, 11,...]

# the amount of hash functions 
>>> bf.hash_num
10
  • counting bloom filter

The counting bloom filter is a subclass of the standard bloom filter.But it supports the delete operation. It is set inside that 4bits represent a bit of the standard bf. So it costs more momery than the standard bf, it's 4 times the standard one.

>>> import  bloompy
>>> cbf  = bloompy.CountingBloomFilter(error_rate=0.001,element_num=10**3)

# same as the standard bf at add operation.
>>> cbf.add(12)
False
>>> cbf.add(12)
True
>>> 12 in cbf
True
>>> cbf.count
1

# query the status of the element inside the cbf immediately 
# if the element inside the cbf,delete it.
>>> cbf.delete(12)
True
>>> cbf.delete(12)
False
>>> 12 in cbf
False
>>> cbf.count
0

# recover a cbf from a file.Same as the bf.
>>> recovered_cbf = bloompy.CountingBloomFilter.fromfile('filename.suffix')

You can do any operations of the BloomFilter on it as well.

  • scalable bloom filter

Auto increase the capacity of the filter if the current amount of inserted elements is up to the limits. It's set 2times the pre capacity inside by default.

>>> import bloompy
>>> sbf = bloompy.ScalableBloomFilter(error_rate=0.001,initial_capacity=10**3)

# at first, the sbf is at 1000 capacity limits.
>>> len(sbf)
0
>>> 12 in sbf
False
>>> sbf.add(12)
False
>>> 12 in sbf 
True
>>> len(sbf)
1
>>> sbf.filters
[<bloompy.BloomFilter object at 0x000000000B6F5860>]
>>> sbf.capacity
1000

# when the amount of inserting elements surpass the limits 1000.
# the sbf appends a new filter inside it which capacity 2times 1000.
>>> for i in range(1000):
        sbf.add(i)
>>> 600 in sbf
True
>>> len(sbf)
2
>>> sbf.filters
[<bloompy.BloomFilter object at 0x000000000B6F5860>, <bloompy.BloomFilter object at 0x000000000B32F748>]
>>> sbf.capacity
3000

# recover a sbf from a file.Same as bf.
>>> recovered_sbf = bloompy.ScalableBloomFilter.fromfile('filename.suffix')

You can do any operations of the BloomFilter on it as well.

  • scalable counting bloom filter

It's a subclass of the ScalableBloomFilter.But it supports the delete operation. You can do any operations of the ScalableBloomFilter on it as well.

>>> import bloompy
>>> scbf = bloompy.SCBloomFilter(error_rate=0.001,initial_capacity=10**3)

>>> scbf.add(1)
False
>>> 1 in scbf
True
>>> scbf.delete(1)
True
>>> 1 in scbf
False
>>> len(scbf)
1
>>> scbf.filters
[<bloompy.CountingBloomFilter object at 0x000000000B6F5828>]

# add elements in sbf to make it at a capacity limits
>>> for i in range(1100):
        scbf.add(i)
>>> len(scbf)
2
>>> scbf.filters
[<bloompy.CountingBloomFilter object at 0x000000000B6F5828>, <bloompy.CountingBloomFilter object at 0x000000000B6F5898>]

# recover a scbf from a file.Same as bf.
>>> recovered_scbf = bloompy.SCBloomFilter.fromfile('filename.suffix')

Store and recover

As shown in the standard bloom filter.You can store a filter in 2 ways:

  • classmethod 'fromfile'
  • get_filter_fromfile()

if you do clearly know that there is a BloomFilter stored in a file. you can recover it with:

bloompy.BloomeFilter.fromfile('filename.suffix')

or it's a CountingBloomFilter inside it:

bloompy.CountingBloomFilter.fromfile('filename.suffix')

Same as others.

But if you don't know what kind of filter it is stored in the file.Use:

bloompy.get_filter_fromfile('filename.suffix')

It will auto recognize the filter stored inside a file.Then you can do something with it.

bloompy's People

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.