
bloomfilterredis's Introduction

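A Redis-based Bloom filter

Introduction

  • BloomFilterRedis: an extensible Bloom filter built on a Redis Bitmap as its bit array. The bit array defaults to a length of 2^23, and eight hash functions are used by default.
  • orange: a Scrapy project that crawls Baidu Baike starting from the entry "橘子水" (orange juice), configured with a duplicate filter based on BloomFilterRedis.

For more on Bitmap and other background, see my blog post "Implementing a Redis-based Bloom filter".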
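
To give a feel for what the defaults above imply, here is a small sketch that computes the bitmap's memory footprint and the textbook false-positive estimate (1 - e^(-kn/m))^k for m = 2^23 bits and k = 8 hash functions. The function names are hypothetical and not part of BloomFilterRedis, and the estimate assumes independent, uniformly distributed hash functions, which the project's hash functions may only approximate.

```python
import math


def bitmap_bytes(bit_size=1 << 23):
    """Memory used by a bit array of bit_size bits, in bytes."""
    return bit_size // 8


def false_positive_rate(item_count, bit_size=1 << 23, hash_count=8):
    """Textbook approximation (1 - e^(-k*n/m))^k of the false-positive rate.

    Assumes the k hash functions are independent and uniform (an idealization).
    """
    exponent = -float(hash_count) * item_count / bit_size
    return (1.0 - math.exp(exponent)) ** hash_count


if __name__ == "__main__":
    print("bitmap size in bytes:", bitmap_bytes())
    for n in (100000, 500000, 1000000):
        print(n, "items ->", false_positive_rate(n))
```

The default bitmap occupies 1 MiB, and the estimated false-positive rate grows with the number of stored items, which is the usual reason to scale out to more bits or more bitmaps.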

Development environment

  • Python 2.7.12
  • Redis 3.2.8
  • python-redis
  • Scrapy 1.3.3

Usage

from BloomFilterRedis import BloomFilterRedis

bloomFilterRedis = BloomFilterRedis("bloom")
bloomFilterRedis.do_filter("one item to check")
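
The internals of do_filter are not shown here. As a rough illustration only, a check-and-record step against a Redis Bitmap could look like the sketch below. SketchBloomFilter, check_and_add, and FakeBitmapClient are hypothetical names, the salted SHA-256 hashing is a substitute for the project's own hash functions, and the return convention (True means "probably seen before") is an assumption. The only client calls used are getbit and setbit, which redis-py provides, so a redis.StrictRedis instance could be passed in place of the fake client.

```python
import hashlib


class FakeBitmapClient(object):
    """In-memory stand-in offering the getbit/setbit calls used below."""

    def __init__(self):
        self.bits = {}

    def getbit(self, name, offset):
        return self.bits.get((name, offset), 0)

    def setbit(self, name, offset, value):
        previous = self.getbit(name, offset)
        self.bits[(name, offset)] = value
        return previous


class SketchBloomFilter(object):
    """Hypothetical Bloom filter over a Redis-style bitmap client."""

    def __init__(self, client, key="bloom", bit_size=1 << 23, hash_count=8):
        self.client = client
        self.key = key
        self.bit_size = bit_size
        self.hash_count = hash_count

    def _offsets(self, item):
        # Assumption: salted SHA-256 stands in for the eight named hashes.
        data = item.encode("utf-8")
        for seed in range(self.hash_count):
            digest = hashlib.sha256(str(seed).encode("ascii") + data).hexdigest()
            yield int(digest, 16) % self.bit_size

    def check_and_add(self, item):
        """Return True if every bit was already set, then set all the bits."""
        offsets = list(self._offsets(item))
        seen = all(self.client.getbit(self.key, o) for o in offsets)
        for offset in offsets:
            self.client.setbit(self.key, offset, 1)
        return seen


bloom = SketchBloomFilter(FakeBitmapClient())
first = bloom.check_and_add("one item to check")   # False: no bits set yet
second = bloom.check_and_add("one item to check")  # True: all bits now set
```

Reading and writing each bit with a separate call keeps the sketch simple; a real deployment would likely batch the calls in a pipeline to cut round trips.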

Usage in Scrapy

  1. Copy BloomFilterRedis into the project folder, and copy BloomRedisDupeFilter.py into the same directory as settings.py.
  2. Configure the following fields in settings.py:
# Use the Redis-based Bloom filter as the duplicate filter
DUPEFILTER_CLASS = 'orange.BloomRedisDupeFilter.BloomRedisDupeFilter'
# Key of the bitmap in Redis; defaults to 'bloom'
# BLOOM_REDIS_KEY = 'bloom'
# Redis connection settings; default to the local machine
# BLOOM_REDIS_HOST = '127.0.0.1'
# BLOOM_REDIS_PORT = 6379
# List of hash functions for the Bloom filter; defaults to these eight, defined in GeneralHashFunctions
# BLOOM_HASH_LIST = ["rs_hash", "js_hash", "pjw_hash", "elf_hash", "bkdr_hash", "sdbm_hash", "djb_hash", "dek_hash"]
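
As a sketch of how the optional fields above fall back to their defaults, the helper below reads them from any settings object that offers get(name, default), which both a plain dict and Scrapy's Settings do. The name read_bloom_settings and the returned dictionary layout are hypothetical; how BloomRedisDupeFilter actually reads its settings is not shown here.

```python
DEFAULT_HASH_LIST = [
    "rs_hash", "js_hash", "pjw_hash", "elf_hash",
    "bkdr_hash", "sdbm_hash", "djb_hash", "dek_hash",
]


def read_bloom_settings(settings):
    """Collect the BLOOM_* fields, applying the documented defaults.

    `settings` only needs a dict-style get(name, default) method.
    """
    return {
        "key": settings.get("BLOOM_REDIS_KEY", "bloom"),
        "host": settings.get("BLOOM_REDIS_HOST", "127.0.0.1"),
        # Coerce to int so a port given as a string still works.
        "port": int(settings.get("BLOOM_REDIS_PORT", 6379)),
        "hash_list": list(settings.get("BLOOM_HASH_LIST", DEFAULT_HASH_LIST)),
    }


defaults = read_bloom_settings({})
custom = read_bloom_settings({"BLOOM_REDIS_KEY": "crawl:seen",
                              "BLOOM_REDIS_PORT": "6380"})
```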

bloomfilterredis's People

Contributors

kongtianyi


bloomfilterredis's Issues

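The hashes produce negative numbers, which Redis rejects as an offset

Hi Tianyi,

I wanted to port your Python implementation to C#, and found that some of the hash values come out negative, which Redis does not accept as an offset.
Simply applying abs does not solve it either, since that increases the probability of collisions. Have you run into this problem?

Thanks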
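
One common way to handle this in a port, sketched here under the assumption that the hash functions yield signed 32-bit integers (the hash code itself is not shown in this document), is to reinterpret the value as unsigned with a bit mask before reducing it modulo the bitmap length. Unlike abs, the mask maps every distinct 32-bit value to a distinct unsigned value, so it adds no collisions beyond those of the modulo itself. The name to_offset is hypothetical.

```python
BIT_SIZE = 1 << 23  # default bitmap length mentioned in the introduction


def to_offset(hash_value, bit_size=BIT_SIZE):
    """Map a possibly negative 32-bit hash to a valid bitmap offset."""
    # Reinterpret the signed 32-bit value as unsigned (two's complement).
    unsigned = hash_value & 0xFFFFFFFF
    # Reduce into the range [0, bit_size).
    return unsigned % bit_size


# abs() folds -5 and 5 onto the same offset; masking keeps them apart.
offset_negative = to_offset(-5)
offset_positive = to_offset(5)
```

In C#, the equivalent of the mask is an unchecked cast to uint before taking the remainder.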
