
mailfilter's Introduction

# mailFilter V1.2

How it works

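A spam filter based on Bayesian inference. The filter is "trained" on 8000 normal mails and 8000 spam mails: every mail is parsed, every word is extracted, and the frequency with which each word occurs in normal mail and in spam is computed.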
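The training step described above could be sketched as follows. This is a minimal sketch, not the project's actual code: the names `tokenize` and `word_frequencies`, the regex tokenizer, and the toy corpora are assumptions made for illustration, and "frequency" is taken here to mean the fraction of mails that contain a word.

```python
import re
from collections import Counter


def tokenize(text):
    """Return the set of lowercase word tokens in a mail body.

    The regex is an assumption; the project may tokenize differently.
    """
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def word_frequencies(mails):
    """Return {word: fraction of mails that contain the word}."""
    if not mails:
        return {}
    counts = Counter()
    for body in mails:
        counts.update(tokenize(body))  # each word counted once per mail
    total = float(len(mails))
    return {word: n / total for word, n in counts.items()}


# Tiny stand-in corpora; the filter itself is trained on 8000 mails of each kind.
spam_mails = ["win money now", "cheap money offer"]
normal_mails = ["meeting at noon", "send me the money report"]

spam_freq = word_frequencies(spam_mails)      # estimates p(w|s)
normal_freq = word_frequencies(normal_mails)  # estimates p(w|n)
```

With these toy corpora, "money" appears in every spam mail and in half of the normal mails, so it leans toward spam without being decisive on its own.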

  1. When an unknown mail arrives and nothing else is known about it, assume it is equally likely to be spam or normal mail: p(s) = p(n) = 50%.

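  2. Parse the mail, extract each word, and compute p(s|w) for that word, that is, the probability that the mail is spam given that it contains the word: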

     $$
     p(s \mid w) = \frac{p(s, w)}{p(w)} = \frac{p(w \mid s)\,p(s)}{p(s)\,p(w \mid s) + p(n)\,p(w \mid n)}
     $$
    
  3. Take the 15 words in the mail with the highest p(s|w) and compute their combined probability:

     $$
     p = \frac{p(s \mid w_1)\,p(s \mid w_2) \cdots p(s \mid w_{15})}{p(s \mid w_1)\,p(s \mid w_2) \cdots p(s \mid w_{15}) + (1 - p(s \mid w_1))(1 - p(s \mid w_2)) \cdots (1 - p(s \mid w_{15}))}
     $$
    
  4. Apply the threshold: p > 0.9 means spam;
    p < 0.9 means normal mail.
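The four steps above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions rather than the project's own code: the function names `word_spam_probability`, `combined_probability`, and `is_spam` are hypothetical, and the inputs are assumed to be probabilities strictly between 0 and 1 so that no denominator is zero.

```python
from functools import reduce
from operator import mul


def word_spam_probability(p_w_given_s, p_w_given_n, p_s=0.5):
    """Step 2: Bayes' rule for p(s|w), with p(n) = 1 - p(s) (step 1 prior)."""
    p_n = 1.0 - p_s
    numerator = p_w_given_s * p_s
    return numerator / (numerator + p_w_given_n * p_n)


def combined_probability(word_probs, top_n=15):
    """Step 3: combine the top_n highest p(s|w) values into one probability."""
    top = sorted(word_probs, reverse=True)[:top_n]
    spam = reduce(mul, top, 1.0)                      # product of p(s|w_i)
    normal = reduce(mul, [1.0 - p for p in top], 1.0)  # product of 1 - p(s|w_i)
    return spam / (spam + normal)


def is_spam(p, threshold=0.9):
    """Step 4: classify as spam when p exceeds the threshold."""
    return p > threshold


# A word seen in 5% of spam mails and 0.5% of normal mails:
p_word = word_spam_probability(0.05, 0.005)  # about 0.909
p_mail = combined_probability([0.9, 0.8])    # 0.72 / 0.74, about 0.973
print(is_spam(p_mail))
```

Note how two moderately suspicious words (0.9 and 0.8) combine into a probability higher than either one alone, which is why a handful of strong indicators is enough to cross the 0.9 threshold.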

Note: if a newly received mail contains a word that has not yet appeared in the training corpus, assume p(s|w) = 0.4 for that word.
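One way to apply this fallback is a dictionary lookup with a default value. The sketch below is only an illustration: `spam_table` and `lookup_spam_probabilities` are hypothetical names, and the table values are made up for the example.

```python
UNSEEN_WORD_PROBABILITY = 0.4  # fallback p(s|w) for words not seen in training


def lookup_spam_probabilities(words, spam_table):
    """Return p(s|w) for each word, using 0.4 for words missing from the table."""
    return [spam_table.get(word, UNSEEN_WORD_PROBABILITY) for word in words]


# Hypothetical table of p(s|w) values produced by training.
spam_table = {"money": 0.91, "meeting": 0.05}

probs = lookup_spam_probabilities(["money", "blockchain", "meeting"], spam_table)
print(probs)  # [0.91, 0.4, 0.05]
```

Because 0.4 is slightly below the 50% prior, an unseen word nudges the mail toward "normal" rather than toward spam.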

Usage

  1. Extract data.rar into the current folder.

  2. Open a terminal to simulate the mail server:

     cd mailFilter
     python server.py
    
  3. Once "Waiting for clients..." appears, open a second terminal to simulate the mail sender:

     cd mailFilter
     python client.py emaillocation
    

Note: use Python 2.7.

References

http://www.ruanyifeng.com/blog/2011/08/bayesian_inference_part_two.html
http://en.wikipedia.org/wiki/Bayesian_spam_filtering

mailfilter's People

Contributors

lvwangbeta
