
naive-bayes-classifier's Introduction

**Data:** Sogou text classification corpus

**Classifier:** Naive Bayes classifier, NBC (Naive Bayesian Classifier)

**Programming language:** Python + jieba word segmentation library + nltk + sklearn

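Improvements:

1. Noisy content should be removed from each text as it is processed, which among other things reduces memory usage.  
2. If a dictionary is available beforehand, text features can be extracted directly.  
3. When no dictionary is available, one should be constructed, or even learned from a large number of samples. Since there was no dictionary (dict) beforehand, the word segmentation results of all documents are put into a single dictionary and sorted by word frequency from high to low. Because noise such as punctuation and meaningless numbers was not removed when each document was processed, the experiment drops progressively more of the highest-frequency entries while building the final dictionary (fixed at 1000 words) and observes how the accuracy changes.  
4. Choice of feature dimensionality: it is fixed at 1000 dimensions here; accuracy could also be measured as a function of the number of dimensions.  
5. Special note: because the classifier is Naive Bayes, the text features are [TRUE, FALSE, ...], where each entry indicates whether the text contains the corresponding dictionary word, for use in p(feature_i | C_k) = ... If an SVM were used instead, the features should be term frequencies, TF-IDF, or similar.  
6. Either nltk or sklearn can be used, but note that they expect different feature formats: nltk requires features as a dict, while sklearn requires features as a list.  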
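
Points 3, 5 and 6 can be illustrated with a minimal sketch. It is not the repository's actual code: the function names (`build_vocabulary`, `features_as_list`, `features_as_dict`), the `drop_top_n` parameter, and the toy English tokens are all assumptions made for illustration. Real input would be word lists produced by jieba from the Sogou corpus.

```python
from collections import Counter


def build_vocabulary(tokenized_docs, drop_top_n=0, size=1000):
    """Rank words by corpus frequency, skip the drop_top_n most frequent
    entries, and keep the next `size` words as the dictionary."""
    counts = Counter(word for doc in tokenized_docs for word in doc)
    ranked = [word for word, _ in counts.most_common()]
    return ranked[drop_top_n:drop_top_n + size]


def features_as_list(tokens, vocabulary):
    """Boolean presence vector, one entry per dictionary word (list form)."""
    present = set(tokens)
    return [word in present for word in vocabulary]


def features_as_dict(tokens, vocabulary):
    """The same presence features keyed by word (dict form)."""
    present = set(tokens)
    return {word: word in present for word in vocabulary}


# Hypothetical, already-segmented documents; "the" plays the role of a
# high-frequency noise token that carries no class information.
docs = [
    ["the", "football", "match", "the", "team"],
    ["the", "stock", "market", "the", "fund"],
    ["the", "football", "player"],
]

# Drop the single most frequent entry, then keep 3 words (1000 in the README).
vocab = build_vocabulary(docs, drop_top_n=1, size=3)

list_features = [features_as_list(doc, vocab) for doc in docs]
dict_features = [features_as_dict(doc, vocab) for doc in docs]
```

The dict form is the shape expected when pairing each feature set with a label for `nltk.NaiveBayesClassifier.train`, while the list form can be stacked into the `X` argument of an sklearn estimator's `fit(X, y)`. Rerunning the experiment with increasing `drop_top_n` and a varying `size` would be one way to observe how accuracy changes, as items 3 and 4 suggest.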

naive-bayes-classifier's People

Contributors

lining0806


