observerss / textfilter
Several implementations of sensitive-word filtering, plus a sensitive-word list of about 10,000 words
It's very short, but I think it's quite useful, so I set up a separate project to back it up.
USAGE:
>>> f = DFAFilter()
>>> f.add("sexy")
>>> f.filter("hello sexy baby")
hello **** baby
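The USAGE snippet implies a dictionary-trie ("DFA") filter. A minimal sketch of that idea, assuming case-sensitive matching and longest-match masking; the class name `TrieFilter` is hypothetical and this is not the repository's `DFAFilter` code:

```python
class TrieFilter:
    """Minimal trie (DFA-style) keyword filter; a sketch, not the repo's DFAFilter."""

    _END = object()  # sentinel key marking "a complete keyword ends here"

    def __init__(self) -> None:
        self.root: dict = {}

    def add(self, keyword: str) -> None:
        keyword = keyword.strip()
        if not keyword:
            return
        node = self.root
        for ch in keyword:
            node = node.setdefault(ch, {})  # create the child state if missing
        node[self._END] = True

    def filter(self, message: str, repl: str = "*") -> str:
        out = []
        i = 0
        while i < len(message):
            node = self.root
            match_len = 0
            j = i
            # Walk the trie as far as the text allows, remembering the longest
            # position where a complete keyword ended.
            while j < len(message) and message[j] in node:
                node = node[message[j]]
                j += 1
                if self._END in node:
                    match_len = j - i
            if match_len:
                out.append(repl * match_len)  # mask the whole matched keyword
                i += match_len
            else:
                out.append(message[i])  # no complete keyword starts here
                i += 1
        return "".join(out)


f = TrieFilter()
f.add("sexy")
print(f.filter("hello sexy baby"))  # hello **** baby
```

Because a span is masked only after reaching a terminal node, a text that merely shares a prefix with a keyword is left unchanged.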
1. print -> print()
2. Warnings about unicode
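The difference between strings in py3 and py2 comes down to how three data types are handled.
The py2 way means that strings and byte streams are the same thing, while unicode strings are a distinct type: bytes == str, unicode != str.
The py3 way means that strings and unicode strings are the same thing, while byte streams are a distinct type: unicode == str, bytes != str.
So what is unicode? It is a byte stream in a particular encoding, a subset of bytes. This means that all unicode can be put into bytes, but some bytes cannot be put into unicode. What C programmers find hardest to accept is being unable to put a byte stream into a string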
In Python 3, str is Unicode, so str should be used everywhere the program interacts with people.
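Author: pansz
Link: https://www.zhihu.com/question/60231684/answer/173871080
Source: Zhihu
Copyright belongs to the author. For commercial reprints, contact the author for permission; for non-commercial reprints, please credit the source.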
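A small Python 3 sketch of the distinction described above: `str` and `bytes` are separate types converted explicitly with `encode`/`decode`, and not every byte sequence decodes as UTF-8. The sample string is an arbitrary choice:

```python
# Python 3: str holds Unicode text, bytes holds raw bytes; convert explicitly.
text = "敏感词"                 # str: three Unicode characters
data = text.encode("utf-8")     # bytes: their UTF-8 encoding

print(type(text).__name__, len(text))   # str 3
print(type(data).__name__, len(data))   # bytes 9 (each of these characters is 3 bytes in UTF-8)

# Decoding the same bytes with the same codec recovers the original str.
assert data.decode("utf-8") == text

# Not every byte sequence is valid UTF-8, so some bytes cannot become a str.
try:
    b"\xff\xfe".decode("utf-8")
except UnicodeDecodeError as exc:
    print("cannot decode:", exc.reason)

# Python 3 refuses to mix the two types implicitly.
try:
    _ = text + data
except TypeError:
    print("str + bytes raises TypeError")
```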
gfw = DFAFilter()
gfw.parse("keywords2")  # keywords2 contains the sensitive word: 1989年
print(gfw.filter("1989", "*"))
Filtered result:
989
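The expected behavior in this report is that "1989" passes through unchanged, since only the full keyword "1989年" is banned. A minimal brute-force sketch (not the repository's DFA code; the function name `mask_keywords` is hypothetical) that masks only complete keyword matches, trying the longest candidate first at each position:

```python
from typing import Iterable


def mask_keywords(message: str, keywords: Iterable[str], repl: str = "*") -> str:
    """Mask only complete keyword matches, preferring the longest one at each position."""
    words = {k for k in keywords if k}                # ignore empty entries
    max_len = max((len(k) for k in words), default=0)
    out = []
    i = 0
    while i < len(message):
        # Try lengths from longest to shortest; a bare prefix such as "1989"
        # is never treated as a match for the keyword "1989年".
        for n in range(min(max_len, len(message) - i), 0, -1):
            if message[i:i + n] in words:
                out.append(repl * n)
                i += n
                break
        else:
            # No keyword starts here: keep the original character.
            out.append(message[i])
            i += 1
    return "".join(out)


print(mask_keywords("1989", {"1989年"}))     # 1989
print(mask_keywords("1989年", {"1989年"}))   # *****
```

This costs one set lookup per candidate length at every position, so a trie scales better for large lists, but it is easy to verify by hand and can serve as a reference when testing a faster filter.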
Is “然后” (“then”) also a banned word?
Awesome, awesome!
When I run the following code:
from filter import DFAFilter
it shows:
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: No module named filter
If I run this in the terminal:
pip install filter
it then says:
Could not find a version that satisfies the requirement filter (from versions: )
No matching distribution found for filter
Could someone tell me how to run it? Thanks!
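The traceback means Python cannot find a module named `filter` on its import path, and the pip error shows that no package by that name was found. Assuming the repository ships a `filter.py` (as the import implies), one option is to start Python from the cloned repository directory so the file is importable; another is to load it by path. A sketch of the latter using the standard `importlib.util` API; the helper name `load_module_from_path` and the path `textfilter/filter.py` are hypothetical. If the file still uses Python 2 syntax such as the `print` statement, Python 3 will fail to import it either way.

```python
import importlib.util
import sys
from pathlib import Path


def load_module_from_path(name: str, path: str):
    """Import a single .py file as a module without installing it (hypothetical helper)."""
    file_path = Path(path).resolve()
    spec = importlib.util.spec_from_file_location(name, str(file_path))
    if spec is None or spec.loader is None:
        raise ImportError(f"cannot load {name!r} from {file_path}")
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # register before executing, as a normal import does
    try:
        spec.loader.exec_module(module)
    except BaseException:
        sys.modules.pop(name, None)  # don't leave a half-initialized module behind
        raise
    return module


# Hypothetical usage, assuming the repository was cloned into ./textfilter:
# filter_mod = load_module_from_path("filter", "textfilter/filter.py")
# f = filter_mod.DFAFilter()
```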
Where did you find the 200k-plus sensitive-word list?
Where can I get updates?
In the keywords file, line 10357 contains a space, near the keywords “搞死” and “拉案)”.
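Entries like this can be located by scanning the keyword file for lines that contain any whitespace besides the line ending. A minimal sketch, assuming a UTF-8 file with one keyword per line; the function name `lines_with_whitespace` is hypothetical:

```python
from typing import Iterator, Tuple


def lines_with_whitespace(path: str) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, entry) for keyword lines containing spaces or tabs."""
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            entry = line.rstrip("\r\n")  # drop only the line terminator
            if any(ch.isspace() for ch in entry):
                yield lineno, entry


# Hypothetical usage with the repository's keyword file:
# for lineno, entry in lines_with_whitespace("keywords"):
#     print(f"{lineno}: {entry!r}")
```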
Chinese support is poor. Python 3 handles UTF-8 well, so I suggest changing it to open(filename, 'r', encoding='utf8').
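A minimal sketch of that suggestion: read the keyword file as UTF-8 text in Python 3, trim each line, and skip blank lines so they never become empty keywords. The function name `load_keywords` is hypothetical:

```python
from typing import List


def load_keywords(path: str) -> List[str]:
    """Read one keyword per line from a UTF-8 text file, skipping blank lines."""
    with open(path, "r", encoding="utf8") as f:
        # strip() removes the newline and surrounding spaces; empty lines are dropped
        return [line.strip() for line in f if line.strip()]


# Hypothetical usage with the repository's keyword file:
# keywords = load_keywords("keywords")
```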
My project: https://github.com/toolgood/ToolGood.Words
It supports Java, C#, Python, JS, and Go.
In C#, filtering with StringSearchEx2.Replace runs at over 300 million characters per second on a 48k sensitive-word dictionary (CPU: i7-8750H).
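A standard way to replace many keywords in a single pass over the text is the Aho-Corasick automaton. The sketch below is a generic Python illustration of that algorithm, not ToolGood.Words' code (its internals are not described here), and it makes no attempt to reproduce the quoted throughput; the class name `AhoCorasick` is hypothetical:

```python
from collections import deque
from typing import Dict, Iterable, List


class AhoCorasick:
    """Aho-Corasick automaton that masks every keyword occurrence in one pass."""

    def __init__(self, keywords: Iterable[str]) -> None:
        self.goto: List[Dict[str, int]] = [{}]  # trie transitions per state
        self.fail: List[int] = [0]              # failure links
        self.out: List[int] = [0]               # longest keyword length ending at each state
        for kw in keywords:
            if kw:
                self._insert(kw)
        self._build()

    def _insert(self, kw: str) -> None:
        state = 0
        for ch in kw:
            if ch not in self.goto[state]:
                self.goto.append({})
                self.fail.append(0)
                self.out.append(0)
                self.goto[state][ch] = len(self.goto) - 1
            state = self.goto[state][ch]
        self.out[state] = max(self.out[state], len(kw))

    def _build(self) -> None:
        # BFS from the root; depth-1 states keep their failure link to the root.
        queue = deque(self.goto[0].values())
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Inherit the longest keyword that is a suffix of this state.
                self.out[nxt] = max(self.out[nxt], self.out[self.fail[nxt]])

    def mask(self, text: str, repl: str = "*") -> str:
        masked = bytearray(len(text))  # 1 where a character is covered by a keyword
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            length = self.out[state]
            if length:
                masked[i - length + 1:i + 1] = b"\x01" * length
        return "".join(repl if m else ch for ch, m in zip(text, masked))


ac = AhoCorasick(["he", "she", "hers", "his"])
print(ac.mask("ushers"))  # u*****
```

Overlapping matches ("she" and "hers" in "ushers") are all masked, and the text is scanned once regardless of how many keywords the dictionary holds.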
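class NaiveFilter:
    def parse(self, path):
        # Read keywords as UTF-8, one per line, trimming surrounding whitespace
        with open(path, 'rb') as f:
            self.keywords = [x.decode('utf8').strip() for x in f.readlines()]

    def filter(self, message, repl="*"):
        # Replace every keyword with one repl character per keyword character
        for kw in self.keywords:
            message = message.replace(kw, len(kw) * repl)
        return message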
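The previous version targeted Python 2.7 and I couldn't run it, so I adapted the NaiveFilter functions for Python 3.7. (I also changed the sensitive-word file to a .txt file.)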
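The NaiveFilter `filter` method rescans the whole message once per keyword. An alternative sketch, not part of the repository: compile all keywords into one regular expression (escaped, longest first) and mask in a single `re.sub` pass. The names `build_pattern` and `regex_filter` are hypothetical:

```python
import re
from typing import Iterable


def build_pattern(keywords: Iterable[str]) -> re.Pattern:
    """Compile keywords into one alternation, longest first, special chars escaped."""
    unique = sorted({k for k in keywords if k}, key=len, reverse=True)
    if not unique:
        return re.compile(r"(?!x)x")  # a pattern that never matches
    return re.compile("|".join(map(re.escape, unique)))


def regex_filter(message: str, pattern: re.Pattern, repl: str = "*") -> str:
    """Replace each match with one repl character per matched character."""
    return pattern.sub(lambda m: repl * len(m.group(0)), message)


pattern = build_pattern(["sexy", "se"])
print(regex_filter("hello sexy baby", pattern))  # hello **** baby
```

Sorting longest-first means that when two keywords start at the same position, the longer one is masked; `re.escape` keeps characters such as `)` in entries like “拉案)” from being read as regex syntax.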
Why was the keywords file deleted?