SensiCheck

SensiCheck is a high-performance sensitive-word tool built on the Aho-Corasick (AC automaton) algorithm.
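For readers unfamiliar with the algorithm, the following is a minimal, self-contained sketch of an Aho-Corasick matcher that masks matched words. It is an illustration only and is not taken from the SensiCheck source: the class name `AcSketch` and the methods `mask` and `contains` are hypothetical, and the per-node `HashMap` is chosen for brevity rather than speed.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Queue;

/** Minimal Aho-Corasick sketch (hypothetical names; not SensiCheck's implementation). */
final class AcSketch {
    private static final class Node {
        final Map<Character, Node> next = new HashMap<>();
        Node fail;        // state for the longest proper suffix that is also a trie prefix
        int longest = 0;  // length of the longest word ending at this state, 0 if none
    }

    private final Node root = new Node();

    AcSketch(List<String> words) {
        // Step 1: insert every word into the trie.
        for (String w : words) {
            if (w.isEmpty()) continue;
            Node cur = root;
            for (char c : w.toCharArray()) {
                cur = cur.next.computeIfAbsent(c, k -> new Node());
            }
            cur.longest = Math.max(cur.longest, w.length());
        }
        // Step 2: breadth-first pass that sets the failure links.
        Queue<Node> queue = new ArrayDeque<>();
        for (Node child : root.next.values()) {
            child.fail = root;
            queue.add(child);
        }
        while (!queue.isEmpty()) {
            Node cur = queue.remove();
            for (Map.Entry<Character, Node> e : cur.next.entrySet()) {
                Node child = e.getValue();
                Node f = cur.fail;
                while (f != root && !f.next.containsKey(e.getKey())) {
                    f = f.fail;
                }
                child.fail = f.next.getOrDefault(e.getKey(), root);
                // A word ending at the failure state also ends here.
                child.longest = Math.max(child.longest, child.fail.longest);
                queue.add(child);
            }
        }
    }

    /** Follows one character from the given state, falling back along failure links. */
    private Node step(Node state, char c) {
        while (state != root && !state.next.containsKey(c)) {
            state = state.fail;
        }
        return state.next.getOrDefault(c, root);
    }

    /** Returns true if any dictionary word occurs in the text. */
    boolean contains(String text) {
        Node state = root;
        for (char c : text.toCharArray()) {
            state = step(state, c);
            if (state.longest > 0) return true;
        }
        return false;
    }

    /** Replaces every character covered by a matched word with maskChar. */
    String mask(String text, char maskChar) {
        char[] out = text.toCharArray();
        Node state = root;
        for (int i = 0; i < out.length; i++) {
            state = step(state, text.charAt(i));
            for (int j = i - state.longest + 1; j <= i; j++) {
                out[j] = maskChar;
            }
        }
        return new String(out);
    }

    public static void main(String[] args) {
        AcSketch ac = new AcSketch(List.of("he", "she", "hers"));
        System.out.println(ac.mask("ushers", '*')); // prints u*****
    }
}
```

The text is scanned once from left to right, and the failure links let the matcher continue from the longest matching suffix instead of restarting, which is why the approach scales well to large dictionaries.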

Online demo

Motivation and purpose

To provide 荔知AI助手 with a high-performance, ready-to-use sensitive-word detection tool.

Features

Changelog

View here

Quick start

Prerequisites

  • JDK17+

  • Maven 3.x+

Maven dependency

<dependency>
    <groupId>io.github.brath1024</groupId>
    <artifactId>sensi-check</artifactId>
    <version>0.0.1</version>
</dependency>

Maven Central Repository

https://central.sonatype.com/artifact/io.github.brath1024/sensi-check

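Core methods

SensiCheckUtil is the utility class for sensitive-word handling. Its core methods are:

| Method | Parameters | Return type | Description |
| --- | --- | --- | --- |
| check | String text | String | Checks a single string; the default replacement value is "*" |
| check | String text, String replaceValue | String | Checks a single string with a custom replacement value |
| check | String text, SensiCheckType type | String | Checks a single string with a custom filter strategy |
| check | String text, String replaceValue, SensiCheckType type | String | Checks a single string with a custom replacement value and filter strategy |
| multiStringChecks | List<String> texts | List<String> | Checks multiple strings; the default replacement value is "*" and the default filter strategy is REPLACE |
| multiStringChecks | List<String> texts, SensiCheckType type | List<String> | Checks multiple strings with a custom filter strategy; the default replacement value is "*" |
| multiStringChecks | List<String> texts, String replaceValue | List<String> | Checks multiple strings with a custom replacement value; the default filter strategy is REPLACE |
| multiStringChecks | List<String> texts, String replaceValue, SensiCheckType type | List<String> | Checks multiple strings with a custom replacement value and filter strategy |
| contains | String value | boolean | Returns whether the string contains a sensitive word |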

How to use

Check whether a string contains sensitive words

String text = "你妹的";

boolean hasSensitiveWord = SensiCheckUtil.contains(text);

Detect and replace sensitive words in a string

String text = "你妹的";

String check = SensiCheckUtil.check(text);

System.out.println(check);

Output:
**的

Detect and replace sensitive words in a string, with a custom replacement symbol

String text = "你妹的";

String check = SensiCheckUtil.check(text, "#");

System.out.println(check);

Output:
##的

Detect sensitive words in a string using the exception strategy

String text = "你妹的";

String check = SensiCheckUtil.check(text, SensiCheckType.ERROR);

System.out.println(check);

Output:
SenException(message=文本内容检测到敏感词,已进行删除处理。为了维护社区网络环境,请不要出现带有敏感政治、暴力倾向、不健康色彩的内容! 可能涉及到的敏感词:[你妹]
, value=你妹的, code=0)
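The difference between the REPLACE and ERROR strategies in the examples above can be illustrated with a small self-contained stand-in. Everything here is hypothetical and exists only for illustration: `StrategySketch`, its `Strategy` enum, and `SensitiveWordException` are not SensiCheck types, and the naive `String.contains` lookup is used in place of the library's AC automaton.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the two filter strategies; not the SensiCheck source. */
final class StrategySketch {
    enum Strategy { REPLACE, ERROR }

    /** Hypothetical exception type, loosely modelled on the SenException output above. */
    static final class SensitiveWordException extends RuntimeException {
        final List<String> words;

        SensitiveWordException(List<String> words) {
            super("Sensitive words detected: " + words);
            this.words = words;
        }
    }

    /**
     * REPLACE masks each hit character by character with replaceValue;
     * ERROR throws as soon as at least one dictionary word is found.
     */
    static String check(String text, List<String> dictionary,
                        String replaceValue, Strategy strategy) {
        List<String> hits = new ArrayList<>();
        for (String word : dictionary) {
            if (!word.isEmpty() && text.contains(word)) {
                hits.add(word);
            }
        }
        if (hits.isEmpty()) {
            return text;
        }
        if (strategy == Strategy.ERROR) {
            throw new SensitiveWordException(hits);
        }
        String result = text;
        for (String word : hits) {
            // One replacement symbol per masked character, e.g. "bad" -> "###".
            result = result.replace(word, replaceValue.repeat(word.length()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> dictionary = List.of("bad");
        System.out.println(check("a bad day", dictionary, "#", Strategy.REPLACE)); // prints a ### day
        try {
            check("a bad day", dictionary, "#", Strategy.ERROR);
        } catch (SensitiveWordException e) {
            System.out.println(e.getMessage()); // prints Sensitive words detected: [bad]
        }
    }
}
```

With an exception-based strategy, callers would typically wrap the call in a try/catch block and reject the input, rather than printing the return value.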

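Performance test

Environment

Results vary across environments, but the relative proportions stay roughly stable.

| Item | Value |
| --- | --- |
| Processor | 12th Gen Intel(R) Core(TM) i5-1240P 1.70 GHz |
| Installed RAM | 24.0 GB (23.7 GB usable) |
| System type | 64-bit operating system, x64-based processor |

Test results

Test data: a string of 100+ characters, checked 1,000,000 times in a loop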

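@Test
public void test1() {
    // Build a random 100-character string from the given alphabet
    // (RandomUtil is assumed to be Hutool's cn.hutool.core.util.RandomUtil).
    String randomText = RandomUtil.randomString("1234567890bcdefghiJKLMNOPQRSTUVWXYZ你他妈的", 100);

    // Time 1,000,000 calls to check() with the default replacement value.
    long start = System.currentTimeMillis();
    for (int i = 0; i < 1_000_000; i++) {
        SensiCheckUtil.check(randomText);
    }
    long end = System.currentTimeMillis();

    System.err.println("------------------ COST: " + (end - start));
    // Recorded result: ------------------ COST: 1317 (milliseconds)
}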

| No. | Scenario | Time |
| --- | --- | --- |
| 1 | Sensitive-word check, default replacement character | 1317 ms, about 700,000 QPS |
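
As a quick sanity check on the throughput figure, the conversion from elapsed time to calls per second can be written out. This is a small arithmetic sketch; the class and method names are hypothetical, and the inputs are the 1,000,000 iterations and 1317 ms recorded in the test above.

```java
/** Hypothetical helper that converts a timed loop into calls per second. */
final class ThroughputSketch {
    static double callsPerSecond(long calls, long elapsedMillis) {
        // Multiply by 1000.0 to convert milliseconds to seconds in floating point.
        return calls * 1000.0 / elapsedMillis;
    }

    public static void main(String[] args) {
        System.out.printf("%.0f calls/s%n", callsPerSecond(1_000_000, 1317)); // prints 759301 calls/s
    }
}
```

For the recorded run this gives roughly 759,000 calls per second on a single thread, the same order of magnitude as the figure of about 700,000 QPS quoted in the table.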

Published to Nexus

