Proces

🐨 Text preprocessing.

1 Installation

⚠️ Note:

  1. Local installation supports Python 3.6 or later only;
  2. Use the latest version of proces whenever possible.
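
The version requirement in the note can be checked before installing. This is a minimal standard-library sketch; `python_supported` is a hypothetical helper and not part of proces:

```python
import sys

# Minimum interpreter version stated in the note above (3.6 or later).
MIN_PYTHON = (3, 6)


def python_supported(version_info=sys.version_info) -> bool:
    """Return True if the (major, minor) version meets the 3.6+ requirement.

    Hypothetical helper for illustration; not part of proces.
    """
    return tuple(version_info[:2]) >= MIN_PYTHON


if __name__ == "__main__":
    if python_supported():
        print("OK: Python", sys.version.split()[0], "is supported")
    else:
        print("Please upgrade to Python 3.6 or later")
```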

Install with pip

pip install proces -U

Install from the repository

git clone https://github.com/Ailln/proces.git

cd proces && python setup.py install

2 Usage

from proces import preprocess

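# By default, these steps run in order: handle blank characters, convert uppercase to lowercase,
# convert traditional Chinese to simplified Chinese, and convert full-width characters to half-width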
result = preprocess("Today, 你 幹 什 麼 !")
# result: today,你干什么!

# Configure the pipeline, e.g. to only remove blank characters
result = preprocess("Today, 你 幹 什 麼 !", pipelines=["handle_blank_character"])
# result: Today,你幹什麼!

# Use the individual functions directly
from proces import filter_unusual_characters, filter_
from proces import handle_blank_character
from proces import uppercase_to_lowercase
from proces import traditional_to_simplified
from proces import full_angle_to_half_angle
from proces import handle_substitute

# Remove unusual characters
result = filter_unusual_characters("【你是个恶魔😈啊�】")
# result: 【你是个恶魔啊】
# The short alias filter_ does the same
result = filter_("【你是个恶魔😈啊�】")
# result: 【你是个恶魔啊】

# Handle blank characters (remove them, or replace them with a separator)
result = handle_blank_character("空 白 字 符")
# result: 空白字符
result = handle_blank_character("空 白 字 符", ",")
# result: 空,白,字,符

# Convert uppercase to lowercase
result = uppercase_to_lowercase("UP to low")
# result: up to low

# Convert traditional Chinese to simplified Chinese
result = traditional_to_simplified("我幹什麼不干你事")
# result: 我干什么不干你事

# Convert full-width characters to half-width
result = full_angle_to_half_angle("你好！")
# result: 你好!

# Substitute specific characters
result = handle_substitute("你好!/:-", r"/:-", "表情")
# result: 你好!表情

## Mask sensitive information
from proces import mask_phone, mask_address

# Mask phone numbers
result = mask_phone("手机号 13397238231")
# result: 手机号 133********

# Mask addresses
result = mask_address("我在浙江杭州余杭区")
# result: 我在浙江杭州***
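
To illustrate what a configurable preprocessing pipeline like `preprocess` does, here is a minimal standard-library sketch of three of the steps (blank handling, lowercasing, full-width to half-width) applied in order. It is an approximation under stated assumptions, not proces's implementation: the function names, `STEPS` and `run_pipeline` are hypothetical; only `handle_blank_character` is confirmed as a pipeline name by the example above, and the other keys are assumed to mirror the function names. Traditional-to-simplified conversion is omitted because it needs a character mapping table, and the half-width conversion only maps the U+FF01 to U+FF5E block plus the ideographic space.

```python
import re


def collapse_blanks(text: str, sep: str = "") -> str:
    # Replace every run of whitespace with `sep` (the empty default removes it).
    return re.sub(r"\s+", sep, text)


def to_lowercase(text: str) -> str:
    return text.lower()


def to_half_width(text: str) -> str:
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:  # ideographic (full-width) space -> ASCII space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:  # full-width ASCII variants -> ASCII
            code -= 0xFEE0
        out.append(chr(code))
    return "".join(out)


# Hypothetical step registry in the default order described above
# (traditional-to-simplified conversion omitted).
STEPS = {
    "handle_blank_character": collapse_blanks,
    "uppercase_to_lowercase": to_lowercase,
    "full_angle_to_half_angle": to_half_width,
}


def run_pipeline(text, pipelines=None):
    # Apply the named steps in order; run every registered step if none are given.
    if pipelines is None:
        pipelines = list(STEPS)
    for name in pipelines:
        text = STEPS[name](text)
    return text


print(run_pipeline("Today, 你 好 ！"))
print(run_pipeline("Today, 你 好 ！", pipelines=["handle_blank_character"]))
```

Keeping the steps in an ordered dictionary keyed by name is one simple way to let a `pipelines` list select and order them.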

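Phone masking of the kind shown above can be approximated with a regular expression. This sketch covers phone numbers only and assumes an 11-digit mainland-China mobile number that starts with 1 and is not part of a longer digit run; matching the example output, it keeps the first three digits and replaces the remaining eight with `*`. `mask_phone_sketch` is a hypothetical name, and proces's actual matching rules (and its address masking) may differ.

```python
import re

# Assumption: an 11-digit number starting with "1", not embedded in a longer run of digits.
PHONE_RE = re.compile(r"(?<!\d)(1\d{2})(\d{8})(?!\d)")


def mask_phone_sketch(text: str) -> str:
    # Keep the first three digits and replace the other eight with "*".
    return PHONE_RE.sub(lambda m: m.group(1) + "*" * len(m.group(2)), text)


print(mask_phone_sketch("手机号 13397238231"))  # 手机号 133********
```
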
3 TODO

  • Add a way to list all available preprocessing methods
  • Decorators

4 License

MIT License.

Known issue: No module named 'ruamel'

.tox/py311/lib/python3.11/site-packages/proces/__init__.py:9: in <module>
    from .masking import mask_phone
.tox/py311/lib/python3.11/site-packages/proces/masking.py:3: in <module>
    from proces.util.data import get_city_pattern
.tox/py311/lib/python3.11/site-packages/proces/util/data.py:4: in <module>
    from proces.util.conf import get_yaml
.tox/py311/lib/python3.11/site-packages/proces/util/conf.py:4: in <module>
    from ruamel.yaml import YAML
E   ModuleNotFoundError: No module named 'ruamel'
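
The traceback shows that `proces.util.conf` imports `ruamel.yaml`, which was missing from the environment; installing the `ruamel.yaml` package (`pip install ruamel.yaml`) should resolve it. A minimal sketch for checking ahead of time whether such modules are importable, using `importlib.util.find_spec`; the helper `missing_modules` is hypothetical:

```python
import importlib.util


def missing_modules(names):
    """Return the subset of `names` that cannot be found in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec imports parent packages first (e.g. "ruamel" for
            # "ruamel.yaml") and raises ModuleNotFoundError if the parent is missing.
            found = False
        if not found:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print(missing_modules(["proces", "ruamel.yaml"]))
```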
