chinese2digits's People

Contributors

wall-ee

chinese2digits's Issues

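Bug in converting the decimal part

Screenshot:
image
If the first few digits after the decimal point are all zeros, they are not processed. My guess is that the decimal part is accidentally matched the same way as the integer part, so the leading zeros are dropped.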
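The behaviour the reporter expects can be sketched as a digit-by-digit mapping of the characters after 点, which keeps leading zeros. This is only an illustration: `DIGITS`, `convert_fraction_digits`, and `convert_simple_decimal` are hypothetical names, not part of chinese2digits, and the integer part here handles only digit-by-digit numerals (no 十/百/千).

```python
# Hypothetical digit table for illustration; not the library's own dictionary.
DIGITS = {
    "零": "0", "〇": "0", "一": "1", "二": "2", "两": "2", "三": "3",
    "四": "4", "五": "5", "六": "6", "七": "7", "八": "8", "九": "9",
}


def convert_fraction_digits(ch_fraction: str) -> str:
    """Map each character after 点 to exactly one digit, so leading zeros survive."""
    return "".join(DIGITS[ch] for ch in ch_fraction)


def convert_simple_decimal(text: str) -> str:
    """Convert a digit-by-digit numeral such as 三点零零五 into a digit string."""
    integer_part, _, fraction_part = text.partition("点")
    # An empty integer part (e.g. 点二八) is treated as 0.
    integer_digits = "".join(DIGITS[ch] for ch in integer_part) or "0"
    if not fraction_part:
        return integer_digits
    return integer_digits + "." + convert_fraction_digits(fraction_part)


print(convert_simple_decimal("三点零零五"))
```

Treating the fractional part as a string of digits, rather than parsing it as an integer, is what avoids dropping the zeros.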

Two-way conversion

Hello! I hope you can provide two-way conversion between digits and Chinese numerals. Thanks!

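A recognition bug that raises an error

Recognizing the sentence “我想打电话我要陆兆显” raises an error. Whenever the text contains a personal name that includes a Chinese numeral character, an error is raised.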

The special character 『〇』 is recognized incorrectly

Input: 国家主席发表二〇二三年新年贺词

Expected output: 国家主席发表2023年新年贺词

Actual output: {'inputText': '国家主席发表二〇二三年新年贺词', 'replacedText': '国家主席发表2〇23年新年贺词', 'CHNumberStringList': ['二', '二三'], 'digitsStringList': ['2', '23'], 'errorWordList': [], 'errorMsgList': []}
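One possible workaround on the caller's side is to convert year-style digit runs before handing the text to the library. The sketch below is an assumption-laden illustration: `DIGIT_MAP`, `YEAR_RUN`, and `convert_year_runs` are hypothetical names, and it only handles runs of two to four digit characters immediately followed by 年.

```python
import re

# Hypothetical pre-processing step; both 〇 and 零 map to 0.
DIGIT_MAP = str.maketrans("〇零一二三四五六七八九", "00123456789")

# Two to four digit characters directly followed by 年 (the 年 itself is not consumed).
YEAR_RUN = re.compile(r"[〇零一二三四五六七八九]{2,4}(?=年)")


def convert_year_runs(text: str) -> str:
    """Replace digit-by-digit year expressions such as 二〇二三年 with 2023年."""
    return YEAR_RUN.sub(lambda m: m.group(0).translate(DIGIT_MAP), text)


print(convert_year_runs("国家主席发表二〇二三年新年贺词"))
```

A single digit before 年, as in 一年, is left untouched because the pattern requires at least two digit characters.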

takeNumberFromString('百/千/万/百万/千万')

{'inputText': '百/千/万/百万/千万',
'replacedText': '百/千/万/百万/千万',
'CHNumberStringList': [],
'digitsStringList': [],
'errorWordList': [],
'errorMsgList': []}

Wouldn't it be better to extract 100/1000/10000/1000000/10000000?

Cannot import after installation

D:\>pip show chinese2digits
Name: chinese2digits
Version: 1.0
Summary: 最好的汉字数字(中文数字)-阿拉伯数字转换工具。包含"点二八""负百分之四十"等众多汉语表达方法。NLP,机器人工程必 备! The Best Tool of Chinese Number to Digits
Home-page: https://github.com/Wall-ee/chinese2digits
Author: Wa-llee
Author-email: [email protected]
License: Apache License 2.0
Location: d:\python\lib\site-packages
Requires:
You are using pip version 9.0.3, however version 19.3.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.

D:\>python
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import chinese2digits as c2d
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'chinese2digits'
>>>
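The report shows pip listing the package under d:\python\lib\site-packages while the interpreter cannot import it. The cause cannot be determined from the report alone. A small diagnostic like the one below (a sketch; `describe_module` is a hypothetical helper) shows which interpreter is running and whether it can locate the module, which helps distinguish a pip/python mismatch from a package that installed without its module files.

```python
import importlib.util
import sys


def describe_module(name: str) -> dict:
    """Report the running interpreter and where (if anywhere) it finds a top-level module."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "found": spec is not None,
        "origin": spec.origin if spec is not None else None,
    }


print(describe_module("chinese2digits"))
```

Running the install as `python -m pip install chinese2digits` with the same interpreter ensures pip and python refer to the same environment.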

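Bug when converting the traditional character 拾

Screenshot:
image
This raised an error. I suspect 拾 is treated only as a magnitude unit, and the case where it appears on its own is not handled. The traditional-character substitution here may be too flexible; I wonder whether it would be feasible to map traditional characters one-to-one to simplified characters before any other processing.
This is just my rough opinion; apologies if it is wrong.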

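Feedback on some unreasonable cases

Suppose I have an ordinary piece of text:

明天见 我的车牌号是7G02D 白色的INSPIRE

If I use the takeChineseNumberFromString method, it first extracts the digits in the text and converts them to Chinese numerals.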

CHNumberStringListTemp = takingChineseDigitsMixRERules.findall(convertedCHString)
['7', '02']
['七', '零二']

These are then converted to

[('02', '2', 2), ('7', '7', 1)]

and the final converted result becomes:

明天见 我的车牌号是7G2D 白色的INSPIRE 

Could you consider adding a check here: if letters and digits appear together, do not convert; if the text matches \d+[wk], convert.

convertedCHString = traditionalTextConvertFunc(chText,traditionalConvert)
"""
Split and extract Chinese numeral strings from the input string,
using regular expressions
"""
# TODO  check digits-alphabet-mixed, skip

CHNumberStringListTemp = takingChineseDigitsMixRERules.findall(convertedCHString)
print(CHNumberStringListTemp)
# Check for incomplete segmentation around 分之
CHNumberStringListTemp = checkNumberSeg(CHNumberStringListTemp,convertedCHString)
print(CHNumberStringListTemp)
# Check whether the last character is a plus or minus sign
CHNumberStringListTemp = checkSignSeg(CHNumberStringListTemp)
print(CHNumberStringListTemp)
# Keep a copy of the original extraction for displaying results later
OriginCHNumberTake = CHNumberStringListTemp.copy()

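Or would you recommend that users do the replacement themselves before calling the package and skip conversion, rather than leaving the decision to this package?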
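The check proposed in this report could be sketched as follows. This is a hypothetical pre-filter, not code from chinese2digits: `protected_spans` returns the character ranges of tokens that mix ASCII letters and digits, except for tokens of the form `\d+[wk]`, which the report says should still be converted.

```python
import re

# A run of ASCII letters and digits, e.g. a licence plate or model number.
ALNUM_TOKEN = re.compile(r"[A-Za-z0-9]+")
# Digits followed by a single w or k (e.g. 3w, 10k); case-insensitivity is an assumption.
UNIT_SUFFIX = re.compile(r"\d+[wk]", re.IGNORECASE)


def protected_spans(text: str) -> list:
    """Return (start, end) spans of mixed letter/digit tokens that should not be converted."""
    spans = []
    for match in ALNUM_TOKEN.finditer(text):
        token = match.group(0)
        has_letter = any(c.isalpha() for c in token)
        has_digit = any(c.isdigit() for c in token)
        if has_letter and has_digit and not UNIT_SUFFIX.fullmatch(token):
            spans.append(match.span())
    return spans


sample = "明天见 我的车牌号是7G02D 白色的INSPIRE"
print([sample[start:end] for start, end in protected_spans(sample)])
```

A caller could mask these spans before conversion and restore them afterwards, which keeps the decision outside the package.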

bugs report

issues with some abbreviated statements

  • takeChineseNumberFromString('二千十七')--> 2007
    should be 2017
  • takeChineseNumberFromString('二千七')--> 2007
    should be 2700
  • takeChineseNumberFromString('两千') --> None
    should be 2000
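The three expected results above follow from two conventions: a unit with no preceding digit counts as one of that unit (十七 is 17), and a trailing digit with no unit of its own takes one tenth of the previous unit (二千七 is 2700) unless a 零 intervenes (二千零七 is 2007). A minimal sketch of those rules, limited to values below 10000 and using hypothetical names, might look like this:

```python
# Hypothetical tables for illustration; not the library's dictionaries.
DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000}


def parse_abbreviated(text: str) -> int:
    """Parse a Chinese numeral below 10000, allowing an omitted final unit."""
    total = 0
    pending = None      # digit still waiting for its unit
    last_unit = None    # most recent unit seen
    saw_zero = False    # whether 零 appeared since the last unit
    for ch in text:
        if ch in DIGITS:
            if ch == "零":
                saw_zero = True
                pending = None
            else:
                pending = DIGITS[ch]
        elif ch in UNITS:
            unit = UNITS[ch]
            # A bare unit such as 十 counts as one of that unit.
            total += (pending if pending is not None else 1) * unit
            pending = None
            last_unit = unit
            saw_zero = False
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    if pending is not None:
        if last_unit is not None and not saw_zero:
            # Omitted unit: the digit takes one tenth of the previous unit.
            total += pending * (last_unit // 10)
        else:
            total += pending
    return total


for sample in ("二千十七", "二千七", "两千"):
    print(sample, parse_abbreviated(sample))
```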

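Misrecognition of “需要单独点一份沙茶酱” / “点两份沙茶酱”

Using takeChineseNumberFromStr with its default parameters,
“需要单独点一份沙茶酱” is recognized as “需要单独0.1份沙茶酱” (点一 is read as the decimal .1).
The 两 in “点两份沙茶酱” is not recognized as a number, even with traditional convert set to true.

What I actually want is for “点一份” to be recognized as “点1份” and “点两份” as “点2份”.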

Dictionary bug

Code:
import chinese2digits as c2d
q = "一兆韦德"
q = c2d.takeChineseNumberFromString(q)["replacedText"]

This raises:
File "/Users/chinese2digits.py", line 64, in coreCHToDigits
if val >= 10 and i == 0: #应对 十三 十四 十*之类,说明为十以上的数字,看是不是十三这种
TypeError: '>=' not supported between instances of 'NoneType' and 'int'

It seems that 一兆 is recognized, but the dictionary common_used_ch_numerals has no key for converting 兆. Please fix this soon.
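The TypeError is consistent with a dictionary lookup returning None for an unknown character and that None then being compared with an integer. A defensive lookup could look like the sketch below. The table and function names are hypothetical and the table deliberately omits 兆 to mirror the report; this is not the library's code.

```python
# Hypothetical numeral table for illustration; it deliberately omits 兆,
# mirroring the missing key described in the report.
NUMERAL_VALUES = {"一": 1, "二": 2, "三": 3, "十": 10, "百": 100, "千": 1000, "万": 10000}


def lookup_numeral(ch: str) -> int:
    """Return the value of a numeral character, failing loudly on unknown ones."""
    val = NUMERAL_VALUES.get(ch)
    if val is None:
        raise ValueError(f"no numeric value defined for {ch!r}")
    return val


def safe_values(text: str):
    """Return (values, unknown_chars) instead of crashing on the first unknown character."""
    values, unknown = [], []
    for ch in text:
        try:
            values.append(lookup_numeral(ch))
        except ValueError:
            unknown.append(ch)
    return values, unknown


print(safe_values("一兆"))
```

Collecting unknown characters separately would let a caller report them (for example in a field such as errorWordList) rather than aborting the whole conversion.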

Cannot recognize 300万

Mixed digits + Chinese input is quite common, but numbers of this kind cannot be recognized yet.

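Memory leak and some bugs

Hello! Your tool is very useful, but when I use it I see a memory leak (memory usage keeps growing). Separately, input of the form “X (Chinese numeral) 分之 X (Arabic numeral)”, such as 十分之3 or 百分之3, raises an error.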

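Recognition of mixed Arabic and Chinese numerals: “5万3”, “六千7”

Recognition of mixed Arabic and Chinese numerals:
“5万3” is recognized as ['50000', '3'], but the desired result is 53000
“六千7” is recognized as ['6000', '7'], but the desired result is 6700

Sometimes, though, the desired result does come out:

'6千七' is recognized as 6700

This seems inconsistent. How is this case controlled?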
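The consistent behaviour the reporter asks for can be sketched by treating Arabic and Chinese digits interchangeably around a single unit character. The names below are hypothetical, and the pattern covers only the shape "number, one unit, optional single trailing digit"; it is an illustration, not the library's rule set.

```python
import re

# Hypothetical tables for illustration.
CH_DIGITS = {"一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
             "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"百": 100, "千": 1000, "万": 10000}

# <Arabic number or one Chinese digit> <unit> <optional single trailing digit>
MIXED = re.compile(
    r"([0-9]+|[一二两三四五六七八九])([百千万])([0-9]|[一二两三四五六七八九])?"
)


def to_int(token: str) -> int:
    """Convert an Arabic digit string or a single Chinese digit to an int."""
    return int(token) if token.isdigit() else CH_DIGITS[token]


def parse_mixed(text: str):
    """Return the integer value, or None if the text does not fit the pattern."""
    match = MIXED.fullmatch(text)
    if match is None:
        return None
    head, unit_char, tail = match.groups()
    unit = UNITS[unit_char]
    value = to_int(head) * unit
    if tail is not None:
        # A trailing digit without its own unit takes one tenth of the unit.
        value += to_int(tail) * (unit // 10)
    return value


for sample in ("5万3", "六千7", "6千七", "300万"):
    print(sample, parse_mixed(sample))
```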

Cannot import chinese2digits.c2d; how can this be resolved?

Python 3.9.6 (tags/v3.9.6:db3ff76, Jun 28 2021, 15:26:21) [MSC v.1929 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.25.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: from chinese2digits import c2d

ImportError Traceback (most recent call last)
in
----> 1 from chinese2digits import c2d

ImportError: cannot import name 'c2d' from 'chinese2digits' (c:\python\lib\site-packages\chinese2digits\__init__.py)

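Error handling in the Go version

Hello,
floatResult, err := strconv.ParseFloat(convertResult, 32)
if err != nil {
    panic(err)
}

In the Go version, this code reliably panics when the value is out of range.

User input is hard to control. Could this be changed to return err, or to handle the error in some other way?

Test input:
10000000000000000000000000000000000000000000连

Result: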
panic: strconv.ParseFloat: parsing "10000000000000000000000000000000000000000000": value out of range

github.com/Wall-ee/chinese2digits/chinese2digits.convertDigitsStringToFloat(0xc000028900, 0x2c, 0x2c)
1|main | /root/go/pkg/mod/github.com/!wall-ee/[email protected]/chinese2digits/chinese2digits.go:298 +0x283
1|main | github.com/Wall-ee/chinese2digits/chinese2digits.ChineseToDigits(0xc000028900, 0x2c, 0x1, 0x832a80, 0x9688b0, 0x1, 0x1)
1|main | /root/go/pkg/mod/github.com/!wall-ee/[email protected]/chinese2digits/chinese2digits.go:336 +0xc6f
1|main | github.com/Wall-ee/chinese2digits/chinese2digits.TakeChineseNumberFromString(0xc0000288d0, 0x2c, 0xc001c5f570, 0x4, 0x4, 0xc001c5f588, 0x4)
1|main | /root/go/pkg/mod/github.com/!wall-ee/[email protected]/chinese2digits/chinese2digits.go:793 +0x5d2
1|main | github.com/Wall-ee/chinese2digits/chinese2digits.TakeNumberFromString(0xc0000288d0, 0x2c, 0x0, 0x0, 0x0, 0x2, 0x0)
1|main | /root/go/pkg/mod/github.com/!wall-ee/[email protected]/chinese2digits/chinese2digits.go:860 +0xff
