wall-ee / chinese2digits
The best tool for converting Chinese numerals (written in Chinese characters) to Arabic digits. It covers many Chinese expressions such as "点二八" and "负百分之四十". Essential for NLP and robotics engineering! The Best Tool of Chinese Number to Digits
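As the title says, 7千 and 3百 were extracted separately, which led to incorrect recognition.
Is there any plan to release a JS or C++ version?
一百03: recognized incorrectly
Actual result:
10003
Expected result:
103
Hello! I hope the library can support conversion in both directions between digits and Chinese numerals. Thanks!
Recognizing the sentence “我想打电话我要陆兆显” raises an error. Whenever the text contains a personal name that includes a Chinese numeral character, it fails.
打扰您一下
is replaced with 打扰您1下
Is there somewhere to configure Chinese words that should not be replaced?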
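Until such a setting exists, one possible workaround is to mask protected words before conversion and restore them afterwards. This is only a sketch: `convert_with_protected_words` and `naive_convert` are hypothetical names, and `naive_convert` is a toy stand-in for the library call, not its real behavior.

```python
def convert_with_protected_words(text, protected_words, convert):
    """Run `convert` (any str -> str callable) while leaving protected words untouched."""
    placeholders = {}
    masked = text
    for i, word in enumerate(protected_words):
        # Private-use code points contain no digits or numerals,
        # so a numeral converter should pass them through unchanged.
        token = chr(0xE000 + i)
        placeholders[token] = word
        masked = masked.replace(word, token)
    converted = convert(masked)
    for token, word in placeholders.items():
        converted = converted.replace(token, word)
    return converted


def naive_convert(s):
    # Toy stand-in (assumption): mimics the reported replacement of "一" with "1".
    return s.replace("一", "1")


print(convert_with_protected_words("打扰您一下,我要一份", ["一下"], naive_convert))
```

In real use, `convert` would wrap the library call and return its `replacedText` field.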
Input: 国家主席发表二〇二三年新年贺词
Expected output: 国家主席发表2023年新年贺词
Actual output: {'inputText': '国家主席发表二〇二三年新年贺词', 'replacedText': '国家主席发表2〇23年新年贺词', 'CHNumberStringList': ['二', '二三'], 'digitsStringList': ['2', '23'], 'errorWordList': [], 'errorMsgList': []}
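The actual output suggests that 〇 is not treated as a zero digit. A minimal sketch of the expected digit-by-digit reading for years follows. It assumes that a run of two or more numeral characters directly before 年 should be read positionally; `DIGIT_MAP` and `convert_year_numerals` are hypothetical names, not part of the library.

```python
import re

# Map both zero forms (〇 and 零) plus 一..九 to ASCII digits.
DIGIT_MAP = str.maketrans("〇零一二三四五六七八九", "00123456789")

# Two or more numeral characters immediately followed by 年.
YEAR_RUN = re.compile(r"[〇零一二三四五六七八九]{2,}(?=年)")


def convert_year_numerals(text):
    """Replace year-style numeral runs with their positional digit reading."""
    return YEAR_RUN.sub(lambda m: m.group().translate(DIGIT_MAP), text)


print(convert_year_numerals("国家主席发表二〇二三年新年贺词"))
```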
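Reproduction code: c2d.takeNumberFromString("拾")
The ones digit gets an extra 1 added. Try looping over 100,000 values: most of the results do not match, with 1 added to the ones digit.
For numbers in the hundreds, the tens digit gets increased.
Entering the Chinese numeral 十万 returns 0 directly.
Take a look at this package:
https://github.com/pkumza/numcn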
{'inputText': '百/千/万/百万/千万',
'replacedText': '百/千/万/百万/千万',
'CHNumberStringList': [],
'digitsStringList': [],
'errorWordList': [],
'errorMsgList': []}
Wouldn't it be better to extract 100/1000/10000/1000000/10000000?
For example:
result = c2d.takeNumberFromString("33.5万")
The result is:
{'inputText': '33.5万', 'replacedText': '33.5', 'CHNumberStringList': ['33.5万'], 'digitsStringList': ['33.5'], 'errorWordList': [], 'errorMsgList': []}
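The output above keeps 33.5 and drops the 万 multiplier. A minimal sketch of the arithmetic one would expect instead (33.5 × 10000 = 335000) follows. `UNITS` and `expand_digit_with_unit` are hypothetical names, and this is not how the library is implemented.

```python
import re
from decimal import Decimal

# Assumed unit table; only single-character units are handled.
UNITS = {"十": 10, "百": 100, "千": 1000, "万": 10_000, "亿": 100_000_000}
DIGITS_WITH_UNIT = re.compile(r"(\d+(?:\.\d+)?)([十百千万亿])")


def expand_digit_with_unit(text):
    """Multiply an Arabic number by the Chinese unit that directly follows it."""
    def repl(match):
        value = Decimal(match.group(1)) * UNITS[match.group(2)]
        # normalize() strips trailing zeros; "f" avoids exponent notation.
        return format(value.normalize(), "f")
    return DIGITS_WITH_UNIT.sub(repl, text)


print(expand_digit_with_unit("33.5万"))
```

Decimal is used rather than float so that values like 33.5 do not pick up binary rounding noise.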
D:\>pip show chinese2digits
Name: chinese2digits
Version: 1.0
Summary: 最好的汉字数字(中文数字)-阿拉伯数字转换工具。包含"点二八","负百分之四十"等众多汉语表达方法。NLP,机器人工程必 备! The Best Tool of Chinese Number to Digits
Home-page: https://github.com/Wall-ee/chinese2digits
Author: Wa-llee
Author-email: [email protected]
License: Apache License 2.0
Location: d:\python\lib\site-packages
Requires:
You are using pip version 9.0.3, however version 19.3.1 is available.
You should consider upgrading via the 'python -m pip install --upgrade pip' command.
D:\>python
Python 3.6.5 (v3.6.5:f59c0932b4, Mar 28 2018, 17:00:18) [MSC v.1900 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import chinese2digits as c2d
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'chinese2digits'
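Since pip reports the package as installed but the import fails, one thing worth checking is whether the running interpreter is the one pip installed into, and whether it can locate the module at all. This is a diagnostic sketch only, using the standard library; `describe_module` is a hypothetical helper name.

```python
import importlib.util
import sys


def describe_module(name):
    """Report which interpreter is running and where (if anywhere) it finds `name`."""
    spec = importlib.util.find_spec(name)
    return {
        "interpreter": sys.executable,
        "module": name,
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


print(describe_module("chinese2digits"))
```

If `found` is False under the same interpreter that `pip show` reports, the installed distribution may not actually contain an importable module under that name.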
>>>
s = "一千八百万"
res = c2d.takeChineseNumberFromString(s, method='normal')
print(res.get("replacedText"))
The result is 1008000000
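Suppose I have an ordinary piece of text:
明天见 我的车牌号是7G02D 白色的INSPIRE
With takeChineseNumberFromString, the digits in the text are first extracted and converted to Chinese numerals.
CHNumberStringListTemp = takingChineseDigitsMixRERules.findall(convertedCHString)
['7', '02']
['七', '零二']
These are then converted to
[('02', '2', 2), ('7', '7', 1)]
so the final result becomes
明天见 我的车牌号是7G2D 白色的INSPIRE
Could you consider adding a check here? If letters and digits appear together, no conversion should be done; if the token matches \d+[wk], it can be converted.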
convertedCHString = traditionalTextConvertFunc(chText,traditionalConvert)
"""
String: split and extract Chinese numeral substrings
Regular-expression method
"""
# TODO check digits-alphabet-mixed, skip
CHNumberStringListTemp = takingChineseDigitsMixRERules.findall(convertedCHString)
print(CHNumberStringListTemp)
# Check for incomplete splitting around 分之
CHNumberStringListTemp = checkNumberSeg(CHNumberStringListTemp,convertedCHString)
print(CHNumberStringListTemp)
# Check whether the last character is a plus or minus sign
CHNumberStringListTemp = checkSignSeg(CHNumberStringListTemp)
print(CHNumberStringListTemp)
# Keep a copy of the original extraction, used later when displaying the result
OriginCHNumberTake = CHNumberStringListTemp.copy()
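Or would you recommend doing the replacement myself before calling the library, skipping the conversion, rather than leaving that decision to this package?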
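Either way, the proposed check could look roughly like the sketch below: it finds ASCII tokens that mix letters and digits and flags them as spans to skip, while letting tokens of the form `\d+[wk]` through. `find_protected_spans` is a hypothetical helper, not an existing library function, and how the spans are then excluded from conversion is left open.

```python
import re

ALNUM_TOKEN = re.compile(r"[A-Za-z0-9]+")
DIGITS_WITH_SUFFIX = re.compile(r"\d+[wk]")  # e.g. 5w, 30k: still convertible


def find_protected_spans(text):
    """Return (start, end, token) for letter/digit mixes that should not be converted."""
    spans = []
    for match in ALNUM_TOKEN.finditer(text):
        token = match.group()
        has_letter = any(c.isalpha() for c in token)
        has_digit = any(c.isdigit() for c in token)
        if has_letter and has_digit and not DIGITS_WITH_SUFFIX.fullmatch(token):
            spans.append((match.start(), match.end(), token))
    return spans


print(find_protected_spans("明天见 我的车牌号是7G02D 白色的INSPIRE"))
```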
issues with some abbreviated statements
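Using the default parameters of takeChineseNumberFromStr,
“需要单独点一份沙茶酱” is recognized as “需要单独0.1份沙茶酱”.
The “两” in “点两份沙茶酱” is not recognized as a number, even with traditional convert set to true.
What I actually want is for “点一份” to be recognized as “点1份” and “点两份” as “点2份”.
Code: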
import chinese2digits as c2d
q = "一兆韦德"
q = c2d.takeChineseNumberFromString(q)["replacedText"]
This raises an error:
File "/Users/chinese2digits.py", line 64, in coreCHToDigits
if val >= 10 and i == 0: #应对 十三 十四 十*之类,说明为十以上的数字,看是不是十三这种
TypeError: '>=' not supported between instances of 'NoneType' and 'int'
This seems to happen because 一兆 is recognized, but the dictionary common_used_ch_numerals has no key for converting 兆. I hope this can be fixed soon.
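The traceback shows `val` being None when it reaches the `>=` comparison. A defensive lookup that fails with a clear message could look like the sketch below. `NUMERALS`, `UnknownNumeralError`, and `lookup_numeral` are hypothetical and use a deliberately incomplete toy table, not the library's own dictionary.

```python
# Toy table (assumption): intentionally missing 兆 to reproduce the situation.
NUMERALS = {"一": 1, "二": 2, "三": 3, "十": 10, "百": 100, "千": 1000, "万": 10_000}


class UnknownNumeralError(ValueError):
    """Raised when a character matched as a numeral has no known value."""


def lookup_numeral(ch, table=NUMERALS):
    val = table.get(ch)
    if val is None:
        # Fail with a message naming the character instead of letting
        # None reach a numeric comparison such as `val >= 10`.
        raise UnknownNumeralError(f"no numeric value for {ch!r}")
    return val


try:
    lookup_numeral("兆")
except UnknownNumeralError as exc:
    print(exc)
```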
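Mixed digits + Chinese input is quite common, and such numbers cannot be recognized yet.
Hello! Your tool is very useful, but I run into a memory leak when using it (memory usage keeps growing). Separately, input of the form “X (Chinese numeral) 分之 X (Arabic digit)”, such as 十分之3 or 百分之3, raises an error.
Recognition problems with mixed Arabic and Chinese numerals:
“5万3” is recognized as ['50000', '3'], but the desired result is 53000
“六千7” is recognized as ['6000', '7'], but the desired result is 6700
Sometimes, though, the desired result does come out:
'6千七' is recognized as 6700
This seems inconsistent. What controls this behavior?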
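For reference, the reading I expect for these abbreviated forms is: the trailing digit counts in the next lower decimal place of the unit, regardless of whether each digit is written in Arabic or Chinese. A minimal sketch of that rule follows. It only handles a single digit, a single unit, and a single trailing digit; `parse_abbreviated` is a hypothetical name, and this says nothing about how the library itself decides.

```python
import re

DIGITS = {"一": 1, "二": 2, "两": 2, "三": 3, "四": 4, "五": 5,
          "六": 6, "七": 7, "八": 8, "九": 9}
UNITS = {"十": 10, "百": 100, "千": 1000, "万": 10_000}
DIGIT_CLASS = "[1-9一二两三四五六七八九]"
ABBREVIATED = re.compile(f"({DIGIT_CLASS})([十百千万])({DIGIT_CLASS})")


def digit_value(ch):
    # Accept either a Chinese numeral or an ASCII digit.
    return DIGITS[ch] if ch in DIGITS else int(ch)


def parse_abbreviated(text):
    """Return the integer for forms like 5万3, or None if the shape does not match."""
    match = ABBREVIATED.fullmatch(text)
    if match is None:
        return None
    head, unit, tail = match.groups()
    unit_value = UNITS[unit]
    # The trailing digit sits one decimal place below the unit.
    return digit_value(head) * unit_value + digit_value(tail) * unit_value // 10


for sample in ("5万3", "六千7", "6千七"):
    print(sample, parse_abbreviated(sample))
```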
Python 3.9.6 (tags/v3.9.6:db3ff76, Jun 28 2021, 15:26:21) [MSC v.1929 64 bit (AMD64)]
Type 'copyright', 'credits' or 'license' for more information
IPython 7.25.0 -- An enhanced Interactive Python. Type '?' for help.
ImportError Traceback (most recent call last)
in
----> 1 from chinese2digits import c2d
ImportError: cannot import name 'c2d' from 'chinese2digits' (c:\python\lib\site-packages\chinese2digits\__init__.py)
Hello,
floatResult, err := strconv.ParseFloat(convertResult, 32)
if err != nil {
panic(err)
}
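In the Go version, this code panics every time the value is out of range.
User input is hard to control. Could this be changed to return err, or to handle the failure some other way?
Test input:
10000000000000000000000000000000000000000000连
Result: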
panic: strconv.ParseFloat: parsing "10000000000000000000000000000000000000000000": value out of range
github.com/Wall-ee/chinese2digits/chinese2digits.convertDigitsStringToFloat(0xc000028900, 0x2c, 0x2c)
1|main | /root/go/pkg/mod/github.com/!wall-ee/[email protected]/chinese2digits/chinese2digits.go:298 +0x283
1|main | github.com/Wall-ee/chinese2digits/chinese2digits.ChineseToDigits(0xc000028900, 0x2c, 0x1, 0x832a80, 0x9688b0, 0x1, 0x1)
1|main | /root/go/pkg/mod/github.com/!wall-ee/[email protected]/chinese2digits/chinese2digits.go:336 +0xc6f
1|main | github.com/Wall-ee/chinese2digits/chinese2digits.TakeChineseNumberFromString(0xc0000288d0, 0x2c, 0xc001c5f570, 0x4, 0x4, 0xc001c5f588, 0x4)
1|main | /root/go/pkg/mod/github.com/!wall-ee/[email protected]/chinese2digits/chinese2digits.go:793 +0x5d2
1|main | github.com/Wall-ee/chinese2digits/chinese2digits.TakeNumberFromString(0xc0000288d0, 0x2c, 0x0, 0x0, 0x0, 0x2, 0x0)
1|main | /root/go/pkg/mod/github.com/!wall-ee/[email protected]/chinese2digits/chinese2digits.go:860 +0xff
Can you send me the Python code for chinese2digits