Comments (9)
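@DoubleAix What you're doing should be fine.
For your reference: my usual approach to Traditional Chinese has been to convert everything to Simplified Chinese inside the program before any other processing. Besides jieba, many other resources, such as word embeddings, mostly target Simplified Chinese, so converting everything at initialization avoids a lot of trouble later.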
from rasa_nlu_chi.
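The "convert everything up front" idea from the comment above could be sketched roughly as follows. This is only an illustration: `toy_t2s`, `_TOY_T2S`, and `normalize_corpus` are hypothetical names, and the three-character table is a stand-in for a real converter, not a usable Traditional-to-Simplified mapping.

```python
from typing import Callable, Iterable, List

# Toy table for illustration only (assumption): a real pipeline would use a
# full converter such as the OpenCC library instead of three characters.
_TOY_T2S = str.maketrans({"體": "体", "這": "这", "詞": "词"})


def toy_t2s(text: str) -> str:
    """Convert the few Traditional characters known to the toy table."""
    return text.translate(_TOY_T2S)


def normalize_corpus(
    texts: Iterable[str],
    convert: Callable[[str], str] = toy_t2s,
) -> List[str]:
    """Run every text through the converter once, before tokenization."""
    return [convert(text) for text in texts]


if __name__ == "__main__":
    print(normalize_corpus(["這是繁體"]))
```

Passing the converter in as a plain callable keeps the normalization step independent of any particular conversion library, so a full converter can be swapped in without touching the rest of the pipeline.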
@DoubleAix Was this issue resolved in the end? You're welcome to send a pull request to integrate the code.
import glob

import jieba

# Look for jieba's big main dictionary (dict.txt.big) in the current directory.
jieba_defaultdict = glob.glob("./*.big")
if len(jieba_defaultdict) == 0:
    print("No Jieba Default Dictionary found")
elif len(jieba_defaultdict) == 1:
    print("Setting Jieba Default Dictionary at " + str(jieba_defaultdict[0]))
    jieba.set_dictionary(jieba_defaultdict[0])
else:
    print("The number of Jieba Default Dictionaries has to be one only")
I'm not entirely clear on what you need. You should only have to put the corresponding single dictionary into ./jieba_userdict.
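Sorry, let me explain my situation.
jieba's default main dictionary is Simplified Chinese, but my data is Traditional Chinese, so I need to call jieba.set_dictionary to switch the main dictionary from dict.txt to dict.txt.big (so that common Traditional Chinese words are segmented correctly).
Only after that do I call jieba.load_userdict to add my own custom dictionary (so that common domain-specific terms are segmented correctly).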
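The two-step setup described above (main dictionary first, user dictionaries second) might look like the sketch below. `find_single_file` and `configure_jieba` are hypothetical helper names, not part of rasa_nlu_chi; the `./jieba_userdict` directory is taken from the thread, and the only jieba calls used are `jieba.set_dictionary` and `jieba.load_userdict`.

```python
import glob
import os
from typing import Optional


def find_single_file(directory: str, pattern: str) -> Optional[str]:
    """Return the one file matching pattern, None if absent, error if ambiguous."""
    matches = sorted(glob.glob(os.path.join(directory, pattern)))
    if len(matches) > 1:
        raise ValueError(
            "Expected at most one file matching %r in %r, found %d"
            % (pattern, directory, len(matches))
        )
    return matches[0] if matches else None


def configure_jieba(main_dict_dir: str = ".", user_dict_dir: str = "./jieba_userdict"):
    """Replace the main dictionary first, then add every user dictionary."""
    import jieba  # third-party; imported here so the helper above needs only stdlib

    main_dict = find_single_file(main_dict_dir, "*.big")
    if main_dict is not None:
        # Step 1: swap the default dict.txt for the *.big dictionary.
        jieba.set_dictionary(main_dict)

    # Step 2: layer the domain-specific dictionaries on top.
    for path in sorted(glob.glob(os.path.join(user_dict_dir, "*"))):
        jieba.load_userdict(path)
    return jieba
```

Raising an error when more than one `*.big` file is present, instead of only printing a message, is a design choice of this sketch so that a misconfigured directory fails loudly at startup.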
Would it be convenient to integrate this?
Word segmentation almost entirely determines the downstream performance.
Thanks!!
No, I'll try a pull request over the weekend; I've never contributed through GitHub before.
Also, I see that the rasa_nlu code has already integrated your tokenizer. How does it differ from the one here?
rasa_nlu integrated only the most basic tokenizer part. This repo still aims to try a few more things for Chinese NLU.
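While I'm at it, I'd also like to solve the problem of specifying the dictionary in the config file.
But @crownpku, your current version is '0.12.0a1', while the latest is '0.12.2'.
The config files differ greatly between the two.
Do you want to upgrade first, or should I overwrite it?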