import glob
import jieba
jieba_userdicts = glob.glob("./jieba_userdict/*")


A question about configuring dictionaries (rasa_nlu_chi issue, 9 comments, open)

crownpku commented on May 25, 2024

A question about configuring dictionaries

from rasa_nlu_chi.

Comments (9)

crownpku commented on May 25, 2024 1

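@DoubleAix Doing it that way should be fine.
For your reference: when I ran into Traditional Chinese before, my usual approach was to convert everything to Simplified Chinese in the program before any other processing. Besides jieba, many other resources, such as word embeddings, mostly target Simplified Chinese, so converting everything to Simplified at initialization saves a lot of trouble later on.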
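
To illustrate the "convert to Simplified first" idea, here is a minimal sketch. It assumes a tiny hand-written character table (`TRAD_TO_SIMP`, hypothetical and far from complete) purely to show where the normalization step sits; a real pipeline would presumably rely on a full conversion table from a dedicated tool rather than this toy mapping.

```python
# Toy Traditional -> Simplified table (an assumption for illustration only;
# it covers just a handful of characters).
TRAD_TO_SIMP = {
    "體": "体",
    "詞": "词",
    "設": "设",
    "簡": "简",
    "語": "语",
    "問": "问",
    "題": "题",
}

# Build the translation table once so it can be reused for every input.
_TABLE = str.maketrans(TRAD_TO_SIMP)


def to_simplified(text):
    """Replace every character found in the table; leave all others unchanged."""
    return text.translate(_TABLE)


def preprocess(text):
    """Normalize the script first, before tokenization or embedding lookup."""
    return to_simplified(text.strip())
```

The point of the design is only that normalization happens once, at the very start, so every later component sees a single script.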


crownpku commented on May 25, 2024 1

@DoubleAix Was this issue resolved in the end? You are welcome to send a pull request to integrate the code.


DoubleAix commented on May 25, 2024

import glob
import jieba

# Look for a replacement main dictionary (e.g. dict.txt.big) in the current directory.
jieba_defaultdict = glob.glob("./*.big")
if len(jieba_defaultdict) == 0:
    print("No Jieba Default Dictionary found")
elif len(jieba_defaultdict) == 1:
    print("Setting Jieba Default Dictionary at " + str(jieba_defaultdict[0]))
    jieba.set_dictionary(jieba_defaultdict[0])
else:
    print("The number of Jieba Default Dictionaries has to be one only")


crownpku commented on May 25, 2024

I don't quite understand your requirement. You should only need to put the corresponding single dictionary into ./jieba_userdict.


DoubleAix commented on May 25, 2024

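Sorry, let me explain my situation.
jieba's default main dictionary is Simplified Chinese, but I work with Traditional Chinese, so I need to use jieba.set_dictionary to switch the main dictionary from dict.txt to dict.txt.big (so that common Traditional Chinese words are segmented correctly).
Only after that do I use jieba.load_userdict to add my own custom dictionaries (so that words common in a specific domain are segmented correctly).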
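
The two-step order described above (main dictionary first, user dictionaries second) can be sketched roughly as follows. The helper names `find_main_dictionary`, `find_user_dictionaries`, and `configure_jieba` are hypothetical and not part of rasa_nlu_chi; the directory layout (`./*.big` and `./jieba_userdict/*`) is taken from the snippets in this thread. Only `jieba.set_dictionary` and `jieba.load_userdict` are actual jieba calls.

```python
import glob
import os


def find_main_dictionary(search_dir="."):
    """Return the single *.big file in search_dir, or None unless exactly one exists."""
    candidates = sorted(glob.glob(os.path.join(search_dir, "*.big")))
    return candidates[0] if len(candidates) == 1 else None


def find_user_dictionaries(userdict_dir="./jieba_userdict"):
    """Return all regular files in the user-dictionary directory, sorted by path."""
    paths = glob.glob(os.path.join(userdict_dir, "*"))
    return sorted(p for p in paths if os.path.isfile(p))


def configure_jieba(main_dict, user_dicts):
    """Switch the main dictionary first, then layer the custom dictionaries on top."""
    import jieba  # imported lazily so the helpers above work without jieba installed

    if main_dict is not None:
        jieba.set_dictionary(main_dict)  # e.g. dict.txt.big for Traditional Chinese
    for path in user_dicts:
        jieba.load_userdict(path)  # domain-specific words
```

Calling `configure_jieba(find_main_dictionary(), find_user_dictionaries())` once at startup, before any text is segmented, would apply both steps in the order described.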


DoubleAix commented on May 25, 2024

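Would it be convenient to integrate this, then?
The word segmentation largely determines how well everything downstream performs.

Thanks!!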


DoubleAix commented on May 25, 2024

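Not yet. I'll try making a pull request over the weekend; I haven't contributed code on GitHub before.

Also, I see that the rasa_nlu code has already integrated your tokenizer. How does it differ from the one here?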


crownpku commented on May 25, 2024

rasa_nlu integrated only the most basic tokenizer part. With this repo I still hope to try more things for Chinese NLU.


DoubleAix commented on May 25, 2024

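While I'm at it, I'd like to solve the problem of specifying the dictionaries in the config file.
But @crownpku, your current version is '0.12.0a1' and the latest is '0.12.2'.
The config files differ greatly between the two.

Do you want to upgrade first, or should I overwrite it?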

