Comments (9)

crownpku avatar crownpku commented on May 25, 2024 1

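@DoubleAix That approach should work fine.
For reference: when I ran into Traditional Chinese before, my usual approach was to convert everything to Simplified Chinese in the program before any other processing. Besides jieba, many other resources, such as word embeddings, mostly target Simplified Chinese; converting everything to Simplified at the initialization stage saves a lot of trouble later.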
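A minimal sketch of the "convert first, then process" idea described above. The mapping table here is a toy assumption covering only a handful of characters, and the names `to_simplified` and `preprocess` are hypothetical; in practice a dedicated conversion library with a full character table would take the place of `_T2S_TABLE`.

```python
# Toy Traditional -> Simplified table, for illustration only.
# A real table covers thousands of characters and multi-character phrases.
_T2S_TABLE = str.maketrans({
    "這": "这",
    "應": "应",
    "該": "该",
    "沒": "没",
    "問": "问",
    "題": "题",
    "體": "体",
    "詞": "词",
})


def to_simplified(text: str) -> str:
    """Map Traditional characters to Simplified ones using the toy table."""
    return text.translate(_T2S_TABLE)


def preprocess(text: str) -> str:
    """Normalize the script first so every later step sees Simplified text."""
    return to_simplified(text.strip())
```

Calling `preprocess` once at the entry point means the tokenizer, embeddings, and any other downstream resource all receive the same script.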

from rasa_nlu_chi.

crownpku avatar crownpku commented on May 25, 2024 1

@DoubleAix Was this issue resolved in the end? You are welcome to send a pull request to integrate the code.

DoubleAix avatar DoubleAix commented on May 25, 2024

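import glob
import jieba

# Look for a replacement main dictionary (e.g. dict.txt.big) in the current directory.
jieba_defaultdict = glob.glob("./*.big")
if len(jieba_defaultdict) == 0:
    print("No Jieba Default Dictionary found")
elif len(jieba_defaultdict) == 1:
    print("Setting Jieba Default Dictionary at " + str(jieba_defaultdict[0]))
    jieba.set_dictionary(jieba_defaultdict[0])
else:
    # More than one candidate is ambiguous, so none is applied.
    print("The number of Jieba Default Dictionaries has to be one only")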

crownpku avatar crownpku commented on May 25, 2024

I don't quite understand your requirement. You should only need to put the corresponding single dictionary in ./jieba_userdict.

DoubleAix avatar DoubleAix commented on May 25, 2024

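Sorry, let me explain my situation.
The default jieba main dictionary is Simplified Chinese, but my text is Traditional Chinese, so I need to use jieba.set_dictionary to switch the main dictionary from dict.txt to dict.txt.big (so that common Traditional Chinese words are segmented correctly).
Only after that do I use jieba.load_userdict to add my own custom dictionary (so that common domain-specific words are segmented correctly).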
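The two-step order described above (main dictionary first, custom dictionaries second) can be sketched as follows. The helper names `find_main_dictionary` and `configure_tokenizer` are hypothetical and not part of jieba or this repo; the only assumption about the `tokenizer` argument is that it exposes `set_dictionary` and `load_userdict`, as the `jieba` module does.

```python
import glob
import os


def find_main_dictionary(directory, pattern="*.big"):
    """Return the single matching dictionary path, or None if there is none."""
    matches = sorted(glob.glob(os.path.join(directory, pattern)))
    if len(matches) > 1:
        raise ValueError("Expected at most one main dictionary, found: %r" % matches)
    return matches[0] if matches else None


def configure_tokenizer(tokenizer, main_dict, user_dicts):
    """Apply the main dictionary first, then each custom user dictionary."""
    if main_dict is not None:
        tokenizer.set_dictionary(main_dict)
    for path in user_dicts:
        tokenizer.load_userdict(path)
```

With jieba installed, a call might look like `configure_tokenizer(jieba, find_main_dictionary("."), glob.glob("./jieba_userdict/*"))`. Raising on multiple matches, rather than printing a message, is a design choice made here so that an ambiguous setup fails loudly.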

DoubleAix avatar DoubleAix commented on May 25, 2024

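Would it be convenient to integrate this?
The word segmentation pretty much determines the performance of everything downstream.

Thanks!!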

DoubleAix avatar DoubleAix commented on May 25, 2024

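No. I'll try a pull request over the weekend; I haven't done an integration on GitHub before.

Also, I see that the rasa_nlu code has already integrated your tokenizer. How is it different from the one here?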

crownpku avatar crownpku commented on May 25, 2024

rasa_nlu integrated only the most basic tokenizer part. This repo still aims to try out more things for Chinese NLU.

DoubleAix avatar DoubleAix commented on May 25, 2024

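I'd also like to solve the problem of specifying the dictionary in the config file along the way.
@crownpku Your current version is '0.12.0a1', and the latest is '0.12.2'.
The two differ greatly in their config files.
[image]
Do you want to upgrade first, or should I overwrite it?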
