cc0-sentences's People

Contributors

dennis-goldcardtw, flyinglimao, irvin, samittan, supaplextw, tongcydai, typingmonk

cc0-sentences's Issues

Clean up duplicate entries in the ChhoeTaigi data file

https://github.com/moztw/cc0-sentences/blob/master/nan-TW/ChhoeTaigi_iTaigiHoataiTuichiautian_尚待清理.txt

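The file above currently contains roughly 600 duplicate entries, listed in:
重複詞.txt

They need to be processed as follows:

  • Remove homophone entries: if the duplicates differ only in their romanization, such as the presence or absence of hyphens or tone marks, keep the entry with the more correct romanization and delete the others.
  • Merge entries with multiple pronunciations: if the duplication arises because one word has several pronunciations (including accent or pronunciation variants), merge them into a single line in the format "xxx (pronunciation1 | pronunciation2 | pronunciation3)".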
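
The two rules above could be sketched roughly as follows. This is only an illustration under assumptions: it assumes each line of the data file looks like "xxx (romanization)", which the issue does not confirm, and the names `skeleton`, `group_entries` and `process`, as well as the sample lines, are made up for this example and not taken from the repository. Choosing the "more correct" romanization needs human judgment, so the sketch only flags those cases for review instead of deleting anything.

```python
import re
import unicodedata
from collections import defaultdict

# Assumed line format: "<text> (<romanization>)". Hypothetical, not confirmed by the issue.
ENTRY = re.compile(r"^(?P<text>.+?)\s*\((?P<pron>[^()]*)\)\s*$")
# U+0358 marks a different vowel in POJ rather than a tone, so it is kept.
KEEP = {"\u0358"}


def skeleton(pron):
    """Strip tone marks, hyphens and spaces so homophone spellings compare equal."""
    decomposed = unicodedata.normalize("NFD", pron)
    kept = [c for c in decomposed if c in KEEP or not unicodedata.combining(c)]
    return "".join(kept).replace("-", "").replace(" ", "").lower()


def group_entries(lines):
    """Map each text to the distinct romanizations seen for it, in file order."""
    groups = defaultdict(list)
    for line in lines:
        match = ENTRY.match(line.strip())
        if match:
            pron = match.group("pron").strip()
            if pron not in groups[match.group("text")]:
                groups[match.group("text")].append(pron)
    return groups


def process(lines):
    """Return (merged_lines, needs_review).

    needs_review holds entries whose spellings differ only by tone marks or
    hyphens; a person still has to pick the more correct one.
    """
    merged, needs_review = [], []
    for text, prons in group_entries(lines).items():
        distinct_sounds = {skeleton(p) for p in prons}
        if len(distinct_sounds) < len(prons):
            needs_review.append((text, prons))
        else:
            merged.append(f"{text} ({' | '.join(prons)})")
    return merged, needs_review


# Made-up sample lines for illustration only.
sample = [
    "食飯 (chia̍h-pn̄g)",
    "食飯 (chiah-png)",
    "人 (lâng)",
    "人 (jîn)",
    "好 (hó)",
]
merged, needs_review = process(sample)
```

With this sample, the two spellings of 食飯 end up in `needs_review`, while 人 is merged into a single line with both pronunciations separated by " | ".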

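Help curate sentences from the g0v Slack #rand0m channel

Split the chatlog into individual sentences, de-identify them, remove obscure or difficult sentences, shuffle the order, and then commit the result to the repo.

For the curation standard, refer to the sentences already in the current file.
The sentence corpus currently covers messages up to 2019/6. Please claim one month at a time.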
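
A first automated pass over the steps described in this issue might look like the sketch below. It is only a rough starting point under assumptions: the input is assumed to be a list of message strings, de-identification is approximated by dropping any sentence that contains a Slack-style mention or a URL, and "obscure or difficult" is approximated by a hypothetical `max_len` character limit. The function names are invented for this example, and a manual review against the sentences already in the file would still be needed.

```python
import random
import re

# Split after CJK/ASCII sentence-ending punctuation, or on line breaks.
SENTENCE_END = re.compile(r"(?<=[。!?!?])\s*|\n+")
# Slack-style user mentions such as <@U024BE7LH>, plus plain @name handles.
MENTION = re.compile(r"<@[A-Z0-9]+>|@\w+")
URL = re.compile(r"https?://\S+")


def split_sentences(message):
    """Break one chat message into trimmed, non-empty sentences."""
    return [part.strip() for part in SENTENCE_END.split(message) if part and part.strip()]


def prepare(messages, seed=None, max_len=30):
    """Split, filter, de-duplicate and shuffle sentences from chat messages.

    max_len is a hypothetical cutoff standing in for human judgment about
    which sentences are too long or difficult.
    """
    seen = set()
    sentences = []
    for message in messages:
        for sentence in split_sentences(message):
            if MENTION.search(sentence) or URL.search(sentence):
                continue  # may identify a person or a source, so drop it
            if len(sentence) > max_len:
                continue
            if sentence not in seen:
                seen.add(sentence)
                sentences.append(sentence)
    random.Random(seed).shuffle(sentences)  # break up the conversation order
    return sentences
```

Passing a fixed `seed` makes the shuffle reproducible, which can help when the output of a claimed month is reviewed before committing.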
