KeSi

A Taiwanese (Tâi-bûn) NLP toolkit.

Installation

pip install KeSi

Usage

Ku, TuiBeTse, normalize_taibun, kam_haphuat, PIAUTIAM

Ku

Parses Taiwanese text and converts it between writing systems.

class Ku(hanlo=None, lomaji=None)

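Creates a Taiwanese sentence and provides operations on it. hanlo is the main Taiwanese text; mixed Han characters and romanization (Hàn-lô), all Han characters, and all romanization are all accepted. If the text has a fully romanized counterpart, pass it as lomaji; the whole sentence is then segmented into words and marked for neutral tone according to lomaji. If hanlo and lomaji differ in character count, a TuiBeTse exception is raised.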
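To make the pairing rule concrete, here is a minimal sketch (not kesi's implementation) of aligning hanlo with lomaji and failing when the counts differ. It assumes that each Han character and each romanized syllable counts as one unit, with hyphens, spaces and punctuation acting as separators; TuiBeTseSketch, syllables and align are hypothetical names.

```python
import re
import unicodedata


class TuiBeTseSketch(Exception):
    """Hypothetical stand-in for kesi's TuiBeTse: raised when the counts differ."""


# Assumed counting unit: one Han character (CJK Unified Ideographs, Ext. A and B),
# or one run of Latin letters, combining tone marks, superscript n and tone digits.
SYLLABLE = re.compile(
    r"[\u3400-\u9fff\U00020000-\U0002a6df]"
    r"|[0-9A-Za-z\u00c0-\u024f\u0300-\u036f\u1e00-\u1eff\u207f]+"
)


def syllables(text: str) -> list[str]:
    """Split text into syllables; hyphens, spaces and punctuation are separators."""
    return SYLLABLE.findall(unicodedata.normalize("NFC", text))


def align(hanlo: str, lomaji: str) -> list[tuple[str, str]]:
    """Pair each hanlo syllable with its lomaji syllable, or raise on a mismatch."""
    han, lo = syllables(hanlo), syllables(lomaji)
    if len(han) != len(lo):
        raise TuiBeTseSketch(f"hanlo has {len(han)} syllables, lomaji has {len(lo)}")
    return list(zip(han, lo))


print(align("我是董--ê。", "Guá sī táng--ê."))
# [('我', 'Guá'), ('是', 'sī'), ('董', 'táng'), ('ê', 'ê')]
```

Passing "Guá táng--ê." instead would raise TuiBeTseSketch, because the Han side has four units and the romanized side only three.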

hanji

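The normalized Taiwanese text, with whitespace cleaned up, Unicode NFC applied, and Ministry of Education custom character codes replaced by standard Unicode code points. Every neutral-tone word carries the neutral-tone marker.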

lomaji

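The normalized romanization, with whitespace cleaned up, Unicode NFC applied, and Ministry of Education custom character codes replaced by standard Unicode code points. Every neutral-tone word carries the neutral-tone marker.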

kiphanlo

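The normalized Taiwanese text, with whitespace cleaned up, Unicode NFC applied, and Ministry of Education custom character codes replaced by standard Unicode code points. Neutral-tone words whose first character is a Han character carry no neutral-tone marker.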

KIP(), TL()

Converts to the official Ministry of Education romanization (KIP).

KIP with numeric tones to KIP:

>>> from kesi import Ku
>>> Ku("Gâu5-tsa2").KIP().hanlo
'Gâu-tsá'

POJ to KIP:

>>> from kesi import Ku
>>> Ku("Gâu-chá").KIP().hanlo
'Gâu-tsá'

Han characters, hyphens, and neutral-tone markers are all preserved:

>>> from kesi import Ku
>>> Ku("看--起-來chiâⁿ媠。").KIP().hanlo
'看--起-來tsiânn媠。'

Changelog: up to version 1.4.3, the POJ-to-KIP function was named TL(); since 1.5.0 it is named KIP(). Both names are currently supported, and KIP() will eventually replace TL().
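The numeric-tone examples above hinge on placing a diacritic on the right letter. Below is a minimal sketch of that one step for plain ASCII Tâi-lô syllables, assuming the standard Tâi-lô tone diacritics and a simplified placement rule (a, then o, then e, then the last of i/u, then a syllabic nasal). The function names are hypothetical, not kesi's API, and input that already carries a diacritic (such as Gâu5) is out of scope.

```python
import unicodedata

# Assumed Tâi-lô tone diacritics (combining characters); tones 1 and 4 are unmarked.
TONE_MARKS = {
    "1": "", "2": "\u0301", "3": "\u0300", "4": "",
    "5": "\u0302", "6": "\u030c", "7": "\u0304", "8": "\u030d", "9": "\u030b",
}


def mark_position(body: str) -> int:
    """Index of the letter that carries the tone mark (simplified placement rule)."""
    low = body.lower()
    for vowel in ("a", "o", "e"):
        if vowel in low:
            return low.index(vowel)  # for "oo" this is the first o
    last_iu = max(low.rfind("i"), low.rfind("u"))  # iu -> u, ui -> i
    if last_iu >= 0:
        return last_iu
    for nasal in ("ng", "m"):  # syllabic nasals such as ng and m
        if nasal in low:
            return low.index(nasal)
    raise ValueError(f"no vowel or syllabic nasal in {body!r}")


def numeric_to_diacritic(syllable: str) -> str:
    """Convert one ASCII numeric-tone syllable, e.g. 'tsa2' -> 'tsá'."""
    if not syllable or not syllable[-1].isdigit():
        return syllable  # no tone number: leave as-is
    body, tone = syllable[:-1], syllable[-1]
    if tone not in TONE_MARKS:
        # Name the offending syllable instead of failing with a bare KeyError.
        raise ValueError(f"unknown tone number {tone!r} in syllable {syllable!r}")
    pos = mark_position(body)
    marked = body[: pos + 1] + TONE_MARKS[tone] + body[pos + 1:]
    return unicodedata.normalize("NFC", marked)


def convert_word(word: str) -> str:
    """Convert every hyphen-separated syllable; '--' neutral-tone markers survive."""
    return "-".join(numeric_to_diacritic(s) for s in word.split("-"))


print(convert_word("Gau5-tsa2"))  # Gâu-tsá
```

In this sketch an unsupported digit such as 0 raises a ValueError that names the syllable, which makes bad input easy to locate.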

POJ()

Converts to Pe̍h-ōe-jī (POJ).

KIP to POJ:

>>> from kesi import Ku
>>> Ku("Gâu-tsá").POJ().hanlo
'Gâu-chá'

Han characters, hyphens, and neutral-tone markers are all preserved:

>>> from kesi import Ku
>>> Ku("看--起-來tsiânn媠。").POJ().hanlo
'看--起-來chiâⁿ媠。'

POJ with numeric tones to POJ:

>>> from kesi import Ku
>>> Ku("Gâu5-cha2").POJ().hanlo
'Gâu-chá'
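Much of KIP-to-POJ conversion is respelling. Here is a simplified sketch, assuming a handful of well-known Tâi-lô to POJ correspondences (ts→ch, tsh→chh, oo→o͘, ua→oa, ue→oe, ing→eng, ik→ek, final nn→ⁿ) applied to toneless syllables only. POJ also places tone marks differently, which the sketch does not attempt; kip_to_poj_spelling is a hypothetical name.

```python
import re

# Simplified, assumed Tâi-lô -> POJ spelling rules for a toneless syllable,
# applied in order (tsh must come before ts).
RULES = [
    (r"^tsh", "chh"),
    (r"^ts", "ch"),
    (r"oo", "o\u0358"),         # oo -> o͘ (o + combining dot above right)
    (r"ua", "oa"),
    (r"ue", "oe"),
    (r"ing$", "eng"),
    (r"ik$", "ek"),
    (r"nn(h?)$", "\u207f\\1"),  # final nasal nn -> ⁿ, also before a final h
]


def kip_to_poj_spelling(syllable: str) -> str:
    """Respell one toneless Tâi-lô syllable in POJ, e.g. 'tsiann' -> 'chiaⁿ'."""
    result = syllable.lower()
    for pattern, repl in RULES:
        result = re.sub(pattern, repl, result)
    if syllable[:1].isupper():  # keep sentence-initial capitals
        result = result[:1].upper() + result[1:]
    return result


print([kip_to_poj_spelling(s) for s in ["Tsa", "tsiann", "tshuan"]])
# ['Cha', 'chiaⁿ', 'chhoan']
```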

iter()

Returns an iterator over all Su (word) objects in the sentence.

len()

Returns the number of Su objects in the sentence.

thianji()

Returns an iterator over all Ji (character) objects in the sentence.
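The iter(), len() and thianji() methods describe a three-level hierarchy: a Ku holds Su (words), and each Su holds Ji (characters). A minimal sketch of that structure with dataclasses follows; the classes and the ku_from_lomaji builder are hypothetical, work on romanization only, and ignore punctuation.

```python
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class JiSketch:
    """One syllable/character (hypothetical stand-in for kesi's Ji)."""
    lomaji: str


@dataclass
class SuSketch:
    """One word: an ordered list of JiSketch (stand-in for kesi's Su)."""
    ji: list[JiSketch] = field(default_factory=list)

    def __iter__(self) -> Iterator[JiSketch]:
        return iter(self.ji)

    def __len__(self) -> int:
        return len(self.ji)


@dataclass
class KuSketch:
    """One sentence: an ordered list of SuSketch (stand-in for kesi's Ku)."""
    su: list[SuSketch] = field(default_factory=list)

    def __iter__(self) -> Iterator[SuSketch]:  # iter(): the words
        return iter(self.su)

    def __len__(self) -> int:  # len(): number of words
        return len(self.su)

    def thianji(self) -> Iterator[JiSketch]:  # every Ji, across all words
        for su in self.su:
            yield from su


def ku_from_lomaji(text: str) -> KuSketch:
    """Hypothetical builder: words split on whitespace, syllables on hyphens."""
    words = []
    for word in text.split():
        parts = [p for p in word.split("-") if p]  # '--' leaves an empty part
        words.append(SuSketch([JiSketch(p) for p in parts]))
    return KuSketch(words)


ku = ku_from_lomaji("Guá sī táng--ê")
print(len(ku), [j.lomaji for j in ku.thianji()])  # 3 ['Guá', 'sī', 'táng', 'ê']
```

The neutral-tone syllable stays inside the word it follows, so "táng--ê" is one Su holding two Ji.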

class Su

hanji

The normalized Taiwanese text. Every neutral-tone word carries the neutral-tone marker.

lomaji

The normalized romanization. Every neutral-tone word carries the neutral-tone marker.

kiphanlo

The normalized Taiwanese text. Neutral-tone words whose first character is a Han character carry no neutral-tone marker.

KIP(), TL()

Converts to the official Ministry of Education romanization (KIP).

Changelog: up to version 1.4.3, the POJ-to-KIP function was named TL(); since 1.5.0 it is named KIP(). Both names are currently supported, and KIP() will eventually replace TL().

POJ()

Converts to Pe̍h-ōe-jī (POJ).

iter()

Returns an iterator over all Ji (character) objects in the word.

len()

Returns the number of Ji objects in the word.

class Ji

hanji

The normalized Taiwanese text. Every neutral-tone word carries the neutral-tone marker.

lomaji

The normalized romanization. Every neutral-tone word carries the neutral-tone marker.

kiphanlo

The normalized Taiwanese text. Neutral-tone words whose first character is a Han character carry no neutral-tone marker.

KIP(), TL()

Converts to the official Ministry of Education romanization (KIP).

Changelog: up to version 1.4.3, the POJ-to-KIP function was named TL(); since 1.5.0 it is named KIP(). Both names are currently supported, and KIP() will eventually replace TL().

POJ()

Converts to Pe̍h-ōe-jī (POJ).

class TuiBeTse

The exception raised by Ku(hanlo, lomaji) when hanlo and lomaji differ in character count.

def normalize_taibun(taibun)

Applies Unicode NFC and replaces Ministry of Education custom character codes with standard Unicode code points.

>>> from kesi import normalize_taibun
>>> normalize_taibun('a\u0301') == '\u00e1'
True
>>> normalize_taibun('\u00e1') == '\u00e1'
True
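Here is a minimal sketch of the two steps normalize_taibun is documented to perform, assuming the private-use replacement runs before NFC so that a replacement can itself be composed. The mapping table is a hypothetical empty placeholder; the real Ministry of Education code points are not reproduced here, and normalize_sketch is not kesi's function.

```python
import unicodedata
from collections.abc import Mapping

# Hypothetical placeholder: the real Ministry of Education private-use-area
# mapping is not reproduced here and would have to come from an official table.
MOE_PUA_TO_UNICODE: dict[str, str] = {}


def normalize_sketch(text: str, pua_table: Mapping[str, str] = MOE_PUA_TO_UNICODE) -> str:
    """Swap private-use characters for standard ones, then apply Unicode NFC."""
    swapped = "".join(pua_table.get(ch, ch) for ch in text)
    return unicodedata.normalize("NFC", swapped)


print(normalize_sketch("a\u0301") == "\u00e1")  # True
```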

def kam_haphuat(tsit_ji_lomaji)

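Checks whether tsit_ji_lomaji is a valid syllable in Ministry of Education romanization or in POJ. Numeric tones, tone diacritics, and the traditional Ministry of Education spelling are all accepted as valid.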

>>> from kesi import kam_haphuat
>>> kam_haphuat('tsiânn')
True
>>> kam_haphuat('tsiann5')
True
>>> kam_haphuat('chiâⁿ')
True
>>> kam_haphuat('tsiâⁿ')
True

PIAUTIAM

A set() of half-width and full-width punctuation marks.
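Here is a sketch of how such a set could be assembled, assuming ASCII punctuation, its full-width counterparts, and a few CJK marks. The contents of kesi's actual PIAUTIAM may differ; PIAUTIAM_SKETCH and strip_punctuation are hypothetical.

```python
import string

HALF_WIDTH = set(string.punctuation)  # ASCII punctuation
# Full-width forms U+FF01-U+FF5E sit at a fixed offset of 0xFEE0 from ASCII.
FULL_WIDTH = {chr(ord(c) + 0xFEE0) for c in string.punctuation}
# A few common CJK marks; an illustrative selection, not an exhaustive list.
CJK_MARKS = set("、。「」『』《》〈〉…")
PIAUTIAM_SKETCH = HALF_WIDTH | FULL_WIDTH | CJK_MARKS


def strip_punctuation(text: str) -> str:
    """Drop punctuation, but keep '-' because it joins romanized syllables."""
    return "".join(ch for ch in text if ch == "-" or ch not in PIAUTIAM_SKETCH)


print(strip_punctuation("我是Tâi-gí ê ke-si。"))  # 我是Tâi-gí ê ke-si
```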

Other

Counting characters

$ echo '我是Tâi-gí ê ke-si' | python le/sng_jisoo.py
# 字數= 7
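Here is a minimal sketch of the counting rule the example implies: each Han character and each romanized syllable counts once, which gives 7 here. It is not the code in le/sng_jisoo.py; count_characters and the character ranges are assumptions.

```python
import re
import unicodedata

# Assumed rule: each Han character counts once, and each romanized syllable
# (a run of Latin letters, tone marks or tone digits) counts once.
UNIT = re.compile(
    r"[\u3400-\u9fff\U00020000-\U0002a6df]"
    r"|[0-9A-Za-z\u00c0-\u024f\u0300-\u036f\u1e00-\u1eff\u207f]+"
)


def count_characters(text: str) -> int:
    """Count Han characters plus romanized syllables in text."""
    return len(UNIT.findall(unicodedata.normalize("NFC", text)))


print("字數=", count_characters("我是Tâi-gí ê ke-si"))  # 字數= 7
```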

Development

tox -e behave

Contributors

sih4sing5hong5, niauah, a8568730

kesi's Issues

Paired texts have different character counts

The Han-character side is longer:

我是董--ê。
Guá táng--ê.

The romanized side is longer:

我董--ê。
Guá sī táng--ê.

Ku('bere3').POJ() places the tone mark as in bère

>>> from kesi import Ku
>>> Ku('bere3').TL().lomaji
'berè'

>>> Ku('bere3').POJ().lomaji
'bère'

The same happens with iri:

>>> Ku('Chhirinn2').TL().lomaji
'Tshirínn'

>>> Ku('Chhirinn2').POJ().lomaji
'Chhíriⁿ'

Fix the bdist_wheel configuration

#32 concluded that a wheel is needed, but although CI is now configured for it, no wheel is produced.

CI configuration

distributions: sdist bdist_wheel

Result

https://travis-ci.com/github/i3thuan5/KeSi/jobs/484454597#L304-L336

Deploying application
running sdist
running check
warning: sdist: standard file not found: should have one of README, README.txt
reading manifest template 'MANIFEST.in'
writing manifest file 'MANIFEST'
creating KeSi-1.1.0
creating KeSi-1.1.0/kesi
creating KeSi-1.1.0/kesi/butkian
creating KeSi-1.1.0/kesi/susia
making hard links in KeSi-1.1.0...
hard linking LICENSE -> KeSi-1.1.0
hard linking README.md -> KeSi-1.1.0
hard linking panpun.py -> KeSi-1.1.0
hard linking setup.py -> KeSi-1.1.0
hard linking kesi/__init__.py -> KeSi-1.1.0/kesi
hard linking kesi/butkian/__init__.py -> KeSi-1.1.0/kesi/butkian
hard linking kesi/butkian/ji.py -> KeSi-1.1.0/kesi/butkian
hard linking kesi/butkian/kongiong.py -> KeSi-1.1.0/kesi/butkian
hard linking kesi/butkian/ku.py -> KeSi-1.1.0/kesi/butkian
hard linking kesi/butkian/su.py -> KeSi-1.1.0/kesi/butkian
hard linking kesi/susia/POJ.py -> KeSi-1.1.0/kesi/susia
hard linking kesi/susia/TL.py -> KeSi-1.1.0/kesi/susia
hard linking kesi/susia/__init__.py -> KeSi-1.1.0/kesi/susia
hard linking kesi/susia/kongke.py -> KeSi-1.1.0/kesi/susia
hard linking kesi/susia/pio.py -> KeSi-1.1.0/kesi/susia
creating dist
Creating tar archive
removing 'KeSi-1.1.0' (and everything under it)
Uploading distributions to https://upload.pypi.org/legacy/
Uploading KeSi-1.1.0.tar.gz
100%|██████████| 12.8k/12.8k [00:01<00:00, 11.9kB/s]

Ku('一', 'tsi̍t') cannot be converted with TL()

Hàn-lô text paired with romanization should be convertible to other writing systems.

>>> Ku('一', 'tsi̍t').TL()

Exception of tsuanTL():   tsit-khuán im-tsiat

Building a Ku from pre-segmented words

What is word segmentation?

逐-家|tak8-ke1 做-伙|tso3-hue2 來|lai5 𨑨-迌|tshit4-tho5

Why is word segmentation needed?

Translation and recognition both need the Han/romanization correspondence of each individual word.
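Assuming the hanlo|lomaji format shown above, here is a minimal sketch of reading pre-segmented input into word pairs; parse_segmented is a hypothetical name. Checking syllable counts per word shows the benefit of segmentation: a mismatch is reported for one specific word rather than for the whole sentence.

```python
def parse_segmented(line: str) -> list[tuple[str, str]]:
    """Split 'hanlo|lomaji' tokens into word pairs, checking each word's syllables."""
    pairs = []
    for token in line.split():
        hanlo, sep, lomaji = token.partition("|")
        if not sep:
            raise ValueError(f"missing '|' in {token!r}")
        # Segmentation lets a count mismatch be reported for one specific word.
        if len(hanlo.split("-")) != len(lomaji.split("-")):
            raise ValueError(f"syllable counts differ in word {token!r}")
        pairs.append((hanlo, lomaji))
    return pairs


print(parse_segmented("逐-家|tak8-ke1 做-伙|tso3-hue2 來|lai5 𨑨-迌|tshit4-tho5"))
# [('逐-家', 'tak8-ke1'), ('做-伙', 'tso3-hue2'), ('來', 'lai5'), ('𨑨-迌', 'tshit4-tho5')]
```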

Building a sentence object from words

  Scenario: 藉詞建立句物件
    Given 兩句 <hanlo> kah <lomaji> 做伙建立一 ê 句物件
     Then hanlo是 <kiatko_hanlo>
      And 選擇   "2" ê 詞斷做兩句
     Then 第1句taibun是 "我是"
      And 第2句taibun是 "Ke-si"

The digit 0 is treated as a tone number

ku = Ku(句物件.看語句(), 句物件.看音()).TL()
  File "/usr/local/lib/python3.8/site-packages/kesi/butkian/ku.py", line 156, in TL
    sin_ku.thiam(su.TL())
  File "/usr/local/lib/python3.8/site-packages/kesi/butkian/su.py", line 85, in TL
    sin_su.thiam(ji.TL())
  File "/usr/local/lib/python3.8/site-packages/kesi/butkian/ji.py", line 45, in TL
    tsuanTL(hanlo), tsuanTL(lomaji),
  File "/usr/local/lib/python3.8/site-packages/kesi/susia/TL.py", line 10, in tsuanTL
    siann, un, tiau, tuasiosia = thiah(bun)
  File "/usr/local/lib/python3.8/site-packages/kesi/susia/POJ.py", line 60, in thiah
    siannun, tiau = theh_sianntiau(lomaji)
  File "/usr/local/lib/python3.8/site-packages/kesi/susia/POJ.py", line 76, in theh_sianntiau
    return nfd[:-1], tiauho_tsuan_tiauhu(nfd[-1])
  File "/usr/local/lib/python3.8/site-packages/kesi/susia/POJ.py", line 127, in tiauho_tsuan_tiauhu
    return TIAUHO_TIAUHU_PIO[tiauho]
KeyError: '0'

Exception of tsuanTL(): Bô tsit-khuán im-tsiat should show which syllable is at fault

This warning gives no clue which syllable is wrong...

$ python -m unittest tshi.test詞表試驗.Supio.test_有小寫
Exception of tsuanTL():  Bô tsit-khuán im-tsiat
E
======================================================================
ERROR: test_有小寫 (tshi.test詞表試驗.Supio)
----------------------------------------------------------------------
...
----------------------------------------------------------------------
Ran 1 test in 0.005s
