file name | comment |
---|---|
data/big5_2003-b2u.txt | 19,582 characters (Big5-2003 to Unicode) |
data/big5-bopomofo.txt | 13,053 characters; table mapping Big5 characters to Bopomofo (Zhuyin) |
data/big5-bopomofo-u8.txt | 13,053 characters; the same Big5-to-Bopomofo table, saved as UTF-8 |
data/bpmf.dat | written by zh2bpmf.pl |
data/cp950-b2u.txt | Windows code page 950 (Big5) to Unicode table (13,503 entries) |
data/gb2312-normalized-pinyin.txt | GB2312 (6,727 entries) |
big5.txt | 13,739 characters |
unihan.txt | 13,062 Big5 characters extracted from it (kBigFive field) *1 |
diff-out.txt | differences between cp950-b2u.txt and the kBigFive field (440) |
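
The diff-out.txt entry compares the CP950 table with Unihan's kBigFive field. The following is only a sketch of such a comparison, not the script that produced diff-out.txt. It assumes kBigFive lines in Unihan's tab-separated form (`U+4E00<TAB>kBigFive<TAB>A440`) and uses Python's built-in `cp950` codec as a stand-in for cp950-b2u.txt, which may not match that file byte for byte. The function names `parse_kbigfive` and `diff_against_cp950` are hypothetical.

```python
def parse_kbigfive(lines):
    """Build {Big5 code: character} from Unihan kBigFive data lines."""
    table = {}
    for line in lines:
        parts = line.rstrip("\r\n").split("\t")
        # Data lines look like U+4E00<TAB>kBigFive<TAB>A440; skip comments and other fields.
        if len(parts) != 3 or not parts[0].startswith("U+") or parts[1] != "kBigFive":
            continue
        table[int(parts[2], 16)] = chr(int(parts[0][2:], 16))
    return table


def diff_against_cp950(table):
    """List (Big5 code, Unihan char, cp950 char or None) where the two disagree."""
    diffs = []
    for code, ch in sorted(table.items()):
        try:
            cp950_ch = code.to_bytes(2, "big").decode("cp950")
        except UnicodeDecodeError:
            cp950_ch = None  # byte pair not mapped by Python's cp950 codec
        if cp950_ch != ch:
            diffs.append((code, ch, cp950_ch))
    return diffs


# Tiny inline sample: 0xA440 is 一 (U+4E00) and 0xA441 is 乙 (U+4E59) in Big5.
sample = ["U+4E00\tkBigFive\tA440", "U+4E59\tkBigFive\tA441"]
print(diff_against_cp950(parse_kbigfive(sample)))  # []
```

On real data, the lines would come from a file such as kbigfive.txt, for example `parse_kbigfive(open("kbigfive.txt", encoding="utf-8"))`.
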

file name | comment |
---|---|
gb18030_4b.pl | GB18030 vs UCS-4 table (39,339 entries) |
gb18030_table.pl | GB18030 vs UCS-2 table (21,873 entries) |
big2003.pl | Big5-2003 (19,583 entries) |
bigfive2003table.pl | Big5-2003 (19,582 entries) |
ucs2tab.pl | n/a |
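
The GB18030 scripts pair GB18030 with UCS-2 and UCS-4 tables respectively. To see which byte form a given character takes in GB18030, Python's built-in `gb18030` codec is enough. This is an illustrative sketch independent of the Perl scripts; `gb18030_bytes` is a hypothetical helper name.

```python
def gb18030_bytes(ch):
    """Return the GB18030 encoding of a single character."""
    return ch.encode("gb18030")


for ch in ["A", "中", "\U00010000"]:
    raw = gb18030_bytes(ch)
    # 1 byte: ASCII; 2 bytes: double-byte (GBK-compatible) area;
    # 4 bytes: four-byte area, which includes every supplementary-plane code point.
    print(f"U+{ord(ch):04X} -> {raw.hex().upper()} ({len(raw)} bytes)")
```

Supplementary-plane characters always take the four-byte form; U+10000 encodes as 0x90308130, the start of that range.
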
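
From Wikipedia: Big5 is commonly said to contain 13,053 characters. Counting the nine units-of-measure characters at 0xA259-0xA261 (兙兛兞兝兡兣嗧瓩糎), and then subtracting 「兀」 (0xC94A) and 「嗀」 (0xDDFC), each of which is encoded twice, gives 13,060.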
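
The count in this note can be checked with a few lines of Python. This sketch assumes the usual Big5 trail-byte ranges (0x40-0x7E and 0xA1-0xFE), so that 0xA259-0xA261 is a contiguous run of trail bytes under lead byte 0xA2; the helper name `big5_trail_count` is hypothetical.

```python
def big5_trail_count(lo, hi):
    """Count valid Big5 trail bytes in [lo, hi] (ranges 0x40-0x7E and 0xA1-0xFE)."""
    return sum(1 for b in range(lo, hi + 1) if 0x40 <= b <= 0x7E or 0xA1 <= b <= 0xFE)


units = big5_trail_count(0x59, 0x61)  # codes 0xA259..0xA261 under lead byte 0xA2
assert units == len("兙兛兞兝兡兣嗧瓩糎") == 9

duplicates = 2  # 兀 (0xC94A) and 嗀 (0xDDFC) are each encoded twice
total = 13053 + units - duplicates
print(total)  # 13060
```
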

The Unihan data can be downloaded from the following URL (updated to v11.0.0):

    $ curl -O https://www.unicode.org/Public/zipped/11.0.0/Unihan.zip
    $ unzip Unihan.zip
    $ grep -h '^U+.*kBigFive' Unihan_*.txt > kbigfive.txt

2014-08-18: when Unihan was fetched for Unicode 7.0.0, it came split into multiple files (Unihan_*.txt) rather than a single unihan.txt.
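
With the data split across files, extracting one field means scanning all of them. A minimal Python sketch of that step follows; it assumes the files are unpacked into one directory and keep Unihan's tab-separated `U+XXXX<TAB>field<TAB>value` layout. `extract_field` is a hypothetical name.

```python
import glob
import os


def extract_field(unihan_dir, field="kBigFive"):
    """Collect data lines for `field` from every Unihan_*.txt file in unihan_dir."""
    out = []
    for path in sorted(glob.glob(os.path.join(unihan_dir, "Unihan_*.txt"))):
        with open(path, encoding="utf-8") as f:
            for line in f:
                parts = line.rstrip("\r\n").split("\t")
                # Keep only data lines (U+XXXX<TAB>field<TAB>value); skip comments and blanks.
                if len(parts) == 3 and parts[0].startswith("U+") and parts[1] == field:
                    out.append("\t".join(parts))
    return out
```

For example, `"\n".join(extract_field("."))` run in the unpack directory could be written out as kbigfive.txt.
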

Documentation (Unicode Character Database, UAX #44): http://www.unicode.org/reports/tr44/