mifunetoshiro / kanjium
The ultimate kanji resource
License: Other
I'm interested in the `type` data from the `kanjidict` table.
The first entry I looked at, for the row where kanji="栄", has type Pictograph. However, the 漢字源 entry for 栄/榮 says 「会意兼形声」, which would be a phono-semantic compound.
I haven't gone much deeper, but scanning down the column, there are others marked as Pictograph that look like compounds.
What was your original source for this information?
Thanks so much for this project.
I don't see anything in the readme about where the pitch accent information is sourced from. Would you mind adding that information? Thanks!
In the `kanjidict` table, the kanji "戸", "所", "扇", "扉", "房", "戻", "扈", "戾" are said to have the radical 戸. However, 戸 does not exist as a radical in the `radicals` table; it only exists as a "radvar" for 戶.
Shouldn't these kanji all use the 戶 radical instead, since that's the real radical?
(The kanji 戶 is the only one that lists 戶 as its radical in `kanjidict`.)
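To find every affected row rather than checking by hand, a query along these lines might help. This is only a sketch against a tiny in-memory stand-in for the two tables: the column names (`kanji`, `radical`, `radvar`) and the assumption that `radvar` holds a single variant per row are guesses, not the real schema.

```python
import sqlite3

# Hypothetical miniature of the two tables; the column names
# (kanji, radical, radvar) are assumptions, not the real schema.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE radicals (radical TEXT, radvar TEXT);
    CREATE TABLE kanjidict (kanji TEXT, radical TEXT);
    INSERT INTO radicals VALUES ('戶', '戸');
    INSERT INTO kanjidict VALUES ('戶', '戶'), ('戸', '戸'), ('所', '戸');
""")

# Kanji whose listed radical is not a main entry in the radicals table,
# together with the main radical that the variant belongs to (if any).
rows = conn.execute("""
    SELECT k.kanji, k.radical, r.radical
    FROM kanjidict AS k
    LEFT JOIN radicals AS r ON r.radvar = k.radical
    WHERE k.radical NOT IN (SELECT radical FROM radicals)
    ORDER BY k.kanji
""").fetchall()

for kanji, listed, main in rows:
    print(kanji, "lists", listed, "but the main radical is", main)
```

The third column is NULL when the listed radical is neither a main radical nor a known variant, which would point to a different kind of inconsistency.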
The `kanjidict` table has 6505 `kanken` kanji:
SELECT count(kanji) FROM kanjidict WHERE kanken IS NOT NULL;
⇒ 6505
However the official 漢検 kanji dictionary only has 5647 distinct kanji plus 642 kyujitai (for 6289 total kanji).
Brief introduction: I'm a 30-year-old French software engineer who has lived in Japan for 6 years. I passed JLPT N3 a couple of years ago, I'm very bad at learning kanji, and I work in a Japanese company where I communicate at roughly business level. I have tried many times to learn kanji and forgot them as soon as I stopped for a while (mnemonic-based, radicals + mnemonics, etymology + radicals + mnemonics).
From my experience with kanji learning, I understand that there are 3 schools of learning kanji:
I have now reached the point where I believe the most efficient way of learning Japanese is through jukugo:
street 道 + path 路 = highway 道路
), learn its kanji by etymology, e.g. A path 路 is paved with rocks to allow each 各 foot 足 to walk backwards
, both bushu are ideograms) each 各 mouth 口 freezes during winter 夂
)
That's because learning all the kunyomi and onyomi independently is much harder and less useful than learning them through vocabulary.
As of today, no tool exists that applies this methodology, and building such a tool would obviously be a long and tedious task.
Still, I should be able to hack something together quickly for my personal use with existing tools (e.g. a script to filter jukugo, a Memrise deck and the Android Kanji Study app).
Sorry for this very long post. I simply found your repo and its jukugo.txt file, but it doesn't appear to be sorted by frequency. A frequency list would definitely be a plus for your ultimate kanji resource. Do you think you could find one somewhere?
resources:
Thank you for your support. Also, any feedback on my thoughts will be appreciated. Who knows, if I find enough like-minded people, the project mentioned above could become a reality at some point.
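For the frequency ordering asked about above, the sorting step itself could be as small as the sketch below. It assumes a word-to-count mapping is available from some frequency list (none is named in this post), and the counts shown are made up purely for illustration.

```python
def sort_by_frequency(jukugo, frequency):
    """Return jukugo ordered from most to least frequent.

    `frequency` maps a word to a count. Words missing from it sort last
    and keep their original relative order, because sorted() is stable.
    """
    return sorted(jukugo, key=lambda word: -frequency.get(word, 0))


# Made-up counts, for illustration only.
freq = {"道路": 120, "学校": 300}
ordered = sort_by_frequency(["道路", "各自", "学校"], freq)
print(ordered)
```

How jukugo.txt would be read into the input list depends on its layout, which this sketch does not assume.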
In the SQLite database, the Big5 code point for the kanji 畷 is given as `de c.05`.
The correct value is `DE C5`.
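Other malformed values of this kind could be caught with a simple format check. The sketch below assumes the field is meant to hold two uppercase hex bytes separated by one space, which is inferred only from the corrected value above; the function name is hypothetical.

```python
import re

# Assumed format: two uppercase hex bytes separated by a single space.
BIG5_FIELD = re.compile(r"[0-9A-F]{2} [0-9A-F]{2}")


def is_wellformed_big5(value):
    """Return True if `value` looks like a two-byte Big5 code such as 'DE C5'."""
    return BIG5_FIELD.fullmatch(value) is not None


for candidate in ("de c.05", "DE C5"):
    print(candidate, is_wellformed_big5(candidate))
```

This only checks the shape of the field, not whether the bytes actually map to the kanji in that row.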
There seem to be some problems with the glyphs in the MultiElements font. They don't show up when using the font on a web page, but when trying to use the (.ttf) font on Android some characters are drawn outside their box.
The screenshots below show some of the "hen" elements. The red ones are drawn with the MultiElements font. As you can see, several are drawn incorrectly (U+706B, 火, is one example).
I've opened MultiElements.ttf in FontForge and noticed that the glyphs that are drawn correctly all have a width of 2048 and are placed completely inside 0 - 2048. Glyphs that don't have any width are all drawn incorrectly.
If I modify MultiElements.ttf by assigning all glyphs a width of 2048 and either centering the glyphs in the available width or placing them slightly to the left, the glyphs are rendered correctly on Android.
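The affected glyphs could also be listed programmatically instead of inspecting them one by one in FontForge. The sketch below works on a plain mapping of glyph name to (advance width, left side bearing); with the fontTools library, `TTFont(path)["hmtx"].metrics` should give a mapping of that shape. The glyph names and numbers in the sample are made up.

```python
def glyphs_missing_width(metrics):
    """Return the sorted names of glyphs whose advance width is zero.

    `metrics` maps glyph name -> (advance_width, left_side_bearing).
    """
    return sorted(
        name for name, (advance, _lsb) in metrics.items() if advance == 0
    )


# Made-up metrics for illustration; the glyph names are hypothetical.
sample = {"uni706B": (0, -512), "uni4EBB": (2048, 100)}
missing = glyphs_missing_width(sample)
print(missing)
```

Each glyph returned would then need its width set to 2048 and its outline repositioned, as described above.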
Hello,
I wanted to get some clarity on the source of the pitch accent data provided with this repo. The readme references a "free database by Uros O", but it is unclear to me whether this free database is this very repo or another database.
If it's another database, I'd love to know where it is located. The reason is that I'd like to look up the original license for the data, as well as look into how the data was generated, what rules were followed, how the pitch drop after the end of a word in a sentence is marked, etc.
Thanks!
Here are Japanese kanji phonetic components from Remembering the Kanji that are not present in the `phonetics` table:
卜刃冊厉弗句㐱争杀丞曳危后赫亜困助貝匊蚤帛阜昏奄炎冥発畏馬扇準芻郭康虚御筑殿爾
Some of these produce groups of only two, but others (like 杀) have several members.
丫 has Japanese readings ア and あげまき.
面区点: 2-01-06
Unicode: U+4E2B
Is there any way to access the multiple pitch accents of compound words?
For example, with 一子相伝, this is what is displayed in the NHK pitch accent dictionary:
And this is the data in `accents.txt`:
term reading accents
一子相伝 いっしそうでん 1,0
So, is there any way to at least know that the word actually consists of multiple pitch accents?
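For reference, this is roughly how I would read such a row. It is a sketch that assumes the three fields are tab-separated and that the accents field is a comma-separated list of integers. It only splits the values and cannot tell whether `1,0` means alternative accents for the whole word or one accent per part of the compound.

```python
def parse_accent_line(line):
    """Split one accents.txt row into (term, reading, list of accent numbers).

    Assumes tab-separated fields and comma-separated integer accents.
    """
    term, reading, accents = line.rstrip("\n").split("\t")
    return term, reading, [int(a) for a in accents.split(",")]


term, reading, accents = parse_accent_line("一子相伝\tいっしそうでん\t1,0\n")
print(term, reading, accents)
```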
字 | 音読み | current | correct |
---|---|---|---|
絶 | ゼツ | null | 呉音 |
These other readings (from the `onyomi` column) are either variants of 音読み, or are 熟字訓・当て字 and should be dropped:
字 | 音読み |
---|---|
王 | ノウ |
音 | ノン |
草 | ゾウ |
日 | ニ |
文 | モ |
北 | ペ |
来 | タイ |
港 | コン |
香 | ホン |
応 | ノウ |
皇 | ノウ |
砂 | ジャ |