
kanjium's People

Contributors

gregorbg, mifunetoshiro, precondition

kanjium's Issues

Kanji table type column issue

I'm interested in the type data from the kanjidict table.

The first entry I looked at, the row where kanji="栄", has type Pictograph. However, the 漢字源 entry for 栄/榮 says 「会意兼形声」, which would make it a phono-semantic compound.

I haven't gone much deeper, but scanning down the column, there are others marked as Pictograph that look like compounds.

What was your original source for this information?

Pitch Accent source information

Thanks so much for this project.

I don't see anything in the readme about where the pitch accent information is sourced from. Would you mind adding that information? Thanks!

戸 specified as radical for several kanji instead of 戶 in the kanjidict table

In the kanjidict table, the kanji "戸", "所", "扇", "扉", "房", "戻", "扈", "戾" are listed with the radical 戸. However, 戸 does not exist as a radical in the radicals table; it only exists as a "radvar" for 戶.

Shouldn't these kanji all use the 戶 radical instead, since that's the real radical?

(The kanji 戶 is the only one that lists 戶 as its radical in kanjidict.)
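
A minimal sketch of how such rows could be found with a query. The two-column tables here are a hypothetical simplification of the real schema (only the column names kanji, radical and radvar come from the report above), and the sketch assumes radvar holds a single variant character:

```python
import sqlite3

KANGXI_RADICAL = "\u6236"   # 戶, the entry in radicals.radical
VARIANT_FORM = "\u6238"     # 戸, listed only as its radvar

# Hypothetical, simplified stand-in for the real database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE radicals (radical TEXT PRIMARY KEY, radvar TEXT)")
conn.execute("CREATE TABLE kanjidict (kanji TEXT PRIMARY KEY, radical TEXT)")
conn.execute("INSERT INTO radicals VALUES (?, ?)", (KANGXI_RADICAL, VARIANT_FORM))
conn.executemany(
    "INSERT INTO kanjidict VALUES (?, ?)",
    [
        (KANGXI_RADICAL, KANGXI_RADICAL),
        (VARIANT_FORM, VARIANT_FORM),
        ("\u6240", VARIANT_FORM),   # 所
        ("\u6247", VARIANT_FORM),   # 扇
    ],
)

def variant_radicals(conn):
    """Rows whose radical is missing from radicals.radical but matches a radvar."""
    return conn.execute(
        """
        SELECT k.kanji, k.radical, r.radical
        FROM kanjidict AS k
        JOIN radicals AS r ON r.radvar = k.radical
        WHERE k.radical NOT IN (SELECT radical FROM radicals)
        ORDER BY k.kanji
        """
    ).fetchall()

mismatches = variant_radicals(conn)
for kanji, listed, base in mismatches:
    print(f"{kanji}: lists {listed}, base radical is {base}")
```

Against the real database the same query would need adjusting if radvar can contain more than one variant per row.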

Inaccurate kanken data

The kanjidict table has 6505 kanken kanji:

SELECT count(kanji) FROM kanjidict WHERE kanken IS NOT NULL;
⇒ 6505

However the official 漢検 kanji dictionary only has 5647 distinct kanji plus 642 kyujitai (for 6289 total kanji).
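
To make the size of the gap explicit, here is the arithmetic using only the figures quoted above (the variable names are illustrative):

```python
# Figures quoted in the report above.
db_count = 6505            # rows in kanjidict with a non-NULL kanken value
official_distinct = 5647   # distinct kanji in the official dictionary
official_kyujitai = 642    # additional kyujitai forms

official_total = official_distinct + official_kyujitai
surplus = db_count - official_total

print(f"official total: {official_total}")
print(f"entries in kanjidict beyond the official total: {surplus}")
```

So, taking the quoted counts at face value, 216 kanji carry a kanken value without a counterpart in the official dictionary.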

Jukugo frequency list

Hi @mifunetoshiro

Brief introduction: I'm a 30-year-old French software engineer who has lived in Japan for 6 years. I passed JLPT N3 a couple of years ago, I'm very bad at learning kanji, and I work in a Japanese company, communicating at roughly business level. I have tried many times to learn kanji and forgot them quickly whenever I stopped for a while (mnemonic-based, radicals+mnemonics, etymology+radicals+mnemonics).

From my experience with kanji learning, I understand that there are 3 schools of learning kanji:

  • The Japanese scholar way: rote learning by repetition.
  • The foreigner way: learning by mnemonics, i.e. made-up stories that make the shape and sound memorable. At the next level, the foreigner learns radicals (including fake radicals, i.e. graphemes with neither meaning nor sound that are shared across several kanji) to make smarter mnemonics.
  • The archaeologist way: learn the etymology of kanji (oracle bone style, bronze ware style, ten style, kyuji and shinji), then make educated guesses as to why they changed the way they did over time (bushu used as ideogram or phonogram, etc.).

I have now reached the point where I believe the most efficient way of learning Japanese is through jukugo, using the following method:

  1. Get a frequency list of jukugo and learn it through a classical SRS system.
  2. For each jukugo (e.g. street 道 + path 路 = highway 道路), learn its kanji by etymology, e.g.
  3. Reinforce that learning with a mnemonic using its biggest components (a path 路 is paved with rocks to allow each 各 foot 足 to walk backwards; both bushu are ideograms).
  4. If you don't already know a component, learn it through its radicals (e.g. each 各 mouth 口 freezes during winter 夂).
  5. Whenever a relatively frequent jukugo uses 2 previously learnt kanji, move that jukugo to the top of the learning list to strengthen your knowledge of known kanji.

That's because learning all the kunyomi and onyomi independently is much harder and less useful than through vocabulary.
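
The ordering rule in steps 1 and 5 could be sketched roughly as follows. This is only an illustration of the idea, not an existing tool: the function name and the sample word list are made up, and the input is assumed to be already sorted from most to least frequent.

```python
def study_order(ranked_jukugo):
    """Order jukugo for study.

    Normally the most frequent remaining word comes next (step 1), but a
    word made up only of already-learnt kanji jumps the queue (step 5).
    """
    remaining = list(ranked_jukugo)
    known = set()     # kanji learnt so far
    order = []
    while remaining:
        # Words whose kanji are all known, still in frequency order.
        reinforcing = [w for w in remaining if all(ch in known for ch in w)]
        word = reinforcing[0] if reinforcing else remaining[0]
        remaining.remove(word)
        order.append(word)
        known.update(word)   # adds each kanji of the word
    return order

# Hypothetical frequency ranking, most frequent first.
sample = ["道路", "学校", "大会", "校長", "学会"]
print(study_order(sample))
```

In this sample, 学会 is promoted ahead of 校長 because both of its kanji have been seen by then (in 学校 and 大会). The loop is quadratic in the list length, which is fine for a personal script but would need an index for a very large list.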

As of today, no tool exists that applies this methodology, and building such a tool would obviously be a long and tedious task.
Still, I should be able to hack something together quickly for my personal use with existing tools (e.g. a script to filter jukugo, a Memrise deck and the Android Kanji Study app).
Sorry for this very long post. I simply found your repo and its jukugo.txt file, but it doesn't appear to be sorted by frequency. A frequency list would definitely be a plus for your ultimate kanji resource. Do you think you could find one somewhere?

resources:

Thank you for your support. Also, any feedback on my thoughts will be appreciated. Who knows, if I find enough like-minded people, the project mentioned above could become a reality at some point.

畷 code point

In the sqlite database, the Big5 code point for kanji 畷 is given as de c.05.

The correct value is DE C5.
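
A small sketch of how malformed values like this could be caught automatically. It assumes the column stores Big5 code points as two uppercase hex bytes separated by a space (inferred from the corrected value above), and it uses Python's built-in big5 codec as an independent reference; both helper names are made up for illustration.

```python
import re

# Assumed field format: two uppercase hex bytes separated by one space.
BIG5_FIELD = re.compile(r"[0-9A-F]{2} [0-9A-F]{2}")

def is_wellformed_big5(field):
    """True if the field looks like 'XX XX' in uppercase hex."""
    return BIG5_FIELD.fullmatch(field) is not None

def big5_hex(ch):
    """Encode one character with Python's big5 codec, formatted as 'XX XX'."""
    return " ".join(f"{byte:02X}" for byte in ch.encode("big5"))

print(is_wellformed_big5("de c.05"))   # the value currently in the database
print(big5_hex("畷"))                  # reference value from the codec
```

Running the well-formedness check over the whole column would flag any other entries damaged in the same way.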

Many glyphs in MultiElements.ttf are missing a width

There seem to be some problems with the glyphs in the MultiElements font. The problems don't show up when using the font on a web page, but when using the (.ttf) font on Android, some characters are drawn outside their box.

The screenshots below show some of the "hen" elements. The red ones are drawn with the MultiElements font. As you can see, several are drawn incorrectly (U+706B, 火, is one example).

image

I opened MultiElements.ttf in FontForge and noticed that the glyphs that are drawn correctly all have a width of 2048 and are placed completely inside 0 - 2048. The glyphs that don't have any width are all drawn incorrectly.

If I modify MultiElements.ttf by assigning all glyphs a width of 2048 and either centering the glyphs in the available width or placing them slightly to the left, the glyphs are rendered correctly on Android.
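
The width part of that fix could be scripted rather than done by hand. The sketch below works on a plain dict of glyph name to (advance width, left side bearing), which is the shape fontTools exposes as `font["hmtx"].metrics`; loading and saving the font is left out, the glyph names are hypothetical, and repositioning the outlines (centering) is not covered.

```python
TARGET_WIDTH = 2048  # width of the glyphs that render correctly

def fix_zero_widths(metrics, width=TARGET_WIDTH):
    """Give every zero-width glyph the target advance width.

    `metrics` maps glyph name -> (advance_width, left_side_bearing) and is
    modified in place. Returns the sorted names of the glyphs changed.
    """
    fixed = []
    for name, (advance, lsb) in list(metrics.items()):
        if advance == 0:
            metrics[name] = (width, lsb)
            fixed.append(name)
    return sorted(fixed)

# Hypothetical sample: one good glyph and two without a width.
sample_metrics = {
    "uni6C35": (2048, 120),
    "uni706B": (0, -1900),
    "uni7CF8": (0, -1850),
}
changed = fix_zero_widths(sample_metrics)
print(changed)
```

A blanket rule like this assumes no glyph in the font is meant to have zero width, which holds for the components described here but not for fonts containing combining marks.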

Question regarding source of pitch accent data

Hello,

I wanted to get some clarity on the source of the pitch accent data provided with this repo. The readme references a "free database by Uros O", but it is unclear to me whether this free database is this very repo or another database.

If it's another database, I'd love to know where it is located. I'd like to look up the original license for the data, and to find out how the data was generated, what rules were followed, how the pitch drop following the ends of words in a sentence is marked, etc.

Thanks!

Additional phonetics

Here are Japanese kanji phonetic components from Remembering the Kanji that are not present in the phonetics table:

卜刃冊厉弗句㐱争杀丞曳危后赫亜困助貝匊蚤帛阜昏奄炎冥発畏馬扇準芻郭康虚御筑殿爾

Some of these produce groups of only two, but others (like ) have several members.

Missing kanji: 丫

丫 has Japanese readings ア and あげまき.

面区点: 2-01-06
Unicode: U+4E2B

Pitch accent of compound words?

Is there any way to access the multiple pitch accents of compound words?

For example, with 一子相伝, this is what is displayed in the NHK pitch accent dictionary:
image

And this is the data in accents.txt:

term     reading	accents
一子相伝  いっしそうでん  1,0

So, is there any way to at least know that the word actually consists of multiple pitch accents?
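
For reference, a minimal sketch of reading such a line and checking whether it carries more than one accent value. It assumes the three fields are tab-separated and that the accent numbers are the only digits in the last field; the function name is made up. Whether several values mean per-component accents or alternative pronunciations is exactly the open question here, and the code cannot tell them apart.

```python
import re

def parse_accent_line(line):
    """Split one accents.txt line into (term, reading, list of accent numbers)."""
    term, reading, accents = line.rstrip("\n").split("\t")
    return term, reading, [int(n) for n in re.findall(r"\d+", accents)]

term, reading, accents = parse_accent_line("一子相伝\tいっしそうでん\t1,0")
has_multiple = len(accents) > 1
print(term, reading, accents, has_multiple)
```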

Onyomi data issues

音読み current correct
ゼツ null 呉音

These other readings (from the onyomi column) are either variants of 音読み, or are 熟字訓・当て字 and should be dropped:

音読み
ノウ
ノン
ゾウ
タイ
コン
ホン
ノウ
ノウ
ジャ
