
kanjivg's Introduction

kanjivg's People

Contributors

agatti, bdusell, benkasminbullock, captaindario, cayennes, chrisvasselli, eichhirn, gnurou, melissaboiko, nix6839, ospalh, scriptin, sebimoe, siikamiika, syt0r


kanjivg's Issues

Two possible fixes for 理/07406-KaishoVtLst.svg

The strokes 9, 10 and 11 are inconsistent.
I don't know what would be the right way to fix this. Maybe we should split this into two variants.

  • 1954a22, the VtLstVtLst variant. The vertical stroke is drawn last in both halves.
  • 7ad9792, the VtLstVt5 variant. The vertical is drawn last in the left half, but in the right half it is drawn as the fifth stroke.

By the way, I corrected strokes 5 and 6 in another commit, c650ed5.

蛻 strokes

In kanji 86fb, strokes 7 and 8 are given different types in the XML and in the SVG.

Some characters' stroke numbers don't match stroke order

I've found some characters where the stroke numbers are placed next to strokes other than the ones implied by the order of the stroke paths:

共: 1 and 2 are swapped
主-VtLst: 3 and 4 are swapped.
住-VtLst: 1 and 2 are swapped and 6 and 7 are swapped
庫: 2 and 3 are swapped
炭: 4 and 5 are swapped

I intend to learn how to use GitHub at some point so I can submit actual changes (I'm assuming the order of the path elements is the correct one), but I thought I'd mention what I've found in case I don't get around to it.

Should I edit this issue when I find more, or open new issues?

璢, 澑.

It's more logical to have the two vertical lines come first in the right part, as kakijun shows. z-one March 01, 2011, at 11:56 PM

潟 stroke 6 should be written from left to right

There are two stroke orders I found for radical 臼:
A. http://kakijun.main.jp/page/usu200.html
B. as a component in http://sugiura5.gsid.nagoya-u.ac.jp/cgi-bin/komori/ww2k.cgi?code=1967

Kakijun says for 臼:
三画目、四画目の順序は逆でも構いません
(The order of the 3rd and 4th strokes may be reversed.)
A similar thing is written for 潟 (6, 7 can be reversed).

I think the stroke order can be left as it is (order B), but the 6th stroke should be written from left to right.

Check for changes in JIS X 0213:2004

There have been 168 form changes in JIS X 0213:2004. KanjiVG seems to be based on JIS X 0208 and doesn't cover them.
Wikipedia has a list of the items:
http://ja.wikipedia.org/wiki/JIS_X_0213#JIS_X_0213:2004.E3.81.AE.E6.94.B9.E6.AD.A3
The JIS X 0213 standard:
http://www.itscj.ipsj.or.jp/ISO-IR/233.pdf
The older JIS X 0208:
http://www.itscj.ipsj.or.jp/ISO-IR/087.pdf

A table of the characters is given below.

Practical note:
On Windows systems, MS Meiryo should be able to display them correctly, while MS Mincho displays the old forms (probably depends on version).

The characters are:
5026 倦, 50C5 僅, 5132 儲, 514E 兎, 51A4 冤, 537F 卿, 53A9 厩, 53C9 叉
53DB 叛, 53DF 叟, 54AC 咬, 54E8 哨, 55B0 喰, 5632 嘲, 5642 噂, 564C 噌
56C0 囀, 5835 堵, 5A29 娩, 5C51 屑, 5C60 屠, 5DF7 巷, 5E96 庖, 5EDF 廟
5EFB 廻, 5F98 徘, 5FBD 徽, 6062 恢, 6108 愈, 6241 扁, 633A 挺, 633D 挽
6357 捗, 6372 捲, 63C3 揃, 647A 摺, 64B0 撰, 64E2 擢, 65A7 斧, 6666 晦
6753 杓, 6756 杖, 6897 梗, 68D8 棘, 6962 楢, 696F 楯, 698A 榊, 6994 榔
69CC 槌, 6A0B 樋, 6A3D 樽, 6A59 橙, 6ADB 櫛, 6B4E 歎, 6C72 汲, 6DEB 淫
6EA2 溢, 6EBA 溺, 6F23 漣, 7015 瀕, 701E 瀞, 7026 瀦, 7058 灘, 7078 灸
707C 灼, 7149 煉, 714E 煎, 717D 煽, 723A 爺, 724C 牌, 7259 牙, 727D 牽
72E1 狡, 7337 猷, 7511 甑, 7515 甕, 7526 甦, 75BC 疼, 77A5 瞥, 7941 祁
7947 祇, 795F 祟, 79B0 禰, 79E4 秤, 7A17 稗, 7A7F 穿, 7AC8 竈, 7B08 笈
7B75 筵, 7BAD 箭, 7BB8 箸, 7BC7 篇, 7BDD 篝, 7C3E 簾, 7C7E 籾, 7C82 粂
7FEB 翫, 7FF0 翰, 8171 腱, 817F 腿, 818F 膏, 8258 艘, 8292 芒, 82A6 芦
8328 茨, 845B 葛, 84EC 蓬, 8511 蔑, 853D 蔽, 85A9 薩, 85AF 薯, 85F7 藷
8654 虔, 86F8 蛸, 8703 蜃, 8755 蝕, 87F9 蟹, 8805 蠅, 8956 襖, 8A0A 訊
8A1D 訝, 8A3B 註, 8A6E 詮, 8AB9 誹, 8AFA 諺, 8B0E 謎, 8B2C 謬, 8C79 豹
8CED 賭, 8FBB 辻, 8FBF 辿, 8FC2 迂, 8FC4 迄, 8FE6 迦, 9017 逗, 9019 這
9022 逢, 903C 逼, 9041 遁, 905C 遜, 9061 遡, 912D 鄭, 914B 酋, 91DC 釜
9306 錆, 9375 鍵, 939A 鎚, 9453 鑓, 9699 隙, 9744 靄, 9771 靱, 9784 鞄
9798 鞘, 97AD 鞭, 98F4 飴, 9905 餅, 990C 餌, 9910 餐, 9957 饗, 99C1 駁
9A19 騙, 9BAB 鮫, 9BD6 鯖, 9C2F 鰯, 9C52 鱒, 9D09 鴉, 9D60 鵠
(all in kanjivg) and
5C62 屢 (not in kanjivg)

The following among these are Jouyou-Kanji and may have higher priority:
50C5 僅, 5632 嘲, 6357 捗, 6897 梗, 6DEB 淫, 6EBA 溺, 714E 煎, 7259 牙
7BB8 箸, 8328 茨, 845B 葛, 8511 蔑, 853D 蔽, 8A6E 詮, 8B0E 謎, 8CED 賭
905C 遜, 9061 遡, 91DC 釜, 9375 鍵, 9699 隙, 9905 餅, 990C 餌
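A minimal sketch for cross-checking the lists above against a checkout: it parses the "HEX char" pairs, verifies that each code point really is the listed character, and optionally checks that the base file exists. The naming scheme (five-digit lowercase hex, e.g. 05618.svg, as seen elsewhere on this page) and the `kanji_dir` argument are assumptions; the function names are hypothetical.

```python
import re
from pathlib import Path

# A four- or five-digit hex code point followed by the character it names.
PAIR = re.compile(r"\b([0-9A-Fa-f]{4,5})\s+(\S)")

def parse_pairs(text):
    """Yield (code point, character) pairs from lines like '5026 倦, 50C5 僅'."""
    for hex_code, char in PAIR.findall(text):
        yield int(hex_code, 16), char

def kanjivg_name(cp):
    """Assumed base file name in the kanji directory, e.g. 0x5026 -> '05026.svg'."""
    return f"{cp:05x}.svg"

def check_pairs(text, kanji_dir=None):
    """Report code/character mismatches and, optionally, files missing from kanji_dir."""
    problems = []
    for cp, char in parse_pairs(text):
        if chr(cp) != char:
            problems.append(f"{cp:04X}: listed as {char}, but U+{cp:04X} is {chr(cp)}")
        elif kanji_dir is not None and not (Path(kanji_dir) / kanjivg_name(cp)).exists():
            problems.append(f"{cp:04X} {char}: {kanjivg_name(cp)} not found")
    return problems
```

Pasting the whole table into `check_pairs(table_text, "kanji")` would list any typos in the table and any characters without a base file.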

Misplaced and missing stroke numbers

There are a number of files where the order of strokes and numbers does not match. In the majority of cases, it is the numbers that are placed incorrectly, although there might be a small number of cases where the actual stroke order is wrong.

In 274 files some or all of the stroke numbers are missing. This affects all kana and latin characters. Most of them were probably lost in the conversion to the combined XML/SVG format ("kanji" directory). They still exist in the old "SVG" directory:
https://github.com/KanjiVG/kanjivg/tree/5e8ff1bed36d8e11866f83b67f5f5b5e5af384e0

The topic of misplaced stroke numbers was discussed a while ago in
https://groups.google.com/forum/?fromgroups=#!topic/kanjivg/-0qmqfLj_aE
Repeating my last post there:

  1. My stroke number placement checker reports 1644 files as possibly buggy, among them:
    stroke swaps: 1503; wrong stroke direction: 542; missing numbers: 274.
    The output of the Stroke number placement checker is here:
    https://gist.github.com/3132779
  2. Another program of mine that places stroke numbers automatically can be used to check the manual stroke numbers. The output is here:
    https://docs.google.com/open?id=0B-TA0GJ6dksVVnRiZExtTkdpSU0

This issue also covers #34, #30, #29, #28, #27, #19, #15, #14 (partial), #9, #8, #6, #2.
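The checker's own method is not described here, so the following is only a hypothetical sketch of one simple heuristic: a number label usually sits near the start of its stroke, so each label is assigned to the stroke whose start point is nearest. Inputs are assumed to be already extracted from the file (stroke start points in path order, and label positions).

```python
import math

def nearest_stroke(label_xy, starts):
    """1-based number of the stroke whose start point lies closest to a label."""
    return 1 + min(range(len(starts)), key=lambda i: math.dist(label_xy, starts[i]))

def check_numbers(starts, labels):
    """Compare number labels with stroke start points.

    starts: start point of each stroke, in path (drawing) order.
    labels: (number, (x, y)) for each stroke-number label in the file.
    Returns (swaps, missing): labels that sit nearest a different stroke,
    as (label number, nearest stroke number), and stroke numbers with no label.
    """
    swaps = []
    for number, xy in labels:
        nearest = nearest_stroke(xy, starts)
        if nearest != number:
            swaps.append((number, nearest))
    missing = sorted(set(range(1, len(starts) + 1)) - {n for n, _ in labels})
    return swaps, missing
```

Nearest-start matching produces false positives where several strokes start close together, so its output is a list of candidates for manual review, not a verdict.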

XML namespace undefined.

The kanjivg.xml file produced by kvg.py causes an error when you try to parse it, for example with Python's ElementTree, xml.etree.ElementTree.parse(kanjiVgFile) (with the file open and xml imported): “xml.etree.ElementTree.ParseError: unbound prefix: line 425, column 0”

The reason appears to be that the “kvg:” XML namespace isn't declared.
Adding an xmlns declaration to the kanjivg tag seems to help:

index 5ad048b..85dd1e6 100755
--- a/kvg.py
+++ b/kvg.py
@@ -75,7 +75,7 @@ def release():
        out.write(licenseString)
        out.write("\nThis file has been generated on %s, using the latest KanjiVG data\nto this date." % (dat
        out.write("\n-->\n")
-       out.write("<kanjivg>\n")
+       out.write('<kanjivg xmlns:kvg="http://kanjivg.tagaini.net/format.html">\n')
        for f in files:
                data = open(os.path.join(datadir, f)).read()
                data = data[data.find("<svg "):]

(See the W3C recommendation for tons of boring details on XML namespaces.)
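A small reproduction of the problem and the fix, using toy documents rather than the real kanjivg.xml. The namespace URI is the one from the patch above; everything else is illustrative.

```python
import xml.etree.ElementTree as ET

KVG_NS = "http://kanjivg.tagaini.net/format.html"  # URI from the patch above

# Minimal stand-ins for the generated file, with and without the declaration.
UNDECLARED = '<kanjivg><g kvg:element="理"/></kanjivg>'
DECLARED = f'<kanjivg xmlns:kvg="{KVG_NS}"><g kvg:element="理"/></kanjivg>'

def try_parse(text):
    """Return the root element, or None if the parser rejects the document."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        print("ParseError:", err)  # reports an unbound prefix for UNDECLARED
        return None

root = try_parse(DECLARED)
# ElementTree stores prefixed attributes under the expanded "{uri}local" name.
print(root.find("g").get(f"{{{KVG_NS}}}element"))
```

Once the prefix is declared, lookups must use the expanded `{uri}local` form (or a prefix map passed to `find`), not the literal `kvg:` prefix.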

Missing dots

Some latin letters and punctuation marks (! . ; i j ?) are missing their dots:
00021
0002e
0003a
0003b
00069
0006a
030fb
0ff01
0ff1a
0ff1f
(4 of them already reported in issue #36)

Latin letters, punctuation, kana should be centered

These characters currently appear to be left-justified. This affects files in these ranges:
00021-0007a (latin letters and punctuation)
03041-03096 (hiragana)
030a1-030fa (katakana)
0ff01-0ff1f (full-width latin punctuation)

Note that there are only 3 files in the full-width latin range (?). Wouldn't it make more sense to use only the full-width range and drop the standard latin range? As I understand it, full-width letters are drawn on a square cell like kanji and are used to fit western characters into eastern typesetting, whereas normal latin letters are drawn on a rectangular cell.
(Example: JRJR)
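One way to center a glyph, sketched under two assumptions: the canvas is the usual 109 × 109 KanjiVG viewBox, and the glyph's bounding box has already been measured (for example with a renderer; how it is obtained is left open). The helper names are hypothetical.

```python
CANVAS = 109.0  # assumption: KanjiVG files use a 109 x 109 viewBox

def centering_offset(bbox, canvas=CANVAS):
    """Translation (dx, dy) that centres bbox = (min_x, min_y, max_x, max_y) on the canvas."""
    min_x, min_y, max_x, max_y = bbox
    dx = canvas / 2 - (min_x + max_x) / 2
    dy = canvas / 2 - (min_y + max_y) / 2
    return dx, dy

def translate_attr(dx, dy):
    """SVG transform attribute value for the offset, rounded to two decimals."""
    return f"translate({dx:.2f} {dy:.2f})"
```

If only horizontal centering is wanted, dy can simply be set to 0. The same offset would presumably also have to be applied to the stroke-number text, or the numbers end up detached from their strokes.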

隻 unordered

It seems to be OK, but the strokes are unordered and do not correspond to the numbers.

kvg:type and other attributes wrong.

I fixed the stroke numbers for 漕/06f15-KaishoVtLst.svg and 膿/081bf-KaishoVtLst.svg, and while doing so noticed that some of the kvg:NN attributes are wrong, not only in those files but also in the one I used as a reference, 糟/07cdf-KaishoVtLst.svg.

Some strokes that are horizontal have kvg:type="㇑", that is, the file claims they are vertical. The grouping (<g kvg:element="NN">...</g>) doesn't make a lot of sense, either.

I think this may affect several kanji variants with a "曲" element whose two verticals are drawn last. These seem to be:

05102-KaishoVtLst.svg
066f2-KaishoVtLst.svg
066f9-KaishoVtLst.svg
069fd-KaishoVtLst.svg
06f15-KaishoVtLst.svg
06fc3-KaishoVtLst.svg
079ae-KaishoVtLst.svg
07cdf-KaishoVtLst.svg
081bf-KaishoVtLst.svg
0825a-KaishoVtLst.svg
08276-KaishoVtLst.svg
08c4a-KaishoVtLst.svg
08ec6-KaishoVtLst.svg
08fb2-KaishoVtLst.svg
0906d-KaishoVtLst.svg
091b4-KaishoVtLst.svg
09c67-KaishoVtLst.svg

I don't want to muck around in there to fix them all by hand. (Some seem to be OK; I didn't check all of them.)
Maybe this could be done by a (somewhat ad-hoc-ish) script of some sort.
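A rough sketch of such a script. It flags strokes whose kvg:type starts with ㇑ (vertical) but whose net movement from start point to end point is more horizontal than vertical. Assumptions: stroke paths use only M/L/C/S commands (other commands raise an error), and the kvg:type attribute is found by its local name so that no particular namespace URI has to be hard-coded. All names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# Command letters, or numbers such as "30.5", "-.25", "1e-3".
TOKEN = re.compile(r"[A-Za-z]|[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")
ARITY = {"M": 2, "L": 2, "C": 6, "S": 4}  # only the commands this sketch handles

def path_endpoints(d):
    """Return (start, end) of an SVG path built from M/L/C/S commands (any case)."""
    x = y = 0.0
    start, cmd, i = None, None, 0
    tokens = TOKEN.findall(d)
    while i < len(tokens):
        if tokens[i].isalpha():
            cmd, i = tokens[i], i + 1
        if cmd is None or cmd.upper() not in ARITY:
            raise ValueError(f"unsupported or missing command {cmd!r} in {d!r}")
        n = ARITY[cmd.upper()]
        args = tokens[i:i + n]
        if len(args) < n:
            raise ValueError(f"truncated {cmd} segment in {d!r}")
        i += n
        ex, ey = float(args[-2]), float(args[-1])  # the segment's end point
        x, y = (x + ex, y + ey) if cmd.islower() else (ex, ey)
        if start is None:
            start = (x, y)
        if cmd in "Mm":  # extra coordinate pairs after M/m are implicit L/l
            cmd = "l" if cmd == "m" else "L"
    if start is None:
        raise ValueError("empty path")
    return start, (x, y)

def looks_mislabeled(stroke_type, d):
    """True if a stroke typed as vertical (㇑...) has a mostly horizontal net movement."""
    if not stroke_type.startswith("㇑"):
        return False
    (x0, y0), (x1, y1) = path_endpoints(d)
    return abs(x1 - x0) > abs(y1 - y0)

def scan(svg_file):
    """Yield (id, kvg:type) of suspicious strokes; kvg:type is matched by local name."""
    for el in ET.parse(svg_file).iter():
        if not el.tag.endswith("path"):
            continue
        stroke_type = next((v for k, v in el.attrib.items() if k.endswith("}type")), "")
        if looks_mislabeled(stroke_type, el.get("d", "")):
            yield el.get("id"), stroke_type
```

Running `scan` over the 17 files listed above would give a shortlist; the grouping problems would still need a human look.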

虐 wrong stroke order

Strokes 8, 9 should be interchanged.

http://www.sp.cis.iwate-u.ac.jp/icampus/u/akanji.jsp?k=%E8%99%90
http://www.yookoso.com/pages/kanji.php?file=display&jisdec=13652
http://sugiura5.gsid.nagoya-u.ac.jp/cgi-bin/komori/ww2k.cgi?code=2152
http://kakijun.main.jp/page/09190200.html

This character seems to be a variation of
http://www.mdbg.net/chindict/chindict.php?page=chardict&cdcanoce=0&cdqchi=%E8%99%90
in which the last horizontal stroke intersected the vertical line and was written last. In the current Japanese character it does not intersect and the stroke order has changed.

期 component

Shouldn't the right component be 月 instead of 肉? It would fit the meaning better.

Kanas are not centered

I noticed that hiragana & katakana defined by KanjiVG are not centered in their square. They appear on the left of the square while the kanji are perfectly centered.

半 Stroke #2 out of sequence

The Tangorin project uses KanjiVG to render stroke order diagrams. When I looked up 半 there, I found that it draws the second stroke (the upper-right diagonal stroke) last, with all other strokes in their proper sequence. Oddly enough, when I looked at 半 on jisho.org (another project that uses KanjiVG), it has the correct stroke order.

63b4+848b Jinmei have codepoints of their own

The files need to be renamed and un-jinmeied, since they have their own Unicode code points.
63b4-jinmei is 6451 and
848b-jinmei is 8523.
This is a random find, I didn't do a systematic check of the jinmei variants.
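A hypothetical sketch of the rename for the two cases reported here. It assumes variant files are named like 063b4-Jinmei.svg and that the ids inside a file embed the same stem, so a plain text replacement suffices; both assumptions should be checked against the actual files, and any attribute naming the character itself would need separate review.

```python
import re

# Replacements named in the issue; other jinmei variants were not checked.
JINMEI_TARGETS = {0x63B4: 0x6451, 0x848B: 0x8523}

def unjinmei(filename, text):
    """Return (new file name, new text) for a '<hex>-Jinmei.svg' variant."""
    m = re.fullmatch(r"([0-9a-f]{5})-jinmei\.svg", filename, re.IGNORECASE)
    if m is None:
        raise ValueError(f"not a jinmei variant file name: {filename}")
    old_cp = int(m.group(1), 16)
    if old_cp not in JINMEI_TARGETS:
        raise ValueError(f"no target code point known for {filename}")
    old_stem = filename[: -len(".svg")]
    new_stem = f"{JINMEI_TARGETS[old_cp]:05x}"
    # Assumption: ids inside the file embed the same stem as the file name.
    return new_stem + ".svg", text.replace(old_stem, new_stem)
```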

Merge strokes 9, 10 into one in 搜 and 嫂

The mirrored strokes resembling "E" are connected in these two kanji, so strokes 9 and 10 should be merged into one. When the two "E"s are separated, the whole left E must be written before the whole right E, so the current order is definitely wrong.

These two kanji seem to be the ones that did not change with JIS X 0213:2004 (others, like 溲, 艘 and 叟, had the mirrored "E"s separated).

The only reference I have:
http://kakijun.main.jp/page/u_j064200.html
http://kakijun.main.jp/page/soua13200.html

攻 and all similar

Even if the radicals are 工 (48) and 攴 (66), wouldn't it make more sense to use the so-called 攵 (nobun, ノ-文) variant as the component? Side note: why do so many components not show up in the component search display? ;_; Additional ones show up depending on the previously selected components. It may also be worth noting that some users are used to different styles of Chinese characters, such as 令 as mentioned above. (Interesting fact: calligraphy is the only legal use of traditional characters in China.) Tae August 21, 2010, at 07:02 AM

授 Numbers 2 and 3 swapped

The stroke paths are correct. But the text numbers are wrong: 2 should be on the down stroke, 3 on the lower diagonal stroke on the left.

viewer.py imports from missing createsvgfiles

createsvgfiles.py was deleted in commit f056bce

I have no idea whether viewer.py is useful for anything or whether it is obsolete too; I just tried to run it and was puzzled by a missing reference that Google knows nothing about :)

Some kanji to check

澀 Is the 16 strokes variant more used than the 17 strokes variant? z-one March 01, 2011, at 04:41 AM
梍 The last part is either 七 or 匕. Currently 匕 is shown in the structure but then shouldn't the stroke direction be the opposite? z-one March 01, 2011, at 12:16 AM
懋 The top middle element comes first according to kakijun. z-one February 27, 2011, at 11:25 AM
顏 The stroke order for the part that looks like an X is most probably mixed, because in other characters the order is always the opposite. z-one February 22, 2011, at 02:26 AM
Many kanji that include 瓦 (hex codes: 0x74e7 0x74e9 0x74ee 0x74f0 0x74f1 0x74f2 0x74f7 0x74f8 0x7503 0x7504 0x7505 0x750c 0x750d 0x750e 0x7511 0x7513 0x7515). Stroke 2 of the component should maybe be split into two distinct strokes.
first radical of 捌 is inconsistent with the above/below kanji
in 喚, maybe ㇟a should be ㇟a/㇏
Characters like 餅 and 遡 don't show JIS X 0213:2004 standard forms. This is a case of being old rather than being wrong, but they ought to be updated eventually. December 16, 2010, at 07:20 AM
筆 the 6th stroke is incorrect. November 2, 2010
爨 first the top left, then the top middle and then the top right. z-one August 23, 2010, at 08:52 PM
叟 is 10 strokes and not 9. Because of this error the kanji 艘 is missing a stroke. (The last stroke in the list does nothing.) z-one April 17, 2010, at 10:20 PM
Kanji like 羸, 蠃 and 贏 that contain 月 and 凡 might not have the correct stroke order. The middle element between these two should come first. z-one April 15, 2010, at 04:08 AM
尨 The top dot of 尤 should come last. z-one April 15, 2010, at 04:08 AM Needs to be double-checked. Gnurou April 22, 2010, at 05:11 PM
慥 The part that looks like 牛 should follow the part's stroke order with the two horizontal lines first. z-one April 15, 2010, at 04:46 AM Doubt it - according to kakijun it is, but the same site agrees with KanjiVG for the stroke order of the right component. Gnurou April 22, 2010, at 05:11 PM
陞 The part in top right corner might not have the correct stroke order, as it differs from 升. z-one April 21, 2010, at 01:00 PM
襾 (and kanji containing it, like 覊), the short horizontal stroke at the middle should be last in my opinion (though it is a bit difficult to confirm). z-one January 16, 2010, at 04:18 AM
黽 As far as I know, the 5th and 6th strokes are swapped, as are the 7th and 8th. z-one January 12, 2010, at 07:26 AM
縛 and 博: their component is said to be 尃, but shouldn't it be 専 instead? Gnurou October 19, 2009, at 08:57 AM
滿 and compounds Gnurou October 19, 2009, at 08:28 AM
厖 Gnurou June 10, 2009, at 10:06 PM
叟 and all kanji that use it - are there two ways to draw this kanji? Gnurou June 10, 2009, at 10:05 PM No, looks like the way used in 搜 is the right one, as the XML data and other sources seem to confirm.
頃 Gnurou June 10, 2009, at 10:05 PM
嚢 Gnurou June 10, 2009, at 10:05 PM
鋏 Gnurou June 10, 2009, at 10:05 PM
令 and all kanji using it: bottom part looks different on some fonts? Which one is right? Gnurou June 10, 2009, at 10:05 PM
禸 and all kanji that use it: missing component information for ム and 冂? Gnurou June 24, 2009, at 12:23 AM
As reported by a Tagaini user: there is a problem with the keys 黑 and 黒. They are the same key but are listed differently, e.g. 黙 is listed only under 黑. Gnurou July 13, 2009, at 01:31 AM
飴 the 3rd stroke is incorrect (vertical, but should be horizontal) and the 7th and 8th stroke are wrong eno October 08, 2009, at 08:56 AM

Stroke numbers lost for kana and latin letters, as well as dots in latin characters

As reported by Jan Eichhorn:

Maybe the new part is that some were lost in the XML=>SVG unification. Kanji are affected too. There is also a smaller problem with dots missing in latin characters (i j . ; !). These are already missing in the first version found on GitHub; I don't know what came before that. If the dots ever existed, they might have been a victim of the conversion too (someone assuming there can be no "circle" elements).

Missing official Jouyou characters 剝 (0x525D) and 0x20B9F

剝 (0x525D) and 0x20B9F are given as the main variants in the official jouyou list (other variants are allowed). They are variants of the much more common
剥 (0x5265)
叱 (0x53F1)
The differences are in the pig snout radical 彑 and the spoon radical 匕. For both I couldn't find any match of the required variant in kanjivg.

Static Download Link

Would it be possible to change the package download links to static ones, or to create separate static links? This would make it easier to sync to the latest version. All the other dictionary sources, like JMdict, JMnedict and KANJIDIC, provide static links to their latest versions.

For example, change:

 https://github.com/downloads/KanjiVG/kanjivg/kanjivg-20120219.xml.gz

to:

 https://github.com/downloads/KanjiVG/kanjivg/kanjivg-current.xml.gz

The page under the link already shows the upload date, so people won't be confused.

Thank you.
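Until a static link exists, a client could pick the newest dated release from a file listing. This is a sketch that assumes the listing is already available as a list of names (how to fetch it is left open) and that every release follows the kanjivg-YYYYMMDD.xml.gz pattern shown above.

```python
import re

# Release files are named like kanjivg-20120219.xml.gz (date as YYYYMMDD).
DATED = re.compile(r"kanjivg-(\d{8})\.xml\.gz")

def latest_release(names):
    """Return the newest dated release name from a file listing, or None."""
    dated = [(m.group(1), name) for name in names if (m := DATED.fullmatch(name))]
    # YYYYMMDD strings sort chronologically, so max() picks the newest date.
    return max(dated)[1] if dated else None
```

This is exactly the guesswork a kanjivg-current.xml.gz link would remove: the client would no longer need to know or parse the naming scheme.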

05618-xxx displaced to the right

The 4 variants of 05618 are all displaced to the right (start points have an offset of ~200 points). 05618.svg is not affected.

Move latin letters to full-width range

Latin letters from the normal range (00020-...) should be moved to the full-width range (0FF00-...), since they graphically represent full-width characters, being drawn on a square canvas. The normal range should be abandoned.
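The move is a fixed offset in Unicode: printable ASCII U+0021 to U+007E has full-width forms at U+FF01 to U+FF5E. The sketch below applies that offset to file names; it maps 00021 to 0ff01 and 0003a to 0ff1a, matching pairs from the missing-dots list. The space character U+0020 is the exception (its full-width counterpart is U+3000 IDEOGRAPHIC SPACE, not U+FF00), so it is rejected here. The helper name is hypothetical.

```python
# Printable ASCII U+0021..U+007E has full-width forms at U+FF01..U+FF5E.
FULLWIDTH_OFFSET = 0xFF01 - 0x21  # = 0xFEE0

def fullwidth_name(name):
    """Map a base file name like '00021.svg' to its full-width counterpart '0ff01.svg'."""
    stem, ext = name.split(".", 1)
    cp = int(stem, 16)
    if not 0x21 <= cp <= 0x7E:
        raise ValueError(f"{name}: no full-width form in U+FF01..U+FF5E")
    return f"{cp + FULLWIDTH_OFFSET:05x}.{ext}"
```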
