jmdict-kindle / jmdict-kindle


Japanese - English dictionary for Kindle based on the JMdict / EDICT database

License: Other

Makefile 4.55% Python 65.95% HTML 29.08% CSS 0.42%

kindle japanese dictionary jmdict

jmdict-kindle's Introduction

About

This is a Japanese-English dictionary for e-Ink Kindle devices, based on the JMdict, JMnedict, and Tatoeba databases.

Features:

Lookup of inflected verbs
Lookup of Japanese names
Example sentences
Pronunciation
The dictionaries can be downloaded as separate files or as one big dictionary

Supported Devices

The dictionary has been tested on Kindle Paperwhite and Kindle Oasis. It should also work well with other e-Ink Kindle devices.

The dictionary will not work well on Kindle Fire, the Kindle Android App, or any other Android-based Kindle, because the Kindle software on those platforms does not support inflection lookups.

Download

You can download the latest version of the dictionary from here.

Install

e-Ink Kindle

There are three dictionaries in total:

jmdict.mobi: Contains data from the JMdict database, with additional examples. It does not contain proper names.
jmnedict.mobi: Contains Japanese proper names from the JMnedict database.
combined.mobi: Contains the data from both of the above dictionaries. Please note that many features (sentences, pronunciation, ...) are missing from the combined dictionary due to size constraints, so using it is not recommended.

To install any of the dictionaries (you can also install all three of them) into your device follow these steps:

for 1st-generation Kindle Paperwhite devices, ensure you have firmware version 5.3.9 or higher as it includes improved homonym lookup for Japanese;
connect your Kindle device via USB;
copy the .mobi file for the dictionary you want to use to the documents/dictionaries sub-folder;
eject the USB device;
on your device go to Home > Settings > Device Options > Language and Dictionaries > Dictionaries and set JMdict Japanese-English Dictionary as the default dictionary for Japanese.
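The copy step above can also be scripted. The following is a minimal sketch, not part of the project: the function name `install_dictionary` and the mount point in the usage note are assumptions, and only the `documents/dictionaries` destination comes from the steps above.

```python
import shutil
from pathlib import Path


def install_dictionary(mobi_file, kindle_root):
    """Copy a .mobi dictionary into documents/dictionaries on a mounted Kindle."""
    source = Path(mobi_file)
    if source.suffix != ".mobi":
        raise ValueError(f"expected a .mobi file, got {source.name}")
    # documents/dictionaries is the destination folder named in the steps above.
    target_dir = Path(kindle_root) / "documents" / "dictionaries"
    target_dir.mkdir(parents=True, exist_ok=True)
    # copy2 returns the path of the newly written file.
    return Path(shutil.copy2(source, target_dir / source.name))
```

For example, `install_dictionary("jmdict.mobi", "/media/user/Kindle")` would copy the file if the Kindle happened to be mounted at that (hypothetical) path. Ejecting the device and selecting the default dictionary still have to be done by hand.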

Kindle Android App

NOTE: Unfortunately the Kindle Android App does not support dictionary inflections, which makes verb lookup practically impossible. There is no known workaround.

rename jmdict.mobi or any of the other two dictionaries as B005FNK020_EBOK.prc
connect your Android device via USB
copy B005FNK020_EBOK.prc into Internal Storage/Android/data/com.amazon.kindle/files/ or /sdcard/android/data/com.amazon.kindle/files

This will override the default Japanese-Japanese dictionary.

Kindle iOS App

The steps for the iOS App are similar to those for the Android App above. Unfortunately the Kindle iOS App seems to suffer from the same limitations regarding inflections.

Pitch accent information

The pitch accent information is encoded in the following way:

Underline for Low
No Formatting for High
ꜜ for a sudden Drop in pitch
° for a Nasal sound
If no formatting whatsoever is present then we do not have pitch information for that particular entry

Examples:

じたい means L-H-H
ねが°ꜜい means L-Hꜜ-L
ぜんしん means L-H-H-H
ひとꜜ means L-Hꜜ-(L) [The (L) means the next sound after ひと will be low. E.g. ひとが (L-H-L)]

For more information see Japanese pitch accent - Wikipedia
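As a rough illustration of how the L/H patterns in the examples arise, the sketch below derives a pattern from the number of morae and the position of the drop, following the usual Tokyo-style rules. The function name `pitch_pattern` and the numeric `accent` argument are assumptions made for this example; they are not taken from the dictionary's source code.

```python
def pitch_pattern(mora_count, accent):
    """Return (pattern, particle_pitch) for a word.

    accent is the number of the mora after which the pitch drops; 0 means no drop.
    """
    if mora_count < 1 or not 0 <= accent <= mora_count:
        raise ValueError("accent must be between 0 and mora_count")
    pattern = []
    for position in range(1, mora_count + 1):
        if accent == 1:
            # Drop right after the first mora: high start, low afterwards.
            pattern.append("H" if position == 1 else "L")
        elif position == 1:
            pattern.append("L")
        elif accent == 0 or position <= accent:
            pattern.append("H")
        else:
            pattern.append("L")
    # A following particle stays high only when the word has no drop at all.
    particle = "H" if accent == 0 else "L"
    return "-".join(pattern), particle
```

With drop positions inferred from the examples above, `pitch_pattern(3, 2)` gives `("L-H-L", "L")` as for ねが°ꜜい, and `pitch_pattern(2, 2)` gives `("L-H", "L")` as for ひとꜜ, where the second element is the pitch of a following particle such as が.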

Building from source

Requirements:

Linux, Windows with Cygwin or WSL (might also work on macOS with a few changes)
Kindle Previewer if building on Windows or WSL
- Kindle Previewer has to be added to PATH. If it was installed to the default location, add it by running the following PowerShell command (close all cmd and PowerShell windows afterwards for the change to take effect):
```
Set-ItemProperty -Path 'Registry::HKEY_CURRENT_USER\Environment' -Name PATH -Value ((Get-ItemProperty -Path 'Registry::HKEY_CURRENT_USER\Environment' -Name PATH).path + ";$env:APPDATA\Amazon")
```
Python version 3
- Pycairo
- Pillow
- htmlmin
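A quick way to check whether the Python packages listed above are available is sketched below. It is not part of the repository; it only relies on the import names of the three packages (Pycairo is imported as `cairo`, Pillow as `PIL`), and the helper name `missing_packages` is made up for this example.

```python
import importlib.util

# Import names for the packages listed above: Pycairo -> cairo, Pillow -> PIL.
REQUIRED_MODULES = {"Pycairo": "cairo", "Pillow": "PIL", "htmlmin": "htmlmin"}


def missing_packages(required=REQUIRED_MODULES):
    """Return the package names whose import module cannot be found."""
    return [
        package
        for package, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All Python requirements found")
```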

Inside of the makefile you can change the max number of sentences per entry, compression, as well as which sentences to include:

# The Kindle Publishing Guidelines recommend -c2 (huffdic compression),
# but it is excruciatingly slow. That's why -c1 is selected by default.
# Compression currently is not officially supported by Kindle Previewer according to the documentation
COMPRESSION ?= 1

# Sets the max sentences per entry only for the jmdict.mobi.
# It is ignored by combined.mobi due to size restrictions.
# If there are too many sentences for the combined dictionary,
# it will not build (exceeds 650MB size limit). The amount is limited to 0 in this makefile for the combined.mobi
SENTENCES ?= 5

# This flag determines whether only good and verified sentences are used in the
# dictionary. Set it to TRUE if you only want those sentences.
# It is only used by jmdict.mobi
# It is ignored by combined.mobi. There it is always true
# This is due to size constraints.
ONLY_CHECKED_SENTENCES ?= FALSE

# If true adds pronunciations to entries. The combined dictionary ignores this flag due to size constraints
PRONUNCIATIONS ?= TRUE

# If true adds additional information to entries. The combined dictionary ignores this flag due to size constraints
ADDITIONAL_INFO ?= TRUE
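Because these variables are assigned with `?=`, GNU make only applies the default when the variable is not already set, so they can be overridden on the command line without editing the makefile (for example `make jmdict.mobi SENTENCES=3`). If you script your builds, a small helper along the following lines can assemble such a command. This is only a sketch; `make_command` is a hypothetical name and not something the project provides.

```python
def make_command(target=None, **overrides):
    """Build the argument list for a make invocation with variable overrides."""
    args = ["make"]
    if target:
        args.append(target)
    # NAME=value arguments on the command line take precedence over ?= defaults.
    args.extend(f"{name}={value}" for name, value in overrides.items())
    return args


cmd = make_command("jmdict.mobi", SENTENCES=3, ONLY_CHECKED_SENTENCES="TRUE")
print(" ".join(cmd))
# To actually run the build: subprocess.run(cmd, check=True)
```

The same helper also covers the WSL case by passing `ISWSL="TRUE"` as an additional override.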

Build with make to create all 3 dictionaries (note that the combined dictionary will not build with Kindle Previewer due to size constraints):

make

or use any of the following commands to create a specific one:

make jmdict.mobi
make jmnedict.mobi
make combined.mobi

If you build it on WSL the commands are as follows:

make ISWSL=TRUE

or use any of the following commands to create a specific one:

make jmdict.mobi ISWSL=TRUE
make jmnedict.mobi ISWSL=TRUE
make combined.mobi ISWSL=TRUE

Create a Pull Request

Before making a pull request, please ensure the formatting of your Python code is correct. To do this, install black and run:

black .

To do

Leverage more of the JMdict data:
- cross references
Add Furigana to example sentences
Create better covers

Credits

Jim Breen and the JMdict/EDICT project as well as the ENAMDICT/JMnedict
The Tatoeba project
John Mettraux for his EDICT2 Japanese-English Kindle dictionary
Choplair-network for their Nihongo conjugator
javdejong for the pronunciation data and the parser
mifunetoshiro for the additional pronunciation data


jmdict-kindle's Issues

Invert order of combined dictionary

@jrfonseca
I'm using the combined dictionary on a Kindle, and I noticed that the name lookups are always given as first result, before the regular dictionary lookups.

This is annoying in the case where a common word also happens to be a name. This is because 9 times out of 10, the regular dictionary definition is what you are actually looking for, and the name match is just a coincidence. After a few hours reading, I can report that this happens very frequently, forcing you to swipe after the name definition(s) until you get to the actual intended result.

It would be much better if the common words had precedence over the names, so the word is first looked up in the regular dictionary, and if it is not found there, as a fallback, it is looked up in the name dictionary.

Please, let me know your thoughts. Thanks for the great job!
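The requested behaviour amounts to a lookup with a fallback. The sketch below models it with two plain Python dictionaries; the toy data and the name `lookup` are purely illustrative, and it says nothing about how the Kindle itself orders results from a combined .mobi file.

```python
def lookup(word, words, names):
    """Return entries for word, preferring the regular dictionary over names."""
    entries = words.get(word)
    if entries:
        return entries
    # Fall back to the name dictionary only when no regular entry exists.
    return names.get(word, [])


# Toy data for illustration only.
words = {"森": ["forest"]}
names = {"森": ["Mori (surname)"], "田中": ["Tanaka (surname)"]}

print(lookup("森", words, names))
print(lookup("田中", words, names))
```

Here 森 resolves to the regular entry even though it is also a name, while 田中 falls through to the name dictionary.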

Issue with reading for some words

Lookups for some words don't return the common reading, e.g.:
人間の住める環境とはいえない場所に...
The lookup for 人間 returns じんかん instead of にんげん.

[bug] ぐ past tense inflection is wrong

See: https://github.com/jmdict-kindle/jmdict-kindle/blob/master/inflections.py#L112

The past inflection is specified as いた even though it should be いだ.
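For reference, the plain past endings of godan verbs can be summarised as a small table. The sketch below is independent of the project's inflections.py and uses made-up names; it only illustrates that く becomes いた while ぐ becomes いだ.

```python
# Plain past (ta-form) endings for godan verbs, keyed by the dictionary-form ending.
GODAN_PAST = {
    "う": "った", "つ": "った", "る": "った",
    "む": "んだ", "ぶ": "んだ", "ぬ": "んだ",
    "く": "いた", "ぐ": "いだ",
    "す": "した",
}


def godan_past(verb):
    """Return the plain past form of a godan verb given in dictionary form."""
    if verb in ("行く", "いく"):
        return verb[:-1] + "った"  # 行く is the well-known irregular く verb
    ending = verb[-1]
    if ending not in GODAN_PAST:
        raise ValueError(f"not a godan dictionary form: {verb}")
    return verb[:-1] + GODAN_PAST[ending]
```

For instance, `godan_past("泳ぐ")` returns 泳いだ, whereas `godan_past("書く")` returns 書いた.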

Brilliant dictionary! Looking forward to a version including Japanese proper names

Thank you so much for bringing us such a great J-E dictionary. In my opinion, this is the best Japanese dictionary for Kindle available online.

One minor suggestion: it would be even better if it included Japanese proper names from ENAMDICT. I know you've been working on it. I hope you can release it soon.

Cheers.

Making a translation other than English

Hello, could you help me with that? I don't understand programming.
I would like to end up with a Japanese > Polish dictionary, but I don't know how to edit the source files to replace the English with Polish so that the resulting dictionary works on Kindle. Could you explain it to me so that I can do the translation myself?

no .mobi files in the download

Do I need a program to open the files in the zip archive in order to find the .mobi dictionary files?

Kindle Compatibility

Can this plugin be used on Kindle devices other than the Paperwhite?

Dictionary loaded onto Kindle Paperwhite is not working

I loaded the combined.mobi from this release into my Kindle Paperwhite (Kindle 5.13.4) but it is not shown in the list of Japanese dictionaries:

I have already tried restarting the Kindle.

Pitch information README

I love the new addition of pitch information in the dictionary, but I have trouble understanding it.
The up and down arrows are very clear, but the underscore really confuses me.
I tried to google but it is not helping.

I think it would be useful if you could add a section in the readme that briefly explains the notation.

Remove combined dictionary from Windows build

The Windows build sometimes fails at random due to the size of the combined dictionary. We might have to remove the combined dictionary from the Windows build in the future if the failed builds happen too often.

Add さ form inflection (eg 太い -> 太さ)

Thanks for your work by the way, this dictionary gets heavy use from me!
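The requested inflection is regular for i-adjectives: the final い is replaced with さ. A minimal sketch, assuming dictionary-form input and using the hypothetical name `sa_form` (this is not code from the repository):

```python
def sa_form(adjective):
    """Turn an i-adjective in dictionary form into its さ noun form."""
    if not adjective.endswith("い"):
        raise ValueError(f"expected an i-adjective ending in い: {adjective}")
    if adjective == "いい":
        return "よさ"  # いい inflects from its older form よい
    return adjective[:-1] + "さ"
```

So `sa_form("太い")` yields 太さ, matching the example in the title.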

Does JMdict work properly on the iOS Kindle app?

Hello,

I am a heavy user of your dictionary on a Kindle e-Ink device and I am thinking of getting a tablet, so I want to know whether this part is also true for the iOS Kindle app.

"NOTE: Unfortunately the Kindle Android App does not support dictionary inflections, yielding verbs lookup practically impossible. No known workaround."

Thank you for the dictionary, awesome work!

Add Pitch information from https://github.com/mifunetoshiro/kanjium

This dictionary currently uses pitch information from one source. mifunetoshiro/kanjium has around 125,000 words with pitch information. Adding it would dramatically increase the number of entries with accent data.

Improve pitch diagram

I understand if this isn't something you want to consider, but I find the pitch notation used in the dictionary unintuitive and think it can be improved.

For example:

I think the reason I find this difficult is because I'm used to seeing the lines on the top rather than the bottom, so what's shown in this dictionary is the opposite of my expectation. For example, here's what Yomichan does:

Is it possible to show the lines on top as Yomichan does, and if so would you consider changing to this? (I feel like this is the more standard way to show it when using lines, so I suspect it's simply not possible, but if it is possible that would be great.)

Regardless of the above, would you consider adding in the other fairly standard notation of [X] in addition to the line notation (as shown in the above screenshots)? For example, [0] represents heiban and in all other cases the number in the brackets represents the mora after which the pitch drops. For those of us familiar with pitch accent, this notation is succinct and can be understood intuitively.
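To make the relationship between the two notations concrete, here is a sketch that derives the [X] number from a reading written with the ꜜ marker described in the README. It assumes the entry actually has pitch data (plain text cannot distinguish "no drop" from "no information"), and the name `accent_number` is invented for this example.

```python
# Small kana that merge with the preceding kana into a single mora (e.g. きゃ).
SMALL_KANA = set("ゃゅょぁぃぅぇぉゎャュョァィゥェォヮ")
DROP = "ꜜ"
NASAL = "°"


def accent_number(notated):
    """Return the mora index after which the pitch drops, or 0 if there is no drop."""
    morae = 0
    for char in notated:
        if char == DROP:
            return morae
        if char == NASAL or char in SMALL_KANA:
            continue  # neither of these starts a new mora
        morae += 1
    return 0
```

Under these assumptions ねが°ꜜい maps to [2], ひとꜜ to [2], and a reading without a drop marker such as じたい to [0].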

wget -nv -N http://ftp.monash.edu.au/pub/nihongo/JMdict_e.gz
2020-06-30 12:09:49 URL:http://ftp.monash.edu.au/pub/nihongo/JMdict_e.gz [8510545/8510545] -> "JMdict_e.gz" [1]
wget -nv -N http://downloads.tatoeba.org/exports/sentences.tar.bz2
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
2020-06-30 12:10:28 URL:https://downloads.tatoeba.org/exports/sentences.tar.bz2 [133869927/133869927] -> "sentences.tar.bz2" [1]
wget -nv -N http://downloads.tatoeba.org/exports/jpn_indices.tar.bz2
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
2020-06-30 12:10:35 URL:http://downloads.tatoeba.org/exports/jpn_indices.tar.bz2 [2813686/2813686] -> "jpn_indices.tar.bz2" [1]
wget -nv -N https://kindlegen.s3.amazonaws.com/kindlegen_linux_2.6_i386_v2_9.tar.gz
Loaded CA certificate '/etc/ssl/certs/ca-certificates.crt'
2020-06-30 12:10:43 URL:https://kindlegen.s3.amazonaws.com/kindlegen_linux_2.6_i386_v2_9.tar.gz [10813137/10813137] -> "kindlegen_linux_2.6_i386_v2_9.tar.gz" [1]
tar -xzf kindlegen_linux_2.6_i386_v2_9.tar.gz kindlegen
touch kindlegen
python3 jmdict.py -a -s 5 -d j
Parsing JMdict_e.gz...
error: ばっちー[adj-i] should end with い, but ends with ー
error: ばっちぃ[adj-i] should end with い, but ends with ぃ
error: んとす[vs-i] should end with 為る/する, but ends with とす
error: むとす[vs-i] should end with 為る/する, but ends with とす
error: おいちー[adj-i] should end with い, but ends with ー
Created 188454 entries
Adding sentences...
Sentences added: 89419
Creating files for JMdict...
./kindlegen JMdict.opf -c1 -verbose -dont_append_source -o jmdict.mobi

*************************************************************
 Amazon kindlegen(Linux) V2.9 build 1028-0897292 
 A command line e-book compiler 
 Copyright Amazon.com and its Affiliates 2014 
*************************************************************

Info:I9006:option: -c1: Standard DOC compression
Info:I9014:option: -verbose: Verbose output
Info:I9018:option: -donotaddsource: Source files will not be added
Info(prcgen):I1047: Added metadata dc:Title        "JMdict Japanese-English Dictionary"
Info(prcgen):I1047: Added metadata dc:Date         "2019-05-08"
Info(prcgen):I1047: Added metadata dc:Creator      "Electronic Dictionary Research & Development Group"
Info(prcgen):I1002: Parsing files  0000245
Info(prcgen):I1003: Parsing file     URL: JMdict-frontmatter.html
Info(prcgen):I1003: Parsing file     URL: entry-JMdict-あ.html
Warning(parser8):W26001: Index not supported for enhanced mobi.
Info(prcgen):I1003: Parsing file     URL: entry-JMdict-い.html
Info(parser8):I12001: Enhanced mobi generation suppressed.
Info(prcgen):I1036: Mobi file built successfully

You will notice that the last file parsed is entry-JMdict-い.html but I would expect it to be entry-JMdict-ン.html. Am I missing something?