kensaku

検索, pronounced kensaku, means "search" or "retrieval" in Japanese. This project aims to make it easier for Japanese learners to find information about kanji, radicals, and vocabulary. It does this by bundling information from the JMdict, kanjidic2, and kradfile projects into an SQLite database. While these are excellent resources, the first two are awkward to work with because of their complicated XML schemas and the state tracking required during parsing. Others have made this information available in more convenient formats like JSON, but files that large still suffer from high memory usage and slow lookups. A number of database generators exist for subsets of this data, but to my knowledge this is the only project that provides simple and comprehensive access to all of it.

Status

This project is under active development. It currently features a dotnet console application written in F# which downloads and parses the aforementioned files and uses them to populate an SQLite database. The database is already in a usable state and contains all of the information made available by the different projects. However, the exact schema is still subject to change.
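
The parse-and-populate step described above can be sketched roughly as follows. This is not the project's F# code: it is a minimal Python illustration that streams a JMdict-shaped document one entry at a time and writes it into SQLite. The element names (`entry`, `ent_seq`, `k_ele/keb`, `r_ele/reb`, `sense`, `gloss`) follow the JMdict format, and the table and column names are taken from the example query in this README. The column types, the `create_schema` and `load_entries` helpers, and the sample XML are assumptions made for illustration.

```python
import io
import sqlite3
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hand-written sample shaped like a JMdict entry (not real dictionary data,
# and without the DTD and entity definitions the real file carries).
SAMPLE_XML = """<JMdict>
<entry>
<ent_seq>1</ent_seq>
<k_ele><keb>検索</keb></k_ele>
<r_ele><reb>けんさく</reb></r_ele>
<sense>
<gloss>looking up (e.g. a word in a dictionary)</gloss>
<gloss xml:lang="ger">Suche</gloss>
</sense>
</entry>
</JMdict>"""


def create_schema(conn):
    # Assumed schema: only the names come from the README's example query.
    conn.executescript("""
        create table Entries (Id integer primary key);
        create table KanjiElements (Id integer primary key, EntryId integer, Value text);
        create table ReadingElements (Id integer primary key, EntryId integer, Value text);
        create table Senses (Id integer primary key, EntryId integer);
        create table Glosses (Id integer primary key, SenseId integer, Value text, Language text);
    """)


def load_entries(source, conn):
    """Stream <entry> elements from source into conn; return the entry count."""
    count = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "entry":
            continue
        entry_id = int(elem.findtext("ent_seq"))
        conn.execute("insert into Entries (Id) values (?)", (entry_id,))
        for keb in elem.iterfind("k_ele/keb"):
            conn.execute("insert into KanjiElements (EntryId, Value) values (?, ?)",
                         (entry_id, keb.text))
        for reb in elem.iterfind("r_ele/reb"):
            conn.execute("insert into ReadingElements (EntryId, Value) values (?, ?)",
                         (entry_id, reb.text))
        for sense in elem.iterfind("sense"):
            sense_id = conn.execute("insert into Senses (EntryId) values (?)",
                                    (entry_id,)).lastrowid
            for gloss in sense.iterfind("gloss"):
                # Glosses without an explicit xml:lang are treated as English.
                language = gloss.get(XML_LANG, "eng")
                conn.execute("insert into Glosses (SenseId, Value, Language) values (?, ?, ?)",
                             (sense_id, gloss.text, language))
        elem.clear()  # release the finished entry's subtree
        count += 1
    conn.commit()
    return count


conn = sqlite3.connect(":memory:")
create_schema(conn)
loaded = load_entries(io.BytesIO(SAMPLE_XML.encode("utf-8")), conn)
print(loaded, conn.execute("select Value, Language from Glosses order by Id").fetchall())
```

Processing each entry as soon as its closing tag arrives, then clearing it, avoids holding the whole parsed file in memory, which is the main cost of loading the XML or JSON forms directly.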

For now, queries must be made directly using SQL. A library that exposes common tasks directly to dotnet code is planned, as well as a command line application and possibly a GUI.

Example usage

-- Look up the readings and English glosses of 検索.
select k.Value as kanji, r.Value as hiragana, g.Value as meaning
from Entries as e
join KanjiElements as k on k.EntryId = e.Id
join ReadingElements as r on r.EntryId = e.Id
join Senses as s on s.EntryId = e.Id
join Glosses as g on g.SenseId = s.Id
where g.Language = 'eng' and k.Value = '検索'

| kanji | hiragana | meaning |
|-------|----------|---------|
| 検索 | けんさく | looking up (e.g. a word in a dictionary) |
| 検索 | けんさく | retrieval (e.g. data) |
| 検索 | けんさく | searching for |
| 検索 | けんさく | referring to |
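
Until the planned library exists, a query like the one above can be wrapped in a small helper in whatever language the caller uses. The sketch below does this in Python with the standard `sqlite3` module, binding the kanji and language as parameters instead of splicing them into the SQL text. The `lookup` and `build_demo_db` names are hypothetical. The demo database is a stand-in holding only the rows shown in the example output, with assumed column types and all four glosses placed under a single sense. A real run would open the generated database file instead.

```python
import sqlite3

# Same joins as the README's example query, with bound parameters.
LOOKUP_SQL = """
select k.Value, r.Value, g.Value
from Entries as e
join KanjiElements as k on k.EntryId = e.Id
join ReadingElements as r on r.EntryId = e.Id
join Senses as s on s.EntryId = e.Id
join Glosses as g on g.SenseId = s.Id
where g.Language = ? and k.Value = ?
order by g.Id
"""

# Glosses copied from the example output in this README.
DEMO_GLOSSES = [
    "looking up (e.g. a word in a dictionary)",
    "retrieval (e.g. data)",
    "searching for",
    "referring to",
]


def build_demo_db():
    """Create an in-memory stand-in for the generated database (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        create table Entries (Id integer primary key);
        create table KanjiElements (Id integer primary key, EntryId integer, Value text);
        create table ReadingElements (Id integer primary key, EntryId integer, Value text);
        create table Senses (Id integer primary key, EntryId integer);
        create table Glosses (Id integer primary key, SenseId integer, Value text, Language text);
    """)
    conn.execute("insert into Entries (Id) values (1)")
    conn.execute("insert into KanjiElements (EntryId, Value) values (1, '検索')")
    conn.execute("insert into ReadingElements (EntryId, Value) values (1, 'けんさく')")
    conn.execute("insert into Senses (Id, EntryId) values (1, 1)")
    conn.executemany(
        "insert into Glosses (SenseId, Value, Language) values (1, ?, 'eng')",
        [(gloss,) for gloss in DEMO_GLOSSES],
    )
    conn.commit()
    return conn


def lookup(conn, kanji, language="eng"):
    """Return (kanji, reading, gloss) rows for an exact kanji match."""
    return conn.execute(LOOKUP_SQL, (language, kanji)).fetchall()


demo = build_demo_db()
for row in lookup(demo, "検索"):
    print(*row)
```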

Why not just use Jisho?

Valid question. For those who don't know, jisho.org is a fantastic online Japanese-English dictionary which is built on top of the same datasets as this project. I have three main reasons for building this project anyway:

  1. Offline access and latency. A native application can offer faster and more reliable results than a web application. When learning a new language, it is common to make dozens of searches a day, and the longer it takes to get results, the more one's reading immersion is broken.
  2. More customizable and powerful search. Jisho lets you look up unknown kanji by inputting a stroke count and selecting radicals from a table. This is really useful, but scanning through the table to find the radical you're looking for can take a long time. This project aims to let users refer to radicals by commonly used mnemonics. For example, one could search for a kanji by typing "power" and "shellfish" instead of clicking on the corresponding radicals in the table.
  3. Easy integration into other software. Other dotnet applications should be able to easily build features for rich Japanese language support on top of this library without reimplementing everything from scratch.
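
The mnemonic-based radical search in point 2 is a goal and not an existing feature, but the idea can be sketched as follows: map each mnemonic to a radical, then keep the kanji whose radical set contains every requested radical. Everything here is hypothetical, including the `MNEMONICS` table, the `search_by_mnemonics` function, and the decompositions in `KANJI_RADICALS`, which are written by hand for illustration and may differ from what kradfile lists.

```python
# Hypothetical mnemonic names; a real list would need to be curated.
MNEMONICS = {
    "power": "力",
    "shellfish": "貝",
    "mouth": "口",
}

# Toy kanji-to-radical data, hand-written for this sketch (not read from kradfile).
KANJI_RADICALS = {
    "賀": {"力", "口", "貝"},
    "加": {"力", "口"},
    "員": {"口", "貝"},
}


def search_by_mnemonics(names):
    """Return the kanji that contain every radical named in names."""
    unknown = [name for name in names if name not in MNEMONICS]
    if unknown:
        raise ValueError(f"unknown mnemonic(s): {', '.join(unknown)}")
    wanted = {MNEMONICS[name] for name in names}
    # A kanji matches when the wanted radicals are a subset of its radicals.
    return sorted(kanji for kanji, radicals in KANJI_RADICALS.items()
                  if wanted <= radicals)


print(search_by_mnemonics(["power", "shellfish"]))
```

In a database-backed version, the same subset test could be expressed as a join between a radicals table and a mnemonics table, though the schema for that is not described here.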
