
dragonmasher's People

Contributors

tsroten


dragonmasher's Issues

Overlapping / similar projects

Nice to meet you, @tsroten. My name is Tony Narlock, and I have come across and admire your CJK libraries. I have a similar project at https://cihai.git-pull.com. It's still in its early phases.

The goal is to be a successor to cjklib.

Some notes on the approach I'm taking and the current state of dealing with the data:

  • backing data with SQLAlchemy (like cjklib did)
  • datasets that get pulled in are distributed as data packages (for the sake of preservation, and given the time it takes to normalize them, may as well make them available in an agnostic format)
  • backed by downloading UNIHAN (using https://unihan-etl.git-pull.com)
    Some notes on that: I found that UNIHAN actually bakes quite a lot of structured information into its fields, but I also believe it makes the ideal "spine" to pull the rest of the CJK data sources in against.
  • re: UNIHAN
    • it's also not trivial to use an RDBMS as a backend. It's not as simple as dumping the data in; some parts of UNIHAN, such as variants, need to be parsed in order to calculate joins.
    • UNIHAN is inherently unique and not a flat, tabular dataset at all, so cihai doesn't handle it the way it would a flat source like CEDict. A lot of effort is going into building a front end to store and query it.
  • for other data sources, I'm considering baking https://github.com/frictionlessdata/datapackage-py into the system.
  • risk: some other datasets that come up may not lend themselves to an RDBMS, so I plan to document the trial and error so no one duplicates the effort.
  • if my SQLAlchemy / UNIHAN expedition falls flat, I may end up back here. There's a need for a unified front for CJK in Python.
  • bikeshed: using the MIT license (was formerly BSD)
  • CFDict has gone missing from the internet
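
To make the point about variants and joins a bit more concrete, here is a minimal sketch of the idea. It is not how cihai or unihan-etl actually do it. It assumes the tab-separated Unihan layout (code point, field name, value), assumes that variant fields are the ones whose names end in "Variant", and uses the standard-library sqlite3 module in place of SQLAlchemy so it stays self-contained. The sample lines, table name, and function names are all made up for illustration.

```python
import sqlite3

# Illustrative sample lines (not verified against a real Unihan release),
# in the tab-separated layout: code point, field name, value.
SAMPLE = (
    "U+3400\tkSemanticVariant\tU+4E18\n"
    "U+4E18\tkSemanticVariant\tU+3400 U+5775<kMatthews\n"
    "U+4E18\tkMandarin\tqiū\n"
)


def to_char(code_point: str) -> str:
    """Convert a 'U+4E18' style code point into the character it names."""
    return chr(int(code_point[2:], 16))


def parse_variants(text: str):
    """Yield (char, field, variant_char) for each variant reference."""
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        code_point, field, value = line.split("\t", 2)
        if not field.endswith("Variant"):
            continue  # assumption: only *Variant fields hold references
        for ref in value.split():
            # A reference may carry a '<source' suffix; keep the code point.
            target = ref.split("<", 1)[0]
            yield to_char(code_point), field, to_char(target)


def build_db(text: str) -> sqlite3.Connection:
    """Load variant references into a hypothetical 'variant' join table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variant (char TEXT, field TEXT, variant_char TEXT)"
    )
    conn.executemany(
        "INSERT INTO variant VALUES (?, ?, ?)", parse_variants(text)
    )
    conn.commit()
    return conn


def variants_of(conn: sqlite3.Connection, char: str) -> list:
    """Return the variant characters recorded for a single character."""
    rows = conn.execute(
        "SELECT variant_char FROM variant WHERE char = ? ORDER BY variant_char",
        (char,),
    )
    return [row[0] for row in rows]


conn = build_db(SAMPLE)
print(variants_of(conn, to_char("U+4E18")))
```

The takeaway is that a single field value fans out into several rows, so the value has to be parsed before anything can be joined. Dumping the raw string into a column would not be enough.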
