
duguo-chinese-reading-app's Introduction

DuGuo

DuGuo is under construction. I will be prototyping some ideas, so if you're interested in contributing, let me know and I'll be happy to collaborate!

docs: 0.2.0 License: AGPL

DuGuo demo gif

Overview

DuGuo is an open-source web application that allows users to read Chinese text in an interactive learning environment. The main features include:

  • Phonetic support (Pinyin + Zhuyin) and phrase lookup via CC-CEDICT
  • Phrase tokenization via spaCy
  • Text-to-speech via the SpeechSynthesis API
  • Transposition between Simplified + Traditional Chinese text
  • ... other ideas tbd - view + contribute in the Issues tab!

This app is designed in particular for L2 (second-language) learners, though hopefully it is useful for all levels of Chinese learning!

Tech Stack

The app has 2 microservices:

  1. A web server written in Rust using Rocket
  2. An NLP tokenization service written in Python primarily using spaCy's Chinese module (which builds on top of jieba)
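To give a feel for what the tokenization service takes in and returns, here is a minimal standard-library sketch. It uses greedy forward maximum matching against a tiny vocabulary, which is a deliberately simple stand-in and not the method spaCy or jieba use. The `tokenize` function, the `VOCAB` set, and the `max_len` parameter are hypothetical names chosen for illustration.

```python
def tokenize(text, vocab, max_len=4):
    """Greedy forward maximum matching: at each position, take the longest vocab entry."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or candidate in vocab:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy vocabulary; a real service would rely on a full dictionary or model.
VOCAB = {"我", "喜欢", "学习", "中文"}
tokens = tokenize("我喜欢学习中文", VOCAB)
print(tokens)
```

Each returned token is a phrase that the web server could then look up in the dictionary. Characters not covered by the vocabulary fall through as single-character tokens.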

For data persistence, MongoDB and Redis are used.

Tokenized words are looked up in CC-CEDICT, which is generously available under a Creative Commons license. Radical information (for saved vocab) is sourced from this web API and can be quickly accessed using the accompanying Hemiola Chinese Character Browser.
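As a rough illustration of the dictionary lookup, the sketch below parses lines in the CC-CEDICT text format (`Traditional Simplified [pin1 yin1] /gloss/gloss/`) into an in-memory table. It is not DuGuo's actual loader: `Entry`, `parse_line`, `by_simplified`, and the two sample lines are assumptions made for this example, and the real app stores its data in MongoDB and Redis rather than a Python dict.

```python
import re
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    traditional: str
    simplified: str
    pinyin: str          # numbered pinyin as written in the file, e.g. "xue2 xi2"
    glosses: List[str]   # English definitions


# Traditional, Simplified, [pinyin], then slash-delimited glosses.
LINE_RE = re.compile(r"^(\S+)\s+(\S+)\s+\[([^\]]*)\]\s+/(.*)/\s*$")


def parse_line(line: str) -> Optional[Entry]:
    """Parse one dictionary line; return None for comments and malformed lines."""
    if line.startswith("#"):
        return None
    match = LINE_RE.match(line)
    if match is None:
        return None
    trad, simp, pinyin, glosses = match.groups()
    return Entry(trad, simp, pinyin, glosses.split("/"))


# Illustrative sample lines written in the CC-CEDICT format.
SAMPLE = """# comment line
中文 中文 [Zhong1 wen2] /Chinese language/
學習 学习 [xue2 xi2] /to learn/to study/
"""

by_simplified = {}
for raw in SAMPLE.splitlines():
    entry = parse_line(raw)
    if entry is not None:
        by_simplified[entry.simplified] = entry

print(by_simplified["学习"])
```

Because each entry carries both scripts, the same table can support Simplified to Traditional transposition for a matched phrase. Note that this sketch keeps only the last entry per Simplified form, whereas a fuller loader would keep a list, since several entries can share one spelling.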

Motivation

Learning Chinese as a second language is hard for many reasons. To start, Chinese characters are logographic whereas English characters are alphabetic - this necessitates a fundamentally different approach to phrase memorization. Additionally, phrase pronunciation requires learning a technical phonetic syntax (e.g. pinyin) which is rarely used by natives and virtually nonexistent in practice.

While there are many more nuanced approaches to Chinese learning (e.g. the HSK framework), one simplified view is that there are 3 levels of Chinese reading mastery:

  1. Almost entirely pinyin-dependent (for beginners and L2 learners that can speak but can't read, like myself...)
  2. Some pinyin needed (roughly grade-school level for native Chinese speakers)
  3. Almost no pinyin needed (adult level - phrases are either memorized or able to be intuited based on the context)

Below are images to provide a visual reference. While for natives the jump from tier 1 to 3 is trivial, for L2 learners it can feel insurmountable!

  1. A beginner-level Chinese textbook with pinyin included for all words ('Tier 1').
  2. An intermediate-level Chinese textbook with pinyin for some words ('Tier 2'). In practice, this is grade-school level for natives!
  3. A native-level article from a Chinese newspaper ('Tier 3'). No pinyin is used at all, since natives don't really need it!

Contextual Learning

Contextual learning is arguably the best way to learn a language. People remember things that are linked to experiences or assorted significant pieces of information. For natives, learning Chinese is essential. However, for L2 learners, finding the urgency to learn is uniquely difficult without an external driving force (e.g. living in a Chinese-speaking country).

Barring the ability to live in a foreign country, DuGuo hopes to offer the next-best thing by allowing users to pick what they want to read (improving contextual relevance) and saving contextual references for "learned" phrases (adding contextual triggers).

Other Existing Tools

There are several existing tools that provide similar functionality, including (but not limited to): Zhongwen Chrome Extension, Purple Culture Pinyin Converter, Du Chinese (mobile), mdbg.net, Hànzì Analyzer, pin1yin1, etc.

The main differentiators DuGuo hopes to provide with this project are improved UX, progress persistence (via accounts), document difficulty scoring (in progress), and Duey! Ultimately this is provided as an additional tool to help users learn Chinese, so definitely use the combination of tools that best supplements your learning experience.

Acknowledgement

This project was adapted from Martin Kess's previous CS6460 final project, the Chinese Reading Machine (中文读机). He provided the starter code (in Python Flask) and a strong existing framework to build on. The images for Duey came from Dzaky Taufik (his Upwork linked here). 感谢 and 大家加油!

Duey! Confused Duey? Surprised Duey Worried Duey :-( Happy Duey!
