taibun's Introduction

Andrei Harbachov

👋 Hello, everyone! I'm Andrei, a dedicated university student majoring in Computer Science with a keen interest in the fascinating world of Artificial Intelligence. My daily routine revolves around experimentation, learning, and coding, fueled by copious amounts of tea. Let's connect, collaborate, and embark on this thrilling AI adventure together! 🚀


  • 🎓 Pursuing a Bachelor's degree in Computer Science with a specialisation in Artificial Intelligence and Visual and Interactive Computing at Simon Fraser University.

  • 👨‍💻 Currently enhancing the representation of Taiwanese Hokkien by leveraging a Neural Machine Translator and developing Programming Toolboxes that simplify the creation of programming applications for the language.

  • 📚 Learning Natural Language Processing, Computer Vision, and Machine Learning.

  • ❤️ Passionate about exploring Computer Science, studying foreign languages, and coding.

  • 💬 Feel free to reach out on LinkedIn or by Mail.

Languages and Tools

Python Java C C++ Rust Haskell

PyTorch NumPy TensorFlow Keras MATLAB Pandas

HTML CSS JavaScript TypeScript Angular React

Git Unity C# Android Firebase MySQL

taibun's Issues

Efficient Zhuyin conversion

Describe the feature
The current conversion to Zhuyin requires an 800-word csv file listing all possible syllables. A much smaller conversion table is preferred, so that it can be moved into a dictionary within the code.

Expected behavior
The converter with the Zhuyin system produces the same results as it does now, but the chart in the zhuyin.json file is reduced in size and moved to an in-code Python dictionary.

Relevant methods

  • __tailo_to_zhuyin
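
One possible way to shrink the table, sketched here under assumptions, is to store initials and finals separately and compose each syllable from the two parts instead of listing every full syllable. The names `INITIALS`, `FINALS`, and `syllable_to_zhuyin` are hypothetical, the tables are a small illustrative subset rather than the library's data, and tone marks and special cases are left out.

```python
# Illustrative subset only; these are NOT taibun's actual tables.
INITIALS = {
    "p": "ㄅ", "ph": "ㄆ", "m": "ㄇ",
    "t": "ㄉ", "th": "ㄊ", "n": "ㄋ", "l": "ㄌ",
    "k": "ㄍ", "kh": "ㄎ", "h": "ㄏ",
}
FINALS = {"a": "ㄚ", "i": "ㄧ", "u": "ㄨ", "ai": "ㄞ", "au": "ㄠ"}

# Try longer initials first so that "ph" is matched before "p".
_INITIALS_LONGEST_FIRST = sorted(INITIALS, key=len, reverse=True)


def syllable_to_zhuyin(syllable: str) -> str:
    """Compose a toneless syllable from an optional initial plus a final."""
    if syllable in FINALS:  # syllable without an initial, e.g. "au"
        return FINALS[syllable]
    for initial in _INITIALS_LONGEST_FIRST:
        rest = syllable[len(initial):]
        if syllable.startswith(initial) and rest in FINALS:
            return INITIALS[initial] + FINALS[rest]
    raise KeyError(f"syllable not covered by this sketch: {syllable!r}")


print(syllable_to_zhuyin("pha"))
print(syllable_to_zhuyin("thai"))
```

With this layout the table grows with the number of initials plus the number of finals, not with their product, which is what would make an in-code dictionary practical.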

Incorrect Pingyim

Describe the bug
The converter throws an error when converting words with "m" as the nucleus (e.g. 毋). The transliteration of initial "n-" is incorrect (e.g. 耐 as nâi instead of lnâi). The conversion of final "-t", "-k", "-p" is incorrect (converts to "-d", "-g", "-b").

To Reproduce
Steps to reproduce the behavior:

  1. Initialise Converter with 'Pingyim' as the system
  2. Prompt '毋'
  3. See error
  4. Prompt '耐'
  5. See incorrect transliteration of initial 'n-'
  6. Prompt '法', '色', '硦'
  7. See incorrect transliteration of final '-t', '-k', '-p'

Expected behavior
Initial "n-" should convert to "ln-". Final "-t", "-k", "-p" should stay unchanged instead of becoming "-d", "-g", "-b". Converting characters with "m" as the nucleus shouldn't throw an error.
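
The symptoms above would be consistent with an initial-consonant mapping being applied to the whole syllable, although the issue does not confirm the cause. The following is a minimal sketch of the expected behaviour, assuming toneless Tailo input. It anchors the replacement at the start of the syllable so that finals are left alone. `INITIAL_MAP` and `convert_initial_only` are hypothetical names, and the map is deliberately partial.

```python
import re

# Partial Tailo -> Pingyim initial map, for illustration only.
INITIAL_MAP = {
    "ph": "p", "th": "t", "kh": "k",
    "p": "b", "t": "d", "k": "g",
    "n": "ln",  # the mapping requested in this issue
}

# "^" restricts the match to the syllable onset. The lookaheads keep
# "ts"/"tsh" and "ng" out of this partial sketch instead of mangling them.
_INITIAL_RE = re.compile(r"^(ph|th|kh|p|t(?!s)|k|n(?!g))")


def convert_initial_only(syllable: str) -> str:
    """Rewrite only the initial consonant; final -p/-t/-k stay untouched."""
    return _INITIAL_RE.sub(lambda m: INITIAL_MAP[m.group(1)], syllable, count=1)


for tailo in ("huat", "sik", "tap", "nai"):
    print(tailo, "->", convert_initial_only(tailo))
```

Cases like these could also serve as regression tests once the converter is fixed.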

Dataset improvement

Describe the feature
The current dataset doesn't include ~10k words that were not processed from the Taiwanese-Chinese Online Dictionary and doesn't include a big chunk of the iTaigi Chinese-Taiwanese Comparison Dictionary. Additionally, many proper nouns in the dataset can be derived using other entries and can yield unsatisfactory tokenisation results (e.g. 中國文化大學 should be tokenised as 中國 文化 大學, not 中國文化大學).

Expected behavior
More words are added and unnecessary proper nouns whose romanisation can be derived using other entries in the data are removed.

Relevant files

  • data/words.json
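
To find entries that can be derived from others, one option is to check whether a word can be fully segmented into other entries of the vocabulary. The sketch below does this with a small dynamic-programming pass. It looks at the written forms only and assumes the vocabulary is available as a set of strings. The function name `is_derivable` is hypothetical, and a real check would also need to compare romanisations.

```python
def is_derivable(word: str, vocab: set[str]) -> bool:
    """Return True if `word` splits completely into other vocabulary entries."""
    parts = vocab - {word}  # the word must not be used to explain itself
    n = len(word)
    # reachable[i] is True when word[:i] can be built from entries in `parts`.
    reachable = [True] + [False] * n
    for end in range(1, n + 1):
        for start in range(end):
            if reachable[start] and word[start:end] in parts:
                reachable[end] = True
                break
    return reachable[n]


vocab = {"中國", "文化", "大學", "中國文化大學", "臺灣"}
redundant = sorted(w for w in vocab if is_derivable(w, vocab))
print(redundant)
```

Entries flagged this way are only candidates for removal, since a compound may still carry a reading that its parts do not predict.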

Proper Sandhi rules

Describe the feature
Currently, the sandhi flag applies sandhi rules locally within a word, i.e. it changes the tone of every syllable in the word except for the first one. This is different from real sandhi rules, where changes are applied across the syllables of the whole sentence, not just within single words. The library's sandhi rules should be modified to reflect the actual sandhi rules of the Taiwanese language.

Expected behavior
Properly applies sandhi rules to words depending on their position in the sentence.

Relevant methods

  • __tone_sandhi

Additional context
Taiwanese Sandhi rules
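
As a rough illustration of sentence-level behaviour, the sketch below applies the commonly described tone-sandhi cycle to every syllable of a sequence except the last one. It works on numeric-tone Tailo and makes several simplifying assumptions: the whole input is treated as a single tone group, tone 5 becomes tone 7 (one dialectal variant), and no grammatical analysis is done. The names `sandhi_tone` and `apply_sandhi` are hypothetical and are not the library's `__tone_sandhi`.

```python
import re

# Commonly described sandhi cycle for the non-checked tones.
# Assumption: tone 5 -> 7, which is one dialectal variant.
OPEN_TONE_SANDHI = {"1": "7", "7": "3", "3": "2", "2": "1", "5": "7"}


def sandhi_tone(syllable: str) -> str:
    """Return the sandhi form of one numeric-tone syllable, e.g. 'tai5'."""
    match = re.fullmatch(r"([a-z]+)([1-8])", syllable)
    if match is None:
        return syllable  # leave anything unexpected untouched
    body, tone = match.groups()
    if tone in "48":  # checked tones depend on the final consonant
        if body.endswith("h"):
            new_tone = {"4": "2", "8": "3"}[tone]
        else:  # final -p, -t, -k
            new_tone = {"4": "8", "8": "4"}[tone]
    else:
        new_tone = OPEN_TONE_SANDHI.get(tone, tone)
    return body + new_tone


def apply_sandhi(syllables: list[str]) -> list[str]:
    """Change every syllable except the last, treating the input as one tone group."""
    if not syllables:
        return []
    return [sandhi_tone(s) for s in syllables[:-1]] + [syllables[-1]]


print(apply_sandhi(["tai5", "uan5", "lang5"]))
```

A fuller implementation would first split the sentence into tone groups, since only the last syllable of each group keeps its citation tone.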

Test cases

Describe the feature
The current test cases are sufficient to check the correctness of the Tailo and Zhuyin systems but do not cover the other systems. Similar test cases should be added for the other transliteration systems.

Expected behavior
Test cases are added for POJ, TLPA, Pingyim, and Tongiong transliteration systems.

Relevant files

  • tests

Transliteration conversion

Describe the feature
Currently, it is possible to convert to different transliteration systems only by converting from Chinese characters. The library should add the possibility of converting from Tailo to other transliteration systems directly, bypassing Chinese characters.

Expected behavior
There exists a method/class that converts from Tailo to other transliteration systems.
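
A direct conversion could look roughly like the sketch below, which rewrites a single numeric-tone Tailo syllable into POJ spelling using the well-known orthographic correspondences (ts/tsh to ch/chh, ua/ue to oa/oe, -ing/-ik to -eng/-ek, oo to o͘, nn to ⁿ). The function name `tailo_syllable_to_poj` is hypothetical. Tone numbers are kept as they are, because placing diacritics follows separate rules in each system, and the rule list is not exhaustive.

```python
import re


def tailo_syllable_to_poj(syllable: str) -> str:
    """Convert one numeric-tone Tailo syllable to POJ spelling (sketch)."""
    match = re.fullmatch(r"([a-z]+)([1-9]?)", syllable.lower())
    if match is None:
        raise ValueError(f"not a plain numeric-tone syllable: {syllable!r}")
    body, tone = match.groups()
    body = body.replace("ts", "ch")              # ts -> ch, tsh -> chh
    body = body.replace("ua", "oa").replace("ue", "oe")
    body = re.sub(r"i(ng|k)$", r"e\1", body)     # -ing -> -eng, -ik -> -ek
    body = body.replace("oo", "o\u0358")         # oo -> o with dot above right
    body = body.replace("nn", "\u207f")          # nasal nn -> superscript n
    return body + tone


for tailo in ("tshiu2", "hue1", "tsing1", "kuann1"):
    print(tailo, "->", tailo_syllable_to_poj(tailo))
```

Conversions to the other systems could follow the same pattern with their own correspondence tables.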

Style Improvement

Is your feature request related to a problem? Please describe.
The code is cluttered and can be organised better
