et-g2p

Grapheme to phoneme rules for Estonian

Implements a naive rule-based G2P converter for Estonian.

Note:

  • In the input, compound words should have their constituents separated with an underscore (e.g. 'hamba_kivi').
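To illustrate what a naive rule-based G2P converter does, here is a minimal Java sketch. It is hypothetical and is not et-g2p's implementation. The class name `ToyG2p` and its rule table are assumptions, limited to mappings visible in this README's sample output (ä → ae, ü → ue, word-final d → t). Like the real tool, it drops the compound-boundary underscore. Unlike the real tool, it does not model gemination, so it produces `k e s k p a n k` where et-g2p gives `k e s k p a n kk`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/**
 * Toy rule-based G2P (hypothetical illustration, NOT et-g2p's implementation).
 * Letters are mapped to phone symbols through a lookup table; unmapped letters
 * map to themselves. The compound marker '_' is dropped, and one context rule
 * turns a word-final 'd' into 't'. Gemination is deliberately not modelled.
 */
public class ToyG2p {
    // Assumed letter-to-phone rules, limited to mappings seen in the README:
    // a-umlaut -> ae, u-umlaut -> ue.
    private static final Map<Character, String> LETTER_RULES =
            Map.of('\u00e4', "ae", '\u00fc', "ue");

    public static String convert(String word) {
        // Remove compound boundaries such as the one in "kesk_pank".
        String letters = word.toLowerCase(Locale.ROOT).replace("_", "");
        List<String> phones = new ArrayList<>();
        for (int i = 0; i < letters.length(); i++) {
            char c = letters.charAt(i);
            boolean wordFinal = (i == letters.length() - 1);
            if (c == 'd' && wordFinal) {
                phones.add("t"); // context rule: word-final d -> t
            } else {
                phones.add(LETTER_RULES.getOrDefault(c, String.valueOf(c)));
            }
        }
        return String.join(" ", phones);
    }

    public static void main(String[] args) {
        for (String w : new String[] {"kana", "n\u00e4inud", "kesk_pank"}) {
            System.out.println(w + "\t" + convert(w));
        }
    }
}
```

Judging by the sample output, a real converter layers many more context-dependent rules (gemination, abbreviation spelling, variant pronunciations) on top of a letter table like this one.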

Compile

ant 

Test

ant test

Run

$ echo "kesk_pank" | ./run.sh 
kesk_pank	k e s k p a n kk
$

The script reads words from stdin and writes each word with its pronunciation to stdout. It is usually used to process a long list of words at once:

$ ./run.sh < example/sample.vocab 
kana    k a n a
park    p a r kk
näinud  n ae i n u t
näinud(2)   n ae i n t
kontsa  k o n t s a
kesk_pank   k e s k p a n kk
OECD    o o e t s e e t e e
ETV24   e t e e v e e k a k s k ue m m e n t n e l i
ETV24-le    e t e e v e e k a h e k ue m n e n e l j a l e
NATO    n a tt o
ABC-pood    a a p e e t s e e p o o t
René    r e n e
Poincaré    p o i n kk a r e
$
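When another program consumes this output, each line can be split into the word and its space-separated phones, with alternative pronunciations marked by a suffix such as (2). Below is a minimal Java sketch. It assumes tab- or space-separated columns as in the samples above and words that contain no whitespace. The class name `G2pOutput` is hypothetical and the class is not part of et-g2p.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/** Groups run.sh output by word (hypothetical helper, not part of et-g2p). */
public class G2pOutput {
    // Variant marker such as "(2)" at the end of the word column.
    private static final Pattern VARIANT = Pattern.compile("\\(\\d+\\)$");

    /** Maps each word to its pronunciations, in the order they appear. */
    public static Map<String, List<String>> parse(Reader in) {
        Map<String, List<String>> result = new LinkedHashMap<>();
        try (BufferedReader reader = new BufferedReader(in)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String trimmed = line.trim();
                if (trimmed.isEmpty()) {
                    continue;
                }
                // Column 1 is the word; the remainder is the phone sequence.
                String[] cols = trimmed.split("\\s+", 2);
                String word = VARIANT.matcher(cols[0]).replaceFirst("");
                String phones = cols.length > 1 ? cols[1].replaceAll("\\s+", " ") : "";
                result.computeIfAbsent(word, k -> new ArrayList<>()).add(phones);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) {
        // Example: ./run.sh < example/sample.vocab | java G2pOutput
        Map<String, List<String>> prons =
                parse(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        prons.forEach((word, list) -> System.out.println(word + " -> " + list));
    }
}
```

Grouping variants under one key keeps both pronunciations of a word such as näinud together, in the order the script printed them.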

The script can also be executed from any other directory:

~/tools/et-g2p/run.sh < vocab.txt

The tool supports user-defined transliteration dictionaries. These make it possible to define pronunciations for words that are not pronounced according to Estonian rules and are not covered by the built-in exception list.

Each line in the dictionary contains the original word and its transliteration (i.e., its probable orthographic form as if it were an Estonian word). Multiple transliterations can be given, separated by commas. For example:

Jules      žül
Henri      henri, anrii
Poincaré   puangarree
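The dictionary format described above is simple enough to read or validate before passing a file to the tool. Below is a minimal Java sketch. It assumes one entry per line, with the word and its transliterations separated by whitespace, and assumes blank lines may be skipped. The class name `UserDict` is hypothetical, and this is not the parser that et-g2p itself uses.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Reads a user transliteration dictionary (hypothetical helper, not et-g2p's parser). */
public class UserDict {
    /** Maps each word to its comma-separated transliterations. */
    public static Map<String, List<String>> parse(List<String> lines) {
        Map<String, List<String>> dict = new LinkedHashMap<>();
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // assumption: blank lines are ignored
            }
            // The word is the first column; everything after it is the transliteration list.
            String[] cols = trimmed.split("\\s+", 2);
            if (cols.length < 2) {
                throw new IllegalArgumentException("Missing transliteration: " + trimmed);
            }
            List<String> translits = new ArrayList<>();
            for (String part : cols[1].split(",")) {
                String t = part.trim();
                if (!t.isEmpty()) {
                    translits.add(t);
                }
            }
            dict.put(cols[0], translits);
        }
        return dict;
    }

    public static void main(String[] args) throws IOException {
        // Example: java UserDict tmp.dict
        List<String> lines = Files.readAllLines(Paths.get(args[0]), StandardCharsets.UTF_8);
        parse(lines).forEach((word, t) -> System.out.println(word + " -> " + t));
    }
}
```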

Use the -dict option to set the user dictionary:

$ echo "Henri" | ./run.sh -dict tmp.dict
Henri	a n r i i
Henri(2)	h e n r i
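The script reads from stdin and can be run from any directory, so another program can also drive it. Below is a minimal Java sketch using the standard `ProcessBuilder` API. The install path `~/tools/et-g2p/run.sh` (taken from the example above) and the class name `RunG2p` are assumptions. The `-dict` argument is added only when a dictionary path is given.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Drives run.sh from Java (hypothetical wrapper, not part of et-g2p). */
public class RunG2p {
    /** Builds the command line; pass dictPath = null to run without a user dictionary. */
    public static List<String> command(String scriptPath, String dictPath) {
        List<String> cmd = new ArrayList<>();
        cmd.add(scriptPath);
        if (dictPath != null) {
            cmd.add("-dict");
            cmd.add(dictPath);
        }
        return cmd;
    }

    /** Writes words to the script's stdin and collects its stdout lines. */
    public static List<String> run(List<String> cmd, List<String> words)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(cmd)
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        // Feed stdin from a separate thread so a full stdout pipe cannot deadlock us.
        Thread feeder = new Thread(() -> {
            try (Writer w = new OutputStreamWriter(p.getOutputStream(), StandardCharsets.UTF_8)) {
                for (String word : words) {
                    w.write(word);
                    w.write('\n');
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        });
        feeder.start();
        List<String> out = new ArrayList<>();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                out.add(line);
            }
        }
        feeder.join();
        int status = p.waitFor();
        if (status != 0) {
            throw new IOException("run.sh exited with status " + status);
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        // Assumed install location, as in the ~/tools/et-g2p example.
        String script = System.getProperty("user.home") + "/tools/et-g2p/run.sh";
        String dict = args.length > 0 ? args[0] : null; // e.g. tmp.dict
        for (String line : run(command(script, dict), Arrays.asList("Henri", "kesk_pank"))) {
            System.out.println(line);
        }
    }
}
```

Writing stdin on its own thread matters for long word lists. If the script's output buffer fills while the caller is still writing input, a single-threaded write-then-read approach can block forever.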

Bugs

Pronunciation of foreign names relies on a short list of exceptions. As a result, most English and French names are converted according to Estonian rules, which is of course wrong (see Poincaré above).

Use as a library

See src/java_test for samples.
