
language-resources's Issues

Error in creating labels from festival utts

I manually placed the utterance (.utt) files I created in Festival into labels/cmu_us_my_voice/festival/utts.

When I try to make the labels, a SIOD error occurs and the tmp file is not created. The following error appears under each file name:

SIOD ERROR: unbound variable : eof
gawk: fatal: cannot open file `./full_context_labels/labels//tmp' for reading (No such file or directory)
gawk: fatal: cannot open file `./full_context_labels/labels//tmp' for reading (No such file or directory)

Can someone please help me with this issue?

Error while trying to build prompts for the Tamil language

I'm trying to build Tamil voices using Festival. In my lexicon file, words are in the following format:
("அ" nil (((a) 1)))
But when I try to build the prompts using "./bin/do_build $PARALLEL build_prompts", I get errors like the following for every sentence in the txt.done.data file:
ta_1566 PROMPTS
LEXICON: Word ஆண்டாள் (plus features) not found in lexicon
However, all of those words are in my lexicon.scm file. How can I fix this error?
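One way to narrow this down is to check which words from txt.done.data are actually missing from lexicon.scm at the string level. The sketch below is a hypothetical diagnostic and is not part of Festival or language-resources. It makes three assumptions: each lexicon entry sits on one line in the ("word" nil (...)) form shown above, prompt lines look like ( ta_1566 "text" ), and prompts can be split on whitespace, which is simpler than Festival's own tokenizer. Words are compared code point by code point, with no normalization.

```python
import re

# Headword of a lexicon.scm entry such as ("அ" nil (((a) 1))); one entry per line assumed.
ENTRY_HEADWORD = re.compile(r'^\s*\(\s*"([^"]*)"')
# A txt.done.data line such as ( ta_1566 "sentence text" ).
PROMPT_LINE = re.compile(r'^\s*\(\s*(\S+)\s+"(.*)"\s*\)\s*$')
# Punctuation stripped from token edges (simplified; Festival's tokenizer differs).
EDGE_PUNCT = ".,;:!?\"'"


def load_headwords(lexicon_lines):
    """Hypothetical helper: collect the quoted headword of every entry line."""
    words = set()
    for line in lexicon_lines:
        m = ENTRY_HEADWORD.match(line)
        if m:
            words.add(m.group(1))
    return words


def missing_words(prompt_lines, headwords):
    """Hypothetical helper: map utterance ID -> words not found in headwords."""
    missing = {}
    for line in prompt_lines:
        m = PROMPT_LINE.match(line)
        if not m:
            continue  # skip lines that are not prompt entries
        utt_id, text = m.groups()
        for token in text.split():
            word = token.strip(EDGE_PUNCT)
            if word and word not in headwords:
                missing.setdefault(utt_id, []).append(word)
    return missing
```

To use it, pass an open lexicon.scm file to load_headwords and an open txt.done.data file to missing_words, opening both with encoding="utf-8". If a reported word looks identical to a lexicon entry, printing [f"U+{ord(c):04X}" for c in word] for both strings shows which code points differ, for example a different normalization form or an invisible joiner.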

Why use a 'markup' pattern in .tsv files?

Many .tsv files contain text like the following (this example is from Sundanese):
CARDINAL_MARKUP cardinal|integer:-1| mineus hiji

The .grm files also contain corresponding grammar definitions to parse these patterns.
Where are patterns such as "cardinal|integer:-1|" useful in real-world text normalisation?
Why don't the .grm files simply export rules that parse real-world input such as "-1" instead of markup?
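The example line can be read as three fields: a rule name, a markup token, and its expected verbalization. The sketch below splits it that way. It assumes the fields are whitespace-separated, the markup itself contains no spaces, and the markup is a class name ("cardinal") followed by |-delimited name:value fields. These are inferences from the single example above, not a documented format, and the helper names are hypothetical.

```python
def parse_markup(markup):
    """Hypothetical helper: 'cardinal|integer:-1|' -> ('cardinal', {'integer': '-1'})."""
    semiotic_class, *fields = markup.split("|")
    parsed = {}
    for field in fields:
        if not field:  # the trailing '|' leaves an empty piece
            continue
        name, _, value = field.partition(":")
        parsed[name] = value
    return semiotic_class, parsed


def parse_test_line(line):
    """Split 'RULE markup verbalization', assuming whitespace-separated fields
    and no whitespace inside the markup."""
    rule, markup, verbalization = line.split(None, 2)
    return rule, parse_markup(markup), verbalization
```

Read this way, the markup separates what the token is (a cardinal whose integer field is -1) from how it is spoken (mineus hiji).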

Detection of font encoding

Hi,

Thanks for this library. Is there a function to check whether the input is Zawgyi- or Unicode-encoded?
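The issue does not say whether such a function exists. The sketch below only illustrates the kind of signal a detector can use. It relies on one ordering difference: in Unicode, the vowel sign E (U+1031) is stored after the consonant it attaches to, while Zawgyi text typically stores it before the consonant, in visual order. The function name is hypothetical. A single rule like this is a crude heuristic that will miss many Zawgyi strings, so a statistical detector would be more reliable in practice.

```python
VOWEL_SIGN_E = "\u1031"


def is_myanmar(ch):
    """True for characters in the Myanmar block, U+1000..U+109F."""
    return "\u1000" <= ch <= "\u109f"


def looks_like_zawgyi(text):
    """Crude heuristic (hypothetical helper): flag a U+1031 that does not
    follow a Myanmar character, which is invalid ordering in Unicode text."""
    for i, ch in enumerate(text):
        if ch == VOWEL_SIGN_E and (i == 0 or not is_myanmar(text[i - 1])):
            return True
    return False
```

A False result means only that this one rule did not fire, not that the text is definitely Unicode.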

Need English text normalization resources

Really nice work, and a strong baseline for text normalization!

I was looking for a tool for English text normalization and found Sparrowhawk, which could solve my problem. However, the documentation in the Sparrowhawk repo only includes an en_toy example.

Could you please provide more complete English .grm resources for Sparrowhawk text normalization?

Thanks a lot !

unicode.fst model

Could you let me know how I can build the unicode.fst model? Many thanks.

How to build a phoneset for a language using phonology.json

I'm trying to build a phoneset for Chinese but am stuck on how to use apply_phonology.py. I've tried it and am running into issues with this regex:

INST_LANG_VOX = re.compile(r'.*/([^_]+_[^_]+)_([^_]+)_phoneset.scm$')

Here is what I get when I try to build the necessary files:

$ python apply_phonology.py cantonese_phonology.json cantonese/data/
Traceback (most recent call last):
  File "apply_phonology.py", line 774, in <module>
    main(sys.argv)
  File "apply_phonology.py", line 739, in main
    assert len(phoneset_paths) == 1
AssertionError
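The failing assertion means phoneset_paths did not contain exactly one entry. If the script keeps only paths that match INST_LANG_VOX, which is an assumption since the code that collects phoneset_paths is not shown, then the sketch below shows which paths would pass. The file name must look like inst_lang_vox_phoneset.scm, and the path must contain a '/'. filter_phoneset_paths is a hypothetical helper, not a function from apply_phonology.py.

```python
import re

# Regex quoted from apply_phonology.py in the issue above.
INST_LANG_VOX = re.compile(r'.*/([^_]+_[^_]+)_([^_]+)_phoneset.scm$')


def filter_phoneset_paths(paths):
    """Hypothetical helper: keep paths matching INST_LANG_VOX, with their groups.

    Note that [^_] also matches '/', so underscores in parent directory
    names can make an unexpected file match.
    """
    found = []
    for path in paths:
        m = INST_LANG_VOX.match(path)
        if m:
            inst_lang, vox = m.groups()
            found.append((path, inst_lang, vox))
    return found
```

For example, filter_phoneset_paths(glob.glob("cantonese/data/**", recursive=True)) lists the candidates under the data directory. Under the assumption above, the assertion would fail whenever that list has zero entries or more than one.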

Error in Training a Bangla Voice

Hi,
I was trying to build a Bangla Festival voice according to this guideline. But I got an error in the training phase. The error messages are quite long, but the first one was this:
xargs: <path to festival>/build/festvox/src/ehmm/bin/FeatureExtraction: No such file or directory

Can anyone explain the cause of this error or suggest how to solve it?

Thanks in advance.

Duration of speech recordings in datasets?

Are the total durations of the speech recordings in these datasets available anywhere? I'd love to know that without downloading them and figuring it out for myself, if possible.
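If the recordings are downloaded anyway, the total can be computed from each file's header: duration equals the number of frames divided by the sample rate. The sketch below assumes the recordings are PCM WAV files, which the issue does not state, and uses only Python's standard wave module. The function names are hypothetical.

```python
import glob
import os
import wave


def wav_duration_seconds(path):
    """Duration of one PCM WAV file: frame count divided by sample rate."""
    with wave.open(path, "rb") as w:
        return w.getnframes() / w.getframerate()


def total_duration_hours(directory):
    """Hypothetical helper: sum durations of all .wav files under directory."""
    pattern = os.path.join(directory, "**", "*.wav")
    total_seconds = sum(
        wav_duration_seconds(p) for p in glob.glob(pattern, recursive=True)
    )
    return total_seconds / 3600.0
```

Only the headers are read, so this is fast even for large datasets.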

Speaker-level annotation in the speech datasets

In the crowdsourced high-quality multi-speaker speech data sets, I see there are line_index_{gender}.tsv files.

The .tsv files contain data like the following:

guf_03209_00443170675	જગતમાં કોઈ જીવ ન્નસ જંગમ છે અને કોઈ સ્થાવર.
guf_01414_00718082800	ગામમાં ભગવાન સ્વામિનારાયણનું મંદિર છે

Here, what are 03209 and 00443170675 in the first sample? Are they the speaker and utterance IDs?
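If the IDs do follow a prefix_speaker_utterance layout, the lines can be split and counted per speaker as shown below. That layout is exactly what the question asks and is not confirmed here, so treat the field names as assumptions. The guf prefix is kept as an opaque string, and the helper names are hypothetical.

```python
from collections import Counter


def parse_line_index(line):
    """Hypothetical helper: split 'guf_03209_00443170675<TAB>text'.

    ASSUMPTION: the second and third ID fields are speaker and utterance
    IDs; this is the open question in the issue, not a confirmed fact.
    """
    file_id, text = line.rstrip("\n").split("\t", 1)
    prefix, speaker_id, utterance_id = file_id.split("_")
    return {"prefix": prefix, "speaker_id": speaker_id,
            "utterance_id": utterance_id, "text": text}


def utterances_per_speaker(lines):
    """Count lines per presumed speaker ID, skipping blank lines."""
    return Counter(parse_line_index(line)["speaker_id"]
                   for line in lines if line.strip())
```

If the counts per presumed speaker come out plausible (many lines per ID rather than one each), that is weak evidence for the assumed layout, not proof.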
