
few-shot-transformer-tts's People

Contributors

dy-octa, mutiann


few-shot-transformer-tts's Issues

How to synthesize a wav file.

It's great news that you have released your pretrained models.
I have downloaded the checkpoints and JSON files. How do I synthesize a wav file, given a text and a language? Especially in the case of code-switched text, like "Hello, 我是AI助手,很高兴认识你, nice to meet you." (the Chinese part reads roughly "I am an AI assistant, very pleased to meet you").
Thanks in advance.

Setup instructions are incomplete: No module named 'hyperparams'

I'm trying to run the initial preprocessing step, but it errors out with "No module named 'hyperparams'".

python corpora/process_corpus.py 
Traceback (most recent call last):
  File "/opt/data1/muksihs/git/Cherokee-TTS-fst/corpora/process_corpus.py", line 3, in <module>
    from hyperparams import hparams as hp
ModuleNotFoundError: No module named 'hyperparams'

I tried pip install -e . on the root directory of the project, but that complains: ERROR: File "setup.py" or "setup.cfg" not found. Directory cannot be installed in editable mode.

Assistance would be appreciated.
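
A likely workaround, assuming hyperparams.py sits at the repository root (which the traceback suggests, but which is not documented here): run the script from the repository root with the root on the Python module path, for example:

PYTHONPATH=. python corpora/process_corpus.py

An editable pip install should not be necessary; as the error above notes, the repository does not ship a setup.py.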

How to fine-tune on my data.

I have downloaded the VCTK (English) and ST-CMDS (Chinese) datasets, and I want to synthesize Chinese-English code-switched speech. What should I do to fine-tune from the 1160000 checkpoint? Must I download all the datasets listed in the readme and then train with --adapt_languages=en-us:zh-cn? Many thanks in advance.
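
A sketch of what such a fine-tuning run might look like, following the pattern of the training commands quoted in the issues below; the paths, the checkpoint name, and whether --training_languages can be limited to just these two languages are all assumptions rather than a confirmed recipe:

python -m torch.distributed.launch --nproc_per_node=1 train.py --model-dir=MODEL_DIR --log-dir=LOG_DIR --data-dir=DATA_DIR --training_languages=en-us:zh-cn --adapt_languages=en-us:zh-cn --ddp=True --hparams="warmup_steps=800000" --restore_from=MODEL_DIR/model.ckpt-1160000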

Export to ONNX

Thanks for your great work. I have trained a zh-en code-switching model by fine-tuning from checkpoint 1160000, but the inference speed is not fast enough. I want to export the model to ONNX and speed it up with ONNX Runtime, so the problem is how to export this model to ONNX; any advice, please? BTW, it is applied in Hugging Face Transformers.
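
A generic starting point with torch.onnx.export is sketched below; the placeholder module merely stands in for the restored model, whose real class and inputs are not shown in this thread, and an autoregressive Transformer TTS model usually cannot be exported in a single call (the encoder and the per-step decoder typically have to be exported separately, or the decoding loop scripted).

    # Generic ONNX export sketch; PlaceholderEncoder is NOT this repo's model,
    # it only stands in for a restored torch.nn.Module that takes text ids.
    import torch
    import torch.nn as nn

    class PlaceholderEncoder(nn.Module):
        def __init__(self, vocab=100, dim=80):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
        def forward(self, text):
            return self.embed(text)  # (batch, text_len, dim)

    model = PlaceholderEncoder().eval()
    dummy_text = torch.randint(0, 100, (1, 50), dtype=torch.long)  # token ids

    torch.onnx.export(
        model, (dummy_text,), "tts_encoder.onnx",
        input_names=["text"], output_names=["mel"],
        dynamic_axes={"text": {1: "text_len"}, "mel": {1: "mel_len"}},
        opset_version=13,
    )

The exported file can then be loaded with onnxruntime.InferenceSession for benchmarking.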

Speaker information

Thanks for your excellent work; it helps me a lot with code-switched TTS. I'm trying to synthesize ZH-EN code-switched speech, and the female databaker speaker's voice is very good. Is there a Chinese male speaker among all the speakers? Could you please provide speaker information such as dataset and language?

Quick question: confirmation of the procedure to set up and run the project

I have some samples of a Japanese voice that I want to adapt this TTS to, probably around 5-10 minutes' worth.

I have downloaded the T3 pretrained model.
The next step should be to download the repo and unpack it.
Then run: pip install -r requirements.txt
Then run: python -m torch.distributed.launch --nproc_per_node=NGPU train.py --model-dir=MODEL_DIR --log-dir=LOG_DIR --data-dir=DATA_DIR --training_languages=en-us:de-de:fr-fr:ru-ru:en-uk:es-es:uk-ua:pl-pl:it-it:ja-jp:zh-cn:nl-nl:fi-fi:ko-kr:eu-es:pt-br:hu-hu:jv-id:gl-es:gu-in:kn-in:da-dk:su-id:ta-in:ca-es:ml-in:te-in:my-mm:yo-ng:km-kh:mr-in:ne-np:bn-bd:bn-in:si-lk --adapt_languages=ja-jp --downsample_languages=ja-jp:100 --ddp=True --hparams="warmup_steps=800000" --restore_from=MODEL_DIR/model.ckpt-700000

How should the data-dir folder be laid out?

Where should I put my voice samples? Do they go in data-dir?

I assume all my voice samples need to be re-encoded to mono WAV at 22.05 kHz, right?
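
If re-encoding is needed, a small sketch using librosa and soundfile; the mono / 22.05 kHz target is taken from the question itself and the folder names are placeholders, so the actual rate should be checked against the repo's hyperparameters:

    # Convert a folder of clips to mono 22,050 Hz 16-bit WAV.
    # Paths and the sample rate are assumptions; verify the rate in hparams.
    import pathlib
    import librosa
    import soundfile as sf

    SRC, DST, SR = pathlib.Path("raw_clips"), pathlib.Path("wav_22k"), 22050
    DST.mkdir(exist_ok=True)

    for path in sorted(SRC.glob("*.*")):
        audio, _ = librosa.load(path, sr=SR, mono=True)  # resample + downmix
        sf.write(DST / (path.stem + ".wav"), audio, SR, subtype="PCM_16")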

Optimization of model for inference

I was able to train a model for my endangered language, but I was wondering whether there are any recommended methods for optimizing the model for faster inference, possibly quantization, though I'm not sure how I would do that with this model.

Thanks
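
One low-effort option to try is PyTorch dynamic quantization of the linear layers, sketched generically below; the tiny model is a placeholder rather than this repo's network, and dynamic quantization mainly helps CPU inference rather than GPU decoding:

    # Dynamic int8 quantization of Linear layers for CPU inference (sketch).
    # The Sequential model is a placeholder for the restored TTS model.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80)).eval()
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    x = torch.randn(1, 80)
    print(quantized(x).shape)  # same call signature as the original model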

Looking forward to pre-trained models

Congratulations on your excellent work! It will be really helpful for multilingual TTS systems.
Looking forward to your pre-trained models, which could save so much time when building a user model. ^^

Few-shot adaptation for the low resource language Cherokee

I'm working on getting speech synthesis working for the Cherokee language.

However, looking at the corpora that were used to train the published weights, the amount of data to download exceeds my Internet connection's and my system's capacity.

Would few-shot adaptation from the published weights work if provided with only, say, "en-US"?

As an aside, how much GPU RAM is required? I only have an RTX 3070 available here.

python -m torch.distributed.launch --nproc_per_node=1 train.py --model-dir=MODEL_DIR --log-dir=LOG_DIR --data-dir=DATA_DIR --training_languages=en-us --adapt_languages=chr-w-mco --ddp=True --hparams="warmup_steps=800000" --restore_from=T3_MODEL_DIR/model.ckpt-700000
