
few-shot-transformer-tts's People

Contributors

dy-octa, mutiann


few-shot-transformer-tts's Issues

How to synthesize a wav file.

It's great news that you have released your pretrained models.
I have downloaded the checkpoints and JSON files. How do I synthesize a wav file, given a text and a language? Especially in the case of code-switched text, like "Hello, 我是AI助手,很高兴认识你, nice to meet you." (the Chinese part reads roughly "I am an AI assistant, very pleased to meet you").
Thanks in advance.

Setup instructions are incomplete: No module named 'hyperparams'

I'm trying to run the initial preprocessing step, but it errors out with "No module named 'hyperparams'".

python corpora/process_corpus.py 
Traceback (most recent call last):
  File "/opt/data1/muksihs/git/Cherokee-TTS-fst/corpora/process_corpus.py", line 3, in <module>
    from hyperparams import hparams as hp
ModuleNotFoundError: No module named 'hyperparams'

I tried pip install -e . on the root directory of the project, but that complains: ERROR: File "setup.py" or "setup.cfg" not found. Directory cannot be installed in editable mode.

Assistance would be appreciated.
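
A likely workaround, assuming hyperparams.py sits at the repository root (which the traceback suggests, but which is not documented here): run the script from the repository root with the root on the Python module path, for example:

PYTHONPATH=. python corpora/process_corpus.py

An editable pip install should not be necessary; as the error above notes, the repository does not ship a setup.py.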

How to fine-tune on my data.

I have downloaded the VCTK (English) and ST-CMDS (Chinese) datasets, and I want to synthesize Chinese-English code-switched speech. What should I do to fine-tune from the 1160000 checkpoint? Must I download all the datasets listed in the readme and then train with --adapt_languages=en-us:zh-cn? Many thanks in advance.
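
A sketch of what such a fine-tuning run might look like, following the pattern of the training commands quoted in the issues below; the paths, the checkpoint name, and whether --training_languages can be limited to just these two languages are all assumptions rather than a confirmed recipe:

python -m torch.distributed.launch --nproc_per_node=1 train.py --model-dir=MODEL_DIR --log-dir=LOG_DIR --data-dir=DATA_DIR --training_languages=en-us:zh-cn --adapt_languages=en-us:zh-cn --ddp=True --hparams="warmup_steps=800000" --restore_from=MODEL_DIR/model.ckpt-1160000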

Export to ONNX

Thanks for your great work. I have trained a zh-en code-switching model by fine-tuning from checkpoint 1160000, but the inference speed is not fast enough. I want to export the model to ONNX and speed it up with ONNX Runtime, so the problem is how to export this model to ONNX; any advice, please? BTW, it is applied in Hugging Face Transformers.
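
A generic starting point with torch.onnx.export is sketched below; the placeholder module merely stands in for the restored model, whose real class and inputs are not shown in this thread, and an autoregressive Transformer TTS model usually cannot be exported in a single call (the encoder and the per-step decoder typically have to be exported separately, or the decoding loop scripted).

    # Generic ONNX export sketch; PlaceholderEncoder is NOT this repo's model,
    # it only stands in for a restored torch.nn.Module that takes text ids.
    import torch
    import torch.nn as nn

    class PlaceholderEncoder(nn.Module):
        def __init__(self, vocab=100, dim=80):
            super().__init__()
            self.embed = nn.Embedding(vocab, dim)
        def forward(self, text):
            return self.embed(text)  # (batch, text_len, dim)

    model = PlaceholderEncoder().eval()
    dummy_text = torch.randint(0, 100, (1, 50), dtype=torch.long)  # token ids

    torch.onnx.export(
        model, (dummy_text,), "tts_encoder.onnx",
        input_names=["text"], output_names=["mel"],
        dynamic_axes={"text": {1: "text_len"}, "mel": {1: "mel_len"}},
        opset_version=13,
    )

The exported file can then be loaded with onnxruntime.InferenceSession for benchmarking.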

Speaker information

Thanks for your excellent work; it helps me a lot with code-switched TTS. I'm trying to synthesize ZH-EN code-switched speech, and the female databaker speaker's voice is very good. Is there a Chinese male speaker among all the speakers? Could you please provide speaker information such as dataset and language?

Quick question: confirmation of the procedure to set up and run the project

I have some samples of a Japanese voice that I want to adapt this TTS to, probably around 5-10 minutes' worth.

I have downloaded the T3 pretrained model.
The next step should be to download the repo and unpack it.
Then run: pip install -r requirements.txt
Then run: python -m torch.distributed.launch --nproc_per_node=NGPU train.py --model-dir=MODEL_DIR --log-dir=LOG_DIR --data-dir=DATA_DIR --training_languages=en-us:de-de:fr-fr:ru-ru:en-uk:es-es:uk-ua:pl-pl:it-it:ja-jp:zh-cn:nl-nl:fi-fi:ko-kr:eu-es:pt-br:hu-hu:jv-id:gl-es:gu-in:kn-in:da-dk:su-id:ta-in:ca-es:ml-in:te-in:my-mm:yo-ng:km-kh:mr-in:ne-np:bn-bd:bn-in:si-lk --adapt_languages=ja-jp --downsample_languages=ja-jp:100 --ddp=True --hparams="warmup_steps=800000" --restore_from=MODEL_DIR/model.ckpt-700000

How should the data-dir folder be laid out?

Where should I put my voice samples? Do they go in data-dir?

I assume all my voice samples need to be re-encoded to mono WAV at 22.05 kHz, right?
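
If re-encoding is needed, a small sketch using librosa and soundfile; the mono / 22.05 kHz target is taken from the question itself and the folder names are placeholders, so the actual rate should be checked against the repo's hyperparameters:

    # Convert a folder of clips to mono 22,050 Hz 16-bit WAV.
    # Paths and the sample rate are assumptions; verify the rate in hparams.
    import pathlib
    import librosa
    import soundfile as sf

    SRC, DST, SR = pathlib.Path("raw_clips"), pathlib.Path("wav_22k"), 22050
    DST.mkdir(exist_ok=True)

    for path in sorted(SRC.glob("*.*")):
        audio, _ = librosa.load(path, sr=SR, mono=True)  # resample + downmix
        sf.write(DST / (path.stem + ".wav"), audio, SR, subtype="PCM_16")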

Optimization of model for inference

I was able to train a model for my endangered language, but I was wondering whether there are any recommended methods for optimizing the model for faster inference, possibly quantization, though I'm not sure how I would do that with this model.

Thanks
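
One low-effort option to try is PyTorch dynamic quantization of the linear layers, sketched generically below; the tiny model is a placeholder rather than this repo's network, and dynamic quantization mainly helps CPU inference rather than GPU decoding:

    # Dynamic int8 quantization of Linear layers for CPU inference (sketch).
    # The Sequential model is a placeholder for the restored TTS model.
    import torch
    import torch.nn as nn

    model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 80)).eval()
    quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

    x = torch.randn(1, 80)
    print(quantized(x).shape)  # same call signature as the original model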

Looking forward to pre-trained models

Congratulations on your excellent work! It will be really helpful for multilingual TTS systems.
Looking forward to your pre-trained models, which could save so much time when building a user model. ^^

Few-shot adaptation for the low resource language Cherokee

I'm working on getting speech synthesis working for the Cherokee language.

However, looking at the corpora that were used to train the published weights, the amount of data to download exceeds my Internet connection's and my system's capacity.

Would few-shot adaptation from the published weights work if provided with only, say, "en-US"?

As an aside, how much GPU RAM is required? I only have an RTX 3070 available here.

python -m torch.distributed.launch --nproc_per_node=1 train.py --model-dir=MODEL_DIR --log-dir=LOG_DIR --data-dir=DATA_DIR --training_languages=en-us --adapt_languages=chr-w-mco --ddp=True --hparams="warmup_steps=800000" --restore_from=T3_MODEL_DIR/model.ckpt-700000
