mutiann / few-shot-transformer-tts
Byte-based multilingual transformer TTS for low-resource/few-shot language adaptation.
License: MIT License
It's great news that you have released your pretrained models.
I have downloaded the checkpoints and JSON files. How do I synthesize a WAV file, given a text and a language? Especially in the case of code-switched text, like "Hello, 我是AI助手,很高兴认识你, nice to meet you."
Thanks in advance.
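One reason code-switched input is unproblematic for this family of models: per the repo description, the frontend is byte-based, so mixed-script text becomes a single UTF-8 byte sequence with no per-language phonemizer or script detection. A minimal illustration (not the repo's actual inference API):

```python
# Illustrative sketch: a byte-based frontend encodes code-switched text as
# one UTF-8 byte ID (0..255) per byte, regardless of script.
text = "Hello, 我是AI助手, nice to meet you."
byte_ids = list(text.encode("utf-8"))

# Multi-byte CJK characters expand to three IDs each, so the sequence is
# longer than the character count, but the vocabulary stays fixed at 256.
```

The actual checkpoint loading and vocoder call depend on the repo's inference scripts, which this sketch does not cover.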
I'm trying to run the initial preprocessing step, but it errors out with No module named 'hyperparams'.
python corpora/process_corpus.py
Traceback (most recent call last):
File "/opt/data1/muksihs/git/Cherokee-TTS-fst/corpora/process_corpus.py", line 3, in <module>
from hyperparams import hparams as hp
ModuleNotFoundError: No module named 'hyperparams'
I tried doing pip install -e . on the root directory of the project, but that complains: ERROR: File "setup.py" or "setup.cfg" not found. Directory cannot be installed in editable mode:
Assistance would be appreciated.
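The error means Python cannot see hyperparams.py (which sits at the repo root) when the script runs from the corpora/ subdirectory, and since the repo ships no setup.py, an editable install is not an option. A common workaround, sketched here under the assumption that you launch from the repository root, is to put the root on PYTHONPATH:

```python
import os
import subprocess
import sys

# Assumed layout: current directory is the repository root, which contains
# hyperparams.py; with the root on PYTHONPATH, `from hyperparams import
# hparams` inside corpora/process_corpus.py resolves.
repo_root = os.getcwd()
env = dict(os.environ, PYTHONPATH=repo_root)

script = os.path.join("corpora", "process_corpus.py")
if os.path.exists(script):  # only launch when actually inside the repo
    subprocess.run([sys.executable, script], env=env, check=True)
```

Equivalently, from a shell in the repo root: PYTHONPATH=. python corpora/process_corpus.py.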
I have downloaded the VCTK (English) and ST-CMDS (Chinese) datasets, and want to synthesize Chinese-English code-switched speech. What should I do to finetune from the 1160000 checkpoint? Must I download all the datasets listed in the readme, and then train with --adapt_languages en-us:zh-cn? Many thanks in advance.
Traceback (most recent call last):
File "google.py", line 135, in
merge()
File "google.py", line 51, in merge
lines = open(os.path.join(f, "line_index.tsv"), "r", encoding='utf-8').read().splitlines()
FileNotFoundError: [Errno 2] No such file or directory: '../data/datasets/google/scottish_english_male/line_index.tsv'
Thanks for your great work. I have trained a zh-en code-switching model by finetuning from checkpoint 1160000, but the inference speed is not fast enough. I want to export the model to ONNX and speed it up with ONNX Runtime, so the question is how to export this model to ONNX; any advice please? BTW, it's applied in Hugging Face Transformers.
Thanks for your excellent work; it helps me a lot with code-switched TTS. I'm trying to synthesize ZH-EN code-switched speech, and the female voice of the databaker speaker is very good. Is there a Chinese male speaker among all the speakers? Could you please provide speaker information such as dataset and language?
I have some samples of Japanese speech which I want to adapt this TTS to, probably around 5-10 minutes.
I have downloaded the t3 pretrained model.
The next step should be to download the repo and unpack it,
then run: pip install -r requirements.txt
then run: python -m torch.distributed.launch --nproc_per_node=NGPU train.py --model-dir=MODEL_DIR --log-dir=LOG_DIR --data-dir=DATA_DIR --training_languages=en-us:de-de:fr-fr:ru-ru:en-uk:es-es:uk-ua:pl-pl:it-it:ja-jp:zh-cn:nl-nl:fi-fi:ko-kr:eu-es:pt-br:hu-hu:jv-id:gl-es:gu-in:kn-in:da-dk:su-id:ta-in:ca-es:ml-in:te-in:my-mm:yo-ng:km-kh:mr-in:ne-np:bn-bd:bn-in:si-lk --adapt_languages=ja-jp --downsample_languages=ja-jp:100 --ddp=True --hparams="warmup_steps=800000" --restore_from=MODEL_DIR/model.ckpt-700000
How should the data-dir folder layout be?
Where should I put my obtained voice samples? Do they go in data-dir?
I assume all my voice samples need to be re-encoded to mono WAV at 22.05 kHz, right?
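For the re-encoding step, a hedged sketch of a helper that builds an ffmpeg command for the format named in the question (mono, 22.05 kHz, 16-bit PCM WAV). The target values come from the question above, not from the repo's preprocessing code, so verify them against what the corpus scripts actually expect:

```python
# Hypothetical helper: build an ffmpeg invocation that normalizes one clip
# to mono 16-bit PCM WAV at the given sample rate.
def ffmpeg_resample_cmd(src: str, dst: str, sample_rate: int = 22050) -> list:
    return [
        "ffmpeg", "-y",
        "-i", src,
        "-ac", "1",               # downmix to a single channel (mono)
        "-ar", str(sample_rate),  # resample to 22.05 kHz
        "-acodec", "pcm_s16le",   # 16-bit little-endian PCM
        dst,
    ]

cmd = ffmpeg_resample_cmd("sample_001.flac", "sample_001.wav")
```

Run the resulting command per clip with subprocess.run(cmd, check=True) once ffmpeg is installed.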
I was able to train a model for my endangered language, but I was wondering if there are any recommended methods for optimizing the model for faster inference, possibly quantization, but I'm not sure how I would do that with this model.
Thanks
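Since the repo is PyTorch-based (per the torch.distributed.launch commands elsewhere on this page), the lowest-effort quantization option for CPU inference is PyTorch's dynamic quantization, which converts Linear weights to int8 at load time and needs no calibration data. A sketch on a toy stand-in model, not the repo's actual network:

```python
import torch
import torch.nn as nn

# Toy stand-in for the acoustic model; shapes are illustrative.
model = nn.Sequential(nn.Linear(64, 64), nn.ReLU(), nn.Linear(64, 80)).eval()

# Replace every nn.Linear with a dynamically quantized int8 version.
qmodel = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

mel = qmodel(torch.randn(1, 64))  # forward pass now uses int8 matmuls on CPU
```

Dynamic quantization only accelerates CPU inference; on GPU, reduced-precision inference (e.g. running the model in float16) is the more usual route.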
Thanks for your great work! As the title says, suppose I want to adapt to the Greek language: is it OK for me to only download the Greek dataset, and then run the few-shot adaptation part? Looking forward to your reply.
Congratulations on your excellent work! It will be really helpful for multilingual TTS systems.
Looking forward to your pre-trained models, which could save so much time in building a user model. ^^
I'm working on trying to get speech synthesis working for the Cherokee language.
However, looking at the corpora that were used to train the published weights, the amount of data to download exceeds my Internet connection's and my system's capacity.
Would the few-shot adaptation from the published weights work if only provided with, say, "en-US"?
As an aside, how much GPU RAM is required? I only have an RTX 3070 available here.
python -m torch.distributed.launch --nproc_per_node=1 train.py --model-dir=MODEL_DIR --log-dir=LOG_DIR --data-dir=DATA_DIR --training_languages=en-us --adapt_languages=chr-w-mco --ddp=True --hparams="warmup_steps=800000" --restore_from=T3_MODEL_DIR/model.ckpt-700000
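Before attempting the fine-tuning run above, it can help to check how much VRAM the device actually exposes (an RTX 3070 has 8 GB, which may be tight for transformer TTS training; reducing the batch size via hparams is the usual lever, though the exact hparam name depends on the repo). A small PyTorch check, guarded so it also runs on machines without a CUDA device:

```python
import torch

# Report total VRAM of the first CUDA device, or None on CPU-only machines.
if torch.cuda.is_available():
    total_bytes = torch.cuda.get_device_properties(0).total_memory
    vram_gb = total_bytes / 2**30
else:
    vram_gb = None
```

If training runs out of memory, gradient accumulation or a smaller batch per GPU are the standard workarounds.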