vlomme / multi-tacotron-voice-cloning Goto Github PK


381.0 33.0 97.0 1009 KB

Phonemic multilingual (Russian-English) voice cloning based on Real-Time-Voice-Cloning

Home Page: https://github.com/CorentinJ/Real-Time-Voice-Cloning

License: Other

Jupyter Notebook 0.98% Python 99.02%

deep-learning pytorch tensorflow tts voice-cloning g2p tacotron wavernn russian

multi-tacotron-voice-cloning's Introduction

Multi-Tacotron Voice Cloning

This repository is a phonemic multilingual (Russian-English) implementation based on Real-Time-Voice-Cloning. It is a four-stage deep learning framework that creates a numerical representation of a voice from a few seconds of audio and uses it to condition a text-to-speech model. If you only need the English version, please use the original implementation.

Example

Quick start

Use the Colab online demo.

Requirements

You will need the following whether you plan to use the toolbox only or to retrain the models.

Python (>=3.6).

PyTorch (>=1.0.1).

Run pip install -r requirements.txt to install the necessary packages.

A GPU is mandatory, but you don't necessarily need a high tier GPU if you only want to use the toolbox.
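The requirements above can be checked before launching the toolbox. The following is a minimal sketch, not part of the repository: the helper names `parse_version` and `check_environment` are hypothetical, and the only thresholds used are the ones listed above (Python 3.6, PyTorch 1.0.1, a GPU visible to PyTorch).

```python
import importlib.util
import re
import sys


def parse_version(text):
    """Turn a version string such as '1.13.1+cu117' into a tuple like (1, 13, 1)."""
    parts = []
    for piece in text.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def check_environment(min_python=(3, 6), min_torch=(1, 0, 1)):
    """Return a list of human-readable problems; an empty list means all checks passed."""
    problems = []
    if sys.version_info[:2] < min_python:
        problems.append("Python %d.%d or later is required" % min_python)
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch  # imported lazily so the check still runs without PyTorch

        if parse_version(torch.__version__) < min_torch:
            problems.append("PyTorch %s is older than the required version" % torch.__version__)
        if not torch.cuda.is_available():
            problems.append("No CUDA-capable GPU is visible to PyTorch")
    return problems


if __name__ == "__main__":
    for line in check_environment() or ["Environment looks OK"]:
        print(line)
```

The check deliberately avoids f-strings so that it can still run, and report the problem, on an interpreter older than Python 3.6.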

Pretrained models

Download the latest here.

Datasets

Name	Language	Link	Comments	My link	My comments
Phoneme dictionary	En, Ru	En, Ru	Phoneme dictionary	link	Combined the Russian and English phoneme dictionaries
LibriSpeech	En	link	300 speakers, 360h of clean speech
VoxCeleb	En	link	7000 speakers, many hours of poor-quality speech
M-AILABS	Ru	link	3 speakers, 46h of clean speech
open_tts, open_stt	Ru	open_tts, open_stt	Many speakers, many hours of poor-quality speech	link	Cleaned 4 hours of speech from one speaker. Corrected the annotations and split the audio into segments of up to 7 seconds
Voxforge+audiobook	Ru	link	Many speakers, 25h of varying quality	link	Selected the good files and split them into segments. Added audiobooks from the internet. The result is 200 speakers with a couple of minutes each
RUSLAN	Ru	link	One speaker, 40h of good speech	link	Re-encoded to 16 kHz
Mozilla	Ru	link	50 speakers, 30h of good speech	link	Re-encoded to 16 kHz and sorted the different users into separate folders
Russian Single	Ru	link	One speaker, 9h of good speech	link	Re-encoded to 16 kHz
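Several of the comments in the table mention audio at 16 kHz cut into segments of up to 7 seconds. How the cuts were actually chosen is not described here (they may well follow pauses in the speech), so the following is only a naive fixed-length sketch; the function name `split_into_segments` is hypothetical.

```python
def split_into_segments(num_samples, sample_rate=16000, max_seconds=7.0):
    """Return (start, end) sample-index pairs, each covering at most max_seconds of audio."""
    max_len = int(sample_rate * max_seconds)
    if num_samples < 0 or max_len < 1:
        raise ValueError("need a non-negative length and a segment of at least one sample")
    return [
        (start, min(start + max_len, num_samples))
        for start in range(0, num_samples, max_len)
    ]


if __name__ == "__main__":
    # A 20-second clip at 16 kHz is cut into 7 s + 7 s + 6 s.
    for start, end in split_into_segments(20 * 16000):
        print(start, end, (end - start) / 16000)
```

A real pipeline would more likely cut at silences so that words are not split in half; the fixed-length version only illustrates the 7-second bound.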

Toolbox

You can then try the toolbox:

python demo_toolbox.py -d <datasets_root>
or
python demo_toolbox.py

Wiki

Pretrained models

Training (and for other languages), Russian version

Training (and for other languages)

Contribution

For any questions, please email me.

Papers implemented

URL	Designation	Title	Implementation source
1806.04558	SV2TTS	Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis	CorentinJ
1802.08435	WaveRNN (vocoder)	Efficient Neural Audio Synthesis	fatchord/WaveRNN
1712.05884	Tacotron 2 (synthesizer)	Natural TTS Synthesis by Conditioning Wavenet on Mel Spectrogram Predictions	Rayhane-mamah/Tacotron-2
1710.10467	GE2E (encoder)	Generalized End-To-End Loss for Speaker Verification	CorentinJ


multi-tacotron-voice-cloning's Issues

No module named 'tensorflow.contrib'

Tried to run it in Google Colab, but there is an error. Please help.

How to implement it on a local computer?

Dataset is not detected

Hi, the RU dataset is not detected for me. LibriSpeech is detected, but the Russian dataset is not. What should I do? Thanks

pretrained Tacotron2

Hi. Thank you.
Is it possible to skip training your own Tacotron2 and use a model pretrained on Russian instead?
This one, for example: https://github.com/alphacep/tn2-wg

Use Tacotron2 trained on Russian?

First, thanks for such a complete pipeline.
Second, would you integrate e.g. this repo for native Russian support?

Training problem

First of all, thank you for open-sourcing Multi-Tacotron-Voice-Cloning. I have only just started learning natural language processing and Python programming.
-I put the software in the directory: D:\SV2TTS
-I put the dataset in the directory: D:\Datasets; I have D:\Datasets\book and D:\Datasets\LibriSpeech

When using the code you provided, I had some training issues:

I have finished the steps

Run python encoder_preprocess.py D:\Datasets
and the result is
Arguments:
datasets_root: D:\Datasets
out_dir: D:\Datasets\SV2TTS\encoder
datasets: ['preprocess_voxforge']
skip_existing: False
Done preprocessing book.

Run visdom
But I could not continue

Run python encoder_train.py my_run D:\Datasets
because the notice appeared
C:\Users\Admin\anaconda3\envs\[Test_Voice]\lib\site-packages\umap\spectral.py:4: NumbaDeprecationWarning: No direct replacement for 'numba.targets' available. Visit https://gitter.im/numba/numba-dev to request help. Thanks!
import numba.targets
usage: encoder_train.py [-h] [--clean_data_root CLEAN_DATA_ROOT]
[-m MODELS_DIR] [-v VIS_EVERY] [-u UMAP_EVERY]
[-s SAVE_EVERY] [-b BACKUP_EVERY] [-f]
[--visdom_server VISDOM_SERVER] [--no_visdom]
run_id
encoder_train.py: error: unrecognized arguments: D:\Datasets

My question: How can I fix this problem?

Thanks again for your sharing!!!
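Judging only from the usage message quoted in this issue, the script takes a single positional argument (`run_id`) and expects the data folder through the `--clean_data_root` option. The sketch below is a stand-in parser built from that usage text, not the script's real code, and it omits the other options; it shows why a second positional argument is rejected and how the same folder might be passed instead. The folder is the `out_dir` printed by the preprocessing step above.

```python
import argparse


def build_parser():
    """Stand-in for the parser implied by the usage message (only two arguments kept)."""
    parser = argparse.ArgumentParser(prog="encoder_train.py")
    parser.add_argument("run_id")
    # The default is an assumption; the real script may define a different one.
    parser.add_argument("--clean_data_root", default=None)
    return parser


parser = build_parser()

# A second positional argument has no slot to go into, so argparse reports it.
_, extra = parser.parse_known_args(["my_run", r"D:\Datasets"])
print("unrecognized:", extra)

# Passing the folder through the named option parses cleanly.
args = parser.parse_args(["my_run", "--clean_data_root", r"D:\Datasets\SV2TTS\encoder"])
print(args.run_id, args.clean_data_root)
```

If the stand-in matches the real parser, the corresponding command line would be `python encoder_train.py my_run --clean_data_root D:\Datasets\SV2TTS\encoder`.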

Question about support

Hello. I have a GeForce GT 630M graphics card. I really need to run the program, but PyTorch requires a minimum compute capability of 3.0, and my graphics card has 2.1.
I specifically need the interface, to study the plots; on Colab there is only a command-line version.
If I install the CPU-only version, it reports that it is not supported.
Is there perhaps a way to disable GPU usage and switch to the CPU?

Multi-Language Training

Hi,

I'm slightly confused about the G2P model. Suppose I need to train a model that specifically goes from Russian to English (only). Do I still need to add a dictionary or train the G2P model?

Also, I don't understand the significance of the G2P model here. We have the synthesizer, which is already doing the same work.

Thanks!

WaveGlow

Is it possible to use WaveGlow?
And can a pretrained model be used instead of training your own?

Training the network for a single voice.

Suppose the goal is to get good sound quality for just one voice (Russian). What should I click (instructions for dummies)?

Training for a language other than Russian? English -> other language?

Hello, I am looking for a way to make a Japanese speaker speak English. Is it possible?

Missing file "g2p/en.dic"

https://github.com/vlomme/Multi-Tacotron-Voice-Cloning/blob/master/g2p/train.py#L20

How can I add the Arabic language?

Use of language embedding

Hi @vlomme,

Great work here, and thanks for open-sourcing it. I'm trying to understand how this works so that I can replicate it. I've gone through the code and don't see any language embedding, which I thought would be how you separate the speaker from the language.

Can you please explain how language-speaker independence is achieved?

Are there alternative models that clone better than this one? (The original English model does not count.)

Any Samples?

Hi, I am doing work similar to yours; my datasets are "En + Chinese".
I have tried the pretrained model offered by CorentinJ and have also fine-tuned it, but I have not achieved good results so far. I am still training the encoder model now. I wonder if you have some good results to share?

Error when running outside Colab

OSError: [WinError 127] The specified procedure could not be found. Error loading "C:\Users\admin\PycharmProjects\pythonProject\venv\lib\site-packages\torch\lib\caffe2_detectron_ops_gpu.dll" or one of its dependencies.

I have tried various methods and approaches, and nothing helps at all... I don't even know what to do.

P.S. tensorflow-gpu is supported only up to Python 3.7; with anything newer, it reports an error that no matching version was found.

Has anyone run into this?

Could not find a version that satisfies the requirement PyQt5

  Could not find a version that satisfies the requirement PyQt5 (from -r requirements.txt (line 13)) (from versions: )
No matching distribution found for PyQt5 (from -r requirements.txt (line 13))

Training the neural network

Good day!
How do I start training the neural network FROM SCRATCH? (i.e., without the pretrained model)

Speech2Speech instead of Text2Speech

Thank you very much for the project. I am looking for a quick solution that could do voice style transfer (imitating a recorded voice from a sample). Can your synthesizer be applied to this task?

Could not find a version

❯ python3 -m pip install -r requirements.txt
ERROR: Could not find a version that satisfies the requirement tensorflow-gpu<=1.14.0 (from -r requirements.txt (line 1)) (from versions: 2.2.0rc1, 2.2.0rc2, 2.2.0rc3, 2.2.0rc4, 2.2.0, 2.2.1, 2.3.0rc0, 2.3.0rc1, 2.3.0rc2, 2.3.0, 2.3.1)
ERROR: No matching distribution found for tensorflow-gpu<=1.14.0 (from -r requirements.txt (line 1))

❯ python --version
Python 3.8.5

Unable to clone any voice other than the provided "ex.wav"

Hello,
I tried using the demo with the pre-trained network.
See the attached example. The output is just pure noise.
Is there anything wrong with the voice sample I provided as input?
I tried both male / female.

Thanks,
orig_16.zip

Training encoder

Thanks for your work! Please help me train an encoder. Is it possible to add new custom voices to the training datasets, or are only fixed ones (like LibriSpeech: train-other-500, VoxCeleb1...) available through the command interface:
python encoder_preprocess.py <datasets_root>
and
python encoder_train.py my_run <datasets_root>/SV2TTS/encoder

If it is possible, how should I store the files: in the root data directory or in subfolders, and in what formats? I tried adding my voice to a subfolder but got an error like:

"Python encoder_preprocess.py data
Arguments:
datasets_root: data
out_dir: data/SV2TTS/encoder
datasets: ['preprocess_voxforge']
skip_existing: False

Preprocessing preprocess_voxforge
Couldn't find data/book, skipping this dataset"

I looked at the source and found that there are fixed functions that preprocess different formats of training data (like preprocess22, preprocess44...). What do they mean? Maybe I should use one of them?
Thank you.

How to run on new voices?

Hello,
Amazing work.
I am running inference using your models on a 2080 GPU. Your example is perfect. But when I give it a new audio clip (in English) and make it say the same Russian sentence, the output audio isn't good. There is a lot of noise, and the cloning quality is poor.

My question is:

Can I use the pretrained models (from this repo) to clone a new speaker and make it speak Russian? Or should I train everything (g2p, encoder, synthesizer, vocoder) on the new speaker (assuming I obtain hours of this speaker's audio)? Please advise.

Thanks,
S

Does not work.

I launched the demo version, and it does not work.

Is your version better than Real-Time-Voice-Cloning?

I'm interested in your own assessment; could you perhaps post some samples to give an idea?
I tried https://github.com/CorentinJ/Real-Time-Voice-Cloning and was not impressed with the result: lots of noise, and the speech is a couple of tones higher.

pretrained model

Hello, I experimented with your model, but only some recordings give a good result. On Habr you wrote that you had also tried training a model for Russian only and that it worked better. Do you still have the trained model? If so, could you share it?

OOM Error by additional text to be spoken

I have a problem inserting additional text to be spoken into the toolbox. The additional lines cause the vocoder to crash with an out-of-memory error. Testing with both the original code from CorentinJ and the code from here, I found that activating g2p in toolbox/__init__ causes this error.
Apparently g2p holds on to resources that the other neural networks need and does not release them once it has finished its task.
Can you fix that somehow?

Thank you

ModuleNotFoundError: No module named 'distance'

File "...\g2p\train.py", line 4, in
from distance import levenshtein
ModuleNotFoundError: No module named 'distance'
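The traceback shows that `g2p\train.py` imports `levenshtein` from a third-party module named `distance`, so installing a package of that name (presumably with `pip install distance`; whether requirements.txt lists it is not confirmed here) is the likely fix. Where that is not an option, a pure-Python stand-in can be sketched as follows. It computes the standard edit distance between two sequences and is not the package's own implementation.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, item_a in enumerate(a, start=1):
        current = [i]
        for j, item_b in enumerate(b, start=1):
            cost = 0 if item_a == item_b else 1
            current.append(min(
                previous[j] + 1,         # delete item_a
                current[j - 1] + 1,      # insert item_b
                previous[j - 1] + cost,  # substitute, or keep when equal
            ))
        previous = current
    return previous[-1]


if __name__ == "__main__":
    print(levenshtein("kitten", "sitting"))
    # It also works on lists, e.g. sequences of phoneme symbols.
    print(levenshtein(["K", "AE", "T"], ["K", "AH", "T"]))
```

The function accepts any pair of indexable sequences, so it can compare strings character by character or lists of phoneme symbols element by element.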
