Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition

Code for the paper "Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition".

Shaojin Ding, Guanlong Zhao, Ricardo Gutierrez-Osuna

Accepted by INTERSPEECH 2020

This is a TensorFlow + PyTorch implementation, adapted from the Real-Time-Voice-Cloning implementation at https://github.com/CorentinJ/Real-Time-Voice-Cloning.

Dataset:

VCTK
- Audio samples.
- Trained model.

Requirements

Python 3.7 or newer
PyTorch with CUDA enabled
TensorFlow 1.13.1
Run pip install -r requirements.txt

Data preprocessing

We use the speaker encoder model and vocoder model from here. We only train the voice conversion model (i.e., synthesizer).

Before running, place the pretrained speaker encoder at encoder/saved_models/pretrained.pt and the pretrained vocoder at vocoder/saved_models/pretrained/pretrained.pt.
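A quick way to confirm that both pretrained files are in place before preprocessing is sketched below. The two paths come from the instruction above; the helper name missing_pretrained and the idea of running it from the repository root are assumptions made for illustration, not part of the original code.

```python
from pathlib import Path

# Paths from the instructions above, relative to the repository root.
PRETRAINED_PATHS = [
    "encoder/saved_models/pretrained.pt",
    "vocoder/saved_models/pretrained/pretrained.pt",
]


def missing_pretrained(repo_root="."):
    """Return the expected pretrained files that are not present (hypothetical helper)."""
    root = Path(repo_root)
    return [p for p in PRETRAINED_PATHS if not (root / p).is_file()]


if __name__ == "__main__":
    missing = missing_pretrained()
    if missing:
        for path in missing:
            print("Missing pretrained model:", path)
    else:
        print("Both pretrained models found.")
```

Running the script from the repository root lists any file that still needs to be downloaded.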

Download and uncompress the VCTK dataset.
Manually split the data into a train set and a test set (there is no official data split). Arrange the speaker folders as, e.g., <datasets_root>/VCTK/train/p227 and <datasets_root>/VCTK/test/p228
Run python synthesizer_preprocess_audio.py <datasets_root>
Run python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer_train
Run python synthesizer_preprocess_embeds.py <datasets_root>/SV2TTS/synthesizer_test
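The manual split in the steps above can be sketched as a small script. This is only an illustration under several assumptions: the split is made by speaker (as the p227/p228 example suggests), the uncompressed VCTK audio sits in one folder per speaker under a source directory, and 10% of the speakers are held out. The function names, the test fraction, and the seed are hypothetical and not taken from the paper or the repository.

```python
import random
import shutil
from pathlib import Path


def split_speakers(speaker_ids, test_fraction=0.1, seed=0):
    """Return (train_ids, test_ids): a reproducible split by speaker."""
    ids = sorted(speaker_ids)
    random.Random(seed).shuffle(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    return sorted(ids[n_test:]), sorted(ids[:n_test])


def arrange_vctk(source_dir, datasets_root, test_fraction=0.1, seed=0):
    """Copy speaker folders into <datasets_root>/VCTK/{train,test}/<speaker>."""
    source_dir = Path(source_dir)
    # Assumption: source_dir contains one sub-folder per speaker (e.g. p225).
    speakers = [p.name for p in source_dir.iterdir() if p.is_dir()]
    train_ids, test_ids = split_speakers(speakers, test_fraction, seed)
    for subset, ids in (("train", train_ids), ("test", test_ids)):
        for speaker in ids:
            dest = Path(datasets_root) / "VCTK" / subset / speaker
            if not dest.exists():  # skip speakers that were already copied
                shutil.copytree(source_dir / speaker, dest)
    return train_ids, test_ids
```

For example, arrange_vctk("<path to uncompressed VCTK speaker folders>", "<datasets_root>") would produce the layout expected by synthesizer_preprocess_audio.py. Fixing the seed keeps the split reproducible across runs.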

Training and inference

To launch training:

$ python synthesizer_train.py vc_adversarial <datasets_root>/SV2TTS/synthesizer_train
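The preprocessing and training commands above can also be chained from a small driver script. The sketch below only assembles and runs the exact commands listed in this README; the function names build_commands and run_pipeline are hypothetical, and the script assumes it is run from the repository root with the pretrained models already in place.

```python
import subprocess
import sys
from pathlib import Path


def build_commands(datasets_root):
    """Return the README's preprocessing and training commands, in order."""
    sv2tts = Path(datasets_root) / "SV2TTS"
    train_dir = str(sv2tts / "synthesizer_train")
    test_dir = str(sv2tts / "synthesizer_test")
    py = sys.executable  # use the same interpreter that runs this script
    return [
        [py, "synthesizer_preprocess_audio.py", str(datasets_root)],
        [py, "synthesizer_preprocess_embeds.py", train_dir],
        [py, "synthesizer_preprocess_embeds.py", test_dir],
        [py, "synthesizer_train.py", "vc_adversarial", train_dir],
    ]


def run_pipeline(datasets_root):
    """Run each step in turn and stop at the first failure."""
    for cmd in build_commands(datasets_root):
        print("Running:", " ".join(cmd))
        subprocess.run(cmd, check=True)  # raises CalledProcessError on failure
```

Calling run_pipeline("<datasets_root>") runs the four steps sequentially; because check=True is set, a failed preprocessing step prevents training from starting on incomplete data.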

To run inference, use synthesis_ppg_script.py. Set syn_dir in that script to the path of the trained model, e.g., synthesizer/saved_models/logs-train_adversarial_vctk/taco_pretrained

Acknowledgement

The code is adapted from CorentinJ / Real-Time-Voice-Cloning.

Cite the work

@article{dingimproving,
  title={Improving the Speaker Identity of Non-Parallel Many-to-Many Voice Conversion with Adversarial Speaker Recognition},
  author={Ding, Shaojin and Zhao, Guanlong and Gutierrez-Osuna, Ricardo}
}
