Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion

Code for the paper "Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion".

Shaojin Ding, Ricardo Gutierrez-Osuna

In INTERSPEECH 2019

This is a PyTorch implementation, based on the VQ-VAE-WaveRNN implementation at https://github.com/mkotha/WaveRNN.

Dataset Preparation

The preparation is similar to that at https://github.com/mkotha/WaveRNN. We repeat it here for convenience.

Requirements

  • Python 3.6 or newer
  • PyTorch with CUDA enabled
  • librosa
  • apex if you want to use FP16 (it probably doesn't work that well).

Create config.py

cp config.py.example config.py

Preparing VCTK

You can skip this section if you don't need a multi-speaker dataset.

  1. Download and uncompress the VCTK dataset.
  2. python preprocess_multispeaker.py /path/to/dataset/VCTK-Corpus/wav48 /path/to/output/directory
  3. In config.py, set multi_speaker_data_path to point to the output directory (a sketch follows below).
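
A minimal sketch of the config.py edit for step 3, assuming multi_speaker_data_path is a plain module-level variable; check config.py.example for the exact option names your copy contains:

# config.py (sketch): point training at the preprocessed VCTK data.
# The path below is a placeholder; use the output directory from step 2.
multi_speaker_data_path = '/path/to/output/directory'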

Usage

To run Group Latent Embedding:

$ python wavernn.py -m vqvae_group --num-group 41 --num-sample 10

The -m option tells the script which model to train. By default, it trains a vanilla VQ-VAE model.

Trained models are saved under the model_checkpoints directory.

By default, the script picks up the latest snapshot and continues training from there. To train a new model from scratch, use the --scratch option.

Every 50k steps, the model is run to generate test audio outputs. The output goes under the model_outputs directory.

When the -g option is given, the script generates output from the saved model instead of training it.

--num-group specifies the number of groups, and --num-sample specifies the number of atoms in each group. Note that num-group times num-sample must equal the total number of atoms in the embedding dictionary (n_classes in class VectorQuantGroup in vector_quant.py).
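
As a worked example of this sizing constraint, using the command above (a sketch; only n_classes and VectorQuantGroup are names from the code, the local variables here are illustrative):

# Sizing constraint: groups x atoms-per-group = embedding dictionary size.
num_group = 41     # --num-group
num_sample = 10    # --num-sample
n_classes = num_group * num_sample  # 410; must match n_classes in VectorQuantGroup (vector_quant.py)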

Acknowledgement

The code is based on mkotha/WaveRNN.

Cite the work

@inproceedings{Ding2019,
  author={Shaojin Ding and Ricardo Gutierrez-Osuna},
  title={{Group Latent Embedding for Vector Quantized Variational Autoencoder in Non-Parallel Voice Conversion}},
  year=2019,
  booktitle={Proc. Interspeech 2019},
  pages={724--728},
  doi={10.21437/Interspeech.2019-1198},
  url={http://dx.doi.org/10.21437/Interspeech.2019-1198}
}
