VALL-E

An unofficial PyTorch implementation of VALL-E, based on the EnCodec tokenizer.

Get Started

A toy Google Colab example is available (Open In Colab). Please note that this example overfits a single utterance under data/test, so the resulting model is not usable. A pretrained model is yet to come.

Requirements

Since the trainer is based on DeepSpeed, you will need a GPU that DeepSpeed has been developed and tested against, as well as a pre-installed CUDA or ROCm compiler, in order to install this package.
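As a quick pre-install check, the sketch below looks for a CUDA or ROCm compiler on your PATH. It assumes the compilers are exposed under their usual driver names, nvcc (CUDA) and hipcc (ROCm); the helper name find_gpu_compilers is hypothetical and not part of this package. It does not verify that your GPU is one DeepSpeed supports.

```python
import shutil


def find_gpu_compilers():
    """Map each compiler driver name to its path on PATH, or None if absent."""
    # Assumption: nvcc is the CUDA compiler driver, hipcc the ROCm (HIP) one.
    return {name: shutil.which(name) for name in ("nvcc", "hipcc")}


if __name__ == "__main__":
    found = find_gpu_compilers()
    for name, path in found.items():
        print(f"{name}: {path or 'not found'}")
    if not any(found.values()):
        print("No CUDA or ROCm compiler found on PATH; installation may fail.")
```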

Install

pip install git+https://github.com/enhuiz/vall-e

Or you may clone the repository along with its submodules:

git clone --recurse-submodules https://github.com/enhuiz/vall-e.git

Note that the code has only been tested under Python 3.10.7.

Train

  1. Put your data into a folder, e.g. data/your_data. Audio files should be named with the suffix .wav and text files with .normalized.txt.

  2. Quantize the data:

python -m vall_e.emb.qnt data/your_data

  3. Generate phonemes based on the text:

python -m vall_e.emb.g2p data/your_data

  4. Customize your configuration by creating config/your_data/ar.yml and config/your_data/nar.yml. Refer to the example configs in config/test and vall_e/config.py for details. You may choose different model presets; see vall_e/vall_e/__init__.py.

  5. Train the AR or NAR model using the following script:

python -m vall_e.train yaml=config/your_data/ar_or_nar.yml

You may stop training at any time by typing quit in your CLI. The latest checkpoint will be saved automatically.
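Before quantizing, it can help to confirm that the data folder from step 1 is complete. The sketch below is a minimal check, assuming each audio file is paired with a text file that shares its name apart from the suffix (for example a.wav and a.normalized.txt). That pairing rule and the helper name find_unpaired are assumptions for illustration, not something the package provides.

```python
from pathlib import Path

WAV_SUFFIX = ".wav"
TXT_SUFFIX = ".normalized.txt"


def _stems(root, suffix):
    """Relative paths under root ending in suffix, with the suffix stripped."""
    return {
        p.relative_to(root).as_posix()[: -len(suffix)]
        for p in root.rglob(f"*{suffix}")
    }


def find_unpaired(data_dir):
    """Return (wavs_without_text, texts_without_wav) as sorted lists."""
    root = Path(data_dir)
    wavs = _stems(root, WAV_SUFFIX)
    txts = _stems(root, TXT_SUFFIX)
    return sorted(wavs - txts), sorted(txts - wavs)


if __name__ == "__main__":
    missing_txt, missing_wav = find_unpaired("data/your_data")
    for stem in missing_txt:
        print(f"no transcript for {stem}{WAV_SUFFIX}")
    for stem in missing_wav:
        print(f"no audio for {stem}{TXT_SUFFIX}")
```

If both lists are empty, every utterance has an audio file and a transcript under the assumed naming rule.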

Export

Both trained models need to be exported to a certain path. To export either of them, run:

python -m vall_e.export zoo/ar_or_nar.pt yaml=config/your_data/ar_or_nar.yml

This will export the latest checkpoint.
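Since both models need exporting, the command above is typically run twice, once for ar and once for nar. The following sketch builds both invocations from the paths used in this README (zoo/ar.pt, zoo/nar.pt, config/your_data/ar.yml, config/your_data/nar.yml). The helper names export_commands and export_all are hypothetical, and the script only prints the commands unless dry_run is set to False.

```python
import shlex
import subprocess
import sys


def export_commands(data_name, out_dir="zoo"):
    """Build one export command per model, mirroring the CLI shown above."""
    return [
        [
            sys.executable, "-m", "vall_e.export",
            f"{out_dir}/{kind}.pt",
            f"yaml=config/{data_name}/{kind}.yml",
        ]
        for kind in ("ar", "nar")
    ]


def export_all(data_name, dry_run=True):
    """Print each export command; run it too when dry_run is False."""
    for cmd in export_commands(data_name):
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    export_all("your_data", dry_run=True)
```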

Synthesis

python -m vall_e <text> <ref_path> <out_path> --ar-ckpt zoo/ar.pt --nar-ckpt zoo/nar.pt
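When calling the synthesis CLI from a script, the text should be passed as a single argument even if it contains spaces. The sketch below builds the argument list for the command above and prints a shell-quoted version. The helper name synthesis_command and the example file names are hypothetical; the checkpoint defaults are the paths shown in the command.

```python
import shlex
import sys


def synthesis_command(text, ref_path, out_path,
                      ar_ckpt="zoo/ar.pt", nar_ckpt="zoo/nar.pt"):
    """Build the synthesis invocation as an argument list (no shell needed)."""
    return [
        sys.executable, "-m", "vall_e",
        text, str(ref_path), str(out_path),
        "--ar-ckpt", ar_ckpt,
        "--nar-ckpt", nar_ckpt,
    ]


if __name__ == "__main__":
    # Hypothetical input and output file names, for illustration only.
    cmd = synthesis_command("hello world", "ref.wav", "out.wav")
    # shlex.join quotes the text so it stays a single shell argument.
    print(shlex.join(cmd))
    # To actually synthesize: subprocess.run(cmd, check=True)
```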

TODO

  • AR model for the first quantizer
  • Audio decoding from tokens
  • NAR model for the remaining quantizers
  • Trainers for both models
  • Implement AdaLN for NAR model
  • Sample-wise quantization level sampling for NAR training
  • Pre-trained checkpoint and demos on LibriTTS
  • Synthesis CLI

Notice

  • EnCodec is licensed under CC-BY-NC 4.0. If you use the code to quantize audio or perform decoding, it is important to adhere to the terms of its license.

Citations

@article{wang2023neural,
  title={Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers},
  author={Wang, Chengyi and Chen, Sanyuan and Wu, Yu and Zhang, Ziqiang and Zhou, Long and Liu, Shujie and Chen, Zhuo and Liu, Yanqing and Wang, Huaming and Li, Jinyu and others},
  journal={arXiv preprint arXiv:2301.02111},
  year={2023}
}
@article{defossez2022highfi,
  title={High Fidelity Neural Audio Compression},
  author={Défossez, Alexandre and Copet, Jade and Synnaeve, Gabriel and Adi, Yossi},
  journal={arXiv preprint arXiv:2210.13438},
  year={2022}
}
