Source-Filter HiFi-GAN (SiFi-GAN)

This repo provides the official PyTorch implementation of SiFi-GAN, a fast, pitch-controllable, high-fidelity neural vocoder.
For more information, please see our DEMO.

Environment setup

$ cd SiFiGAN
$ pip install -e .

Please refer to the Parallel WaveGAN repo for more details.

Folder architecture

  • egs: The folder for projects.
  • egs/namine_ritsu: The folder of the Namine Ritsu project example.
  • sifigan: The folder for the source code.

Dataset preparation for the Namine Ritsu database is based on NNSVS. Please refer to NNSVS for the procedure and details.

Run

In this repo, hyperparameters are managed using Hydra.
Hydra provides an easy way to dynamically create a hierarchical configuration by composition and override it through config files and the command line.
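To make the idea of command-line overrides concrete, here is a dependency-free toy sketch. It is not Hydra's implementation and does not model config groups (such as `generator=sifigan`); the function name `apply_overrides` and the keys in `defaults` are hypothetical and only illustrate how `key=value` arguments can replace entries in a hierarchical configuration.

```python
from copy import deepcopy


def apply_overrides(config, overrides):
    """Return a copy of `config` with dotted `key=value` overrides applied.

    Toy illustration only: values are kept as strings, whereas Hydra
    parses them into typed values.
    """
    result = deepcopy(config)
    for item in overrides:
        dotted_key, _, raw_value = item.partition("=")
        *parents, leaf = dotted_key.split(".")
        node = result
        for name in parents:
            # Walk (or create) the nested dictionaries along the dotted path.
            node = node.setdefault(name, {})
        node[leaf] = raw_value
    return result


# Hypothetical defaults, not the actual SiFi-GAN configuration.
defaults = {"out_dir": "exp/default", "train": {"batch_size": 16}}
merged = apply_overrides(defaults, ["out_dir=exp/sifigan", "train.batch_size=32"])
print(merged)
```

The defaults stay untouched, so the same base configuration can be reused with different override lists.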

Dataset preparation

Prepare your dataset and create scp files that list the path to each audio file (e.g., egs/namine_ritsu/data/scp/namine_ritsu.scp).
List files that give the paths to the extracted features are created automatically in the next step (e.g., egs/namine_ritsu/data/scp/namine_ritsu.list).
Note that separate scp/list files are needed for training, validation, and evaluation.
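If you need to create the scp files yourself, the following is a minimal sketch. It assumes that an scp file is a plain text file with one audio path per line; the function name `write_scp_files`, the split ratios, and the `_dev`/`_eval` file suffixes are assumptions for illustration, so adapt them to whatever your data config expects.

```python
import random
from pathlib import Path


def write_scp_files(wav_dir, scp_dir, name, dev_ratio=0.05, eval_ratio=0.05, seed=0):
    """Write <name>_{all,train,dev,eval}.scp and return the number of files per split."""
    wav_paths = sorted(str(p) for p in Path(wav_dir).rglob("*.wav"))
    shuffled = list(wav_paths)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split reproducible

    n_dev = max(1, int(len(shuffled) * dev_ratio))
    n_eval = max(1, int(len(shuffled) * eval_ratio))
    splits = {
        "all": wav_paths,
        "dev": shuffled[:n_dev],
        "eval": shuffled[n_dev:n_dev + n_eval],
        "train": shuffled[n_dev + n_eval:],
    }

    scp_dir = Path(scp_dir)
    scp_dir.mkdir(parents=True, exist_ok=True)
    for split, paths in splits.items():
        # Assumed format: one audio path per line.
        text = "".join(f"{path}\n" for path in sorted(paths))
        (scp_dir / f"{name}_{split}.scp").write_text(text)
    return {split: len(paths) for split, paths in splits.items()}


# Example call (paths are placeholders):
# write_scp_files("data/wav", "data/scp", "namine_ritsu")
```

A random split is only one option; for multi-speaker data you may prefer to hold out whole speakers for evaluation instead.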

Preprocessing

# Move to the project directory
$ cd egs/namine_ritsu

# Extract acoustic features (F0, mel-cepstrum, etc.)
# You can customize parameters according to sifigan/bin/config/extract_features.yaml
$ sifigan-extract-features audio=data/scp/namine_ritsu_all.scp

# Compute statistics of training data
$ sifigan-compute-statistics feats=data/scp/namine_ritsu_train.list stats=data/stats/namine_ritsu_train.joblib
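As a rough illustration of what statistics over training features can look like, the sketch below accumulates a per-dimension mean and standard deviation across utterances, which is the usual basis for feature normalization. This is an assumption about the purpose of the statistics file, not the code behind sifigan-compute-statistics; the class `RunningStats` and the toy frames are hypothetical, and nothing is written to a joblib file here.

```python
import math


class RunningStats:
    """Accumulate per-dimension mean and standard deviation frame by frame."""

    def __init__(self, dim):
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # running sum of squared deviations from the mean

    def update(self, frames):
        """Add the frames of one utterance (Welford's online algorithm)."""
        for frame in frames:
            self.count += 1
            for d, x in enumerate(frame):
                delta = x - self.mean[d]
                self.mean[d] += delta / self.count
                self.m2[d] += delta * (x - self.mean[d])

    def std(self):
        """Population standard deviation of each dimension."""
        return [math.sqrt(m2 / self.count) for m2 in self.m2]

    def normalize(self, frame):
        """Shift and scale one frame; constant dimensions are only shifted."""
        return [
            (x - m) / (s if s > 0 else 1.0)
            for x, m, s in zip(frame, self.mean, self.std())
        ]


stats = RunningStats(dim=2)
stats.update([[1.0, 10.0], [3.0, 30.0]])  # frames of a first toy utterance
stats.update([[5.0, 50.0]])               # frames of a second toy utterance
print(stats.mean, stats.std())
```

Accumulating the statistics incrementally avoids loading every feature file into memory at once, which matters for large training sets.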

Training

# Train a model customizing the hyperparameters as you like
$ sifigan-train generator=sifigan discriminator=univnet train=sifigan data=namine_ritsu out_dir=exp/sifigan

Inference

# Decode with several F0 scaling factors
$ sifigan-decode generator=sifigan data=namine_ritsu out_dir=exp/sifigan checkpoint_steps=400000 f0_factors=[0.5,1.0,2.0]
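The F0 scaling factors can be read as multipliers on the pitch contour: 0.5 lowers the pitch by one octave, 1.0 leaves it unchanged, and 2.0 raises it by one octave. The sketch below shows this on a made-up contour. It assumes the WORLD convention that unvoiced frames have F0 = 0; the function `scale_f0` is hypothetical and is not taken from sifigan-decode.

```python
def scale_f0(f0, factor):
    """Scale voiced F0 values (Hz) by `factor`; unvoiced frames (F0 == 0) stay at 0."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    return [value * factor if value > 0 else 0.0 for value in f0]


f0 = [0.0, 220.0, 233.1, 0.0, 246.9]  # made-up contour in Hz, 0.0 marks unvoiced frames
for factor in [0.5, 1.0, 2.0]:
    print(factor, scale_f0(f0, factor))
```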

Analysis-Synthesis

# WORLD analysis + Neural vocoder synthesis
$ sifigan-anasyn generator=sifigan in_dir=your_own_input_wav_dir out_dir=your_own_output_wav_dir stats=pretrained_sifigan/namine_ritsu_train_no_dev.joblib checkpoint_path=pretrained_sifigan/checkpoint-400000steps.pkl f0_factors=[1.0]

Pretrained model

I provide a pretrained SiFiGAN model HERE, trained on the Namine Ritsu corpus in the same manner as described in the paper. Download it, place it in a directory of your choice, and set the paths in the command above to the pretrained model; the command should then work.

However, since the Namine Ritsu corpus contains only a single female Japanese singer, the model may not work well for other voices, especially male singers. I am planning to publish another pretrained model trained on a larger dataset that includes many speakers.

Because the model trained on the Namine Ritsu database was trained with the code before the bug fixes, I have decided to cancel its release. Instead, a model trained on the following large-scale dataset is available.

A pretrained model on 24 kHz speech + singing datasets is available HERE. We used train-clean-100 and train-clean-360 in LibriTTS-R, and NUS-48E for training. Two speakers, ADIZ and JLEE in NUS-48E, were excluded from the training data for evaluation. Also, the wav data of NUS-48E were divided into clips of approximately one second each before the feature extraction step.
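How the NUS-48E recordings were cut into clips is not specified beyond "approximately one second each". As one possible approach, the sketch below splits a PCM WAV file into consecutive fixed-length clips using only the Python standard library. The function `split_wav`, the output naming scheme, and the fixed-length strategy are assumptions; a silence-aware split may be preferable in practice.

```python
import wave
from pathlib import Path


def split_wav(in_path, out_dir, clip_seconds=1.0):
    """Split a PCM WAV file into consecutive clips of about `clip_seconds` each.

    The final clip holds the remainder and may be shorter than the others.
    """
    in_path = Path(in_path)
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_paths = []
    with wave.open(str(in_path), "rb") as reader:
        params = reader.getparams()
        frames_per_clip = int(params.framerate * clip_seconds)
        index = 0
        while True:
            frames = reader.readframes(frames_per_clip)
            if not frames:
                break
            out_path = out_dir / f"{in_path.stem}_{index:04d}.wav"
            with wave.open(str(out_path), "wb") as writer:
                # Copy channels, sample width, and rate; the frame count in the
                # header is corrected automatically when the file is closed.
                writer.setparams(params)
                writer.writeframes(frames)
            out_paths.append(out_path)
            index += 1
    return out_paths


# Example call (paths are placeholders):
# split_wav("nus48e/ADIZ/sing/01.wav", "data/wav/nus48e_clips")
```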

The feature preprocessing and training commands are as follows:

$ sifigan-extract-features audio=data/scp/libritts_r_clean+nus-48e_train_no_dev.scp minf0=60 maxf0=1000
$ sifigan-extract-features audio=data/scp/libritts_r_clean+nus-48e_dev.scp minf0=60 maxf0=1000
$ sifigan-extract-features audio=data/scp/libritts_r_clean+nus-48e_eval.scp minf0=60 maxf0=1000

$ sifigan-compute-statistics feats=data/scp/libritts_r_clean+nus-48e_train_no_dev.list stats=data/stats/libritts_r_clean+nus-48e_train_no_dev.joblib

$ sifigan-train out_dir=test/sifigan generator=sifigan data=libritts_r_clean+nus-48e train=sifigan_1000k

Monitor training progress

$ tensorboard --logdir exp

Citation

If you find the code helpful, please cite the following paper.

@INPROCEEDINGS{10095298,
  author={Yoneyama, Reo and Wu, Yi-Chiao and Toda, Tomoki},
  booktitle={ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  title={{Source-Filter HiFi-GAN: Fast and Pitch Controllable High-Fidelity Neural Vocoder}},
  year={2023},
  volume={},
  number={},
  pages={1-5},
  doi={10.1109/ICASSP49357.2023.10095298}
}

Authors

Development: Reo Yoneyama @ Nagoya University, Japan
E-mail: [email protected]

Advisors:
Yi-Chiao Wu @ Meta Reality Labs Research, USA
E-mail: [email protected]
Tomoki Toda @ Nagoya University, Japan
E-mail: [email protected]
