


Fish Diffusion


An easy-to-understand TTS (text-to-speech) / SVS (singing voice synthesis) / SVC (singing voice conversion) training framework.

Check our Wiki to get started!

Because the main branch is under active development, we recommend that new users choose a stable version, such as v1.12.

Chinese documentation (中文文档)

Summary

Fish Diffusion uses diffusion models to solve various voice generation tasks. Compared with the original diffsvc repository, this repository has the following advantages:

  • Multi-speaker support
  • A simpler, easier-to-understand code structure in which all modules are decoupled
  • Support for the 44.1 kHz DiffSinger community vocoder
  • Multi-machine, multi-device training and half-precision training, which speed up training and reduce memory usage

Preparing the environment

Run the following commands in a conda environment with Python 3.10:

# Install PyTorch related core dependencies, skip if installed
# Reference: https://pytorch.org/get-started/locally/
conda install pytorch torchvision torchaudio pytorch-cuda=11.7 -c pytorch -c nvidia

# Install Poetry dependency management tool, skip if installed
# Reference: https://python-poetry.org/docs/#installation
curl -sSL https://install.python-poetry.org | python3 -

# Install the project dependencies
poetry install
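The commands above assume Python 3.10. A minimal standard-library sketch for confirming that the active interpreter matches before running poetry install (python_version_ok is a hypothetical helper, not part of Fish Diffusion):

```python
import sys

# Hypothetical helper: the README only states that Python 3.10 is expected.
REQUIRED = (3, 10)


def python_version_ok(version_info=None, required=REQUIRED):
    """Return True if the (major, minor) version equals `required`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only major and minor; any 3.10.x patch release is accepted.
    return tuple(version_info[:2]) == tuple(required)
```

Call python_version_ok() inside the activated conda environment; it returns False under, for example, Python 3.11.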

Vocoder preparation

Fish Diffusion requires the OpenVPI 44.1 kHz NSF-HiFiGAN vocoder to generate audio.

Automatic download

python tools/download_nsf_hifigan.py

If you download the model with the script, you can pass the --agree-license parameter to accept the CC BY-NC-SA 4.0 license.

python tools/download_nsf_hifigan.py --agree-license

If the OpenVPI vocoder performs poorly on high notes, you can try the Fish Audio Beta Vocoder.

python tools/download_nsf_hifigan.py --vocoder FishAudioBeta

To try the latest ContentVec model for extracting phoneme features, download it with the following command:

python tools/download_nsf_hifigan.py --content-vec

Manual download

Download and unzip nsf_hifigan_20221211.zip (44.1 kHz vocoder) or nsf_hifigan-beta-v2-epoch-434.zip (Fish Audio Beta Vocoder).
Copy the nsf_hifigan folder into the checkpoints directory (create the directory if it does not exist).
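The manual steps above (unzip, then copy the nsf_hifigan folder into checkpoints) can be scripted with the standard library. A minimal sketch, assuming the archive contains a directory named nsf_hifigan at some depth; install_vocoder is a hypothetical helper, not one of the project's tools:

```python
import shutil
import tempfile
import zipfile
from pathlib import Path


def install_vocoder(zip_path, checkpoints_dir="checkpoints", folder_name="nsf_hifigan"):
    """Extract zip_path and copy its `folder_name` directory into checkpoints_dir.

    Assumption: the archive contains a directory called `nsf_hifigan` somewhere inside.
    """
    checkpoints = Path(checkpoints_dir)
    checkpoints.mkdir(parents=True, exist_ok=True)  # create if it does not exist
    target = checkpoints / folder_name
    with tempfile.TemporaryDirectory() as tmp:
        with zipfile.ZipFile(zip_path) as archive:
            archive.extractall(tmp)
        # Find every directory with the expected name; prefer the shallowest one.
        matches = sorted(
            (p for p in Path(tmp).rglob(folder_name) if p.is_dir()),
            key=lambda p: len(p.parts),
        )
        if not matches:
            raise FileNotFoundError(f"no '{folder_name}' folder inside {zip_path}")
        if target.exists():
            shutil.rmtree(target)  # replace any previous copy
        shutil.copytree(matches[0], target)
    return target
```

For example, install_vocoder("nsf_hifigan_20221211.zip") would leave the vocoder at checkpoints/nsf_hifigan. Any existing copy at that path is replaced, so back it up first if you want to keep it.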

If you want to download ContentVec manually, you can download it from here and put it in the checkpoints directory.

Dataset preparation

Place your dataset in the dataset directory using the following file structure:

dataset
├───train
│   ├───xxx1-xxx1.wav
│   ├───...
│   ├───Lxx-0xx8.wav
│   └───speaker0 (Subdirectory is also supported)
│       └───xxx1-xxx1.wav
└───valid
    ├───xx2-0xxx2.wav
    ├───...
    └───xxx7-xxx007.wav

# Extract all data features, such as pitch, text features, and mel features
python tools/preprocessing/extract_features.py --config configs/svc_hubert_soft.py --path dataset --clean
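It can help to confirm that the dataset matches the layout above before (re)running extraction. A minimal sketch that counts .wav files per split and per first-level subdirectory; summarize_dataset is a hypothetical helper, and how the preprocessing scripts actually map subdirectories to speakers is not described here, so treat the grouping as informational only:

```python
from collections import Counter
from pathlib import Path


def summarize_dataset(root="dataset"):
    """Count .wav files in train/ and valid/, grouped by first-level subdirectory.

    Files placed directly under train/ or valid/ are grouped under ".".
    Only lowercase .wav extensions are matched.
    """
    summary = {}
    for split in ("train", "valid"):
        split_dir = Path(root) / split
        counts = Counter()
        if split_dir.is_dir():
            for wav in split_dir.rglob("*.wav"):
                rel = wav.relative_to(split_dir)
                group = rel.parts[0] if len(rel.parts) > 1 else "."
                counts[group] += 1
        summary[split] = dict(counts)
    return summary
```

An empty dictionary for a split usually means the folder name or the file extension does not match what the tree above expects.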

Baseline training

The project is under active development, please back up your config files.

# Single machine single card / multi-card training
python tools/diffusion/train.py --config configs/svc_hubert_soft.py
# Multi-node training
python tools/diffusion/train.py --config configs/svc_content_vec_multi_node.py
# Environment variables must be defined on each node. See https://pytorch-lightning.readthedocs.io/en/1.6.5/clouds/cluster.html for more information.
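For multi-node training, a small pre-flight check can catch a missing variable on one node before launch. A minimal sketch, assuming the variable names MASTER_ADDR, MASTER_PORT, NODE_RANK and WORLD_SIZE from the PyTorch Lightning cluster documentation linked above (verify them against your Lightning version); missing_cluster_vars and launch_multi_node are hypothetical helpers:

```python
import os
import subprocess
import sys

# Assumed names, taken from the PyTorch Lightning cluster docs; verify for your version.
REQUIRED_VARS = ("MASTER_ADDR", "MASTER_PORT", "NODE_RANK", "WORLD_SIZE")


def missing_cluster_vars(env=None):
    """Return the required variables that are unset or empty in `env` (default: os.environ)."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


def launch_multi_node(config="configs/svc_content_vec_multi_node.py"):
    """Hypothetical wrapper: refuse to start train.py if any variable is missing."""
    missing = missing_cluster_vars()
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    # Run from the repository root so the relative script path resolves.
    subprocess.run(
        [sys.executable, "tools/diffusion/train.py", "--config", config],
        check=True,
    )
```

Running the check on every node, rather than only on the node with rank 0, surfaces configuration mistakes before any process starts waiting for its peers.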

# Resume training
python tools/diffusion/train.py --config configs/svc_hubert_soft.py --resume [checkpoint file]
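--resume expects the path of a checkpoint file. A minimal sketch that picks the most recently modified .ckpt file under a directory; the default logs directory and the .ckpt extension are assumptions about where and how your run saves checkpoints, and latest_checkpoint is a hypothetical helper:

```python
from pathlib import Path


def latest_checkpoint(log_dir="logs"):
    """Return the most recently modified .ckpt file under log_dir, or None if there is none.

    Assumption: checkpoints are saved with a .ckpt extension somewhere below log_dir.
    """
    root = Path(log_dir)
    if not root.is_dir():
        return None
    candidates = [p for p in root.rglob("*.ckpt") if p.is_file()]
    if not candidates:
        return None
    # Modification time is a proxy for "newest"; compare step numbers if you copy files around.
    return max(candidates, key=lambda p: p.stat().st_mtime)
```

Pass the returned path as the [checkpoint file] argument of --resume.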

# Fine-tune the pre-trained model
# Note: You should adjust the learning rate scheduler in the config file to warmup_cosine_finetune
python tools/diffusion/train.py --config configs/svc_cn_hubert_soft_finetune.py --pretrained [checkpoint file]
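The note above refers to the warmup_cosine_finetune scheduler, whose exact definition lives in the project's code and is not reproduced here. As a generic illustration of what a warmup-plus-cosine schedule usually does, the sketch below ramps the learning rate linearly up to max_lr over warmup_steps, then decays it along a half cosine to min_lr by total_steps. The function and its parameters are hypothetical:

```python
import math


def warmup_cosine_lr(step, max_lr, min_lr, warmup_steps, total_steps):
    """Generic linear-warmup + cosine-decay learning rate (not the project's implementation)."""
    if step < warmup_steps:
        # Linear warmup: reaches max_lr at step warmup_steps - 1.
        return max_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))
```

A short warmup matters most when fine-tuning, because large updates in the first steps can quickly undo what the pre-trained weights already encode.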

Inference

# Command-line inference; use --help to view all parameters
python tools/diffusion/inference.py --config [config] \
    --checkpoint [checkpoint file] \
    --input [input audio] \
    --output [output audio]


# Gradio web inference; the other parameters are used as default values in the Gradio interface
python tools/diffusion/inference.py --config [config] \
    --checkpoint [checkpoint file] \
    --gradio
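To process a whole folder, the command-line form above can be called once per file. A minimal sketch that uses only the flags shown above (--config, --checkpoint, --input, --output); build_inference_commands and run_batch are hypothetical helpers, and they must be run from the repository root so the relative script path resolves:

```python
import subprocess
import sys
from pathlib import Path


def build_inference_commands(config, checkpoint, input_dir, output_dir):
    """Return one inference.py argv list per .wav file in input_dir (sorted by name)."""
    output_dir = Path(output_dir)
    commands = []
    for wav in sorted(Path(input_dir).glob("*.wav")):
        commands.append([
            sys.executable, "tools/diffusion/inference.py",
            "--config", str(config),
            "--checkpoint", str(checkpoint),
            "--input", str(wav),
            "--output", str(output_dir / wav.name),  # keep the input file name
        ])
    return commands


def run_batch(config, checkpoint, input_dir, output_dir):
    """Run every command in sequence, stopping at the first failure."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for cmd in build_inference_commands(config, checkpoint, input_dir, output_dir):
        subprocess.run(cmd, check=True)
```

Each call starts a new process and reloads the checkpoint, which is simple but slow for large folders; check --help for options that may handle this more efficiently.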

Convert a DiffSVC model to Fish Diffusion

python tools/diffusion/diff_svc_converter.py --config configs/svc_hubert_soft_diff_svc.py \
    --input-path [DiffSVC ckpt] \
    --output-path [Fish Diffusion ckpt]

Contributing

If you have any questions, please submit an issue or pull request.
You should run tools/lint.sh before submitting a pull request.

To build the documentation with live reload, run:

sphinx-autobuild docs docs/_build/html

Credits

Thanks to all contributors for their efforts:

leng-yue, innnky, kangarroar, cnchtu, geraint-dou, mlo7ghinsan, lordelf, ricecakey06, stardust-minus, huanlinoto
