
gpt-sovits-inference's Introduction

GSVI : GPT-SoVITS Inference Plugin

Welcome to GSVI, an inference-specialized plugin built on top of GPT-SoVITS to enhance your text-to-speech (TTS) experience with a user-friendly API interface. This plugin enriches the original GPT-SoVITS project, making voice synthesis more accessible and versatile.

Please note that we do not recommend using GSVI for training. It exists to make GPT-SoVITS simpler and more comfortable to use, and to make model sharing easier.

This fork is mainly based on the fast_inference_ branch and uses a lot of PR code contributed by ChasonJiang. Thanks to this great developer. "Dalao NB!"

The Inference folder used by this branch is the main submodule. It comes from https://github.com/X-T-E-R/TTS-for-GPT-soVITS.

Features

  • High-level abstract interface for easy character and emotion selection
  • Comprehensive TTS engine support (speaker selection, speed adjustment, volume control)
  • User-friendly design for everyone
  • Quick setup: drop a shared character model folder into trained and it is ready to use
  • High compatibility and extensibility for various platforms and applications (for example: SillyTavern)

Getting Started

  1. Install manually, or use the prezip (pre-packaged distribution) for Windows.
  2. Put your character model folders in trained.
  3. Run the bat file, or run the Python file manually.
  4. If you encounter issues, join our community or consult the FAQ. QQ Group: 863760614, Discord (AI Hub):

We look forward to seeing how you use GSVI to bring your creative projects to life!

Prezip : https://huggingface.co/XTer123/GSVI_prezip/tree/main

Usage

Use With Bat Files

You will find a set of bat files in 0 Bat Files/

  • To update, run bat 0 and 1 (or 999, 0, and 1)
  • To start with a single Gradio file, run bat 3
  • To start with a separate backend and frontend, run bat 5 and 6
  • To manage your models, run 10.bat

Python Files

Start with a single gradio file

  • Gradio Application: app.py (In the root of GSVI)

Start with backend and frontend mode

  • Flask Backend Program: Inference/src/tts_backend.py
  • Gradio Frontend Application: Inference/src/TTS_Webui.py
  • Other Frontend Applications or Services Using Our API
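
As a rough illustration of the backend and frontend mode, the sketch below starts both scripts as separate processes. The two script paths come from the list above; everything else is an assumption, in particular that the scripts take no required arguments and should be run with the GSVI root as the working directory. The helper names build_commands and launch are hypothetical and not part of GSVI.

```python
import subprocess
import sys
from pathlib import Path
from typing import List

# Script paths as listed in this README, relative to the GSVI root.
BACKEND = Path("Inference/src/tts_backend.py")
FRONTEND = Path("Inference/src/TTS_Webui.py")


def build_commands(python: str = sys.executable) -> List[List[str]]:
    """Return one command per process: backend first, then frontend."""
    return [[python, str(BACKEND)], [python, str(FRONTEND)]]


def launch(root: Path) -> List[subprocess.Popen]:
    """Start both processes with `root` as the working directory.

    Assumption: the scripts are meant to be run from the GSVI root.
    """
    return [subprocess.Popen(cmd, cwd=root) for cmd in build_commands()]
```

Calling launch(Path(".")) from the GSVI root would start both processes; the bat files 5 and 6 remain the documented way to do this on Windows.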

Model Management

  • Gradio Model Management Interface: Inference/src/Character_Manager.py

API Documentation

For API documentation, visit our Yuque documentation page, or see API Doc.md.

Model Folder Format

Each character has its own model folder under trained, for example trained/Character1/

Put the pth, ckpt, and wav files in it. The wav file should be named after its prompt text.

For example:

trained
--hutao
----hutao-e75.ckpt
----hutao_e60_s3360.pth
----hutao said something.wav
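
To make the naming convention concrete, here is a minimal sketch of how a script could read such a folder. It is not GSVI's own loader: the CharacterFiles class and the scan_character_folder function are hypothetical names, and picking the first file of each type in sorted order is an assumption made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass
class CharacterFiles:
    ckpt: Optional[Path]
    pth: Optional[Path]
    wav: Optional[Path]
    prompt_text: Optional[str]


def scan_character_folder(folder: Path) -> CharacterFiles:
    """Collect the first .ckpt, .pth and .wav file found in a character folder."""

    def first(suffix: str) -> Optional[Path]:
        matches = sorted(
            p for p in folder.iterdir()
            if p.is_file() and p.suffix.lower() == suffix
        )
        return matches[0] if matches else None

    wav = first(".wav")
    return CharacterFiles(
        ckpt=first(".ckpt"),
        pth=first(".pth"),
        wav=wav,
        # The wav is named after its prompt text, so the file name
        # without the extension is used as the prompt.
        prompt_text=wav.stem if wav else None,
    )
```

For the hutao folder above, this would report "hutao said something" as the prompt text.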

Add an emotion to your model

To do so, open the Model Manage Tool (10.bat / Inference/src/Character_Manager.py).

It lets you assign a reference audio to each emotion, which is how emotion options are implemented.

Installation

You can install GSVI by following the guide below, then download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models, and put your character model folder in trained.

Or just download the pre-packaged distribution for Windows (then put your character model folder in trained).

For the layout of the character model folder, see Model Folder Format above.

Tested Environments

  • Python 3.9, PyTorch 2.0.1, CUDA 11
  • Python 3.10.13, PyTorch 2.1.2, CUDA 12.3
  • Python 3.9, PyTorch 2.3.0.dev20240122, macOS 14.3 (Apple silicon)

Note: numba==0.56.4 requires py<3.11
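
A quick way to check this constraint before installing is sketched below. The function name python_supports_numba_0_56 is hypothetical; the only rule it encodes is the one stated in the note above (Python older than 3.11).

```python
import sys
from typing import Tuple


def python_supports_numba_0_56(
    version: Tuple[int, int] = sys.version_info[:2],
) -> bool:
    """Return True when the given (major, minor) version is older than 3.11."""
    return version < (3, 11)


if __name__ == "__main__":
    if not python_supports_numba_0_56():
        print("numba==0.56.4 requires Python < 3.11; use 3.9 or 3.10 instead.")
```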

Windows

If you are a Windows user (tested with win>=10), you can directly download the pre-packaged distribution and double-click on go-webui.bat to start GPT-SoVITS-WebUI.

Or run pip install -r requirements.txt, and then double-click install.bat.

Linux

conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh

macOS

Note: Models trained with GPUs on Macs are of significantly lower quality than those trained on other devices, so we are temporarily using CPUs instead.

First make sure you have installed FFmpeg by running brew install ffmpeg or conda install ffmpeg, then install using the following commands:

conda create -n GPTSoVits python=3.9
conda activate GPTSoVits

pip install -r requirements.txt
git submodule init
git submodule update --init --recursive

Install FFmpeg (not needed if you use the prezip)

Conda Users

conda install ffmpeg

Ubuntu/Debian Users

sudo apt install ffmpeg
sudo apt install libsox-dev
conda install -c conda-forge 'ffmpeg<7'

Windows Users

Download and place ffmpeg.exe and ffprobe.exe in the GPT-SoVITS root.
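
Whichever route you take, you can verify that both tools are reachable with a small check like the one below. This is a sketch, not part of GSVI: missing_ffmpeg_tools is a hypothetical helper, and it only performs the standard shutil.which lookup, so whether it sees a copy placed in the project root depends on your platform, Python version, and working directory.

```python
import shutil
from typing import Callable, List, Optional


def missing_ffmpeg_tools(
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return the names of the required tools that the lookup cannot find."""
    return [tool for tool in ("ffmpeg", "ffprobe") if which(tool) is None]


if __name__ == "__main__":
    missing = missing_ffmpeg_tools()
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("ffmpeg and ffprobe are both available.")
```

The which parameter is injectable only so the function can be exercised without FFmpeg installed.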

Pretrained Models (not needed if you use the prezip)

Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models.

Docker

Docker instructions are still being written. Please wait.

Important: remove pyaudio from requirements.txt.

Credits

This fork is mainly based on the fast_inference_ branch of the GPT-SoVITS project and uses a lot of PR code contributed by ChasonJiang.

Special thanks to the following projects and contributors:

Theoretical

Pretrained Models

Text Frontend for Inference

WebUI Tools

Thanks to all contributors for their efforts
