
tts-with-rvc's Introduction

TTS-with-RVC 0.1.3

TTS-with-RVC (Text-to-Speech with RVC) is a package that extends text-to-speech (TTS) systems with an RVC module. It lets users not only convert text into speech but also replace the generated voice with the voice of an RVC model of their choice.

PyTorch with CUDA or MPS support is required for TTS-with-RVC to work.

The package may contain bugs. If you run into an error, please report an issue.

Release notes

0.1.3 - fixed many problems and made some optimizations.

Prerequisites

You must have Python 3.10 or earlier installed (3.10 is recommended).

Your GPU must support CUDA or MPS (MPS has not been tested yet).
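The two prerequisites above can be checked before installing anything else. The following is a minimal sketch, not part of the package: the function name check_prerequisites is made up for illustration, and it only uses the standard torch.cuda.is_available() and torch.backends.mps.is_available() checks when PyTorch is already importable.

```python
import sys


def check_prerequisites():
    """Return (python_ok, device); device is "cuda", "mps", or None."""
    # The README asks for Python 3.10 or earlier.
    python_ok = sys.version_info[:2] <= (3, 10)

    try:
        import torch
    except ImportError:
        # PyTorch is not installed yet, so no GPU backend can be detected.
        return python_ok, None

    if torch.cuda.is_available():
        return python_ok, "cuda"
    # torch.backends.mps is missing in older PyTorch builds, hence getattr.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return python_ok, "mps"
    return python_ok, None


python_ok, device = check_prerequisites()
print(f"Python <= 3.10: {python_ok}; GPU backend: {device}")
```

If device comes back as None, the installation steps will still run, but voice conversion is unlikely to work.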

Installation

  1. Install PyTorch with CUDA or MPS support by following https://pytorch.org/get-started/locally/

  2. Install TTS-with-RVC with pip:

python -m pip install git+https://github.com/Atm4x/tts-with-rvc.git#egg=tts_with_rvc
  3. Install rvc:
python -m pip install git+https://github.com/Atm4x/rvc-lib.git@dev#egg=rvc
  4. Install rvc again, this time as an editable repository checkout:
python -m pip install -e git+https://github.com/Atm4x/rvc-lib.git#egg=rvclib
  5. Install the fixed version of rvc-tts-pipeline:
python -m pip install git+https://github.com/Atm4x/rvc-tts-pipeline-fix.git@dev#egg=rvc_tts_pipe
  6. Install ffmpeg if you don't already have it, then either place it in the folder with your script or, preferably, add it to the Path environment variable.
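Whether ffmpeg is reachable can be verified from Python before the first run. This is a small sketch using only the standard library; find_ffmpeg is a hypothetical helper, and the assumption that the executable next to the script is named ffmpeg or ffmpeg.exe is mine, not the package's.

```python
import shutil
from pathlib import Path


def find_ffmpeg(script_dir="."):
    """Return the path to an ffmpeg executable, or None if none is found."""
    # First look on PATH, which is the preferred setup.
    on_path = shutil.which("ffmpeg")
    if on_path:
        return on_path
    # Then look next to the script (ffmpeg.exe on Windows, ffmpeg elsewhere).
    for name in ("ffmpeg.exe", "ffmpeg"):
        candidate = Path(script_dir) / name
        if candidate.is_file():
            return str(candidate)
    return None


location = find_ffmpeg()
print(location or "ffmpeg not found: install it or add it to PATH")
```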

How it Works

  1. Text-to-Speech (TTS): the user passes text to the TTS module, which generates the corresponding speech and saves it as a file in the specified input directory.
  2. RVC: using the provided .pth model file, the RVC module reads the generated audio file, converts it, and saves a new audio file with the replaced voice in output_directory.
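The two stages above can be pictured as a simple file hand-off. The sketch below is only an illustration of that flow, not the package's implementation: run_pipeline, the stand-in steps, and the file names tts_output.wav and voiced_output.wav are all hypothetical, and the real steps would call edge-tts and the RVC model instead.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_pipeline(
    text: str,
    tts_step: Callable[[str, Path], None],
    rvc_step: Callable[[Path, Path], None],
    input_directory: str,
    output_directory: str,
) -> Path:
    """Run TTS, then voice conversion, and return the final audio path."""
    in_dir, out_dir = Path(input_directory), Path(output_directory)
    in_dir.mkdir(parents=True, exist_ok=True)
    out_dir.mkdir(parents=True, exist_ok=True)

    # Stage 1: text -> speech file in the input (temporary) directory.
    tts_file = in_dir / "tts_output.wav"
    tts_step(text, tts_file)

    # Stage 2: speech file -> converted voice in the output directory.
    result = out_dir / "voiced_output.wav"
    rvc_step(tts_file, result)
    return result


# Stand-in steps so the sketch runs without edge-tts or an RVC model.
def fake_tts(text, path):
    path.write_bytes(text.encode("utf-8"))


def fake_rvc(src, dst):
    dst.write_bytes(src.read_bytes())


with tempfile.TemporaryDirectory() as tmp:
    out = run_pipeline("Hello", fake_tts, fake_rvc, f"{tmp}/input", f"{tmp}/temp")
    converted_bytes = out.read_bytes()
    print(out.name, len(converted_bytes))
```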

Usage

TTS-with-RVC provides a class called TTS_RVC. It has three required parameters:

rvc_path - path to your installed rvclib directory (usually in the venv/src folder)

input_directory - path to your input directory (a temporary directory where the TTS output is saved)

model_path - path to your .pth model

And optional parameters:

voice - a voice from the edge-tts list (default is "ru-RU-DmitryNeural")

output_directory - directory where the voiced audio is saved (default is temp/)

To set the voice, first create an instance of TTS_RVC:

from tts_with_rvc import TTS_RVC

tts = TTS_RVC(rvc_path="src\\rvclib", model_path="models\\YourModel.pth", input_directory="input\\")
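The example above hardcodes Windows-style paths. A hedged alternative is to build the three required arguments with pathlib so the same script works on other systems. The helper build_tts_rvc_kwargs and the folder layout (src/rvclib, models, input under one project root) are assumptions for illustration; the trailing separator on input_directory is kept only because the README example passes one.

```python
import os
import tempfile
from pathlib import Path


def build_tts_rvc_kwargs(project_root, model_name="YourModel.pth"):
    """Build the three required TTS_RVC arguments from one project folder."""
    root = Path(project_root)
    input_dir = root / "input"
    # Create the temporary TTS directory up front so it exists on first run.
    input_dir.mkdir(parents=True, exist_ok=True)
    return {
        "rvc_path": str(root / "src" / "rvclib"),
        "model_path": str(root / "models" / model_name),
        # Trailing separator kept on purpose, mirroring "input\\" above.
        "input_directory": str(input_dir) + os.sep,
    }


# Demonstration in a throwaway folder; a real project would pass its own root.
with tempfile.TemporaryDirectory() as tmp:
    kwargs = build_tts_rvc_kwargs(tmp)
    input_dir_created = Path(kwargs["input_directory"]).is_dir()
    print(kwargs)
    # With the package installed: tts = TTS_RVC(**kwargs)
```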

All available voices are listed in the voices.txt file.

tts.get_voices() is disabled indefinitely due to problems.

Next, set the voice for TTS with the tts.set_voice() function:

tts.set_voice("un-Un-SelectedNeural")

If your text is in another language, you must set a voice that matches that language.

The final step is to call tts to generate the speech with the replaced voice:

path = tts(text="Привет, мир!", pitch=6)

Parameters:

text - text for TTS (required)

pitch - pitch for RVC (optional, negative values are supported, default is 0)

tts_rate - extra speech rate (optional, negative values are supported, default is 0)

tts_volume - extra speech volume (optional, negative values are supported, default is 0)

tts_pitch - extra pitch of the TTS-generated audio (optional, negative values are supported, not recommended, default is 0)

output_filename - path for the voiced audio file (optional, default is None)

Example of usage

A simple example for voicing text:

from tts_with_rvc import TTS_RVC
from playsound import playsound

tts = TTS_RVC(rvc_path="src\\rvclib", model_path="models\\DenVot.pth", input_directory="input\\")
tts.set_voice("ru-RU-DmitryNeural")
path = tts(text="Привет, мир!", pitch=6)

playsound(path)

Text parameters

A text parameter processor is included to help with integrations, such as adding a GPT module.

You can process the following parameters with process_args in the TTS_RVC class:

--tts-rate (value) - TTS parameter that changes the speech rate (a negative value decreases the rate, a positive value increases it).

--tts-volume (value) - TTS parameter that changes the speech volume (a negative value decreases the volume, a positive value increases it). This does not seem to work because of the RVC module conversion.

--tts-pitch (value) - TTS parameter that changes the pitch of the TTS-generated audio (a negative value decreases the pitch, a positive value increases it). Not recommended, because the RVC module applies its own pitch to the output.

--rvc-pitch (value) - RVC parameter that changes the pitch of the output audio (a negative value decreases the pitch, a positive value increases it).

Here is how it works:

from tts_with_rvc import TTS_RVC

tts = TTS_RVC(rvc_path="src\\rvclib", model_path="models\\YourModel.pth", input_directory="input\\")

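# `message` is the incoming text, which may contain the text parameters listed above.
# process_args returns the parsed arguments and the original text with those parameters removed.
args, message = tts.process_args(message)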

The args variable contains an array with the following structure:

args[0] - TTS Rate

args[1] - TTS Volume

args[2] - TTS Pitch

args[3] - RVC pitch

And now we are ready to use it for generation:

path = tts(message, pitch=args[3],
               tts_rate=args[0],
               tts_volume=args[1],
               tts_pitch=args[2])
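Passing the values by index is easy to mix up, since args[3] maps to pitch while args[0] to args[2] map to the tts_* names. A small sketch of a helper that names the positions once is shown below; args_to_kwargs is hypothetical and not part of the package, and the sample values are made up.

```python
def args_to_kwargs(args):
    """Map the process_args list onto the keyword names used when calling tts."""
    # Order documented above: TTS rate, TTS volume, TTS pitch, RVC pitch.
    tts_rate, tts_volume, tts_pitch, rvc_pitch = args[:4]
    return {
        "pitch": rvc_pitch,
        "tts_rate": tts_rate,
        "tts_volume": tts_volume,
        "tts_pitch": tts_pitch,
    }


example = args_to_kwargs([10, 0, 0, 6])  # made-up values for illustration
print(example)
# With the package installed: path = tts(message, **args_to_kwargs(args))
```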

Exceptions

  1. NameError: name 'device' is not defined

Make sure your device supports CUDA and that you installed the right version of Torch.

  2. RuntimeError: Failed to load audio: {e}

Make sure ffmpeg is installed.
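A calling script can turn these two errors into readable hints instead of a traceback. The wrapper below is a sketch under the assumption that the errors surface when tts is called and carry the messages quoted above; voice_with_hints and broken_tts are hypothetical names, and any other error is re-raised unchanged.

```python
def voice_with_hints(tts, text, **kwargs):
    """Call tts and return (path, hint); hint is None on success."""
    try:
        return tts(text=text, **kwargs), None
    except NameError as exc:
        if "device" in str(exc):
            return None, "No GPU device found: check CUDA support and the installed Torch build."
        raise
    except RuntimeError as exc:
        if "Failed to load audio" in str(exc):
            return None, "Audio could not be loaded: check that ffmpeg is installed and on PATH."
        raise


# Stand-in for a TTS_RVC instance so the sketch runs without the package.
def broken_tts(text, **kwargs):
    raise RuntimeError("Failed to load audio: ffmpeg missing")


path, hint = voice_with_hints(broken_tts, "Hello")
print(path, hint)
```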

License

No license

Authors

Atm4x (Artem Dikarev)


tts-with-rvc's Issues

Am I missing something?

I can't get it to work.
Every time I try to use a python program, they are hanging by a thread with the dependencies and coding. I can never get it to do what I want. Why do I need to be a developer to use code that should be already developed? Is that inherent to python projects? I already have CUDA, torch, etc. Everything should just work, but the console window says something about not being able to import from inference.py and jumps off a cliff.

Use case is for an old videogame with area descriptions and character speech text, but no voiceover. I already found a program that scans an area of the screen and outputs to a text file. I want to use your program because I think RVC is the right tool for quality and speed.

I'm trying to be able to run the script from a bat file: it should automatically read text thru a RVC model from a text file in a folder.

Why is this such a typical experience for me, even WITH instructions???
Sorry, and thanks.
