OnlySpeakTTS

This is a TTS server that uses a private fork of Tortoise to keep generation times and VRAM usage low. If you play the audio while it is still generating, you can get very close to real-time.

You will need to provide your own audio clips. Store them in 'Tortoise/tortoise/voices/{voice_name}'. You will want two or three 10-second clips; if the result doesn't come out perfect, play around with the parameters. If you don't mind mixing and matching, you can include a few other clips that provide accent and dynamic range.
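
If you want to sanity-check the clips before generating, a tiny Python sketch like this works (the voice name 'my_voice' is a placeholder; use whatever folder name you created):

    from pathlib import Path

    # Hypothetical voice name; replace with your own folder.
    voice_dir = Path("Tortoise/tortoise/voices/my_voice")

    # Two or three ~10 second clips; the file names are up to you.
    clips = sorted(voice_dir.glob("*.wav")) + sorted(voice_dir.glob("*.mp3"))
    print(f"Found {len(clips)} clip(s) in {voice_dir}:")
    for clip in clips:
        print(" -", clip.name)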

You may be able to get more emotion if you add an 'angry' or 'happy' clip to the mix, then generate a separate voice for each emotion. As long as you can tell which emotion to use, you can quickly swap between them.

System Requirements

Generations can use up to 5 GB of VRAM, and I average about 7-8 seconds per generation for full sentences on an RTX 3090, and 3-4 seconds for shorter sentences. The 'fast' preset is slightly faster; the quality isn't bad, it's just not as smooth.

I experience no slowdown in generation while running games like Minecraft. I did see generation times increase by roughly 100% while maxing out my graphics card with games like Generation Zero.

I have never tested generating speech while running inference on a text-generation model at the same time, but I assume it would slow down generation for both. It's perfectly okay to keep both models loaded in VRAM and go back and forth between them, though.

What Can This Do?

Assuming you use the server.py script, all inputs will automatically be separated into segments that are within Tortoise's max range.
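
As a rough sketch of the idea (the 200-character limit and the sentence-based splitting rule are illustrative assumptions, not the fork's exact logic):

    import re

    def split_into_segments(text, max_chars=200):
        """Illustrative only: split on sentence boundaries and pack
        sentences into chunks that stay under a maximum length."""
        sentences = re.split(r"(?<=[.!?])\s+", text.strip())
        segments, current = [], ""
        for sentence in sentences:
            if current and len(current) + len(sentence) + 1 > max_chars:
                segments.append(current)
                current = sentence
            else:
                current = f"{current} {sentence}".strip()
        if current:
            segments.append(current)
        return segments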

You also have the option to save the generated voice tensors to files, and load from them later. This ensures that voices are consistent, instead of relying on randomly generated latents each time.

You can load voices from files in a second or less, so using multiple voices is perfectly viable.
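
Under the hood these saved voices are just PyTorch tensors. A minimal sketch of the round trip, assuming a single conditioning tensor saved as Auto_Conditioning.pt (that file name shows up in speech.py; the helper functions here are illustrative, not the fork's actual API):

    import os
    import torch

    def save_voice_tensor(latents, voice_path):
        # 'latents' is the conditioning tensor computed for a voice.
        os.makedirs(voice_path, exist_ok=True)
        torch.save(latents, os.path.join(voice_path, "Auto_Conditioning.pt"))

    def load_voice_tensor(voice_path):
        # Loading a pre-computed tensor is fast (well under a second).
        return torch.load(os.path.join(voice_path, "Auto_Conditioning.pt"))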

How do I use this?

Tortoise-tts has its own way it wants to be used, but I completely messed up the api.py script in this fork and didn't feel like fixing it.

For this fork, just use the server.py script and send HTTP POST requests to port 7332. You can check the client.py script to see how these POST requests should be formatted, what commands you can use, and how to use them (there is also a rough example after the list below).

You can:

  1. Generate a new voice
  2. Redo the previous generation if you got a bad one (because it's random)
  3. Save the current voice to files that can be loaded later
  4. Send a message to be spoken
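
A minimal sketch of what a request might look like (the JSON field names here are assumptions; check client.py for the exact command names and format):

    import requests

    # Hypothetical payload shape; see client.py for the real fields.
    payload = {"command": "speak", "text": "Hello there, this is a test sentence."}
    response = requests.post("http://127.0.0.1:7332/", json=payload)
    print(response.status_code, response.text)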

In addition to the requirements for Tortoise, server.py, client.py, and speech.py have a few of their own:

colorama, requests, soundfile, wave, pydub, threading, winsound, rich
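
wave, threading, and winsound ship with Python itself (winsound is Windows-only), so the remaining packages can be installed with pip:

    pip install colorama requests soundfile pydub rich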

Installation

You can create a venv, a conda env, or whatever you prefer. The original installation instructions for Tortoise still apply, but you may need to manually install setuptools before using the setup.py script:

pip install setuptools
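
For example, with a plain venv on Windows (the environment name is just a placeholder, and the last command is run from the Tortoise folder, per the original Tortoise instructions):

    python -m venv onlyspeak-env
    onlyspeak-env\Scripts\activate
    pip install setuptools
    python setup.py install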

Example Video

Another_Example.mp4

Cloning Voices

Better_Cloning.mp4

This clip of Lara Croft uses around 5 diffusion iterations, which give it the poor quality needed to match the original voice. Alternatively, I could have used 7-12 diffusion iterations, which would have produced a much clearer voice for almost no extra generation time.
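
In upstream tortoise-tts, that trade-off corresponds to the diffusion_iterations argument of the tts() call; a rough sketch (this fork's speech.py may expose the parameter differently, and the voice name is a placeholder):

    from tortoise.api import TextToSpeech
    from tortoise.utils.audio import load_voice

    tts = TextToSpeech()
    voice_samples, conditioning_latents = load_voice("lara")  # placeholder voice folder

    # Fewer diffusion iterations = faster but rougher audio; 7-12 is already noticeably cleaner.
    audio = tts.tts("Testing the cloned voice.",
                    voice_samples=voice_samples,
                    conditioning_latents=conditioning_latents,
                    diffusion_iterations=10)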

Links

I uploaded a longer example to YouTube (the Lara Croft voice in this video is old):

https://www.youtube.com/watch?v=XV87AE22a6M

The Tortoise Repo:

https://github.com/neonbjb/tortoise-tts

Issues

Where to get the .pt files?

Where can I find the .pt files that should be in the same folder as the voices? It errors for me when trying to load the .pt files:

 * Serving Flask app 'server'
 * Debug mode: off
CHANGING VOICE...
Loaded 1.mp3
Loaded 2.mp3
Loaded 3.mp3
Loaded 4.mp3
Loaded 5.mp3
[2023-10-21 12:34:38,907] ERROR in app: Exception on / [POST]
Traceback (most recent call last):
  File "C:\Users\root\anaconda3\envs\OnlySpeakTTS\lib\site-packages\flask\app.py", line 1455, in wsgi_app
    response = self.full_dispatch_request()
  File "C:\Users\root\anaconda3\envs\OnlySpeakTTS\lib\site-packages\flask\app.py", line 869, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "C:\Users\root\anaconda3\envs\OnlySpeakTTS\lib\site-packages\flask\app.py", line 867, in full_dispatch_request
    rv = self.dispatch_request()
  File "C:\Users\root\anaconda3\envs\OnlySpeakTTS\lib\site-packages\flask\app.py", line 852, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)
  File "C:\ai\OnlySpeakTTS\server.py", line 55, in result
    LOADED_VOICE = synthesizer.load_voice(os.path.join("voice_tensors", CURRENT_VOICE))
  File "C:\ai\OnlySpeakTTS\src\speech.py", line 151, in load_voice
    voice.append(torch.load(f"{voice_path}/Auto_Conditioning.pt"))
  File "C:\Users\root\anaconda3\envs\OnlySpeakTTS\lib\site-packages\torch\serialization.py", line 986, in load
    with _open_file_like(f, 'rb') as opened_file:
  File "C:\Users\root\anaconda3\envs\OnlySpeakTTS\lib\site-packages\torch\serialization.py", line 435, in _open_file_like
    return _open_file(name_or_buffer, mode)
  File "C:\Users\root\anaconda3\envs\OnlySpeakTTS\lib\site-packages\torch\serialization.py", line 416, in __init__
    super().__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'voice_tensors\\natlamir/Auto_Conditioning.pt'
