
rvc_cli's Introduction

RVC_CLI: Retrieval-based Voice Conversion Command Line Interface

Open in Colab

Table of Contents

  1. Installation
  2. Getting Started
  3. API
  4. Credits

Installation

Ensure that you have the necessary Python packages installed by following these steps (Python 3.9 is recommended):

Windows

Execute the install.bat file to activate a Conda environment. Afterward, launch the application using env/python.exe rvc.py instead of the conventional python rvc.py command.

Linux

chmod +x install.sh
./install.sh

Getting Started

Download the necessary models and executables by running the following command:

python rvc.py prerequisites

More information about the prerequisites command can be found in the Prerequisites Download section below.

For detailed information and command-line options, refer to the help command:

python rvc.py -h

This command provides a clear overview of the available modes and their corresponding parameters, facilitating effective utilization of the RVC CLI.

Inference

Single Inference

python rvc.py infer --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --input_path "input_path" --output_path "output_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format"
Parameter Name Required Default Valid Options Description
f0up_key No 0 -24 to +24 Set the pitch of the audio; the higher the value, the higher the pitch.
filter_radius No 3 0 to 10 If the value is three or higher, median filtering is applied to the collected pitch results, which can reduce breathiness.
index_rate No 0.3 0.0 to 1.0 Influence exerted by the index file; a higher value corresponds to greater influence. However, opting for lower values can help mitigate artifacts present in the audio.
hop_length No 128 1 to 512 Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
rms_mix_rate No 1 0 to 1 Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed.
protect No 0.33 0 to 0.5 Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. Pulling the parameter to its maximum value of 0.5 offers comprehensive protection. However, reducing this value might decrease the extent of protection while potentially mitigating the indexing effect.
f0autotune No False True or False Apply a soft autotune to your inferences, recommended for singing conversions.
f0method No rmvpe pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases.
input_path Yes None Full path to the input audio file Full path to the input audio file
output_path Yes None Full path to the output audio file Full path to the output audio file
pth_path Yes None Full path to the pth file Full path to the pth file
index_path Yes None Full index file path Full index file path
split_audio No False True or False Split the audio into chunks for inference to obtain better results in some cases.
clean_audio No False True or False Clean your audio output using noise detection algorithms, recommended for speaking audios.
clean_strength No 0.7 0.0 to 1.0 Set the clean-up strength for the output audio; the higher the value, the more the audio is cleaned up, but it may sound more compressed.
export_format No WAV WAV, MP3, FLAC, OGG, M4A Output audio file format
embedder_model No hubert hubert or contentvec Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases.
upscale_audio No False True or False Upscale the audio to 48kHz for better results.

Refer to python rvc.py infer -h for additional help.
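
For example, a single-inference call with default-like settings could look like the following; the input, output, and model paths are placeholders and should be replaced with your own files:

python rvc.py infer --f0up_key 0 --filter_radius 3 --index_rate 0.3 --hop_length 128 --rms_mix_rate 1 --protect 0.33 --f0autotune False --f0method rmvpe --input_path "my_input.wav" --output_path "my_output.wav" --pth_path "logs/my_model/my_model.pth" --index_path "logs/my_model/my_model.index" --split_audio False --clean_audio False --clean_strength 0.7 --export_format WAV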

Batch Inference

python rvc.py batch_infer --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --input_folder_path "input_folder_path" --output_folder_path "output_folder_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format"
Parameter Name Required Default Valid Options Description
f0up_key No 0 -24 to +24 Set the pitch of the audio; the higher the value, the higher the pitch.
filter_radius No 3 0 to 10 If the value is three or higher, median filtering is applied to the collected pitch results, which can reduce breathiness.
index_rate No 0.3 0.0 to 1.0 Influence exerted by the index file; a higher value corresponds to greater influence. However, opting for lower values can help mitigate artifacts present in the audio.
hop_length No 128 1 to 512 Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
rms_mix_rate No 1 0 to 1 Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed.
protect No 0.33 0 to 0.5 Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. Pulling the parameter to its maximum value of 0.5 offers comprehensive protection. However, reducing this value might decrease the extent of protection while potentially mitigating the indexing effect.
f0autotune No False True or False Apply a soft autotune to your inferences, recommended for singing conversions.
f0method No rmvpe pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases.
input_folder_path Yes None Full path to the input audio folder (The folder may only contain audio files) Full path to the input audio folder
output_folder_path Yes None Full path to the output audio folder Full path to the output audio folder
pth_path Yes None Full path to the pth file Full path to the pth file
index_path Yes None Full path to the index file Full path to the index file
split_audio No False True or False Split the audio into chunks for inference to obtain better results in some cases.
clean_audio No False True or False Clean your audio output using noise detection algorithms, recommended for speaking audios.
clean_strength No 0.7 0.0 to 1.0 Set the clean-up strength for the output audio; the higher the value, the more the audio is cleaned up, but it may sound more compressed.
export_format No WAV WAV, MP3, FLAC, OGG, M4A Output audio file format
embedder_model No hubert hubert or contentvec Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases.
upscale_audio No False True or False Upscale the audio to 48kHz for better results.

Refer to python rvc.py batch_infer -h for additional help.
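
As an illustration, the batch variant only swaps the single input and output paths for folders; everything else stays the same (the folder and model names below are placeholders):

python rvc.py batch_infer --f0up_key 0 --filter_radius 3 --index_rate 0.3 --hop_length 128 --rms_mix_rate 1 --protect 0.33 --f0autotune False --f0method rmvpe --input_folder_path "audios_in" --output_folder_path "audios_out" --pth_path "logs/my_model/my_model.pth" --index_path "logs/my_model/my_model.index" --split_audio False --clean_audio False --clean_strength 0.7 --export_format WAV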

TTS Inference

python rvc.py tts_infer --tts_text "tts_text" --tts_voice "tts_voice" --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --output_tts_path "output_tts_path" --output_rvc_path "output_rvc_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format"
Parameter Name Required Default Valid Options Description
tts_text Yes None Text for TTS synthesis Text for TTS synthesis
tts_voice Yes None Voice for TTS synthesis Voice for TTS synthesis
f0up_key No 0 -24 to +24 Set the pitch of the audio; the higher the value, the higher the pitch.
filter_radius No 3 0 to 10 If the value is three or higher, median filtering is applied to the collected pitch results, which can reduce breathiness.
index_rate No 0.3 0.0 to 1.0 Influence exerted by the index file; a higher value corresponds to greater influence. However, opting for lower values can help mitigate artifacts present in the audio.
hop_length No 128 1 to 512 Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
rms_mix_rate No 1 0 to 1 Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed.
protect No 0.33 0 to 0.5 Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. Pulling the parameter to its maximum value of 0.5 offers comprehensive protection. However, reducing this value might decrease the extent of protection while potentially mitigating the indexing effect.
f0autotune No False True or False Apply a soft autotune to your inferences, recommended for singing conversions.
f0method No rmvpe pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases.
output_tts_path Yes None Full path to the output TTS audio file Full path to the output TTS audio file
output_rvc_path Yes None Full path to the output RVC audio file Full path to the output RVC audio file
pth_path Yes None Full path to the pth file Full path to the pth file
index_path Yes None Full path to the index file Full path to the index file
split_audio No False True or False Split the audio into chunks for inference to obtain better results in some cases.
clean_audio No False True or False Clean your audio output using noise detection algorithms, recommended for speaking audios.
clean_strength No 0.7 0.0 to 1.0 Set the clean-up strength for the output audio; the higher the value, the more the audio is cleaned up, but it may sound more compressed.
export_format No WAV WAV, MP3, FLAC, OGG, M4A Output audio file format
embedder_model No hubert hubert or contentvec Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases.
upscale_audio No False True or False Upscale the audio to 48kHz for better results.

Refer to python rvc.py tts_infer -h for additional help.
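
For illustration, a TTS inference call might look like the sketch below. The voice identifier depends on the TTS backend, so "en-US-AriaNeural" and all paths here are only placeholders:

python rvc.py tts_infer --tts_text "Hello, this is a test." --tts_voice "en-US-AriaNeural" --f0up_key 0 --filter_radius 3 --index_rate 0.3 --hop_length 128 --rms_mix_rate 1 --protect 0.33 --f0autotune False --f0method rmvpe --output_tts_path "tts_raw.wav" --output_rvc_path "tts_converted.wav" --pth_path "logs/my_model/my_model.pth" --index_path "logs/my_model/my_model.index" --split_audio False --clean_audio False --clean_strength 0.7 --export_format WAV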

Training

Preprocess Dataset

python rvc.py preprocess --model_name "model_name" --dataset_path "dataset_path" --sampling_rate "sampling_rate"
Parameter Name Required Default Valid Options Description
model_name Yes None Name of the model Name of the model
dataset_path Yes None Full path to the dataset folder (The folder may only contain audio files) Full path to the dataset folder
sampling_rate Yes None 32000, 40000, or 48000 Sampling rate of the audio data

Refer to python rvc.py preprocess -h for additional help.
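
A typical preprocessing call, with a placeholder model name and dataset folder, might be:

python rvc.py preprocess --model_name "my_model" --dataset_path "datasets/my_voice" --sampling_rate 40000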

Extract Features

python rvc.py extract --model_name "model_name" --rvc_version "rvc_version" --pitch_guidance "pitch_guidance" --hop_length "hop_length" --sampling_rate "sampling_rate"
Parameter Name Required Default Valid Options Description
model_name Yes None Name of the model Name of the model
rvc_version No v2 v1 or v2 Version of the model
pitch_guidance No True True or False By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential.
hop_length No 128 1 to 512 Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
sampling_rate Yes None 32000, 40000, or 48000 Sampling rate of the audio data
embedder_model No hubert hubert or contentvec Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases.
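
Continuing the placeholder example from the preprocessing step, feature extraction for the same model could be run as:

python rvc.py extract --model_name "my_model" --rvc_version v2 --pitch_guidance True --hop_length 128 --sampling_rate 40000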

Start Training

python rvc.py train --model_name "model_name" --rvc_version "rvc_version" --save_every_epoch "save_every_epoch" --save_only_latest "save_only_latest" --save_every_weights "save_every_weights" --total_epoch "total_epoch" --sampling_rate "sampling_rate" --batch_size "batch_size" --gpu "gpu" --pitch_guidance "pitch_guidance" --overtraining_detector "overtraining_detector" --overtraining_threshold "overtraining_threshold"  --sync_graph "sync_graph" --pretrained "pretrained" --custom_pretrained "custom_pretrained" [--g_pretrained "g_pretrained"] [--d_pretrained "d_pretrained"]
Parameter Name Required Default Valid Options Description
model_name Yes None Name of the model Name of the model
rvc_version No v2 v1 or v2 Version of the model
save_every_epoch Yes None 1 to 50 Determines how often, in epochs, the model is saved.
save_only_latest No False True or False Enabling this setting will result in the G and D files saving only their most recent versions, effectively conserving storage space.
save_every_weights No True True or False This setting enables you to save the weights of the model at the conclusion of each epoch.
total_epoch No 1000 1 to 10000 Specifies the overall quantity of epochs for the model training process.
sampling_rate Yes None 32000, 40000, or 48000 Sampling rate of the audio data
batch_size No 8 1 to 50 It's advisable to align it with the available VRAM of your GPU. A setting of 4 offers improved accuracy but slower processing, while 8 provides faster and standard results.
gpu No 0 0 to ∞ separated by - Specify the GPU indices you wish to use for training, separated by hyphens (-).
pitch_guidance No True True or False By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential.
overtraining_detector No False True or False Utilize the overtraining detector to prevent overfitting. This feature is particularly valuable for scenarios where the model is at risk of overfitting.
overtraining_threshold No 50 1 to 100 Set the threshold for the overtraining detector. The lower the value, the more sensitive the detector will be.
pretrained No True True or False Utilize pretrained models when training your own. This approach reduces training duration and enhances overall quality.
custom_pretrained No False True or False Utilizing custom pretrained models can lead to superior results, as selecting the most suitable pretrained models tailored to the specific use case can significantly enhance performance.
g_pretrained No None Full path to pretrained file G, only if you have used custom_pretrained Full path to pretrained file G
d_pretrained No None Full path to pretrained file D, only if you have used custom_pretrained Full path to pretrained file D
sync_graph No False True or False Synchronize the TensorBoard graph. Only enable this setting if you are training a new model.

Refer to python rvc.py train -h for additional help.
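
A training run for the placeholder model, using mostly default values, might look like this (adapt the batch size and GPU index to your hardware):

python rvc.py train --model_name "my_model" --rvc_version v2 --save_every_epoch 10 --save_only_latest False --save_every_weights True --total_epoch 500 --sampling_rate 40000 --batch_size 8 --gpu 0 --pitch_guidance True --overtraining_detector False --pretrained True --custom_pretrained False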

Generate Index File

python rvc.py index --model_name "model_name" --rvc_version "rvc_version"
Parameter Name Required Default Valid Options Description
model_name Yes None Name of the model Name of the model
rvc_version Yes None v1 or v2 Version of the model

Refer to python rvc.py index -h for additional help.
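
Putting the training steps together, the placeholder model from the examples above would be built with a sequence along these lines:

python rvc.py preprocess --model_name "my_model" --dataset_path "datasets/my_voice" --sampling_rate 40000
python rvc.py extract --model_name "my_model" --rvc_version v2 --sampling_rate 40000
python rvc.py train --model_name "my_model" --rvc_version v2 --save_every_epoch 10 --total_epoch 500 --sampling_rate 40000 --batch_size 8 --gpu 0
python rvc.py index --model_name "my_model" --rvc_version v2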

UVR

python uvr.py [audio_file] [options]

Info and Debugging

Parameter Name Required Default Valid Options Description
audio_file Yes None Any valid audio file path The path to the audio file you want to separate, in any common format.
-d, --debug No False Enable debug logging.
-e, --env_info No False Print environment information and exit.
-l, --list_models No False List all supported models and exit.
--log_level No info info, debug, warning Log level.

Separation I/O Params

Parameter Name Required Default Valid Options Description
-m, --model_filename No UVR-MDX-NET-Inst_HQ_3.onnx Any valid model file path Model to use for separation.
--output_format No WAV Any common audio format Output format for separated files.
--output_dir No None Any valid directory path Directory to write output files.
--model_file_dir No /tmp/audio-separator-models/ Any valid directory path Model files directory.

Common Separation Parameters

Parameter Name Required Default Valid Options Description
--invert_spect No False Invert secondary stem using spectrogram.
--normalization No 0.9 Any float value Max peak amplitude to normalize input and output audio to.
--single_stem No None Instrumental, Vocals, Drums, Bass, Guitar, Piano, Other Output only a single stem.
--sample_rate No 44100 Any integer value Modify the sample rate of the output audio.

MDXC Architecture Parameters

Parameter Name Required Default Valid Options Description
--mdxc_segment_size No 256 Any integer value Size of segments for MDXC architecture.
--mdxc_override_model_segment_size No False Override the segment size instead of using the model's default value.
--mdxc_overlap No 8 2 to 50 Amount of overlap between prediction windows for MDXC architecture.
--mdxc_batch_size No 1 Any integer value Batch size for MDXC architecture.
--mdxc_pitch_shift No 0 Any integer value Shift audio pitch by a number of semitones while processing for MDXC architecture.

MDX Architecture Parameters

Parameter Name Required Default Valid Options Description
--mdx_segment_size No 256 Any integer value Size of segments for MDX architecture.
--mdx_overlap No 0.25 0.001 to 0.999 Amount of overlap between prediction windows for MDX architecture.
--mdx_batch_size No 1 Any integer value Batch size for MDX architecture.
--mdx_hop_length No 1024 Any integer value Hop length for MDX architecture.
--mdx_enable_denoise No False Enable denoising during separation for MDX architecture.

Demucs Architecture Parameters

Parameter Name Required Default Valid Options Description
--demucs_segment_size No Default Any integer value Size of segments for Demucs architecture.
--demucs_shifts No 2 Any integer value Number of predictions with random shifts for Demucs architecture.
--demucs_overlap No 0.25 0.001 to 0.999 Overlap between prediction windows for Demucs architecture.
--demucs_segments_enabled No True Enable segment-wise processing for Demucs architecture.

VR Architecture Parameters

Parameter Name Required Default Valid Options Description
--vr_batch_size No 4 Any integer value Batch size for VR architecture.
--vr_window_size No 512 Any integer value Window size for VR architecture.
--vr_aggression No 5 -100 to 100 Intensity of primary stem extraction for VR architecture.
--vr_enable_tta No False Enable Test-Time-Augmentation for VR architecture.
--vr_high_end_process No False Mirror the missing frequency range of the output for VR architecture.
--vr_enable_post_process No False Identify leftover artifacts within vocal output for VR architecture.
--vr_post_process_threshold No 0.2 0.1 to 0.3 Threshold for post-process feature for VR architecture.
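
As an example, separating a track with the default MDX-Net model (file and folder names below are placeholders) could be done with the command below; add --single_stem Vocals if you only want the vocal stem:

python uvr.py my_song.wav --model_filename UVR-MDX-NET-Inst_HQ_3.onnx --output_format WAV --output_dir separated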

Additional Features

Model Extract

python rvc.py model_extract --pth_path "pth_path" --model_name "model_name" --sampling_rate "sampling_rate" --pitch_guidance "pitch_guidance" --rvc_version "rvc_version" --epoch "epoch" --step "step"
Parameter Name Required Default Valid Options Description
pth_path Yes None Path to the pth file Full path to the pth file
model_name Yes None Name of the model Name of the model
sampling_rate Yes None 32000, 40000, or 48000 Sampling rate of the audio data
pitch_guidance Yes None True or False By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential.
rvc_version Yes None v1 or v2 Version of the model
epoch Yes None 1 to 10000 Specifies the overall quantity of epochs for the model training process.
step Yes None 1 to ∞ Specifies the overall quantity of steps for the model training process.
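
For illustration, extracting a model from a saved checkpoint might look like this; the checkpoint path, epoch, and step values are placeholders and should match the training run the checkpoint comes from:

python rvc.py model_extract --pth_path "logs/my_model/G_500.pth" --model_name "my_model" --sampling_rate 40000 --pitch_guidance True --rvc_version v2 --epoch 500 --step 12500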

Model Information

python rvc.py model_information --pth_path "pth_path"
Parameter Name Required Default Valid Options Description
pth_path Yes None Path to the pth file Full path to the pth file

Model Blender

python rvc.py model_blender --model_name "model_name" --pth_path_1 "pth_path_1" --pth_path_2 "pth_path_2" --ratio "ratio"
Parameter Name Required Default Valid Options Description
model_name Yes None Name of the model Name of the model
pth_path_1 Yes None Path to the first pth file Full path to the first pth file
pth_path_2 Yes None Path to the second pth file Full path to the second pth file
ratio No 0.5 0.0 to 1 Value for blender ratio
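
A blending example with two placeholder models, mixed evenly:

python rvc.py model_blender --model_name "blended_model" --pth_path_1 "logs/model_a/model_a.pth" --pth_path_2 "logs/model_b/model_b.pth" --ratio 0.5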

Launch TensorBoard

python rvc.py tensorboard

Download Models

Run the download script with the following command:

python rvc.py download --model_link "model_link"
Parameter Name Required Default Valid Options Description
model_link Yes None Link of the model (enclosed in double quotes; Google Drive or Hugging Face) Link of the model

Refer to python rvc.py download -h for additional help.

Audio Analyzer

python rvc.py audio_analyzer --input_path "input_path"
Parameter Name Required Default Valid Options Description
input_path Yes None Full path to the input audio file Full path to the input audio file

Refer to python rvc.py audio_analyzer -h for additional help.

Prerequisites Download

python rvc.py prerequisites --pretraineds_v1 "pretraineds_v1" --pretraineds_v2 "pretraineds_v2" --models "models" --exe "exe"
Parameter Name Required Default Valid Options Description
pretraineds_v1 No True True or False Download pretrained models for v1
pretraineds_v2 No True True or False Download pretrained models for v2
models No True True or False Download models for v1 and v2
exe No True True or False Download the necessary executable files for the CLI to function properly (FFmpeg and FFprobe)
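
For example, to skip the v1 pretrained models and download everything else (useful if you only train v2 models):

python rvc.py prerequisites --pretraineds_v1 False --pretraineds_v2 True --models True --exe True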

API

python rvc.py api --host "host" --port "port"
Parameter Name Required Default Valid Options Description
host No 127.0.0.1 Value for host IP Value for host IP
port No 8000 Value for port number Value for port number

To use the RVC CLI via the API, utilize the provided script. Make API requests to the following endpoints:

  • Docs: /docs
  • Ping: /ping
  • Infer: /infer
  • Batch Infer: /batch_infer
  • TTS: /tts
  • Preprocess: /preprocess
  • Extract: /extract
  • Train: /train
  • Index: /index
  • Model Information: /model_information
  • Model Fusion: /model_fusion
  • Download: /download

Make POST requests to these endpoints with the same required parameters as in CLI mode.
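
As a sketch, an inference request could be sent with curl as shown below; the JSON keys mirror the CLI flags, and all paths are placeholders:

curl -X POST http://127.0.0.1:8000/infer \
  -H "Content-Type: application/json" \
  -d '{
    "f0up_key": 0,
    "filter_radius": 3,
    "index_rate": 0.3,
    "hop_length": 128,
    "rms_mix_rate": 1.0,
    "protect": 0.33,
    "f0autotune": false,
    "f0method": "rmvpe",
    "input_path": "/path/to/input.wav",
    "output_path": "/path/to/output.wav",
    "pth_path": "/path/to/model.pth",
    "index_path": "/path/to/model.index",
    "split_audio": false,
    "clean_audio": false,
    "clean_strength": 0.7,
    "export_format": "WAV"
  }'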

Credits

The RVC CLI builds upon the foundations of several other open-source projects. We acknowledge and appreciate the contributions of their respective authors and communities.

rvc_cli's People

Contributors

aitronssesin, blaisewf, dedgar, github-actions[bot], lukaszliniewicz, poiqazwsx, vidalnt


rvc_cli's Issues

feature request: enable GPU for inference

script works fine

feature requests
a) I use RVC and use GPU for inference, can you enable it in cli as well
b) can the temporary files be kept inside a folder say temp on projects, making it easier for housekeeping

thanks
Senthil

[BUG] Batch Conversion on Apple Silicon Mac

I've got Applio running on my M2 Max Mac Studio but Batch Conversion is not working. To get further information I cloned this git here and tried the CLI batch conversion, which also does not work. Single conversion works fine with CLI and Applio.

This is my single conversion cmd, which results in a working file:
python main.py infer --f0up_key "0" --filter_radius "3" --index_rate "0.8" --hop_length "64" --split_audio "True" --f0autotune "False" --f0method "rmvpe" --input_path "/Users/liam/Music/RVC/city_of_angels/hmmmh.wav" --output_path "/Users/liam/Downloads/test/test.wav" --pth_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super.pth" --index_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super_clean.index"

This is batch-conversion, which results in an error, no matter if rms_mix_rate and other parameters are included in the cmd or not:
python main.py batch_infer --f0up_key "0" --filter_radius "3" --index_rate "0.8" --hop_length "64" --split_audio "True" --f0autotune "False" --f0method "rmvpe" --input_folder "/Users/liam/Music/RVC/love_me_down/ValYoung" --output_folder "/Users/liam/Downloads/test" --pth_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super.pth" --index_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super_clean.index" --rms_mix_rate "0.0"

The conversion fails with the following error:

Inferring /Users/liam/Music/RVC/love_me_down/ValYoung/Ladada_1.wav.wav...
No supported Nvidia GPU found
Traceback (most recent call last):
  File "/Users/liam/Downloads/RVC_CLI/rvc/infer/infer.py", line 229, in <module>
    rms_mix_rate = float(sys.argv[12])
ValueError: could not convert string to float: 'True'

Seems like rms_mix_rate=True is sneaking in somewhere, and resulting in an error when converted to float. But where is it coming from? I removed all arguments that use True/False from the cmd, but it still ends up with this error.

Docker

Hi,
I am trying to make this work as a docker container too, but can't really get it to work...
Maybe there is already a Dockerfile out there?

If not, this is my current Dockerfile (currently I wanted to test inference first, so I copied my models in):


FROM nvidia/cuda:11.6.2-cudnn8-runtime-ubuntu20.04

# Create a working directory
WORKDIR /app

# Install dependencies to add PPAs and git
RUN apt-get update && \
    apt-get install -y -qq ffmpeg aria2 && apt clean && \
    apt-get install -y software-properties-common && \
    apt-get install -y git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Add the deadsnakes PPA to get Python 3.9
RUN add-apt-repository ppa:deadsnakes/ppa

# Clone the repository
RUN git clone https://github.com/blaise-tk/RVC_CLI.git

# Set the working directory to the cloned repo
WORKDIR /app/RVC_CLI

# Install Python 3.9 and pip
RUN apt-get update && \
    apt-get install -y build-essential python-dev python3-dev python3.9-distutils python3.9-dev python3.9 curl && \
    apt-get clean && \
    update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 1 && \
    curl https://bootstrap.pypa.io/get-pip.py | python3.9

# Set Python 3.9 as the default
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.9 1

# Install Python dependencies
RUN chmod +x install.sh
RUN ./install.sh

# Create an __init__.py file in the /app/RVC_CLI/rvc folder
RUN touch /app/RVC_CLI/rvc/__init__.py

# Download prerequisites
RUN python rvc.py prerequisites --pretraineds_v1 True --pretraineds_v2 True --models True --exe True

# Copy the audio file into the container
COPY audio.mp3 /app/RVC_CLI/audio.mp3
COPY Jari.pth /app/RVC_CLI/Jari.pth
COPY Jari.index /app/RVC_CLI/Jari.index

# Set the entrypoint to keep the container running
CMD ["tail", "-f", "/dev/null"]

It builds, but when I run infer.py eg with this:
python rvc.py infer --f0up_key 0 --filter_radius 3 --index_rate 0.3 --hop_length 128 --rms_mix_rate 1.0 --protect 0.3 --f0autotune False --f0method rmvpe --input_path /app/RVC_CLI/audio.mp3 --output_path /output/audio_out.mp3 --pth_path /app/RVC_CLI/Jari.pth --index_path /app/RVC_CLI/Jari.index

I don't get an output file.

I am pretty new to Docker so maybe someone more experienced could figure a good setup out quite quickly :)
And I think this would make the repo even easier to use for the average user.

Inference fails on apple silicon

First of all, thank you, this is the first cli for rvc that actually works!! I've been trying all kinds of solutions. Below is a minor enhancement you could make.

The following error is experienced when inferencing on apple silicon:
The operator 'aten::_fft_r2c' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on pytorch/pytorch#77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
Voice conversion failed: cannot unpack non-iterable NoneType object

Setting the mps fallback as mentioned works but could be handled in your code.

The file and folder names are the same: "rvc".

I noticed a potential conflict with the file "rvc.py" and the folder named "rvc". To avoid confusion or issues, could we please rename either the file or the folder?

[BUG] '<' not supported between instances of 'str' and 'float'

Describe the bug
It looks like the value of args.protect isn't being converted properly to a float. I've tried converting the arg to a float directly but it produces errors related to the tensor size in torch.from_numpy(npy).unsqueeze(0).to(self.device) * index_rate + (1 - index_rate) * feats. Fiddled with it a bit but couldn't get anything worthwhile to come out.
.......................
To Reproduce
Steps to reproduce the behavior:
$ python3 main.py batch_infer --f0up_key 5 --filter_radius 4 --index_rate 0.9 --hop_length 128 --rms_mix_rate 1.0 --protect 0.4 --f0autotune True --f0method rmvpe --input_folder "/home/mb/Desktop/" --output_folder "/home/......................." --pth_path "/home/........................pth" --index_path "/home........................index" --export_format WAV

changing the value of protect doesn't seem to change the error.

Expected behavior
For the inference to happen correctly

Desktop (please complete the following information):

  • Linux Mint 21 running kernel 6.5.0-26-generic
  • Firefox?

[BUG]

Bug Description
I ran the install.bat, and now I am trying to run env/python.exe rvc.py or python rvc.py prerequisites. Also, one thing: when I executed the bat file, all the dependencies were installed into my whole local system.

File "C:\Users\ESHAN\Desktop\rvctest\rvc.py", line 10, in
from rvc.configs.config import Config
File "C:\Users\ESHAN\Desktop\rvctest\rvc.py", line 10, in
from rvc.configs.config import Config
ModuleNotFoundError: No module named 'rvc.configs'; 'rvc' is not a package

Desktop Details:
- Windows 11, NVIDIA GTX 1650

API

I'm having some issues with the API call (internal server error) - I'm assuming it's the syntax of the JSON at this point; I've messed around a bit but it keeps returning an error. Here is how the JSON is syntaxed atm:
{
"f0up_key": 0,
"filter_radius": 5,
"index_rate": 0.5,
"hop_length": 256,
"f0method": "rmvpe",
"input_path": "D:\Projects\VoiceChangerAI\TestFile\testa.wav",
"output_path": "D:\Projects\VoiceChangerAI\TestFile\output.wav",
"pth_file": "LB.pth",
"index_path": "LB.index",
"split_audio": false,
}

have "LB.pth" and the index in the "RVC_CLI\models" folder currently?

Thanks for any help - I'm total narb with this stuff >_<

[BUG] RVC doesn't produce a usable result with default settings

Hey guys,

I was able to get the server started and configured to work, and TTS is working too after changing the locale to the short name.
So I'm getting TTS output, but the RVC output is just interference, like a continuous beep. Also, when performing inference, it works well on the original and the fork with the same model. Do I need a special type of model here?

tried both api and cli.

[BUG] API infer not work.

Code Invocation:

 curl --location 'http://127.0.0.1:8000/infer' \
--header 'Content-Type: application/json' \
--data '{
  "f0up_key": 0,
  "filter_radius": 2,
  "index_rate": 0.5,
  "hop_length": 256,
  "rms_mix_rate": 0.5,
  "protect": 0.5,
  "f0autotune": false,
  "f0method": "rmvpe",
  "input_path": "/home/.../RVC_CLI/input/018b3ee3-50a3-7b40-8b02-c99d3753a8a4.mp3",
  "output_path": "/home/.../RVC_CLI/output/1.wav",
  "pth_path": "/home/.../RVC_CLI/logs/Alisa/Alisa.pth",
  "index_path": "/home/.../RVC_CLI/logs/Alisa/added_IVF757_Flat_nprobe_1_Alisa_v2.index",
  "split_audio": false,
  "clean_audio": false,
  "clean_strength": 0.5,
  "export_format": "WAV"
}

Result:


{
    "output": "<All keys matched successfully>\nConversion completed. Output file: '/home/.../RVC_CLI/output/1.wav' in 4.14 seconds.\n",
    "error": "/home/.../anaconda3/envs/rvc_cli/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.\n  warnings.warn(\"torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.\")\n"
}

However, the file does not appear in the output folder.

Moreover, to make it work, I changed:


@app.post("/infer")
async def infer(request: Request):
    command = ["python", "main.py", "infer"]

    json_data = await request.json()

    command += [f"--{key}={value}" for key, value in json_data.items()]

    return execute_command(command)

Also, it would be good to format the output as follows:

When the status code is 200:

{"audio_content": "path/name.wav", "message": "..."}

For other status codes:

{"message": "...", "error": "..."}

[BUG] Training Threshold set to incorrect value when no value is set in command

Describe the bug
Get the following error, whether overtrain_detector is set to true/false and whether overtrain_threshold is set to an integer value or not:
train.py: error: argument -ot/--overtraining_threshold: invalid int value: 'False'
To Reproduce
Steps to reproduce the behavior:
Run with the following options:
{python_path} main.py train --model_name {model_name} --sampling_rate 40000 --pitch_guidance True --gpu 1 --save_every_epoch 50 --save_only_latest True --overtraining_detector False

Expected behavior
Training runs and completes

Desktop (please complete the following information):

  • OS: Windows 11

[BUG]

Can anyone help please:

I'm trying to generate a new pth and got follow error:

"FileNotFoundError: [Errno 2] No such file or directory: '/home/user/RVC_CLI/logs/mute/sliced_audios/mute40000.wav'"
Step To get a error:

  1. Preprocess Dataset (ok in this step)
    python3 rvc.py preprocess --model_name "johnnyc" --dataset_path "../models/sample/" --sampling_rate "40000"
  2. Extract Features (ok in this step)
    python3 rvc.py extract --model_name "johnnyc" --rvc_version "v2" --sampling_rate "40000"
  3. Start Training
    python3 rvc.py train --model_name "johnnyc" --rvc_version "v2" --save_every_epoch "3" --sampling_rate "40000"
    Got a follow error:
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/RVC_CLI/rvc/train/train.py", line 215, in run
    train_dataset = TextAudioLoaderMultiNSFsid(hps.data)
  File "/home/user/RVC_CLI/rvc/train/data_utils.py", line 21, in __init__
    self._filter()
  File "/home/user/RVC_CLI/rvc/train/data_utils.py", line 29, in _filter
    lengths.append(os.path.getsize(audiopath) // (3 * self.hop_length))
  File "/usr/lib/python3.10/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/RVC_CLI/logs/mute/sliced_audios/mute40000.wav'
Saved index file '/home/user/RVC_CLI/logs/johnnyc/added_IVF94_Flat_nprobe_1_v2.index'

Training args

The voice gets distorted when chunks are made, resulting in a robotic voice in the output.

IP and PORT change for API

Hello, I didn't find any other contacts, so I'll write the problem here.

Question context:

I'm trying to write an application in C# (WinForms) using your solution. I am weak in programming, outside the zone of simple C# applications, but I was interested in the functionality of the RVC library. I want to try to establish interaction with the RVC by organizing a server and sending requests to it from clients from the global network. But I encountered a problem when the API server is launched on a local machine, where there is a suitable video card for work, but on the machine with which the application is being developed - there is no. And to establish communication with my "server" where your solution will be deployed, I tried to change the server launch parameters to a local IP (192.168.x.x) and another port, but failed.

The main point of the question:

Can I somehow use launch parameters (for example the main.py api [-ip or -port] file) to change the parameters of the API server (uvicorn server)?
If not, is it possible to add such functionality to the "main.py api" command?

Stuff to improve GUI wise or ya get what i mean :3

So first things first, the infer section looks alright. Moving on to training, you guys can definitely improve some stuff there:

  1. Preprocess Dataset: I believe you guys should set the preset sample rate to 32k and not 40k
  2. Extract Features: eradicate v1, it's outdated and uses a smaller hubert (I think), but regardless it lacks in dynamics much more than the v2 arch we have for RVC, so you can just remove the option of choosing versions and auto-set it to v2.
    The rest looks fine for now, I really like it at least. One thing you can also add though is model fusion and all the other stuff the GUI has, like ckpt processing and so on and so forth. Other than that it works wonderfully :3

Error: 'config' when trying to infer from a model that I trained

Describe the bug
Inference from a model I trained isn't working and just printing out "Error: 'config'" instead

To Reproduce
I used the colab in the repo, with the change that I made the main directory on my Google Drive so I don't have to pull every time and so the models I make are automatically saved.

I trained a model, and maybe this is the problem, with v2 and 40k sample rate. I later saw in the configs that there is no v2-40k config.

I then tried the inference there, and it didn't work out, it just spat out "Error: 'config'"

I traced it to the vc pipeline where it loads the checkpoint and tries to access ckpt['config'].
Then outside of the code, I loaded the checkpoints saved from my training and they do not have a 'config' key.
So I looked at the code that saves the checkpoints, and there is no 'config' key there either.

I think I'm doing something wrong, but I'm not sure what.

  1. should all of the saved G's be usable? I think the most possible issue is that I'm using the wrong model file. I'm using the ones in the log/model_name directory. Usually RVC saves other weights in the weights directory, but there isn't one here.
  2. As mentioned above, I checked the training code where it saves, and there is no explicit 'config' in the saved pth when it's saving the epoch checkpoints. Are there different models being saved?
  3. Is V2 + 40k supported even though there is no config file for it?
  4. Could having the repo on Google drive cause issues? I know sometimes it causes Linux path issues.
  5. save_only_latest seems to not be a usable flag since the best model might not be the latest, and usually we need to go back and see the performance and pick the one that's best. How is this flag really used?

Expected behavior
Inference works

[BUG] '<' not supported between instances of 'str' and 'float' / No output file

Describe the bug
During infer an error/warning is reported, " '<' not supported between instances of 'str' and 'float' "
and no output file is written despite the output saying it is.

To Reproduce
Steps to reproduce the behavior:
Run the command

PS M:\LLMs\tts\RVC_CLI> .\env\python.exe main.py infer `
>> --index_path '.\rvcs\test0.index' `
>> --pth_path '.\rvcs\test0.pth' `
>> --input_path '.\output.wav' `
>> --output_path 'M:\LLMs\tts\RVC_CLI\output-rvc.wav'
<All keys matched successfully>
'<' not supported between instances of 'str' and 'float'
Conversion completed. Output file: 'M:\LLMs\tts\RVC_CLI\output-rvc.wav' in 2.22 seconds.
PS M:\LLMs\tts\RVC_CLI>

Expected behavior
An output file processed with the supplied pth/index

Desktop (please complete the following information):
Windows 11

Additional context
If I checkout tag 1.1.2 it all works

Slight quality issues

Hi,
Now it's a lot better than before. The parameters work well and the quality is better than before.

However, in some places, it makes the voice sound like an old grandmother struggling to speak.

Here's my command:

python main.py infer --f0up_key "2" --filter_radius 5 --index_rate "0.1" --hop_length "25" --f0method "dio" --input_path "input.wav" --output_path "output.wav" --pth_path "rvcfinalv4-harvest-1000epochs.pth" --index_path "rvcfinalv4-harvest-1000epochs.index" --split_audio "False" --f0autotune "False"

Let me know if we can do anything to improve the voice quality.

Once again, your work is great in the RVC commandline space. Yours is the best commandline tool for RVC, better than all the ones even released by the original RVC project. So, Thank you.

Training problem

I was trying to train a model using the OV2 pretraining model, and I came across a strange thing, for every 1 epoch only 1 step was generated

Example:
model_10_epoch_10_steps
model_100_epoch_100_steps

I am using the code on Kaggle, which uses conda with python 3.10

Sorry for the bad English, it's not my first language

[BUG] API won't start

Normal inference works, but when I try API, with or without host/port arguments, I get an error:

\RVC_CLI> ./env/python.exe main.py api
Error: [WinError 2] The system cannot find the file specified

[BUG?] It is not possible to use more than one GPU for training on Kaggle

Describe the bug
When I try to use 2 GPUs on Kaggle, this error occurs:
[W socket.cpp:663] [c10d] The client socket has failed to connect to [localhost]:55292 (errno: 99 - Cannot assign requested address).

To Reproduce
Steps to reproduce the behavior:

  1. Simply put "0-1" in --gpu in training part on Kaggle

Expected behavior
Be able to use two GPUs for training.

Assets
image

Desktop (please complete the following information):

  • OS: Linux
  • Browser: chrome

Additional context
Yes, I was using the 2x T4, and was using the last commit ("fix multi gpu")

robotic sounding output

Static (robotic) noise in the generated output. I even tried up to 3500 epochs but had no success using the command line.

However, when I use the gui, it works. I am not sure what the issue is.

#!/bin/bash

# Define variables
MODEL_NAME="MyVoiceModel"
VOICE_DATA_PATH="voice/myvoice.wav"
SAMPLE_RATE=48000
RVC_VERSION="v2"
HOP_LENGTH=256
F0METHOD="rmvpe"
TOTAL_EPOCHS=1000
BATCH_SIZE=16
GPU=0 # Adjust if you have multiple GPUs
SAVE_EVERY_EPOCH=10
TEXT_TO_SYNTHESIZE="This is a sample text for voice conversion."

# Step 1: Preprocess Dataset
echo "Preprocessing Dataset..."
python main.py preprocess "$MODEL_NAME" "$VOICE_DATA_PATH" $SAMPLE_RATE

# Step 2: Extract Features
echo "Extracting Features..."
python main.py extract "$MODEL_NAME" $RVC_VERSION $F0METHOD $HOP_LENGTH $SAMPLE_RATE

# Step 3: Train the Model
echo "Training the Model..."
python main.py train "$MODEL_NAME" $RVC_VERSION $SAVE_EVERY_EPOCH False True $TOTAL_EPOCHS $SAMPLE_RATE $BATCH_SIZE $GPU True False False

# Step 4: Generate Index File
echo "Generating Index File..."
python main.py index "$MODEL_NAME" $RVC_VERSION

# Step 5: Voice Conversion Inference (Modify paths to the model and index files as needed)
echo "Performing Voice Conversion Inference..."
python main.py infer "$TEXT_TO_SYNTHESIZE" "$MODEL_NAME" 0 5 0.5 $HOP_LENGTH "$F0METHOD" "output_tts.wav" "output_rvc.wav" "path_to_trained_model/$MODEL_NAME.pth" "path_to_index_file/$MODEL_NAME.index"

echo "Voice Conversion Process Completed."

split_audio does not seem to work?

Setting it to True or False always results in the same output.wav? How can I get the voice-swapped output combined with the original instrumental? Thanks. This one has been the easiest tool to use so far.

Issue with Model Output Generating Noise

The model is training successfully, but when attempting to process a file, the output consists only of squeaks and noise. This issue persists even when using alternative models; the resulting audio remains distorted. Interestingly, utilizing a model trained in a different version of RVC yields normal functioning.
Another instance of RVC works fine in this machine.

What could be the underlying cause? Various output file formats have been experimented with to no avail.

Using VDS

  • OS: Ubuntu 20.04
  • RTX 3060

API -

Still having issues with the API. The main infer worked with the same input as the JSON below. I tried messing around with the format a bit but no luck.

JSON:

{
"f0up_key": "2",
"filter_radius": "5",
"index_rate": "0.1",
"hop_length": "25",
"f0method": "dio",
"input_path": "D:\Projects\VoiceChangerAI\TestFile\testa.wav",
"output_path": "D:\Projects\VoiceChangerAI\TestFile\output_API.wav",
"pth_path": "C:\Users\KCLEE\Documents\GitHub\models\LenvalBrown.pth",
"index_path": "C:\Users\KCLEE\Documents\GitHub\models\LenvalBrown.index",
"split_audio": "false",
"f0autotune": "false"
}

the console spits out this:

Traceback (most recent call last):
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\main.py", line 953, in
main()
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\main.py", line 947, in main
run_api_script()
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\main.py", line 385, in run_api_script
subprocess.run(command)
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 507, in run
stdout, stderr = process.communicate(input, timeout=timeout)
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 1126, in communicate
self.wait()
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 1189, in wait
return self._wait(timeout=timeout)
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 1486, in _wait
result = _winapi.WaitForSingleObject(self._handle,

Client side I get a "error 400 - bad request"

Does not work?

No matter what I try for the hop length value, with or without double quotes/single quotes, it won't work. It became very frustrating.

python main.py infer --f0up_key "0" --filter_radius "5" --index_rate "0.5" --hop_length "256" --f0method "dio" --input_path "input.wav" --output_path "output.wav" --pth_file "model.pth" --index_path "model.index" --split_audio "False" --f0autotune "False"

It was better before you changed the arguments. At least it worked, even if it was robotic. But now I am totally unable to use it.
