
rvc_cli's Introduction

RVC_CLI: Retrieval-based Voice Conversion Command Line Interface

Open in Colab

Table of Contents

  1. Installation
  2. Getting Started
  3. API
  4. Credits

Installation

Ensure that you have the necessary Python packages installed by following these steps (Python 3.9 is recommended):

Windows

Execute the install.bat file to activate a Conda environment. Afterward, launch the application using env/python.exe rvc.py instead of the conventional python rvc.py command.

Linux

chmod +x install.sh
./install.sh

Getting Started

Download the necessary models and executables by running the following command:

python rvc.py prerequisites

More information about the prerequisites command can be found in the Prerequisites Download section below.

For detailed information and command-line options, refer to the help command:

python rvc.py -h

This command provides a clear overview of the available modes and their corresponding parameters, facilitating effective utilization of the RVC CLI.

Inference

Single Inference

python rvc.py infer --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --input_path "input_path" --output_path "output_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format"
Parameter Name Required Default Valid Options Description
f0up_key No 0 -24 to +24 Set the pitch of the audio; the higher the value, the higher the pitch.
filter_radius No 3 0 to 10 If the value is three or higher, median filtering is applied to the collected pitch results, which can reduce breathiness.
index_rate No 0.3 0.0 to 1.0 Influence exerted by the index file; a higher value corresponds to greater influence. However, opting for lower values can help mitigate artifacts present in the audio.
hop_length No 128 1 to 512 Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
rms_mix_rate No 1 0 to 1 Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed.
protect No 0.33 0 to 0.5 Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. Pulling the parameter to its maximum value of 0.5 offers comprehensive protection. However, reducing this value might decrease the extent of protection while potentially mitigating the indexing effect.
f0autotune No False True or False Apply a soft autotune to your inferences, recommended for singing conversions.
f0method No rmvpe pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases.
input_path Yes None Full path to the input audio file Full path to the input audio file
output_path Yes None Full path to the output audio file Full path to the output audio file
pth_path Yes None Full path to the pth file Full path to the pth file
index_path Yes None Full index file path Full index file path
split_audio No False True or False Split the audio into chunks for inference to obtain better results in some cases.
clean_audio No False True or False Clean your audio output using noise detection algorithms, recommended for speaking audios.
clean_strength No 0.7 0.0 to 1.0 Set the clean-up strength for the output audio; the higher the value, the more the audio is cleaned up, but it may sound more compressed.
export_format No WAV WAV, MP3, FLAC, OGG, M4A Output audio file format
embedder_model No hubert hubert or contentvec Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases.
upscale_audio No False True or False Upscale the audio to 48kHz for better results.

Refer to python rvc.py infer -h for additional help.
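
For example, a single-inference call with default-like settings could look like the following; the input, output, and model paths are placeholders and should be replaced with your own files:

python rvc.py infer --f0up_key 0 --filter_radius 3 --index_rate 0.3 --hop_length 128 --rms_mix_rate 1 --protect 0.33 --f0autotune False --f0method rmvpe --input_path "my_input.wav" --output_path "my_output.wav" --pth_path "logs/my_model/my_model.pth" --index_path "logs/my_model/my_model.index" --split_audio False --clean_audio False --clean_strength 0.7 --export_format WAV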

Batch Inference

python rvc.py batch_infer --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --input_folder_path "input_folder_path" --output_folder_path "output_folder_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format"
Parameter Name Required Default Valid Options Description
f0up_key No 0 -24 to +24 Set the pitch of the audio; the higher the value, the higher the pitch.
filter_radius No 3 0 to 10 If the value is three or higher, median filtering is applied to the collected pitch results, which can reduce breathiness.
index_rate No 0.3 0.0 to 1.0 Influence exerted by the index file; a higher value corresponds to greater influence. However, opting for lower values can help mitigate artifacts present in the audio.
hop_length No 128 1 to 512 Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
rms_mix_rate No 1 0 to 1 Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed.
protect No 0.33 0 to 0.5 Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. Pulling the parameter to its maximum value of 0.5 offers comprehensive protection. However, reducing this value might decrease the extent of protection while potentially mitigating the indexing effect.
f0autotune No False True or False Apply a soft autotune to your inferences, recommended for singing conversions.
f0method No rmvpe pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases.
input_folder_path Yes None Full path to the input audio folder (The folder may only contain audio files) Full path to the input audio folder
output_folder_path Yes None Full path to the output audio folder Full path to the output audio folder
pth_path Yes None Full path to the pth file Full path to the pth file
index_path Yes None Full path to the index file Full path to the index file
split_audio No False True or False Split the audio into chunks for inference to obtain better results in some cases.
clean_audio No False True or False Clean your audio output using noise detection algorithms, recommended for speaking audios.
clean_strength No 0.7 0.0 to 1.0 Set the clean-up strength for the output audio; the higher the value, the more the audio is cleaned up, but it may sound more compressed.
export_format No WAV WAV, MP3, FLAC, OGG, M4A Output audio file format
embedder_model No hubert hubert or contentvec Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases.
upscale_audio No False True or False Upscale the audio to 48kHz for better results.

Refer to python rvc.py batch_infer -h for additional help.
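
As an illustration, the batch variant only swaps the single input and output paths for folders; everything else stays the same (the folder and model names below are placeholders):

python rvc.py batch_infer --f0up_key 0 --filter_radius 3 --index_rate 0.3 --hop_length 128 --rms_mix_rate 1 --protect 0.33 --f0autotune False --f0method rmvpe --input_folder_path "audios_in" --output_folder_path "audios_out" --pth_path "logs/my_model/my_model.pth" --index_path "logs/my_model/my_model.index" --split_audio False --clean_audio False --clean_strength 0.7 --export_format WAV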

TTS Inference

python rvc.py tts_infer --tts_text "tts_text" --tts_voice "tts_voice" --f0up_key "f0up_key" --filter_radius "filter_radius" --index_rate "index_rate" --hop_length "hop_length" --rms_mix_rate "rms_mix_rate" --protect "protect" --f0autotune "f0autotune" --f0method "f0method" --output_tts_path "output_tts_path" --output_rvc_path "output_rvc_path" --pth_path "pth_path" --index_path "index_path" --split_audio "split_audio" --clean_audio "clean_audio" --clean_strength "clean_strength" --export_format "export_format"
Parameter Name Required Default Valid Options Description
tts_text Yes None Text for TTS synthesis Text for TTS synthesis
tts_voice Yes None Voice for TTS synthesis Voice for TTS synthesis
f0up_key No 0 -24 to +24 Set the pitch of the audio; the higher the value, the higher the pitch.
filter_radius No 3 0 to 10 If the value is three or higher, median filtering is applied to the collected pitch results, which can reduce breathiness.
index_rate No 0.3 0.0 to 1.0 Influence exerted by the index file; a higher value corresponds to greater influence. However, opting for lower values can help mitigate artifacts present in the audio.
hop_length No 128 1 to 512 Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
rms_mix_rate No 1 0 to 1 Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is employed.
protect No 0.33 0 to 0.5 Safeguard distinct consonants and breathing sounds to prevent electro-acoustic tearing and other artifacts. Pulling the parameter to its maximum value of 0.5 offers comprehensive protection. However, reducing this value might decrease the extent of protection while potentially mitigating the indexing effect.
f0autotune No False True or False Apply a soft autotune to your inferences, recommended for singing conversions.
f0method No rmvpe pm, harvest, dio, crepe, crepe-tiny, rmvpe, fcpe, hybrid[crepe+rmvpe], hybrid[crepe+fcpe], hybrid[rmvpe+fcpe], hybrid[crepe+rmvpe+fcpe] Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases.
output_tts_path Yes None Full path to the output TTS audio file Full path to the output TTS audio file
output_rvc_path Yes None Full path to the output RVC audio file Full path to the output RVC audio file
pth_path Yes None Full path to the pth file Full path to the pth file
index_path Yes None Full path to the index file Full path to the index file
split_audio No False True or False Split the audio into chunks for inference to obtain better results in some cases.
clean_audio No False True or False Clean your audio output using noise detection algorithms, recommended for speaking audios.
clean_strength No 0.7 0.0 to 1.0 Set the clean-up strength for the output audio; the higher the value, the more the audio is cleaned up, but it may sound more compressed.
export_format No WAV WAV, MP3, FLAC, OGG, M4A Output audio file format
embedder_model No hubert hubert or contentvec Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases.
upscale_audio No False True or False Upscale the audio to 48kHz for better results.

Refer to python rvc.py tts_infer -h for additional help.
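
For illustration, a TTS inference call might look like the sketch below. The voice identifier depends on the TTS backend, so "en-US-AriaNeural" and all paths here are only placeholders:

python rvc.py tts_infer --tts_text "Hello, this is a test." --tts_voice "en-US-AriaNeural" --f0up_key 0 --filter_radius 3 --index_rate 0.3 --hop_length 128 --rms_mix_rate 1 --protect 0.33 --f0autotune False --f0method rmvpe --output_tts_path "tts_raw.wav" --output_rvc_path "tts_converted.wav" --pth_path "logs/my_model/my_model.pth" --index_path "logs/my_model/my_model.index" --split_audio False --clean_audio False --clean_strength 0.7 --export_format WAV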

Training

Preprocess Dataset

python rvc.py preprocess --model_name "model_name" --dataset_path "dataset_path" --sampling_rate "sampling_rate"
Parameter Name Required Default Valid Options Description
model_name Yes None Name of the model Name of the model
dataset_path Yes None Full path to the dataset folder (The folder may only contain audio files) Full path to the dataset folder
sampling_rate Yes None 32000, 40000, or 48000 Sampling rate of the audio data

Refer to python rvc.py preprocess -h for additional help.
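
A typical preprocessing call, with a placeholder model name and dataset folder, might be:

python rvc.py preprocess --model_name "my_model" --dataset_path "datasets/my_voice" --sampling_rate 40000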

Extract Features

python rvc.py extract --model_name "model_name" --rvc_version "rvc_version" --pitch_guidance "pitch_guidance" --hop_length "hop_length" --sampling_rate "sampling_rate"
Parameter Name Required Default Valid Options Description
model_name Yes None Name of the model Name of the model
rvc_version No v2 v1 or v2 Version of the model
pitch_guidance No True True or False By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential.
hop_length No 128 1 to 512 Denotes the duration it takes for the system to transition to a significant pitch change. Smaller hop lengths require more time for inference but tend to yield higher pitch accuracy.
sampling_rate Yes None 32000, 40000, or 48000 Sampling rate of the audio data
embedder_model No hubert hubert or contentvec Embedder model to use for the audio conversion. The default model is hubert, which is recommended for most cases.
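
Continuing the placeholder example from the preprocessing step, feature extraction for the same model could be run as:

python rvc.py extract --model_name "my_model" --rvc_version v2 --pitch_guidance True --hop_length 128 --sampling_rate 40000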

Start Training

python rvc.py train --model_name "model_name" --rvc_version "rvc_version" --save_every_epoch "save_every_epoch" --save_only_latest "save_only_latest" --save_every_weights "save_every_weights" --total_epoch "total_epoch" --sampling_rate "sampling_rate" --batch_size "batch_size" --gpu "gpu" --pitch_guidance "pitch_guidance" --overtraining_detector "overtraining_detector" --overtraining_threshold "overtraining_threshold"  --sync_graph "sync_graph" --pretrained "pretrained" --custom_pretrained "custom_pretrained" [--g_pretrained "g_pretrained"] [--d_pretrained "d_pretrained"]
Parameter Name Required Default Valid Options Description
model_name Yes None Name of the model Name of the model
rvc_version No v2 v1 or v2 Version of the model
save_every_epoch Yes None 1 to 50 Determines how often, in epochs, the model is saved.
save_only_latest No False True or False Enabling this setting will result in the G and D files saving only their most recent versions, effectively conserving storage space.
save_every_weights No True True or False This setting enables you to save the weights of the model at the conclusion of each epoch.
total_epoch No 1000 1 to 10000 Specifies the overall quantity of epochs for the model training process.
sampling_rate Yes None 32000, 40000, or 48000 Sampling rate of the audio data
batch_size No 8 1 to 50 It's advisable to align it with the available VRAM of your GPU. A setting of 4 offers improved accuracy but slower processing, while 8 provides faster and standard results.
gpu No 0 0 to ∞ separated by - Specify the GPU indices you wish to use for training, separated by hyphens (-).
pitch_guidance No True True or False By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential.
overtraining_detector No False True or False Utilize the overtraining detector to prevent overfitting. This feature is particularly valuable for scenarios where the model is at risk of overfitting.
overtraining_threshold No 50 1 to 100 Set the threshold for the overtraining detector. The lower the value, the more sensitive the detector will be.
pretrained No True True or False Utilize pretrained models when training your own. This approach reduces training duration and enhances overall quality.
custom_pretrained No False True or False Utilizing custom pretrained models can lead to superior results, as selecting the most suitable pretrained models tailored to the specific use case can significantly enhance performance.
g_pretrained No None Full path to pretrained file G, only if you have used custom_pretrained Full path to pretrained file G
d_pretrained No None Full path to pretrained file D, only if you have used custom_pretrained Full path to pretrained file D
sync_graph No False True or False Synchronize the TensorBoard graph. Only enable this setting if you are training a new model.

Refer to python rvc.py train -h for additional help.
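
A training run for the placeholder model, using mostly default values, might look like this (adapt the batch size and GPU index to your hardware):

python rvc.py train --model_name "my_model" --rvc_version v2 --save_every_epoch 10 --save_only_latest False --save_every_weights True --total_epoch 500 --sampling_rate 40000 --batch_size 8 --gpu 0 --pitch_guidance True --overtraining_detector False --pretrained True --custom_pretrained False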

Generate Index File

python rvc.py index --model_name "model_name" --rvc_version "rvc_version"
Parameter Name Required Default Valid Options Description
model_name Yes None Name of the model Name of the model
rvc_version Yes None v1 or v2 Version of the model

Refer to python rvc.py index -h for additional help.
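
Putting the training steps together, the placeholder model from the examples above would be built with a sequence along these lines:

python rvc.py preprocess --model_name "my_model" --dataset_path "datasets/my_voice" --sampling_rate 40000
python rvc.py extract --model_name "my_model" --rvc_version v2 --sampling_rate 40000
python rvc.py train --model_name "my_model" --rvc_version v2 --save_every_epoch 10 --total_epoch 500 --sampling_rate 40000 --batch_size 8 --gpu 0
python rvc.py index --model_name "my_model" --rvc_version v2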

UVR

python uvr.py [audio_file] [options]

Info and Debugging

Parameter Name Required Default Valid Options Description
audio_file Yes None Any valid audio file path The path to the audio file you want to separate, in any common format.
-d, --debug No False Enable debug logging.
-e, --env_info No False Print environment information and exit.
-l, --list_models No False List all supported models and exit.
--log_level No info info, debug, warning Log level.

Separation I/O Params

Parameter Name Required Default Valid Options Description
-m, --model_filename No UVR-MDX-NET-Inst_HQ_3.onnx Any valid model file path Model to use for separation.
--output_format No WAV Any common audio format Output format for separated files.
--output_dir No None Any valid directory path Directory to write output files.
--model_file_dir No /tmp/audio-separator-models/ Any valid directory path Model files directory.

Common Separation Parameters

Parameter Name Required Default Valid Options Description
--invert_spect No False Invert secondary stem using spectrogram.
--normalization No 0.9 Any float value Max peak amplitude to normalize input and output audio to.
--single_stem No None Instrumental, Vocals, Drums, Bass, Guitar, Piano, Other Output only a single stem.
--sample_rate No 44100 Any integer value Modify the sample rate of the output audio.

MDXC Architecture Parameters

Parameter Name Required Default Valid Options Description
--mdxc_segment_size No 256 Any integer value Size of segments for MDXC architecture.
--mdxc_override_model_segment_size No False Override the segment size instead of using the model's default value.
--mdxc_overlap No 8 2 to 50 Amount of overlap between prediction windows for MDXC architecture.
--mdxc_batch_size No 1 Any integer value Batch size for MDXC architecture.
--mdxc_pitch_shift No 0 Any integer value Shift audio pitch by a number of semitones while processing for MDXC architecture.

MDX Architecture Parameters

Parameter Name Required Default Valid Options Description
--mdx_segment_size No 256 Any integer value Size of segments for MDX architecture.
--mdx_overlap No 0.25 0.001 to 0.999 Amount of overlap between prediction windows for MDX architecture.
--mdx_batch_size No 1 Any integer value Batch size for MDX architecture.
--mdx_hop_length No 1024 Any integer value Hop length for MDX architecture.
--mdx_enable_denoise No False Enable denoising during separation for MDX architecture.

Demucs Architecture Parameters

Parameter Name Required Default Valid Options Description
--demucs_segment_size No Default Any integer value Size of segments for Demucs architecture.
--demucs_shifts No 2 Any integer value Number of predictions with random shifts for Demucs architecture.
--demucs_overlap No 0.25 0.001 to 0.999 Overlap between prediction windows for Demucs architecture.
--demucs_segments_enabled No True Enable segment-wise processing for Demucs architecture.

VR Architecture Parameters

Parameter Name Required Default Valid Options Description
--vr_batch_size No 4 Any integer value Batch size for VR architecture.
--vr_window_size No 512 Any integer value Window size for VR architecture.
--vr_aggression No 5 -100 to 100 Intensity of primary stem extraction for VR architecture.
--vr_enable_tta No False Enable Test-Time-Augmentation for VR architecture.
--vr_high_end_process No False Mirror the missing frequency range of the output for VR architecture.
--vr_enable_post_process No False Identify leftover artifacts within vocal output for VR architecture.
--vr_post_process_threshold No 0.2 0.1 to 0.3 Threshold for post-process feature for VR architecture.
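
As an example, separating a track with the default MDX-Net model (file and folder names below are placeholders) could be done with the command below; add --single_stem Vocals if you only want the vocal stem:

python uvr.py my_song.wav --model_filename UVR-MDX-NET-Inst_HQ_3.onnx --output_format WAV --output_dir separated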

Additional Features

Model Extract

python rvc.py model_extract --pth_path "pth_path" --model_name "model_name" --sampling_rate "sampling_rate" --pitch_guidance "pitch_guidance" --rvc_version "rvc_version" --epoch "epoch" --step "step"
Parameter Name Required Default Valid Options Description
pth_path Yes None Path to the pth file Full path to the pth file
model_name Yes None Name of the model Name of the model
sampling_rate Yes None 32000, 40000, or 48000 Sampling rate of the audio data
pitch_guidance Yes None True or False By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential.
rvc_version Yes None v1 or v2 Version of the model
epoch Yes None 1 to 10000 Specifies the overall quantity of epochs for the model training process.
step Yes None 1 to ∞ Specifies the overall quantity of steps for the model training process.
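
For illustration, extracting a model from a saved checkpoint might look like this; the checkpoint path, epoch, and step values are placeholders and should match the training run the checkpoint comes from:

python rvc.py model_extract --pth_path "logs/my_model/G_500.pth" --model_name "my_model" --sampling_rate 40000 --pitch_guidance True --rvc_version v2 --epoch 500 --step 12500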

Model Information

python rvc.py model_information --pth_path "pth_path"
Parameter Name Required Default Valid Options Description
pth_path Yes None Path to the pth file Full path to the pth file

Model Blender

python rvc.py model_blender --model_name "model_name" --pth_path_1 "pth_path_1" --pth_path_2 "pth_path_2" --ratio "ratio"
Parameter Name Required Default Valid Options Description
model_name Yes None Name of the model Name of the model
pth_path_1 Yes None Path to the first pth file Full path to the first pth file
pth_path_2 Yes None Path to the second pth file Full path to the second pth file
ratio No 0.5 0.0 to 1 Value for blender ratio
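
A blending example with two placeholder models, mixed evenly:

python rvc.py model_blender --model_name "blended_model" --pth_path_1 "logs/model_a/model_a.pth" --pth_path_2 "logs/model_b/model_b.pth" --ratio 0.5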

Launch TensorBoard

python rvc.py tensorboard

Download Models

Run the download script with the following command:

python rvc.py download --model_link "model_link"
Parameter Name Required Default Valid Options Description
model_link Yes None Link of the model (enclosed in double quotes; Google Drive or Hugging Face) Link of the model

Refer to python rvc.py download -h for additional help.

Audio Analyzer

python rvc.py audio_analyzer --input_path "input_path"
Parameter Name Required Default Valid Options Description
input_path Yes None Full path to the input audio file Full path to the input audio file

Refer to python rvc.py audio_analyzer -h for additional help.

Prerequisites Download

python rvc.py prerequisites --pretraineds_v1 "pretraineds_v1" --pretraineds_v2 "pretraineds_v2" --models "models" --exe "exe"
Parameter Name Required Default Valid Options Description
pretraineds_v1 No True True or False Download pretrained models for v1
pretraineds_v2 No True True or False Download pretrained models for v2
models No True True or False Download models for v1 and v2
exe No True True or False Download the necessary executable files for the CLI to function properly (FFmpeg and FFprobe)
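
For example, to skip the v1 pretrained models and download everything else (useful if you only train v2 models):

python rvc.py prerequisites --pretraineds_v1 False --pretraineds_v2 True --models True --exe True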

API

python rvc.py api --host "host" --port "port"
Parameter Name Required Default Valid Options Description
host No 127.0.0.1 Value for host IP Value for host IP
port No 8000 Value for port number Value for port number

To use the RVC CLI via the API, utilize the provided script. Make API requests to the following endpoints:

  • Docs: /docs
  • Ping: /ping
  • Infer: /infer
  • Batch Infer: /batch_infer
  • TTS: /tts
  • Preprocess: /preprocess
  • Extract: /extract
  • Train: /train
  • Index: /index
  • Model Information: /model_information
  • Model Fusion: /model_fusion
  • Download: /download

Make POST requests to these endpoints with the same required parameters as in CLI mode.
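
As a sketch, an inference request could be sent with curl as shown below; the JSON keys mirror the CLI flags, and all paths are placeholders:

curl -X POST http://127.0.0.1:8000/infer \
  -H "Content-Type: application/json" \
  -d '{
    "f0up_key": 0,
    "filter_radius": 3,
    "index_rate": 0.3,
    "hop_length": 128,
    "rms_mix_rate": 1.0,
    "protect": 0.33,
    "f0autotune": false,
    "f0method": "rmvpe",
    "input_path": "/path/to/input.wav",
    "output_path": "/path/to/output.wav",
    "pth_path": "/path/to/model.pth",
    "index_path": "/path/to/model.index",
    "split_audio": false,
    "clean_audio": false,
    "clean_strength": 0.7,
    "export_format": "WAV"
  }'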

Credits

The RVC CLI builds upon the foundations of several other open-source projects. We acknowledge and appreciate the contributions of their respective authors and communities.

rvc_cli's People

Contributors

aitronssesin, blaisewf, dedgar, github-actions[bot], lukaszliniewicz, poiqazwsx, vidalnt


rvc_cli's Issues

feature request: enable GPU for inference

script works fine

feature requests
a) I use RVC and use GPU for inference, can you enable it in cli as well
b) can the temporary files be kept inside a folder say temp on projects, making it easier for housekeeping

thanks
Senthil

[BUG] Batch Conversion on Apple Silicon Mac

I've got Applio running on my M2 Max Mac Studio but Batch Conversion is not working. To get further information I cloned this git here and tried the CLI batch conversion, which also does not work. Single conversion works fine with CLI and Applio.

This is my single conversion cmd, which results in a working file:
python main.py infer --f0up_key "0" --filter_radius "3" --index_rate "0.8" --hop_length "64" --split_audio "True" --f0autotune "False" --f0method "rmvpe" --input_path "/Users/liam/Music/RVC/city_of_angels/hmmmh.wav" --output_path "/Users/liam/Downloads/test/test.wav" --pth_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super.pth" --index_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super_clean.index"

This is batch-conversion, which results in an error, no matter if rms_mix_rate and other parameters are included in the cmd or not:
python main.py batch_infer --f0up_key "0" --filter_radius "3" --index_rate "0.8" --hop_length "64" --split_audio "True" --f0autotune "False" --f0method "rmvpe" --input_folder "/Users/liam/Music/RVC/love_me_down/ValYoung" --output_folder "/Users/liam/Downloads/test" --pth_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super.pth" --index_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super_clean.index" --rms_mix_rate "0.0"

The conversion fails with the following error:

Inferring /Users/liam/Music/RVC/love_me_down/ValYoung/Ladada_1.wav.wav...
No supported Nvidia GPU found
Traceback (most recent call last):
  File "/Users/liam/Downloads/RVC_CLI/rvc/infer/infer.py", line 229, in <module>
    rms_mix_rate = float(sys.argv[12])
ValueError: could not convert string to float: 'True'

Seems like rms_mix_rate=True is sneaking in somewhere, and resulting in an error when converted to float. But where is it coming from? I removed all arguments that use True/False from the cmd, but it still ends up with this error.

Docker

Hi,
I am trying to make this work as a docker container too, but can't really get it to work...
Maybe there is already a Dockerfile out there?

If not, this is my current Dockerfile (currently I wanted to test inference first, so I copied my models in):


FROM nvidia/cuda:11.6.2-cudnn8-runtime-ubuntu20.04

# Create a working directory
WORKDIR /app

# Install dependencies to add PPAs and git
RUN apt-get update && \
    apt-get install -y -qq ffmpeg aria2 && apt clean && \
    apt-get install -y software-properties-common && \
    apt-get install -y git && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Add the deadsnakes PPA to get Python 3.9
RUN add-apt-repository ppa:deadsnakes/ppa

# Clone the repository
RUN git clone https://github.com/blaise-tk/RVC_CLI.git

# Set the working directory to the cloned repo
WORKDIR /app/RVC_CLI

# Install Python 3.9 and pip
RUN apt-get update && \
    apt-get install -y build-essential python-dev python3-dev python3.9-distutils python3.9-dev python3.9 curl && \
    apt-get clean && \
    update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 1 && \
    curl https://bootstrap.pypa.io/get-pip.py | python3.9

# Set Python 3.9 as the default
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.9 1

# Install Python dependencies
RUN chmod +x install.sh
RUN ./install.sh

# Create an __init__.py file in the /app/RVC_CLI/rvc folder
RUN touch /app/RVC_CLI/rvc/__init__.py

# Download prerequisites
RUN python rvc.py prerequisites --pretraineds_v1 True --pretraineds_v2 True --models True --exe True

# Copy the audio file into the container
COPY audio.mp3 /app/RVC_CLI/audio.mp3
COPY Jari.pth /app/RVC_CLI/Jari.pth
COPY Jari.index /app/RVC_CLI/Jari.index

# Set the entrypoint to keep the container running
CMD ["tail", "-f", "/dev/null"]

It builds, but when I run infer.py eg with this:
python rvc.py infer --f0up_key 0 --filter_radius 3 --index_rate 0.3 --hop_length 128 --rms_mix_rate 1.0 --protect 0.3 --f0autotune False --f0method rmvpe --input_path /app/RVC_CLI/audio.mp3 --output_path /output/audio_out.mp3 --pth_path /app/RVC_CLI/Jari.pth --index_path /app/RVC_CLI/Jari.index

I don't get an output file.

I am pretty new to Docker so maybe someone more experienced could figure a good setup out quite quickly :)
And I think this would make the repo even easier to use for the average user.

Inference fails on apple silicon

First of all, thank you, this is the first cli for rvc that actually works!! I've been trying all kinds of solutions. Below is a minor enhancement you could make.

The following error is experienced when inferencing on apple silicon:
The operator 'aten::_fft_r2c' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on pytorch/pytorch#77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
Voice conversion failed: cannot unpack non-iterable NoneType object

Setting the mps fallback as mentioned works but could be handled in your code.

The file and folder names are the same: "rvc".

I noticed a potential conflict with the file "rvc.py" and the folder named "rvc". To avoid confusion or issues, could we please rename either the file or the folder?

[BUG] '<' not supported between instances of 'str' and 'float'

Describe the bug
It looks like the value of args.protect isn't being converted properly to a float. I've tried converting the arg to a float directly but it produces errors related to the tensor size in torch.from_numpy(npy).unsqueeze(0).to(self.device) * index_rate + (1 - index_rate) * feats. Fiddled with it a bit but couldn't get anything worthwhile to come out.
.......................
To Reproduce
Steps to reproduce the behavior:
$ python3 main.py batch_infer --f0up_key 5 --filter_radius 4 --index_rate 0.9 --hop_length 128 --rms_mix_rate 1.0 --protect 0.4 --f0autotune True --f0method rmvpe --input_folder "/home/mb/Desktop/" --output_folder "/home/......................." --pth_path "/home/........................pth" --index_path "/home........................index" --export_format WAV

changing the value of protect doesn't seem to change the error.

Expected behavior
For the inference to happen correctly

Desktop (please complete the following information):

  • Linux Mint 21 running kernel 6.5.0-26-generic
  • Firefox?

[BUG]

Bug Description
I ran the install.bat, and now I am trying to run env/python.exe rvc.py or python rvc.py prerequisites. Also, one thing: when I executed the bat file, all the dependencies were installed into my whole local system.

File "C:\Users\ESHAN\Desktop\rvctest\rvc.py", line 10, in
from rvc.configs.config import Config
File "C:\Users\ESHAN\Desktop\rvctest\rvc.py", line 10, in
from rvc.configs.config import Config
ModuleNotFoundError: No module named 'rvc.configs'; 'rvc' is not a package

Desktop Details:
- Windows 11, NVIDIA GTX 1650

API

I'm having some issues with the API call (internal server error) - I'm assuming it's the syntax of the JSON at this point; I've messed around a bit but it keeps returning an error. Here is how the JSON is syntaxed atm:
{
"f0up_key": 0,
"filter_radius": 5,
"index_rate": 0.5,
"hop_length": 256,
"f0method": "rmvpe",
"input_path": "D:\Projects\VoiceChangerAI\TestFile\testa.wav",
"output_path": "D:\Projects\VoiceChangerAI\TestFile\output.wav",
"pth_file": "LB.pth",
"index_path": "LB.index",
"split_audio": false,
}

have "LB.pth" and the index in the "RVC_CLI\models" folder currently?

Thanks for any help - I'm total narb with this stuff >_<

[BUG] RVC doesn't produce a usable result with default settings

Hey guys,

I was able to get the server started and configured to work, and TTS is working too after changing the locale to the short name.
So I'm getting TTS output, but the RVC output is just interference, like a continuous beep. Also, when performing inference, it works well on the original and the fork with the same model. Do I need a special type of model here?

tried both api and cli.

[BUG] API infer not work.

Code Invocation:

 curl --location 'http://127.0.0.1:8000/infer' \
--header 'Content-Type: application/json' \
--data '{
  "f0up_key": 0,
  "filter_radius": 2,
  "index_rate": 0.5,
  "hop_length": 256,
  "rms_mix_rate": 0.5,
  "protect": 0.5,
  "f0autotune": false,
  "f0method": "rmvpe",
  "input_path": "/home/.../RVC_CLI/input/018b3ee3-50a3-7b40-8b02-c99d3753a8a4.mp3",
  "output_path": "/home/.../RVC_CLI/output/1.wav",
  "pth_path": "/home/.../RVC_CLI/logs/Alisa/Alisa.pth",
  "index_path": "/home/.../RVC_CLI/logs/Alisa/added_IVF757_Flat_nprobe_1_Alisa_v2.index",
  "split_audio": false,
  "clean_audio": false,
  "clean_strength": 0.5,
  "export_format": "WAV"
}

Result:


{
    "output": "<All keys matched successfully>\nConversion completed. Output file: '/home/.../RVC_CLI/output/1.wav' in 4.14 seconds.\n",
    "error": "/home/.../anaconda3/envs/rvc_cli/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.\n  warnings.warn(\"torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.\")\n"
}

However, the file does not appear in the output folder.

Moreover, to make it work, I changed:


@app.post("/infer")
async def infer(request: Request):
    command = ["python", "main.py", "infer"]

    json_data = await request.json()

    command += [f"--{key}={value}" for key, value in json_data.items()]

    return execute_command(command)

Also, it would be good to format the output as follows:

When the status code is 200:

{"audio_content": "path/name.wav", "message": "..."}

For other status codes:

{"message": "...", "error": "..."}

[BUG] Training Threshold set to incorrect value when no value is set in command

Describe the bug
Get the following error, whether overtrain_detector is set to true/false and whether overtrain_threshold is set to an integer value or not:
train.py: error: argument -ot/--overtraining_threshold: invalid int value: 'False'
To Reproduce
Steps to reproduce the behavior:
Run with the following options:
{python_path} main.py train --model_name {model_name} --sampling_rate 40000 --pitch_guidance True --gpu 1 --save_every_epoch 50 --save_only_latest True --overtraining_detector False

Expected behavior
Training runs and completes

Desktop (please complete the following information):

  • OS: Windows 11

[BUG]

Can anyone help please:

I'm trying to generate a new pth and got follow error:

"FileNotFoundError: [Errno 2] No such file or directory: '/home/user/RVC_CLI/logs/mute/sliced_audios/mute40000.wav'"
Step To get a error:

  1. Preprocess Dataset (ok in this step)
    python3 rvc.py preprocess --model_name "johnnyc" --dataset_path "../models/sample/" --sampling_rate "40000"
  2. Extract Features (ok in this step)
    python3 rvc.py extract --model_name "johnnyc" --rvc_version "v2" --sampling_rate "40000"
  3. Start Training
    python3 rvc.py train --model_name "johnnyc" --rvc_version "v2" --save_every_epoch "3" --sampling_rate "40000"
    Got a follow error:
Process Process-1:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "/home/user/RVC_CLI/rvc/train/train.py", line 215, in run
    train_dataset = TextAudioLoaderMultiNSFsid(hps.data)
  File "/home/user/RVC_CLI/rvc/train/data_utils.py", line 21, in __init__
    self._filter()
  File "/home/user/RVC_CLI/rvc/train/data_utils.py", line 29, in _filter
    lengths.append(os.path.getsize(audiopath) // (3 * self.hop_length))
  File "/usr/lib/python3.10/genericpath.py", line 50, in getsize
    return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/RVC_CLI/logs/mute/sliced_audios/mute40000.wav'
Saved index file '/home/user/RVC_CLI/logs/johnnyc/added_IVF94_Flat_nprobe_1_v2.index'

Training args

The voice gets distorted when chunks are made, resulting in a robotic voice in the output.

IP and PORT change for API

Hello, I didn't find any other contacts, so I'll write the problem here.

Question context:

I'm trying to write an application in C# (WinForms) using your solution. I am weak in programming, outside the zone of simple C# applications, but I was interested in the functionality of the RVC library. I want to try to establish interaction with the RVC by organizing a server and sending requests to it from clients from the global network. But I encountered a problem when the API server is launched on a local machine, where there is a suitable video card for work, but on the machine with which the application is being developed - there is no. And to establish communication with my "server" where your solution will be deployed, I tried to change the server launch parameters to a local IP (192.168.x.x) and another port, but failed.

The main point of the question:

Can I somehow use launch parameters (for example the main.py api [-ip or -port] file) to change the parameters of the API server (uvicorn server)?
If not, is it possible to add such functionality to the "main.py api" command?

Stuff to improve GUI wise or ya get what i mean :3

So first things first, the infer section looks alright. Moving on to training, you guys can definitely improve some stuff there:

  1. Preprocess Dataset: I believe you guys should set the preset sample rate to 32k and not 40k
  2. Extract Features: eradicate v1, it's outdated and uses a smaller hubert (I think), but regardless it lacks in dynamics much more than the v2 arch we have for RVC, so you can just remove the option of choosing versions and auto-set it to v2.
    The rest looks fine for now, I really like it at least. One thing you can also add though is model fusion and all the other stuff the GUI has, like ckpt processing and so on and so forth. Other than that it works wonderfully :3

Error: 'config' when trying to infer from a model that I trained

Describe the bug
Inference from a model I trained isn't working and just printing out "Error: 'config'" instead

To Reproduce
I used the colab in the repo, with the change that I made the main directory on my Google Drive so I don't have to pull every time and so the models I make are automatically saved.

I trained a model, and maybe this is the problem, with v2 and 40k sample rate. I later saw in the configs that there is no v2-40k config.

I then tried the inference there, and it didn't work out, it just spat out "Error: 'config'"

I traced it to the vc pipeline where it loads the checkpoint and tries to access ckpt['config'].
Then outside of the code, I loaded the checkpoints saved from my training and they do not have a 'config' key.
So I looked at the code that saves the checkpoints, and there is no 'config' key there either.

I think I'm doing something wrong, but I'm not sure what.

  1. should all of the saved G's be usable? I think the most possible issue is that I'm using the wrong model file. I'm using the ones in the log/model_name directory. Usually RVC saves other weights in the weights directory, but there isn't one here.
  2. As mentioned above, I checked the training code where it saves, and there is no explicit 'config' in the saved pth when it's saving the epoch checkpoints. Are there different models being saved?
  3. Is V2 + 40k supported even though there is no config file for it?
  4. Could having the repo on Google drive cause issues? I know sometimes it causes Linux path issues.
  5. save_only_latest seems to not be a usable flag since the best model might not be the latest, and usually we need to go back and see the performance and pick the one that's best. How is this flag really used?

Expected behavior
Inference works

[BUG] '<' not supported between instances of 'str' and 'float' / No output file

Describe the bug
During infer an error/warning is reported, " '<' not supported between instances of 'str' and 'float' "
and no output file is written despite the output saying it is.

To Reproduce
Steps to reproduce the behavior:
Run the command

PS M:\LLMs\tts\RVC_CLI> .\env\python.exe main.py infer `
>> --index_path '.\rvcs\test0.index' `
>> --pth_path '.\rvcs\test0.pth' `
>> --input_path '.\output.wav' `
>> --output_path 'M:\LLMs\tts\RVC_CLI\output-rvc.wav'
<All keys matched successfully>
'<' not supported between instances of 'str' and 'float'
Conversion completed. Output file: 'M:\LLMs\tts\RVC_CLI\output-rvc.wav' in 2.22 seconds.
PS M:\LLMs\tts\RVC_CLI>

Expected behavior
An output file processed with the supplied pth/index

Desktop (please complete the following information):
Windows 11

Additional context
If I checkout tag 1.1.2 it all works

Slight quality issues

Hi,
Now it's a lot better than before. The parameters work well and the quality is better than before.

However, in some places, it makes the voice sound like an old grandmother struggling to speak.

Here's my command:

python main.py infer --f0up_key "2" --filter_radius 5 --index_rate "0.1" --hop_length "25" --f0method "dio" --input_path "input.wav" --output_path "output.wav" --pth_path "rvcfinalv4-harvest-1000epochs.pth" --index_path "rvcfinalv4-harvest-1000epochs.index" --split_audio "False" --f0autotune "False"

Let me know if we can do anything to improve the voice quality.

Once again, your work is great in the RVC commandline space. Yours is the best commandline tool for RVC, better than all the ones even released by the original RVC project. So, Thank you.

Training problem

I was trying to train a model using the OV2 pretraining model, and I came across a strange thing, for every 1 epoch only 1 step was generated

Example:
model_10_epoch_10_steps
model_100_epoch_100_steps

I am using the code on Kaggle, which uses conda with python 3.10

Sorry for the bad English, it's not my first language

[BUG] API won't start

Normal inference works, but when I try API, with or without host/port arguments, I get an error:

\RVC_CLI> ./env/python.exe main.py api
Error: [WinError 2] The system cannot find the file specified

[BUG?] It is not possible to use more than one GPU for training on Kaggle

Describe the bug
When I try to use 2 GPUs on Kaggle, this error occurs:
[W socket.cpp:663] [c10d] The client socket has failed to connect to [localhost]:55292 (errno: 99 - Cannot assign requested address).

To Reproduce
Steps to reproduce the behavior:

  1. Simply put "0-1" in --gpu in training part on Kaggle

Expected behavior
Be able to use two GPUs for training.

Assets
image

Desktop (please complete the following information):

  • OS: Linux
  • Browser: chrome

Additional context
Yes, I was using the 2x T4, and was using the last commit ("fix multi gpu")

robotic sounding output

Static (robotic) noise in the generated output. I even tried up to 3500 epochs but had no success using the command line.

However, when I use the gui, it works. I am not sure what the issue is.

#!/bin/bash

# Define variables
MODEL_NAME="MyVoiceModel"
VOICE_DATA_PATH="voice/myvoice.wav"
SAMPLE_RATE=48000
RVC_VERSION="v2"
HOP_LENGTH=256
F0METHOD="rmvpe"
TOTAL_EPOCHS=1000
BATCH_SIZE=16
GPU=0 # Adjust if you have multiple GPUs
SAVE_EVERY_EPOCH=10
TEXT_TO_SYNTHESIZE="This is a sample text for voice conversion."

# Step 1: Preprocess Dataset
echo "Preprocessing Dataset..."
python main.py preprocess "$MODEL_NAME" "$VOICE_DATA_PATH" $SAMPLE_RATE

# Step 2: Extract Features
echo "Extracting Features..."
python main.py extract "$MODEL_NAME" $RVC_VERSION $F0METHOD $HOP_LENGTH $SAMPLE_RATE

# Step 3: Train the Model
echo "Training the Model..."
python main.py train "$MODEL_NAME" $RVC_VERSION $SAVE_EVERY_EPOCH False True $TOTAL_EPOCHS $SAMPLE_RATE $BATCH_SIZE $GPU True False False

# Step 4: Generate Index File
echo "Generating Index File..."
python main.py index "$MODEL_NAME" $RVC_VERSION

# Step 5: Voice Conversion Inference (Modify paths to the model and index files as needed)
echo "Performing Voice Conversion Inference..."
python main.py infer "$TEXT_TO_SYNTHESIZE" "$MODEL_NAME" 0 5 0.5 $HOP_LENGTH "$F0METHOD" "output_tts.wav" "output_rvc.wav" "path_to_trained_model/$MODEL_NAME.pth" "path_to_index_file/$MODEL_NAME.index"

echo "Voice Conversion Process Completed."

split_audio does not seem to work?

Setting it to True or False always results in the same output.wav? How can I get the voice-swapped output combined with the original instrumental? Thanks. This one has been the easiest tool to use so far.

Issue with Model Output Generating Noise

The model is training successfully, but when attempting to process a file, the output consists only of squeaks and noise. This issue persists even when using alternative models; the resulting audio remains distorted. Interestingly, utilizing a model trained in a different version of RVC yields normal functioning.
Another instance of RVC works fine in this machine.

What could be the underlying cause? Various output file formats have been experimented with to no avail.

Using VDS

  • OS: Ubuntu 20.04
  • RTX 3060

API -

Still having issues with the API. The main infer worked with the same input as the JSON below. I tried messing around with the format a bit but no luck.

JSON:

{
"f0up_key": "2",
"filter_radius": "5",
"index_rate": "0.1",
"hop_length": "25",
"f0method": "dio",
"input_path": "D:\Projects\VoiceChangerAI\TestFile\testa.wav",
"output_path": "D:\Projects\VoiceChangerAI\TestFile\output_API.wav",
"pth_path": "C:\Users\KCLEE\Documents\GitHub\models\LenvalBrown.pth",
"index_path": "C:\Users\KCLEE\Documents\GitHub\models\LenvalBrown.index",
"split_audio": "false",
"f0autotune": "false"
}

the console spits out this:

Traceback (most recent call last):
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\main.py", line 953, in
main()
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\main.py", line 947, in main
run_api_script()
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\main.py", line 385, in run_api_script
subprocess.run(command)
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 507, in run
stdout, stderr = process.communicate(input, timeout=timeout)
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 1126, in communicate
self.wait()
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 1189, in wait
return self._wait(timeout=timeout)
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 1486, in _wait
result = _winapi.WaitForSingleObject(self._handle,

Client side I get a "error 400 - bad request"

Does not work?

No matter what I try for the hop length value, with or without double quotes/single quotes, it won't work. It became very frustrating.

python main.py infer --f0up_key "0" --filter_radius "5" --index_rate "0.5" --hop_length "256" --f0method "dio" --input_path "input.wav" --output_path "output.wav" --pth_file "model.pth" --index_path "model.index" --split_audio "False" --f0autotune "False"

It was better before you changed the arguments. At least it worked, even if it was robotic. But now I am totally unable to use it.
