Ensure that you have the necessary Python packages installed by following these steps (Python 3.9 is recommended):
Windows
Execute the install.bat file to activate a Conda environment. Afterward, launch the application using env/python.exe rvc.py instead of the conventional python rvc.py command.
Linux
chmod +x install.sh
./install.sh
Getting Started
Download the necessary models and executables by running the following command:
python rvc.py prerequisites
More information about the prerequisites command here
For detailed information and command-line options, refer to the help command:
python rvc.py -h
This command provides a clear overview of the available modes and their corresponding parameters, facilitating effective utilization of the RVC CLI.
Set the pitch of the audio; the higher the value, the higher the pitch.
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| filter_radius | No | 3 | 0 to 10 | If the value is 3 or greater, median filtering is applied to the extracted pitch, which can reduce breathiness. |
| index_rate | No | 0.3 | 0.0 to 1.0 | Influence of the index file; a higher value means greater influence. Lower values can help reduce artifacts in the audio. |
| hop_length | No | 128 | 1 to 512 | How long the system takes to react to a significant pitch change. Smaller hop lengths need more inference time but tend to yield higher pitch accuracy. |
| rms_mix_rate | No | 1 | 0 to 1 | Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is used. |
| protect | No | 0.33 | 0 to 0.5 | Protects distinct consonants and breathing sounds from electro-acoustic tearing and other artifacts. The maximum value of 0.5 offers full protection; lowering it reduces protection but may lessen the indexing effect. |
| f0autotune | No | False | True or False | Apply a soft autotune to your inferences; recommended for singing conversions. |
| f0method | No | rmvpe | | Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases. |
| input_path | Yes | None | Full path to the input audio file | Full path to the input audio file |
| output_path | Yes | None | Full path to the output audio file | Full path to the output audio file |
| pth_path | Yes | None | Full path to the pth file | Full path to the pth file |
| index_path | Yes | None | Full index file path | Full index file path |
| split_audio | No | False | True or False | Split the audio into chunks for inference; can improve results in some cases. |
| clean_audio | No | False | True or False | Clean the output audio using noise-detection algorithms; recommended for spoken audio. |
| clean_strength | No | 0.7 | 0.0 to 1.0 | Clean-up level; the higher the value, the more aggressively the audio is cleaned, but it may sound more compressed. |
| export_format | No | WAV | WAV, MP3, FLAC, OGG, M4A | Audio file format |
| embedder_model | No | hubert | hubert or contentvec | Embedder model to use for the audio conversion. The default model, hubert, is recommended for most cases. |
| upscale_audio | No | False | True or False | Upscale the audio to 48 kHz for better results. |
Refer to python rvc.py infer -h for additional help.
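As a rough illustration of the filter_radius behavior described in the table, here is a sketch of median filtering applied to a pitch (f0) contour. It is illustrative only, not the project's actual implementation:

```python
# Conceptual sketch: median filtering with filter_radius=3 replaces each
# pitch value with the median of its neighborhood, so an isolated spike
# (e.g. a breath misread as a high pitch) is smoothed away.

def median_filter(f0, radius=3):
    """Replace each value with the median of a window of size 2*radius + 1."""
    out = []
    for i in range(len(f0)):
        lo = max(0, i - radius)
        hi = min(len(f0), i + radius + 1)
        window = sorted(f0[lo:hi])
        out.append(window[len(window) // 2])
    return out

contour = [220.0, 221.0, 219.0, 880.0, 220.0, 222.0, 221.0]  # one spike
smoothed = median_filter(contour, radius=3)
```

With radius 3 the 880 Hz spike is replaced by the neighborhood median, while the rest of the contour stays close to its original values.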
Set the pitch of the audio; the higher the value, the higher the pitch.
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| filter_radius | No | 3 | 0 to 10 | If the value is 3 or greater, median filtering is applied to the extracted pitch, which can reduce breathiness. |
| index_rate | No | 0.3 | 0.0 to 1.0 | Influence of the index file; a higher value means greater influence. Lower values can help reduce artifacts in the audio. |
| hop_length | No | 128 | 1 to 512 | How long the system takes to react to a significant pitch change. Smaller hop lengths need more inference time but tend to yield higher pitch accuracy. |
| rms_mix_rate | No | 1 | 0 to 1 | Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is used. |
| protect | No | 0.33 | 0 to 0.5 | Protects distinct consonants and breathing sounds from electro-acoustic tearing and other artifacts. The maximum value of 0.5 offers full protection; lowering it reduces protection but may lessen the indexing effect. |
| f0autotune | No | False | True or False | Apply a soft autotune to your inferences; recommended for singing conversions. |
| f0method | No | rmvpe | | Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases. |
| input_folder_path | Yes | None | Full path to the input audio folder (the folder may only contain audio files) | Full path to the input audio folder |
| output_folder_path | Yes | None | Full path to the output audio folder | Full path to the output audio folder |
| pth_path | Yes | None | Full path to the pth file | Full path to the pth file |
| index_path | Yes | None | Full path to the index file | Full path to the index file |
| split_audio | No | False | True or False | Split the audio into chunks for inference; can improve results in some cases. |
| clean_audio | No | False | True or False | Clean the output audio using noise-detection algorithms; recommended for spoken audio. |
| clean_strength | No | 0.7 | 0.0 to 1.0 | Clean-up level; the higher the value, the more aggressively the audio is cleaned, but it may sound more compressed. |
| export_format | No | WAV | WAV, MP3, FLAC, OGG, M4A | Audio file format |
| embedder_model | No | hubert | hubert or contentvec | Embedder model to use for the audio conversion. The default model, hubert, is recommended for most cases. |
| upscale_audio | No | False | True or False | Upscale the audio to 48 kHz for better results. |
Refer to python rvc.py batch_infer -h for additional help.
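To make the rms_mix_rate description above more concrete, here is a hypothetical sketch of an envelope blend on per-frame RMS values. The helper names and the linear mix are illustrative assumptions, not the project's code:

```python
import math

# Hypothetical sketch of an rms_mix_rate-style envelope blend; the project's
# real implementation may differ. At rate=1 the output keeps its own volume
# envelope; at rate=0 the envelope follows the input instead.

def frame_rms(samples, frame):
    # RMS level of each non-overlapping frame of `frame` samples.
    return [math.sqrt(sum(s * s for s in samples[i:i + frame]) / frame)
            for i in range(0, len(samples) - frame + 1, frame)]

def blend_envelope(input_rms, output_rms, rate):
    # Linear per-frame mix of the two envelopes (illustrative assumption).
    return [o * rate + i * (1 - rate) for i, o in zip(input_rms, output_rms)]

env = frame_rms([1.0, 1.0, -1.0, -1.0], 2)             # per-frame levels
blended = blend_envelope([0.5, 0.1], [0.3, 0.3], 1.0)  # output envelope only
```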
Set the pitch of the audio; the higher the value, the higher the pitch.
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| filter_radius | No | 3 | 0 to 10 | If the value is 3 or greater, median filtering is applied to the extracted pitch, which can reduce breathiness. |
| index_rate | No | 0.3 | 0.0 to 1.0 | Influence of the index file; a higher value means greater influence. Lower values can help reduce artifacts in the audio. |
| hop_length | No | 128 | 1 to 512 | How long the system takes to react to a significant pitch change. Smaller hop lengths need more inference time but tend to yield higher pitch accuracy. |
| rms_mix_rate | No | 1 | 0 to 1 | Substitute or blend with the volume envelope of the output. The closer the ratio is to 1, the more the output envelope is used. |
| protect | No | 0.33 | 0 to 0.5 | Protects distinct consonants and breathing sounds from electro-acoustic tearing and other artifacts. The maximum value of 0.5 offers full protection; lowering it reduces protection but may lessen the indexing effect. |
| f0autotune | No | False | True or False | Apply a soft autotune to your inferences; recommended for singing conversions. |
| f0method | No | rmvpe | | Pitch extraction algorithm to use for the audio conversion. The default algorithm is rmvpe, which is recommended for most cases. |
| output_tts_path | Yes | None | Full path to the output TTS audio file | Full path to the output TTS audio file |
| output_rvc_path | Yes | None | Full path to the output RVC audio file | Full path to the output RVC audio file |
| pth_path | Yes | None | Full path to the pth file | Full path to the pth file |
| index_path | Yes | None | Full path to the index file | Full path to the index file |
| split_audio | No | False | True or False | Split the audio into chunks for inference; can improve results in some cases. |
| clean_audio | No | False | True or False | Clean the output audio using noise-detection algorithms; recommended for spoken audio. |
| clean_strength | No | 0.7 | 0.0 to 1.0 | Clean-up level; the higher the value, the more aggressively the audio is cleaned, but it may sound more compressed. |
| export_format | No | WAV | WAV, MP3, FLAC, OGG, M4A | Audio file format |
| embedder_model | No | hubert | hubert or contentvec | Embedder model to use for the audio conversion. The default model, hubert, is recommended for most cases. |
| upscale_audio | No | False | True or False | Upscale the audio to 48 kHz for better results. |
Refer to python rvc.py tts_infer -h for additional help.
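The index_rate parameter amounts to a linear mix between features retrieved from the .index file and the features computed by the embedder; an expression of the form `indexed * index_rate + (1 - index_rate) * feats` appears in the project's inference code. A minimal sketch of that mix:

```python
# Sketch of what index_rate controls: a linear mix between features looked
# up in the .index file and features computed from the input audio. The
# real pipeline works on tensors; plain lists are used here for clarity.

def mix_features(indexed, feats, index_rate):
    return [a * index_rate + b * (1 - index_rate)
            for a, b in zip(indexed, feats)]

indexed = [1.0, 1.0, 1.0]   # features retrieved from the training index
feats = [0.0, 0.0, 0.0]     # features from the embedder (e.g. hubert)
half = mix_features(indexed, feats, 0.5)   # halfway between the two
```

At index_rate 1.0 the output is pulled entirely toward the training-set timbre; at 0.0 the index has no effect, which is why lower values can reduce index-related artifacts.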
By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential.
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| hop_length | No | 128 | 1 to 512 | How long the system takes to react to a significant pitch change. Smaller hop lengths need more inference time but tend to yield higher pitch accuracy. |
| sampling_rate | Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data |
| embedder_model | No | hubert | hubert or contentvec | Embedder model to use for the audio conversion. The default model, hubert, is recommended for most cases. |
Determines the interval, in epochs, at which the model is saved.
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| save_only_latest | No | False | True or False | Save only the most recent versions of the G and D files, conserving storage space. |
| save_every_weights | No | True | True or False | Save the weights of the model at the end of each epoch. |
| total_epoch | No | 1000 | 1 to 10000 | Total number of epochs for the model training process. |
| sampling_rate | Yes | None | 32000, 40000, or 48000 | Sampling rate of the audio data |
| batch_size | No | 8 | 1 to 50 | It's advisable to align it with the available VRAM of your GPU. A setting of 4 offers improved accuracy but slower processing, while 8 provides faster, standard results. |
| gpu | No | 0 | 0 to ∞, separated by - | Specify the GPUs you wish to use for training by entering their indices separated by hyphens (-). |
| pitch_guidance | No | True | True or False | Pitch guidance makes it possible to mirror the intonation of the original voice, including its pitch. This is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential. |
| overtraining_detector | No | False | True or False | Use the overtraining detector to prevent overfitting, particularly valuable when the model is at risk of overfitting. |
| overtraining_threshold | No | 50 | 1 to 100 | Threshold for the overtraining detector. The lower the value, the more sensitive the detector. |
| pretrained | No | True | True or False | Use pretrained models when training your own. This reduces training time and improves overall quality. |
| custom_pretrained | No | False | True or False | Custom pretrained models can lead to superior results, since pretrained models tailored to the specific use case can significantly enhance performance. |
| g_pretrained | No | None | Full path to pretrained file G, only if you have used custom_pretrained | Full path to pretrained file G |
| d_pretrained | No | None | Full path to pretrained file D, only if you have used custom_pretrained | Full path to pretrained file D |
| sync_graph | No | False | True or False | Synchronize the TensorBoard graph. Only enable this setting if you are training a new model. |
Refer to python rvc.py train -h for additional help.
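The docs do not spell out the overtraining detector's criterion. One plausible reading (an assumption, not confirmed by the project) is patience-based early stopping, where the threshold counts epochs without improvement, so a lower threshold stops sooner:

```python
# Hedged sketch of an overtraining check in the spirit of
# overtraining_detector/overtraining_threshold. Assumption (not confirmed
# by the docs): the threshold is the number of epochs without improvement
# tolerated before stopping; lower = more sensitive.

def epochs_until_stop(losses, threshold=50):
    """Return the 1-based epoch at which training would stop, or None."""
    best = float("inf")
    stale = 0
    for epoch, loss in enumerate(losses, start=1):
        if loss < best:
            best = loss
            stale = 0
        else:
            stale += 1
            if stale >= threshold:
                return epoch
    return None

# Loss improves twice, then plateaus; with threshold=3 training halts
# three stale epochs after the last improvement.
stop = epochs_until_stop([1.0, 0.8, 0.8, 0.8, 0.8, 0.8], threshold=3)
```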
Generate Index File
python rvc.py index --model_name "model_name" --rvc_version "rvc_version"
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| model_name | Yes | None | Name of the model | Name of the model |
| rvc_version | Yes | None | v1 or v2 | Version of the model |
Refer to python rvc.py index -h for additional help.
UVR
python uvr.py [audio_file] [options]
Info and Debugging
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| audio_file | Yes | None | Any valid audio file path | The path to the audio file you want to separate, in any common format. |
| -d, --debug | No | False | | Enable debug logging. |
| -e, --env_info | No | False | | Print environment information and exit. |
| -l, --list_models | No | False | | List all supported models and exit. |
| --log_level | No | info | info, debug, warning | Log level. |
Separation I/O Params
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| -m, --model_filename | No | UVR-MDX-NET-Inst_HQ_3.onnx | Any valid model file path | Model to use for separation. |
| --output_format | No | WAV | Any common audio format | Output format for separated files. |
| --output_dir | No | None | Any valid directory path | Directory to write output files. |
| --model_file_dir | No | /tmp/audio-separator-models/ | Any valid directory path | Model files directory. |
Common Separation Parameters
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| --invert_spect | No | False | | Invert secondary stem using spectrogram. |
| --normalization | No | 0.9 | Any float value | Max peak amplitude to normalize input and output audio to. |
| --single_stem | No | None | Instrumental, Vocals, Drums, Bass, Guitar, Piano, Other | Output only a single stem. |
| --sample_rate | No | 44100 | Any integer value | Modify the sample rate of the output audio. |
MDXC Architecture Parameters
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| --mdxc_segment_size | No | 256 | Any integer value | Size of segments for MDXC architecture. |
| --mdxc_override_model_segment_size | No | False | | Override the model's default segment size. |
| --mdxc_overlap | No | 8 | 2 to 50 | Amount of overlap between prediction windows for MDXC architecture. |
| --mdxc_batch_size | No | 1 | Any integer value | Batch size for MDXC architecture. |
| --mdxc_pitch_shift | No | 0 | Any integer value | Shift audio pitch by a number of semitones while processing for MDXC architecture. |
MDX Architecture Parameters
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| --mdx_segment_size | No | 256 | Any integer value | Size of segments for MDX architecture. |
| --mdx_overlap | No | 0.25 | 0.001 to 0.999 | Amount of overlap between prediction windows for MDX architecture. |
| --mdx_batch_size | No | 1 | Any integer value | Batch size for MDX architecture. |
| --mdx_hop_length | No | 1024 | Any integer value | Hop length for MDX architecture. |
| --mdx_enable_denoise | No | False | | Enable denoising during separation for MDX architecture. |
Demucs Architecture Parameters
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| --demucs_segment_size | No | Default | Any integer value | Size of segments for Demucs architecture. |
| --demucs_shifts | No | 2 | Any integer value | Number of predictions with random shifts for Demucs architecture. |
| --demucs_overlap | No | 0.25 | 0.001 to 0.999 | Overlap between prediction windows for Demucs architecture. |
| --demucs_segments_enabled | No | True | | Enable segment-wise processing for Demucs architecture. |
VR Architecture Parameters
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| --vr_batch_size | No | 4 | Any integer value | Batch size for VR architecture. |
| --vr_window_size | No | 512 | Any integer value | Window size for VR architecture. |
| --vr_aggression | No | 5 | -100 to 100 | Intensity of primary stem extraction for VR architecture. |
| --vr_enable_tta | No | False | | Enable Test-Time-Augmentation for VR architecture. |
| --vr_high_end_process | No | False | | Mirror the missing frequency range of the output for VR architecture. |
| --vr_enable_post_process | No | False | | Identify leftover artifacts within vocal output for VR architecture. |
| --vr_post_process_threshold | No | 0.2 | 0.1 to 0.3 | Threshold for post-process feature for VR architecture. |
By employing pitch guidance, it becomes feasible to mirror the intonation of the original voice, including its pitch. This feature is particularly valuable for singing and other scenarios where preserving the original melody or pitch pattern is essential.
| Parameter Name | Required | Default | Valid Options | Description |
| --- | --- | --- | --- | --- |
| rvc_version | Yes | None | v1 or v2 | Version of the model |
| epoch | Yes | None | 1 to 10000 | Total number of epochs for the model training process. |
| step | Yes | None | 1 to ∞ | Total number of steps for the model training process. |
Feature requests
a) I use RVC with a GPU for inference; can you enable GPU inference in the CLI as well?
b) Can the temporary files be kept inside a folder, say temp, in the project directory? That would make housekeeping easier.
I've got Applio running on my M2 Max Mac Studio but Batch Conversion is not working. To get further information I cloned this git here and tried the CLI batch conversion, which also does not work. Single conversion works fine with CLI and Applio.
This is my single conversion cmd, which results in a working file: python main.py infer --f0up_key "0" --filter_radius "3" --index_rate "0.8" --hop_length "64" --split_audio "True" --f0autotune "False" --f0method "rmvpe" --input_path "/Users/liam/Music/RVC/city_of_angels/hmmmh.wav" --output_path "/Users/liam/Downloads/test/test.wav" --pth_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super.pth" --index_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super_clean.index"
This is batch-conversion, which results in an error, no matter if rms_mix_rate and other parameters are included in the cmd or not: python main.py batch_infer --f0up_key "0" --filter_radius "3" --index_rate "0.8" --hop_length "64" --split_audio "True" --f0autotune "False" --f0method "rmvpe" --input_folder "/Users/liam/Music/RVC/love_me_down/ValYoung" --output_folder "/Users/liam/Downloads/test" --pth_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super.pth" --index_path "/Applications/RVC_Applio/logs/40k_natedogg_super/40k_natedogg_super_clean.index" --rms_mix_rate "0.0"
The conversion fails with the following error:
Inferring /Users/liam/Music/RVC/love_me_down/ValYoung/Ladada_1.wav.wav...
No supported Nvidia GPU found
Traceback (most recent call last):
File "/Users/liam/Downloads/RVC_CLI/rvc/infer/infer.py", line 229, in <module>
rms_mix_rate = float(sys.argv[12])
ValueError: could not convert string to float: 'True'
Seems like rms_mix_rate=True is sneaking in somewhere, resulting in an error when it is converted to float. But where is it coming from? I removed all arguments that use True/False from the command, but it still ends up with this error.
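The traceback shows infer.py reading parameters by fixed position (`float(sys.argv[12])`), so omitting any earlier argument shifts every later one, and an unrelated token like 'True' can land in rms_mix_rate's slot. A sketch of that failure mode with a hypothetical argument list:

```python
# Positional argv parsing is fragile: drop one argument and every later
# value shifts into the wrong slot. Here index 5 is expected to hold a
# float, but a shifted boolean token sits there instead.

argv = ["infer.py", "0", "3", "0.8", "64", "True", "False", "rmvpe"]

def parse_positional(args, index):
    return float(args[index])

try:
    parse_positional(argv, 5)  # expects a float here, finds 'True'
except ValueError as err:
    message = str(err)
```

Keyword-style parsing (argparse with named options) avoids this class of bug entirely, since values are bound to names rather than positions.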
Hi,
I am trying to make this work as a docker container too, but can't really get it to work...
Maybe there is already a Dockerfile out there?
If not, this is my current Dockerfile (currently I wanted to test inference first, so I copied my models in):
FROM nvidia/cuda:11.6.2-cudnn8-runtime-ubuntu20.04
# Create a working directory
WORKDIR /app
# Install dependencies to add PPAs and git
RUN apt-get update && \
apt-get install -y -qq ffmpeg aria2 && apt clean && \
apt-get install -y software-properties-common && \
apt-get install -y git && \
apt-get clean && \
rm -rf /var/lib/apt/lists/*
# Add the deadsnakes PPA to get Python 3.9
RUN add-apt-repository ppa:deadsnakes/ppa
# Clone the repository
RUN git clone https://github.com/blaise-tk/RVC_CLI.git
# Set the working directory to the cloned repo
WORKDIR /app/RVC_CLI
# Install Python 3.9 and pip
RUN apt-get update && \
apt-get install -y build-essential python-dev python3-dev python3.9-distutils python3.9-dev python3.9 curl && \
apt-get clean && \
update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.9 1 && \
curl https://bootstrap.pypa.io/get-pip.py | python3.9
# Set Python 3.9 as the default
RUN update-alternatives --install /usr/bin/python python /usr/bin/python3.9 1
# Install Python dependencies
RUN chmod +x install.sh
RUN ./install.sh
# create init.py file in app/RVC_CLI/rvc folder
RUN touch /app/RVC_CLI/rvc/__init__.py
# Download prerequisites
RUN python rvc.py prerequisites --pretraineds_v1 True --pretraineds_v2 True --models True --exe True
# Copy the audio file into the container
COPY audio.mp3 /app/RVC_CLI/audio.mp3
COPY Jari.pth /app/RVC_CLI/Jari.pth
COPY Jari.index /app/RVC_CLI/Jari.index
# Set the entrypoint to keep the container running
CMD ["tail", "-f", "/dev/null"]
It builds, but when I run inference, e.g. with this:
python rvc.py infer --f0up_key 0 --filter_radius 3 --index_rate 0.3 --hop_length 128 --rms_mix_rate 1.0 --protect 0.3 --f0autotune False --f0method rmvpe --input_path /app/RVC_CLI/audio.mp3 --output_path /output/audio_out.mp3 --pth_path /app/RVC_CLI/Jari.pth --index_path /app/RVC_CLI/Jari.index
I don't get an output file.
I am pretty new to Docker so maybe someone more experienced could figure a good setup out quite quickly :)
And I think this would make the repo even easier to use for the average user.
First of all, thank you, this is the first cli for rvc that actually works!! I've been trying all kinds of solutions. Below is a minor enhancement you could make.
The following error is experienced when inferencing on apple silicon:
The operator 'aten::_fft_r2c' is not currently implemented for the MPS device. If you want this op to be added in priority during the prototype phase of this feature, please comment on pytorch/pytorch#77764. As a temporary fix, you can set the environment variable PYTORCH_ENABLE_MPS_FALLBACK=1 to use the CPU as a fallback for this op. WARNING: this will be slower than running natively on MPS.
Voice conversion failed: cannot unpack non-iterable NoneType object
Setting the mps fallback as mentioned works but could be handled in your code.
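The workaround named in the error message can indeed be applied in code, as long as the variable is set before torch is imported. A minimal sketch (a CPU fallback for the missing MPS op, not a fix for it):

```python
import os

# Must run before `import torch`, or the setting has no effect.
# WARNING (per the PyTorch message): the fallback op runs on CPU,
# which is slower than native MPS execution.
os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"

# ... `import torch` and the rest of the pipeline only after this point ...
fallback_enabled = os.environ["PYTORCH_ENABLE_MPS_FALLBACK"] == "1"
```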
I noticed a potential conflict with the file "rvc.py" and the folder named "rvc". To avoid confusion or issues, could we please rename either the file or the folder?
I am generating songs using the inference code. The problem is that the output just contains the vocals and I would like to join the instrumentals too.
Could you give me a hand on this?
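Since the CLI outputs only the converted vocals, mixing the instrumental back in is a post-processing step. A minimal sketch on raw sample lists (real audio would be loaded and saved with a library such as soundfile or pydub, and both tracks must share sample rate and length):

```python
# Illustrative mixdown of converted vocals with the original instrumental.
# Samples are assumed normalized to [-1.0, 1.0]; the sum is clipped so the
# mix cannot exceed full scale.

def mix_tracks(vocals, instrumental, vocal_gain=1.0, inst_gain=1.0):
    return [max(-1.0, min(1.0, v * vocal_gain + i * inst_gain))
            for v, i in zip(vocals, instrumental)]

vocals = [0.5, -0.5, 0.9]
instrumental = [0.25, 0.25, 0.9]
mixed = mix_tracks(vocals, instrumental)
```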
Describe the bug
A clear and concise description of what the bug is.
It looks like the value of args.protect isn't being converted properly to a float. I've tried converting the arg to a float directly, but it produces errors related to the tensor size in torch.from_numpy(npy).unsqueeze(0).to(self.device) * index_rate + (1 - index_rate) * feats. I fiddled with it a bit but couldn't get anything worthwhile to come out.
....................... To Reproduce
Steps to reproduce the behavior:
$ python3 main.py batch_infer --f0up_key 5 --filter_radius 4 --index_rate 0.9 --hop_length 128 --rms_mix_rate 1.0 --protect 0.4 --f0autotune True --f0method rmvpe --input_folder "/home/mb/Desktop/" --output_folder "/home/......................." --pth_path "/home/........................pth" --index_path "/home........................index" --export_format WAV
changing the value of protect doesn't seem to change the error.
Expected behavior
For the inference to happen correctly
Assets
If applicable, add screenshots/videos to help explain your problem.
Desktop (please complete the following information):
Linux Mint 21 running kernel 6.5.0-26-generic
Firefox?
Additional context
Add any other context about the problem here.
Bug Description
I ran install.bat, and now I am trying to run env/python.exe rvc.py or python rvc.py prerequisites. Also, when I executed the bat file, all the dependencies were installed into my whole local system.
File "C:\Users\ESHAN\Desktop\rvctest\rvc.py", line 10, in <module>
from rvc.configs.config import Config
ModuleNotFoundError: No module named 'rvc.configs'; 'rvc' is not a package
I'm having some issues with the API call (internal server error). I'm assuming it's the syntax of the JSON at this point; I've messed around a bit, but it keeps returning an error. Here is how the JSON is structured at the moment:
{
"f0up_key": 0,
"filter_radius": 5,
"index_rate": 0.5,
"hop_length": 256,
"f0method": "rmvpe",
"input_path": "D:\Projects\VoiceChangerAI\TestFile\testa.wav",
"output_path": "D:\Projects\VoiceChangerAI\TestFile\output.wav",
"pth_file": "LB.pth",
"index_path": "LB.index",
"split_audio": false,
}
I have "LB.pth" and the index in the "RVC_CLI\models" folder currently.
Thanks for any help - I'm a total noob with this stuff >_<
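Two likely causes of the internal server error are visible in the JSON itself: Windows paths with unescaped backslashes (`"D:\Projects\..."` contains invalid escape sequences such as `\P`) and the trailing comma after the last key, both of which make the body invalid JSON. A quick demonstration:

```python
import json

# Backslashes inside JSON strings must be doubled; forward slashes are an
# alternative that also works for Windows paths.
invalid = '{"input_path": "D:\\Projects\\TestFile\\testa.wav"}'        # \P etc.
escaped = '{"input_path": "D:\\\\Projects\\\\TestFile\\\\testa.wav"}'  # \\ per \
forward = '{"input_path": "D:/Projects/TestFile/testa.wav"}'

try:
    json.loads(invalid)
    parsed_ok = True
except json.JSONDecodeError:
    parsed_ok = False   # invalid escape sequence rejected

path = json.loads(escaped)["input_path"]
```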
I was able to get the server started and configured to work; TTS is working too after changing the locale to the shortname.
So I'm getting TTS output, but the RVC output is just interference, like a continuous beep. Also, when performing inference, it works well on the original and the fork with the same model. Do I need a special type of model here?
{
"output": "<All keys matched successfully>\nConversion completed. Output file: '/home/.../RVC_CLI/output/1.wav' in 4.14 seconds.\n",
"error": "/home/.../anaconda3/envs/rvc_cli/lib/python3.9/site-packages/torch/nn/utils/weight_norm.py:30: UserWarning: torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.\n warnings.warn(\"torch.nn.utils.weight_norm is deprecated in favor of torch.nn.utils.parametrizations.weight_norm.\")\n"
}
However, the file does not appear in the output folder.
Moreover, to make it work, I changed:
@app.post("/infer")
async def infer(request: Request):
command = ["python", "main.py", "infer"]
json_data = await request.json()
command += [f"--{key}={value}" for key, value in json_data.items()]
return execute_command(command)
Also, it would be good to format the output as follows:
Describe the bug
A clear and concise description of what the bug is.
I get the following error, whether overtrain_detector is set to true/false and whether overtrain_threshold is set to an integer value or not:
train.py: error: argument -ot/--overtraining_threshold: invalid int value: 'False' To Reproduce
Steps to reproduce the behavior:
Run with the following options:
{python_path} main.py train --model_name {model_name} --sampling_rate 40000 --pitch_guidance True --gpu 1 --save_every_epoch 50 --save_only_latest True --overtraining_detector False
Expected behavior
A clear and concise description of what you expected to happen.
Training runs and completes
Assets
If applicable, add screenshots/videos to help explain your problem.
Desktop (please complete the following information):
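The "invalid int value: 'False'" error suggests the string 'False' is reaching the int-typed --overtraining_threshold slot. Passing booleans as the strings "True"/"False" is fragile with argparse, since `bool("False")` is True and a shifted argument list feeds the wrong token to the wrong option. A common pattern (a sketch, not the project's code) is an explicit string-to-bool converter:

```python
import argparse

def str_to_bool(value):
    # Explicit converter: "False" really becomes False, unlike bool("False").
    if value.lower() in ("true", "1", "yes"):
        return True
    if value.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")

parser = argparse.ArgumentParser()
parser.add_argument("--overtraining_detector", type=str_to_bool, default=False)
parser.add_argument("-ot", "--overtraining_threshold", type=int, default=50)

args = parser.parse_args(["--overtraining_detector", "False",
                          "--overtraining_threshold", "50"])
```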
I'm trying to generate a new pth and got the following error:
"FileNotFoundError: [Errno 2] No such file or directory: '/home/user/RVC_CLI/logs/mute/sliced_audios/mute40000.wav'"
Steps to get the error:
Preprocess Dataset (ok in this step) python3 rvc.py preprocess --model_name "johnnyc" --dataset_path "../models/sample/" --sampling_rate "40000"
Extract Features (ok in this step) python3 rvc.py extract --model_name "johnnyc" --rvc_version "v2" --sampling_rate "40000"
Start Training python3 rvc.py train --model_name "johnnyc" --rvc_version "v2" --save_every_epoch "3" --sampling_rate "40000"
Got the following error:
Process Process-1:
Traceback (most recent call last):
File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
self.run()
File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
self._target(*self._args, **self._kwargs)
File "/home/user/RVC_CLI/rvc/train/train.py", line 215, in run
train_dataset = TextAudioLoaderMultiNSFsid(hps.data)
File "/home/user/RVC_CLI/rvc/train/data_utils.py", line 21, in __init__
self._filter()
File "/home/user/RVC_CLI/rvc/train/data_utils.py", line 29, in _filter
lengths.append(os.path.getsize(audiopath) // (3 * self.hop_length))
File "/usr/lib/python3.10/genericpath.py", line 50, in getsize
return os.stat(filename).st_size
FileNotFoundError: [Errno 2] No such file or directory: '/home/user/RVC_CLI/logs/mute/sliced_audios/mute40000.wav'
Saved index file '/home/user/RVC_CLI/logs/johnnyc/added_IVF94_Flat_nprobe_1_v2.index'
Hello, I didn't find any other contacts, so I'll write about the problem here.
Question context:
I'm trying to write an application in C# (WinForms) using your solution. I am weak in programming outside the zone of simple C# applications, but I was interested in the functionality of the RVC library. I want to establish interaction with RVC by organizing a server and sending requests to it from clients on the global network. But I encountered a problem: the API server runs on a local machine that has a suitable video card, while the machine the application is being developed on does not. To establish communication with my "server" where your solution will be deployed, I tried to change the server launch parameters to a local IP (192.168.x.x) and another port, but failed.
The main point of the question:
Can I somehow use launch parameters (for example, main.py api [-ip or -port]) to change the parameters of the API server (the uvicorn server)?
If not, is it possible to add such functionality to the "main.py api" command?
So first things first, the infer section looks alright. Moving on to training, you guys can definitely improve some stuff there:
Preprocess Dataset: I believe you should set the preset sample rate to 32k and not 40k.
Extract Features: eradicate v1. It's outdated and uses a smaller hubert (I think), but regardless it lacks dynamics much more than the v2 arch we have for RVC, so you can just remove the option of choosing versions and auto-set it to v2.
The rest looks fine for now, I really like it at least. One thing you could also add is model fusion and all the other stuff the GUI has, like ckpt processing and so on and so forth. Other than that it works wonderfully :3
Describe the bug
Inference from a model I trained isn't working and just printing out "Error: 'config'" instead
To Reproduce
I used the colab in the repo, with the change that I made the main directory on my Google Drive so I don't have to pull every time and so the models I make are automatically saved.
I trained a model, and maybe this is the problem, with v2 and 40k sample rate. I later saw in the configs that there is no v2-40k config.
I then tried the inference there, and it didn't work out, it just spat out "Error: 'config'"
I traced it to the vc pipeline where it loads the checkpoint and tries to access ckpt['config'].
Then outside of the code, I loaded the checkpoints saved from my training and they do not have a 'config' key.
So I looked at the code that saves the checkpoints, and there is no 'config' key there either.
I think I'm doing something wrong, but I'm not sure what.
should all of the saved G's be usable? I think the most possible issue is that I'm using the wrong model file. I'm using the ones in the log/model_name directory. Usually RVC saves other weights in the weights directory, but there isn't one here.
As mentioned above, I checked the training code where it saves, and there is no explicit 'config' in the saved pth when it's saving the epoch checkpoints. Are there different models being saved?
Is V2 + 40k supported even though there is no config file for it?
Could having the repo on Google drive cause issues? I know sometimes it causes Linux path issues.
save_only_latest seems to not be a usable flag since the best model might not be the latest, and usually we need to go back and see the performance and pick the one that's best. How is this flag really used?
Describe the bug
During infer an error/warning is reported, " '<' not supported between instances of 'str' and 'float' "
and no output file is written despite the output saying it is.
To Reproduce
Steps to reproduce the behavior:
Run the command
PS M:\LLMs\tts\RVC_CLI> .\env\python.exe main.py infer `
>> --index_path '.\rvcs\test0.index' `
>> --pth_path '.\rvcs\test0.pth' `
>> --input_path '.\output.wav' `
>> --output_path 'M:\LLMs\tts\RVC_CLI\output-rvc.wav'
<All keys matched successfully>
'<' not supported between instances of 'str' and 'float'
Conversion completed. Output file: 'M:\LLMs\tts\RVC_CLI\output-rvc.wav' in 2.22 seconds.
PS M:\LLMs\tts\RVC_CLI>
Expected behavior
An output file processed with the supplied pth/index
Assets
If applicable, add screenshots/videos to help explain your problem.
Desktop (please complete the following information):
Windows 11
Additional context
If I checkout tag 1.1.2 it all works
Let me know if we can do anything to improve the voice quality.
Once again, your work is great in the RVC commandline space. Yours is the best commandline tool for RVC, better than all the ones even released by the original RVC project. So, Thank you.
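The "'<' not supported between instances of 'str' and 'float'" warning reported above is the Python 3 error raised when a CLI argument that stayed a string is compared to a float. A sketch of the failure and the usual fix, coercing the argument once at the parsing boundary:

```python
# A command-line value arrives as a string; comparing it directly to a
# float raises TypeError in Python 3 (Python 2 silently allowed it).

protect_arg = "0.33"   # value as it arrives from the command line

try:
    protect_arg < 0.5  # str vs float comparison
except TypeError as err:
    message = str(err)

protect = float(protect_arg)   # coerce once, compare safely afterwards
safe = protect < 0.5
```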
Describe the bug
When I try to use 2 GPUs on Kaggle, this error occurs: [W socket.cpp:663] [c10d] The client socket has failed to connect to [localhost]:55292 (errno: 99 - Cannot assign requested address).
To Reproduce
Steps to reproduce the behavior:
Simply put "0-1" in --gpu in training part on Kaggle
Expected behavior
Be able to use two GPUs for training.
Assets
Desktop (please complete the following information):
OS: Linux
Browser: chrome
Additional context
Yes, I was using the 2x T4, and was using the last commit ("fix multi gpu")
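The --gpu option takes GPU indices separated by hyphens ("0-1" for Kaggle's two T4s). The socket error above comes from the multi-process rendezvous, not from the flag itself; parsing the flag is the simple part, sketched here (illustrative, not the project's code):

```python
# Turn a hyphen-separated GPU spec like "0-1" into device indices; the
# number of entries is the world size for multi-GPU training.

def parse_gpus(spec):
    return [int(part) for part in spec.split("-") if part != ""]

gpus = parse_gpus("0-1")
world_size = len(gpus)
```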
Setting it to True or False always results in the same output.wav? How can I get the voice-swapped output combined with the original instrumental? Thanks. This one has been the easiest tool to use so far.
The model is training successfully, but when attempting to process a file, the output consists only of squeaks and noise. This issue persists even when using alternative models; the resulting audio remains distorted. Interestingly, utilizing a model trained in a different version of RVC yields normal functioning.
Another instance of RVC works fine in this machine.
What could be the underlying cause? Various output file formats have been experimented with to no avail.
Still having issues with the API. The main infer worked with the same input as the JSON below. I tried messing around with the format a bit, but no luck.
Traceback (most recent call last):
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\main.py", line 953, in <module>
main()
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\main.py", line 947, in main
run_api_script()
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\main.py", line 385, in run_api_script
subprocess.run(command)
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 507, in run
stdout, stderr = process.communicate(input, timeout=timeout)
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 1126, in communicate
self.wait()
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 1189, in wait
return self._wait(timeout=timeout)
File "C:\Users\KCLEE\Documents\GitHub\RVC_CLI\env\lib\subprocess.py", line 1486, in _wait
result = _winapi.WaitForSingleObject(self._handle,