
whisper-vits-svc's People


archivoice, bfloat16, flottant, forsakenrei, futorio, fyphen1223, imedina7, innnky, maxmax2016, silvelter, stardust-minus, stillonearth, thestmitsuki, vatsalya-vyas, vidyaa18, vivekguruduttk28, zscharlie



whisper-vits-svc's Issues


don't use pitch shift
No diffusion model or config found. Shallow diffusion mode will False
Traceback (most recent call last):
File "", line 121, in <module>
File "", line 83, in main
svc_model = Svc(args.model_path, args.config_path, args.device, args.cluster_model_path,enhance,diffusion_model_path,diffusion_config_path,shallow_diffusion,only_diffusion)
File "/root/autodl-tmp/so-vits-svc5/diff_tool/inference/", line 159, in __init__
File "/root/autodl-tmp/so-vits-svc5/diff_tool/inference/", line 176, in load_model
// 2 + 1,
AttributeError: 'Svc' object has no attribute 'hps_ms'
mv: cannot stat './diff_tool/results/*': No such file or directory

Could you explain whisper_ppg?

It seems you are using Whisper's encoder output directly as the content feature vectors. How is this better than ContentVec, which the previous so-vits-svc used?


I don't know which file to use for the --spk parameter when using my own wav file at the inference step.


What's the f0 parameter?
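F0 normally means the fundamental frequency, i.e. the pitch contour of the voice; the `--pit test.csv` argument in the inference commands on this page appears to supply such a pitch track. Purely to illustrate what an F0 value is, the sketch below estimates F0 for a single frame with plain autocorrelation, which is not necessarily the method this project uses. `estimate_f0` and its 50 to 1000 Hz search range are assumptions.

```python
import math

def estimate_f0(frame, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of one frame by autocorrelation.

    Hypothetical helper: picks the lag with the largest raw autocorrelation
    inside the period range implied by fmin/fmax. Returns 0.0 for silence.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    best_lag, best_score = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Correlate the frame with a copy of itself shifted by `lag` samples.
        score = sum(frame[n] * frame[n + lag] for n in range(len(frame) - lag))
        if score > best_score:
            best_lag, best_score = lag, score
    return sample_rate / best_lag if best_lag else 0.0

sr = 16000
tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(800)]  # 200 Hz test tone
print(round(estimate_f0(tone, sr), 1))  # 200.0
```

A pitch file for a whole song would hold one such value per short frame; real extractors add voicing decisions and smoothing on top of this basic idea.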


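Hello, when I test locally with the preview model, there is no voice conversion effect, but testing the same audio on Hugging Face gives a different result. Which step did I get wrong?

  1. Download the preview model
  2. Use svc_export.py to convert sovits5.0.pretrain.pth into sovits5.0.pth
  3. Test following the instructions in the README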
python --config configs/base.yaml --model sovits5.0.pth --spk ./configs/singers/singer0051.npy --wave test.wav --ppg test.ppg.npy --pit test.csv

How do I make a pretrained model?

Does the pretrained model contain only one speaker, or multiple speakers? Will we train our dataset on top of the pretrained model's first speaker?
Why is the pretrained model smaller than our trained model? Is there a way to convert our model into a pretrained model?

incorrect audio shape

This error appears after running python prepare/ -w data_svc/waves-16k/ -p data_svc/whisper
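The message `incorrect audio shape` appears to come from the assertion in OpenAI Whisper's audio encoder, which expects a fixed number of mel frames corresponding to at most 30 seconds of audio. Assuming that is the cause here, a quick check is to look for clips longer than 30 s before running the PPG step. `find_suspect_clips` is a hypothetical helper that uses only the standard `wave` module, so it assumes PCM .wav files.

```python
import os
import wave

MAX_SECONDS = 30.0  # assumed limit: Whisper's encoder window is 30 s of audio

def wav_seconds(path):
    """Duration of a PCM .wav file in seconds (standard-library wave module)."""
    with wave.open(path, "rb") as wf:
        return wf.getnframes() / wf.getframerate()

def find_suspect_clips(root, max_seconds=MAX_SECONDS):
    """Return (path, seconds) for every .wav under root longer than max_seconds."""
    suspects = []
    for dirpath, _, filenames in os.walk(root):
        for name in sorted(filenames):
            if name.lower().endswith(".wav"):
                path = os.path.join(dirpath, name)
                seconds = wav_seconds(path)
                if seconds > max_seconds:
                    suspects.append((path, seconds))
    return suspects

if __name__ == "__main__":
    # Folder taken from the command above; adjust to your own layout.
    for path, seconds in find_suspect_clips("data_svc/waves-16k"):
        print(f"{path}: {seconds:.1f} s")
```

Any clip it lists can be split into shorter pieces before re-running the step.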



How can I save the checkpoints for each speaker's dataset separately, in different folders?





Super Slow Loss Calc On Google TPU

loss_kl_f = kl_loss(z_f, logs_q, m_p, logs_p, logdet_f, z_mask) * hp.train.c_kl
loss_kl_r = kl_loss(z_r, logs_p, m_q, logs_q, logdet_r, z_mask) * hp.train.c_kl
loss_g = score_loss + mel_loss + stft_loss + loss_kl_f
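For reference, the `kl_loss` in the original VITS code computes a masked per-element KL estimate between a latent sample and a Gaussian with mean `m_p` and log standard deviation `logs_p`. The pure-Python sketch below follows that VITS formulation on flat lists with one latent channel per frame; the `logdet` argument in the snippet above is omitted because its exact treatment in this repository is not shown here. Swapping which statistics play the "prior" and "posterior" roles, as `loss_kl_r` does, gives the reverse direction.

```python
import math

def kl_loss(z_p, logs_q, m_p, logs_p, z_mask):
    """Masked KL term in the style of the original VITS kl_loss (logdet omitted).

    All arguments are equal-length lists of floats (one latent channel per
    frame); z_mask holds 1.0 for valid frames and 0.0 for padding. Returns the
    masked sum divided by the number of valid frames.
    """
    total = 0.0
    for z, lq, m, lp, mask in zip(z_p, logs_q, m_p, logs_p, z_mask):
        kl = lp - lq - 0.5
        kl += 0.5 * (z - m) ** 2 * math.exp(-2.0 * lp)
        total += kl * mask
    return total / sum(z_mask)
```

In the snippet above both directions are computed, but only `loss_kl_f` is added to `loss_g`.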

Could you explain the training process?

Thanks for the awesome work! Since I can't read Chinese, I translated the README into English, and I understood the training process as follows.

It seems there is a two-stage training process. Training is quite complicated, especially stage 2.

First stage: train VITS (SynthesizerTrn) with Whisper PPG, NSF-HiFiGAN, and an external speaker encoder (d-vector).

Second stage (SynthesizerTrnEx): apply GRL and SNAC to prevent speaker information from leaking into the text encoder, and also apply the NaturalSpeech loss (a bidirectional loss between prior and posterior).

Is that right? Also, I can't find where SynthesizerTrnEx is used in this code base (maybe it isn't yet). Could you explain the training process in a bit more detail?

sovits5.0-48k-debug.pth has no effect at inference on commit 5d0c4b4

I used configs/singers_sample/47-wave-girl/025.wav

!python whisper/ -w /content/so-vits-svc-5.0/configs/singers_sample/47-wave-girl/025.wav -p test.ppg.npy
!python --config configs/base.yaml --model sovits5.0-48k-debug.pth --spk ./configs/singers/singer0023.npy --wave /content/so-vits-svc-5.0/configs/singers_sample/47-wave-girl/025.wav --ppg test.ppg.npy



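  1. Is there a minimum duration requirement?
  2. Is there a minimum total duration, i.e. how much audio does the dataset need in total before training can start?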
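Whatever the minimum turns out to be, comparing against it requires knowing how much audio a dataset holds. A minimal sketch, assuming PCM .wav files readable by the standard `wave` module; `total_dataset_seconds` and the example folder are hypothetical, and no particular minimum is implied.

```python
import os
import wave

def total_dataset_seconds(root):
    """Sum the durations of all .wav files under root, in seconds."""
    total = 0.0
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".wav"):
                with wave.open(os.path.join(dirpath, name), "rb") as wf:
                    total += wf.getnframes() / wf.getframerate()
    return total

if __name__ == "__main__":
    seconds = total_dataset_seconds("dataset_raw")  # hypothetical dataset folder
    print(f"{seconds / 60:.1f} minutes of audio")
```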


  1. What is the recommended Python version?
  2. What does %cd% in set PYTHONPATH=%cd% refer to? If I use a virtual environment created with conda, do I still need to set PYTHONPATH?
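In a Windows cmd window, `%cd%` expands to the current directory, so `set PYTHONPATH=%cd%` run from the project root lets scripts in subfolders import the repository's own top-level folders (such as `whisper`). A conda environment does not change this, because the setting concerns the project's source folders rather than installed packages. The sketch shows a similar effect from inside Python; `add_project_root` is a hypothetical helper, not part of the project.

```python
import os
import sys

def add_project_root(root=None):
    """Put the project root on sys.path, similar in effect to `set PYTHONPATH=%cd%`.

    When a script is run as `python prepare/some_script.py`, Python puts the
    script's own folder (prepare/) on sys.path, not the current directory, so
    sibling top-level packages are not importable without this.
    """
    root = os.path.abspath(root or os.getcwd())
    if root not in sys.path:
        sys.path.insert(0, root)
    return root
```

Setting the environment variable is usually preferable, since it needs no change to the scripts.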


Does this mean changing python prepare/ data_svc/waves-16k/ data_svc/speaker to python prepare/ data_svc/waves-16k/ data_svc/timbre?

Set the pretrain parameter in configs/base.yaml to pretrain: "./5.0.epoch1200.full.pth", and lower the learning rate appropriately.






Train Time

How long does training take?

import module

How should I solve this error?
ModuleNotFoundError: No module named 'whisper.model'; 'whisper' is not a package

Error at step 7: the script calls vits, but the script itself is inside a folder, so it can't locate vits.

If I copy the vits folder into prepare, it still throws an error at step 7:

line 252, in __iter__
ids_bucket = ids_bucket + ids_bucket * (rem // len_bucket) + ids_bucket[:(rem % len_bucket)]
ZeroDivisionError: integer division or modulo by zero in \vits\

Is it because is missing? It does not seem to exist anywhere on the internet.
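The failing line matches the bucket sampler used in VITS-style training code and divides by `len_bucket`, so the error means one of the length buckets is empty. A likely cause, inferred from the traceback rather than confirmed, is an empty training file list or no utterance falling inside the bucket boundaries. The hypothetical helper below groups lengths the same way (bucket i holds lengths with boundaries[i] < length <= boundaries[i + 1]) and reports empty buckets; the boundary values are illustrative, not taken from this repository's config.

```python
import bisect

def bucket_counts(lengths, boundaries):
    """Count items per bucket; lengths outside all buckets are counted as dropped."""
    counts = [0] * (len(boundaries) - 1)
    dropped = 0
    for length in lengths:
        i = bisect.bisect_left(boundaries, length) - 1
        if 0 <= i < len(counts):
            counts[i] += 1
        else:
            dropped += 1
    return counts, dropped

def empty_buckets(lengths, boundaries):
    """Indices of buckets with no items (these would trigger the ZeroDivisionError)."""
    counts, _ = bucket_counts(lengths, boundaries)
    return [i for i, c in enumerate(counts) if c == 0]

print(bucket_counts([], [32, 300, 400]))  # ([0, 0], 0): an empty file list empties every bucket
```

Here each length stands for an utterance's frame count; checking that the file list used for training is non-empty is the first thing to try.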

issue with zip archive

How can I solve this at step 4 of data preprocessing?
RuntimeError: PytorchStreamReader failed reading zip archive: failed finding central directory
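Checkpoints written by `torch.save` in PyTorch 1.6 and later are zip archives, so this message usually means the file being loaded is truncated or is not the real checkpoint (for example, a page saved from a failed download). Assuming that, the standard `zipfile` module can check a file before loading it. `looks_like_torch_zip` is a hypothetical helper; a legacy, pre-1.6 format checkpoint would also fail this check while still being valid.

```python
import zipfile

def looks_like_torch_zip(path):
    """Return True if path is an intact zip archive (the default torch.save format).

    False suggests a truncated or wrong download. Legacy-format checkpoints
    also return False even though torch.load may still read them.
    """
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as zf:
            return zf.testzip() is None  # None means every member's CRC matched
    except zipfile.BadZipFile:
        return False

if __name__ == "__main__":
    print(looks_like_torch_zip("path/to/checkpoint.pt"))  # placeholder path
```

If it returns False, re-downloading the pretrained file and comparing its size with the published one is the usual fix.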

hyper parameters

What are the best values for learning rate, epochs, and batch size?

No module named 'whisper', but it seems whisper is built in

(venv) PS D:\so-vits-svc-5.0> python prepare/ -w data_svc/waves-16k/ -p data_svc/whisper
Traceback (most recent call last):
File "D:\so-vits-svc-5.0\prepare\", line 6, in <module>
from whisper.model import Whisper, ModelDimensions
ModuleNotFoundError: No module named 'whisper'

I just cloned this project, installed the dependencies, and then ran the commands below:

python prepare/ -w ./dataset_raw -o ./data_svc/waves-16k -s 16000
python prepare/ -w data_svc/waves-16k/ -p data_svc/pitch
When I run "python prepare/ -w data_svc/waves-16k/ -p", the error occurs.

windows 10
Python 3.10.7

help :(

Issue in step 4 of data preprocessing

I get this error when running: python prepare/ -w data_svc/waves-16k/ -p data_svc/whisper

Traceback (most recent call last):
File "prepare/", line 56, in <module>
whisper = load_model(os.path.join("whisper_pretrain", ""))
File "prepare/", line 25, in load_model
dims = ModelDimensions(**checkpoint["dims"])
TypeError: 'ModuleSpec' object is not callable

Nvidia CUDA 10.1 user here! Can I run this program with pytorch 1.8.1? What is the minimum version requirement?

I am a music instructor and I would love to introduce this lovely AI software to our students to try out.

Here in my school we have several Windows 7 Pro 64-bit computers in our classrooms, with Nvidia GeForce GTX 660M GPUs. According to Nvidia, the highest graphics driver version we can install is 425.31, and the highest CUDA Toolkit we can install is 10.1.

According to pytorch.org, with CUDA 10.1 the highest torch version we can install would be:

Here, “cu101” in the file name refers to CUDA 10.1.

Any torch version higher than 1.8.1 has a higher “cu” number in its .whl file name, such as:
“torch-1.10.0+cu102-cp36-cp36m-win_amd64.whl”, or
“torch-1.13.0+cu116-cp310-cp310-win_amd64.whl”, etc.

In other words, our school cannot install any torch version higher than 1.8.1.
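The reasoning above relies on the CUDA tag embedded in PyTorch wheel names (`+cu101` means CUDA 10.1). As an illustration, this sketch parses such names and keeps only builds whose CUDA tag does not exceed what the machine supports. `cuda_tag` and `compatible_wheels` are hypothetical helpers; the two newer names are the examples quoted above, and the cu101 name is an illustrative one following the same pattern. It only mirrors the filename reasoning and says nothing about whether so-vits-svc itself runs on torch 1.8.1.

```python
import re

# Matches e.g. "torch-1.13.0+cu116-cp310-cp310-win_amd64.whl".
WHEEL_RE = re.compile(r"^torch-(?P<version>[\d.]+)\+cu(?P<cuda>\d+)-")

def cuda_tag(wheel_name):
    """Return (torch_version, (cuda_major, cuda_minor)) parsed from a wheel name, or None."""
    m = WHEEL_RE.match(wheel_name)
    if m is None:
        return None
    digits = m.group("cuda")  # "101" -> CUDA 10.1, "116" -> CUDA 11.6
    return m.group("version"), (int(digits[:-1]), int(digits[-1]))

def compatible_wheels(wheel_names, max_cuda=(10, 1)):
    """Keep wheels whose CUDA tag is no newer than max_cuda."""
    keep = []
    for name in wheel_names:
        parsed = cuda_tag(name)
        if parsed is not None and parsed[1] <= max_cuda:
            keep.append(name)
    return keep

names = [
    "torch-1.8.1+cu101-cp38-cp38-win_amd64.whl",  # illustrative cu101 name
    "torch-1.10.0+cu102-cp36-cp36m-win_amd64.whl",
    "torch-1.13.0+cu116-cp310-cp310-win_amd64.whl",
]
print(compatible_wheels(names))  # ['torch-1.8.1+cu101-cp38-cp38-win_amd64.whl']
```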

In the non-fork so-vits-svc-4.0 program folder, there is a file called “requirements.txt”. When we open it, we can see “torch==1.13.1”. Can we assume torch 1.13.1 is the minimum version so-vits-svc needs to run?

Does this mean we cannot install your amazing software on our school’s computers because our Nvidia GPUs are too old to reach the required PyTorch 1.13.1? Or does it not matter, and the lower PyTorch 1.8.1 can still run it?

Too bad! My colleagues have already trained several G_43200.pth models on their home computers, and they could simply copy these models to our school’s computers and start voice inference right away. We don’t need to train on the classroom computers; we just need to run inference with existing models to demonstrate to our students. Inference takes far less GPU power than training.

Has anyone tested this program on CUDA 10.1?

Please let me know. Should I give up? Is this a death sentence for showing it to our students?

CUDA_10 1_User

MacBook CUDA error

On an M1 MacBook I get the error below. Does this mean only NVIDIA GPUs work?

raise AssertionError("Torch not compiled with CUDA enabled")

AssertionError: Torch not compiled with CUDA enabled
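This assertion is raised when code requests a CUDA device from a PyTorch build without CUDA support, which is always the case on Apple Silicon. A common pattern is to choose the device at run time instead of hard-coding `cuda`. The sketch keeps that decision in a pure function so it can be tested without PyTorch; the commented lines show how it could be wired to `torch.cuda.is_available()` and `torch.backends.mps.is_available()`. `pick_device` and `model` are hypothetical, and whether every part of this project works on `mps` or `cpu` is not established here.

```python
def pick_device(cuda_available, mps_available):
    """Choose a PyTorch device string, preferring CUDA, then Apple MPS, then CPU."""
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"

# Wiring it up (requires PyTorch; torch.backends.mps exists from PyTorch 1.12):
# import torch
# device = torch.device(pick_device(torch.cuda.is_available(),
#                                   torch.backends.mps.is_available()))
# model.to(device)  # `model` stands for whatever network the script loads
```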

Error in step 5 of data preprocessing

How do I fix this?

Setting up Audio Processor...
| > sample_rate:16000
| > resample:False
| > num_mels:80
| > log_func:np.log10
| > min_level_db:-100
| > frame_shift_ms:None
| > frame_length_ms:None
| > ref_level_db:20
| > fft_size:1024
| > power:1.5
| > preemphasis:0.98
| > griffin_lim_iters:60
| > signal_norm:True
| > symmetric_norm:True
| > mel_fmin:0
| > mel_fmax:8000.0
| > spec_gain:20.0
| > stft_pad_mode:reflect
| > max_norm:4.0
| > clip_norm:True
| > do_trim_silence:True
| > trim_db:60
| > do_sound_norm:False
| > do_amp_to_db_linear:True
| > do_amp_to_db_mel:True
| > stats_path:None
| > base:10
| > hop_length:256
| > win_length:1024
49%|████████████████████████████████████████████████▌ | 200/408 [00:11<00:12, 17.05it/s]
/home/parisa/so-vits-svc-5.0__/speaker/utils/ RuntimeWarning: invalid value encountered in true_divide
return x / abs(x).max() * 0.95
49%|████████████████████████████████████████████████▌ | 200/408 [00:11<00:11, 17.49it/s]
Traceback (most recent call last):
File "prepare/", line 79, in <module>
spec = speaker_encoder_ap.melspectrogram(waveform)
File "/home/parisa/so-vits-svc-5.0__/speaker/utils/", line 564, in melspectrogram
D = self.stft(self.apply_preemphasis(y))
File "/home/parisa/so-vits-svc-5.0__/speaker/utils/", line 624, in _stft
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/librosa/util/", line 88, in inner_f
return f(*args, **kwargs)
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/librosa/core/", line 202, in stft
util.valid_audio(y, mono=False)
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/librosa/util/", line 88, in inner_f
return f(*args, **kwargs)
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/librosa/util/", line 294, in valid_audio
raise ParameterError("Audio buffer is not finite everywhere")
librosa.util.exceptions.ParameterError: Audio buffer is not finite everywhere
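The warning from `x / abs(x).max() * 0.95` and the final librosa error fit together: if a clip is completely silent, its peak is 0, the division produces NaN values, and librosa then rejects the non-finite buffer. Assuming that is what happened, one option is to skip or report such clips instead of normalizing them. `peak_normalize` is a hypothetical pure-Python stand-in for the NumPy expression in the traceback.

```python
import math

def peak_normalize(samples, peak=0.95):
    """Scale samples so the largest magnitude equals `peak`.

    Returns None for empty, silent (all-zero) or non-finite input, where the
    NumPy expression x / abs(x).max() * peak would produce NaN values instead.
    """
    if not samples or not all(math.isfinite(s) for s in samples):
        return None
    max_abs = max(abs(s) for s in samples)
    if max_abs == 0.0:
        return None  # silent clip: nothing to normalize, caller should skip it
    return [s / max_abs * peak for s in samples]
```

The same check applies to the `wav / np.abs(wav).max() * 0.6` warning reported in another issue on this page.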

ModuleNotFoundError: No module named 'whisper'

Running python whisper/ -w test.wav -p test.ppg.npy fails because there is no whisper module.
After running pip install whisper, I get this error:

Traceback (most recent call last):
File "H:\svc\sovits\so-vits-svc-5.0\whisper\", line 6, in <module>
from whisper.model import Whisper, ModelDimensions
File "H:\svc\sovits\so-vits-svc-5.0\venv\lib\site-packages\", line 65, in <module>
libc = ctypes.CDLL(libc_name)
File "C:\Python310\lib\", line 364, in __init__
if '/' in name or '\\' in name:
TypeError: argument of type 'NoneType' is not iterable
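`pip install whisper` installs an unrelated PyPI package named `whisper` (a single-module time-series database library), not this repository's `whisper` folder, which is likely why the traceback now fails inside site-packages. The repository's scripts expect its own `whisper` package, as the `from whisper.model import ...` line shows. The hypothetical helper below reports which file a module name resolves to and whether it is a package, which helps tell the two apart. Uninstalling the PyPI package and running from the project root with PYTHONPATH set is the likely fix.

```python
import importlib.util

def describe_module(name):
    """Report where `name` would be imported from, without importing it.

    Returns None if the name cannot be found, otherwise a dict with the file
    it resolves to and whether it is a package (a folder with __init__.py)
    rather than a single .py module.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return {
        "origin": spec.origin,
        "is_package": spec.submodule_search_locations is not None,
    }

if __name__ == "__main__":
    # Run from the project root; the repository's whisper should show
    # is_package=True with an origin inside the project folder.
    print(describe_module("whisper"))
```

A result with `is_package` set to False would also explain the earlier error "'whisper' is not a package".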




│   ├───xxx1-xxx1.wav
│   ├───...
│   └───Lxx-0xx8.wav


What is the reason for this?

I installed everything in requirements.txt, but I get this error at stage 4.
I also exported PYTHONPATH (I got through stage 7 with no errors, apart from stage 4).

Traceback (most recent call last):
File "prepare/", line 54, in <module>
pred_ppg(whisper, f"{wavPath}/{spks}/{file}.wav", f"{ppgPath}/{spks}/{file}.ppg")
File "prepare/", line 20, in pred_ppg
audio = load_audio(wavPath)
File "so-vits-svc-5.0-main\whisper\", line 42, in load_audio
ffmpeg.input(file, threads=0)
File "Python\Python38\lib\site-packages\", line 313, in run
process = run_async(
File "Python\Python38\lib\site-packages\", line 284, in run_async
return subprocess.Popen(
File "Python\Python38\lib\", line 854, in __init__
self._execute_child(args, executable, preexec_fn, close_fds,
File "Python\Python38\lib\", line 1307, in _execute_child
hp, ht, pid, tid = _winapi.CreateProcess(executable, args,

FileNotFoundError: [WinError 2] The specified file cannot be found
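The traceback ends in `subprocess.Popen` called from the `ffmpeg` Python wrapper, and `[WinError 2]` at that point typically means the `ffmpeg` executable itself is not on PATH; the pip package only wraps the command-line tool. A quick standard-library check, assuming the wrapper looks up `ffmpeg` by name (`find_ffmpeg` is a hypothetical helper):

```python
import shutil

def find_ffmpeg():
    """Return the full path of the ffmpeg executable on PATH, or None if missing."""
    return shutil.which("ffmpeg")

if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg not found on PATH; install it and open a new terminal")
    else:
        print("using", path)
```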




What is this?
How do I solve it?
prepare/ RuntimeWarning: invalid value encountered in true_divide
wav = wav / np.abs(wav).max() * 0.6

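Error [WinError 183] Cannot create a file when that file already exists.

I ran the resampling step:


Generate audio at a 16000 Hz sample rate, saved to ./data_svc/waves-16k

python prepare/ -w ./data_raw -o ./data_svc/waves-16k -s 16000
After that, I got [WinError 3] The system cannot find the path specified.
Running it again gives [WinError 183] Cannot create a file when that file already exists.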
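On Windows, `os.mkdir` raises `[WinError 3]` when a parent folder in the path does not exist and `[WinError 183]` when the folder already exists, so the two messages together suggest output folders being created without checking for missing parents or existing folders. That is an inference from the error codes only; a missing input folder can also cause `[WinError 3]` (the command here reads from ./data_raw, while the resampling command quoted in another issue on this page uses ./dataset_raw). `check_resample_paths` is a hypothetical helper, not the project's code.

```python
import os

def check_resample_paths(src_dir, out_dir):
    """Fail early if the input folder is missing, then create the output folder.

    os.makedirs creates missing parents, and exist_ok=True makes a second run
    succeed instead of raising FileExistsError ([WinError 183] on Windows).
    """
    if not os.path.isdir(src_dir):
        raise FileNotFoundError(f"input folder not found: {os.path.abspath(src_dir)}")
    os.makedirs(out_dir, exist_ok=True)
    return os.path.abspath(out_dir)
```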

Error when training

Why does this happen when I run this command: python -c configs/base.yaml -n sovits5.0

File "", line 11, in <module>
from vits_extend.train import train
File "/home/parisa/so-vits-svc-5.0__/vits_extend/", line 16, in <module>
from vits_extend.writer import MyWriter
File "/home/parisa/so-vits-svc-5.0__/vits_extend/", line 1, in <module>
from torch.utils.tensorboard import SummaryWriter
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/torch/utils/tensorboard/", line 12, in <module>
from .writer import FileWriter, SummaryWriter # noqa: F401
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/torch/utils/tensorboard/", line 9, in <module>
from tensorboard.compat.proto.event_pb2 import SessionLog
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/tensorboard/compat/proto/", line 17, in <module>
from tensorboard.compat.proto import summary_pb2 as tensorboard_dot_compat_dot_proto_dot_summary__pb2
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/tensorboard/compat/proto/", line 17, in <module>
from tensorboard.compat.proto import histogram_pb2 as tensorboard_dot_compat_dot_proto_dot_histogram__pb2
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/tensorboard/compat/proto/", line 42, in <module>
serialized_options=None, file=DESCRIPTOR),
File "/home/parisa/anaconda3/envs/voice/lib/python3.7/site-packages/google/protobuf/", line 561, in __new__
TypeError: Descriptors cannot not be created directly.
If this call came from a file, your generated code is out of date and must be regenerated with protoc >= 3.19.0.
If you cannot immediately regenerate your protos, some other possible workarounds are:

  1. Downgrade the protobuf package to 3.20.x or lower.
  2. Set PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=python (but this will use pure-Python parsing and will be much slower).
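The message itself lists both workarounds. As a sketch of the second one, the environment variable has to be set before TensorBoard (and therefore protobuf) is first imported, for example in the first lines of the training script. The version check mirrors the "3.20.x or lower" limit of the first workaround; `needs_protobuf_workaround` is a hypothetical helper, and the installed version can be read with `pip show protobuf`.

```python
import os

# Workaround 2 from the error message. It must run before tensorboard/protobuf
# are imported; parsing becomes slower. An existing value is left untouched.
os.environ.setdefault("PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION", "python")

def needs_protobuf_workaround(version):
    """True if a protobuf version string is newer than 3.20.x (workaround 1's limit)."""
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) > (3, 20)
```

Downgrading instead, for example with pip install "protobuf<=3.20.3", avoids the slower pure-Python parser.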

More information:

wtf man

from whisper.model import Whisper, ModelDimensions
ModuleNotFoundError: No module named 'whisper'
