cyhuang-tw / adain-vc Goto Github PK

An unofficial implementation of the paper "One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization".

Python 100.00%

adain-vc's People

Contributors

Stargazers

Watchers

adain-vc's Issues

Robotic /Metalic voice ?

Hello,
I noticed that the audio samples you provided sound robotic. What is the reason for this?
Thanks.

Questions for data preprocessing and fetching

Thank you very much for the clean implementation. I have some questions regarding the dataset and hope you could clarify, if possible.

On data fetching, if I understand correctly, each sequence from a batch is a concatenation of n_uttrs (4 by default) 128-frame mel spectrograms of a single speaker

AdaIN-VC/data/dataset.py

Line 26 in 88fe733

[m[:, start : (start + self.segment)] for (m, start) in zip(mels, starts)]

This fetching strategy seems to require speaker labels and effectively make the model supervised. Because this is not mentioned in the original paper, so I am not sure if this is how it's done officially, or is an alternative design choice for this unofficial implementation. This also makes a batch include n unique speakers where n is batch size.

On data preprocessing, I find recordings with long silence on both ends can still be kept after the pipeline

AdaIN-VC/data/wav2mel.py

Line 41 in 88fe733

 self.sox_effects = SoxEffects(sample_rate, norm_db, sil_threshold, sil_duration) 

and segments with large portions of non-voiced frames can be sampled as a result. Is the data fetching strategy mentioned above implemented as a way to mitigate this?

Which speakers are seen in the pre-trained model?

Hi, thanks very much for the sharing the model. I wonder when you trained the model, did you use 89 speakers in VCTK as training set (seen speakers) and 20 as the test set (unseen speakers)? If so, may you share which 20 speakers are in the test set?

meet a bug run the inderence.

run log as below.
\Anaconda3\lib\site-packages\torchaudio\extension\extension.py:13: UserWarning: torchaudio C++ extension is not available.
warnings.warn('torchaudio C++ extension is not available.')
Traceback (most recent call last):
File "E:/vioce_conversion/AdaIN-VC-master/inference.py", line 49, in
main(**vars(parser.parse_args()))
File "E:/vioce_conversion/AdaIN-VC-master/inference.py", line 30, in main
src = wav2mel(src, src_sr)[None, :].to(device)
File "C:\Users\yxandam\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "E:\vioce_conversion\AdaIN-VC-master\data\wav2mel.py", line 55, in forward
wav_tensor = self.sox_effects(wav_tensor, sample_rate)
File "C:\Users\yxandam\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "E:\vioce_conversion\AdaIN-VC-master\data\wav2mel.py", line 89, in forward
wav_tensor, _ = apply_effects_tensor(wav_tensor, sample_rate, self.effects)
File "C:\Users\yxandam\Anaconda3\lib\site-packages\torchaudio_internal\module_utils.py", line 35, in wrapped
raise RuntimeError(f'{func.module}.{func.name} requires {req}')
RuntimeError: torchaudio.sox_effects.sox_effects.apply_effects_tensor requires module: torchaudio._torchaudio

PS C:\Users\yx> pip install torchaudio
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: torchaudio in c:\users\yx\anaconda3\lib\site-packages (0.8.1)
Requirement already satisfied: torch==1.8.1 in c:\users\yx\anaconda3\lib\site-packages (from torchaudio) (1.8.1+cu101)
Requirement already satisfied: dataclasses in c:\users\yx\anaconda3\lib\site-packages (from torch==1.8.1->torchaudio) (0.8)
Requirement already satisfied: numpy in c:\users\yx\anaconda3\lib\site-packages (from torch==1.8.1->torchaudio) (1.19.5)
Requirement already satisfied: typing-extensions in c:\users\yx\anaconda3\lib\site-packages (from torch==1.8.1->torchaudio) (3.7.4.3)
PS C:\Users\yx>

vocoder

I need to train vocoder on my own dataset(non-English) or can use your pretrained vocoder.pt?
thx

Pretrained models, samples and a question

Hey, Ive found your repository very interesting and i would like to try it. are there any sample from your trained model on the VCTK data ? also is it possible to get the model you are using ?
Also i've noticed that during prepossessing, you are creating evaluation and test sets but don't use then anywhere, is it currently WIP ?
Thank you very much !

This repository is a good work.
Thanks.

cyhuang-tw / adain-vc Goto Github PK

adain-vc's People

Contributors

Stargazers

Watchers

Forkers

adain-vc's Issues

Recommend Projects

Recommend Topics

Recommend Org