cyhuang-tw / adain-vc
An unofficial implementation of the paper "One-shot Voice Conversion by Separating Speaker and Content Representations with Instance Normalization".
Hello,
I noticed that the audio samples you provided sound robotic. What is the reason for this?
Thanks.
Thank you very much for the clean implementation. I have some questions regarding the dataset and hope you could clarify, if possible.
On data fetching, if I understand correctly, each sequence in a batch is a concatenation of n_uttrs (4 by default) 128-frame mel spectrograms from a single speaker (line 26 in 88fe733), and each batch covers n unique speakers, where n is the batch size.
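To make that sampling scheme concrete, here is a minimal sketch of how such a batch could be assembled. The function and variable names are hypothetical; the actual loader lives in the repo's data code.

```python
import numpy as np

def sample_batch(mels_by_speaker, n_spks=4, n_uttrs=4, seg_len=128, rng=None):
    """Sample a batch: for each of n_spks speakers, take n_uttrs random
    seg_len-frame crops and concatenate them along the time axis."""
    if rng is None:
        rng = np.random.default_rng(0)
    speakers = rng.choice(list(mels_by_speaker), size=n_spks, replace=False)
    batch = []
    for spk in speakers:
        segs = []
        for _ in range(n_uttrs):
            # Pick a random utterance of this speaker, then a random crop.
            mel = mels_by_speaker[spk][rng.integers(len(mels_by_speaker[spk]))]
            start = rng.integers(mel.shape[0] - seg_len + 1)
            segs.append(mel[start:start + seg_len])
        batch.append(np.concatenate(segs, axis=0))  # (n_uttrs * seg_len, n_mels)
    return np.stack(batch)  # (n_spks, n_uttrs * seg_len, n_mels)
```

With the defaults this yields one (4, 512, n_mels) array per batch: four speakers, each represented by four concatenated 128-frame crops.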
On data preprocessing, I find that recordings with long silences at both ends can still pass through the pipeline unchanged (line 41 in 88fe733).
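For anyone affected by this, one workaround is to trim the silent ends yourself before feeding files into the pipeline. Below is a rough energy-based sketch, similar in spirit to librosa.effects.trim; the thresholds and frame sizes here are assumptions, not the repo's values.

```python
import numpy as np

def trim_silence(wav, top_db=30.0, frame_len=1024, hop=256):
    """Drop leading/trailing frames whose energy is more than top_db
    below the loudest frame (simple energy-based trimming)."""
    if len(wav) < frame_len:
        return wav
    frames = [wav[i:i + frame_len] for i in range(0, len(wav) - frame_len + 1, hop)]
    # Frame energies in dB, floored to avoid log(0).
    db = 10.0 * np.log10(np.maximum([np.mean(f ** 2) for f in frames], 1e-10))
    keep = np.where(db > db.max() - top_db)[0]
    if keep.size == 0:
        return wav
    start = keep[0] * hop
    end = min(len(wav), keep[-1] * hop + frame_len)
    return wav[start:end]
```

Applied to a waveform padded with silence on both ends, this returns only the span containing audible frames.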
Hi, thanks very much for sharing the model. When you trained it, did you use 89 speakers in VCTK as the training set (seen speakers) and 20 as the test set (unseen speakers)? If so, could you share which 20 speakers are in the test set?
Run log below:
\Anaconda3\lib\site-packages\torchaudio\extension\extension.py:13: UserWarning: torchaudio C++ extension is not available.
warnings.warn('torchaudio C++ extension is not available.')
Traceback (most recent call last):
File "E:/vioce_conversion/AdaIN-VC-master/inference.py", line 49, in <module>
main(**vars(parser.parse_args()))
File "E:/vioce_conversion/AdaIN-VC-master/inference.py", line 30, in main
src = wav2mel(src, src_sr)[None, :].to(device)
File "C:\Users\yxandam\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "E:\vioce_conversion\AdaIN-VC-master\data\wav2mel.py", line 55, in forward
wav_tensor = self.sox_effects(wav_tensor, sample_rate)
File "C:\Users\yxandam\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "E:\vioce_conversion\AdaIN-VC-master\data\wav2mel.py", line 89, in forward
wav_tensor, _ = apply_effects_tensor(wav_tensor, sample_rate, self.effects)
File "C:\Users\yxandam\Anaconda3\lib\site-packages\torchaudio\_internal\module_utils.py", line 35, in wrapped
raise RuntimeError(f'{func.__module__}.{func.__name__} requires {req}')
RuntimeError: torchaudio.sox_effects.sox_effects.apply_effects_tensor requires module: torchaudio._torchaudio
PS C:\Users\yx> pip install torchaudio
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: torchaudio in c:\users\yx\anaconda3\lib\site-packages (0.8.1)
Requirement already satisfied: torch==1.8.1 in c:\users\yx\anaconda3\lib\site-packages (from torchaudio) (1.8.1+cu101)
Requirement already satisfied: dataclasses in c:\users\yx\anaconda3\lib\site-packages (from torch==1.8.1->torchaudio) (0.8)
Requirement already satisfied: numpy in c:\users\yx\anaconda3\lib\site-packages (from torch==1.8.1->torchaudio) (1.19.5)
Requirement already satisfied: typing-extensions in c:\users\yx\anaconda3\lib\site-packages (from torch==1.8.1->torchaudio) (3.7.4.3)
PS C:\Users\yx>
Do I need to train the vocoder on my own (non-English) dataset, or can I use your pretrained vocoder.pt? Thanks.
Hey, I've found your repository very interesting and would like to try it. Are there any samples from your trained model on the VCTK data? Also, is it possible to get the model you are using?
I've also noticed that during preprocessing you create evaluation and test sets but don't use them anywhere; is that currently WIP?
Thank you very much!
python train.py <config_file> <data_dir> <save_dir> [--n_steps steps] [--save_steps save] [--log_steps log] [--n_spks spks] [--n_uttrs uttrs]
Thanks
Hi!
Does the code support resuming training from the last checkpoint? Thanks.
Hi, thanks a lot for your clear implementation. Could you share the download link for the pretrained vocoder?
The default segment length is 128. If I want to use 64, is train.py line 32 (# Prepare data) the only place I need to edit? Thanks.
Hello, your model's conversion results are excellent, but I have a question I'd like to ask. Did you train the vocoder in the model yourself? Why is it that when I use the vocoder model from the yistLin/universal-vocoder project, the converted speech is all silence? Is some other conversion step needed in between? Any advice would be greatly appreciated.
Hello,
I found this repo very relevant for voice conversion and started working with it, but I got stuck preparing the metadata JSON. Could someone please send me a sample JSON file so that I can prepare mine accordingly?
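For anyone else stuck here: the schema is whatever the repo's preprocessing script emits, so the safest route is to run that script and inspect its output. Purely as an illustration of the general shape such files often take, here is a sketch; the keys and file paths below are guesses, not the repo's actual schema.

```python
import json

# Guessed layout: speaker ID -> list of that speaker's feature files.
# Verify against the actual output of the repo's preprocessing script.
metadata = {
    "p225": ["p225/p225_001.pt", "p225/p225_002.pt"],
    "p226": ["p226/p226_001.pt", "p226/p226_002.pt"],
}
metadata_json = json.dumps(metadata, indent=2)
```

Writing metadata_json to metadata.json then gives you a file you can diff against what the preprocessing script produces.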
Hello, dear Author
May I ask what the purpose of this line of code in mel2wav.py, line 132, is?
mel_tensor = (mel_tensor - self.ref_db + self.dc_db) / self.dc_db
And is there a technical name for this operation?
This repository is good work.
Thanks.