
adain-vc's People

Contributors

cyhuang-tw, jjery2243542, yiftachbeer


adain-vc's Issues

Robotic / metallic voice?

Hello,
I noticed that the audio samples you provided sound robotic. What is the reason for this?
Thanks.

Questions for data preprocessing and fetching

Thank you very much for the clean implementation. I have some questions regarding the dataset and hope you can clarify them, if possible.

On data fetching: if I understand correctly, each item in a batch is a concatenation of n_uttrs (4 by default) 128-frame mel-spectrogram segments from a single speaker:

[m[:, start : (start + self.segment)] for (m, start) in zip(mels, starts)]

This fetching strategy seems to require speaker labels, which effectively makes the model supervised. Since this is not mentioned in the original paper, I am not sure whether this is how it's done officially or an alternative design choice in this unofficial implementation. It also means a batch contains n unique speakers, where n is the batch size.
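To restate that strategy in code, here is a minimal sketch as I understand it (the helper name and the speaker2mels mapping are my own assumptions; only the slicing comprehension comes from the repo):

import random
import torch

def sample_speaker_segments(speaker2mels, n_uttrs=4, segment=128):
    # Pick one speaker, draw n_uttrs of their mels (with replacement),
    # then cut a random segment-frame window out of each.
    speaker = random.choice(list(speaker2mels))
    mels = random.choices(speaker2mels[speaker], k=n_uttrs)
    starts = [random.randint(0, m.shape[1] - segment) for m in mels]
    # Shape: (n_uttrs, n_mels, segment), all windows from one speaker.
    return torch.stack(
        [m[:, start : (start + segment)] for (m, start) in zip(mels, starts)]
    )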

On data preprocessing: I find that recordings with long silence at both ends can still be kept after the pipeline

self.sox_effects = SoxEffects(sample_rate, norm_db, sil_threshold, sil_duration)

and segments with large portions of non-voiced frames can be sampled as a result. Is the data fetching strategy mentioned above implemented as a way to mitigate this?

Which speakers are seen in the pre-trained model?

Hi, thanks very much for sharing the model. When you trained it, did you use 89 VCTK speakers as the training set (seen speakers) and 20 as the test set (unseen speakers)? If so, could you share which 20 speakers are in the test set?

Bug when running inference

Run log below:
\Anaconda3\lib\site-packages\torchaudio\extension\extension.py:13: UserWarning: torchaudio C++ extension is not available.
warnings.warn('torchaudio C++ extension is not available.')
Traceback (most recent call last):
File "E:/vioce_conversion/AdaIN-VC-master/inference.py", line 49, in
main(**vars(parser.parse_args()))
File "E:/vioce_conversion/AdaIN-VC-master/inference.py", line 30, in main
src = wav2mel(src, src_sr)[None, :].to(device)
File "C:\Users\yxandam\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "E:\vioce_conversion\AdaIN-VC-master\data\wav2mel.py", line 55, in forward
wav_tensor = self.sox_effects(wav_tensor, sample_rate)
File "C:\Users\yxandam\Anaconda3\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "E:\vioce_conversion\AdaIN-VC-master\data\wav2mel.py", line 89, in forward
wav_tensor, _ = apply_effects_tensor(wav_tensor, sample_rate, self.effects)
File "C:\Users\yxandam\Anaconda3\lib\site-packages\torchaudio_internal\module_utils.py", line 35, in wrapped
raise RuntimeError(f'{func.module}.{func.name} requires {req}')
RuntimeError: torchaudio.sox_effects.sox_effects.apply_effects_tensor requires module: torchaudio._torchaudio

PS C:\Users\yx> pip install torchaudio
Looking in indexes: https://pypi.tuna.tsinghua.edu.cn/simple
Requirement already satisfied: torchaudio in c:\users\yx\anaconda3\lib\site-packages (0.8.1)
Requirement already satisfied: torch==1.8.1 in c:\users\yx\anaconda3\lib\site-packages (from torchaudio) (1.8.1+cu101)
Requirement already satisfied: dataclasses in c:\users\yx\anaconda3\lib\site-packages (from torch==1.8.1->torchaudio) (0.8)
Requirement already satisfied: numpy in c:\users\yx\anaconda3\lib\site-packages (from torch==1.8.1->torchaudio) (1.19.5)
Requirement already satisfied: typing-extensions in c:\users\yx\anaconda3\lib\site-packages (from torch==1.8.1->torchaudio) (3.7.4.3)
PS C:\Users\yx>
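For what it's worth, the failure reproduces outside this repo with a minimal check (a sketch assuming torchaudio 0.8.x; the dummy tensor and effect are arbitrary):

import torch
from torchaudio.sox_effects import apply_effects_tensor

# sox support was not shipped in Windows builds of torchaudio 0.8.x,
# so any sox-effects call raises the RuntimeError shown above.
dummy = torch.zeros(1, 16000)
try:
    apply_effects_tensor(dummy, 16000, [["rate", "16000"]])
    print("sox effects available")
except RuntimeError as err:
    print("sox effects unavailable:", err)

As the pip log shows, the package is already installed, so reinstalling the same wheel will not help; the C++ extension is simply not included in this build.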

vocoder

Do I need to train the vocoder on my own (non-English) dataset, or can I use your pretrained vocoder.pt?
Thanks.

Pretrained models, samples and a question

Hey, I've found your repository very interesting and I would like to try it. Are there any samples from your trained model on the VCTK data? Also, is it possible to get the model you are using?
I've also noticed that during preprocessing you create evaluation and test sets but don't use them anywhere; is that currently WIP?
Thank you very much!

train segment question

The default is segment=128.
If I want to use 64, is train.py:32 (# Prepare data) the only place I need to edit?
Thanks.

About the vocoder

Hello, your model's conversion results are excellent, but I have a question: did you train the vocoder in this model yourself? Why is the speech I get all silence when I convert using the vocoder model from the yistLin/universal-vocoder project? Is some additional transformation needed in between? Any advice would be greatly appreciated.
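One generic thing worth checking (my assumption, not a confirmed diagnosis): if the mels this repo produces are normalized differently from the mels the universal vocoder was trained on, the vocoder's output is often near-silent. A quick sanity check:

import torch

# Hypothetical path to a converted mel saved before vocoding.
mel = torch.load("converted_mel.pt")
# If min/max fall far outside the range the vocoder saw during
# training, the generated waveform is often close to silence.
print("shape:", tuple(mel.shape))
print("min:", mel.min().item(), "max:", mel.max().item())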

metadata.json

Hello,
I found this repo very relevant for voice conversion and started working on it, but I got stuck preparing metadata.json. Could someone please send me a sample JSON file so that I can prepare mine accordingly?
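For illustration only, here is one guess at a plausible layout (speaker IDs mapping to lists of preprocessed utterance files, matching the speaker-grouped sampling discussed in the data-fetching issue above); the actual schema is whatever preprocess.py emits:

import json

# Hypothetical metadata.json layout, not the repo's confirmed schema.
metadata = {
    "p225": ["p225_001.pt", "p225_002.pt"],
    "p226": ["p226_001.pt", "p226_002.pt"],
}
with open("metadata.json", "w") as f:
    json.dump(metadata, f, indent=2)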

Code question

Hello, dear Author

May I ask what the purpose of this line of code in mel2wav.py, line 132, is:
mel_tensor = (mel_tensor - self.ref_db + self.dc_db) / self.dc_db
And is there a technical name for this operation?
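My own reading, for what it's worth (a sketch with assumed example values for ref_db and dc_db, not the author's explanation): it looks like decibel min-max normalization, shifting the log-mel by a reference level and rescaling by a dynamic-range constant so typical values land in [0, 1]:

import torch

ref_db, dc_db = 20.0, 100.0  # assumed example values

def normalize(mel_db: torch.Tensor) -> torch.Tensor:
    # (mel - ref + dc) / dc maps the interval [ref - dc, ref] onto [0, 1].
    return (mel_db - ref_db + dc_db) / dc_db

def denormalize(mel_norm: torch.Tensor) -> torch.Tensor:
    # Exact inverse, recovering the original dB scale.
    return mel_norm * dc_db + ref_db - dc_db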

This repository is good work.
Thanks.
