maum-ai / nuwave2

NU-Wave 2: A General Neural Audio Upsampling Model for Various Sampling Rates @ INTERSPEECH 2022

Home Page: https://mindslab-ai.github.io/nuwave2

License: BSD 3-Clause "New" or "Revised" License

Dockerfile 1.79% Python 98.21%

deep-learning neural-audio-upsampling super-resolution upsampling pytorch

nuwave2's Issues

Infer_step selection

I have been using your open-source code to perform 16k to 48k speech reconstruction. I utilized the default 8-step inference process and tested it on the untrimmed test set using your provided checkpoint.

However, I've encountered some issues with the reconstructed speech quality. Specifically, there appears to be a significant amount of noise in the high-frequency components of the reconstructed speech. The SNR I obtained is 19.472, and the LSD is 1.212. In contrast, the results in the research paper show SNR as 24.0 and LSD as 0.92.

I suspect that the issue might be related to the inadequacy of the inference steps. Therefore, I would like to understand how to better configure the infer_steps and infer_schedule to improve the quality of the reconstructed speech. Could you please provide guidance on how to adjust these parameters to get closer to the results mentioned in the research paper?

Some questions ...

Hey, first of all, great results and solid paper. I have some questions:

Are you planning on releasing the weights?
For how many iterations did you train (I see it's ~1.4M in the graph but I just wanna be sure)?
How much time did it roughly take to train the model (I noticed you trained on a batch size of 24) and which GPU was used?

Thanks for any help in advance!

Can this be used for real implementation as in upsampling a low quality audio file?

Is this model ready to use for upsampling audio files, and not just for training? If so, how do we use it: do we upsample by running the test script or by running inference?

How much VRAM is necessary for this? I got an out-of-memory error with 12 GB of VRAM

The command I executed: python inference.py -c nuwave2_02_16_13_epoch=629.ckpt -i 5dk.mp3 --sr 16

Here are the results:

D:\86 se courses youtube kanali\upsample audio>python inference.py -c nuwave2_02_16_13_epoch=629.ckpt -i 5dk.mp3 --sr 16
Traceback (most recent call last):
  File "D:\86 se courses youtube kanali\upsample audio\inference.py", line 115, in <module>
    wav_recon, wav_list = model.inference(wav_l, band, args.steps, noise_schedule)
  File "C:\Python399\lib\site-packages\torch\autograd\grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "D:\86 se courses youtube kanali\upsample audio\lightning_model.py", line 50, in inference
    signal, recon = self.model.denoise_ddim(signal, wav_l, band, logsnr_t, logsnr_s)
  File "D:\86 se courses youtube kanali\upsample audio\diffusion.py", line 54, in denoise_ddim
    noise = self.model(y, y_l, band, norm_nlogsnr)
  File "C:\Python399\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\86 se courses youtube kanali\upsample audio\model.py", line 215, in forward
    x, skip_connection = layer(x, band, noise_level)
  File "C:\Python399\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\86 se courses youtube kanali\upsample audio\model.py", line 173, in forward
    y_l, y_g = self.ffc1(y_l, y_g, band) # STFC
  File "C:\Python399\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "D:\86 se courses youtube kanali\upsample audio\model.py", line 152, in forward
    out_xl = self.convl2l(x_l) + self.convg2l(x_g)
  File "C:\Python399\lib\site-packages\torch\nn\modules\module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "C:\Python399\lib\site-packages\torch\nn\modules\conv.py", line 313, in forward
    return self._conv_forward(input, self.weight, self.bias)
  File "C:\Python399\lib\site-packages\torch\nn\modules\conv.py", line 309, in _conv_forward
    return F.conv1d(input, weight, bias, self.stride,
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 3.43 GiB (GPU 0; 12.00 GiB total capacity; 10.41 GiB already allocated; 0 bytes free; 10.42 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
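
One workaround that is often tried for this kind of error is to run inference on shorter segments, since activation memory grows with input length. The sketch below only computes segment boundaries; `chunk_bounds` and its parameters are hypothetical names, not part of nuwave2, and whether chunked output matches full-length output for this model is not confirmed anywhere in this thread.

```python
def chunk_bounds(n_samples, chunk_len, overlap=0):
    """Return (start, end) sample indices covering [0, n_samples).

    Consecutive chunks share `overlap` samples so they can be blended later.
    This is a generic helper, not a function from the nuwave2 repository.
    """
    if chunk_len <= 0 or not 0 <= overlap < chunk_len:
        raise ValueError("need chunk_len > 0 and 0 <= overlap < chunk_len")
    step = chunk_len - overlap
    bounds = []
    start = 0
    while start < n_samples:
        end = min(start + chunk_len, n_samples)
        bounds.append((start, end))
        if end == n_samples:
            break
        start += step
    return bounds


# Hypothetical usage: `upsample` stands for whatever call runs the model on one
# segment; it is a placeholder and is not defined here.
# for start, end in chunk_bounds(len(wav), chunk_len=5 * 16000):
#     pieces.append(upsample(wav[start:end]))
```

Shorter chunks lower peak memory but add more boundaries, so some experimentation with `chunk_len` is likely needed.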

Getting a CUDA out of memory error during inference

First of all, thanks to the author for this excellent project.
I am getting a CUDA out of memory error during inference. My command is as follows:

python inference.py -c models/584_16k.ckpt -i asset/raw_16k.wav --sr 16000

I checked all the issues and documents, but found no relevant information. How should I set the parameters?

My graphics card is an NVIDIA T4 with 16 GB of memory.

Can sampling.py from nuwave be used in nuwave2?

Thank you for your great contribution!
I see how to train and test but I don't see any examples of how to use it. How do I upsample a wav once the model is trained?

I want to use the official checkpoint for audio upsampling. Can sampling.py be used directly in nuwave2?
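
Other issues on this page invoke `inference.py` with `-c`, `-i`, and `--sr`, and mention an optional `--steps`. A minimal sketch of assembling that command from Python follows. `build_inference_command` is a hypothetical helper, the `--device` flag name is an assumption (the tracebacks only show `args.device`), and the Hz unit for `--sr` follows one of the commands quoted on this page while another passes `16`.

```python
import subprocess


def build_inference_command(checkpoint, input_wav, sr_hz, steps=None, device=None):
    """Assemble the inference.py command line as an argument list.

    Flag names -c, -i, --sr and --steps are taken from commands quoted in the
    issues; --device is an assumed flag name.
    """
    cmd = ["python", "inference.py", "-c", checkpoint, "-i", input_wav, "--sr", str(sr_hz)]
    if steps is not None:
        cmd += ["--steps", str(steps)]
    if device is not None:
        cmd += ["--device", device]
    return cmd


if __name__ == "__main__":
    command = build_inference_command("models/584_16k.ckpt", "asset/raw_16k.wav", 16000)
    print(" ".join(command))
    # To actually run it from the repository root:
    # subprocess.run(command, check=True)
```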

RuntimeError: CUDA out of memory.

Model performance

Hello, could you please tell me how to reproduce the SNR and LSD metrics from the paper? When I ran inference on all the audio from the eight speakers in the test set, I found gaps between my SNR and LSD values and those in the paper. Did you perhaps evaluate on randomly selected clips rather than the full test set?
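
When comparing against published numbers, it helps to pin down exactly how the metrics are computed, because STFT settings and signal alignment change the result. The following is a minimal sketch of the usual SNR and log-spectral distance definitions in NumPy. The values `n_fft=2048` and `hop=512`, the Hann window, and the `eps` floor are assumptions for illustration and are not confirmed by this thread as the paper's evaluation settings.

```python
import numpy as np


def snr_db(reference, estimate):
    """Signal-to-noise ratio in dB between two equal-length waveforms."""
    ref = np.asarray(reference, dtype=np.float64)
    est = np.asarray(estimate, dtype=np.float64)
    noise_power = np.sum((ref - est) ** 2)
    return float(10.0 * np.log10(np.sum(ref ** 2) / (noise_power + 1e-20)))


def power_spectrogram(signal, n_fft=2048, hop=512):
    """Hann-windowed power spectrogram, shape (frames, n_fft // 2 + 1)."""
    x = np.asarray(signal, dtype=np.float64)
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)) ** 2


def lsd(reference, estimate, n_fft=2048, hop=512, eps=1e-10):
    """Log-spectral distance: RMS over frequency of the log-power difference,
    averaged over frames."""
    log_ref = np.log10(power_spectrogram(reference, n_fft, hop) + eps)
    log_est = np.log10(power_spectrogram(estimate, n_fft, hop) + eps)
    per_frame = np.sqrt(np.mean((log_ref - log_est) ** 2, axis=1))
    return float(np.mean(per_frame))
```

Both functions assume the two waveforms are sample-aligned, have the same length, and are at least `n_fft` samples long.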

Just wondering if the pretrained model will produce good results for non-English voice.

How to use this to upsample 16kHz to 48kHz

Hello.

I want to improve the audio quality of my recordings.

I couldn't work out from the README how to achieve what is shown in the demo.

The demo has Section Ⅱ: Examples for samples upsampled from 16kHz to 48kHz.

That is what I want to do.

Here is the video that I want to upsample. What do I need to do? I am not interested in training.

https://youtu.be/2zY1dQDGl3o

EOFError: Ran out of input

Hello,
I'm very interested in your great work!
When I run python trainer.py -r 629 -s, I get an error like this.

`
Validation sanity check: 0%| | 0/2 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "C:\Users\edwin\anaconda3\envs\nuwave\lib\multiprocessing\popen_spawn_win32.py", line 93, in __init__
    reduction.dump(process_obj, to_child)
  File "C:\Users\edwin\anaconda3\envs\nuwave\lib\multiprocessing\reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'create_vctk_dataloader.<locals>.collate_fn'
Traceback (most recent call last):
  File "C:\Users\edwin\anaconda3\envs\nuwave\lib\multiprocessing\spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
EOFError: Ran out of input

Process finished with exit code 1 `

I have searched extensively but have not been able to solve it. Do you have any suggestions?
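
The first traceback points at the likely cause: on Windows, worker processes are started with spawn, which pickles the objects handed to them, and a function defined inside another function cannot be pickled. The sketch below reproduces that behaviour with the standard library only. The function names are illustrative and not copied from the repository; the usual remedies are moving `collate_fn` to module level or setting the DataLoader's `num_workers` to 0, though neither has been verified against this codebase here.

```python
import pickle


def can_pickle(obj) -> bool:
    """Return True if `obj` survives pickle.dumps, which spawn-based workers need."""
    try:
        pickle.dumps(obj)
        return True
    except (AttributeError, pickle.PicklingError, TypeError):
        return False


def make_local_collate():
    # Mirrors the failing pattern: collate_fn is defined inside another function,
    # so its qualified name contains "<locals>" and pickle cannot look it up.
    def collate_fn(batch):
        return list(batch)
    return collate_fn


def collate_fn(batch):
    # Module-level variant: pickled by reference, so worker processes can
    # import it by name when this file is importable.
    return list(batch)


local_fn = make_local_collate()
print("nested collate_fn picklable:", can_pickle(local_fn))
```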

What version of numpy/python are you using?

I cloned the repo, created a venv, activated it, downloaded the dataset, removed the speakers, and edited the yaml. When I run the flac2wav conversion, I get an error that numpy doesn't support the complex term. I assume this is caused by a change in the Python or numpy version.
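
If the message is about `np.complex` missing, one plausible explanation is that this alias was deprecated in NumPy 1.20 and removed in NumPy 1.24. That reading of the error is an assumption, since the exact message is not quoted. A minimal sketch of the replacement spelling:

```python
import numpy as np

# The removed alias `np.complex` was just the builtin `complex`.
# Either the builtin or the explicit NumPy dtype works on old and new versions.
spectrum = np.zeros(4, dtype=np.complex128)
spectrum[0] = 1 + 2j

same_dtype = np.zeros(4, dtype=complex)  # builtin complex maps to complex128

print(np.__version__, spectrum.dtype, same_dtype.dtype)
```

If the alias is used inside a dependency rather than in the repository's own code, updating that dependency or pinning NumPy below 1.24 are the usual options.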

An error occurs when I run the inference.py script

2023-08-02 12:00:07.518068: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
/usr/local/lib/python3.10/dist-packages/torch/functional.py:641: UserWarning: stft with return_complex=False is deprecated. In a future pytorch release, stft will return complex tensors for all inputs, and return_complex=False will raise an error. Note: you can still call torch.view_as_real on the complex output to recover the old return format. (Triggered internally at ../aten/src/ATen/native/SpectralOps.cpp:862.)
  return _VF.stft(input, n_fft, hop_length, win_length, window, # type: ignore[attr-defined]
Traceback (most recent call last):
  File "/content/nuwave2/inference.py", line 115, in <module>
    wav_recon, wav_list = model.inference(wav_l, band, args.steps, noise_schedule)
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/content/nuwave2/lightning_model.py", line 50, in inference
    signal, recon = self.model.denoise_ddim(signal, wav_l, band, logsnr_t, logsnr_s)
  File "/content/nuwave2/diffusion.py", line 54, in denoise_ddim
    noise = self.model(y, y_l, band, norm_nlogsnr)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/nuwave2/model.py", line 215, in forward
    x, skip_connection = layer(x, band, noise_level)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/nuwave2/model.py", line 173, in forward
    y_l, y_g = self.ffc1(y_l, y_g, band) # STFC
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/nuwave2/model.py", line 153, in forward
    out_xg = self.convl2g(x_l) + self.convg2g(x_g, band)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/nuwave2/model.py", line 125, in forward
    output = self.fu(x, band)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/content/nuwave2/model.py", line 105, in forward
    output = torch.istft(ffted, self.n_fft, hop_length=self.hop_size, win_length=self.win_size, window=self.hann_window,
RuntimeError: istft requires a complex-valued input tensor matching the output from stft with return_complex=True.

Can anyone help me resolve this issue? Thanks in advance.
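
The final error says that `torch.istft` was handed a real-valued tensor, which newer PyTorch releases reject. A minimal sketch of the conversion pattern follows, assuming the intermediate processing works on the real layout with a trailing dimension of size 2. The tensor shapes and STFT sizes are made up for the example, and this is not a patch for `model.py`.

```python
import torch

torch.manual_seed(0)
n_fft, hop = 512, 128
window = torch.hann_window(n_fft)
x = torch.randn(1, 16000)

# Ask stft for complex output, then view it as real (..., 2) so that
# real-valued layers can process it in the old layout.
spec = torch.stft(x, n_fft, hop_length=hop, win_length=n_fft,
                  window=window, return_complex=True)
spec_ri = torch.view_as_real(spec)

# ... real-valued processing on spec_ri would go here ...

# Convert back to a complex tensor before calling istft.
spec_c = torch.view_as_complex(spec_ri.contiguous())
y = torch.istft(spec_c, n_fft, hop_length=hop, win_length=n_fft,
                window=window, length=x.shape[-1])

print(torch.allclose(x, y, atol=1e-4))
```

The `.contiguous()` call matters because `torch.view_as_complex` requires the last dimension to have size 2 and unit stride.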

Tricks to prepare the training dataset

Hello,
I'm very interested in your great work! I have 3 questions, would you mind helping me with them?

I was wondering whether your network could be extended to use on other kinds of audio data, such as music. To this end, I tested it on different instrument datasets. In test 1, I had 33 instruments (like the "speaker" in your case), each containing only about 3 minutes of audio data. In test 2, I had 11 instruments, each containing about 1 hour of audio data. So overall, audio data in test1 is shorter than audio data in test2. However, when I ran the experiments on the same machine, test2 ran ~2-3 times faster than test1. Does the training speed have something to do with the number of "speakers" more than with the whole duration of the training data?
What duration would you recommend for each .wav file? In your dataset, each piece of training data is rather short (~2-3 seconds). Would your network also work for long data such as 1-2 minutes?
For inference, you set the infer_step as 8 with a specific infer_schedule. Is 8 the best parameter in your experiments? If we want to test different infer_step, how should we set the infer_schedule?
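
On the file-duration question, a common way to train on long recordings is to draw a fixed-length random crop from each file per step, so file length stops mattering to the model. The sketch below shows only the index arithmetic. `random_crop` and the crop length are hypothetical, and whether the repository's dataloader already does something equivalent is not stated in this thread.

```python
import random


def random_crop(n_samples, crop_len, rng=random):
    """Pick (start, end) for a crop of `crop_len` samples.

    Files shorter than `crop_len` are returned whole; padding them is left
    to the caller.
    """
    if n_samples <= crop_len:
        return 0, n_samples
    start = rng.randrange(n_samples - crop_len + 1)
    return start, start + crop_len


if __name__ == "__main__":
    rng = random.Random(0)
    # One minute of 48 kHz audio, cropped to an illustrative 32768 samples.
    print(random_crop(60 * 48000, 32768, rng))
```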

Thank you very much for your help in advance!

Demo page is broken

https://mindslab-ai.github.io/nuwave2/ doesn't work

timestamp_path: 'vctk-silence-labels/vctk-silences.0.92.txt'

Hello, may I ask whether this file is missing?

Setting to reduce memory usage during inference?

I am using a device with over 30 GB of memory and it's still maxing out. Is there a way to reduce the memory used during inference?
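
If a long input is processed as overlapping segments to limit memory, the outputs have to be joined again. A minimal sketch of a linear cross-fade join in plain Python follows. `stitch` is a hypothetical helper rather than anything in the repository, and `overlap` must be given in output samples (at the upsampled rate).

```python
def stitch(chunks, overlap):
    """Concatenate sample sequences, linearly cross-fading `overlap` samples
    between each pair of neighbouring chunks."""
    out = list(chunks[0]) if chunks else []
    for chunk in chunks[1:]:
        n = min(overlap, len(out), len(chunk))
        base = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming chunk rises across the overlap
            out[base + i] = (1 - w) * out[base + i] + w * chunk[i]
        out.extend(chunk[n:])
    return out


if __name__ == "__main__":
    print(stitch([[0.0, 0.0, 0.0], [3.0, 3.0, 3.0]], overlap=2))
```

A cross-fade hides small discontinuities but does not guarantee the result equals a single full-length pass.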

Cannot Execute Program

When running inference.py, an error occurs:

  File "C:\Users\User\nuwave2\inference.py", line 71, in <module>
    model = NuWave2(hparams).to(args.device)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\lightning_fabric\utilities\device_dtype_mixin.py", line 54, in to
    return super().to(*args, **kwargs)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1145, in to
    return self._apply(convert)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 797, in _apply
    module._apply(fn)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 820, in _apply
    param_applied = fn(param)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\nn\modules\module.py", line 1143, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
  File "C:\Users\User\AppData\Local\Programs\Python\Python310\lib\site-packages\torch\cuda\__init__.py", line 239, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled

Also, I don't know what values to place for the placeholder arguments {--steps:option} {--gt:option}.
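
The assertion means the installed PyTorch build has no CUDA support, so moving the model to a CUDA device fails. Installing a CUDA-enabled build is one fix; falling back to CPU is another. The sketch below shows the fallback logic. `pick_device` and `cuda_available` are hypothetical helpers, and how the chosen device is passed to the script is left open, since the flag is not shown in this thread.

```python
def cuda_available() -> bool:
    """Report whether a CUDA-enabled PyTorch build with a usable GPU is present."""
    try:
        import torch
    except ImportError:
        return False
    return torch.cuda.is_available()


def pick_device(requested: str, has_cuda: bool) -> str:
    """Return `requested` unless it names a CUDA device that is unavailable."""
    if requested.startswith("cuda") and not has_cuda:
        return "cpu"
    return requested


if __name__ == "__main__":
    print(pick_device("cuda", cuda_available()))
```

CPU inference is expected to be much slower, so this is mainly useful for checking that the rest of the setup works.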
