sony / ai-research-code

License: Apache License 2.0

Python 89.40% Dockerfile 0.27% Jupyter Notebook 2.27% Shell 0.01% HTML 8.05%

ai-research-code's Issues

[NVC-Net] About 16 kHz training and model convergence

Hi,

Thank you for sharing your great work!

I'm using NVC-Net to train a Japanese voice conversion model, and I have two questions.

First, I tried to adapt your code to 16 kHz WAV files by making the following two changes:

- changed sr in hparams.py from 22050 to 16000
- changed segment_length in hparams.py from 32768 to 16384

Training runs without errors, but the performance is poor even after 400 epochs.

Do you have any advice on training NVC-Net on 16 kHz audio? Are any other modifications needed for training to go well?

Second, could you share the value of g_loss_rec when the model converges?
In my training, g_loss_rec converged to around 0.9 to 1.2, and I'm not sure whether this is what I should expect at convergence.
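
For reference, the two changes described above also change how much audio each training segment covers. A minimal sketch of that arithmetic, using only the values quoted in this issue (the function name is illustrative and not part of the NVC-Net code):

```python
def segment_seconds(segment_length: int, sample_rate: int) -> float:
    """Duration in seconds covered by one training segment."""
    return segment_length / sample_rate

original = segment_seconds(32768, 22050)   # settings quoted as the defaults
modified = segment_seconds(16384, 16000)   # settings after the two changes

# Segment length that would keep the original duration at 16 kHz
same_duration_length = round(original * 16000)

print(f"{original:.3f} s -> {modified:.3f} s")
print(same_duration_length)
```

Whether the shorter segment explains the quality drop is not established here; other rate-dependent settings may also need review.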

Value error in nnabla on running x-umx on Rpi4, Raspberry Pi OS

When running x-umx on a Raspberry Pi 4 (8 GB) with Raspberry Pi OS, I get the error below:

root@raspberrypi:/home/pi/x-umx# python3 test.py --inputs ../Music/test_16k_S16_LE_stereo.wav --context cpu --model /home/pi/x-umx/x-umx.h5 --outdir /home/pi/x-umx/results
2021-01-30 16:37:37,007 [nnabla][INFO]: Initializing CPU extension...
Traceback (most recent call last):
File "test.py", line 198, in <module>
test()
File "test.py", line 170, in test
residual_model=args.residual_model
File "test.py", line 84, in separate
mix_spec, msk, _ = unmix_target(audio_nn, test=True)
File "/home/pi/x-umx/model.py", line 300, in __call__
lstm_out_bass = self.lstm(cross_1, nb_samples, "lstm_bass", test)
File "/home/pi/x-umx/model.py", line 231, in lstm
bidirectional=not self.unidirectional, training=not test, dropout=0.4, name=scope_name)
File "<lstm>", line 8, in lstm
File "/usr/local/lib/python3.7/dist-packages/nnabla/parametric_functions.py", line 1567, in lstm
return F.lstm(x, h, c, weight_l0=w0, weight=w, bias=b, num_layers=num_layers, dropout=dropout, bidirectional=bidirectional, training=training)
File "<lstm>", line 3, in lstm
File "/usr/local/lib/python3.7/dist-packages/nnabla/function_bases.py", line 222, in lstm
return F.LSTM(ctx, num_layers, dropout, bidirectional, training)(*inputs, n_outputs=n_outputs, auto_forward=get_auto_forward(), outputs=outputs)
File "function.pyx", line 292, in nnabla.function.Function.__call__
File "function.pyx", line 271, in nnabla.function.Function._cg_call
RuntimeError: value error in setup_impl
/home/pi/x-umx/nnabla/src/nbla/function/./generic/split.cpp:36
Failed num_outputs == outputs.size(): inputs[0].shape[axis] must be the same number as the outputs. inputs[0].shape[axis]: 431, outputs: 2.

I have successfully manually built & installed nnabla & llvmlite.
The latter was really very difficult to build & install.
root@raspberrypi:/home/pi/x-umx# pip3 freeze | grep 'nnabla'
nnabla==1.9.0

I think the error now comes from nnabla: the size of the input does not match the number of outputs. Could you please suggest where this needs to be set in the nnabla files?

Please help me.
Regards,
Rajiv.
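
The last line of the log comes from a check in the split function: it produces one output per index along the split axis, so the number of requested outputs has to equal inputs[0].shape[axis]. A NumPy sketch of that rule follows. `split_along_axis` is a hypothetical stand-in, not nnabla code, and apart from the 431 taken from the log the shape is made up.

```python
import numpy as np

def split_along_axis(x: np.ndarray, axis: int = 0):
    """Return one array per index along `axis`, with that axis removed."""
    return tuple(np.moveaxis(x, axis, 0))

x = np.zeros((431, 1, 512))          # 431 is from the log; other sizes are made up
pieces = split_along_axis(x, axis=0)
print(len(pieces), pieces[0].shape)

try:
    forward, backward = pieces       # asking for 2 outputs from 431 slices
except ValueError as err:
    print("mismatch:", err)
```

Why the axis has 431 entries here instead of 2 cannot be determined from the log alone.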

[Mixed Precision DNNs]: ImageNet codebase?

Can you share the working codebase to replicate paper results for ImageNet with MixedPrecisionQuantization?

Pretrained model

Hi, can you release the pre-trained model of MobileNetV2?

Bad output sound quality

I noticed that the output quality is much worse than that of the input file. Is there perhaps a config setting that cuts some frequencies from the output file?

Spleeter has a similar issue, which can be resolved with a simple config change.

Both pretrained models (OpenVINO and default) give the same poor quality. It sounds as if the high frequencies are cut off.

Thanks in advance!
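
One way to test the impression that high frequencies are missing is to compare the share of spectral energy above some cutoff in the input and in the output. A minimal NumPy sketch, assuming a mono signal already loaded as an array; the function name and the 16 kHz cutoff are illustrative choices, not settings taken from the repository:

```python
import numpy as np

def high_band_energy_ratio(signal: np.ndarray, sample_rate: int, cutoff_hz: float) -> float:
    """Fraction of spectral energy at or above `cutoff_hz` for a mono signal."""
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    total = spectrum.sum()
    if total == 0:
        return 0.0
    return float(spectrum[freqs >= cutoff_hz].sum() / total)

# Synthetic check: one second at 44.1 kHz, with and without an 18 kHz tone
sr = 44100
t = np.arange(sr) / sr
full = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 18000 * t)
low_only = np.sin(2 * np.pi * 1000 * t)

print(high_band_energy_ratio(full, sr, 16000))
print(high_band_energy_ratio(low_only, sr, 16000))
```

A ratio that is clearly lower for the separated output than for the input would support the suspicion of a band limit; it does not show where the limit is applied.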

[X-UMX] Bad performance when using --targets

Hi,

I am running some tests on the Google Colab notebook you provide, and I've seen that the performance varies a lot when I use the flag --targets vocals compared with running the default test command:
!python test.py --inputs $filename --out-dir results --model models/x-umx.h5

Why is this happening? My goal is to have audio + accompaniment

Many thanks in advance,

Guillem

D3Net Music Source Separation pretrained model not downloadable

The link to the pretrained weights seems to be dead: https://nnabla.org/pretrained-models/ai-research-code/d3net/mss/d3net-mss.zip

No pretrained models

Hello!
I see that all links to the pretrained models now return 404. Where can I get these models?

【NVC-Net】RuntimeError: target_specific error in backward_impl. Failed `status == CUDNN_STATUS_SUCCESS`: UNKNOWN

Hi, I am trying to train NVC-Net on a single GPU, but I get the following errors:

value error in query
/home/gitlab-runner/builds/jmdP2aBr/1/nnabla/builders/all/nnabla/include/nbla/function_registry.hpp:69
Failed it != items_.end(): Any of [cudnn:float, cuda:float, cpu:float] could not be found in []

No communicator found. Running with a single process. If you run this with MPI processes, all processes will perform totally same.
2022-02-15 17:16:13,887 [nnabla][INFO]: Training data with 100 speakers.
2022-02-15 17:16:13,888 [nnabla][INFO]: DataSource with shuffle(True)
2022-02-15 17:16:13,934 [nnabla][INFO]: Using DataIterator
Running epoch=1 lr=0.00010
Error during backward propagation:
Add2CudaCudnn
Add2CudaCudnn
Add2CudaCudnn
MulScalarCuda
MeanCudaCudnn
SquaredErrorCuda
Div2Cuda
PowScalarCuda
SumCuda
AddScalarCuda
PowScalarCuda
ConvolutionCudaCudnn
PadCuda
GELUCuda
ConvolutionCudaCudnn
PadCuda
GELUCuda
ConvolutionCudaCudnn
GELUCuda
Add2CudaCudnn
ConvolutionCudaCudnn
Mul2Cuda
TanhCudaCudnn <-- ERROR
Traceback (most recent call last):
File "main.py", line 99, in <module>
run(args)
File "main.py", line 70, in run
Trainer(gen, gen_optim, dis, dis_optim, dataloader, rng, hp).run()
File "11_ai-research-code-master/nvcnet/train.py", line 157, in run
self.train_on_batch(i)
File "11_ai-research-code-master/nvcnet/train.py", line 197, in train_on_batch
p['g_loss'].backward(clear_buffer=True)
File "_variable.pyx", line 826, in nnabla._variable.Variable.backward
RuntimeError: target_specific error in backward_impl
/home/gitlab-runner/builds/-phDBBa6/0/nnabla/builders/all/nnabla-ext-cuda/src/nbla/cuda/cudnn/function/./generic/tanh.cu:79
Failed status == CUDNN_STATUS_SUCCESS: UNKNOWN

I followed the install page (https://nnabla.org/install/), but it does not work. Could you please give me some suggestions?
My environment is as follows:
CUDA 11.0, cuDNN 8.1.0, Python 3.6.8

Thank you! I look forward to your kind reply.

NVC-Net Training

Hi, thanks for releasing the code for NVC-Net. I've got two questions:

Firstly, when trying to train on multiple GPUs, I run into the following error:

Failed `it != items_.end()`: Any of [cudnn:float, cuda:float, cpu:float] could not be found in []
No communicator found. Running with a single process. If you run this with MPI processes, all processes will perform totally same.

which basically means it's only running on one GPU. In fact, I get the same error simply by running the following:

import nnabla.communicators as C
from nnabla.ext_utils import get_extension_context
ctx = get_extension_context("cudnn", device_id='0')
C.MultiProcessDataParallelCommunicator(ctx)

I know this is probably more of an nnabla issue, but as a PyTorch user I'm not sure where to get help with nnabla.

Secondly, is it normal for the content preservation loss g_loss_con to be 0.0 for the first few epochs? I'm finding that the encoder basically encodes everything to the same vector in the hidden dimension, hence the loss is 0.0. For reference, I'm also using the VCTK dataset processed with the given script with default parameters.

Thanks a lot!
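
The observation that the encoder maps everything to the same vector can be checked numerically on a batch of encoder outputs. A minimal NumPy sketch, assuming the codes have already been copied into an array with the batch on the first axis; `is_collapsed` and the tolerance are hypothetical, and how to obtain the codes from the NVC-Net encoder is not shown:

```python
import numpy as np

def is_collapsed(codes: np.ndarray, tol: float = 1e-6) -> bool:
    """True if all items in the batch are nearly the same vector."""
    flat = codes.reshape(len(codes), -1)      # (batch, features)
    spread = flat.std(axis=0).max()           # largest per-feature spread
    return bool(spread < tol)

rng = np.random.default_rng(0)
varied = rng.normal(size=(8, 4, 16))
constant = np.tile(rng.normal(size=(1, 4, 16)), (8, 1, 1))

print(is_collapsed(varied))    # False
print(is_collapsed(constant))  # True
```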

additional conda env dependencies needed

To get this working, I had to make the following changes to environment-gpu.yml:

name: open-unmix-nnabla-gpu

channels:
  - conda-forge

dependencies:
  - pip
  - python=3.6
  - numpy=1.16
  - scikit-learn=0.21
  - tqdm=4.28
  - cudatoolkit=10.0
  - cudnn
  - ffmpeg
  - pip:
    - soundfile
    - musdb
    - norbert
    - resampy
    - nnabla
    - nnabla-ext-cuda100
    - pydub

test wavfile using cpu gives Segmentation fault

Hi,

I am using a pretrained model in a CPU environment with the --context cpu option. My mixed WAV file is 2 minutes long, and my system has 16 GB of RAM and an octa-core CPU.

When I run the test command for the pretrained model, it gives a segmentation fault. Please see the command and logs below.
command: python test.py --input inputs/dm_mixed_vocal_and music_0002.wav --context cpu --model model/x-umx.h5 --outdir outputs/
logs:
2021-01-08 02:02:41,221 [nnabla][INFO]: Initializing CPU extension...
Segmentation fault (core dumped)

I don't know why this causes a segmentation fault. My system RAM is not full at the time of the segfault.
Please suggest some ideas to resolve this issue.

Memory allocation failed

I tried to train with 2 GPUs in Docker, but after one epoch, memory allocation errors occur. I am not sure what to check or what might be wrong.

While running the D3Net music separation Jupyter notebook in Colab, the error "AttributeError: module 'pynvml.nvml' has no attribute 'nvml_lib'" occurs

2021-06-07 12:26:53,837 [nnabla][INFO]: Initializing CPU extension...
Traceback (most recent call last):
File "separate.py", line 96, in <module>
ch_flip_average=True
File "separate.py", line 27, in run_separation
ctx = get_extension_context(args.context)
File "/usr/local/lib/python3.7/dist-packages/nnabla/ext_utils.py", line 97, in get_extension_context
mod = import_extension_module(ext_name)
File "/usr/local/lib/python3.7/dist-packages/nnabla/ext_utils.py", line 46, in import_extension_module
return importlib.import_module('.' + ext_name, 'nnabla_ext')
File "/usr/lib/python3.7/importlib/__init__.py", line 127, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1006, in _gcd_import
File "<frozen importlib._bootstrap>", line 983, in _find_and_load
File "<frozen importlib._bootstrap>", line 967, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 677, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 728, in exec_module
File "<frozen importlib._bootstrap>", line 219, in _call_with_frames_removed
File "/usr/local/lib/python3.7/dist-packages/nnabla_ext/cudnn/__init__.py", line 18, in <module>
import nnabla_ext.cuda
File "/usr/local/lib/python3.7/dist-packages/nnabla_ext/cuda/__init__.py", line 114, in <module>
check_gpu_compatibility()
File "/usr/local/lib/python3.7/dist-packages/nnabla_ext/cuda/__init__.py", line 71, in check_gpu_compatibility
from nnabla.utils.nvml import pynvml
File "/usr/local/lib/python3.7/dist-packages/nnabla/utils/nvml.py", line 39, in <module>
load_nvml_for_win()
File "/usr/local/lib/python3.7/dist-packages/nnabla/utils/nvml.py", line 26, in load_nvml_for_win
if not (nvml.nvml_lib == None and sys.platform[:3] == "win"):
AttributeError: module 'pynvml.nvml' has no attribute 'nvml_lib'

【NVC-Net】ImportError: libcudart.so.10.2: cannot open shared object file: No such file or directory

The GPU details are:
GPU: Tesla P100-PCIE-16gb
driver version: 450.119.04
CUDA version: 11.0

I'm using Kaggle to try to run this and keep running into this error. I also tried the Docker file, but that did not run successfully either. I've tried many methods found online to resolve this, but none of them work. Could anyone help with this issue?
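
The missing libcudart.so.10.2 suggests that the installed nnabla CUDA extension was built for CUDA 10.2, while the machine reports CUDA 11.0. The extension packages named elsewhere on this page (nnabla-ext-cuda100, nnabla-ext-cuda102, nnabla-ext-cuda110-nccl2-mpi3-1-6) appear to encode the CUDA version in the package name. A small sketch of that naming pattern; the helper is hypothetical, and whether a package exists for a given CUDA version has to be checked against the nnabla install page:

```python
def nnabla_cuda_package(cuda_version: str) -> str:
    """Build a name like 'nnabla-ext-cuda110' from a CUDA version like '11.0'."""
    major, minor = cuda_version.split(".")[:2]
    return f"nnabla-ext-cuda{major}{minor}"

print(nnabla_cuda_package("11.0"))  # nnabla-ext-cuda110
print(nnabla_cuda_package("10.2"))  # nnabla-ext-cuda102
```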

Adding additional speakers - transfer learning

Has anyone figured out a way to use this algorithm to do transfer learning?

Say I train with 100 speakers and want to train the model with an additional 20 speakers. It appears that you have to retrain from the start rather than adding a set of 20 new latent spaces and training this new data.

Has anyone tried this? It would be great to be able to transfer what's been learned, but that is tough with GANs.

Bests,
philip

X-UMX - Separating using --context cpu takes very long

Thank you so much for open-sourcing X-UMX!

Is it in a usable state right now?

I followed your instructions to perform source separation with the model as listed here.

I ran test.py with the flag --context cpu, and separation is taking a very long time: it took over 15 minutes to separate a 3-minute track. I have 16 GB of RAM. Is CPU-based separation supposed to take this long?

Update: It ended up exhausting all my memory and crashed.
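
If memory use grows with track length, one generic mitigation is to process the audio in fixed-size chunks and join the results. A minimal NumPy sketch of that idea; `process_in_chunks` and `separate_fn` are hypothetical names, the identity function stands in for a real separation call, and whether test.py already offers such an option is not stated in this issue:

```python
import numpy as np

def process_in_chunks(audio: np.ndarray, chunk_len: int, separate_fn):
    """Apply `separate_fn` to consecutive chunks along axis 0 and rejoin them."""
    outputs = [
        separate_fn(audio[start:start + chunk_len])
        for start in range(0, len(audio), chunk_len)
    ]
    return np.concatenate(outputs, axis=0)

# Identity stand-in for a real separation function
audio = np.arange(10, dtype=np.float32).reshape(10, 1)
result = process_in_chunks(audio, chunk_len=4, separate_fn=lambda x: x)
print(np.array_equal(result, audio))  # True
```

Hard cuts between chunks can cause audible artifacts at the boundaries, so a real implementation would likely need overlapping chunks.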

resuming training from checkpoint

Hi,
How do we resume training on X-UMX from a checkpoint?
The --checkpoint argument (as in UMX) seems to be unrecognised.
Thank you

X-UMX gets stuck when training

Hello, I am trying to train a model using X-UMX on a single GPU. I am using the 7-second preview version of musdb just for testing.

After "Compute dataset statistics" reaches 100%, the next progress bar is stuck at 0%.

This is everything I did:

cd /home/ubuntu/Downloads/ai-research-code/x-umx
ubuntu@:/Downloads/ai-research-code/x-umx$ conda activate open-unmix-nnabla-gpu
(open-unmix-nnabla-gpu) ubuntu@-:~/Downloads/ai-research-code/x-umx$ python train.py --output /home/ubuntu/Downloads/ai-research-code/x-umx/weights
2021-07-27 23:18:56,894 [nnabla][INFO]: Initializing CPU extension...
/home/ubuntu/anaconda3/envs/open-unmix-nnabla-gpu/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)
/home/ubuntu/anaconda3/envs/open-unmix-nnabla-gpu/lib/python3.6/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
return f(*args, **kwds)
2021-07-27 23:18:57,502 [nnabla][INFO]: Initializing CUDA extension...
2021-07-27 23:18:57,559 [nnabla][INFO]: Initializing cuDNN extension...
value error in query
/home/gitlab-runner/builds/-phDBBa6/3/nnabla/builders/all/nnabla/include/nbla/function_registry.hpp:69
Failed it != items_.end(): Any of [cudnn:float, cuda:float, cpu:float] could not be found in []

No communicator found. Running with a single process. If you run this with MPI processes, all processes will perform totally same.
2021-07-27 23:18:57,560 [nnabla][INFO]: [Communicator] Using gpu_id = 0 as rank = 0
Mixing coef. is 10.0, i.e., MDL = 10.0*TD-Loss + FD-Loss
2021-07-27 23:18:57,561 [nnabla][INFO]: DataSource with shuffle(True)
tracks=80
2021-07-27 23:18:59,025 [nnabla][INFO]: DataSource with shuffle(False)
tracks=14
2021-07-27 23:18:59,289 [nnabla][INFO]: Using DataIterator
2021-07-27 23:18:59,289 [nnabla][INFO]: Using DataIterator
max_iter 320
Compute dataset statistics: 100%|███████████████| 80/80 [00:12<00:00, 8.11it/s]
0%| | 0/1000 [00:00<?, ?it/s]

GPU memory and power are being used, but nothing seems to be happening. Can anyone please help me?

Large delay during inference

I'm running the NVC-Net model after training, and there is a large delay during inference. The first inference always takes the longest, around 3 seconds. All subsequent ones take about 1.2 seconds. The length of the input WAV file doesn't change the delay.

After looking deeper, it appears the delay is caused by the model construction. Is there a way to create the model object once and then just run inference?

With these delays, the fast inference times published in the paper cannot be reproduced.

Any thoughts? I am still in the process of learning nnabla so maybe I am missing something about the library.

Best regards,
Philip
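
The build-once, reuse-afterwards pattern asked about above can be sketched in plain Python. This does not use nnabla: `get_model` is a hypothetical stand-in for the expensive construction step, and whether the NVC-Net inference script can be restructured this way is an assumption.

```python
import functools
import time

@functools.lru_cache(maxsize=1)
def get_model():
    """Hypothetical stand-in for expensive model construction."""
    time.sleep(0.05)                       # pretend that building is slow
    return lambda x: [v * 2 for v in x]    # placeholder "model"

def infer(x):
    model = get_model()                    # built on first call, reused afterwards
    return model(x)

start = time.perf_counter()
infer([1, 2])
first = time.perf_counter() - start

start = time.perf_counter()
infer([3, 4])
later = time.perf_counter() - start

print(f"first call {first:.3f} s, later call {later:.6f} s")
```

With this structure only the first call pays the construction cost, which matches the observation that the first inference is the slowest.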

D3Net: inference on CPU

Separating one source (vocals) from a 3-minute track using D3Net takes ~2.5 hours on a machine with 4 cores. Is there a way to speed up inference on CPU?

Hi, where is the TVC-GMM code?

[Quantized Depth Completion] Questions about implementation details

First of all, thanks for the great work. However, the source code is still missing.
Could you share the training and evaluation code and the pretrained weights for this work?

Also, I'm trying to reimplement it in PyTorch, and I have some questions about the paper:

- How is the surface normal computed from the ground-truth depth in NYU Depth v2? The paper only shows the approximation used for training, not the exact computation.
- The dot pattern used to produce sparse depth in NYU Depth v2 is unknown. Can you share an example to reproduce it?
- The kernel_size of MaxPooling2D is missing.
- The kernel_size and filter_size of Conv2d in the Upsampling layer are missing.

Thanks! Look forward to your kind reply.

pretrained NVC model

Any plans to release NVC model?

outputs not saved

The output WAV files are not saved anywhere on disk. I tried multiple combinations, with and without the output argument, and none seem to work.

Question about Mixed Precision DNNs

Hello, I have a question about the implementation of CASE U3:
Why is the formula for calculating the bitwidth different for weights and activations?
https://github.com/sony/ai-research-code/blob/master/mixed-precision-dnns/train_resnet.py#L174
https://github.com/sony/ai-research-code/blob/master/mixed-precision-dnns/train_resnet.py#L264

I hope for a reply. Thanks!
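
One common reason for such a difference, stated here as an assumption and not checked against the two linked lines, is the sign bit: a signed quantizer (typical for weights) needs one more bit than an unsigned one (typical for activations after a ReLU) to cover the same maximum magnitude with the same step size. A sketch with illustrative names (`step`, `max_value`):

```python
import math

def unsigned_bits(step: float, max_value: float) -> int:
    """Bits needed for the levels 0, step, 2*step, ..., max_value."""
    return math.ceil(math.log2(max_value / step + 1))

def signed_bits(step: float, max_value: float) -> int:
    """Same magnitude range plus one sign bit."""
    return unsigned_bits(step, max_value) + 1

print(unsigned_bits(0.25, 0.75))  # 2
print(signed_bits(0.25, 0.75))    # 3
```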

Segmentation fault and RuntimeError: value error in setup_impl

Hi,
I am trying to train the X-UMX model on Google Colab and am stuck on this error. Kindly help.

2022-07-02 10:30:45,285 [nnabla][INFO]: Initializing CPU extension...
2022-07-02 10:30:45,626 [root][INFO]: Generating grammar tables from /usr/lib/python3.7/lib2to3/Grammar.txt
2022-07-02 10:30:45,643 [root][INFO]: Generating grammar tables from /usr/lib/python3.7/lib2to3/PatternGrammar.txt
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
/usr/lib/python3.7/importlib/_bootstrap.py:219: RuntimeWarning: numpy.ufunc size changed, may indicate binary incompatibility. Expected 192 from C header, got 216 from PyObject
  return f(*args, **kwds)
2022-07-02 10:30:46,093 [nnabla][INFO]: Initializing CUDA extension...
2022-07-02 10:30:46,111 [nnabla][INFO]: Initializing cuDNN extension...
2022-07-02 10:30:46,558 [nnabla][INFO]: [Communicator] Using gpu_id = 0 as rank = 0
2022-07-02 10:30:46,610 [nnabla][INFO]: DataSource with shuffle(True)
Finished loading dataset with 86 tracks.
2022-07-02 10:30:52,995 [nnabla][INFO]: DataSource with shuffle(False)
Finished loading dataset with 14 tracks.
2022-07-02 10:30:54,130 [nnabla][INFO]: Using DataIterator
2022-07-02 10:30:54,131 [nnabla][INFO]: Using DataIterator
Compute dataset statistics: 100% 86/86 [01:28<00:00,  1.03s/it]
Traceback (most recent call last):
  File "train.py", line 207, in <module>
    train()
  File "train.py", line 113, in train
    model = get_model(args, scaler_mean, scaler_std, max_bin=max_bin)
  File "/content/ai-research-code/x-umx/model.py", line 431, in get_model
    mix_spec, m_hat, pred = unmix(mixture_audio)
  File "/content/ai-research-code/x-umx/model.py", line 327, in __call__
    self.n_fft, window_type='hanning', center=True)
  File "/usr/local/lib/python3.7/dist-packages/nnabla/functions.py", line 1101, in istft
    return istft_base(y_r, y_i, window_size, stride, fft_size, window_type, center, pad_mode, as_stft_backward)
  File "<istft>", line 3, in istft
  File "/usr/local/lib/python3.7/dist-packages/nnabla/function_bases.py", line 4926, in istft
    return F.ISTFT(ctx, window_size, stride, fft_size, window_type, center, pad_mode, as_stft_backward)(y_r, y_i, n_outputs=n_outputs, auto_forward=get_auto_forward(), outputs=outputs)
  File "function.pyx", line 328, in nnabla.function.Function.__call__
  File "function.pyx", line 306, in nnabla.function.Function._cg_call
RuntimeError: value error in setup_impl
/home/gitlab-runner/builds/LRsSYq-B/0/nnabla/builders/all/nnabla/src/nbla/function/./generic/istft.cpp:95
Failed `this->pad_mode_ == "constant"`: `pad_mode` should be "constant" for the normal use of ISTFT (`as_stft_backward == false`) since `pad_mode` is ignored and makes no effects in that case.

[b506019edf61:01753] *** Process received signal ***
[b506019edf61:01753] Signal: Segmentation fault (11)
[b506019edf61:01753] Signal code: Address not mapped (1)
[b506019edf61:01753] Failing at address: 0x7f0c3314f20d
[b506019edf61:01753] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x12980)[0x7f0c35bf4980]
[b506019edf61:01753] [ 1] /lib/x86_64-linux-gnu/libc.so.6(getenv+0xa5)[0x7f0c35833775]
[b506019edf61:01753] [ 2] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(_ZN13TCMallocGuardD1Ev+0x34)[0x7f0c3609ee44]
[b506019edf61:01753] [ 3] /lib/x86_64-linux-gnu/libc.so.6(__cxa_finalize+0xf5)[0x7f0c35834605]
[b506019edf61:01753] [ 4] /usr/lib/x86_64-linux-gnu/libtcmalloc.so.4(+0x13cb3)[0x7f0c3609ccb3]
[b506019edf61:01753] *** End of error message ***

Steps followed:
!pip install musdb norbert pydub
!pip install nnabla
!pip install nnabla-ext-cuda110-nccl2-mpi3-1-6
!pip uninstall urllib3 -y
!pip uninstall folium -y
!pip install folium==0.2.1
!pip install urllib3==1.25.*

!git clone https://github.com/sony/ai-research-code.git
%cd ai-research-code/x-umx
!mkdir models
!wget -P models https://nnabla.org/pretrained-models/ai-research-code/x-umx/x-umx.h5

!python train.py --root /content/drive/MyDrive/dataset --output /content/drive/MyDrive/crossnet/ --is-wav --epochs 10 --lr 0.001
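
The message at the end of the traceback says that `pad_mode` must be "constant" when ISTFT is used normally. A minimal sketch of a wrapper that forces this, written against the argument names visible in the traceback (`window_size`, `stride`, `fft_size`, `window_type`, `center`, `pad_mode`). The wrapper name is hypothetical, `istft_fn` stands for `nnabla.functions.istft`, the sizes in the demo call are made up, and whether this is the right change to model.py for the installed nnabla version is an assumption.

```python
def istft_with_constant_pad(istft_fn, y_r, y_i, window_size, stride, fft_size,
                            window_type="hanning", center=True):
    """Call an ISTFT function with pad_mode fixed to "constant"."""
    return istft_fn(y_r, y_i, window_size, stride, fft_size,
                    window_type=window_type, center=center,
                    pad_mode="constant")

# Stand-in used only to show what gets passed; real code would pass F.istft
def fake_istft(y_r, y_i, window_size, stride, fft_size, **kwargs):
    return kwargs

passed = istft_with_constant_pad(fake_istft, None, None, 4096, 1024, 4096)
print(passed)
```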

D3Net inference accepts only wav audio as input

Currently, separate.py does not accept audio formats other than WAV. It also doesn't give a proper error message when it is given a non-WAV file.
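
A clearer error for non-WAV input could be produced by a small check before loading. A minimal sketch, assuming the check is done on the file extension only; `require_wav` is a hypothetical helper and not part of separate.py:

```python
from pathlib import Path

def require_wav(path: str) -> Path:
    """Return the path if it has a .wav extension, else raise a clear error."""
    p = Path(path)
    if p.suffix.lower() != ".wav":
        raise ValueError(
            f"Unsupported input '{p.name}': only .wav files are accepted. "
            "Convert the file to WAV first."
        )
    return p

print(require_wav("song.wav"))
try:
    require_wav("song.mp3")
except ValueError as err:
    print(err)
```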

[NVC-NET]Inference in CPU environment

In the paper you measure the speed of NVC-Net on CPU. Could you please share the code for inference in a CPU environment?

Chinese supported?

Thank you for open-sourcing the code. I'd like to know whether Mandarin Chinese is supported in this project.

D3Net: training code

When will the training code for music source separation be released?

【NVC-Net】How many epochs does the model need to converge?

e.g. for the VCTK dataset

Besides, have you tested whether the model is robust to noisy source files (e.g. recorded on a mobile phone, with background air-conditioning noise or heavy breathing, which is quite common in real-life applications) at inference time?

Thank you very much

MobileNet implementation for Mixed Precision DNNs

Hello,
Thanks for the great work.
I'm wondering if there's an implementation of MobileNet for the Mixed Precision DNNs paper.

All the best,
Mohammed

【NVC-net】Failed `it != items_.end()`: Any of [cudnn:float, cuda:float, cpu:float] could not be found in []

Hi. I tried to train NVC-net, but the following error occurs:

2021-11-26 03:23:06,638 [nnabla][INFO]: Initializing CPU extension...
2021-11-26 03:23:06,997 [nnabla][INFO]: Initializing CUDA extension...
2021-11-26 03:23:09,117 [nnabla][INFO]: Initializing cuDNN extension...
value error in query
/home/gitlab-runner/builds/zxvvzZDJ/1/nnabla/builders/all/nnabla/include/nbla/function_registry.hpp:69
Failed  it != items_.end() : Any of [cudnn:float, cuda:float, cpu:float] could not be found in []

No communicator found. Running with a single process. If you run this with MPI processes, all processes will perform totally same.
2021-11-26 03:23:09,406 [nnabla][INFO]: Training data with 103 speakers.
2021-11-26 03:23:09,407 [nnabla][INFO]: DataSource with shuffle(True)
2021-11-26 03:23:09,464 [nnabla][INFO]: Using DataIterator
Running epoch=1 lr=0.00010
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate again.
Error during forward propagation:

Environment: Tesla T4, Cuda 10.2, Cudnn 8.1, Ubuntu 18.04.4 LTS.

I installed nnabla with pip install nnabla-ext-cuda102.
Besides, if I want to train the model with only one GPU, is python3 main.py the right command?

NVCnet g_loss_con=0.0000 while training

I ran into two problems.
The first is the same as the one described in https://github.com/sony/ai-research-code/issues/54.
I tried the Docker environment suggested in that issue:

`docker pull nnabla/nnabla-ext-cuda-multi-gpu:py37-cuda110-mpi3.1.6-v1.29.0
docker run --rm -it -u $(id -u):$(id -g) --gpus all nnabla/nnabla-ext-cuda-multi-gpu:py37-cuda110-mpi3.1.6-v1.29.0

mpirun -n 2 python3 -c "import nnabla_ext.cudnn; from nnabla.ext_utils import get_extension_context; import nnabla.communicators as C; ctx = get_extension_context('cudnn', device_id='0'); C.MultiProcessDataParallelCommunicator(ctx)"

That went well. But when I then ran main.py with the batch size set to 8, I got the following error:

wzy@2f0a2b4b4485:~/NVCnet$ mpirun -n 1 python main.py -c cudnn -d 7 --output_path log/baseline-wzy/ --batch_size 8
2022-11-14 04:17:10,938 [nnabla][INFO]: Initializing CPU extension...
2022-11-14 04:17:11,300 [nnabla][INFO]: Initializing CUDA extension...
2022-11-14 04:17:20,009 [nnabla][INFO]: Initializing cuDNN extension...
2022-11-14 04:17:20,359 [nnabla][INFO]: Training data with 103 speakers.
2022-11-14 04:17:20,360 [nnabla][INFO]: DataSource with shuffle(True)
2022-11-14 04:17:20,371 [nnabla][INFO]: Using DataIterator
Running epoch=1 lr=0.00010
[ 0/4689] d_loss 4.1589 (4.1589) g_loss_avd 2.0793 (2.0793) g_loss_con 0.0000 (0.0000) g_loss_rec 58.3829 (58.3829) g_loss_kld 0.0000 (0.0000)
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate. Freeing memory cache and retrying.
Failed to allocate again.
Error during backward propagation:
Add2CudaCudnn
Add2CudaCudnn
Add2CudaCudnn
MulScalarCuda
MeanCudaCudnn
SquaredErrorCuda
Div2Cuda
PowScalarCuda
SumCuda
AddScalarCuda
PowScalarCuda
ConvolutionCudaCudnn
PadCuda
GELUCuda
ConvolutionCudaCudnn
PadCuda
GELUCuda
ConvolutionCudaCudnn
GELUCuda
Add2CudaCudnn
ConvolutionCudaCudnn
Mul2Cuda
TanhCudaCudnn
SigmoidCudaCudnn
SliceCuda <-- ERROR
Traceback (most recent call last):
File "main.py", line 100, in <module>
run(args)
File "main.py", line 70, in run
Trainer(gen, gen_optim, dis, dis_optim, dataloader, rng, hp).run()
File "/home/wzy/NVCnet/train.py", line 156, in run
self.train_on_batch(i)
File "/home/wzy/NVCnet/train.py", line 196, in train_on_batch
p['g_loss'].backward(clear_buffer=True)
File "_variable.pyx", line 827, in nnabla._variable.Variable.backward
RuntimeError: memory error in alloc
/home/gitlab-runner/builds/LRsSYq-B/0/nnabla/builders/all/nnabla/src/nbla/memory/memory.cpp:39
Failed this->alloc_impl(): N4nbla10CudaMemoryE allocation failed.

--------------------------------------------------------------------------`

When I changed the batch size to 7, 6, or 2, g_loss_con was 0 all the time.

This is the result of nvidia-smi

Please enlighten me.

Potential bug in xumx

I'm trying to run xumx (through https://github.com/JeffreyCA/spleeterweb-xumx, but it seems to be the same code).

In this line:

ai-research-code/x-umx/test.py

Line 79 in 4c02f0a

audio_nn = nn.Variable.from_numpy_array(audio.T[None, ...])

The function documentation says audio should be in the shape:

    audio: np.ndarray [shape=(nb_samples, nb_channels, nb_timesteps)]
        mixture audio

However, the ndarray is converted to a nnabla object with a transpose operator and a new dimension added:

    audio_nn = nn.Variable.from_numpy_array(audio.T[None, ...])

So when I pass audio like so:

x.shape: (1, 2, 9265664)

the audio_nn line results in this:

audio_nn: (1, 9265664, 2, 1)

Later on this fails in the STFT step from the __call__ function of the model:

ai-research-code/x-umx/model.py

Line 29 in 4c02f0a

nb_samples, nb_channels, _ = x.shape

It's better to have:

    audio_nn = nn.Variable.from_numpy_array(audio)
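
The shapes reported above can be reproduced with NumPy alone. The sketch uses a short array in place of the 9265664-sample signal and also shows what the same indexing does to a two-dimensional (nb_timesteps, nb_channels) array; which of the two layouts test.py actually receives is not verified here.

```python
import numpy as np

n = 8  # short stand-in for the 9265664 samples in the report

# Input shaped (nb_samples, nb_channels, nb_timesteps), as in the report
batched = np.zeros((1, 2, n))
print(batched.T[None, ...].shape)   # (1, 8, 2, 1)

# Input shaped (nb_timesteps, nb_channels)
raw = np.zeros((n, 2))
print(raw.T[None, ...].shape)       # (1, 2, 8)
```

The transpose line therefore produces the documented three-dimensional layout only when it is given a two-dimensional (nb_timesteps, nb_channels) array.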

No pretrained NVC model

Where could I find pretrained NVC model?
Thank you in advance.

【NVC-Net】Multi-GPU training multiple models?

When training on multiple GPUs (2) it appears as though it's training 2 models at the same time, is this supposed to be the case?

This gives the errors below:

C:\Users\Vinicius111\Downloads\xumx-master\xumx-master\xumx>python test.py --input inputs/"C:\Users\Vinicius111\Music\Test.mp3"python test.py ----model model/C:\Users\Vinicius111\Downloads\x-umx.h5 --outdir outputs/
2021-01-30 12:04:32,295 [nnabla][INFO]: Initializing CPU extension...
Traceback (most recent call last):
File "test.py", line 28, in <module>
from .args import get_inference_args
ModuleNotFoundError: No module named '__main__.args'; '__main__' is not a package
