
Comments (19)

EmreOzkose commented on September 22, 2024

Reporting back here: I updated k2 and ran decode.py again. The problem no longer occurs, thank you. However, the hyps now come out empty :). From here on, it is a problem with my own design :).

from icefall.

csukuangfj commented on September 22, 2024

I would recommend updating k2.

k2 v1.6 contains several bug fixes, including, I think, the one you are facing.
As you are using conda, the steps to update k2 are fairly simple. Please see
https://k2.readthedocs.io/en/latest/installation/conda.html
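Before and after upgrading, it can help to confirm which k2 version Python actually picks up. The sketch below is a minimal, hypothetical helper (installed_k2_version and k2_is_at_least are not part of k2); it assumes the package is registered under the distribution name "k2". A conda install may not expose pip-style metadata, in which case python3 -m k2.version remains the reliable check. The 1.6 threshold comes from the comment above.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Keep only the leading numeric components: "1.3.dev20210803" -> (1, 3)."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):  # e.g. "3rc1": keep 3, ignore the suffix
            break
    return tuple(parts)


def installed_k2_version() -> Optional[str]:
    """Version from package metadata, or None if no "k2" distribution is found (assumption)."""
    try:
        return metadata.version("k2")
    except metadata.PackageNotFoundError:
        return None


def k2_is_at_least(required: str, installed: Optional[str]) -> bool:
    """True if the installed version is >= the required one; False if unknown."""
    if installed is None:
        return False
    return parse_version(installed) >= parse_version(required)
```

For example, k2_is_at_least("1.6", installed_k2_version()) returns False for the 1.3 build reported later in this thread.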

EmreOzkose commented on September 22, 2024

Note that it could also be a memory issue, because my GPU memory is small (16 GB). However, if it were a memory issue, I would expect to see an error like:

RuntimeError: CUDA out of memory. Tried to allocate 420.00 MiB (GPU 0; 15.90 GiB total capacity; 3.23 GiB already allocated; 168.75 MiB free; 3.56 GiB reserved in total by PyTorch)
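The two failure modes in this thread can be told apart from the message text alone. This is a heuristic sketch, not part of k2 or PyTorch: classify_cuda_error and oom_request_and_free_mib are hypothetical helpers that only match on the substrings quoted in this thread, and the OOM parser pulls out the requested and free amounts so they can be compared.

```python
import re
from typing import Optional, Tuple

_UNIT_TO_MIB = {"MiB": 1.0, "GiB": 1024.0}


def classify_cuda_error(message: str) -> str:
    """Roughly classify a RuntimeError message by substring (heuristic)."""
    if "CUDA out of memory" in message:
        return "out_of_memory"
    if "does not match device of data" in message:
        return "device_mismatch"
    return "other"


def oom_request_and_free_mib(message: str) -> Optional[Tuple[float, float]]:
    """Return (requested MiB, free MiB) from a PyTorch OOM message, or None."""
    requested = re.search(r"Tried to allocate ([\d.]+) (MiB|GiB)", message)
    free = re.search(r"([\d.]+) (MiB|GiB) free", message)
    if requested is None or free is None:
        return None
    return (
        float(requested.group(1)) * _UNIT_TO_MIB[requested.group(2)],
        float(free.group(1)) * _UNIT_TO_MIB[free.group(2)],
    )
```

Applied to the message above, the parser gives 420 MiB requested against 168.75 MiB free, which is why that allocation fails; the cuda:-2 error carries no such numbers.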

danpovey commented on September 22, 2024

Perhaps it's trying to use >1 GPU somehow? (But it shouldn't.) If that's the case, setting something like CUDA_VISIBLE_DEVICES=0 (or whatever) should address it.

Another possibility is that cuda:-2 is not a real device but some kind of error code. That error message likely comes from torch.

I think it would be worthwhile to try to catch the error in pdb and print out the devices of all inputs to the function that failed. Once we know which object has the bad device, we can more easily debug.
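Printing the devices of all inputs to a failing function can also be automated with a small wrapper instead of stepping through pdb by hand. This is a hypothetical sketch (report_devices_on_error and describe_devices are not icefall or k2 functions); it only assumes that the objects of interest expose a .device attribute, as torch tensors and k2 FSAs such as lattice do.

```python
import functools
from typing import Any, Callable, List


def describe_devices(*args: Any, **kwargs: Any) -> List[str]:
    """One line per argument: its name, its .device (None if absent) and its type."""
    named = [(f"arg{i}", a) for i, a in enumerate(args)] + list(kwargs.items())
    return [
        f"{name}: device={getattr(value, 'device', None)} type={type(value).__name__}"
        for name, value in named
    ]


def report_devices_on_error(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap fn so that, if it raises, the devices of all its inputs are printed first."""

    @functools.wraps(fn)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return fn(*args, **kwargs)
        except Exception:
            for line in describe_devices(*args, **kwargs):
                print(line)
            raise  # re-raise so the original traceback is preserved

    return wrapper
```

For instance, report_devices_on_error(nbest_decoding)(lattice=lattice, num_paths=params.num_paths, use_double_scores=params.use_double_scores) prints the device of lattice before the error propagates. Wrapping the innermost call that fails gives the most specific information.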

csukuangfj commented on September 22, 2024
 File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py", line 66, in forward
    return _k2.index_select(src, index, default_value)

Could you modify /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py at line 66 as follows:

print(src.device, index.device)
return _k2.index_select(src, index, default_value)

It may show something useful.

EmreOzkose commented on September 22, 2024

@csukuangfj I had already printed the devices before, and all of them were cuda:0.

EmreOzkose commented on September 22, 2024

@danpovey I have 4 GPUs, but I set CUDA_VISIBLE_DEVICES=0 before training. I will also try to debug with pdb.
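CUDA_VISIBLE_DEVICES has to be present in the environment of the decoding process as well, not only during training, and it only takes effect if it is set before CUDA is initialized in that process. A minimal sketch that sets it from inside the script; the default value "0" is an assumption, and setdefault keeps any value already exported in the shell.

```python
import os

# Set before CUDA is initialized (safest: before importing torch or k2);
# changing it later has no effect on the running process.
# setdefault leaves an existing value from the shell untouched.
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")

# import torch  # import CUDA-using libraries only after this point
```

Exporting the variable on the command line (CUDA_VISIBLE_DEVICES=0 python tdnn_lstm_ctc/decode.py ...) achieves the same without editing the script.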

EmreOzkose commented on September 22, 2024

I added a try/except block to decode_one_batch() in decode.py:

try:
    best_path = nbest_decoding(
        lattice=lattice,
        num_paths=params.num_paths,
        use_double_scores=params.use_double_scores,
    )
except Exception:
    # Drop into the debugger while lattice and params are still in scope.
    breakpoint()

When I run python -m pdb tdnn_lstm_ctc/decode.py --avg 1 --epoch 8, I get:

(k2) yunusemre.ozkose@boxx-3:/path/to/k2/icefall/egs/from_wav_scp/ASR$ python -m pdb tdnn_lstm_ctc/decode.py --avg 1 --epoch 8
> /path/to/k2/icefall/egs/sestek/ASR/tdnn_lstm_ctc/decode.py(3)<module>()
-> import os
(Pdb) c
2021-09-02 15:43:01,990 INFO [decode.py:330] Decoding started
2021-09-02 15:43:01,990 INFO [decode.py:331] {'exp_dir': PosixPath('tdnn_lstm_ctc/exp9_w2v2'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 1024, 'subsampling_factor': 3, 'search_beam': 20, 'output_beam': 5, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'method': 'nbest', 'num_paths': 30, 'max_frames': 1000, 'epoch': 8, 'avg': 1, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 500.0, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': True, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'full_libri': False}
2021-09-02 15:43:02,604 INFO [lexicon.py:96] Loading pre-compiled data/lang_phone/Linv.pt
2021-09-02 15:43:02,963 INFO [decode.py:340] device: cuda:0
2021-09-02 15:43:09,784 INFO [checkpoint.py:75] Loading checkpoint from tdnn_lstm_ctc/exp9_w2v2/epoch-8.pt
/path/to/k2/lhotse/lhotse/dataset/sampling/single_cut.py:170: UserWarning: The first cut drawn in batch collection violates the max_frames or max_cuts constraints - we'll return it anyway. Consider increasing max_frames/max_cuts.
  warnings.warn(
2021-09-02 15:43:11,389 INFO [decode.py:277] batch 0, cuts processed until now is 1/171 (0.584795%)
> /path/to/k2/icefall/egs/sestek/ASR/tdnn_lstm_ctc/decode.py(185)decode_one_batch()
-> key = f"no_rescore-{params.num_paths}"
(Pdb) lattice.device
device(type='cuda', index=0)
(Pdb) 

The problem occurs in nbest_decoding(). The only object with a device that is passed to that function is lattice, and it is on cuda:0.

danpovey commented on September 22, 2024

I think you are not quite at the place where it failed; maybe you need to do "c" (continue)?

EmreOzkose commented on September 22, 2024

Without the try/except block, the log is:

(k2) yunusemre.ozkose@boxx-3:/path/to/k2/icefall/egs/from_wav_scp/ASR$ python -m pdb tdnn_lstm_ctc/decode.py --avg 1 --epoch 8
> /path/to/k2/icefall/egs/sestek/ASR/tdnn_lstm_ctc/decode.py(3)<module>()
-> import os
(Pdb) c
2021-09-02 16:33:33,700 INFO [decode.py:327] Decoding started
2021-09-02 16:33:33,701 INFO [decode.py:328] {'exp_dir': PosixPath('tdnn_lstm_ctc/exp9_w2v2'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 1024, 'subsampling_factor': 3, 'search_beam': 20, 'output_beam': 5, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'method': 'nbest', 'num_paths': 30, 'max_frames': 1000, 'epoch': 8, 'avg': 1, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 500.0, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': True, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'full_libri': False}
2021-09-02 16:33:34,178 INFO [lexicon.py:96] Loading pre-compiled data/lang_phone/Linv.pt
2021-09-02 16:33:34,494 INFO [decode.py:337] device: cuda:0
2021-09-02 16:33:45,349 INFO [checkpoint.py:75] Loading checkpoint from tdnn_lstm_ctc/exp9_w2v2/epoch-8.pt
/path/to/k2/lhotse/lhotse/dataset/sampling/single_cut.py:170: UserWarning: The first cut drawn in batch collection violates the max_frames or max_cuts constraints - we'll return it anyway. Consider increasing max_frames/max_cuts.
  warnings.warn(
2021-09-02 16:33:47,481 INFO [decode.py:274] batch 0, cuts processed until now is 1/171 (0.584795%)
Traceback (most recent call last):
  File "/path/to/miniconda3/envs/k2/lib/python3.8/pdb.py", line 1705, in main
    pdb._runscript(mainpyfile)
  File "/path/to/miniconda3/envs/k2/lib/python3.8/pdb.py", line 1573, in _runscript
    self.run(statement)
  File "/path/to/miniconda3/envs/k2/lib/python3.8/bdb.py", line 580, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/path/to/k2/icefall/egs/sestek/ASR/tdnn_lstm_ctc/decode.py", line 3, in <module>
    import os
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/path/to/k2/icefall/egs/sestek/ASR/tdnn_lstm_ctc/decode.py", line 418, in main
    results_dict = decode_dataset(
  File "/path/to/k2/icefall/egs/sestek/ASR/tdnn_lstm_ctc/decode.py", line 253, in decode_dataset
    hyps_dict = decode_one_batch(
  File "/path/to/k2/icefall/egs/sestek/ASR/tdnn_lstm_ctc/decode.py", line 176, in decode_one_batch
    best_path = nbest_decoding(
  File "/path/to/k2/icefall/icefall/decode.py", line 208, in nbest_decoding
    path_lattice = _intersect_device(
  File "/path/to/k2/icefall/icefall/decode.py", line 25, in _intersect_device
    return k2.intersect_device(
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/fsa_algo.py", line 204, in intersect_device
    out_fsas = k2.utils.fsa_from_binary_function_tensor(a_fsas, b_fsas,
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/utils.py", line 581, in fsa_from_binary_function_tensor
    value = index_select(a_value, a_arc_map, default_value=filler) \
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py", line 160, in index_select
    ans = _IndexSelectFunction.apply(src, index, default_value)
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py", line 66, in forward
    return _k2.index_select(src, index, default_value)
RuntimeError: Specified device cuda:0 does not match device of data cuda:-2
Exception raised from from_blob at /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/include/ATen/Functions.h:2267 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f359b5e32f2 in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f359b5e067b in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x28200 (0x7f34fa699200 in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x10e0a1 (0x7f34fa77f0a1 in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x84bce (0x7f34fa6f5bce in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x8858f (0x7f34fa6f958f in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #6: <unknown function> + 0x9f876 (0x7f34fa710876 in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x1dfcf (0x7f34fa68efcf in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
<omitting python frames>
frame #13: THPFunction_apply(_object*, _object*) + 0x8fd (0x7f35f253a41d in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libtorch_python.so)

Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py(66)forward()
-> return _k2.index_select(src, index, default_value)
(Pdb) lattice.device
*** NameError: name 'lattice' is not defined
(Pdb) 

I can't access lattice after the error (the post-mortem session stops inside k2/ops.py), which is why I added the try/except block.
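In pdb's post-mortem mode, the up command (u) moves to the calling frame, so repeating it until decode_one_batch is reached should make lattice visible without a try/except. The same lookup can be done programmatically by walking the exception's traceback. find_local is a hypothetical helper, sketched with the standard library only.

```python
from types import TracebackType
from typing import Any, Optional

_MISSING = object()


def find_local(tb: Optional[TracebackType], func_name: str, var_name: str) -> Any:
    """Return var_name from the innermost frame named func_name in a traceback.

    A traceback chain runs from the frame that caught the exception down to the
    frame that raised it, so the last match found is the innermost one.
    """
    found: Any = _MISSING
    while tb is not None:
        frame = tb.tb_frame
        if frame.f_code.co_name == func_name and var_name in frame.f_locals:
            found = frame.f_locals[var_name]
        tb = tb.tb_next
    if found is _MISSING:
        raise KeyError(f"{var_name!r} not found in any {func_name}() frame")
    return found
```

For example, catching the exception around main() and calling find_local(exc.__traceback__, "decode_one_batch", "lattice") in the handler returns the lattice object whose .device can then be printed.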

EmreOzkose commented on September 22, 2024

I added a breakpoint at the place @csukuangfj suggested. Here is the log:

(Pdb) c
> /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py(67)forward()
-> return _k2.index_select(src, index, default_value)
(Pdb) src.device; index.device; default_value;
device(type='cuda', index=0)
device(type='cuda', index=0)
0.0
(Pdb) c
> /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py(67)forward()
-> return _k2.index_select(src, index, default_value)
(Pdb) src.device; index.device; default_value;
device(type='cuda', index=0)
device(type='cuda', index=0)
0.0
(Pdb) c
> /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py(67)forward()
-> return _k2.index_select(src, index, default_value)
(Pdb) src.device; index.device; default_value;
device(type='cuda', index=0)
device(type='cuda', index=0)
0.0
(Pdb) c
> /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py(67)forward()
-> return _k2.index_select(src, index, default_value)
(Pdb) src.device; index.device; default_value;
device(type='cuda', index=0)
device(type='cuda', index=0)
0.0
(Pdb) c
> /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py(67)forward()
-> return _k2.index_select(src, index, default_value)
(Pdb) src.device; index.device; default_value;
device(type='cuda', index=0)
device(type='cuda', index=0)
0.0
(Pdb) c
Traceback (most recent call last):
  File "/path/to/miniconda3/envs/k2/lib/python3.8/pdb.py", line 1705, in main
    pdb._runscript(mainpyfile)
  File "/path/to/miniconda3/envs/k2/lib/python3.8/pdb.py", line 1573, in _runscript
    self.run(statement)
  File "/path/to/miniconda3/envs/k2/lib/python3.8/bdb.py", line 580, in run
    exec(cmd, globals, locals)
  File "<string>", line 1, in <module>
  File "/path/to/k2/icefall/egs/sestek/ASR/tdnn_lstm_ctc/decode.py", line 435, in <module>
    main()
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/path/to/k2/icefall/egs/sestek/ASR/tdnn_lstm_ctc/decode.py", line 418, in main
    results_dict = decode_dataset(
  File "/path/to/k2/icefall/egs/sestek/ASR/tdnn_lstm_ctc/decode.py", line 253, in decode_dataset
    hyps_dict = decode_one_batch(
  File "/path/to/k2/icefall/egs/sestek/ASR/tdnn_lstm_ctc/decode.py", line 176, in decode_one_batch
    best_path = nbest_decoding(
  File "/path/to/k2/icefall/icefall/decode.py", line 208, in nbest_decoding
    path_lattice = _intersect_device(
  File "/path/to/k2/icefall/icefall/decode.py", line 25, in _intersect_device
    return k2.intersect_device(
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/fsa_algo.py", line 204, in intersect_device
    out_fsas = k2.utils.fsa_from_binary_function_tensor(a_fsas, b_fsas,
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/utils.py", line 581, in fsa_from_binary_function_tensor
    value = index_select(a_value, a_arc_map, default_value=filler) \
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py", line 161, in index_select
    ans = _IndexSelectFunction.apply(src, index, default_value)
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py", line 67, in forward
    return _k2.index_select(src, index, default_value)
RuntimeError: Specified device cuda:0 does not match device of data cuda:-2
Exception raised from from_blob at /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/include/ATen/Functions.h:2267 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7fe9a54c82f2 in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7fe9a54c567b in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x28200 (0x7fe904576200 in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x10e0a1 (0x7fe90465c0a1 in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x84bce (0x7fe9045d2bce in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x8858f (0x7fe9045d658f in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #6: <unknown function> + 0x9f876 (0x7fe9045ed876 in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x1dfcf (0x7fe90456bfcf in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
<omitting python frames>
frame #13: THPFunction_apply(_object*, _object*) + 0x8fd (0x7fe9fc41f41d in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libtorch_python.so)

Uncaught exception. Entering post mortem debugging
Running 'cont' or 'step' will restart the program
> /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py(67)forward()
-> return _k2.index_select(src, index, default_value)
(Pdb) 

The breakpoint I added in miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py:

65: breakpoint()
66: return _k2.index_select(src, index, default_value)

danpovey commented on September 22, 2024

danpovey commented on September 22, 2024

csukuangfj commented on September 22, 2024

https://k2.readthedocs.io/en/latest/installation/for_developers.html

The above link contains instructions to build a debug version of k2.

csukuangfj commented on September 22, 2024

> I added breakpoint to place where @csukuangfj said. Log is here:

Could you also print the shapes of src and index

print(src.shape)
print(index.shape)

to verify that neither of them is empty?
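The same check can be made explicit, so that an empty input raises a clear error naming the culprit before the underlying call runs. This is a hypothetical guard (assert_non_empty is not a k2 function) that could sit next to the print statements in k2/ops.py; it relies on torch.Tensor.numel() and falls back to len() for plain sequences.

```python
from typing import Any


def numel(x: Any) -> int:
    """Number of elements: .numel() for tensors, len() for other sequences."""
    return x.numel() if hasattr(x, "numel") else len(x)


def assert_non_empty(**named: Any) -> None:
    """Raise ValueError listing every keyword argument that has zero elements."""
    empty = [name for name, value in named.items() if numel(value) == 0]
    if empty:
        raise ValueError(f"empty input(s): {', '.join(empty)}")
```

Placed before the call in forward(), assert_non_empty(src=src, index=index) would either pass silently or report which of the two tensors has no elements.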

EmreOzkose commented on September 22, 2024

I checked if index or src is empty, and noticed that index is empty when the problem occurs.

(k2) yunusemre.ozkose@boxx-3:/path/to/k2/icefall/egs/from_wav_scp/ASR$ python tdnn_lstm_ctc/decode.py --avg 1 --epoch 8
2021-09-03 08:14:46,220 INFO [decode.py:327] Decoding started
2021-09-03 08:14:46,220 INFO [decode.py:328] {'exp_dir': PosixPath('tdnn_lstm_ctc/exp9_w2v2'), 'lang_dir': PosixPath('data/lang_phone'), 'lm_dir': PosixPath('data/lm'), 'feature_dim': 1024, 'subsampling_factor': 3, 'search_beam': 20, 'output_beam': 5, 'min_active_states': 30, 'max_active_states': 10000, 'use_double_scores': True, 'method': 'nbest', 'num_paths': 30, 'max_frames': 1000, 'epoch': 8, 'avg': 1, 'feature_dir': PosixPath('data/fbank'), 'max_duration': 500.0, 'bucketing_sampler': False, 'num_buckets': 30, 'concatenate_cuts': True, 'duration_factor': 1.0, 'gap': 1.0, 'on_the_fly_feats': False, 'full_libri': False}
2021-09-03 08:14:46,837 INFO [lexicon.py:96] Loading pre-compiled data/lang_phone/Linv.pt
2021-09-03 08:14:47,150 INFO [decode.py:337] device: cuda:0
2021-09-03 08:14:55,636 INFO [checkpoint.py:75] Loading checkpoint from tdnn_lstm_ctc/exp9_w2v2/epoch-8.pt
/path/to/k2/lhotse/lhotse/dataset/sampling/single_cut.py:170: UserWarning: The first cut drawn in batch collection violates the max_frames or max_cuts constraints - we'll return it anyway. Consider increasing max_frames/max_cuts.
  warnings.warn(
cuda:0 cuda:0
torch.Size([36466453]) torch.Size([729618])
cuda:0 cuda:0
torch.Size([36466453]) torch.Size([729618])
cuda:0 cuda:0
torch.Size([729618]) torch.Size([729618])
cuda:0 cuda:0
torch.Size([729618]) torch.Size([729618])
cuda:0 cuda:0
torch.Size([729618]) torch.Size([729618])
cuda:0 cuda:0
torch.Size([729618]) torch.Size([729618])
cuda:0 cuda:0
torch.Size([729618]) torch.Size([729618])
cuda:0 cuda:0
torch.Size([729618]) torch.Size([15309908])
cuda:0 cuda:0
torch.Size([562]) torch.Size([15309908])
cuda:0 cuda:0
torch.Size([729618]) torch.Size([15309908])
cuda:0 cuda:0
torch.Size([729618]) torch.Size([15309908])
cuda:0 cuda:0
torch.Size([729618]) torch.Size([15309908])
cuda:0 cuda:0
torch.Size([15309908]) torch.Size([106588])
cuda:0 cuda:0
torch.Size([15309908]) torch.Size([106588])
cuda:0 cuda:0
torch.Size([15309908]) torch.Size([106588])
cuda:0 cuda:0
torch.Size([106588]) torch.Size([106588])
cuda:0 cuda:0
torch.Size([106588]) torch.Size([106588])
cuda:0 cuda:0
torch.Size([106588]) torch.Size([106588])
cuda:0 cuda:0
torch.Size([30]) torch.Size([1])
2021-09-03 08:14:57,654 INFO [decode.py:274] batch 0, cuts processed until now is 1/171 (0.584795%)
cuda:0 cuda:0
torch.Size([36466453]) torch.Size([1375261])
cuda:0 cuda:0
torch.Size([36466453]) torch.Size([1375261])
cuda:0 cuda:0
torch.Size([1375261]) torch.Size([1375261])
cuda:0 cuda:0
torch.Size([1375261]) torch.Size([1375261])
cuda:0 cuda:0
torch.Size([1375261]) torch.Size([1375261])
cuda:0 cuda:0
torch.Size([1375261]) torch.Size([1375261])
cuda:0 cuda:0
torch.Size([1375261]) torch.Size([1375261])
cuda:0 cuda:0
torch.Size([1375261]) torch.Size([36303965])
cuda:0 cuda:0
torch.Size([2322]) torch.Size([36303965])
cuda:0 cuda:0
torch.Size([1375261]) torch.Size([36303965])
cuda:0 cuda:0
torch.Size([1375261]) torch.Size([36303965])
cuda:0 cuda:0
torch.Size([1375261]) torch.Size([36303965])
cuda:0 cuda:0
torch.Size([36303965]) torch.Size([178240])
cuda:0 cuda:0
torch.Size([36303965]) torch.Size([178240])
cuda:0 cuda:0
torch.Size([36303965]) torch.Size([178240])
cuda:0 cuda:0
torch.Size([178240]) torch.Size([178240])
cuda:0 cuda:0
torch.Size([178240]) torch.Size([178240])
cuda:0 cuda:0
torch.Size([178240]) torch.Size([178240])
cuda:0 cuda:0
torch.Size([30]) torch.Size([1])
cuda:0 cuda:0
torch.Size([36466453]) torch.Size([749184])
cuda:0 cuda:0
torch.Size([36466453]) torch.Size([749184])
cuda:0 cuda:0
torch.Size([749184]) torch.Size([749184])
cuda:0 cuda:0
torch.Size([749184]) torch.Size([749184])
cuda:0 cuda:0
torch.Size([749184]) torch.Size([749184])
cuda:0 cuda:0
torch.Size([749184]) torch.Size([749184])
cuda:0 cuda:0
torch.Size([749184]) torch.Size([749184])
cuda:0 cuda:0
torch.Size([749184]) torch.Size([21094213])
cuda:0 cuda:0
torch.Size([1308]) torch.Size([21094213])
cuda:0 cuda:0
torch.Size([749184]) torch.Size([21094213])
cuda:0 cuda:0
torch.Size([749184]) torch.Size([21094213])
cuda:0 cuda:0
torch.Size([749184]) torch.Size([21094213])
cuda:0 cuda:0
torch.Size([21094213]) torch.Size([101191])
cuda:0 cuda:0
torch.Size([21094213]) torch.Size([101191])
cuda:0 cuda:0
torch.Size([21094213]) torch.Size([101191])
cuda:0 cuda:0
torch.Size([101191]) torch.Size([101191])
cuda:0 cuda:0
torch.Size([101191]) torch.Size([101191])
cuda:0 cuda:0
torch.Size([101191]) torch.Size([101191])
cuda:0 cuda:0
torch.Size([30]) torch.Size([1])
cuda:0 cuda:0
torch.Size([36466453]) torch.Size([183094])
cuda:0 cuda:0
torch.Size([36466453]) torch.Size([183094])
cuda:0 cuda:0
torch.Size([183094]) torch.Size([183094])
cuda:0 cuda:0
torch.Size([183094]) torch.Size([183094])
cuda:0 cuda:0
torch.Size([183094]) torch.Size([183094])
cuda:0 cuda:0
torch.Size([183094]) torch.Size([183094])
cuda:0 cuda:0
torch.Size([183094]) torch.Size([183094])
cuda:0 cuda:0
torch.Size([183094]) torch.Size([0])
Traceback (most recent call last):
  File "tdnn_lstm_ctc/decode.py", line 435, in <module>
    main()
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "tdnn_lstm_ctc/decode.py", line 418, in main
    results_dict = decode_dataset(
  File "tdnn_lstm_ctc/decode.py", line 253, in decode_dataset
    hyps_dict = decode_one_batch(
  File "tdnn_lstm_ctc/decode.py", line 176, in decode_one_batch
    best_path = nbest_decoding(
  File "/path/to/k2/icefall/icefall/decode.py", line 208, in nbest_decoding
    path_lattice = _intersect_device(
  File "/path/to/k2/icefall/icefall/decode.py", line 25, in _intersect_device
    return k2.intersect_device(
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/fsa_algo.py", line 204, in intersect_device
    out_fsas = k2.utils.fsa_from_binary_function_tensor(a_fsas, b_fsas,
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/utils.py", line 581, in fsa_from_binary_function_tensor
    value = index_select(a_value, a_arc_map, default_value=filler) \
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py", line 163, in index_select
    ans = _IndexSelectFunction.apply(src, index, default_value)
  File "/path/to/miniconda3/envs/k2/lib/python3.8/site-packages/k2/ops.py", line 69, in forward
    return _k2.index_select(src, index, default_value)
RuntimeError: Specified device cuda:0 does not match device of data cuda:-2
Exception raised from from_blob at /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/include/ATen/Functions.h:2267 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x42 (0x7f42803f82f2 in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x5b (0x7f42803f567b in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libc10.so)
frame #2: <unknown function> + 0x28200 (0x7f41df4f8200 in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #3: <unknown function> + 0x10e0a1 (0x7f41df5de0a1 in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #4: <unknown function> + 0x84bce (0x7f41df554bce in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #5: <unknown function> + 0x8858f (0x7f41df55858f in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #6: <unknown function> + 0x9f876 (0x7f41df56f876 in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
frame #7: <unknown function> + 0x1dfcf (0x7f41df4edfcf in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/_k2.cpython-38-x86_64-linux-gnu.so)
<omitting python frames>
frame #13: THPFunction_apply(_object*, _object*) + 0x8fd (0x7f42d734f41d in /path/to/miniconda3/envs/k2/lib/python3.8/site-packages/torch/lib/libtorch_python.so)
frame #52: __libc_start_main + 0xe7 (0x7f43096adb97 in /lib/x86_64-linux-gnu/libc.so.6)

csukuangfj commented on September 22, 2024

@EmreOzkose
Could you show us the version of k2 you are using?

$ python3 -m k2.version

should print that information.

EmreOzkose commented on September 22, 2024

@csukuangfj
My version info is:

Collecting environment information...

k2 version: 1.3
Build type: Release
Git SHA1: 6b8a10fa95213da285b8fce6525b2c5ed42198a6
Git date: Tue Aug 3 05:36:48 2021
Cuda used to build k2: 11.1
cuDNN used to build k2: 8.0.5
Python version used to build k2: 3.8
OS used to build k2: Ubuntu 16.04.7 LTS
CMake version: 3.18.4
GCC version: 5.5.0
CMAKE_CUDA_FLAGS:  --expt-extended-lambda -gencode arch=compute_35,code=sm_35 --expt-extended-lambda -gencode arch=compute_50,code=sm_50 --expt-extended-lambda -gencode arch=compute_60,code=sm_60 --expt-extended-lambda -gencode arch=compute_61,code=sm_61 --expt-extended-lambda -gencode arch=compute_70,code=sm_70 --expt-extended-lambda -gencode arch=compute_75,code=sm_75 -D_GLIBCXX_USE_CXX11_ABI=0 --compiler-options -Wall --compiler-options -Wno-unknown-pragmas --compiler-options -Wno-strict-overflow
CMAKE_CXX_FLAGS:  -D_GLIBCXX_USE_CXX11_ABI=0 -Wno-strict-overflow
PyTorch version used to build k2: 1.8.1
PyTorch is using Cuda: 11.1
NVTX enabled: True
With CUDA: True
Disable debug: True
Sync kernels : False
Disable checks: False
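When comparing several environments, a report like this can be read programmatically. A minimal sketch: parse_k2_version_report is a hypothetical helper, and the report text is assumed to be captured with subprocess.run([sys.executable, "-m", "k2.version"], capture_output=True, text=True).stdout.

```python
from typing import Dict


def parse_k2_version_report(text: str) -> Dict[str, str]:
    """Parse the "key: value" lines of a k2 version report into a dict.

    Lines without a colon (e.g. "Collecting environment information...") are
    skipped; only the first colon splits a line, so times keep their colons.
    """
    info: Dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            info[key.strip()] = value.strip()
    return info
```

On the report above, parse_k2_version_report(report)["k2 version"] would be "1.3" and ["PyTorch version used to build k2"] would be "1.8.1".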

I think I understand the issue. I am trying different architectures and features. Since my GPU memory is small, I have to decrease max_frames when I increase the number of layers in the model. When I use a small number of frames (like 5000), index comes out empty for some batches.
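While experimenting with smaller max_frames, it may help to find out which batches trigger the failure without losing the rest of the run. A minimal sketch, assuming a per-batch decode function: decode_all and decode_fn are hypothetical names (decode_fn stands in for a call such as decode_one_batch()), and skipping failed batches is only a diagnostic, not a fix; updating k2 is the remedy suggested in this thread.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def decode_all(
    batches: Iterable[Any],
    decode_fn: Callable[[Any], Any],
) -> Tuple[Dict[int, Any], List[int]]:
    """Decode every batch, recording indices of batches that hit the
    device-mismatch RuntimeError instead of aborting the whole run.

    Any other exception is re-raised unchanged.
    """
    results: Dict[int, Any] = {}
    failed: List[int] = []
    for idx, batch in enumerate(batches):
        try:
            results[idx] = decode_fn(batch)
        except RuntimeError as exc:
            if "does not match device of data" not in str(exc):
                raise
            failed.append(idx)
    return results, failed
```

The returned list of failed indices can then be used to look up the corresponding cuts and check whether they are the unusually short ones.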

EmreOzkose commented on September 22, 2024

Thank you so much! I am updating right away.
