
corrnet's Introduction

Hi 👋, I'm a PhD candidate (2021.09-now) at Tianjin University, China. My major interests include video understanding, sign language understanding, and multi-modal learning. I'd like to help people benefit more from general computer vision techniques. For more information, please visit www.hulianyu.top. Feel free to contact me via [email protected].

✉ News:

  • We release CorrNet+, a unified model with superior performance on both continuous sign language recognition and sign language translation, using only RGB inputs.

  • We release AdaptSign, a CSLR model powered by frozen pretrained image models, with 18.5% & 18.8% WER on phoenix2014, 18.6% & 19.8% WER on phoenix2014-T, and 26.7% & 26.3% WER on CSL-Daily.

  • We release DSTA-SLR, which performs sign language recognition (SLR) with pure skeleton inputs but achieves accuracy comparable to, and much faster speed than, recognition with RGB inputs.


corrnet's People

Contributors

hulianyuyy


corrnet's Issues

Can you share your environment.yml conda environment?

I'm facing a bit of a challenge installing ctcdecode since it doesn't support every PyTorch version. I really want to match all the exact versions of your libraries. So, can you share your environment.yml conda environment?

preprocessing error

I am trying to preprocess the data with !python dataset_preprocess-T.py --process-image --multiprocessing, but this is the error it gives. I have created the symbolic link properly and verified it (!ln -s /mnt/c/Users/Samriddha\ Sanyal/PHOENIX-2014-T-release-v3/PHOENIX-2014-T ./dataset/phoenix2014-T).
I have referred to #10 and moved the files in the features folders into a new "1" subdirectory by running the script for dev and test, though not yet for train.

I am working with the phoenix2014-T dataset.

Traceback (most recent call last):
  File "/mnt/c/Users/Samriddha Sanyal/Corrnet-main/preprocess/dataset_preprocess-T.py", line 109, in <module>
    information = csv2dict(f"{args.dataset_root}/{args.annotation_prefix.format(md)}", dataset_type=md)
  File "/mnt/c/Users/Samriddha Sanyal/Corrnet-main/preprocess/dataset_preprocess-T.py", line 15, in csv2dict
    inputs_list = pandas.read_csv(anno_path)
  File "/home/shizuka/anaconda3/envs/corrnet2/lib/python3.9/site-packages/pandas/util/_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "/home/shizuka/anaconda3/envs/corrnet2/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 586, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/shizuka/anaconda3/envs/corrnet2/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 482, in _read
    parser = TextFileReader(filepath_or_buffer, **kwds)
  File "/home/shizuka/anaconda3/envs/corrnet2/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 811, in __init__
    self._engine = self._make_engine(self.engine)
  File "/home/shizuka/anaconda3/envs/corrnet2/lib/python3.9/site-packages/pandas/io/parsers/readers.py", line 1040, in _make_engine
    return mapping[engine](self.f, **self.options)  # type: ignore[call-arg]
  File "/home/shizuka/anaconda3/envs/corrnet2/lib/python3.9/site-packages/pandas/io/parsers/c_parser_wrapper.py", line 51, in __init__
    self._open_handles(src, kwds)
  File "/home/shizuka/anaconda3/envs/corrnet2/lib/python3.9/site-packages/pandas/io/parsers/base_parser.py", line 222, in _open_handles
    self.handles = get_handle(
  File "/home/shizuka/anaconda3/envs/corrnet2/lib/python3.9/site-packages/pandas/io/common.py", line 702, in get_handle
    handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/disk1/dataset/PHOENIX-2014-T-release-v3/PHOENIX-2014-T/annotations/manual/PHOENIX-2014-T.dev.corpus.csv'
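
Note the traceback shows the script looked under /disk1/dataset/..., which appears to be its built-in default, rather than under the symlink. A quick sanity check before rerunning (a sketch; the path below assumes the symbolic link described above, so adjust it to your layout):

import os

# the script resolves annotations relative to args.dataset_root, so this
# file must be reachable for the dev split to load
anno = "./dataset/phoenix2014-T/annotations/manual/PHOENIX-2014-T.dev.corpus.csv"
print(anno, "exists:", os.path.exists(anno))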

The log file for CSL dataset

Hello, can you provide your CSL log file? I may have some problems with mine and would like to compare against your training process.

IndexError

Hi!
I tried to run python main.py --device 0 --load-weights /weitghts/dev_18.90_PHOENIX14-T.pt --phase test, but I got an IndexError.
The detailed error message is as follows:

Original Traceback (most recent call last):
  File "/home/nan/anaconda3/envs/SLR/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/nan/anaconda3/envs/SLR/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/nan/anaconda3/envs/SLR/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/nan/project/CSLR/CorrNet/dataset/dataloader_video.py", line 50, in __getitem__
    input_data, label = self.normalize(input_data, label)
  File "/home/nan/project/CSLR/CorrNet/dataset/dataloader_video.py", line 87, in normalize
    video, label = self.data_aug(video, label, file_id)
  File "/home/nan/project/CSLR/CorrNet/utils/video_augmentation.py", line 24, in __call__
    image = t(image)
  File "/home/nan/project/CSLR/CorrNet/utils/video_augmentation.py", line 157, in __call__
    im_h, im_w, im_c = clip[0].shape
IndexError: list index out of range

I'm sure my dataset path is correct.
The complete error message is as follows:

Traceback (most recent call last):
  File "/home/nan/project/CSLR/CorrNet/main.py", line 256, in <module>
    processor.start()
  File "/home/nan/project/CSLR/CorrNet/main.py", line 98, in start
    dev_wer = seq_eval(self.arg, self.data_loader["dev"], self.model, self.device,
  File "/home/nan/project/CSLR/CorrNet/seq_scripts.py", line 58, in seq_eval
    for batch_idx, data in enumerate(tqdm(loader)):
  File "/home/nan/anaconda3/envs/SLR/lib/python3.9/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/home/nan/anaconda3/envs/SLR/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/nan/anaconda3/envs/SLR/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/home/nan/anaconda3/envs/SLR/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/home/nan/anaconda3/envs/SLR/lib/python3.9/site-packages/torch/_utils.py", line 425, in reraise
    raise self.exc_type(msg)
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/nan/anaconda3/envs/SLR/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/nan/anaconda3/envs/SLR/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/nan/anaconda3/envs/SLR/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 44, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/nan/project/CSLR/CorrNet/dataset/dataloader_video.py", line 50, in __getitem__
    input_data, label = self.normalize(input_data, label)
  File "/home/nan/project/CSLR/CorrNet/dataset/dataloader_video.py", line 87, in normalize
    video, label = self.data_aug(video, label, file_id)
  File "/home/nan/project/CSLR/CorrNet/utils/video_augmentation.py", line 24, in __call__
    image = t(image)
  File "/home/nan/project/CSLR/CorrNet/utils/video_augmentation.py", line 157, in __call__
    im_h, im_w, im_c = clip[0].shape
IndexError: list index out of range

All errors occur after the dataset has finished loading.

Unable to recognize any word, but the loss is decreasing

Hello, I have a problem in the training phase: the loss is decreasing, but when I evaluate the model it doesn't recognize any word, and the WER is always 100.
I installed PyTorch 1.13.0, Python 3.10.13, and ctcdecode 1.0.3.

This is my log file:

[ Sat Jan 27 01:36:04 2024 ] Parameters:

{'work_dir': 'PATH_TO_SAVE_RESULTS', 'config': './configs/baseline.yaml', 'random_fix': True, 'device': '0', 'phase': 'train', 'save_interval': 5, 'random_seed': 0, 'eval_interval': 1, 'print_log': True, 'log_interval': 50, 'evaluate_tool': 'python', 'feeder': 'dataset.dataloader_video.BaseFeeder', 'dataset': 'phoenix14', 'dataset_info': {'dataset_root': './dataset/phoenix2014/phoenix-2014-multisigner', 'dict_path': './preprocess/phoenix2014/gloss_dict.npy', 'evaluation_dir': './evaluation/slr_eval', 'evaluation_prefix': 'phoenix2014-groundtruth'}, 'num_worker': 10, 'feeder_args': {'mode': 'test', 'datatype': 'video', 'num_gloss': -1, 'drop_ratio': 1.0, 'prefix': './dataset/phoenix2014/phoenix-2014-multisigner', 'transform_mode': False}, 'model': 'slr_network.SLRModel', 'model_args': {'num_classes': 65, 'c2d_type': 'resnet18', 'conv_type': 2, 'use_bn': 1, 'share_classifier': False, 'weight_norm': False}, 'load_weights': None, 'load_checkpoints': None, 'decode_mode': 'beam', 'ignore_weights': [], 'batch_size': 8, 'test_batch_size': 8, 'loss_weights': {'SeqCTC': 1.0}, 'optimizer_args': {'optimizer': 'Adam', 'base_lr': 0.0001, 'step': [20, 35], 'learning_ratio': 1, 'weight_decay': 0.0001, 'start_epoch': 0, 'nesterov': False}, 'num_epoch': 20}

[ Sat Jan 27 01:36:31 2024 ] Epoch: 0, Batch(0/122) done. Loss: 110.28868103 lr:0.000100
[ Sat Jan 27 01:38:26 2024 ] Epoch: 0, Batch(50/122) done. Loss: 13.18387794 lr:0.000100
[ Sat Jan 27 01:40:25 2024 ] Epoch: 0, Batch(100/122) done. Loss: 12.18678570 lr:0.000100
[ Sat Jan 27 01:41:07 2024 ] Mean training loss: 18.2596124587.
[ Sat Jan 27 01:41:58 2024 ] Dev WER: 100.00%
[ Sat Jan 27 01:42:24 2024 ] Epoch: 1, Batch(0/122) done. Loss: 12.15300369 lr:0.000100
[ Sat Jan 27 01:44:21 2024 ] Epoch: 1, Batch(50/122) done. Loss: 11.67739010 lr:0.000100
[ Sat Jan 27 01:46:22 2024 ] Epoch: 1, Batch(100/122) done. Loss: 13.26895523 lr:0.000100
[ Sat Jan 27 01:47:08 2024 ] Mean training loss: 12.1612764968.
[ Sat Jan 27 01:47:58 2024 ] Dev WER: 100.00%
[ Sat Jan 27 01:48:27 2024 ] Epoch: 2, Batch(0/122) done. Loss: 12.09643936 lr:0.000100
[ Sat Jan 27 01:50:20 2024 ] Epoch: 2, Batch(50/122) done. Loss: 11.06025696 lr:0.000100
[ Sat Jan 27 01:52:13 2024 ] Epoch: 2, Batch(100/122) done. Loss: 9.84243107 lr:0.000100
[ Sat Jan 27 01:53:01 2024 ] Mean training loss: 10.5143460211.
[ Sat Jan 27 01:53:52 2024 ] Dev WER: 100.00%
[ Sat Jan 27 01:54:22 2024 ] Epoch: 3, Batch(0/122) done. Loss: 9.38849068 lr:0.000100
[ Sat Jan 27 01:56:19 2024 ] Epoch: 3, Batch(50/122) done. Loss: 9.07399940 lr:0.000100
[ Sat Jan 27 01:58:09 2024 ] Epoch: 3, Batch(100/122) done. Loss: 8.66645050 lr:0.000100
[ Sat Jan 27 01:58:55 2024 ] Mean training loss: 9.0431265127.
[ Sat Jan 27 01:59:45 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:00:12 2024 ] Epoch: 4, Batch(0/122) done. Loss: 8.63507748 lr:0.000100
[ Sat Jan 27 02:02:05 2024 ] Epoch: 4, Batch(50/122) done. Loss: 7.65232229 lr:0.000100
[ Sat Jan 27 02:04:04 2024 ] Epoch: 4, Batch(100/122) done. Loss: 7.27032137 lr:0.000100
[ Sat Jan 27 02:04:47 2024 ] Mean training loss: 7.6128989556.
[ Sat Jan 27 02:05:38 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:06:09 2024 ] Epoch: 5, Batch(0/122) done. Loss: 6.52053165 lr:0.000100
[ Sat Jan 27 02:07:59 2024 ] Epoch: 5, Batch(50/122) done. Loss: 4.85380507 lr:0.000100
[ Sat Jan 27 02:10:03 2024 ] Epoch: 5, Batch(100/122) done. Loss: 7.19156647 lr:0.000100
[ Sat Jan 27 02:10:44 2024 ] Mean training loss: 5.7774419706.
[ Sat Jan 27 02:11:35 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:12:00 2024 ] Epoch: 6, Batch(0/122) done. Loss: 3.87025928 lr:0.000100
[ Sat Jan 27 02:14:02 2024 ] Epoch: 6, Batch(50/122) done. Loss: 3.52518511 lr:0.000100
[ Sat Jan 27 02:16:07 2024 ] Epoch: 6, Batch(100/122) done. Loss: 3.84364915 lr:0.000100
[ Sat Jan 27 02:16:45 2024 ] Mean training loss: 3.9095683430.
[ Sat Jan 27 02:17:36 2024 ] Dev WER: 100.00%
[ Sat Jan 27 02:18:05 2024 ] Epoch: 7, Batch(0/122) done. Loss: 3.43237042 lr:0.000100
[ Sat Jan 27 02:20:00 2024 ] Epoch: 7, Batch(50/122) done. Loss: 2.54930735 lr:0.000100
[ Sat Jan 27 02:21:59 2024 ] Epoch: 7, Batch(100/122) done. Loss: 2.43364787 lr:0.000100
[ Sat Jan 27 02:22:40 2024 ] Mean training loss: 2.6058940282.
[ Sat Jan 27 02:23:30 2024 ] Dev WER: 100.00%

What is the problem, and how can I solve it? I am training on an Ethiopian sign language dataset with Amharic characters.

Training Problem

Hi, I used CorrNet to train on the PHOENIX-2014-T dataset. In the Training section of your page (https://github.com/hulianyuyy/CorrNet), I get the error "No module named 'dataset.dataloader_video'". I installed the dataset and datasets modules via pip, but I still get the same error. Also, before the Training section, when I use the command python main.py --device your_device --load-weights path_to_weight.pt --phase test, I get "No such file or directory: 'path_to_weight.pt'", and I don't know how to get this .pt file.
Could you help me, please?
Thank you.

Ask

Hello, I ran into some confusion while using Netron to inspect the model structure. Could you please clarify whether the input dimension of 1024x1296 for the provided pre-trained model refers to the image size? Additionally, could you explain what 'output' means in this context?

Getting different results with test_one_video.py

I trained the model using the VAC (ICCV 2021) repository. However, when I use your test_one_video.py file for testing, I get wrong results. Is it possible that this file is not compatible with the VAC model?

Question About Identification Module

Firstly, thank you for your commitment to open source. I noticed in your paper that the temporal dilation rate for the dilated convolution in your Identification Module is 4, but it seems to be set to 1 in your code. Could it be that I misunderstood something? In my understanding, the part that defines the Identification Module is in lines 25-27 of modules/resnet.py:
# three depthwise 3D convolutions with temporal kernel size 9; the dilations are
# (1,1,1)/(1,2,2)/(1,3,3), i.e. spatial dilation 1/2/3 but temporal dilation 1 in all branches
self.spatial_aggregation1 = nn.Conv3d(reduction_channel, reduction_channel, kernel_size=(9,3,3), padding=(4,1,1), groups=reduction_channel)
self.spatial_aggregation2 = nn.Conv3d(reduction_channel, reduction_channel, kernel_size=(9,3,3), padding=(4,2,2), dilation=(1,2,2), groups=reduction_channel)
self.spatial_aggregation3 = nn.Conv3d(reduction_channel, reduction_channel, kernel_size=(9,3,3), padding=(4,3,3), dilation=(1,3,3), groups=reduction_channel)

CSL dataset preprocessing failed

Hello, can you provide the file tree for the CSL dataset?

When I ran python dataset_preprocess-CSL.py --process-image --multiprocessing, I couldn't find any feature maps resized to 256x256.

Is there an error in the CSL dataset I downloaded?

This is the file tree of my CSL dataset

[three screenshots of the file tree]

Preprocessing Phoenix 2014 -T

I am currently trying to resize the Phoenix-2014-T dataset. I followed the procedure for making the symbolic link and set dataset-root to the correct filepath. When I run dataset_preprocess-T.py, the tqdm bars load as usual in the command line, like what happens with dataset_preprocess.py. At the end of the process, however, none of the images are resized to 256x256, and no "256x256" folder is created.

I did some digging and found that line 64 may not be working right:
img_list = glob.glob(f"{info_dict['prefix']}/{info['folder']}")

The variable img_list keeps coming up empty whenever I check it. However, this is not the case with dataset_preprocess.py. Is there a fix for this?
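
In the meantime, a small debugging sketch (the patterns are my assumptions about how info_dict['prefix'] and info['folder'] combine; adjust them to your actual layout) to see what the glob on line 64 matches:

import glob

# an empty result means the directory layout differs from what the script expects
for pattern in [
    "./dataset/phoenix2014-T/features/fullFrame-210x260px/dev/*/*.png",
    "./dataset/phoenix2014-T/features/fullFrame-210x260px/dev/*/1/*.png",
]:
    print(pattern, "->", len(glob.glob(pattern)), "files")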

Train problem

Hi, I used CorrNet to train on the CSL-Daily dataset, but the loss and WER have stayed very large and drop very little each epoch. I hope you can give some advice.

Parameters/FLOPs

Hello author, I would like to know the number of parameters and FLOPs of the model. I added code to output the parameter count in test_one_video.py, but I'm unsure how to output FLOPs. Do you have any suggestions? Looking forward to your help.

test_one_video.py
.......
def count_parameters(model):
    return sum(p.numel() for p in model.parameters() if p.requires_grad)
......

# Define model and load state-dict
model = SLRModel(num_classes=1296, c2d_type='resnet18', conv_type=2, use_bn=1, gloss_dict=gloss_dict,
                 loss_weights={'ConvCTC': 1.0, 'SeqCTC': 1.0, 'Dist': 25.0})

num_params = count_parameters(model)
print("Model has {} parameters".format(num_params))
......
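
For FLOPs, one option is the third-party thop package (a sketch; the dummy input shape and the model's forward signature are my assumptions, not taken from the repo):

import torch
from thop import profile  # pip install thop

# hypothetical dummy input: 1 video of 64 RGB frames at 224x224, plus its length
dummy_video = torch.randn(1, 64, 3, 224, 224)
dummy_length = torch.LongTensor([64])
macs, params = profile(model, inputs=(dummy_video, dummy_length))
print("MACs: {:.2f} G, params: {:.2f} M".format(macs / 1e9, params / 1e6))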

Error troubleshooting guide

报错指南.pdf

These are some errors I've summarized myself, drawing on CSDN and the author's GitHub discussions, among other sources. All of them have been resolved, and I'm currently training and waiting for results. I hope this can serve as a reference for everyone.

Unable to train

Hi, I run:

cd ./preprocess
python dataset_preprocess-T.py --process-image --multiprocessing
cd ..
python main.py --device 0

dataset_preprocess-T.py runs fine, but the training phase fails with an error. Can you give me some advice?

./dataset/phoenix2014-T
./dataset/phoenix2014-T/features/fullFrame-256x256px/train/19April_2010_Monday_heute-776/1/*.png
Traceback (most recent call last):
  File "/home/CorrNet/main.py", line 255, in <module>
    processor.start()
  File "/home/CorrNet/main.py", line 67, in start
    seq_train(self.data_loader['train'], self.model, self.optimizer,
  File "/home/CorrNet/seq_scripts.py", line 20, in seq_train
    for batch_idx, data in enumerate(tqdm(loader)):
  File "/home/miniconda3/envs/ml/lib/python3.9/site-packages/tqdm/std.py", line 1180, in __iter__
    for obj in iterable:
  File "/home/miniconda3/envs/ml/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/home/miniconda3/envs/ml/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1203, in _next_data
    return self._process_data(data)
  File "/home/miniconda3/envs/ml/lib/python3.9/site-packages/torch/utils/data/dataloader.py", line 1229, in _process_data
    data.reraise()
  File "/home/miniconda3/envs/ml/lib/python3.9/site-packages/torch/_utils.py", line 434, in reraise
    raise exception
IndexError: Caught IndexError in DataLoader worker process 0.
Original Traceback (most recent call last):
  File "/home/miniconda3/envs/ml/lib/python3.9/site-packages/torch/utils/data/_utils/worker.py", line 287, in _worker_loop
    data = fetcher.fetch(index)
  File "/home/miniconda3/envs/ml/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/miniconda3/envs/ml/lib/python3.9/site-packages/torch/utils/data/_utils/fetch.py", line 49, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "/home/CorrNet/dataset/dataloader_video.py", line 50, in __getitem__
    input_data, label = self.normalize(input_data, label)
  File "/home/CorrNet/dataset/dataloader_video.py", line 89, in normalize
    video, label = self.data_aug(video, label, file_id)
  File "/home/CorrNet/utils/video_augmentation.py", line 24, in __call__
    image = t(image)
  File "/home/CorrNet/utils/video_augmentation.py", line 119, in __call__
    if isinstance(clip[0], np.ndarray):
IndexError: list index out of range

sclite: not found

I installed kaldi-asr/kaldi, but I can't find sctk-2.4.10 for ln -s PATH_TO_KALDI/tools/sctk-2.4.10/bin/sclite ./software/sclite, so I used sctk-20159b5 instead of sctk-2.4.10.

After training the network for the first epoch, I got an error:

preprocess.sh ./work_dir/baseline_res18/output-hypothesis-dev-conv.ctm ./work_dir/baseline_res18/tmp.ctm ./work_dir/baseline_res18/tmp2.ctm
Fri Nov 10 13:17:20 JST 2023
Preprocess Finished.
sh: 1: ./software/sclite: not found
/bin/sh: 1: ./software/sclite: not found
Unexpected error: <class 'IndexError'>
Traceback (most recent call last):
  File "/home/namnt/CorrNet/main.py", line 255, in <module>
    processor.start()
  File "/home/namnt/CorrNet/main.py", line 70, in start
    dev_wer = seq_eval(self.arg, self.data_loader['dev'], self.model, self.device,
  File "/home/namnt/CorrNet/seq_scripts.py", line 95, in seq_eval
    del conv_ret
UnboundLocalError: local variable 'conv_ret' referenced before assignment

Has anyone encountered the same errors?

RuntimeError: Invalid UTF-8

Hello, I'm getting a RuntimeError: Invalid UTF-8 error. Is this caused by a problem with ctcdecode? Following your requirements I installed Python 3.7.1, PyTorch 1.10.1, CUDA 11.1, and ctcdecode 0.4. Is it because the evaluation tools aren't installed properly? Kaldi is already symlinked under the software folder, and warp-ctc 0.1 is installed as required. I don't know what the cause is; please advise.

GPU memory issue

Thank you for your prompt replies. I am currently trying to run the model, but this is the exact error I get:

Traceback (most recent call last):
  File "/mnt/d/Corrnet-main/main.py", line 255, in <module>
    processor.start()
  File "/mnt/d/Corrnet-main/main.py", line 67, in start
    seq_train(self.data_loader['train'], self.model, self.optimizer,
  File "/mnt/d/Corrnet-main/seq_scripts.py", line 35, in seq_train
    scaler.scale(loss).backward()
  File "/home/shizuka/anaconda3/envs/corrnet2/lib/python3.9/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/shizuka/anaconda3/envs/corrnet2/lib/python3.9/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 58.00 MiB (GPU 0; 6.00 GiB total capacity; 5.26 GiB already allocated; 0 bytes free; 5.33 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

Do you have any advice on how to deal with limited GPU capacity, or parameters I can change to run it successfully? How much GPU memory does the model need overall?
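
As the message itself suggests, one mitigation worth trying (a sketch; whether it helps depends on the workload) is capping the allocator's split size at the very top of main.py, together with lowering batch_size and test_batch_size in the yaml config:

import os

# must run before the first CUDA allocation
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"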

Inference

Is there any way we could run inference on a video as input and output multiple recognized glosses/words?

Training time and logs

Thanks for your open-source spirit 🍻. Is it possible to share the log files so that others can reproduce the results? And how much time does the training process take for each dataset?

Fixed Frames

Sorry if I misunderstood. Usually we need to set a constant number of frames per video when feeding it through a network, but I can't find where you set the number of frames for a video during training (something like MAX_SEQ_LENGTH or FIXED_FRAMES). I can see the class TemporalRescale(object) in utils, but I think it creates clips of different lengths. Can you tell me how you did it?

Thanks!
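
For reference, a minimal sketch of the usual alternative to a fixed MAX_SEQ_LENGTH (not the repo's actual collate_fn, just the standard pattern): pad each batch to its own longest clip and pass the true lengths along so the model can mask the padding.

import torch

def pad_collate(batch):
    # batch: list of (video, label) pairs, video shaped [T, C, H, W] with varying T
    videos, labels = zip(*batch)
    lengths = torch.LongTensor([v.size(0) for v in videos])
    max_len = int(lengths.max())
    # pad shorter clips by repeating their last frame
    padded = torch.stack([
        v if v.size(0) == max_len
        else torch.cat([v, v[-1:].expand(max_len - v.size(0), *v.shape[1:])], dim=0)
        for v in videos
    ])
    return padded, lengths, list(labels)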

CSL dataset fails to generate tmp2.ctm to complete validation

Hi author, I got the following error when validation runs automatically after the first epoch of training on the CSL dataset: the tmp2.ctm file is never generated. I tried deleting the '\r' line endings in ./evaluation/slr_eval/preprocess.sh, which got rid of the "./evaluation/slr_eval/preprocess.sh: line 2: $'\r': command not found" message, but the other problems remain. What should I do?

Here are the corresponding errors
./evaluation/slr_eval/preprocess.sh: line 2: $'\r': command not found
./evaluation/slr_eval/preprocess.sh: line 21: syntax error: unexpected end of file
Traceback (most recent call last):
  File "./evaluation/slr_eval/mergectmstm.py", line 9, in <module>
    ctm = open(ctmFile, "r")
FileNotFoundError: [Errno 2] No such file or directory: './work_dir/baseline/tmp2.ctm'
cp: cannot stat './work_dir/baseline/tmp2.ctm': No such file or directory
Unexpected error: <class 'FileNotFoundError'>
Traceback (most recent call last):
  File "main.py", line 255, in <module>
    processor.start()
  File "main.py", line 70, in start
    dev_wer = seq_eval(self.arg, self.data_loader['dev'], self.model, self.device,
  File "/root/CorrNet (copy)/seq_scripts.py", line 95, in seq_eval
    del conv_ret
UnboundLocalError: local variable 'conv_ret' referenced before assignment
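
If the root cause is Windows-style line endings (which is what the $'\r' message points to), a minimal sketch that rewrites the script with Unix line endings, equivalent to dos2unix (back the file up first):

path = "./evaluation/slr_eval/preprocess.sh"
with open(path, "rb") as f:
    data = f.read()
with open(path, "wb") as f:
    f.write(data.replace(b"\r\n", b"\n"))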

Issue with extracting the CSL-Daily Dataset

First of all, thank you very much for open-sourcing your code. I have an issue unrelated to your code, but I'm hoping I can get some help. I downloaded the CSL-Daily dataset parts csl-daily-frames-512x512.tar.gz_00 through csl-daily-frames-512x512.tar.gz_09 and followed the instructions to cat them together and extract the files. However, I can only extract files from the first concatenated archive (csl-daily-frames-512x512.tar.gz_00). I would really appreciate your help if you have experienced this.

Weight tensor filled with zeros

Hello, I tried to run your work on a Windows system, but I got some errors.

Here we can see, in the debugger, that the weight tensor is correctly initialized:

[screenshot]

But when this command is run in the seq_scripts function:

for batch_idx, data in enumerate(tqdm(loader))

the weight tensor is set to 0:

[screenshot]

System: PyTorch 2.0

I work with 1 GPU. When I run resnet.py alone, the weights are correctly initialized and the convolution works correctly.

When I try to step into this line with the debugger: for batch_idx, data in enumerate(tqdm(loader))

it sends me to the file comm.py in /modules/sync_batchnorm, in the class SyncMaster:

def __getstate__(self):
    return {'master_callback': self._master_callback}

Training problem

I am using the csl-daily dataset. Following your instructions, I set the lr to 0.00005 and gamma to 0.5, and the last step disables the temporal resampling strategy (commenting out line 121) in dataloader_video.py. If I comment that function out directly, it errors; after I also deleted collate_fn=self.feeder.collate_fn in main.py, running reports AttributeError: 'str' object has no attribute 'shape'. If I don't follow these settings and just use the default configuration, training runs successfully, but one epoch takes more than 4 hours on an A100 GPU, so 40 epochs would take over 160 hours, far from your 33 hours. How can I reduce the training time? Is this normal?

Problems with test_one_video.py using CSL-Daily dataset

I tried test_one_video.py with CSL-Daily and some issues occurred. I tested it with the following 4 datasets from CSL-Daily, and none of them gives the correct answer.

Datasets:

  1. S000000_P0000_T00
  2. S000003_P0000_T00
  3. S000000_P0008_T00
  4. S000009_P0008_T00

Errors:
2. (S000003_P0000_T00)
(corrNet) ubuntu@Washington:~/CorrNet$ python test_one_video.py
test_one_video.py:54: DeprecationWarning: an integer is required (got type numpy.float64). Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  video_length = torch.LongTensor([np.ceil(vid.size(1) / total_stride) * total_stride + 2*left_pad ])
output glosses : [[('有', 0)]]
##################################################################
3. (S000000_P0008_T00)
(corrNet) ubuntu@Washington:~/CorrNet$ python test_one_video.py
test_one_video.py:54: DeprecationWarning: an integer is required (got type numpy.float64). Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  video_length = torch.LongTensor([np.ceil(vid.size(1) / total_stride) * total_stride + 2*left_pad ])
output glosses : [[('2', 0)]]
##################################################################
4. (S000009_P0008_T00)
(corrNet) ubuntu@Washington:~/CorrNet$ python test_one_video.py
test_one_video.py:54: DeprecationWarning: an integer is required (got type numpy.float64). Implicit conversion to integers using __int__ is deprecated, and may be removed in a future version of Python.
  video_length = torch.LongTensor([np.ceil(vid.size(1) / total_stride) * total_stride + 2*left_pad ])
output glosses : [[('山', 0), ('雪', 1)]]


Video Maps:

  1. 000000|S000000_P0000_T00|52|你们 好|你 们 好 !|你们 好 !|r a w
  2. 000003|S000001_P0000_T00|37|对不起|对 不 起 !|对不起 !|v w
  3. 000000|S000000_P0000_T00|52|你们 好|你 们 好 !|你们 好 !|r a w
  4. 000009|S000003_P0000_T00|30|谢谢|谢 谢 !|谢谢 !|v w

PS:
Although dataset_1 and dataset_3 have the same meaning, they are performed by different people, and the results differ: for dataset_1 we got [[('5', 0)]], and for dataset_3 we got [[('2', 0)]].

Can you help me with this problem? I am new to this area and I can't really figure out the problem myself. Thank you!

del/ins

Dear author,
How can I see the del/ins counts after training/inference? In the logs (or something like a log)?
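
For reference while waiting for an answer, the sub/del/ins counts can also be computed independently of the sclite pipeline with a small alignment routine (a self-contained sketch, not code from the repo):

def edit_ops(ref, hyp):
    # Levenshtein alignment with a backtrace that counts substitutions,
    # deletions, and insertions of the hypothesis vs. the reference glosses
    m, n = len(ref), len(hyp)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i
    for j in range(n + 1):
        d[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i-1][j-1] + (ref[i-1] != hyp[j-1]),  # match/substitution
                          d[i-1][j] + 1,                          # deletion
                          d[i][j-1] + 1)                          # insertion
    i, j, sub, dele, ins = m, n, 0, 0, 0
    while i > 0 or j > 0:
        if i > 0 and j > 0 and d[i][j] == d[i-1][j-1] + (ref[i-1] != hyp[j-1]):
            sub += ref[i-1] != hyp[j-1]
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i-1][j] + 1:
            dele, i = dele + 1, i - 1
        else:
            ins, j = ins + 1, j - 1
    return sub, dele, ins

print(edit_ops("A B C D".split(), "A X C".split()))  # (1, 1, 0): 1 sub, 1 del, 0 ins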

Insufficient space problem

My environment is PyTorch 1.8.1, Python 3.8.18, and CUDA 11.1, with ctcdecode 0.4 installed successfully, but at the end of the first training epoch it reports an insufficient-space problem. The server is an A100 80G.

Could you tell me roughly what the problem is? Your replies to other issues say it may be a version problem, so I switched to PyTorch 1.13.0 and also successfully installed ctcdecode, but at runtime it directly reports a ctc error. Please give me some direction, thanks. What should I do? Is it a problem with the gcc version? Mine is currently gcc 11.4. Or could you tell me what your environment is?

ctcdecode not installing

I am trying to install ctcdecode from the attached link, but I am facing issues.

pip install .
Processing /home/ubuntu/Documents/CorrNet/virtualenv/ctcdecode
Preparing metadata (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py egg_info did not run successfully.
│ exit code: 1
╰─> [24 lines of output]
Traceback (most recent call last):
  File "<string>", line 2, in <module>
  File "<pip-setuptools-caller>", line 34, in <module>
  File "/home/ubuntu/Documents/CorrNet/virtualenv/ctcdecode/setup.py", line 30, in <module>
    download_extract(
  File "/home/ubuntu/Documents/CorrNet/virtualenv/ctcdecode/setup.py", line 21, in download_extract
    tar.extractall("third_party/")
  File "/home/ubuntu/anaconda3/envs/corrnet/lib/python3.9/tarfile.py", line 2250, in extractall
    self._extract_one(tarinfo, path, set_attrs=not tarinfo.isdir(),
  File "/home/ubuntu/anaconda3/envs/corrnet/lib/python3.9/tarfile.py", line 2313, in _extract_one
    self._extract_member(tarinfo, os.path.join(path, tarinfo.name),
  File "/home/ubuntu/anaconda3/envs/corrnet/lib/python3.9/tarfile.py", line 2396, in _extract_member
    self.makefile(tarinfo, targetpath)
  File "/home/ubuntu/anaconda3/envs/corrnet/lib/python3.9/tarfile.py", line 2449, in makefile
    copyfileobj(source, target, tarinfo.size, ReadError, bufsize)
  File "/home/ubuntu/anaconda3/envs/corrnet/lib/python3.9/tarfile.py", line 251, in copyfileobj
    buf = src.read(bufsize)
  File "/home/ubuntu/anaconda3/envs/corrnet/lib/python3.9/gzip.py", line 300, in read
    return self._buffer.read(size)
  File "/home/ubuntu/anaconda3/envs/corrnet/lib/python3.9/_compression.py", line 68, in readinto
    data = self.read(len(byte_view))
  File "/home/ubuntu/anaconda3/envs/corrnet/lib/python3.9/gzip.py", line 506, in read
    raise EOFError("Compressed file ended before the "
EOFError: Compressed file ended before the end-of-stream marker was reached
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed

× Encountered error while generating package metadata.
╰─> See above for output.

note: This is an issue with the package mentioned above, not pip.
hint: See above for details.

I am using python 3.9 for my code.

GPU Memory and Training Problem with Batch Size 2

Hi, I have a problem training the model with batch size 2. I keep getting this error:
RuntimeError: CUDA out of memory. Tried to allocate 488.00 MiB (GPU 0; 9.77 GiB total capacity; 6.90 GiB already allocated; 304.38 MiB free; 6.96 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

My GPU is a GeForce RTX 3080 10GB, and I think I cannot train the model due to lack of GPU memory. I can train the model by cutting the batch size to 1, but I am afraid that it might affect the training results. Do you have any advice about how to fix this problem without changing my GPU?
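
One common workaround is gradient accumulation: run with batch size 1 but only step the optimizer every second batch, so the gradient matches an effective batch of 2 while only one sample's activations are ever in memory. A hedged sketch of the general technique (not code from this repo; compute_loss stands in for whatever seq_train computes per batch):

accum_steps = 2  # effective batch = loader batch (1) x accum_steps
optimizer.zero_grad()
for i, batch in enumerate(loader):        # loader built with batch_size=1
    loss = compute_loss(batch)            # hypothetical per-batch loss
    (loss / accum_steps).backward()       # accumulate scaled gradients
    if (i + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()

Note this is not exactly equivalent to a true batch of 2 when the model contains BatchNorm, since the normalization statistics are still computed per micro-batch.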

Torch not compiled with CUDA

Can this code run without a GPU? I am trying to run it on my laptop, which does not have a dedicated GPU, and I repeatedly get an error. I know the code has been made compatible with parallel computing; however, whenever execution passes through Processor > device, it raises the error mentioned in the title.

2D-CNN or 3D-CNN?

I apologize if I misunderstood. As shown in the picture, the feature extractor is depicted as a 2D CNN. However, in the code, the ResNet-based model is built from 3D CNN components, such as conv3x3 (with 3D kernels), BatchNorm3d, etc. Could you please explain why the feature extractor is described as a 2D CNN when it is implemented with 3D components? Thank you in advance.

[figure from the paper]

Training problem

I tried to train on the CSL-Daily dataset, but it reported the error "list index out of range". It may be a problem in dataloader_video.py. I noticed you said we need to disable the temporal resampling strategy. How is that actually done? Should I comment out the whole collate_fn function?

Train Problem

Hello! When training on the csl-daily dataset, the following error is reported midway through the first epoch. I don't know what caused it. I am very eager to get your help.
[screenshot of the error]

calculate correlation

Sorry to bother you. I don't understand much about the correlation module in this code. If I want to calculate the correlation of frames t-n, t, t+n instead of t-1, t, t+1 as in the original code, how should I change it?

x2 = self.down_conv2(x)
affinities = torch.einsum('bcthw,bctsd->bthwsd', x, torch.concat([x2[:,:,1:], x2[:,:,-1:]], 2))   # repeat the last frame
affinities2 = torch.einsum('bcthw,bctsd->bthwsd', x, torch.concat([x2[:,:,:1], x2[:,:,:-1]], 2))  # repeat the first frame
features = torch.einsum('bctsd,bthwsd->bcthw', torch.concat([x2[:,:,1:], x2[:,:,-1:]], 2), F.sigmoid(affinities)-0.5) * self.weights2[0] + \
           torch.einsum('bctsd,bthwsd->bcthw', torch.concat([x2[:,:,:1], x2[:,:,:-1]], 2), F.sigmoid(affinities2)-0.5) * self.weights2[1]
x = self.down_conv(x)
aggregated_x = self.spatial_aggregation1(x)*self.weights[0] + self.spatial_aggregation2(x)*self.weights[1] \
               + self.spatial_aggregation3(x)*self.weights[2]
aggregated_x = self.conv_back(aggregated_x)
return features * (F.sigmoid(aggregated_x)-0.5)
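
Not an authoritative answer, but since torch.concat([x2[:,:,1:], x2[:,:,-1:]], 2) is just a shift by one frame with boundary replication, one way to generalize to t-n/t+n is to widen the shift (a sketch; n is a hypothetical offset I introduced):

n = 2  # hypothetical temporal offset
# neighbors n frames ahead/behind, replicating the last/first frame n times
x2_fwd = torch.concat([x2[:, :, n:], x2[:, :, -1:].expand(-1, -1, n, -1, -1)], 2)
x2_bwd = torch.concat([x2[:, :, :1].expand(-1, -1, n, -1, -1), x2[:, :, :-n]], 2)
affinities = torch.einsum('bcthw,bctsd->bthwsd', x, x2_fwd)
affinities2 = torch.einsum('bcthw,bctsd->bthwsd', x, x2_bwd)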

Calculate the delta t

for layer_idx, ks in enumerate(kernel_sizes):
    if ks[0] == 'K':    # temporal conv with kernel size int(ks[1])
        left_pad = left_pad * last_stride
        left_pad += int((int(ks[1])-1)/2)
    elif ks[0] == 'P':  # temporal pooling with stride int(ks[1])
        last_stride = int(ks[1])
        total_stride = total_stride * last_stride

if len(video[0].shape) > 3:
    max_len = len(video[0])
    video_length = torch.LongTensor([np.ceil(len(vid) / total_stride) * total_stride + 2*left_pad for vid in video])
    right_pad = int(np.ceil(max_len / total_stride)) * total_stride - max_len + left_pad
    max_len = max_len + left_pad + right_pad

Hi author, I don't understand why we do this. Can you explain how the delta t is calculated, for example for C5-P2-C5-P2? Thank you in advance!
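
For anyone else puzzling over this, my own walkthrough of the loop (assuming kernel_sizes = ['K5', 'P2', 'K5', 'P2'], i.e. conv5-pool2-conv5-pool2):

# start: left_pad = 0, last_stride = 1, total_stride = 1
# 'K5': left_pad = 0*1 + (5-1)//2           -> left_pad = 2
# 'P2': last_stride = 2, total_stride = 1*2 -> total_stride = 2
# 'K5': left_pad = 2*2 + (5-1)//2           -> left_pad = 6
# 'P2': last_stride = 2, total_stride = 2*2 -> total_stride = 4
#
# Each output timestep therefore spans total_stride = 4 input frames, and each
# video is left-padded by 6 frames (and right-padded up to a multiple of 4) so
# the temporal convolutions see complete context at the sequence boundaries.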


Applying the trained model

I want to know how to apply the model I trained and see its effect. For example, can I input a picture or a video to be recognized?

sentence id

Hi, thank you for your contribution. I am currently trying to use this to train on a British Sign Language dataset. Just to check: the id of each sentence in the csv file does not need to be unique, right?

Also, could I see your output-hypothesis files, if possible?

Inference Issues

I want to input a single video for inference and obtain the result. How should I proceed, and is it possible to write a standalone inference script? For example, input a Chinese sign language video and output the corresponding predicted text. Urgently seeking help.

Grad-CAM

Thank you for your efforts in this project.

I'm trying to plot the activation maps of the conv2d module, like Figure 1 in your paper. I've used this Grad-CAM usage example as a starting point: https://github.com/jacobgil/pytorch-grad-cam/blob/master/usage_examples
However, I'm getting an error, which I think is because the model returns a dict while Grad-CAM methods expect a single output.
Also, the model takes the video and the video length as input, but Grad-CAM methods take only one input_tensor (image).

I would really appreciate if you would share the visualization code you used in your paper, or help me figure out these problems.
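
In the meantime, one workaround I'd try (a hedged sketch; the 'sequence_logits' output key and the (video, video_length) forward signature are my assumptions about SLRModel, not verified) is wrapping the model so Grad-CAM sees a single input tensor and a single tensor output:

import torch.nn as nn

class CamWrapper(nn.Module):
    # adapts (video, video_length) -> dict into single-tensor in/out for Grad-CAM
    def __init__(self, model, video_length):
        super().__init__()
        self.model = model
        self.video_length = video_length

    def forward(self, video):
        out = self.model(video, self.video_length)  # returns a dict
        return out['sequence_logits'].mean(dim=0)   # collapse the time axis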

Thank you again for your continuous support.

RuntimeError: "not enough space"

The problem:
When I try to train the model from this source code, a "not enough space" error always appears at the end of the first epoch. I have tried many things, such as adjusting the batch size and num_worker parameters, but none of them solve it.

Request for help:
I would like to ask what hardware configuration the author used to train the model, to check whether the cause is insufficient hardware on my side.

My configuration:
Environment: Python 3.6.7, PyTorch 1.9.0, ctcdecode 0.4.
Hardware: four RTX 4090 GPUs, 125 GB RAM.

Training time of 1 epoch on phoenix2014-T

Hi, I found that CorrNet needs about 3 hours per epoch on phoenix2014-T when using a 3090, but I keep the dataset files on a hard disk at 7200 RPM. I wonder if this matters?

I also tried to use joint coordinates instead of RGB to get a lightweight model with less training time, but I failed with the 3D hand joint data extracted from RGB using Mediapipe.

I would greatly appreciate it if you could provide relevant information about the above two problems.

Best regards.

Inferencing on a single video for CSL-Daily

I used the test_one_video.py code to run inference on a single video from CSL-Daily. I changed num_classes to 2001 and encountered an error. Could you please guide me on what needs to be modified? The model uses the pretrained weights you provided.

Cannot reproduce results on the phoenix2014 dataset

Hi, I ran the default code to train on the phoenix2014 dataset. This is the log file:
log.txt

However, I cannot get the same results as in the paper (1% worse than reported).

[ Thu Nov 16 04:56:27 2023 ] Dev WER: 20.00%
[ Thu Nov 16 04:56:27 2023 ] Best_dev: 19.80, Epoch : 27

What parameters should I change to reproduce the reported results? Thanks!
