tgc1997 / rmn Goto Github PK



IJCAI2020: Learning to Discretely Compose Reasoning Module Networks for Video Captioning

Python 98.38% Shell 1.62%


rmn's Issues

evaluate.py vs train.py

Hi, may I ask what the difference is between evaluate.py and train.py? Thank you very much.

TypeError: h5py objects cannot be pickled

When I try to run evaluate.py, I run into this error. I tried several fixes but couldn't solve the problem. I hope you can help.
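
This error commonly appears when a dataset object keeps an open h5py.File as an attribute and a DataLoader with worker processes has to pickle that dataset. Whether that is the cause here is an assumption, since the issue includes no traceback. A minimal sketch of the usual workaround follows: store only the path, open the file lazily in each process, and drop the handle when pickling. The class name LazyFileDataset is hypothetical, and an ordinary binary file stands in for the HDF5 file (open file objects cannot be pickled either), so the sketch runs without h5py.

```python
import os
import pickle
import tempfile


class LazyFileDataset:
    """Hypothetical dataset that stores only the path and opens the file on first use."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # opened lazily, once per process

    def _file(self):
        if self._handle is None:
            # With h5py this line would be: h5py.File(self.path, "r")
            self._handle = open(self.path, "rb")
        return self._handle

    def __getitem__(self, index):
        f = self._file()
        f.seek(index)
        return f.read(1)

    def __getstate__(self):
        # Drop the unpicklable handle; each worker reopens the file itself.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def close(self):
        if self._handle is not None:
            self._handle.close()
            self._handle = None


def demo():
    # Write a tiny stand-in data file.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"abc")
    try:
        ds = LazyFileDataset(tmp.name)
        first = ds[0]                           # opens the handle in this process
        clone = pickle.loads(pickle.dumps(ds))  # works even though ds holds an open handle
        second = clone[1]                       # the copy reopens the file on demand
        ds.close()
        clone.close()
        return first, second
    finally:
        os.remove(tmp.name)
```

Setting the DataLoader's num_workers to 0 is another way to avoid pickling the dataset, at the cost of slower data loading.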

Some questions about hidden size.

According to the code, the hidden size is set to 1300 instead of the widely used 1024 or 2048. What is the reasoning behind this choice?

hi! would like to know how to get these

A refinement report

RMN/train.py

Line 128 in 14a9eff

loss_count /= 10 if bsz == opt.train_batch_size else i % 10

Hi, Ganchao. I found that the condition above misses some cases.
For example, when train_batch_size is set to 2 or 3, the train_loader has 24390 steps (48779/2 = 24389.5) or 16260 steps (48779/3 = 16259.67), respectively, where 48779 is the total number of samples in the MSVD dataset. Because the division is not exact, the final step (the 24390th or 16260th) contains only 1 or 2 samples, so the condition bsz == opt.train_batch_size is not met and loss_count is divided by i % 10, which is 0. Oops! :(
It could be refined as follows:

if bsz == opt.train_batch_size:
    loss_count /= 10
elif bsz < opt.train_batch_size and i % 10 == 0:
    loss_count /= 10
else:
    loss_count /= i % 10

I have restarted the project on my server. If it still works well after one epoch, I will come back and report.
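
The arithmetic in this report can be checked with a short sketch. It assumes, as the report implies, that i is the 1-based step index and that the loss is averaged either every 10 steps or at the final step of the epoch. The helper names are hypothetical and are not taken from train.py.

```python
import math


def num_steps(num_samples, batch_size):
    """Number of batches per epoch when the last partial batch is kept."""
    return math.ceil(num_samples / batch_size)


def last_batch_size(num_samples, batch_size):
    """Size of the final batch: the remainder, or a full batch if it divides evenly."""
    remainder = num_samples % batch_size
    return remainder if remainder else batch_size


def average_loss(loss_sum, i, log_every=10):
    """Average the loss accumulated since the last logging point.

    Assumes `i` is the 1-based step index. `i % log_every` is 0 exactly when
    the window is full, so fall back to `log_every` in that case.
    """
    steps_in_window = i % log_every or log_every  # never zero
    return loss_sum / steps_in_window
```

With 48779 samples, num_steps gives 24390 for a batch size of 2 and 16260 for a batch size of 3, and both final step indices are multiples of 10, which is the case that triggers the division by zero. Computing the divisor from the step index alone avoids branching on the batch size.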

Is the test set used to select and save the best model?

Could you upload your faster rcnn code to extract region features for my own data?

FileNotFoundError: [Errno 2] No such file or directory: 'java': 'java'

I got this error when running evaluate.py and train.py.
How can I solve this issue?
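
This error means that Python tried to launch a program named java and could not find it on PATH. That the evaluation scripts shell out to Java-based caption metric tools is an assumption here, based on the tokenization step visible in another log on this page. A minimal sketch for checking the environment before running the scripts (the function names are hypothetical):

```python
import shutil


def find_java(executable="java"):
    """Return the full path of the Java launcher, or None if it is not on PATH."""
    return shutil.which(executable)


def java_status(executable="java"):
    """Describe whether the Java launcher can be found."""
    path = find_java(executable)
    if path is None:
        return f"'{executable}' not found on PATH; install a JRE/JDK or add it to PATH."
    return f"Found {executable} at {path}"
```

If java_status() reports that nothing was found, installing a Java runtime and reopening the shell so that PATH is refreshed is the usual remedy.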

hi! would like to know how to resolve the following issue

When I tried to run evaluate.py, it failed with an "Incorrect function" error:

(rmn) E:\video_caption\rmn\RMN-master>python evaluate.py --dataset=msvd --model=RMN --result_dir=results/msvd_model --attention=gumbel --use_loc --use_rel --use_func --hidden_size=512 --att_size=512 --test_batch_size=2 --beam_size=2 --eval_metric=CIDEr
335it [01:21, 4.13it/s]
init COCO-EVAL scorer
tokenization...
Traceback (most recent call last):
  File "evaluate.py", line 107, in <module>
    metrics = evaluate(opt, net, opt.test_range, opt.test_prediction_txt_path, reference)
  File "evaluate.py", line 75, in evaluate
    scores, sub_category_score = scorer.score(reference, prediction_json, prediction_json.keys())
  File "./caption-eval\cocoeval.py", line 64, in score
    print('tokenization...')
OSError: [WinError 1] Incorrect function.

The link of visual and text features cannot be opened

Hi, the link is dead!

Exception: Model not supported: RMN

When I run train.py, the following error appears:

May I ask a question?

File "/RMN-master/models/allennlp_beamsearch.py", line 257, in search
state_tensor.reshape(batch_size, self.beam_size, *last_dims)
RuntimeError: gather_out_cuda(): Expected dtype int64 for index

text feature processing

Could you share a link to the code used to process the text features and generate the caption.pkl file?

a problem about region_feature file

The shape of sfeats in msvd_region_feature.h5 is 1970 x 26 x 36 x 5.
What is the meaning of the last dimension? Thank you!

problems about feature extraction models

Hi, tgc! I tried using torchvision's pretrained fasterrcnn_resnet50_fpn model to extract the region features of a video, but the features I extracted had shape [823, 4], which is far from the [26, 36, 2048] and [26, 36, 5] shapes in the dataset you provided. What does the extra dimension mean, or what do these three dimensions each represent?
Is it feasible to use torchvision's fasterrcnn_resnet50_fpn model to extract the features instead of Caffe's Fast R-CNN model? The features I extracted with torchvision's model are far too small. How can I extract features with the required dimensions?

a problem about msr-vtt_model.pth

Which directory is this msr-vtt_model.pth in?

The link of visual and text features cannot be opened

Please provide a valid link. Thanks

a problem about features

Hello, may I ask which method you used to extract the features and region features from the videos? Thank you.

Question about training time, thanks

I use 8 GPUs with a batch size of 32, and training 3 epochs took 11 hours.
How long did it take you to train 20 epochs?
Thanks for your work!

Reproducing results on the MSR-VTT dataset

Hi, Ganchao!
I am having difficulty reproducing the experimental results on MSR-VTT.
I have run the project on MSR-VTT several times and always get lower results than expected.
The CIDEr scores fluctuate between 45 and 46.5, which is far from the 49.6 reported in the paper.
Would you be able to share the random seed values used in your MSR-VTT experiments?
Training on MSR-VTT is very time-consuming: about 6 days on a single GPU.
Looking forward to your help, thanks!

Inference on custom raw video

Hey @tgc1997
Thanks for providing the implementations of such an awesome work!!!

I wanted to know how one goes about using the pre-trained models for inference on raw custom videos.

a problem about sample.py

When I run sample.py, I get the following error at line 102:
net.load_state_dict(torch.load(opt.model_pth_path))
RuntimeError: Error(s) in loading state_dict for CapModel:
Unexpected key(s) in state_dict: "decoder.module_selection.loc_fc.weight", "decoder.module_selection.loc_fc.bias", "decoder.module_selection.rel_fc.weight", "decoder.module_selection.rel_fc.bias", "decoder.module_selection.func_fc.weight", "decoder.module_selection.func_fc.bias", "decoder.module_selection.module_attn.wh.weight", "decoder.module_selection.module_attn.wh.bias", "decoder.module_selection.module_attn.wv.weight", "decoder.module_selection.module_attn.wv.bias", "decoder.module_selection.module_attn.wa.weight".
Can you help me? Thank you very much.
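
load_state_dict reports a key as unexpected when it exists in the checkpoint but not in the model being loaded. All the keys listed above sit under decoder.module_selection, which suggests the checkpoint was trained with components that the model built by sample.py does not have. One plausible cause, offered as an assumption, is that the options used to build the model differ from those used in training. A minimal sketch for comparing the two key sets before loading (the helper names are hypothetical):

```python
def diff_state_dict_keys(model_keys, checkpoint_keys):
    """Return keys missing from the checkpoint and keys the model does not expect."""
    model_keys, checkpoint_keys = set(model_keys), set(checkpoint_keys)
    return {
        "missing": sorted(model_keys - checkpoint_keys),
        "unexpected": sorted(checkpoint_keys - model_keys),
    }


def common_prefixes(keys, depth=2):
    """Group parameter names by their first `depth` dotted components."""
    return sorted({".".join(key.split(".")[:depth]) for key in keys})
```

With PyTorch available, the inputs would be net.state_dict().keys() and the keys of the dictionary returned by torch.load. Grouping the unexpected keys by prefix shows at a glance which submodule the model lacks.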

How to get my own extracted-features?

Hi tgc, I'd like to test this model on my own video. How could I get the extracted features as inputs?

A mismatch error occurs when using the pretrained model you provide

Awesome work!
When I try to reproduce the results reported in this repository (i.e. a CIDEr score of 97.8 on the MSVD dataset), running evaluate.py with your pretrained file results/msvd_model/msvd_best_cider.pth fails with size mismatch errors across the whole CapModel.
For example:
RuntimeError: Error(s) in loading state_dict for CapModel:
size mismatch for encoder.bi_lstm1.weight_ih_l0: copying a param with shape torch.Size([2048, 1000]) from checkpoint, the shape in current model is torch.Size([5200, 1000]).
size mismatch ……
size mismatch ……
It seems that you modified the model without updating msvd_best_cider.pth.
If so, please let me know.
I would appreciate it if you could provide an updated .pth file so that I can reproduce the results reported in this repository.
By the way, why weren't these higher final results published in the paper?
Thanks!
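
The two shapes in this error point to a hidden-size mismatch rather than a changed architecture. In PyTorch's nn.LSTM, the input-to-hidden weight (named weight_ih_l0 for the first layer) has shape (4 * hidden_size, input_size), so the hidden size can be read off the first dimension. A minimal sketch of that arithmetic (the function name is hypothetical):

```python
def lstm_hidden_size(weight_ih_shape):
    """Infer hidden_size from the shape of an nn.LSTM input-to-hidden weight.

    nn.LSTM stacks its four gate matrices along the first dimension, so that
    dimension equals 4 * hidden_size.
    """
    rows, _input_size = weight_ih_shape
    if rows % 4 != 0:
        raise ValueError(f"first dimension {rows} is not a multiple of 4")
    return rows // 4


checkpoint_hidden = lstm_hidden_size((2048, 1000))  # shape stored in the checkpoint
model_hidden = lstm_hidden_size((5200, 1000))       # shape of the model being built
```

This gives 512 for the checkpoint and 1300 for the current model, which match the hidden sizes quoted elsewhere on this page for MSVD and MSR-VTT. If that reading is right, running evaluate.py with --hidden_size=512 for the MSVD checkpoint would likely resolve the mismatch; this is an inference from the shapes, not something the repository author confirms here.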

How to download the dataset without an account?

Spatial Feats

Hi,
I see that you use 2D CNN features (1536 dim), 3D CNN features (1024 dim), and RCNN features (2048 dim). I also see something called spatial features, with 5 dimensions. What are these features? I could not find them mentioned anywhere in the paper.

a bug report

Hi, Ganchao! Here is a bug report.
While debugging and reproducing the project, I found an error that might lead to inaccurate reproduction or training results.
The att_size=1024 set on the command line has no effect, for the following reason:
although the __init__ methods of the SoftAttention and GumbelAttention classes take an att_size parameter, intended to be opt.att_size (=1024), every place that instantiates them actually passes opt.hidden_size (=512 for the MSVD dataset, =1300 for the MSR-VTT dataset) in its place.
Related code lines:

RMN/models/RMN.py

Line 18 in 14a9eff

def __init__(self, feat_size, hidden_size, att_size):

RMN/models/RMN.py

Line 45 in 14a9eff

def __init__(self, feat_size, hidden_size, att_size):

RMN/models/RMN.py

Line 171 in 14a9eff

 self.spatial_attn = SoftAttention(opt.region_projected_size, opt.hidden_size, opt.hidden_size) 

RMN/models/RMN.py

Line 175 in 14a9eff

self.temp_attn = SoftAttention(feat_size, opt.hidden_size, opt.hidden_size)

RMN/models/RMN.py

Line 207 in 14a9eff

 self.spatial_attn = SoftAttention(region_feat_size, opt.hidden_size, opt.hidden_size) 

RMN/models/RMN.py

Line 211 in 14a9eff

 self.relation_attn = SoftAttention(2*feat_size, opt.hidden_size, opt.hidden_size) 

RMN/models/RMN.py

Line 245 in 14a9eff

 self.cell_attn = SoftAttention(opt.hidden_size, opt.hidden_size, opt.hidden_size) 

RMN/models/RMN.py

Line 285 in 14a9eff

 self.module_attn = SoftAttention(opt.hidden_size, opt.hidden_size, opt.hidden_size) 

RMN/models/RMN.py

Line 287 in 14a9eff

 self.module_attn = GumbelAttention(opt.hidden_size, opt.hidden_size, opt.hidden_size) 

If att_size is meant to work as expected, the third argument opt.hidden_size in the code lines above should be replaced with opt.att_size. Is that right?
Thanks!

What's the range of the CIDEr score?

POS

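Hello! I would like to ask how the POS (part-of-speech) tags here were obtained. I see only three types, 0, 1, and 2. Which parts of speech do they represent? Thank you!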
