
rmn's People

Contributors

daqingliu, tgc1997


rmn's Issues

Some questions about hidden size.

According to the code, the hidden size is set to 1300 instead of the widely used 1024 or 2048. What is the reason for this choice?

A refinement report

RMN/train.py

Line 128 in 14a9eff

loss_count /= 10 if bsz == opt.train_batch_size else i % 10

Hi, Ganchao. I found that the condition above misses a case when running the project.
For example, when train_batch_size is set to 2 or 3, the train_loader has 24390 steps (48779/2 = 24389.5) or 16260 steps (48779/3 = 16259.67) respectively, where 48779 is the total number of samples in the MSVD dataset. Because the division is not exact, the final step (the 24390th or 16260th) contains only 1 or 2 samples, so the condition bsz == opt.train_batch_size is not met. Since i % 10 is 0 at that step, loss_count is divided by zero. Oops! :(
It could be refined as follows:

if bsz == opt.train_batch_size:
    loss_count /= 10
elif bsz < opt.train_batch_size and i % 10 == 0:
    loss_count /= 10
else:
    loss_count /= i % 10

I have restarted training on my server. If it still works after one full epoch, I will come back to report.
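
The failing case can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the repository's code: the helper name window_size and its log_every parameter are hypothetical, and it assumes that the step counter i starts at 1 and that the running loss is averaged over windows of 10 steps, as the quoted line suggests.

```python
import math


def window_size(i, log_every=10):
    """Number of steps accumulated since the last report (hypothetical helper).

    i % log_every is 0 exactly when a full window has just ended, so fall
    back to log_every instead of returning a zero divisor.
    """
    return i % log_every or log_every


num_samples = 48779  # sample count quoted in the issue
for batch_size in (2, 3):
    steps = math.ceil(num_samples / batch_size)
    last_bsz = num_samples - (steps - 1) * batch_size
    # Columns: batch size, total steps, size of the last batch,
    # steps % 10 (the problematic divisor), and the guarded divisor.
    print(batch_size, steps, last_bsz, steps % 10, window_size(steps))
```

With such a helper the update could be written as loss_count /= window_size(i), which avoids the zero divisor regardless of the size of the final batch.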

When I tried to run evaluate.py, it reported an "Incorrect function" error

(rmn) E:\video_caption\rmn\RMN-master>python evaluate.py --dataset=msvd --model=RMN --result_dir=results/msvd_model --attention=gumbel --use_loc --use_rel --use_func --hidden_size=512 --att_size=512 --test_batch_size=2 --beam_size=2 --eval_metric=CIDEr
335it [01:21, 4.13it/s]
init COCO-EVAL scorer
tokenization...
Traceback (most recent call last):
File "evaluate.py", line 107, in <module>
metrics = evaluate(opt, net, opt.test_range, opt.test_prediction_txt_path, reference)
File "evaluate.py", line 75, in evaluate
scores, sub_category_score = scorer.score(reference, prediction_json, prediction_json.keys())
File "./caption-eval\cocoeval.py", line 64, in score
print('tokenization...')
OSError: [WinError 1] Incorrect function.

May I ask a question?

File "/RMN-master/models/allennlp_beamsearch.py", line 257, in search
state_tensor.reshape(batch_size, self.beam_size, *last_dims)
RuntimeError: gather_out_cuda(): Expected dtype int64 for index

text feature processing

Sir, could you share a link describing how you processed the text features and how the caption.pkl file was generated?

problems about feature extraction models

Hi, tgc! I tried using torchvision's pre-trained fasterrcnn_resnet50_fpn model to extract the region features of a video, but the features I extracted only have shape [823, 4], which is far from the [26, 36, 2048] and [26, 36, 5] in the dataset you provided. What does the extra dimension mean, or what does each of these three dimensions mean?
Is it feasible to use torchvision's fasterrcnn_resnet50_fpn model to extract the features instead of Caffe's Fast R-CNN model? The features extracted with the torchvision model fall well short of the required size. How can I extract more features, with dimensions that meet the requirements?

a problem about features

Hello, may I ask what method you used to extract the features and the region features from the videos? Thank you.

result reproduce for msr-vtt dataset

Hi, Ganchao!
I am having difficulty reproducing the experimental results for MSR-VTT.
I have run the project on MSR-VTT several times and always get results below those reported.
The CIDEr scores fluctuate between 45 and 46.5, well short of the 49.6 reported in the paper.
Would you be able to share the random seed values you used in your MSR-VTT experiments?
Training on MSR-VTT is very time-consuming, about 6 days on a single GPU.
Looking forward to your help, thanks!

Inference on custom raw video

Hey @tgc1997
Thanks for providing the implementation of such awesome work!

I would like to know how to use the pre-trained models for inference on raw custom videos.

a problem about sample.py

When I run sample.py, the following error is raised at line 102:
net.load_state_dict(torch.load(opt.model_pth_path))
RuntimeError: Error(s) in loading state_dict for CapModel:
Unexpected key(s) in state_dict: "decoder.module_selection.loc_fc.weight", "decoder.module_selection.loc_fc.bias", "decoder.module_selection.rel_fc.weight", "decoder.module_selection.rel_fc.bias", "decoder.module_selection.func_fc.weight", "decoder.module_selection.func_fc.bias", "decoder.module_selection.module_attn.wh.weight", "decoder.module_selection.module_attn.wh.bias", "decoder.module_selection.module_attn.wv.weight", "decoder.module_selection.module_attn.wv.bias", "decoder.module_selection.module_attn.wa.weight".
Can you help me? Thank you very much.
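
A quick way to narrow down an error like this is to compare the parameter names stored in the checkpoint with those the freshly built model expects. The sketch below is an illustration under assumptions: diff_keys is a hypothetical helper, and the key lists are shortened stand-ins (decoder.word_embed.weight is an invented name). In practice the two lists would come from torch.load(opt.model_pth_path).keys() and net.state_dict().keys().

```python
def diff_keys(checkpoint_keys, model_keys):
    """Return (unexpected, missing) parameter names, each sorted.

    unexpected: present in the checkpoint but not in the model.
    missing:    expected by the model but absent from the checkpoint.
    """
    ckpt, model = set(checkpoint_keys), set(model_keys)
    return sorted(ckpt - model), sorted(model - ckpt)


# Shortened stand-ins for the real key lists.
checkpoint_keys = [
    "decoder.module_selection.loc_fc.weight",
    "decoder.module_selection.rel_fc.weight",
    "decoder.word_embed.weight",  # hypothetical name, for illustration only
]
model_keys = ["decoder.word_embed.weight"]

unexpected, missing = diff_keys(checkpoint_keys, model_keys)
print("unexpected:", unexpected)
print("missing:", missing)
```

If every unexpected key sits under decoder.module_selection, as in the message above, it may be worth checking whether the model was built with the same module options that were used for training, for example the --use_loc, --use_rel and --use_func flags that appear in the evaluate.py command quoted in another issue. This is a guess, not a confirmed cause.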

A size mismatch error occurs when using the pretrained model you provide.

Awesome work!
When I try to reproduce the results you report in this repository (i.e. a CIDEr score of 97.8 on the MSVD dataset), running evaluate.py with your pretrained file results/msvd_model/msvd_best_cider.pth raises size mismatch errors across the whole CapModel.
e.g.
RuntimeError: Error(s) in loading state_dict for CapModel:
size mismatch for encoder.bi_lstm1.weight_ih_l0: copying a param with shape torch.Size([2048, 1000]) from checkpoint, the shape in current model is torch.Size([5200, 1000]).
size mismatch ……
size mismatch ……
It seems that you have modified the model without updating msvd_best_cider.pth.
If so, please let me know.
I would appreciate it if you could provide a new version of the .pth file so that I can reproduce the results you report in this repository.
By the way, why were these higher final results not published in the paper?
Thanks!
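
One possible reading of the shapes in this error, offered as a guess rather than a confirmed diagnosis: PyTorch stores the input-to-hidden weight of an nn.LSTM layer with 4 * hidden_size rows, so if encoder.bi_lstm1 is an nn.LSTM, the two shapes imply two different hidden sizes. The helper name below is hypothetical.

```python
def lstm_sizes_from_weight_ih(shape):
    """Infer (hidden_size, input_size) from an nn.LSTM input-to-hidden weight shape.

    PyTorch stacks the four gate matrices along the first dimension,
    so that dimension equals 4 * hidden_size.
    """
    rows, input_size = shape
    if rows % 4 != 0:
        raise ValueError("first dimension is not a multiple of 4")
    return rows // 4, input_size


checkpoint_shape = (2048, 1000)  # shape reported for the checkpoint
model_shape = (5200, 1000)       # shape reported for the current model

print("checkpoint:", lstm_sizes_from_weight_ih(checkpoint_shape))
print("model:", lstm_sizes_from_weight_ih(model_shape))
```

Under that assumption the checkpoint corresponds to a hidden size of 512 and the current model to 1300, so passing --hidden_size=512 when running evaluate.py may be worth trying before concluding that the checkpoint is out of date.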

Spatial Feats

Hi,
I see that you use 2D CNN features (1536-dim), 3D CNN features (1024-dim), and R-CNN features (2048-dim). I also see something called spatial features, with 5 dimensions. What are these features? I could not find them mentioned anywhere in the paper.

a bug report

Hi, Ganchao! Here is a bug report.
While debugging and reproducing the project, I found an error that might lead to inaccurate model reproduction or training results.
The att_size=1024 set on the command line has no effect. The reason is as follows:
Although the __init__ function of the SoftAttention and GumbelAttention classes takes an att_size parameter, intended to be opt.att_size (=1024), every instance is actually constructed with opt.hidden_size (=512 for the MSVD dataset, =1300 for the MSR-VTT dataset) as its att_size.
Related code lines:

def __init__(self, feat_size, hidden_size, att_size):

def __init__(self, feat_size, hidden_size, att_size):

self.spatial_attn = SoftAttention(opt.region_projected_size, opt.hidden_size, opt.hidden_size)

self.temp_attn = SoftAttention(feat_size, opt.hidden_size, opt.hidden_size)

self.spatial_attn = SoftAttention(region_feat_size, opt.hidden_size, opt.hidden_size)

self.relation_attn = SoftAttention(2*feat_size, opt.hidden_size, opt.hidden_size)

self.cell_attn = SoftAttention(opt.hidden_size, opt.hidden_size, opt.hidden_size)

self.module_attn = SoftAttention(opt.hidden_size, opt.hidden_size, opt.hidden_size)

self.module_attn = GumbelAttention(opt.hidden_size, opt.hidden_size, opt.hidden_size)

If the att_size parameter is meant to work as expected, the third argument, opt.hidden_size, in the code lines above should be replaced by opt.att_size. Is that right?
Thanks!
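
The proposed change can be sketched as follows. This is a minimal illustration, not the repository's code: the SoftAttention class here is a stand-in that only records its sizes, and opt is a plain namespace holding the MSVD values quoted in the report.

```python
from types import SimpleNamespace


class SoftAttention:
    """Stand-in that only records its sizes; the real class builds layers."""

    def __init__(self, feat_size, hidden_size, att_size):
        self.feat_size = feat_size
        self.hidden_size = hidden_size
        self.att_size = att_size


# Values quoted in the report for the MSVD dataset.
opt = SimpleNamespace(hidden_size=512, att_size=1024)

# Current wiring: the third argument ignores opt.att_size.
current = SoftAttention(opt.hidden_size, opt.hidden_size, opt.hidden_size)
# Proposed wiring: the third argument follows the command-line option.
proposed = SoftAttention(opt.hidden_size, opt.hidden_size, opt.att_size)

print(current.att_size, proposed.att_size)
```

Such a change would alter the shapes of the attention weights, so checkpoints trained before it would likely no longer load unless att_size equals hidden_size.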

POS

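Hello! I would like to ask how the POS (part-of-speech) tags here were obtained. I see that there are only three types: 0, 1, and 2. Which parts of speech do they represent? Thank you!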
