facebookresearch / video-nonlocal-net

Non-local Neural Networks for Video Classification

License: Other

CMake 0.58% C++ 31.80% Cuda 1.15% Python 62.08% Shell 4.39%

video-nonlocal-net's Issues

broadcast_computed_params=False

Sorry to trouble you again. I have two questions.
a. By default, broadcast_computed_params=True, but in this project (/lib/models/model_builder_video.py) it is set to broadcast_computed_params=False. What is the reason?

b. About the NCCL error (terminate called after throwing an instance of 'caffe2::EnforceNotMet' what(): [enforce fail at cuda_nccl_gpu.cc:40] status == ncclSuccess. 2 vs 0. Error at: ): if I set cfg.DEBUG=True, the call becomes
data_parallel_model.Parallelize_GPU(
    model,
    input_builder_fun=input_builder_fun,
    forward_pass_builder_fun=forward_pass_builder_fun,
    param_update_builder_fun=param_update_builder_fun,
    devices=gpus,
    rendezvous=rendezvous_ctx,
    broadcast_computed_params=False,
    optimize_gradient_memory=cfg.MODEL.MEMONGER,
    use_nccl=not cfg.DEBUG,  # org: True
)
with use_nccl=False, and the problem goes away.
I am worried about whether this affects performance.

Thank you in advance!

How does the test script sample an extremely short video?

The network requires a video length of 64 or 128 frames. However, according to my UCF-101 test log, all test videos are tested normally even though some of them are shorter than 64 frames. How do you handle those videos when testing?

deprecated pixel format used, make sure you did set range correctly [during training]

I am using the Kinetics dataset to train the non-local model. I get the following output during training:

[swscaler @ 0x7d8770033900] deprecated pixel format used, make sure you did set range correctly
| Train ETA: 2 days, 19:18:01 LR: 0.00500000 Iters [150/300000] [0.00ep] Time 0.761 Loss 5.3394 top1 96.875 top5 90.625
| Train ETA: 2 days, 18:39:42 LR: 0.00500000 Iters [160/300000] [0.00ep] Time 0.715 Loss 4.9686 top1 81.250 top5 71.875
| Train ETA: 2 days, 18:38:18 LR: 0.00500000 Iters [170/300000] [0.00ep] Time 0.674 Loss 5.0748 top1 96.875 top5 81.250
| Train ETA: 2 days, 18:20:39 LR: 0.00500000 Iters [180/300000] [0.00ep] Time 0.583 Loss 4.8989 top1 93.750 top5 87.500
| Train ETA: 2 days, 18:09:08 LR: 0.00500000 Iters [190/300000] [0.00ep] Time 0.633 Loss 5.4064 top1 100.000 top5 93.750
| Train ETA: 2 days, 17:54:54 LR: 0.00500000 Iters [200/300000] [0.00ep] Time 0.639 Loss 5.1758 top1 90.625 top5 81.250
| Train ETA: 2 days, 17:35:07 LR: 0.00500000 Iters [210/300000] [0.01ep] Time 0.684 Loss 5.2266 top1 93.750 top5 90.625
| Train ETA: 2 days, 17:14:03 LR: 0.00500000 Iters [220/300000] [0.01ep] Time 0.794 Loss 5.3617 top1 96.875 top5 87.500
| Train ETA: 2 days, 16:52:24 LR: 0.00500000 Iters [230/300000] [0.01ep] Time 0.945 Loss 5.0694 top1 93.750 top5 78.125
[swscaler @ 0x7d875802c5a0] deprecated pixel format used, make sure you did set range correctly

What does "deprecated pixel format used, make sure you did set range correctly" mean?
Does it mean something is wrong with my training videos?
Will this hurt accuracy?

Reshape op raise Exception when TEST.BATCH_SIZE is too small

E0523 09:38:47.741405 10627 net_dag.cc:188] Exception from operator chain starting at '' (type 'Conv'): caffe2::EnforceNotMet: [enforce fail at reshape_op.h:93] total_size % size == 0. Argument `shape` does not agree with the input data. (100352 vs 4096) Error from operator: 
input: "gpu_0/nonlocal_conv4_1_phi" output: "gpu_0/nonlocal_conv4_1_phi" output: "gpu_0/nonlocal_conv4_1_phi_shape5d" name: "" type: "Reshape" arg { name: "shape" ints: 8 ints: 512 ints: -1 } device_option { device_type: 1 cuda_gpu_id: 0 }
E0523 09:38:47.742872 10628 net_dag.cc:188] Secondary exception from operator chain starting at '' (type 'Conv'): caffe2::EnforceNotMet: [enforce fail at reshape_op.h:93] total_size % size == 0. Argument `shape` does not agree with the input data. (100352 vs 4096) Error from operator: 
input: "gpu_0/nonlocal_conv4_1_g" output: "gpu_0/nonlocal_conv4_1_g" output: "gpu_0/nonlocal_conv4_1_g_shape5d" name: "" type: "Reshape" arg { name: "shape" ints: 8 ints: 512 ints: -1 } device_option { device_type: 1 cuda_gpu_id: 0 }

I encountered the above error when running run_test_multicrop.sh with TEST.BATCH_SIZE 1 and NUM_GPUS 1.
The total_size is 100352 = 7^2 * 2^11, while 4096 = 2^12. So I tried changing total_size by setting batch_size or num_gpus to 2 (or any greater even number); either change makes the error disappear.
I wonder how the reshape param [8, 512, -1] is determined. Is there any way to reduce the test batch size?
BTW, the model was trained with batch_size 32 on 4 GPUs.
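
The failing check can be reproduced with plain arithmetic. This is a minimal sketch, assuming the Reshape op only requires the blob's element count to be divisible by the product of the fixed dims (8 * 512 from the `shape` argument in the log). `reshape_is_valid` is a hypothetical helper, not part of Caffe2.

```python
from functools import reduce
from operator import mul


def reshape_is_valid(total_size, shape):
    """Return True if `total_size` elements fit `shape` with one -1 wildcard."""
    # Product of the explicitly given dims (the -1 entry is inferred).
    fixed = reduce(mul, (d for d in shape if d != -1), 1)
    return total_size % fixed == 0


# Element count reported in the error log: 100352 = 7**2 * 2**11.
print(reshape_is_valid(100352, (8, 512, -1)))      # False: 100352 % 4096 == 2048
# Doubling the number of samples doubles the element count.
print(reshape_is_valid(2 * 100352, (8, 512, -1)))  # True: 200704 == 49 * 4096
```

This matches the observation that doubling the number of elements makes the error go away. It does not by itself explain where the leading 8 in the shape comes from.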

Building with the official Caffe2 merged to Pytorch

Hi @xiaolonw,

I built the non-local network with the official Caffe2, which has been merged into PyTorch.
When I run the training code, I get this error:

resnet_video_test
Traceback (most recent call last):
File "../tools/train_net_video.py", line 264, in
main()
File "../tools/train_net_video.py", line 260, in main
train(args)
File "../tools/train_net_video.py", line 101, in train
test_model, test_timer, test_meter = create_wrapper(is_train=False)
File "../tools/train_net_video.py", line 63, in create_wrapper
model.build_model()
File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 116, in build_model
train=self.train, force_fw_only=self.force_fw_only
File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 217, in create_data_parallel_model
use_nccl=not cfg.DEBUG, # org: True
File "/home/jinchoi/src/nl-caffe2_new/build/caffe2/python/data_parallel_model.py", line 34, in Parallelize_GPU
Parallelize(*args, **kwargs)
File "/home/jinchoi/src/nl-caffe2_new/build/caffe2/python/data_parallel_model.py", line 219, in Parallelize
input_builder_fun(model_helper_obj)
File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 194, in add_video_input
batch_size=batch_size,
File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 159, in AddVideoInput
data, label = model.net.CustomizedVideoInput(
File "/home/jinchoi/src/nl-caffe2_new/build/caffe2/python/core.py", line 2171, in getattr
",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method CustomizedVideoInput is not a registered operator. Did you mean: []
I think the custom video ops were not built, for a reason I don't know.

According to #3, I should set USE_FFMPEG to ON in caffe2/CMakeLists.txt. However, the new Caffe2 repository does not have the line 'option(USE_FFMPEG "Use ffmpeg" ON)' in caffe2/CMakeLists.txt; it only has this line in the CMakeLists.txt in the Caffe2 root directory.

Can you take a look and tell me how to deal with this issue?

Btw, your Caffe2 repository cannot be cloned because some dependencies have broken links, so I am using the official Caffe2 repository.

Method CustomizedVideoInput is not a registered operator.

Thank you for this outstanding work on nonlocal-net!
When I run this code, I get the following error.
Thank you in advance!

Traceback (most recent call last):
File "../tools/train_net_video.py", line 264, in
main()
File "../tools/train_net_video.py", line 260, in main
train(args)
File "../tools/train_net_video.py", line 101, in train
test_model, test_timer, test_meter = create_wrapper(is_train=False)
File "../tools/train_net_video.py", line 63, in create_wrapper
model.build_model()
File "/home/users/buyingjia/video-nonlocal-net-master/lib/models/model_builder_video.py", line 116, in build_model
train=self.train, force_fw_only=self.force_fw_only
File "/home/users/buyingjia/video-nonlocal-net-master/lib/models/model_builder_video.py", line 217, in create_data_parallel_model
use_nccl=not cfg.DEBUG, # org: True
File "/home/users/buyingjia/caffe2/build/caffe2/python/data_parallel_model.py", line 45, in Parallelize_GPU
Parallelize(*args, **kwargs)
File "/home/users/buyingjia/caffe2/build/caffe2/python/data_parallel_model.py", line 206, in Parallelize
input_builder_fun(model_helper_obj)
File "/home/users/buyingjia/video-nonlocal-net-master/lib/models/model_builder_video.py", line 194, in add_video_input
batch_size=batch_size,
File "/home/users/buyingjia/video-nonlocal-net-master/lib/models/model_builder_video.py", line 159, in AddVideoInput
data, label = model.net.CustomizedVideoInput(
File "/home/users/buyingjia/caffe2/build/caffe2/python/core.py", line 2044, in getattr
",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method CustomizedVideoInput is not a registered operator. Did you mean: []

Is there something I forgot to do?

i3d_baseline_8x8_IN_pretrain_300k.log

2018-02-17 16:11:35.414 [INFO: test_net.py: 204]: gt_labels 19761, sample_num 19761
2018-02-17 16:11:39.536 [INFO: test_net.py: 249]: Num of empty videos: 0
2018-02-17 16:11:39.536 [INFO: test_net.py: 250]: Num of corrupted videos: 0
2018-02-17 16:11:39.536 [INFO: test_net.py: 251]: Max num of clips in a video: 10
2018-02-17 16:11:39.536 [INFO: test_net.py: 252]: Min num of clips in a video: 10
2018-02-17 16:11:39.536 [INFO: test_net.py: 256]: Clip1 accuracy: 64.66 percent (12777/19761)
2018-02-17 16:11:39.536 [INFO: test_net.py: 260]: Clip accuracy: 64.43 percent (127330/197610)
2018-02-17 16:11:39.995 [INFO: test_net.py: 283]: --------------------------------------------------------------------------------
2018-02-17 16:11:39.995 [INFO: test_net.py: 284]: top-1 accuracy: 71.89 percent
2018-02-17 16:11:39.995 [INFO: test_net.py: 285]: top-5 accuracy: 90.29 percent
2018-02-17 16:11:39.996 [INFO: test_net.py: 286]: --------------------------------------------------------------------------------
2018-02-17 16:18:15.690 [INFO: test_net.py: 182]: Temporary file saved to: ./results_probs.pkl
2018-02-17 16:18:16.287 [INFO: train_net.py: 253]: 10-clip spatial fcn testing finished

In this log, the top-1 accuracy is 71.89%, but the table on the web page reports 73.2%:

script	input frames	freeze bn?	3D conv?	non-local?	top1	top5	model	logs
run_i3d_baseline_300k.sh	8	-	Yes	-	73.2	90.8	link	link

Am I missing something, or was 73.2% obtained with a different testing protocol?

Testing slower with model from run_i3d_nlnet_affine_400k.sh ?

I tested the same videos (39 video clips of about 10 s each) with two models on one GPU:

run_i3d_nlnet_affine_400k_128f.sh ---- model_1
run_i3d_nlnet_affine_400k.sh --- model_2

The shell commands are:

CUDA_VISIBLE_DEVICES=1 python /root/code/nonlocal/video-nonlocal-net2/tools/../tools/test_net_video.py --config_file /root/code/nonlocal/video-nonlocal-net2/tools/../configs/DBG_kinetics_resnet_8gpu_c2d_nonlocal_affine_400k.yaml NUM_GPUS 1 TRAIN.BATCH_SIZE 32 TEST.BATCH_SIZE 8 TEST.PARAMS_FILE /root/code/nonlocal/video-nonlocal-net2/tools/../data/checkpoints/run_i3d_nlnet_affine_400k_128f/checkpoints/i3d_nonlocal_128x1_I3D_pretrain_400k.pkl VIDEO_DECODER_THREADS 5 NONLOCAL.CONV3_NONLOCAL True NONLOCAL.CONV4_NONLOCAL True TEST.VIDEO_LENGTH 128 TEST.SAMPLE_RATE 1 MODEL.MODEL_NAME resnet_video_org MODEL.VIDEO_ARC_CHOICE 2 TRAIN.DROPOUT_RATE 0.5 CHECKPOINT.DIR /root/code/nonlocal/video-nonlocal-net2/demo_videos/demo_result/471750300__3200_3600/./checkpoints/ DATADIR /root/code/nonlocal/video-nonlocal-net2/tools/../data/lmdb/kinetics_lmdb_multicrop/ FILENAME_GT /root/code/nonlocal/video-nonlocal-net2/tools/../process_data/kinetics/testlist.txt TEST.TEST_FULLY_CONV True TEST.DATASET_SIZE 39 ----- for model_1
CUDA_VISIBLE_DEVICES=1 python /root/code/nonlocal/video-nonlocal-net2/tools/../tools/test_net_video.py --config_file /root/code/nonlocal/video-nonlocal-net2/tools/../configs/DBG_kinetics_resnet_8gpu_c2d_nonlocal_affine_400k.yaml NUM_GPUS 1 TRAIN.BATCH_SIZE 6 TEST.BATCH_SIZE 6 TEST.PARAMS_FILE /root/code/nonlocal/video-nonlocal-net2/tools/../data/checkpoints/run_i3d_nlnet_affine_400k/checkpoints/i3d_nonlocal_32x4_I3D_pretrain_400k.pkl VIDEO_DECODER_THREADS 5 NONLOCAL.CONV3_NONLOCAL True NONLOCAL.CONV4_NONLOCAL True MODEL.VIDEO_ARC_CHOICE 2 TRAIN.DROPOUT_RATE 0.5 CHECKPOINT.DIR /root/code/nonlocal/video-nonlocal-net2/demo_videos/demo_result/471750300__3200_3600/./checkpoints/ DATADIR /root/code/nonlocal/video-nonlocal-net2/tools/../data/lmdb/kinetics_lmdb_multicrop/ FILENAME_GT /root/code/nonlocal/video-nonlocal-net2/tools/../process_data/kinetics/testlist.txt TEST.TEST_FULLY_CONV True TEST.DATASET_SIZE 39 --- for model_2

Why does model_1 take less GPU memory than model_2, even though the test batch size is 8 for model_1 and 6 for model_2?
7701MiB for model_1 during testing
22919MiB for model_2 during testing
Why is model_1 much faster than model_2 in total processing time?

Is there anything that I am missing?

Questions about the BN layer and affine layers

In the readme file:
To train the i3d Non-local Networks with longer clips (32-frame input), we first need to obtain the model trained from "run_i3d_baseline_400k.sh" as a pre-trained model. Then we convert the Batch Normalization layers into Affine layers by running: python modify_caffe2_ftvideo.py xxxx

What is an affine layer? Is it a conv layer without batch normalization? Judging by GPU memory usage, does the model with affine layers have more parameters than the one with BN layers? Is there any other operation in an affine layer?
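
For context, a Batch Normalization layer with frozen statistics (fixed mean and variance) is mathematically a per-channel scale and shift. The NumPy sketch below shows that equivalence. It is an illustration under that assumption, not the repo's `modify_caffe2_ftvideo.py`, and all function names are hypothetical.

```python
import numpy as np


def bn_inference(x, gamma, beta, mean, var, eps=1e-5):
    """Batch norm with fixed statistics, applied per channel (axis 1)."""
    shape = (1, -1) + (1,) * (x.ndim - 2)
    normalized = (x - mean.reshape(shape)) / np.sqrt(var.reshape(shape) + eps)
    return gamma.reshape(shape) * normalized + beta.reshape(shape)


def bn_to_affine(gamma, beta, mean, var, eps=1e-5):
    """Fold fixed BN statistics into one scale and one bias per channel."""
    scale = gamma / np.sqrt(var + eps)
    bias = beta - mean * scale
    return scale, bias


def affine(x, scale, bias):
    """Per-channel y = x * scale + bias."""
    shape = (1, -1) + (1,) * (x.ndim - 2)
    return x * scale.reshape(shape) + bias.reshape(shape)


rng = np.random.default_rng(0)
x = rng.normal(size=(2, 4, 3, 5, 5))  # (N, C, T, H, W)
gamma, beta = rng.normal(size=4), rng.normal(size=4)
mean, var = rng.normal(size=4), rng.uniform(0.5, 2.0, size=4)

scale, bias = bn_to_affine(gamma, beta, mean, var)
same = np.allclose(bn_inference(x, gamma, beta, mean, var), affine(x, scale, bias))
print(same)
```

Under this view an affine layer keeps two values per channel (scale and bias) instead of the four that BN stores, and it computes no batch statistics.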

Question about the COMPUTE_PRECISE_BN?

Hello,
in the update_bn_stats_gpu function,
workspace.FeedBlob(
    'gpu{}/'.format(i) + bn_layer + '_bn_rm',
    np.array(self._meanX_dict[bn_layer], dtype=np.float32),
)

the mean of X over 200 * batch_size * num_gpu training samples is computed and then written over the bn_layer + '_bn_rm' blob.
Why not use the running mean accumulated during training?
Why is the mean computed under the COMPUTE_PRECISE_BN switch more precise?
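
One way to see the difference between the two estimates: a running mean kept during training is typically an exponential moving average, which weights recent batches most and mixes statistics from different weight snapshots, while a recomputed mean averages many batches equally with fixed weights. Below is a toy sketch, assuming an EMA update with momentum 0.9. The update rule and the function names are assumptions, not taken from the repo.

```python
import numpy as np


def ema_mean(batch_means, momentum=0.9):
    """Exponential moving average of per-batch means, starting from zero."""
    running = np.zeros_like(batch_means[0])
    for m in batch_means:
        running = momentum * running + (1.0 - momentum) * m
    return running


def averaged_mean(batch_means):
    """Plain average of per-batch means, every batch weighted equally."""
    return np.mean(np.stack(batch_means), axis=0)


# Toy stream: 100 batches with mean 0, then 100 batches with mean 1.
batch_means = [np.array([0.0])] * 100 + [np.array([1.0])] * 100

ema = ema_mean(batch_means)
avg = averaged_mean(batch_means)
print(ema, avg)  # EMA is close to 1 (recent batches dominate); average is 0.5
```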

Can't find affine_model_400k.pkl

Hi,
I ran into a problem when trying to train the model i3d_nlnet_affine_400k. You mentioned that a converted pretrained model is provided in pretrained_model.tar.gz, but I didn't find it. The only model I can use is affine_model_30k.pkl in run_i3d_baseline.

Can I use this converted model to train i3d_nlnet_affine_400k?
Or would it work to download your i3d_baseline model and convert it into affine_400k.pkl?

Thanks!

Error when running run_test.sh

Hi Xiaolong,

Since "git clone --recursive https://github.com/xiaolonw/caffe2" failed, I followed your instructions and replaced the ./caffe2/video files in the official caffe2 before installing. When running test_net_multicrop.sh, I get the following error:

[INFO: checkpoints.py:  247]: res5_2_branch2c_bn_rm loaded from weights file into: gpu_0/res5_2_branch2c_bn_rm (2048,)
[INFO: checkpoints.py:  247]: res5_2_branch2c_bn_riv loaded from weights file into: gpu_0/res5_2_branch2c_bn_riv (2048,)
*** Aborted at 1525166815 (unix time) try "date -d @1525166815" if you are using GNU date ***
PC: @     0x7f7d8b793bd9 caffe2::CustomizedVideoInputOp<>::DecodeAndTransform()
*** SIGSEGV (@0x48) received by PID 25821 (TID 0x7f7d3eb9e700) from PID 72; stack trace: ***
    @     0x7f7d9bffa4b0 (unknown)
    @     0x7f7d8b793bd9 caffe2::CustomizedVideoInputOp<>::DecodeAndTransform()
    @     0x7f7d8b790c23 std::_Function_handler<>::_M_invoke()
    @     0x7f7d8b744d0b caffe2::TaskThreadPool::main_loop()
    @     0x7f7d9076cc5c execute_native_thread_routine_compat
    @     0x7f7d9caa66ba start_thread
    @     0x7f7d9c0cc41d clone
    @                0x0 (unknown)
Segmentation fault (core dumped)

It seems to be an error caused by CustomizedVideoInput. How can I fix it? Thanks so much!

cpuinfo undefined reference

I get errors like the ones below:
../lib/libcaffe2.so: undefined reference to 'cpuinfo_deinitialize'
../lib/libcaffe2.so: undefined reference to 'cpuinfo_get_l4_cache'
../lib/libcaffe2_gpu.so: undefined reference to 'google::FlagRegisterer::FlagRegisterer<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >(char const*, char const*, char const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)'
../lib/libcaffe2.so: undefined reference to 'cpuinfo_get_l2_cache'
../lib/libcaffe2.so: undefined reference to 'cpuinfo_get_l1d_cache'
../lib/libcaffe2.so: undefined reference to 'cpuinfo_initialize'

Do you know how to solve it? I found that there is no cpuinfo submodule in third_party. After I installed cpuinfo, the error still occurs.

lost accuracy

Hi! I added your non-local module to Detectron, but the accuracy is lower than the baseline. I only added non-local blocks to the backbone. What might I be overlooking?

The influence of the CONV_INIT_STD and BN_MOMENTUM

Hi, I find that there are two interesting parameters: CONV_INIT_STD and BN_MOMENTUM. The default choices are channel-dependent and 0.1, respectively.

But you set them to fixed values, such as 0.1 and 0.9.

Very interesting. It would be great if you could share the influence of these parameters.

caffe2:This caffe2 python run does not have GPU support. Will run in CPU only mode

When I run: python2 -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
output: Success
But when I run: python -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'
output:
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: No module named caffe2_pybind11_state_gpu
0

When I installed caffe2, I already ran: conda install cudnn=6.0.21=cuda8.0_0. Running it again now prints: # All requested packages already installed.

What should I do? I want to use the GPU. Thanks.

Question about testing settings

In the scripts/run_test_multicrop.sh file, I find TRAIN.BATCH_SIZE. Is this a parameter related to training epochs? Why should we set this value during testing?

cfg.TRAIN.COMPUTE_PRECISE_BN

I find that when saving a checkpoint, the code calls "compute_and_update_bn_stats", which computes BN statistics over a sufficiently large number of training samples. This yields a "PRECISE_BN", which is better than statistics from 8 training samples (8 samples per GPU with 8 GPUs). Does this operation give better performance?

In my own Caffe code, performance with 16 samples per GPU is about 1.5% better than with 8 samples per GPU.

Does your code use whole raw videos instead of video segments from the annotation file when training?

Traditionally, the training input consists of segments cropped from the raw videos according to the annotation file, which gives the start and end time of each segment. The training input then samples clips from these segments. However, I cannot find where your code does this. Does your code simply sample clips from the whole raw video, without any cropping?
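
The two options in the question can be sketched with one helper. This is only an illustration of the idea, assuming a clip is `video_length` frames taken every `sample_rate` frames from a random start. `sample_clip_indices` and the frame ranges are hypothetical and are not taken from the repo's data loader.

```python
import random


def sample_clip_indices(first, last, video_length, sample_rate, rng):
    """Pick `video_length` frame indices, `sample_rate` apart, inside [first, last]."""
    span = (video_length - 1) * sample_rate + 1  # raw frames covered by one clip
    if last - first + 1 < span:
        raise ValueError("range is shorter than one clip")
    start = rng.randint(first, last - span + 1)  # inclusive on both ends
    return [start + i * sample_rate for i in range(video_length)]


rng = random.Random(0)
# Sampling from the whole raw video (frames 0..299).
whole = sample_clip_indices(0, 299, video_length=32, sample_rate=2, rng=rng)
# Sampling only inside an annotated segment (frames 100..199).
segment = sample_clip_indices(100, 199, video_length=32, sample_rate=2, rng=rng)
print(whole[0], whole[-1], segment[0], segment[-1])
```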

cmake errors

Hi, I was trying to follow Xiaolong's installation example. However, I got the following error messages when running "cmake -DCMAKE_INSTALL_PREFIX:PATH=/path/to/caffe2/build/install ..":

cmake: /my/path/to/anaconda2/envs/caffe2/bin/../lib/libstdc++.so.6: version 'GLIBCXX_3.4.20' not found (requested by cmake)
cmake: /my/path/to/anaconda2/envs/caffe2/bin/../lib/libstdc++.so.6: version 'CXXABI_1.3.9' not found (requested by cmake)
cmake: /my/path/to/anaconda2/envs/caffe2/bin/../lib/libstdc++.so.6: version 'GLIBCXX_3.4.21' not found (requested by cmake)

I am wondering why this happens. Thank you.

run_test_multicrop.sh ERROR

Hi Xiaolong,
Sorry to interrupt you again!
I ran into a new problem when using run_test_multicrop.sh:

for test_iter in range(total_test_net_iters):
    # timer.tic()
    workspace.RunNet(test_model.net.Proto().name)
    # timer.toc()

*** Aborted at 1525368135 (unix time) try "date -d @1525368135" if you are using GNU date ***
PC: @ 0x7f91145255ef caffe2::CustomizedVideoInputOp<>::GetClipAndLabelFromDBValue()
*** SIGSEGV (@0x40) received by PID 28 (TID 0x7f9087fff700) from PID 64; stack trace: ***
@ 0x7f916f8654b0 (unknown)
@ 0x7f91145255ef caffe2::CustomizedVideoInputOp<>::GetClipAndLabelFromDBValue()
@ 0x7f9114525d20 caffe2::CustomizedVideoInputOp<>::DecodeAndTransform()
@ 0x7f9114522ef3 std::_Function_handler<>::_M_invoke()
@ 0x7f91144fbb2b caffe2::TaskThreadPool::main_loop()
@ 0x7f90db4e4c80 (unknown)
@ 0x7f916fc016ba start_thread
@ 0x7f916f93741d clone
@ 0x0 (unknown)
Segmentation fault (core dumped)

Looking forward to your reply!

Code for object detection?

Hi, the original paper tested a Mask R-CNN with a non-local block (by adding one non-local block right before the last residual block of res4).

I want to use your non-local block for object detection along with Mask R-CNN. Do you have code for this? If yes, can you please upload it?

Failed to clone 'third_party/eigen'

I ran into this problem: Failed to clone 'third_party/eigen'. Can anyone help me?

Caffe2 installation error

Hi @xiaolonw,

I cannot clone your Caffe2 repo recursively.
When I type
git clone --recursive https://github.com/xiaolonw/caffe2

It shows me
Clone of 'https://github.com/RLovelett/eigen.git' into submodule path 'third_party/eigen' failed

So I modified the .gitmodules file to set the eigen url to
https://github.com/eigenteam/eigen-git-mirror.git

That fixes the eigen problem. However, when I try to update the submodules with "git submodule update --init --recursive", I get this:
Unable to checkout 'c80d6f7a924b53942b569b45278517565ea43d82' in submodule path 'third_party/aten'

Can you take a look?

Is there a Docker image?

Thanks for your contribution to open source!

I have been having trouble building Caffe2 and the NL net for a long time. Could you please provide a prebuilt Docker image, as TSN does?

Best wishes!

Segmentation fault when testing while training

The buggy line total_test_iters = int(math.ceil(cfg.TEST.DATASET_SIZE / float(cfg.TEST.BATCH_SIZE))) should be changed to total_test_iters = int(math.floor(cfg.TEST.DATASET_SIZE / float(cfg.TEST.BATCH_SIZE))). With the original code, the lmdb reader_ reads past the end of the test data list.
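
A small worked example of the rounding difference, using illustrative values (39 test samples, batch size 8) rather than the repo's config. `num_test_iters` is a hypothetical helper that mirrors the two expressions above.

```python
import math


def num_test_iters(dataset_size, batch_size, round_up):
    """Number of test iterations under ceil (round_up=True) or floor rounding."""
    ratio = dataset_size / float(batch_size)
    return int(math.ceil(ratio)) if round_up else int(math.floor(ratio))


dataset_size, batch_size = 39, 8  # example values, not from the repo config

ceil_iters = num_test_iters(dataset_size, batch_size, round_up=True)
floor_iters = num_test_iters(dataset_size, batch_size, round_up=False)

print(ceil_iters * batch_size)   # 40 samples requested, one more than exist
print(floor_iters * batch_size)  # 32 samples requested, 7 are never tested
```

The trade-off is visible here: floor avoids reading past the end of the list, but it drops the trailing samples whenever the dataset size is not a multiple of the batch size.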

Is there some change after publishing?

I find that the total temporal stride is 4 in the paper, but 2 in this repo.
I also find that the paper uses a temporal kernel size of 3, but this repo uses 2.

Is this correct? Were there changes after publication?

Question about the data in Non-local block

Hi everyone,
This is the code of non-local block without sum op.

# 3d spacetime nonlocal (v1: spatial downsample)
def spacetime_nonlocal(
        model, blob_in, dim_in, dim_out, batch_size, prefix, dim_inner,
        is_test, max_pool_stride=2):
    # ---------------------
    cur = blob_in
    # we do projection to convert each spacetime location to a feature
    # theta original size
    # e.g., (8, 1024, 4, 14, 14) => (8, 1024, 4, 14, 14)

    theta = model.ConvNd(
        cur, prefix + '_theta',
        dim_in,
        dim_inner,
        [1, 1, 1],
        strides=[1, 1, 1],
        pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD}),
        bias_init=('ConstantFill', {'value': 0.}), no_bias=cfg.NONLOCAL.NO_BIAS)

    # phi and g: half spatial size
    # e.g., (8, 1024, 4, 14, 14) => (8, 1024, 4, 7, 7)
    if cfg.NONLOCAL.USE_MAXPOOL is True:
        max_pool = model.MaxPool(
            cur, prefix + '_pool',
            kernels=[1, max_pool_stride, max_pool_stride],
            strides=[1, max_pool_stride, max_pool_stride],
            pads=[0, 0, 0] * 2,
        )
    else:
        max_pool = cur

    phi = model.ConvNd(
        max_pool, prefix + '_phi',
        dim_in,
        dim_inner,
        [1, 1, 1],
        strides=[1, 1, 1],
        pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD}),
        bias_init=('ConstantFill', {'value': 0.}), no_bias=cfg.NONLOCAL.NO_BIAS)

    g = model.ConvNd(
        max_pool, prefix + '_g',
        dim_in,
        dim_inner,
        [1, 1, 1],
        strides=[1, 1, 1],
        pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD}),
        bias_init=('ConstantFill', {'value': 0.}), no_bias=cfg.NONLOCAL.NO_BIAS)

    # we have to use explicit batch size (to support arbitrary spacetime size)
    # e.g., (8, 1024, 4, 14, 14) => (8, 1024, 784)
    theta, theta_shape_5d = model.Reshape(
        theta, [theta + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else theta,
            theta + '_shape5d'],
        shape=(batch_size, dim_inner, -1))
    phi, phi_shape_5d = model.Reshape(
        phi, [phi + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else phi,
            phi + '_shape5d'],
        shape=(batch_size, dim_inner, -1))
    g, g_shape_5d = model.Reshape(
        g, [g + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else g,
            g + '_shape5d'],
        shape=(batch_size, dim_inner, -1))

    # e.g., (8, 1024, 784) * (8, 1024, 784) => (8, 784, 784)
    theta_phi = model.net.BatchMatMul([theta, phi], prefix + '_affinity', trans_a=1)
    if cfg.NONLOCAL.USE_SOFTMAX is True:
        if cfg.NONLOCAL.USE_SCALE is True:
            theta_phi_sc = model.Scale(theta_phi, theta_phi, scale=dim_inner**-.5)
        else:
            theta_phi_sc = theta_phi
        # softmax
        # sum(p[i, j, :]) == 1, for any i, j
        p = model.Softmax(theta_phi_sc, theta_phi + '_prob', engine='CUDNN', axis=2)
    else:
        ones = model.net.ConstantFill([theta_phi], [theta_phi + '_ones'], value=1.)
        ones = model.net.ReduceBackSum([ones], [theta_phi + '_const'])

        zeros = model.net.ConstantFill([theta_phi], [theta_phi + '_zeros'], value=0.)
        denom = model.net.Add(
            [zeros, ones], [theta_phi + '_denom'], broadcast=1, axis=0)

        model.StopGradient(denom, denom)
        p = model.net.Div([theta_phi, denom], [theta_phi + '_sc'])

    # note: g's axis[2] corresponds to p's axis[2]
    # e.g., g(8, 1024, 784_2) * p(8, 784_1, 784_2) => (8, 1024, 784_1)
    t = model.net.BatchMatMul([g, p], prefix + '_y', trans_b=1)

    # reshape back:
    # e.g., (8, 1024, 784) => (8, 1024, 4, 14, 14)
    t_re, t_shape = model.Reshape(
        [t, theta_shape_5d],
        [t + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else t,
            t + '_shape3d'])
    blob_out = t_re

    blob_out = model.ConvNd(
        blob_out, prefix + '_out',
        dim_inner,
        dim_out,
        [1, 1, 1],
        strides=[1, 1, 1],
        pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD})
        if not cfg.NONLOCAL.USE_ZERO_INIT_CONV else
        ('ConstantFill', {'value': 0.}),
        bias_init=('ConstantFill', {'value': 0.}), no_bias=cfg.NONLOCAL.NO_BIAS)

    if cfg.NONLOCAL.USE_BN is True:
        blob_out = model.SpatialBN(
            blob_out, prefix + "_bn", dim_out,
            epsilon=cfg.NONLOCAL.BN_EPSILON, momentum=cfg.NONLOCAL.BN_MOMENTUM,
            is_test=is_test
        )
        model.param_init_net.ConstantFill(
            [prefix + "_bn_s"], prefix + "_bn_s", value=cfg.NONLOCAL.BN_INIT_GAMMA)

    if cfg.NONLOCAL.USE_AFFINE is True:
        blob_out = model.AffineNd(blob_out, prefix + "_bn", dim_out)

    return blob_out


def add_nonlocal(model, blob_in, dim_in, dim_out, batch_size, prefix, dim_inner):
    is_test = model.split in ['test', 'val']
    blob_out = spacetime_nonlocal(
        model, blob_in, dim_in, dim_out, batch_size, prefix, dim_inner, is_test)
    blob_out = model.net.Sum([blob_in, blob_out], prefix + "_sum")
    return blob_out

the matmul op
Theta is (8, 1024, 4, 14, 14). In the cfg file, max pool is enabled, so phi is (8, 1024, 4, 7, 7) and g is (8, 1024, 4, 7, 7).
Theta is then reshaped to (8, 1024, 784), while phi and g are reshaped to (8, 1024, 196).
It seems they cannot be multiplied by the matmul op.

Moreover, the Caffe2 documentation says:

Batch Matrix multiplication Yi = Ai * Bi, where A has shape (dim0, dim1, … M, K), B has shape (dim0, dim1, … K, N), Y has shape (dim0, dim1, … M, N) and i ranges from 0 to (dim0 * dim1 …) - 1. rank(A) == rank(B) >= 2. In case of A and B being two dimensional, it behaves like normal matrix multiplication.

Even if theta, phi, and g are all (8, 1024, 784), it seems they cannot be multiplied either, because theta has M = 1024, K = 784, while phi has K = 1024, N = 784.
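
The shapes can be checked with NumPy. This is a sketch, assuming `trans_a=1` transposes the last two axes of the first operand and `trans_b=1` those of the second before the batched product, which is my reading of the `BatchMatMul` arguments used in the code above. Small batch and channel counts stand in for 8 and 1024 to keep the arrays tiny.

```python
import numpy as np

batch, channels = 2, 16        # stand-ins for 8 and 1024
n_theta = 4 * 14 * 14          # 784 positions for theta
n_pool = 4 * 7 * 7             # 196 positions for phi and g after 2x2 max pooling

theta = np.ones((batch, channels, n_theta))
phi = np.ones((batch, channels, n_pool))
g = np.ones((batch, channels, n_pool))

# BatchMatMul([theta, phi], trans_a=1): (B, 784, C) x (B, C, 196) -> (B, 784, 196)
affinity = np.matmul(np.swapaxes(theta, 1, 2), phi)

# BatchMatMul([g, p], trans_b=1): (B, C, 196) x (B, 196, 784) -> (B, C, 784)
y = np.matmul(g, np.swapaxes(affinity, 1, 2))

print(affinity.shape, y.shape)  # (2, 784, 196) (2, 16, 784)
```

With the transposes applied, the inner dimensions agree in both products, and the output has 784 positions again, so it can be reshaped back to theta's 5D shape.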

I'm working with TensorFlow, so I need to know every detail of the Non-local block source code in order to write an equivalent block in TensorFlow.

channels' num
In the paper, the input has 1024 channels, and theta, phi, and g have 512. Why do they all have 1024 in the source code? Does this notably improve performance?
The paper does not mention AffineNd. Although the AffineNd op is disabled in the cfg file, I would like to know the motivation for adding the AffineNd layer.

ffmpeg error with the new caffe2_customized_ops module when training with UCF101 data

Dear Xiaolonw,

Hi, this is excellent work!

I am using UCF101 as training and testing data. After compiling and installing caffe2 with the new video module from the caffe2_customized_ops folder, I can run the code. However, I hit an error before training started. Here is the command-line output:
[INFO: checkpoints.py: 128]: No checkpoint found; training from scratch...
[INFO: train_net_video.py: 127]: ------------- Training model... -------------
[INFO: metrics.py: 57]: Resetting train metrics...
[swscaler @ 0x7ea2c0021be0] (null) is not supported as input pixel format
[swscaler @ 0x7ea2b4030a20] (null) is not supported as input pixel format
[IMGUTILS @ 0x7fa2d33e6bc0] Picture size 1x0 is invalid
[IMGUTILS @ 0x7fa2d33e6bc0] Picture size 1x0 is invalid
*** Aborted at 1530171970 (unix time) try "date -d @1530171970" if you are using GNU date ***
[IMGUTILS @ 0x7fa2d1fd4bc0] Picture size 1x0 is invalid
[IMGUTILS @ 0x7fa2d1fd4bc0] Picture size 1x0 is invalid
PC: @ 0x7fa30d0bfd1a sws_scale
*** SIGSEGV (@0x40) received by PID 40756 (TID 0x7fa2d33e9700) from PID 64; stack trace: ***
@ 0x7fa389b72390 (unknown)
@ 0x7fa30d0bfd1a sws_scale
@ 0x7fa3793eece0 caffe2::CustomVideoDecoder::decodeLoop()
@ 0x7fa3793f058e caffe2::CustomVideoDecoder::decodeFile()
@ 0x7fa3793fb187 caffe2::DecodeClipFromVideoFileFlex()
@ 0x7fa342c206fc caffe2::CustomizedVideoInputOp<>::GetClipAndLabelFromDBValue()
@ 0x7fa342c20b70 caffe2::CustomizedVideoInputOp<>::DecodeAndTransform()
@ 0x7fa342c1d8b3 std::_Function_handler<>::_M_invoke()
@ 0x7fa342bf02eb caffe2::TaskThreadPool::main_loop()
@ 0x7fa3124c18f0 (unknown)
@ 0x7fa389b686ba start_thread
@ 0x7fa38918e41d clone
Segmentation fault (core dumped)

I have checked the lmdb path, and it is correct. The error happens in this line.

More details of my experiment:

I prepare the data just as mentioned in DATASET.md, here are the steps:
(1) divide the data into train set(70%) and test set(30%)
(2) shuffle train set
(3) create lmdb with create_video_lmdb.py
I use only 1 GPU and modified the NUM_GPUS variable to 1 in configs/DBG_kinetics_resnet_8gpu_c2d_nonlocal_affine_400k.yaml.
I did not use the pre-trained model, so it trains from scratch.
My script for running the program is as follows:
CHECKPOINT_DIR=../data/checkpoints/run_i3d_nlnet_affine_400k_128f
mkdir ${CHECKPOINT_DIR}
python ../tools/train_net_video.py \
--config_file ../configs/DBG_kinetics_resnet_8gpu_c2d_nonlocal_affine_400k.yaml \
VIDEO_DECODER_THREADS 2 \
NONLOCAL.CONV3_NONLOCAL True \
NONLOCAL.CONV4_NONLOCAL True \
TRAIN.VIDEO_LENGTH 128 \
TRAIN.SAMPLE_RATE 1 \
TEST.VIDEO_LENGTH 128 \
TEST.SAMPLE_RATE 1 \
MODEL.MODEL_NAME resnet_video_org \
MODEL.VIDEO_ARC_CHOICE 2 \
TRAIN.DROPOUT_RATE 0.5 \
CHECKPOINT.DIR ${CHECKPOINT_DIR} \
DATADIR /home/lyj/video-nonlocal-net-master/data/lmdb/kinetics_lmdb_multicrop/ \
FILENAME_GT /home/lyj/vclf/data/ucfTrainTestlist/nltestlist.txt \
2>&1 | tee ${CHECKPOINT_DIR}/log.txt

This error may be caused by one of the following: (1) the UCF101 dataset is not suitable for the code as-is and needs to be processed before training, or (2) Caffe2 is not installed correctly.
If you need any data or details of my experiment, feel free to ask. I hope to hear from you. Thanks!

training with 4gpus with non-local

I have trained a model using run_i3d_baseline_300k_4gpu.sh on the Kinetics dataset, following the guide in README.md, and I get the same top-1 and top-5 accuracy as reported there. I now want to train an I3D model with non-local blocks using 4 GPUs, but README.md reports no top-1 or top-5 accuracy for that setting. Is it possible to train this kind of model and get higher accuracy than the baseline? Why was this model not released?

error when training: status == ncclSuccess. 2 vs 0

System: CentOS

I prepared the LMDB files for the training and validation sets. Then I ran

sh run_i3d_baseline_300k_4gpu.sh
and got this error:

[E net_dag.cc:188] Exception from operator chain starting at '' (type 'NCCLAllreduce'): caffe2::EnforceNotMet: [enforce fail at cuda_nccl_gpu.cc:24] status == ncclSuccess. 2 vs 0. Error at: /root/code/official_caffe2/caffe2/caffe2/contrib/nccl/cuda_nccl_gpu.cc24: system error Error from operator:
input: "gpu_0/pred_b_grad" input: "gpu_1/pred_b_grad" input: "gpu_2/pred_b_grad" input: "gpu_3/pred_b_grad" output: "gpu_0/pred_b_grad" output: "gpu_1/pred_b_grad" output: "gpu_2/pred_b_grad" output: "gpu_3/pred_b_grad" name: "" type: "NCCLAllreduce" device_option { device_type: 1 cuda_gpu_id: 0 }

What should I do?

Low training speed

I tried to run your script "run_i3d_baseline_300k_4gpu.sh", but it takes more than 5 days instead of your 3 days.

why scale operator in non_local operator

When building the non_local operator, I found that both USE_SOFTMAX and USE_SCALE are True. Could you please explain why model.Scale is applied in the script? Also, where can I find the source code of the operator, and why is scale=dim_inner**-.5?
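
For context, a factor of dim_inner**-0.5 matches the scaling used in scaled dot-product attention, where it keeps the dot products from growing with the embedding size before the softmax. Whether that is the authors' motivation here is an assumption. The sketch below uses hypothetical names (attention_weights, theta, phi) and random data, and only shows how the scale changes the sharpness of the softmax.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one list of logits."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def attention_weights(theta, phi, use_scale=True):
    """Softmax over pairwise dot products, optionally scaled by dim_inner**-0.5."""
    dim_inner = len(theta[0])
    scale = dim_inner ** -0.5 if use_scale else 1.0
    return [softmax([scale * dot(q, k) for k in phi]) for q in theta]


rng = random.Random(0)
dim_inner = 512  # illustrative value, not read from the config
theta = [[rng.gauss(0, 1) for _ in range(dim_inner)] for _ in range(4)]
phi = [[rng.gauss(0, 1) for _ in range(dim_inner)] for _ in range(16)]

scaled = attention_weights(theta, phi, use_scale=True)
unscaled = attention_weights(theta, phi, use_scale=False)

# Largest weight in the first row: the unscaled softmax is at least as peaked.
print(max(scaled[0]), max(unscaled[0]))
```

With unit-variance inputs the unscaled dot products have a standard deviation of about sqrt(dim_inner), so the softmax tends to collapse onto a single position; the scale brings the logits back to roughly unit range.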

Is epoch proceeding calculated by a set value in the logs?

When fine-tuning, the epoch progress seems to be calculated from the Kinetics dataset size. I can't find the training set size in config.py.

Do you use multi-head?

I am wondering whether you use the multi-head method to improve the performance.

The original self-attention uses the multi-head method.

Do you use the positional encoding?

It seems that both the original self-attention and the related work (Relation Networks for Object Detection) use "positional encoding" to help learn a better affinity matrix.

I am wondering whether you have tried using positional features. If so, it would be great if you could share the performance with them.

Caffe2 installation error

Hi, when I install Caffe2 with "make -j16 install", I get this error message:

/usr/bin/ld: /usr/local/lib/libavcodec.a(allcodecs.o): relocation R_X86_64_32 against 'ff_h264_cuvid_hwaccel' can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/libavcodec.a: could not read symbols: Bad value

I guess it is caused by gflags, but I don't know how to fix it exactly. Do you have any suggestions? Thank you.

How to organize files/directories of a deep learning project

Dear,
I'm studying deep learning and looking for a guideline that can help me organize the files and directories of a deep learning project. Unfortunately, I have not found one. I am submitting this issue to ask for such a guideline, or for some good deep learning projects written in Python.
Thanks.

PS. The link in the CODE_OF_CONDUCT should be updated.

Have you seen this error under cmake?

[ 85%] Linking CXX shared module python/caffe2_pybind11_state_gpu.so
[ 85%] Built target caffe2_pybind11_state_gpu
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2#

this is my cmake-output:
cmake_output.txt

Line 140 of /build/Makefile is:

can't find -lnnpack

I have installed NNPACK using ninja, but when I make Caffe2, I get an error:
/usr/bin/ld: cannot find -lnnpack
/usr/bin/ld: cannot find -lpthreadpool
collect2: error: ld returned 1 exit status
caffe2/CMakeFiles/caffe2.dir/build.make:6645: recipe for target 'lib/libcaffe2.so' failed
make[2]: *** [lib/libcaffe2.so] Error 1
CMakeFiles/Makefile2:1138: recipe for target 'caffe2/CMakeFiles/caffe2.dir/all' failed
make[1]: *** [caffe2/CMakeFiles/caffe2.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2

Method CustomizedVideoInput is not a registered operator. Did you mean: []

Hi, I've installed PyTorch (Caffe2) and executed the command 'cp -r caffe2-video-nlnet/caffe2_customized_ops/video pytorch/caffe2/'.
When I run bash run_c2d_baseline_400k_32f.sh,
I get:
AttributeError: Method CustomizedVideoInput is not a registered operator. Did you mean: []
Can anyone help me?

after “cmake -DCMAKE_INSTALL_PREFIX:PATH=/path/to/caffe2/build/install ..”

After that, there is still no install file with which to run make -j16 install.

So I checked "/home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeError.log".
I would like to know how to solve this. Thanks.
Determining if the pthread_create exist failed with the following output:
Change Dir: /home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp

Run Build Command:"/usr/bin/make" "cmTC_d9137/fast"
/usr/bin/make -f CMakeFiles/cmTC_d9137.dir/build.make CMakeFiles/cmTC_d9137.dir/build
make[1]: Entering directory '/home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_d9137.dir/CheckSymbolExists.c.o
/home/kong.ye/install_pakage/yes/envs/caffe2/bin/cc -o CMakeFiles/cmTC_d9137.dir/CheckSymbolExists.c.o -c /home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c
Linking C executable cmTC_d9137
/home/kong.ye/install_pakage/yes/envs/caffe2/bin/cmake -E cmake_link_script CMakeFiles/cmTC_d9137.dir/link.txt --verbose=1
/home/kong.ye/install_pakage/yes/envs/caffe2/bin/cc -rdynamic CMakeFiles/cmTC_d9137.dir/CheckSymbolExists.c.o -o cmTC_d9137
CMakeFiles/cmTC_d9137.dir/CheckSymbolExists.c.o: In function `main':
CheckSymbolExists.c:(.text+0x16): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_d9137.dir/build.make:97: recipe for target 'cmTC_d9137' failed
make[1]: *** [cmTC_d9137] Error 1
make[1]: Leaving directory '/home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp'
Makefile:126: recipe for target 'cmTC_d9137/fast' failed
make: *** [cmTC_d9137/fast] Error 2

File /home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:
/* */
#include <pthread.h>

int main(int argc, char** argv)
{
(void)argv;
#ifndef pthread_create
return ((int*)(&pthread_create))[argc];
#else
(void)argc;
return 0;
#endif
}

Determining if the function pthread_create exists in the pthreads failed with the following output:
Change Dir: /home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp

Run Build Command:"/usr/bin/make" "cmTC_2ff92/fast"
/usr/bin/make -f CMakeFiles/cmTC_2ff92.dir/build.make CMakeFiles/cmTC_2ff92.dir/build
make[1]: Entering directory '/home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_2ff92.dir/CheckFunctionExists.c.o
/home/kong.ye/install_pakage/yes/envs/caffe2/bin/cc -DCHECK_FUNCTION_EXISTS=pthread_create -o CMakeFiles/cmTC_2ff92.dir/CheckFunctionExists.c.o -c /home/kong.ye/install_pakage/yes/envs/caffe2/share/cmake-3.9/Modules/CheckFunctionExists.c
Linking C executable cmTC_2ff92
/home/kong.ye/install_pakage/yes/envs/caffe2/bin/cmake -E cmake_link_script CMakeFiles/cmTC_2ff92.dir/link.txt --verbose=1
/home/kong.ye/install_pakage/yes/envs/caffe2/bin/cc -DCHECK_FUNCTION_EXISTS=pthread_create -rdynamic CMakeFiles/cmTC_2ff92.dir/CheckFunctionExists.c.o -o cmTC_2ff92 -lpthreads
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_2ff92.dir/build.make:97: recipe for target 'cmTC_2ff92' failed
make[1]: *** [cmTC_2ff92] Error 1
make[1]: Leaving directory '/home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp'
Makefile:126: recipe for target 'cmTC_2ff92/fast' failed
make: *** [cmTC_2ff92/fast] Error 2

ImportError: cannot import name config

Hello,
Thank you for the nice code. But I run into a problem when I train the model:
Traceback (most recent call last):
File "train_net_video.py", line 21, in <module>
from core.config import config as cfg
ImportError: cannot import name config

Looking forward to your reply!

When can you provide the pytorch implementation~

Great repo.

I am wondering whether you could provide a PyTorch version.

AttributeError: Method CustomizedVideoInput is not a registered operator. Did you mean: []

Hi,
Sorry to trouble you again. I have a new problem:
"AttributeError: Method CustomizedVideoInput is not a registered operator. Did you mean: []"

Nonlocal block initialization

As stated in your paper, the non-local block is initialized with zero weights so that, with its shortcut connection, it does not influence the original network. But in your implementation, the non-local block is initialized the same way as a normal conv, as shown in the code:
__C.NONLOCAL.USE_ZERO_INIT_CONV = False
__C.NONLOCAL.CONV_INIT_STD = 0.01
Is there any difference in performance between these two implementations?
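
The difference between the two initializations can be illustrated with the residual form z = W_z y + x from the paper. The sketch below is a toy version under loud assumptions: W_z is reduced to a single scalar w_out, the function name and values are hypothetical, and the second case uses a fixed 0.01 (the same magnitude as CONV_INIT_STD) rather than a random draw. It says nothing about final accuracy.

```python
def nonlocal_block_output(x, y, w_out):
    """Residual form z = w_out * y + x, applied per element.

    x     : input to the block
    y     : output of the non-local operation (arbitrary values here)
    w_out : scalar stand-in for the last layer of the block (assumption)
    """
    return [xi + w_out * yi for xi, yi in zip(x, y)]


x = [1.0, -2.0, 0.5]
y = [3.0, 4.0, -1.0]

# Zero initialization: the block starts as an exact identity mapping.
z_zero = nonlocal_block_output(x, y, w_out=0.0)

# Small non-zero initialization: the block starts close to, but not
# exactly at, the identity.
z_small = nonlocal_block_output(x, y, w_out=0.01)

print(z_zero)
print(z_small)
```

With zero initialization a pre-trained network produces the same outputs as before the block was inserted; with a small random initialization the outputs are perturbed from the first iteration.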

Inference example code for single video input

Hi Xiaolong
Thank you for this outstanding work about Non Local network!
I want to test the Kinetics-pretrained non-local network on another dataset (such as UCF-101) to extract features. Could you provide an inference example that reads in a single video clip and outputs the softmax scores?

Thanks so much!

cv::resize ocl error when using cuda

Because of a conflict between OpenCL and CUDA, you should add cv::ocl::setUseOpenCL(false); and include the header file with #include <opencv2/core/ocl.hpp>.
By the way, I have implemented fetching the data blob from preprocessed RGB video frames, which dramatically shortens the total training time from 30 days (when raw video is decoded at runtime) to 4 days. We can discuss this code to ensure its correctness if you'd like.

Spatial MaxPool for reducing computation

Hi,

I am wondering: if max pooling is used for the phi and g functions, the spatial dimension is reduced.
How can the result then be added back to the input (which has a larger spatial resolution)?

Thanks.
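
A shape check may help with this question. In the paper's description the subsampling is applied after phi and g only, while theta keeps every position, so the output has one row per input position. The sketch below is a simplification under stated assumptions: positions are flattened to one axis, pooling runs over consecutive positions instead of a spatial window, and theta, phi, and g are identity embeddings. All function names are hypothetical.

```python
import math
import random


def softmax(row):
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*b))
    return [[sum(p * q for p, q in zip(row, col)) for col in cols] for row in a]


def max_pool_positions(feats, stride):
    """Elementwise max over groups of `stride` consecutive positions."""
    return [[max(channel) for channel in zip(*feats[i:i + stride])]
            for i in range(0, len(feats), stride)]


def nonlocal_op(x, stride=4):
    """y = softmax(theta phi^T) g, with only phi and g subsampled."""
    theta = x                             # N x C, full resolution
    phi = max_pool_positions(x, stride)   # M x C, with M = N / stride
    g = max_pool_positions(x, stride)     # M x C
    phi_t = [list(col) for col in zip(*phi)]               # C x M
    weights = [softmax(r) for r in matmul(theta, phi_t)]   # N x M
    return matmul(weights, g)                              # N x C


rng = random.Random(0)
n_positions, channels = 8, 3
x = [[rng.gauss(0, 1) for _ in range(channels)] for _ in range(n_positions)]

y = nonlocal_op(x, stride=4)
# The residual addition works because y has the same N x C shape as x.
z = [[xi + yi for xi, yi in zip(xr, yr)] for xr, yr in zip(x, y)]
print(len(x), len(y), len(z))
```

Pooling shrinks only the number of positions being attended to (the columns of the affinity matrix), not the number of output positions.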

In the folder process_data/kinetics, I can't find the file change_listname.py

In the folder process_data/kinetics, I can't find the file change_listname.py.
Please add it. Thanks.