facebookresearch / video-nonlocal-net
Non-local Neural Networks for Video Classification
License: Other
Sorry to trouble you again. I have two questions.
a. By default, broadcast_computed_params=True, but in this project (/lib/models/model_builder_video.py) it is set to broadcast_computed_params=False. What is the reason?
b. About the NCCL error (terminate called after throwing an instance of 'caffe2::EnforceNotMet' what(): [enforce fail at cuda_nccl_gpu.cc:40] status == ncclSuccess. 2 vs 0. Error at: ): if I set cfg.DEBUG=True, the call becomes
data_parallel_model.Parallelize_GPU(
model,
input_builder_fun=input_builder_fun,
forward_pass_builder_fun=forward_pass_builder_fun,
param_update_builder_fun=param_update_builder_fun,
devices=gpus,
rendezvous=rendezvous_ctx,
broadcast_computed_params=False,
optimize_gradient_memory=cfg.MODEL.MEMONGER,
use_nccl=not cfg.DEBUG, # org: True
)
that is, use_nccl=False, and the problem goes away.
I am worried that this affects performance.
Thank you in advance!
The network requires a video length of 64 or 128 frames. However, according to my UCF-101 test log, all test videos are tested normally even though some of them are shorter than 64 frames. How do you handle those videos when testing?
I use the Kinetics dataset to train the non-local model. I get the following output during training:
[swscaler @ 0x7d8770033900] deprecated pixel format used, make sure you did set range correctly
| Train ETA: 2 days, 19:18:01 LR: 0.00500000 Iters [150/300000] [0.00ep] Time 0.761 Loss 5.3394 top1 96.875 top5 90.625
| Train ETA: 2 days, 18:39:42 LR: 0.00500000 Iters [160/300000] [0.00ep] Time 0.715 Loss 4.9686 top1 81.250 top5 71.875
| Train ETA: 2 days, 18:38:18 LR: 0.00500000 Iters [170/300000] [0.00ep] Time 0.674 Loss 5.0748 top1 96.875 top5 81.250
| Train ETA: 2 days, 18:20:39 LR: 0.00500000 Iters [180/300000] [0.00ep] Time 0.583 Loss 4.8989 top1 93.750 top5 87.500
| Train ETA: 2 days, 18:09:08 LR: 0.00500000 Iters [190/300000] [0.00ep] Time 0.633 Loss 5.4064 top1 100.000 top5 93.750
| Train ETA: 2 days, 17:54:54 LR: 0.00500000 Iters [200/300000] [0.00ep] Time 0.639 Loss 5.1758 top1 90.625 top5 81.250
| Train ETA: 2 days, 17:35:07 LR: 0.00500000 Iters [210/300000] [0.01ep] Time 0.684 Loss 5.2266 top1 93.750 top5 90.625
| Train ETA: 2 days, 17:14:03 LR: 0.00500000 Iters [220/300000] [0.01ep] Time 0.794 Loss 5.3617 top1 96.875 top5 87.500
| Train ETA: 2 days, 16:52:24 LR: 0.00500000 Iters [230/300000] [0.01ep] Time 0.945 Loss 5.0694 top1 93.750 top5 78.125
[swscaler @ 0x7d875802c5a0] deprecated pixel format used, make sure you did set range correctly
What does "deprecated pixel format used, make sure you did set range correctly" mean?
Does it mean something is wrong with my training videos?
Will this hurt accuracy?
E0523 09:38:47.741405 10627 net_dag.cc:188] Exception from operator chain starting at '' (type 'Conv'): caffe2::EnforceNotMet: [enforce fail at reshape_op.h:93] total_size % size == 0. Argument `shape` does not agree with the input data. (100352 vs 4096) Error from operator:
input: "gpu_0/nonlocal_conv4_1_phi" output: "gpu_0/nonlocal_conv4_1_phi" output: "gpu_0/nonlocal_conv4_1_phi_shape5d" name: "" type: "Reshape" arg { name: "shape" ints: 8 ints: 512 ints: -1 } device_option { device_type: 1 cuda_gpu_id: 0 }
E0523 09:38:47.742872 10628 net_dag.cc:188] Secondary exception from operator chain starting at '' (type 'Conv'): caffe2::EnforceNotMet: [enforce fail at reshape_op.h:93] total_size % size == 0. Argument `shape` does not agree with the input data. (100352 vs 4096) Error from operator:
input: "gpu_0/nonlocal_conv4_1_g" output: "gpu_0/nonlocal_conv4_1_g" output: "gpu_0/nonlocal_conv4_1_g_shape5d" name: "" type: "Reshape" arg { name: "shape" ints: 8 ints: 512 ints: -1 } device_option { device_type: 1 cuda_gpu_id: 0 }
I encountered the above error when I set TEST.BATCH_SIZE 1 and NUM_GPUS 1 and ran run_test_multicrop.sh.
The total_size is 100352 = 7^2 * 2^11, while 4096 = 2^12. So I tried to change total_size by setting batch_size or num_gpus to 2 (or any greater even number). Either way makes the error disappear.
I wonder how the reshape param [8, 512, -1] is determined. Is there any way I can reduce the test batch size?
BTW, the model was trained with batch_size 32 on 4 GPUs.
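The arithmetic in this report can be checked with a small sketch. It assumes that the leading 8 and 512 in the reshape argument are a fixed batch size and channel count, so the blob's element count must be a multiple of 8 * 512 = 4096. The helper name reshape_is_valid is hypothetical and not part of the repository.

```python
def reshape_is_valid(total_size, shape):
    """Return True if `total_size` elements fit `shape`, where one entry may be -1."""
    known = 1
    for dim in shape:
        if dim != -1:
            known *= dim  # product of the explicitly given dimensions
    return total_size % known == 0


shape = (8, 512, -1)  # values taken from the error message above

# Element count reported in the error message (test batch of 1 on 1 GPU).
print(reshape_is_valid(100352, shape))

# The same blob with the batch doubled, as in the reported workaround.
print(reshape_is_valid(2 * 100352, shape))
```

Under that assumption, any batch size that makes the element count a multiple of 4096 passes the check, which would explain why even batch sizes work here.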
Hi @xiaolonw,
I built non-local nn with the official Caffe2, which has been merged into PyTorch.
When I run training code, I got this error:
resnet_video_test
Traceback (most recent call last):
File "../tools/train_net_video.py", line 264, in
main()
File "../tools/train_net_video.py", line 260, in main
train(args)
File "../tools/train_net_video.py", line 101, in train
test_model, test_timer, test_meter = create_wrapper(is_train=False)
File "../tools/train_net_video.py", line 63, in create_wrapper
model.build_model()
File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 116, in build_model
train=self.train, force_fw_only=self.force_fw_only
File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 217, in create_data_parallel_model
use_nccl=not cfg.DEBUG, # org: True
File "/home/jinchoi/src/nl-caffe2_new/build/caffe2/python/data_parallel_model.py", line 34, in Parallelize_GPU
Parallelize(*args, **kwargs)
File "/home/jinchoi/src/nl-caffe2_new/build/caffe2/python/data_parallel_model.py", line 219, in Parallelize
input_builder_fun(model_helper_obj)
File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 194, in add_video_input
batch_size=batch_size,
File "/home/jinchoi/src/video-nonlocal-net/lib/models/model_builder_video.py", line 159, in AddVideoInput
data, label = model.net.CustomizedVideoInput(
File "/home/jinchoi/src/nl-caffe2_new/build/caffe2/python/core.py", line 2171, in getattr
",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method CustomizedVideoInput is not a registered operator. Did you mean: []
I think the custom video ops were not built, for some reason I don't know.
According to the link #3, I should turn USE_FFMPEG ON in caffe2/CMakeLists.txt. However, the new Caffe2 repository does not have the 'option(USE_FFMPEG "Use ffmpeg" ON)' line in caffe2/CMakeLists.txt. It only has this line in the CMakeLists.txt file in the Caffe2 root directory.
Can you take a look and tell me how to deal with this issue?
Btw, your Caffe2 repository cannot be cloned because some dependencies have broken links, so I am using the official Caffe2 repository.
Thank you for this outstanding work on nonlocal-net!
When I run this code, I get the following error.
Thank you in advance!
Traceback (most recent call last):
File "../tools/train_net_video.py", line 264, in
main()
File "../tools/train_net_video.py", line 260, in main
train(args)
File "../tools/train_net_video.py", line 101, in train
test_model, test_timer, test_meter = create_wrapper(is_train=False)
File "../tools/train_net_video.py", line 63, in create_wrapper
model.build_model()
File "/home/users/buyingjia/video-nonlocal-net-master/lib/models/model_builder_video.py", line 116, in build_model
train=self.train, force_fw_only=self.force_fw_only
File "/home/users/buyingjia/video-nonlocal-net-master/lib/models/model_builder_video.py", line 217, in create_data_parallel_model
use_nccl=not cfg.DEBUG, # org: True
File "/home/users/buyingjia/caffe2/build/caffe2/python/data_parallel_model.py", line 45, in Parallelize_GPU
Parallelize(*args, **kwargs)
File "/home/users/buyingjia/caffe2/build/caffe2/python/data_parallel_model.py", line 206, in Parallelize
input_builder_fun(model_helper_obj)
File "/home/users/buyingjia/video-nonlocal-net-master/lib/models/model_builder_video.py", line 194, in add_video_input
batch_size=batch_size,
File "/home/users/buyingjia/video-nonlocal-net-master/lib/models/model_builder_video.py", line 159, in AddVideoInput
data, label = model.net.CustomizedVideoInput(
File "/home/users/buyingjia/caffe2/build/caffe2/python/core.py", line 2044, in getattr
",".join(workspace.C.nearby_opnames(op_type)) + ']'
AttributeError: Method CustomizedVideoInput is not a registered operator. Did you mean: []
Is there something I forgot to do?
2018-02-17 16:11:35.414 [INFO: test_net.py: 204]: gt_labels 19761, sample_num 19761
2018-02-17 16:11:39.536 [INFO: test_net.py: 249]: Num of empty videos: 0
2018-02-17 16:11:39.536 [INFO: test_net.py: 250]: Num of corrupted videos: 0
2018-02-17 16:11:39.536 [INFO: test_net.py: 251]: Max num of clips in a video: 10
2018-02-17 16:11:39.536 [INFO: test_net.py: 252]: Min num of clips in a video: 10
2018-02-17 16:11:39.536 [INFO: test_net.py: 256]: Clip1 accuracy: 64.66 percent (12777/19761)
2018-02-17 16:11:39.536 [INFO: test_net.py: 260]: Clip accuracy: 64.43 percent (127330/197610)
2018-02-17 16:11:39.995 [INFO: test_net.py: 283]: --------------------------------------------------------------------------------
2018-02-17 16:11:39.995 [INFO: test_net.py: 284]: top-1 accuracy: 71.89 percent
2018-02-17 16:11:39.995 [INFO: test_net.py: 285]: top-5 accuracy: 90.29 percent
2018-02-17 16:11:39.996 [INFO: test_net.py: 286]: --------------------------------------------------------------------------------
2018-02-17 16:18:15.690 [INFO: test_net.py: 182]: Temporary file saved to: ./results_probs.pkl
2018-02-17 16:18:16.287 [INFO: train_net.py: 253]: 10-clip spatial fcn testing finished
In this log, the top-1 accuracy is 71.89%, but the project page reports 73.2%:
script | input frames | freeze bn? | 3D conv? | non-local? | top1 | top5 | model | logs |
---|---|---|---|---|---|---|---|---|
run_i3d_baseline_300k.sh | 8 | - | Yes | - | 73.2 | 90.8 | link | link |
Am I missing something, or was 73.2% obtained with a different testing protocol?
I test the same videos (39 video clips of about 10s each) using two models on one GPU.
The shell command is:
Why does model_1 take less GPU memory than model_2, when the test batch_size is 8 for model_1 and 6 for model_2?
7701MiB for model_1 during testing
22919MiB for model_2 during testing
Why is model_1 much faster than model_2 in total processing time?
Is there anything I am missing?
In the readme file:
To train the i3d Non-local Networks with longer clips (32-frame input), we first need to obtain the model trained from "run_i3d_baseline_400k.sh" as a pre-trained model. Then we convert the Batch Normalization layers into Affine layers by running: python modify_caffe2_ftvideo.py xxxx
What is an affine layer? Is it a conv layer without batch normalization? Judging by GPU memory usage, does the model with affine layers have more parameters than the one with BN layers? Is there any other operation in an affine layer?
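As a minimal sketch of the conversion the README describes, assume an affine layer is a per-channel scale and shift obtained by folding the frozen BN statistics into two vectors. The function name bn_to_affine and the eps value are assumptions for illustration, not taken from modify_caffe2_ftvideo.py.

```python
import numpy as np


def bn_to_affine(gamma, beta, running_mean, running_var, eps=1e-5):
    """Fold frozen BN statistics into a per-channel scale and bias."""
    scale = gamma / np.sqrt(running_var + eps)
    bias = beta - running_mean * scale
    return scale, bias


def per_channel(v):
    # Broadcast a (C,) vector over an (N, C, T, H, W) video blob.
    return v.reshape(1, -1, 1, 1, 1)


rng = np.random.default_rng(0)
channels = 4
x = rng.normal(size=(2, channels, 3, 5, 5))
gamma = rng.normal(size=channels)
beta = rng.normal(size=channels)
mean = rng.normal(size=channels)
var = rng.uniform(0.5, 2.0, size=channels)

# Inference-time batch normalization with fixed statistics.
bn_out = (per_channel(gamma) * (x - per_channel(mean))
          / np.sqrt(per_channel(var) + 1e-5) + per_channel(beta))

# The equivalent affine layer: y = scale * x + bias.
scale, bias = bn_to_affine(gamma, beta, mean, var)
affine_out = per_channel(scale) * x + per_channel(bias)

print(np.allclose(bn_out, affine_out))
```

Under this assumption the affine layer has two values per channel, so it does not add parameters compared with BN. A difference in GPU memory would have to come from something else, such as the longer input clips.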
Hello,
in the update_bn_stats_gpu function,
workspace.FeedBlob(
    'gpu{}/'.format(i) + bn_layer + '_bn_rm',
    np.array(self._meanX_dict[bn_layer], dtype=np.float32),
)
the meanX of 200 * batch_size * num_gpu training samples is computed and then written over bn_layer + '_bn_rm'.
So why not use the running mean accumulated during training?
Why is the mean computed when COMPUTE_PRECISE_BN is switched on more precise?
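One way to see the difference is the sketch below. It assumes the precise statistics are plain averages of E[x] and E[x^2] over equally sized batches, while the training-time running mean is an exponential moving average. The function names and the momentum value are hypothetical and not taken from the repository.

```python
import numpy as np


def precise_bn_stats(batches):
    """Average E[x] and E[x^2] over equally sized batches, then derive the variance."""
    mean_x = np.mean([b.mean(axis=0) for b in batches], axis=0)
    mean_x2 = np.mean([(b ** 2).mean(axis=0) for b in batches], axis=0)
    return mean_x, mean_x2 - mean_x ** 2


def running_mean(batches, momentum=0.9):
    """Exponential moving average of batch means, as accumulated during training."""
    rm = np.zeros(batches[0].shape[1])
    for b in batches:
        rm = momentum * rm + (1.0 - momentum) * b.mean(axis=0)
    return rm


rng = np.random.default_rng(0)
# 200 batches of 8 samples with 3 channels each (illustrative sizes).
batches = [rng.normal(loc=5.0, size=(8, 3)) for _ in range(200)]

mean_x, var_x = precise_bn_stats(batches)
all_x = np.concatenate(batches)

# The averaged statistics match those of the pooled samples.
print(np.allclose(mean_x, all_x.mean(axis=0)))
print(np.allclose(var_x, all_x.var(axis=0)))

# The moving average depends mostly on the last few batches.
print(running_mean(batches))
```

The moving average weights recent batches most heavily, and during real training it also mixes in statistics produced by older weights. The averaged version uses all samples equally with the current weights, which is presumably why it is called precise.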
Hi,
I ran into a problem when trying to train the model i3d_nlnet_affine_400k. You said a converted pretrained model is provided in pretrained_model.tar.gz, but I didn't find it. The only model I can use is affine_model_30k.pkl in run_i3d_baseline.
Can I use this converted model to train i3d_nlnet_affine_400k?
Or would it work to download your i3d_baseline model and convert it into affine_400k.pkl?
Thanks!
Hi Xiaolong,
Since I failed to run "git clone --recursive https://github.com/xiaolonw/caffe2", I followed your instructions and replaced the ./caffe2/video files in the official caffe2 before installing. When running test_net_multicrop.sh, I got the following error:
[INFO: checkpoints.py: 247]: res5_2_branch2c_bn_rm loaded from weights file into: gpu_0/res5_2_branch2c_bn_rm (2048,)
[INFO: checkpoints.py: 247]: res5_2_branch2c_bn_riv loaded from weights file into: gpu_0/res5_2_branch2c_bn_riv (2048,)
*** Aborted at 1525166815 (unix time) try "date -d @1525166815" if you are using GNU date ***
PC: @ 0x7f7d8b793bd9 caffe2::CustomizedVideoInputOp<>::DecodeAndTransform()
*** SIGSEGV (@0x48) received by PID 25821 (TID 0x7f7d3eb9e700) from PID 72; stack trace: ***
@ 0x7f7d9bffa4b0 (unknown)
@ 0x7f7d8b793bd9 caffe2::CustomizedVideoInputOp<>::DecodeAndTransform()
@ 0x7f7d8b790c23 std::_Function_handler<>::_M_invoke()
@ 0x7f7d8b744d0b caffe2::TaskThreadPool::main_loop()
@ 0x7f7d9076cc5c execute_native_thread_routine_compat
@ 0x7f7d9caa66ba start_thread
@ 0x7f7d9c0cc41d clone
@ 0x0 (unknown)
Segmentation fault (core dumped)
It seems to be an error caused by CustomizedVideoInput. How can I fix it? Thanks so much!
I get errors like the following:
../lib/libcaffe2.so: undefined reference to 'cpuinfo_deinitialize'
../lib/libcaffe2.so: undefined reference to 'cpuinfo_get_l4_cache'
../lib/libcaffe2_gpu.so: undefined reference to 'google::FlagRegisterer::FlagRegisterer<std::__cxx11::basic_string<char, std::char_traits, std::allocator > >(char const*, char const*, char const*, std::__cxx11::basic_string<char, std::char_traits, std::allocator >, std::__cxx11::basic_string<char, std::char_traits, std::allocator >)'
../lib/libcaffe2.so: undefined reference to 'cpuinfo_get_l2_cache'
../lib/libcaffe2.so: undefined reference to 'cpuinfo_get_l1d_cache'
../lib/libcaffe2.so: undefined reference to 'cpuinfo_initialize'
Do you know how to solve it? I found there is no cpuinfo submodule in third_party. After I installed cpuinfo, the error still happens.
Hi! I added your non-local module to Detectron but got lower accuracy than the baseline. I only added non-local blocks to the backbone. What could I be missing?
Hi, I find that there are two interesting parameters: CONV_INIT_STD and BN_MOMENTUM. The default choices are channel-dependent and 0.1, respectively.
But you set them to fixed values such as 0.1 and 0.9.
It would be great if you could share the influence of these parameters.
When I run: python2 -c 'from caffe2.python import core' 2>/dev/null && echo "Success" || echo "Failure"
Output: Success
But when I run: python -c 'from caffe2.python import workspace; print(workspace.NumCudaDevices())'
Output:
WARNING:root:This caffe2 python run does not have GPU support. Will run in CPU only mode.
WARNING:root:Debug message: No module named caffe2_pybind11_state_gpu
0
When I installed caffe2, I already ran: conda install cudnn=6.0.21=cuda8.0_0. Running it again outputs: # All requested packages already installed.
What should I do? I want to use the GPU. Thanks.
In scripts/run_test_multicrop.sh, I find TRAIN.BATCH_SIZE. Is this a parameter related to training epochs? Why should we set this value during testing?
I find that when saving a checkpoint, the code calls "compute_and_update_bn_stats", which computes BN statistics over a sufficiently large set of training batches. This yields a "PRECISE_BN", which is better than statistics from 8 training samples (8 samples per GPU with 8 GPUs). Does this operation improve performance?
In my own Caffe code, 16 samples per GPU performs about 1.5% better than 8 samples per GPU.
Traditionally, training video inputs are segments cropped from raw videos according to an annotation file that gives each segment's start and end time. The training input then samples clips from these segments. However, I cannot find where your code does this. Does your code just sample clips from the whole raw video without any cropping?
Hi, I was trying to follow the installation example of Xiaolong. However, I got the error messages as follows when doing "cmake -DCMAKE_INSTALL_PREFIX:PATH=/path/to/caffe2/build/install .." :
cmake: /my/path/to/anaconda2/envs/caffe2/bin/../lib/libstdc++.so.6: version 'GLIBCXX_3.4.20' not found (requested by cmake)
cmake: /my/path/to/anaconda2/envs/caffe2/bin/../lib/libstdc++.so.6: version 'CXXABI_1.3.9' not found (requested by cmake)
cmake: /my/path/to/anaconda2/envs/caffe2/bin/../lib/libstdc++.so.6: version 'GLIBCXX_3.4.21' not found (requested by cmake)
I am wondering why this happens. Thank you.
Hi Xiaolong,
Sorry to interrupt you again!
I ran into a new problem when using run_test_multicrop.sh
for test_iter in range(total_test_net_iters):
    # timer.tic()
    workspace.RunNet(test_model.net.Proto().name)
    # timer.toc()
*** Aborted at 1525368135 (unix time) try "date -d @1525368135" if you are using GNU date ***
PC: @ 0x7f91145255ef caffe2::CustomizedVideoInputOp<>::GetClipAndLabelFromDBValue()
*** SIGSEGV (@0x40) received by PID 28 (TID 0x7f9087fff700) from PID 64; stack trace: ***
@ 0x7f916f8654b0 (unknown)
@ 0x7f91145255ef caffe2::CustomizedVideoInputOp<>::GetClipAndLabelFromDBValue()
@ 0x7f9114525d20 caffe2::CustomizedVideoInputOp<>::DecodeAndTransform()
@ 0x7f9114522ef3 std::_Function_handler<>::_M_invoke()
@ 0x7f91144fbb2b caffe2::TaskThreadPool::main_loop()
@ 0x7f90db4e4c80 (unknown)
@ 0x7f916fc016ba start_thread
@ 0x7f916f93741d clone
@ 0x0 (unknown)
Segmentation fault (core dumped)
Looking forward to your reply!
Hi, the original paper tested Mask R-CNN with a non-local block (by adding one non-local block right before the last residual block of res4).
I want to use your non-local block for object detection with Mask R-CNN. Do you have code for this? If so, could you please upload it?
I ran into this problem: Failed to clone 'third_party/eigen'. Can anyone help me?
Hi @xiaolonw,
I cannot clone your Caffe2 repo recursively.
When I type
git clone --recursive https://github.com/xiaolonw/caffe2
It shows me
Clone of 'https://github.com/RLovelett/eigen.git' into submodule path 'third_party/eigen' failed
So I modified the .gitmodules file to set the eigen url to
https://github.com/eigenteam/eigen-git-mirror.git
That fixes the eigen problem. However, when I try to update submodules by typing "git submodule update --init --recursive", I get this:
Unable to checkout 'c80d6f7a924b53942b569b45278517565ea43d82' in submodule path 'third_party/aten'
Can you take a look?
Thanks for your contribution to open source!
I have been having trouble building caffe2 and NL net for a long time. Could you please provide a prebuilt docker image, like TSN does?
Best wishes!
You should change the buggy code total_test_iters = int(math.ceil(cfg.TEST.DATASET_SIZE / float(cfg.TEST.BATCH_SIZE)))
to total_test_iters = int(math.floor(cfg.TEST.DATASET_SIZE / float(cfg.TEST.BATCH_SIZE))).
The original code lets the lmdb reader_ read past the end of the test data list.
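The effect of the two rounding modes can be illustrated with a small sketch. The dataset and batch sizes below are made up, and the helper name num_test_iters is hypothetical.

```python
import math


def num_test_iters(dataset_size, batch_size, round_up):
    """Number of test iterations under ceil or floor rounding."""
    ratio = dataset_size / float(batch_size)
    return int(math.ceil(ratio)) if round_up else int(math.floor(ratio))


dataset_size, batch_size = 100, 8  # illustrative values only

for round_up in (True, False):
    iters = num_test_iters(dataset_size, batch_size, round_up)
    samples_read = iters * batch_size
    print(round_up, iters, samples_read, samples_read - dataset_size)
```

With ceil, the last iteration requests 4 samples beyond the end of the list. With floor, the reader stays within bounds but the last 4 samples are never evaluated, so the choice trades an out-of-range read against a slightly incomplete test set.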
I find that the total temporal stride is 4 in the paper, but 2 in this repo.
Also, I find a temporal kernel of 3 in the paper, but this repo uses a temporal kernel of 2.
Is this correct? Was there some change after publication?
Hi everyone,
This is the code of non-local block without sum op.
# 3d spacetime nonlocal (v1: spatial downsample)
def spacetime_nonlocal(
        model, blob_in, dim_in, dim_out, batch_size, prefix, dim_inner,
        is_test, max_pool_stride=2):
    # ---------------------
    cur = blob_in
    # we do projection to convert each spacetime location to a feature
    # theta original size
    # e.g., (8, 1024, 4, 14, 14) => (8, 1024, 4, 14, 14)
    theta = model.ConvNd(
        cur, prefix + '_theta',
        dim_in,
        dim_inner,
        [1, 1, 1],
        strides=[1, 1, 1],
        pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD}),
        bias_init=('ConstantFill', {'value': 0.}), no_bias=cfg.NONLOCAL.NO_BIAS)
    # phi and g: half spatial size
    # e.g., (8, 1024, 4, 14, 14) => (8, 1024, 4, 7, 7)
    if cfg.NONLOCAL.USE_MAXPOOL is True:
        max_pool = model.MaxPool(
            cur, prefix + '_pool',
            kernels=[1, max_pool_stride, max_pool_stride],
            strides=[1, max_pool_stride, max_pool_stride],
            pads=[0, 0, 0] * 2,
        )
    else:
        max_pool = cur
    phi = model.ConvNd(
        max_pool, prefix + '_phi',
        dim_in,
        dim_inner,
        [1, 1, 1],
        strides=[1, 1, 1],
        pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD}),
        bias_init=('ConstantFill', {'value': 0.}), no_bias=cfg.NONLOCAL.NO_BIAS)
    g = model.ConvNd(
        max_pool, prefix + '_g',
        dim_in,
        dim_inner,
        [1, 1, 1],
        strides=[1, 1, 1],
        pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD}),
        bias_init=('ConstantFill', {'value': 0.}), no_bias=cfg.NONLOCAL.NO_BIAS)
    # we have to use explicit batch size (to support arbitrary spacetime size)
    # e.g., (8, 1024, 4, 14, 14) => (8, 1024, 784)
    theta, theta_shape_5d = model.Reshape(
        theta, [theta + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else theta,
                theta + '_shape5d'],
        shape=(batch_size, dim_inner, -1))
    phi, phi_shape_5d = model.Reshape(
        phi, [phi + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else phi,
              phi + '_shape5d'],
        shape=(batch_size, dim_inner, -1))
    g, g_shape_5d = model.Reshape(
        g, [g + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else g,
            g + '_shape5d'],
        shape=(batch_size, dim_inner, -1))
    # e.g., (8, 1024, 784) * (8, 1024, 784) => (8, 784, 784)
    theta_phi = model.net.BatchMatMul([theta, phi], prefix + '_affinity', trans_a=1)
    if cfg.NONLOCAL.USE_SOFTMAX is True:
        if cfg.NONLOCAL.USE_SCALE is True:
            theta_phi_sc = model.Scale(theta_phi, theta_phi, scale=dim_inner**-.5)
        else:
            theta_phi_sc = theta_phi
        # softmax
        # sum(p[i, j, :]) == 1, for any i, j
        p = model.Softmax(theta_phi_sc, theta_phi + '_prob', engine='CUDNN', axis=2)
    else:
        ones = model.net.ConstantFill([theta_phi], [theta_phi + '_ones'], value=1.)
        ones = model.net.ReduceBackSum([ones], [theta_phi + '_const'])
        zeros = model.net.ConstantFill([theta_phi], [theta_phi + '_zeros'], value=0.)
        denom = model.net.Add(
            [zeros, ones], [theta_phi + '_denom'], broadcast=1, axis=0)
        model.StopGradient(denom, denom)
        p = model.net.Div([theta_phi, denom], [theta_phi + '_sc'])
    # note: g's axis[2] corresponds to p's axis[2]
    # e.g., g(8, 1024, 784_2) * p(8, 784_1, 784_2) => (8, 1024, 784_1)
    t = model.net.BatchMatMul([g, p], prefix + '_y', trans_b=1)
    # reshape back:
    # e.g., (8, 1024, 784) => (8, 1024, 4, 14, 14)
    t_re, t_shape = model.Reshape(
        [t, theta_shape_5d],
        [t + '_re' if not cfg.MODEL.ALLOW_INPLACE_RESHAPE else t,
         t + '_shape3d'])
    blob_out = t_re
    blob_out = model.ConvNd(
        blob_out, prefix + '_out',
        dim_inner,
        dim_out,
        [1, 1, 1],
        strides=[1, 1, 1],
        pads=[0, 0, 0] * 2,
        weight_init=('GaussianFill', {'std': cfg.NONLOCAL.CONV_INIT_STD})
        if not cfg.NONLOCAL.USE_ZERO_INIT_CONV else
        ('ConstantFill', {'value': 0.}),
        bias_init=('ConstantFill', {'value': 0.}), no_bias=cfg.NONLOCAL.NO_BIAS)
    if cfg.NONLOCAL.USE_BN is True:
        blob_out = model.SpatialBN(
            blob_out, prefix + "_bn", dim_out,
            epsilon=cfg.NONLOCAL.BN_EPSILON, momentum=cfg.NONLOCAL.BN_MOMENTUM,
            is_test=is_test
        )
        model.param_init_net.ConstantFill(
            [prefix + "_bn_s"], prefix + "_bn_s", value=cfg.NONLOCAL.BN_INIT_GAMMA)
    if cfg.NONLOCAL.USE_AFFINE is True:
        blob_out = model.AffineNd(blob_out, prefix + "_bn", dim_out)
    return blob_out


def add_nonlocal(model, blob_in, dim_in, dim_out, batch_size, prefix, dim_inner):
    is_test = model.split in ['test', 'val']
    blob_out = spacetime_nonlocal(
        model, blob_in, dim_in, dim_out, batch_size, prefix, dim_inner, is_test)
    blob_out = model.net.Sum([blob_in, blob_out], prefix + "_sum")
    return blob_out
Moreover, the Caffe2 documentation says:
Batch Matrix multiplication Yi = Ai * Bi, where A has shape (dim0, dim1, … M, K), B has shape (dim0, dim1, … K, N), Y has shape (dim0, dim1, … M, N) and i ranges from 0 to (dim0 * dim1 …) - 1. rank(A) == rank(B) >= 2. In case of A and B being two dimensional, it behaves like normal matrix multiplication.
Even though theta, phi, and g all have shape (8, 1024, 784), it looks like they cannot be multiplied, because theta has M = 1024, K = 784 while phi has K = 1024, N = 784.
I am working with TensorFlow, so I need to know every detail of the Non-local block source code in order to write the same block in TensorFlow.
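The shape question comes down to the trans_a=1 and trans_b=1 arguments in the BatchMatMul calls above, which transpose the last two axes of the first or second operand before multiplying. Below is a NumPy sketch of the softmax branch with small made-up sizes. The function name nonlocal_attention is hypothetical, and the sketch is not a drop-in port of the Caffe2 code.

```python
import numpy as np


def nonlocal_attention(theta, phi, g, use_scale=True):
    """theta: (B, C, N); phi, g: (B, C, M). Returns p: (B, N, M) and y: (B, C, N)."""
    dim_inner = theta.shape[1]
    # BatchMatMul(..., trans_a=1): (B, N, C) x (B, C, M) => (B, N, M)
    affinity = np.matmul(theta.transpose(0, 2, 1), phi)
    if use_scale:
        affinity = affinity * dim_inner ** -0.5
    # Softmax over axis 2, so each row p[b, i, :] sums to 1.
    affinity = affinity - affinity.max(axis=2, keepdims=True)
    p = np.exp(affinity)
    p = p / p.sum(axis=2, keepdims=True)
    # BatchMatMul(..., trans_b=1): (B, C, M) x (B, M, N) => (B, C, N)
    y = np.matmul(g, p.transpose(0, 2, 1))
    return p, y


rng = np.random.default_rng(0)
B, C, N, M = 2, 16, 12, 3  # M < N mimics the max-pooled phi and g
theta = rng.normal(size=(B, C, N))
phi = rng.normal(size=(B, C, M))
g = rng.normal(size=(B, C, M))

p, y = nonlocal_attention(theta, phi, g)
print(p.shape, y.shape)
```

In TensorFlow, tf.linalg.matmul takes transpose_a and transpose_b arguments that play the same role as the transposes written out here.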
Dear Xiaolonw,
Hi, this is excellent work!
I am using UCF101 as training and testing data. After compiling and installing caffe2 with the new video module from the caffe2_customized_ops folder, I can run the code. However, I got an error before training started. Here is the command output:
[INFO: checkpoints.py: 128]: No checkpoint found; training from scratch...
[INFO: train_net_video.py: 127]: ------------- Training model... -------------
[INFO: metrics.py: 57]: Resetting train metrics...
[swscaler @ 0x7ea2c0021be0] (null) is not supported as input pixel format
[swscaler @ 0x7ea2b4030a20] (null) is not supported as input pixel format
[IMGUTILS @ 0x7fa2d33e6bc0] Picture size 1x0 is invalid
[IMGUTILS @ 0x7fa2d33e6bc0] Picture size 1x0 is invalid
*** Aborted at 1530171970 (unix time) try "date -d @1530171970" if you are using GNU date ***
[IMGUTILS @ 0x7fa2d1fd4bc0] Picture size 1x0 is invalid
[IMGUTILS @ 0x7fa2d1fd4bc0] Picture size 1x0 is invalid
PC: @ 0x7fa30d0bfd1a sws_scale
*** SIGSEGV (@0x40) received by PID 40756 (TID 0x7fa2d33e9700) from PID 64; stack trace: ***
@ 0x7fa389b72390 (unknown)
@ 0x7fa30d0bfd1a sws_scale
@ 0x7fa3793eece0 caffe2::CustomVideoDecoder::decodeLoop()
@ 0x7fa3793f058e caffe2::CustomVideoDecoder::decodeFile()
@ 0x7fa3793fb187 caffe2::DecodeClipFromVideoFileFlex()
@ 0x7fa342c206fc caffe2::CustomizedVideoInputOp<>::GetClipAndLabelFromDBValue()
@ 0x7fa342c20b70 caffe2::CustomizedVideoInputOp<>::DecodeAndTransform()
@ 0x7fa342c1d8b3 std::_Function_handler<>::_M_invoke()
@ 0x7fa342bf02eb caffe2::TaskThreadPool::main_loop()
@ 0x7fa3124c18f0 (unknown)
@ 0x7fa389b686ba start_thread
@ 0x7fa38918e41d clone
Segmentation fault (core dumped)
I have checked the lmdb path, and it is correct. The error happens in this line.
More details of my experiment:
I prepared the data as described in DATASET.md. Here are the steps:
(1) divide the data into a train set (70%) and a test set (30%)
(2) shuffle the train set
(3) create the lmdb with create_video_lmdb.py
I use only 1 GPU and changed the NUM_GPUS variable to 1 in configs/DBG_kinetics_resnet_8gpu_c2d_nonlocal_affine_400k.yaml.
I did not use the pre-trained model, so it trains from scratch.
My script for running the program is as follows:
CHECKPOINT_DIR=../data/checkpoints/run_i3d_nlnet_affine_400k_128f
mkdir ${CHECKPOINT_DIR}
python ../tools/train_net_video.py \
--config_file ../configs/DBG_kinetics_resnet_8gpu_c2d_nonlocal_affine_400k.yaml \
VIDEO_DECODER_THREADS 2 \
NONLOCAL.CONV3_NONLOCAL True \
NONLOCAL.CONV4_NONLOCAL True \
TRAIN.VIDEO_LENGTH 128 \
TRAIN.SAMPLE_RATE 1 \
TEST.VIDEO_LENGTH 128 \
TEST.SAMPLE_RATE 1 \
MODEL.MODEL_NAME resnet_video_org \
MODEL.VIDEO_ARC_CHOICE 2 \
TRAIN.DROPOUT_RATE 0.5 \
CHECKPOINT.DIR ${CHECKPOINT_DIR} \
DATADIR /home/lyj/video-nonlocal-net-master/data/lmdb/kinetics_lmdb_multicrop/ \
FILENAME_GT /home/lyj/vclf/data/ucfTrainTestlist/nltestlist.txt \
2>&1 | tee ${CHECKPOINT_DIR}/log.txt
Possible causes of this error: (1) the UCF101 dataset is not suitable for the code as-is and needs to be processed before training; (2) caffe2 is not installed correctly.
If you need any data or details of my experiment, feel free to ask. Hoping for your response. Thanks!
I have trained a model using run_i3d_baseline_300k_4gpu.sh on the Kinetics dataset following the guide in README.md, and I get the same top-1 and top-5 accuracy as reported there. I want to train an i3d model with non-local blocks using 4 GPUs, but the README.md gives no top-1 or top-5 accuracy for this setting. Is it possible to train this kind of model and get higher accuracy than the baseline? Why has this model not been released?
CentOS system
I prepared the lmdb files for the training and val sets. Then I ran
sh run_i3d_baseline_300k_4gpu.sh
and got this error:
[E net_dag.cc:188] Exception from operator chain starting at '' (type 'NCCLAllreduce'): caffe2::EnforceNotMet: [enforce fail at cuda_nccl_gpu.cc:24] status == ncclSuccess. 2 vs 0. Error at: /root/code/official_caffe2/caffe2/caffe2/contrib/nccl/cuda_nccl_gpu.cc24: system error Error from operator:
input: "gpu_0/pred_b_grad" input: "gpu_1/pred_b_grad" input: "gpu_2/pred_b_grad" input: "gpu_3/pred_b_grad" output: "gpu_0/pred_b_grad" output: "gpu_1/pred_b_grad" output: "gpu_2/pred_b_grad" output: "gpu_3/pred_b_grad" name: "" type: "NCCLAllreduce" device_option { device_type: 1 cuda_gpu_id: 0 }
What should I do?
I've tried to run your script "run_i3d_baseline_300k_4gpu.sh", but it takes more than 5 days instead of your 3 days.
When building the non_local operator, I found that both USE_SOFTMAX and USE_SCALE are True. Could you please explain why model.Scale is used in the script? Moreover, where can I find the source code of the operator, and why is scale=dim_inner**-.5?
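A likely motivation for the dim_inner**-.5 factor is the one usually given for scaled dot-product attention: the dot product of two vectors with unit-variance entries has a standard deviation of about sqrt(dim_inner), which pushes the softmax toward saturation. The sketch below checks this numerically on random data. It is an illustration under that assumption, not a statement of the authors' reasoning.

```python
import numpy as np

rng = np.random.default_rng(0)
dim_inner = 512
num_pairs = 4000

# Random stand-ins for theta and phi features with unit-variance entries.
theta = rng.normal(size=(num_pairs, dim_inner))
phi = rng.normal(size=(num_pairs, dim_inner))

dots = (theta * phi).sum(axis=1)    # unscaled affinities
scaled = dots * dim_inner ** -0.5   # what model.Scale would produce

# The unscaled spread grows like sqrt(dim_inner); the scaled spread stays near 1.
print(dots.std(), np.sqrt(dim_inner), scaled.std())
```

Keeping the affinities near unit scale means the softmax output does not collapse onto a single position just because dim_inner is large.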
When finetuning, the progress seems to be calculated from the Kinetics dataset size. I can't find the training set size in config.py.
I am wondering whether you use the multi-head method to improve performance.
The original self-attention uses the multi-head method.
It seems that both the original self-attention and the related work (Relation Networks for Object Detection) use "positional encoding" to help learn a better affinity matrix.
I am wondering whether you have tried using positional features. If so, it would be great if you could share the performance with positional features.
Hi, when I install Caffe2 with "make -j16 install", I get an error message saying:
/usr/bin/ld: /usr/local/lib/libavcodec.a(allcodecs.o): relocation R_X86_64_32 against 'ff_h264_cuvid_hwaccel' can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/libavcodec.a: could not read symbols: Bad value
I guess it is caused by gflags, but I don't know exactly how to fix it. Do you have any suggestions? Thank you.
Dear,
I'm studying deep learning and am looking for a guideline that can help me organize files and directories in a deep learning project. Unfortunately, I have not found one. I am opening this issue to ask for such a guideline or for some great deep learning projects written in Python.
Thanks.
PS. The link in the CODE_OF_CONDUCT should be updated.
[ 85%] Linking CXX shared module python/caffe2_pybind11_state_gpu.so
[ 85%] Built target caffe2_pybind11_state_gpu
Makefile:140: recipe for target 'all' failed
make: *** [all] Error 2
This is my cmake output:
cmake_output.txt
Line 140 of /build/Makefile is:
I have installed NNPACK using ninja, but when building caffe2 I got an error:
/usr/bin/ld: cannot find -lnnpack
/usr/bin/ld: cannot find -lpthreadpool
collect2: error: ld returned 1 exit status
caffe2/CMakeFiles/caffe2.dir/build.make:6645: recipe for target 'lib/libcaffe2.so' failed
make[2]: *** [lib/libcaffe2.so] Error 1
CMakeFiles/Makefile2:1138: recipe for target 'caffe2/CMakeFiles/caffe2.dir/all' failed
make[1]: *** [caffe2/CMakeFiles/caffe2.dir/all] Error 2
Makefile:138: recipe for target 'all' failed
make: *** [all] Error 2
Hi, I've installed pytorch (caffe2) and executed the command 'cp -r caffe2-video-nlnet/caffe2_customized_ops/video pytorch/caffe2/'.
When I run bash run_c2d_baseline_400k_32f.sh,
I get:
AttributeError: Method CustomizedVideoInput is not a registered operator. Did you mean: []
Can anyone help me?
There is also no install file for running make -j16 install,
so I checked "/home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeError.log".
I want to know how to solve it. Thanks.
Determining if the pthread_create exist failed with the following output:
Change Dir: /home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp
Run Build Command:"/usr/bin/make" "cmTC_d9137/fast"
/usr/bin/make -f CMakeFiles/cmTC_d9137.dir/build.make CMakeFiles/cmTC_d9137.dir/build
make[1]: Entering directory '/home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_d9137.dir/CheckSymbolExists.c.o
/home/kong.ye/install_pakage/yes/envs/caffe2/bin/cc -o CMakeFiles/cmTC_d9137.dir/CheckSymbolExists.c.o -c /home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c
Linking C executable cmTC_d9137
/home/kong.ye/install_pakage/yes/envs/caffe2/bin/cmake -E cmake_link_script CMakeFiles/cmTC_d9137.dir/link.txt --verbose=1
/home/kong.ye/install_pakage/yes/envs/caffe2/bin/cc -rdynamic CMakeFiles/cmTC_d9137.dir/CheckSymbolExists.c.o -o cmTC_d9137
CMakeFiles/cmTC_d9137.dir/CheckSymbolExists.c.o: In function `main':
CheckSymbolExists.c:(.text+0x16): undefined reference to `pthread_create'
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_d9137.dir/build.make:97: recipe for target 'cmTC_d9137' failed
make[1]: *** [cmTC_d9137] Error 1
make[1]: Leaving directory '/home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp'
Makefile:126: recipe for target 'cmTC_d9137/fast' failed
make: *** [cmTC_d9137/fast] Error 2
File /home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp/CheckSymbolExists.c:
/* */
#include <pthread.h>
int main(int argc, char** argv)
{
(void)argv;
#ifndef pthread_create
return ((int*)(&pthread_create))[argc];
#else
(void)argc;
return 0;
#endif
}
Determining if the function pthread_create exists in the pthreads failed with the following output:
Change Dir: /home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp
Run Build Command:"/usr/bin/make" "cmTC_2ff92/fast"
/usr/bin/make -f CMakeFiles/cmTC_2ff92.dir/build.make CMakeFiles/cmTC_2ff92.dir/build
make[1]: Entering directory '/home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp'
Building C object CMakeFiles/cmTC_2ff92.dir/CheckFunctionExists.c.o
/home/kong.ye/install_pakage/yes/envs/caffe2/bin/cc -DCHECK_FUNCTION_EXISTS=pthread_create -o CMakeFiles/cmTC_2ff92.dir/CheckFunctionExists.c.o -c /home/kong.ye/install_pakage/yes/envs/caffe2/share/cmake-3.9/Modules/CheckFunctionExists.c
Linking C executable cmTC_2ff92
/home/kong.ye/install_pakage/yes/envs/caffe2/bin/cmake -E cmake_link_script CMakeFiles/cmTC_2ff92.dir/link.txt --verbose=1
/home/kong.ye/install_pakage/yes/envs/caffe2/bin/cc -DCHECK_FUNCTION_EXISTS=pthread_create -rdynamic CMakeFiles/cmTC_2ff92.dir/CheckFunctionExists.c.o -o cmTC_2ff92 -lpthreads
/usr/bin/ld: cannot find -lpthreads
collect2: error: ld returned 1 exit status
CMakeFiles/cmTC_2ff92.dir/build.make:97: recipe for target 'cmTC_2ff92' failed
make[1]: *** [cmTC_2ff92] Error 1
make[1]: Leaving directory '/home/kong.ye/action_recognize/caffe2/build/CMakeFiles/CMakeTmp'
Makefile:126: recipe for target 'cmTC_2ff92/fast' failed
make: *** [cmTC_2ff92/fast] Error 2
Hello,
Thank you for the nice code, but I ran into a problem when training the model.
Traceback (most recent call last):
File "train_net_video.py", line 21, in
from core.config import config as cfg
ImportError: cannot import name config
Looking forward to your reply!
Great repo.
I am wondering whether you could provide a PyTorch version.
Hi,
Sorry to trouble you again, but I have a new problem:
"AttributeError: Method CustomizedVideoInput is not a registered operator. Did you mean: []"
As stated in your paper, the non-local block is initialized with zero weights, so that, together with its shortcut connection, it does not affect the original network at the start of training. But in your implementation, the non-local block is initialized in the same way as a normal conv, as the config shows:
__C.NONLOCAL.USE_ZERO_INIT_CONV = False
__C.NONLOCAL.CONV_INIT_STD = 0.01
Is there any difference in performance between these two implementations?
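To make the question concrete, here is a minimal sketch of the two initializations being compared. It is not the repository's code: the block is reduced to z = x + W_out * y at a single position, `y` is a made-up stand-in for the non-local response, and the helper names are hypothetical. The only value taken from the post is the std of 0.01.

```python
import random

def project(weights, vec):
    """Apply a dense projection (a 1x1 conv evaluated at one position)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]

def init_weights(channels, zero_init, std=0.01, seed=0):
    """Return all-zero weights, or Gaussian weights with the given std."""
    rng = random.Random(seed)
    if zero_init:
        return [[0.0] * channels for _ in range(channels)]
    return [[rng.gauss(0.0, std) for _ in range(channels)]
            for _ in range(channels)]

def residual_block(x, y, w_out):
    """z = x + W_out * y, the shortcut form described in the question."""
    return [xi + oi for xi, oi in zip(x, project(w_out, y))]

x = [1.0, -2.0, 0.5]   # input features at one position
y = [0.3, 0.7, -1.1]   # hypothetical non-local response (assumption)

z_zero = residual_block(x, y, init_weights(3, zero_init=True))
z_gauss = residual_block(x, y, init_weights(3, zero_init=False))

print(z_zero == x)  # True: with zero weights the block is an identity mapping
print(max(abs(a - b) for a, b in zip(z_gauss, x)))  # small, but not zero
```

With zero weights the block starts as an exact identity; with a small Gaussian std it starts as a slightly perturbed identity. Whether that difference affects final accuracy is exactly what the question asks, and this sketch does not answer it.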
Hi Xiaolong
Thank you for this outstanding work on non-local networks!
Since I want to test the Kinetics-pretrained non-local network on another dataset (such as UCF-101) to extract features, could you provide an inference example that reads in a single video clip and outputs softmax scores?
Thanks so much!
Because of a conflict between OpenCL and CUDA, you should add cv::ocl::setUseOpenCL(false);
and include the header file #include <opencv2/core/ocl.hpp>.
By the way, I have implemented fetching the data blob from preprocessed video RGB images, which dramatically shortens the whole training time from 30 days (when processing raw video data at runtime) to 4 days. We can discuss this code to verify its correctness if you'd like.
Hi,
I am wondering: if max pooling is used for the phi and g functions, the spatial dimension is reduced,
so how can the result be added back to the input (which has a larger spatial resolution)?
Thanks.
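A small shape check may help with this question. The sketch below assumes pooling is applied only to the phi (key) and g (value) branches while theta (query) keeps every position, so the output has one row per query. It is a simplification, not the repository's code: positions are a 1-D list rather than space-time, the learned 1x1 convs are replaced by identity embeddings, and all function names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def max_pool_pairs(rows):
    """Element-wise max over neighbouring positions, halving their count."""
    return [[max(a, b) for a, b in zip(rows[i], rows[i + 1])]
            for i in range(0, len(rows) - 1, 2)]

def non_local(x):
    """x holds N positions with C channels each; returns N positions."""
    theta = x                # queries: full resolution, N positions
    phi = max_pool_pairs(x)  # keys: pooled, N/2 positions
    g = max_pool_pairs(x)    # values: pooled, N/2 positions
    out = []
    for q in theta:
        # One attention weight per pooled key position.
        attn = softmax([sum(a * b for a, b in zip(q, k)) for k in phi])
        # Weighted sum of pooled values: one C-dim vector per query.
        out.append([sum(w * v[c] for w, v in zip(attn, g))
                    for c in range(len(q))])
    # Residual sum works because `out` has one row per input position.
    return [[xi + yi for xi, yi in zip(xr, yr)] for xr, yr in zip(x, out)]

x = [[0.1, 0.2], [0.4, -0.3], [0.0, 0.5], [-0.2, 0.1]]
z = non_local(x)
print(len(x), len(max_pool_pairs(x)), len(z))  # 4 2 4
```

Pooling shrinks only the set of positions being attended over (the N x N/2 attention matrix), not the number of output positions, so the residual sum with the input is well defined.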
In the folder process_data/kinetics, I can't find the file change_listname.py.
Please add it. Thanks.