xg-chu / crowddet
[CVPR 2020] Detection in Crowded Scenes: One Proposal, Multiple Predictions
License: MIT License
(base) root@dgx2:~/data/gvision/CrowdDet-master/tools# python3 train.py -md rcnn_fpn_baseline
Num of GPUs:2, learning rate:0.00500, mini batch size:2,
train_epoch:30, iter_per_epoch:3750, decay_epoch:[20, 26]
Init multi-processing training...
Traceback (most recent call last):
File "train.py", line 174, in <module>
run_train()
File "train.py", line 171, in run_train
multi_train(args, config, Network)
File "train.py", line 155, in multi_train
torch.multiprocessing.spawn(train_worker, nprocs=num_gpus, args=(train_config, network, config))
File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:
-- Process 0 terminated with the following error:
Traceback (most recent call last):
File "/opt/conda/lib/python3.6/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/root/data/gvision/CrowdDet-master/tools/train.py", line 82, in train_worker
backbone_dict = torch.load(train_config.init_weights)
File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 593, in load
return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
File "/opt/conda/lib/python3.6/site-packages/torch/serialization.py", line 765, in _legacy_load
raise RuntimeError("Invalid magic number; corrupt file?")
RuntimeError: Invalid magic number; corrupt file?
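One low-risk way to narrow down a report like this is to look at the first bytes of the file that train_config.init_weights points to before handing it to torch.load. The sketch below is only a heuristic and uses nothing beyond the Python standard library; the function name describe_checkpoint_file and the category strings are made up for illustration and are not part of CrowdDet or PyTorch.

```python
import zipfile


def describe_checkpoint_file(path):
    """Guess what kind of file `path` is from its first bytes (heuristic only)."""
    with open(path, "rb") as f:
        head = f.read(16)
    if not head:
        return "empty file"
    if zipfile.is_zipfile(path):
        # Recent torch.save versions write a zip container by default.
        return "zip archive"
    if head[:1] == b"\x80":
        # Pickle protocol 2 and later start with the byte 0x80.
        return "pickle stream"
    if head.lstrip()[:1] == b"<":
        # Typical for a saved web page instead of the real download.
        return "text that looks like HTML/XML"
    return "unknown binary data"
```

Calling describe_checkpoint_file on the pretrained backbone file and comparing the result (and the file size) with a fresh download can show whether the file on disk is what the training script expects.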
Your implementation of EMD loss here seems to be the same as a smooth L1 loss between an anchor and its predicted boxes. What is the difference between them? Maybe I have not understood it clearly.
Error running test.py in a Windows environment: module 'det_tools_cuda' has no attribute 'nms'. Can you tell me what to do?
Thanks a lot!
Thanks a lot for your excellent work!
I want to know how to debug your project. I don't have a graphical IDE such as PyCharm, so I can only debug with "import pdb; pdb.set_trace()". However, pdb does not work in your project: it always exits the debugging session automatically.
Do you have any suggestions? Thanks a lot!
Hello, can the currently released code be used to train on the COCO dataset, or only on the CrowdHuman dataset?
How do I continue training from the last breakpoint?
How do I continue training (to 50 epochs) from an existing result (30 epochs)?
I hit this AssertionError while training the model.
Can you help me?
Traceback (most recent call last):
File "/anaconda3/envs/fasterRCNN/lib/python3.7/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/CrowdDet/tools/train.py", line 109, in train_worker
do_train_epoch(net, data_iter, optimizer, rank, epoch_id, train_config)
File "/CrowdDet/tools/train.py", line 58, in do_train_epoch
assert torch.isfinite(total_loss).all(), outputs
AssertionError: {'loss_rpn_cls': tensor(nan, device='cuda:0', grad_fn=<MulBackward0>), 'loss_rpn_loc': tensor(inf, device='cuda:0', grad_fn=<MulBackward0>), 'loss_rcnn_loc': tensor(nan, device='cuda:0', grad_fn=<MulBackward0>), 'loss_rcnn_cls': tensor(nan, device='cuda:0', grad_fn=<MulBackward0>)}
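By the time the assertion fires, every term in the report above is already NaN or inf, so it does not show which one diverged first. A minimal sketch of how that could be tracked, assuming the per-iteration loss values are recorded as plain floats (for example via float() on each scalar loss); first_non_finite and the history list are hypothetical names, not part of train.py:

```python
import math


def first_non_finite(history):
    """Return (step, names) for the first step where any loss is NaN or inf.

    `history` is a list of dicts mapping loss names to numbers, one dict
    per training iteration. Returns None if every value is finite.
    """
    for step, losses in enumerate(history):
        bad = sorted(name for name, value in losses.items()
                     if not math.isfinite(float(value)))
        if bad:
            return step, bad
    return None


# Toy history: the RPN localisation loss blows up one step before the rest.
history = [
    {"loss_rpn_cls": 0.7, "loss_rpn_loc": 0.3},
    {"loss_rpn_cls": 0.6, "loss_rpn_loc": float("inf")},
    {"loss_rpn_cls": float("nan"), "loss_rpn_loc": float("nan")},
]
print(first_non_finite(history))  # (1, ['loss_rpn_loc'])
```

Knowing which term goes non-finite first narrows the search to that branch's targets and inputs.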
Hi, I tried to apply your method to a one-stage detector, RetinaNet. However, the pedestrian AP dropped drastically (roughly 80% -> 50%). I have checked my code and could not find any problem. I visualized the results and found that the detector has a high miss rate even with a low score threshold (e.g. 0.3). Furthermore, regarding the advantage emphasized in your paper, i.e., better detection of occluded pedestrians, I found that instead of detecting occluded pedestrians, the detector seems to produce duplicate predictions (detections not removed by NMS). I trained the detector on a proprietary dataset. I used Focal Loss for the classifier, which I think should handle the imbalance between positive and negative examples. Could you please give me some suggestions for dealing with this problem?
I ran into this error:
RuntimeError : copy_if failed to synchronize : cudaErrorIllegalAddress
an illegal memory access was encountered.
Can anyone help me? Thanks a lot.
Hi, I tried it on FCOS. It predicts 2 instance maps at the features of overlapping regions, but the final confidence of one instance is high (about 0.7) and that of the other is very low (about 0.3). Do you know why?
KeyError Traceback (most recent call last)
in
9 "emd_redine.onnx",
10 verbose=True,
---> 11 opset_version=11
12 )
/work/anaconda3/envs/crowdDet/lib/python3.6/site-packages/torch/onnx/__init__.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs)
146 operator_export_type, opset_version, _retain_param_name,
147 do_constant_folding, example_outputs,
--> 148 strip_doc_string, dynamic_axes, keep_initializers_as_inputs)
149
150
/work/anaconda3/envs/crowdDet/lib/python3.6/site-packages/torch/onnx/utils.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs)
64 _retain_param_name=_retain_param_name, do_constant_folding=do_constant_folding,
65 example_outputs=example_outputs, strip_doc_string=strip_doc_string,
---> 66 dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs)
67
68
/work/anaconda3/envs/crowdDet/lib/python3.6/site-packages/torch/onnx/utils.py in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, example_outputs, propagate, opset_version, _retain_param_name, do_constant_folding, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size)
414 example_outputs, propagate,
415 _retain_param_name, do_constant_folding,
--> 416 fixed_batch_size=fixed_batch_size)
417
418 # TODO: Don't allocate a in-memory string for the protobuf
/work/anaconda3/envs/crowdDet/lib/python3.6/site-packages/torch/onnx/utils.py in _model_to_graph(model, args, verbose, training, input_names, output_names, operator_export_type, example_outputs, propagate, _retain_param_name, do_constant_folding, _disable_torch_constant_prop, fixed_batch_size)
294 graph = _optimize_graph(graph, operator_export_type,
295 _disable_torch_constant_prop=_disable_torch_constant_prop,
--> 296 fixed_batch_size=fixed_batch_size, params_dict=params_dict)
297
298 if isinstance(model, torch.jit.ScriptModule) or isinstance(model, torch.jit.ScriptFunction):
/work/anaconda3/envs/crowdDet/lib/python3.6/site-packages/torch/onnx/utils.py in _optimize_graph(graph, operator_export_type, _disable_torch_constant_prop, fixed_batch_size, params_dict)
133 torch._C._jit_pass_erase_number_types(graph)
134
--> 135 graph = torch._C._jit_pass_onnx(graph, operator_export_type)
136 torch._C._jit_pass_lint(graph)
137
/work/anaconda3/envs/crowdDet/lib/python3.6/site-packages/torch/onnx/__init__.py in _run_symbolic_function(*args, **kwargs)
177 def _run_symbolic_function(*args, **kwargs):
178 from torch.onnx import utils
--> 179 return utils._run_symbolic_function(*args, **kwargs)
180
181
/work/anaconda3/envs/crowdDet/lib/python3.6/site-packages/torch/onnx/utils.py in _run_symbolic_function(g, n, inputs, env, operator_export_type)
654 "torch.onnx.symbolic_opset{}.{} does not exist"
655 .format(op_name, opset_version, op_name))
--> 656 op_fn = sym_registry.get_registered_op(op_name, '', opset_version)
657 return op_fn(g, *inputs, **attrs)
658
/work/anaconda3/envs/crowdDet/lib/python3.6/site-packages/torch/onnx/symbolic_registry.py in get_registered_op(opname, domain, version)
89 warnings.warn("ONNX export failed. The ONNX domain and/or version are None.")
90 global _registry
---> 91 return _registry[(domain, version)][opname]
KeyError: 'linspace'
I used the following commands:
cd tools
python3 inference.py -md rcnn_fpn_baseline -r 40 -i your_image_path.png
Each image takes about 4 seconds to process. My GPU is a 3090, so it should not be this slow.
So I would like to ask how your code should be modified to make it use the GPU. I made some changes myself, but then it hangs.
My change:
net = network().cuda(0) # code inside the inference function
Sorry to bother you! When I run test.py with the command line python3 test.py -r 2 -d 14-15, an error occurs. I am training now and have just produced dump-2.pth.
0%| | 0/4370 [00:00<?, ?it/s]
Process Process-1:
Traceback (most recent call last):
File "/home/joseph/anaconda3/envs/py36det/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/joseph/anaconda3/envs/py36det/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "test.py", line 68, in inference
image, gt_boxes, im_info, ID = get_data(record, device)
File "test.py", line 109, in get_data
image, config.eval_image_short_size, config.eval_image_max_size)
File "/home/joseph/ml_test/CrowdDet/model/crowd_fpn_baseline/dataset.py", line 113, in resize_img_by_short_and_max_size
img.shape[0], img.shape[1], short_size, max_size)
AttributeError: 'NoneType' object has no attribute 'shape'
Process Process-2:
Traceback (most recent call last):
File "/home/joseph/anaconda3/envs/py36det/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
self.run()
File "/home/joseph/anaconda3/envs/py36det/lib/python3.6/multiprocessing/process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "test.py", line 68, in inference
image, gt_boxes, im_info, ID = get_data(record, device)
File "test.py", line 109, in get_data
image, config.eval_image_short_size, config.eval_image_max_size)
File "/home/joseph/ml_test/CrowdDet/model/crowd_fpn_baseline/dataset.py", line 113, in resize_img_by_short_and_max_size
img.shape[0], img.shape[1], short_size, max_size)
AttributeError: 'NoneType' object has no attribute 'shape'
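The traceback shows that img is None when resize_img_by_short_and_max_size is called. If the image is read with a loader that returns None for a missing or unreadable file instead of raising (cv2.imread behaves this way), one thing worth checking is whether every image listed in the annotation file exists on disk. The sketch below assumes an .odgt file with one JSON object per line carrying an "ID" field, and images stored as <image_dir>/<ID>.jpg; the function name and this layout are assumptions about the local setup, not taken from the repository.

```python
import json
import os


def missing_images(odgt_path, image_dir, ext=".jpg"):
    """List image paths referenced by an .odgt file that are not on disk."""
    missing = []
    with open(odgt_path, "r") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # tolerate blank lines
            record = json.loads(line)
            # Assumed layout: <image_dir>/<ID><ext>
            path = os.path.join(image_dir, record["ID"] + ext)
            if not os.path.isfile(path):
                missing.append(path)
    return missing
```

An empty result means every listed file exists; a non-empty one points at the image folder or path settings in the config.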
Thanks for your reply!
Have you tested on a one-stage method such as RetinaNet? What were the results?
When an image has only one object box, CrowdDet raises an error.
I also used a dataset in the original format; with an image containing a single object, CrowdDet hits the same error.
BatchSize: 12
base lr: 1e-5
epoch: 40
best result:
AP: 0.8621
MR: 0.5121
JI: 0.7651
epoch: 18
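No matter how I change the lr, the loss oscillates, for example 2.6 -> 1.8 -> 2.4.
I am not sure whether this is a problem with the CrowdHuman dataset. Did you see anything like this during your training?
Thanks.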
python inference.py -md rcnn_emd_simple -r 50 -i "D:/project/51.jpg"
Traceback (most recent call last):
File "inference.py", line 118, in <module>
run_inference()
File "inference.py", line 115, in run_inference
inference(args, config, Network)
File "inference.py", line 29, in inference
pred_boxes = post_process(pred_boxes, config, im_info[0, 2])
File "inference.py", line 68, in post_process
if pred_boxes.shape[0] > config.detection_per_image and
AttributeError: 'Config' object has no attribute 'detection_per_image'
Hello author,
Recently, when I used Pytorch-Yolov3 (GitHub: https://github.com/eriklindernoren/PyTorch-YOLOv3) to train on the CrowdHuman dataset, I got the following error:
/opt/conda/conda-bld/pytorch_1565287025495/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [0,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
/opt/conda/conda-bld/pytorch_1565287025495/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: lambda ->auto::operator()(int)->auto: block: [0,0,0], thread: [1,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
Traceback (most recent call last):
File "train.py", line 105, in <module>
loss, outputs = model(imgs, targets)
File "/home/dengjie/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/dengjie/dengjie/Paper/detection/PyTorch-YOLOv3/models.py", line 262, in forward
x, layer_loss = module[0](x, targets, img_dim)
File "/home/dengjie/anaconda3/envs/pytorch/lib/python3.7/site-packages/torch/nn/modules/module.py", line 547, in __call__
result = self.forward(*input, **kwargs)
File "/home/dengjie/dengjie/Paper/detection/PyTorch-YOLOv3/models.py", line 189, in forward
ignore_thres=self.ignore_thres,
File "/home/dengjie/dengjie/Paper/detection/PyTorch-YOLOv3/utils/utils.py", line 307, in build_targets
noobj_mask[b[i], anchor_ious > ignore_thres, gj[i], gi[i]] = 0
RuntimeError: CUDA error: device-side assert triggered
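I built the file list and label files needed for training from the odgt file myself, and got the error above at runtime. What is the cause?
Thanks.
Hello, I have recently been running some experiments on your code, but I noticed that train.py does not seem to give the dataloader a distributed sampler. As a result, each GPU may receive the same images as training data, and some images may never be trained on. Does this affect training on CrowdHuman?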
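For reference, the idea behind a distributed sampler can be sketched in a few lines of plain Python: every process shuffles the index list with the same epoch-dependent seed, then keeps only every world_size-th index starting at its own rank. torch.utils.data.distributed.DistributedSampler follows the same idea; the function below is an illustration with made-up names (shard_indices, seed), not the sampler's source and not code from train.py.

```python
import random


def shard_indices(num_samples, world_size, rank, epoch, seed=0):
    """Indices one process reads in one epoch under rank-based sharding."""
    assert 0 <= rank < world_size <= num_samples
    indices = list(range(num_samples))
    # Same seed on every rank, so all ranks agree on the shuffled order.
    random.Random(seed + epoch).shuffle(indices)
    # Pad by repeating the first few indices so all ranks get equal counts.
    remainder = len(indices) % world_size
    if remainder:
        indices += indices[: world_size - remainder]
    # Each rank takes a disjoint, interleaved slice.
    return indices[rank::world_size]
```

With num_samples divisible by world_size, the shards of all ranks are disjoint and together cover every sample exactly once per epoch, which is the property the question is asking about.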
Here is your code in the function box_overlap_ignore_opr:
area_box = (box[:, 2] - box[:, 0] + 1) * (box[:, 3] - box[:, 1] + 1)
area_gt = (gt[:, 2] - gt[:, 0] + 1) * (gt[:, 3] - gt[:, 1] + 1)
width_height = torch.min(box[:, None, 2:], gt[:, 2:4]) - torch.max(
box[:, None, :2], gt[:, :2]) # [N,M,2]
My question: when you calculate the areas of gt and box, you add 1 to the width and height,
but when you calculate the width_height of the intersection box, you do not add 1. This means that even a gt box cannot reach an IoU of 1 with itself.
So I think the right way to calculate width_height is this:
width_height = torch.min(box[:, None, 2:], gt[:, 2:4]) - torch.max(
box[:, None, :2], gt[:, :2]) + 1
Am I right? :)
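The effect described above can be checked with a small scalar version of the computation. This is a sketch for a single pair of [x1, y1, x2, y2] boxes, not the tensor code from the repository; the clamp at zero and the function name iou are additions for illustration.

```python
def iou(box, gt, add_one_to_intersection):
    """IoU of two [x1, y1, x2, y2] boxes with the '+1' convention for areas."""
    area_box = (box[2] - box[0] + 1) * (box[3] - box[1] + 1)
    area_gt = (gt[2] - gt[0] + 1) * (gt[3] - gt[1] + 1)
    offset = 1 if add_one_to_intersection else 0
    # Width and height of the intersection, clamped at zero for disjoint boxes.
    w = max(min(box[2], gt[2]) - max(box[0], gt[0]) + offset, 0)
    h = max(min(box[3], gt[3]) - max(box[1], gt[1]) + offset, 0)
    inter = w * h
    return inter / (area_box + area_gt - inter)


box = [0, 0, 9, 9]
print(iou(box, box, add_one_to_intersection=False))  # 81 / 119, about 0.68
print(iou(box, box, add_one_to_intersection=True))   # 1.0
```

For this 10x10 box the mixed convention gives an IoU of 81/119 with itself, while the consistent "+1" convention gives exactly 1. The gap shrinks as boxes get larger.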
Hi, many thanks for your nice work! Is this code suitable for head detection on the CrowdHuman dataset? I see you use full-body detection. If it is possible, what should I change for head detection?
This code is currently unlicensed. Would it be possible to add a reasonably permissive license?
Regarding the usage of the tools/eval_json.py tool: what format should the json file and the gt file be in?
solved
Hi, I am interested in your work and I found that the FPN baseline reported in your paper is AP 85.8, but the result here is AP 87.13. That is a large gap. Are there setting differences between these two results? I notice test_nms is set to 0.5 for both.
Thanks for your great work!
I would like to ask a question about the evaluation process.
Can evaluate/compute_APMR.py compute the AP value for CityPersons?
Thanks a lot!
Hello, I reproduced cascade_rmd_simple in another framework, but MR and JI only reach 41.0 and 82.8, about the same as Faster R-CNN. Could you upload the cascade code?
What do image_mean and image_std mean? Thanks.
The paper mentions that K=2 suits most of the datasets best. How do the results behave if a higher value of K is used?
Traceback (most recent call last):
File "train.py", line 189, in <module>
run_train()
File "train.py", line 186, in run_train
multi_train(args, config, Network)
File "train.py", line 170, in multi_train
torch.multiprocessing.spawn(train_worker, nprocs=num_gpus, args=(train_config, network, config))
File "/home/dxc/miniconda3/envs/lmrl/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 200, in spawn
return start_processes(fn, args, nprocs, join, daemon, start_method='spawn')
File "/home/dxc/miniconda3/envs/lmrl/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 158, in start_processes
while not context.join():
File "/home/dxc/miniconda3/envs/lmrl/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 119, in join
raise Exception(msg)
Exception:
-- Process 1 terminated with the following error:
Traceback (most recent call last):
File "/home/dxc/miniconda3/envs/lmrl/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 20, in _wrap
fn(i, *args)
File "/home/dxc/project/CrowdDet/tools/train.py", line 100, in train_worker
net.resnet50.load_state_dict(backbone_dict['state_dict'])
File "/home/dxc/miniconda3/envs/lmrl/lib/python3.8/site-packages/torch/nn/modules/module.py", line 846, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for ResNet50:
Missing key(s) in state_dict: "conv1.bias".
Hi,
I tried to reproduce the result of EMD_simple without SetNMS. According to the paper, this setting should still be better than the baseline.
I only changed the value of if_set_nms to False in this line and tested with the provided EMD_simple model, but I got very bad results: mAP 0.5202, mMR 0.9828. I have checked the code carefully and did not find anything wrong. The results for SetNMS + MIP are normal (mAP 0.9041, mMR 0.4255).
Is there anything else I missed when applying normal NMS? Any suggestions?
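For readers comparing the two modes: as the paper describes it, Set NMS differs from ordinary greedy NMS in one check, namely that a box is never suppressed by another box predicted from the same proposal. The following is a plain-Python sketch of that rule, not the repository's implementation; nms_keep, proposal_ids, and the list-based box format are illustrative assumptions.

```python
def box_iou(a, b):
    """IoU of two [x1, y1, x2, y2] boxes with continuous coordinates."""
    w = max(min(a[2], b[2]) - max(a[0], b[0]), 0.0)
    h = max(min(a[3], b[3]) - max(a[1], b[1]), 0.0)
    inter = w * h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms_keep(boxes, scores, proposal_ids, iou_thresh=0.5, set_nms=True):
    """Greedy NMS; with set_nms=True, boxes from one proposal never suppress each other."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    suppressed = set()
    keep = []
    for pos, i in enumerate(order):
        if i in suppressed:
            continue
        keep.append(i)
        for j in order[pos + 1:]:
            if j in suppressed:
                continue
            if set_nms and proposal_ids[i] == proposal_ids[j]:
                continue  # same proposal: skip the suppression check
            if box_iou(boxes[i], boxes[j]) > iou_thresh:
                suppressed.add(j)
    return keep


# Boxes 0 and 1 come from proposal 0; box 2 comes from proposal 1.
boxes = [[0, 0, 10, 10], [1, 0, 11, 10], [0, 1, 10, 11]]
scores = [0.9, 0.8, 0.7]
proposal_ids = [0, 0, 1]
print(nms_keep(boxes, scores, proposal_ids, set_nms=True))   # [0, 1]
print(nms_keep(boxes, scores, proposal_ids, set_nms=False))  # [0]
```

In the toy data, the second prediction of proposal 0 survives only under Set NMS, while the overlapping box from a different proposal is removed in both modes.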
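Hello, I have a few questions:
1. Are there plans to add the paper's method to mmdet? Once it is integrated, it would be easier to combine with other methods, and it would also allow a fairer comparison of the performance of each method.
2. If one proposal is matched to two gt boxes, and both gt boxes have the same class label, say person, is the proposal's label during training two 1s? (Here 1 denotes the label of the person class.)
3. Also, is the refinement module in the paper the same as the ** of cascade? That is, take the first prediction, do RoI pooling again, and then do classification and regression once more? I see the code has both one-stage and two-stage implementations; can the refinement module only be used with two-stage detectors?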
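I think there is a problem here. In the RetinaNet part, the index of the anchor with the highest IoU for each gt is taken, and the label at that index is set to the gt's label. However, the top_k gts per anchor were selected earlier. Suppose top_k=2 and the number of anchors is A; then the number of labels is A*2. At that point the anchor indices in gt_assignment_for_gt no longer correspond to the original anchor indices, because the anchors have effectively been replicated top_k times and their order has changed.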
if config.allow_low_quality:
    labels[gt_assignment_for_gt] = gt_boxes_perimg[:, 4]
    low_quality_bbox_targets = bbox_transform_opr(
        anchors[gt_assignment_for_gt], gt_boxes_perimg[:, :4])
    bbox_targets[gt_assignment_for_gt] = low_quality_bbox_targets
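I would like to ask the author: when training on the CityPersons dataset, is any augmentation beyond the data augmentation provided in the code needed to enlarge the training set? If the network is trained on the reasonable subset of CityPersons, will it overfit?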
config:
nums_gpu: 1,
batch_size: 4,
lr: 1e-4 * 4.25,
The training loss is very volatile. For example, the loss jumps from 1.4 to 2.5 and then decreases to 1.6.
After 20 epochs the loss still changes drastically. Is the number of epochs too small to get a good result, or is the lr too high?
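With a mini-batch of only a few images, per-iteration loss values are noisy by nature, so jumps such as 1.4 -> 2.5 -> 1.6 do not say much on their own. One common way to judge the trend is to look at a smoothed curve. The sketch below computes an exponential moving average over logged loss values; ema and alpha are illustrative names, and the smoothing factor is an arbitrary choice.

```python
def ema(values, alpha=0.1):
    """Exponential moving average of a sequence of loss values."""
    smoothed = []
    avg = None
    for v in values:
        # The first value initialises the average; later values are blended in.
        avg = v if avg is None else alpha * v + (1 - alpha) * avg
        smoothed.append(avg)
    return smoothed
```

If the smoothed curve keeps decreasing over epochs, the jitter is mostly batch noise; if it is flat or rising, the learning rate is a more likely suspect.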
When I run python3 inference.py -md rcnn_fpn_baseline -r 40 -i test.jpeg, I get:
../model/rcnn_fpn_baseline/outputs/model_dump/dump-40.pth
Traceback (most recent call last):
File "inference.py", line 118, in <module>
run_inference()
File "inference.py", line 115, in run_inference
inference(args, config, Network)
File "inference.py", line 28, in inference
pred_boxes = net(resized_img, im_info).numpy()
File "/home/xs/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "../model/rcnn_fpn_baseline/network.py", line 31, in forward
return self._forward_test(image, im_info)
File "../model/rcnn_fpn_baseline/network.py", line 49, in _forward_test
pred_bbox = self.RCNN(fpn_fms, rpn_rois)
File "/home/xs/.local/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
result = self.forward(*input, **kwargs)
File "../model/rcnn_fpn_baseline/network.py", line 76, in forward
pool_features = roi_pooler(fpn_fms, rcnn_rois, stride, (7, 7), "ROIAlignV2")
File "../lib/layers/pooler.py", line 42, in roi_pooler
sampling_ratio=-1, aligned=pooler_aligned)
TypeError: roi_align() got an unexpected keyword argument 'aligned'
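Hello, I learned a lot from your paper. I am currently trying to modify an anchor-free algorithm and am using the RetinaNet + EMD changes as a reference, but I do not quite understand the code in retina_anchor_target.py.
I would like to ask what your approach to modifying RetinaNet was, and which parts you mainly changed.
What I especially want to ask is: each prior anchor predicts two sets of regression values, so how are the GTs matched with the two sets of anchor predictions to compute the loss?
For example, if a location has only one GT box but two bboxes are predicted, how are they matched?
Thanks!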
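As a pointer for the matching question: the paper defines the EMD loss as the minimum, over all orderings of the K targets, of the summed per-pair losses, and when a proposal has fewer than K matched ground-truth boxes the missing targets are filled with dummy background boxes that carry no regression loss. A toy sketch of the minimum-over-orderings part follows. It uses scalar "predictions" and an absolute-difference loss purely for illustration; emd_loss and pair_loss are hypothetical names, and the real per-pair loss combines classification and regression terms.

```python
from itertools import permutations


def emd_loss(predictions, targets, pair_loss):
    """Minimum over all target orderings of the summed per-pair loss."""
    assert len(predictions) == len(targets)
    return min(
        sum(pair_loss(p, t) for p, t in zip(predictions, perm))
        for perm in permutations(targets)
    )


def abs_diff(p, t):
    """Toy per-pair loss used only in this example."""
    return abs(p - t)


# Two predictions and two targets: the cheaper of the two pairings is chosen.
print(emd_loss([0.9, 0.1], [0.0, 1.0], abs_diff))  # about 0.2
```

With K=2 there are only two orderings to compare, so the loss is simply the smaller of the two pairings.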
torch 1.5 does not seem to support CUDA 10.0 :)
Besides, what is your torchvision version? I got a "no operator nms" exception when calling torchvision.nms.
The paper says experiments were run on COCO and CityPersons. Do these two datasets need to be converted to the same format as CrowdHuman? Could you provide the script that does the format conversion?
Hi,
when using my own data, an error occurred.
File "../../lib/det_opr/fpn_roi_target.py", line 47, in fpn_roi_target
fg_mask = fg_mask.reshape(-1, top_k)
RuntimeError: shape '[-1, 2]' is invalid for input of size 2001
I checked the code and cannot find what to modify.
gt_boxes_perimg = gt_boxes[bid, :int(im_info[bid, 5]), :]
IndexError: index 1 is out of bounds for dimension 0 with size 1
Hi, I am trying to implement CrowdDet on my own, and I wonder what your strategy is for multi-class detection. In your code fpn_roi_target.py, I notice that you simply take the labels of the top-2 IoU boxes as the targets for each proposal, so could they be two different classes?
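The situation asked about can be reproduced with a toy version of top-k target selection. This is a sketch of the general idea only (pick the k ground-truth boxes with the highest IoU for one proposal and read off their labels); it is not the code in fpn_roi_target.py, and the class id 2 for a second category is a hypothetical example.

```python
def top_k_labels(ious, gt_labels, k=2):
    """Labels of the k ground-truth boxes with the highest IoU for one proposal."""
    order = sorted(range(len(ious)), key=lambda j: ious[j], reverse=True)[:k]
    return [gt_labels[j] for j in order]


# One proposal overlapping two persons (label 1) and one object of another class (label 2).
print(top_k_labels([0.62, 0.10, 0.55], [1, 1, 2]))  # [1, 2]
```

With a plain top-k over IoU, nothing prevents the two selected targets from having different class labels, which is exactly the case the question raises.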