lbin / centernet-better-plus
License: MIT License
The code has a problem: training does not converge at all
dataset: coco2017
IMS_PER_BATCH: 2
[08/21 15:47:57 d2.data.common]: Serializing 117266 elements to byte tensors and concatenating them all ...
[08/21 15:48:01 d2.data.common]: Serialized dataset takes 451.21 MiB
[08/21 15:48:01 d2.data.build]: Using training sampler TrainingSampler
[08/21 15:48:05 fvcore.common.checkpoint]: No checkpoint found. Initializing model from scratch
[08/21 15:48:05 d2.engine.train_loop]: Starting training from iteration 0
[08/21 15:48:06 d2.utils.events]: eta: 2:36:14 iter: 19 total_loss: 20.71 loss_cls: 18.63 loss_box_wh: 3.366 loss_center_reg: 0.3933 time: 0.0750 data_time: 0.0180 lr: 0.00039962 max_mem: 411M
[08/21 15:48:08 d2.utils.events]: eta: 2:39:49 iter: 39 total_loss: 12.1 loss_cls: 8.956 loss_box_wh: 1.814 loss_center_reg: 0.368 time: 0.0758 data_time: 0.0027 lr: 0.00079922 max_mem: 411M
[08/21 15:48:10 d2.utils.events]: eta: 2:40:21 iter: 59 total_loss: 10.76 loss_cls: 6.582 loss_box_wh: 3.11 loss_center_reg: 0.3979 time: 0.0760 data_time: 0.0024 lr: 0.0011988 max_mem: 411M
[08/21 15:48:11 d2.utils.events]: eta: 2:40:19 iter: 79 total_loss: 11.63 loss_cls: 7.729 loss_box_wh: 2.653 loss_center_reg: 0.2949 time: 0.0762 data_time: 0.0027 lr: 0.0015984 max_mem: 411M
[08/21 15:48:13 d2.utils.events]: eta: 2:39:15 iter: 99 total_loss: 11.51 loss_cls: 6.495 loss_box_wh: 2.633 loss_center_reg: 0.2932 time: 0.0757 data_time: 0.0027 lr: 0.001998 max_mem: 411M
[08/21 15:48:14 d2.utils.events]: eta: 2:39:43 iter: 119 total_loss: 17.47 loss_cls: 11.55 loss_box_wh: 2.588 loss_center_reg: 0.2923 time: 0.0757 data_time: 0.0025 lr: 0.0023976 max_mem: 411M
[08/21 15:48:16 d2.utils.events]: eta: 2:38:11 iter: 139 total_loss: 13.35 loss_cls: 8.074 loss_box_wh: 3.39 loss_center_reg: 0.267 time: 0.0755 data_time: 0.0024 lr: 0.0027972 max_mem: 411M
[08/21 15:48:17 d2.utils.events]: eta: 2:39:10 iter: 159 total_loss: 12.71 loss_cls: 8.765 loss_box_wh: 2.91 loss_center_reg: 0.2659 time: 0.0758 data_time: 0.0026 lr: 0.0031968 max_mem: 411M
ERROR [08/21 15:48:18 d2.engine.train_loop]: Exception during training:
Traceback (most recent call last):
File "/home/ma-user/work/Projects/CenterNet-better-plus/detectron2-master/detectron2/engine/train_loop.py", line 141, in train
self.run_step()
File "/home/ma-user/work/Projects/CenterNet-better-plus/detectron2-master/detectron2/engine/train_loop.py", line 244, in run_step
self._detect_anomaly(losses, loss_dict)
File "/home/ma-user/work/Projects/CenterNet-better-plus/detectron2-master/detectron2/engine/train_loop.py", line 257, in _detect_anomaly
self.iter, loss_dict
FloatingPointError: Loss became infinite or NaN at iteration=167!
loss_dict = {'loss_cls': tensor(inf, device='cuda:0', grad_fn=<MulBackward0>), 'loss_box_wh': tensor(2.2988, device='cuda:0', grad_fn=<MulBackward0>), 'loss_center_reg': tensor(0.2414, device='cuda:0', grad_fn=<MulBackward0>), 'data_time': 0.0025475993752479553}
[08/21 15:48:18 d2.engine.hooks]: Overall training speed: 165 iterations in 0:00:12 (0.0762 s / it)
[08/21 15:48:18 d2.engine.hooks]: Total training time: 0:00:12 (0:00:00 on hooks)
Traceback (most recent call last):
File "train_net.py", line 67, in <module>
args=(args,),
File "/home/ma-user/work/Projects/CenterNet-better-plus/detectron2-master/detectron2/engine/launch.py", line 62, in launch
main_func(*args)
File "train_net.py", line 55, in main
return trainer.train()
File "/home/ma-user/work/Projects/CenterNet-better-plus/detectron2-master/detectron2/engine/defaults.py", line 402, in train
super().train(self.start_iter, self.max_iter)
File "/home/ma-user/work/Projects/CenterNet-better-plus/detectron2-master/detectron2/engine/train_loop.py", line 141, in train
self.run_step()
File "/home/ma-user/work/Projects/CenterNet-better-plus/detectron2-master/detectron2/engine/train_loop.py", line 244, in run_step
self._detect_anomaly(losses, loss_dict)
File "/home/ma-user/work/Projects/CenterNet-better-plus/detectron2-master/detectron2/engine/train_loop.py", line 257, in _detect_anomaly
self.iter, loss_dict
FloatingPointError: Loss became infinite or NaN at iteration=167!
loss_dict = {'loss_cls': tensor(inf, device='cuda:0', grad_fn=<MulBackward0>), 'loss_box_wh': tensor(2.2988, device='cuda:0', grad_fn=<MulBackward0>), 'loss_center_reg': tensor(0.2414, device='cuda:0', grad_fn=<MulBackward0>), 'data_time': 0.0025475993752479553}
Why does the loss blow up? Is something wrong? The only change I made was reducing IMS_PER_BATCH from 128 to 2, because I did not have enough memory.
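One possible cause, offered as an assumption rather than a confirmed diagnosis: the learning rate was not reduced along with the batch size. The logged `lr` values are consistent with a linear warmup toward a base rate of about 0.02 (assuming a 1000-iteration warmup with a factor of 0.001), which would have been tuned for a batch of 128. The sketch below applies the common linear scaling rule; `scale_lr` and `warmup_lr` are hypothetical helper names, not part of this repository or of detectron2.

```python
def scale_lr(base_lr: float, base_batch: int, new_batch: int) -> float:
    """Linear scaling rule: keep the ratio lr / batch_size constant."""
    return base_lr * new_batch / base_batch


def warmup_lr(base_lr: float, it: int, warmup_iters: int = 1000,
              warmup_factor: float = 0.001) -> float:
    """Linear warmup from base_lr * warmup_factor up to base_lr.

    warmup_iters and warmup_factor are assumed values, chosen because
    they reproduce the lr column in the log above.
    """
    if it >= warmup_iters:
        return base_lr
    alpha = it / warmup_iters
    return base_lr * (warmup_factor * (1 - alpha) + alpha)


# Assumed base LR of 0.02 for batch size 128 (inferred from the log).
print(warmup_lr(0.02, 19))    # compare with the lr logged at iter 19
print(scale_lr(0.02, 128, 2)) # suggested base LR for IMS_PER_BATCH = 2
```

Under these assumptions the scaled base rate for a batch of 2 is 0.02 * 2 / 128 = 0.0003125, which could be set through `SOLVER.BASE_LR`. The number of iterations would normally be increased by the same factor of 64 to see the same amount of data.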
I installed PyTorch 1.4.0, which is not compatible with the current version of detectron2.
It fails with "ModuleNotFoundError: No module named 'torch.utils.hipify'".
Which version of detectron2 are you using?
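When comparing environments, it helps to report the exact installed versions. The following is a minimal sketch using only the standard library (Python 3.8+); `installed_version` is a hypothetical helper, and the package names are the usual distribution names, assumed to match how the packages were installed.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Print the versions most relevant to this compatibility question.
for name in ("torch", "torchvision", "detectron2"):
    print(name, installed_version(name) or "not installed")
```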
With ResNet-18, I did not change any configuration files, yet I only got 22.8 mAP.
My config:
CUDNN_BENCHMARK: false
DATALOADER:
  ASPECT_RATIO_GROUPING: true
  FILTER_EMPTY_ANNOTATIONS: true
  NUM_WORKERS: 4
  REPEAT_THRESHOLD: 0.0
  SAMPLER_TRAIN: TrainingSampler
DATASETS:
  PRECOMPUTED_PROPOSAL_TOPK_TEST: 1000
  PRECOMPUTED_PROPOSAL_TOPK_TRAIN: 2000
  PROPOSAL_FILES_TEST: []
  PROPOSAL_FILES_TRAIN: []
  TEST:
How can I measure inference speed (FPS)?
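A generic way to estimate FPS is to time a fixed number of forward passes after a few warmup runs. The sketch below is not taken from this repository; `measure_fps`, `infer`, and `sync` are hypothetical names, and the model call is left as any Python callable.

```python
import time
from typing import Any, Callable, Iterable, Optional


def measure_fps(infer: Callable[[Any], Any], inputs: Iterable[Any],
                warmup: int = 5,
                sync: Optional[Callable[[], None]] = None) -> float:
    """Return processed inputs per second for the callable `infer`."""
    items = list(inputs)
    if not items:
        raise ValueError("inputs must not be empty")
    # Warmup runs are excluded from timing (caches, lazy initialization).
    for x in items[:warmup]:
        infer(x)
    if sync is not None:
        sync()  # wait for pending asynchronous work before starting the clock
    start = time.perf_counter()
    for x in items:
        infer(x)
    if sync is not None:
        sync()  # wait for the last batch to actually finish
    elapsed = time.perf_counter() - start
    return len(items) / elapsed


# Stand-in workload; replace the lambda with a real model call.
fps = measure_fps(lambda x: sum(range(10000)), range(100))
print(f"{fps:.1f} FPS")
```

For a PyTorch model on a GPU, passing `sync=torch.cuda.synchronize` matters because CUDA kernels run asynchronously; without it the timer may stop before the work is done. The model should also be in eval mode and called under `torch.no_grad()`.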