facebookresearch / detr
End-to-End Object Detection with Transformers
License: Apache License 2.0
Environment
pytorch 1.3.1
torchvision 0.4.2
I am able to train the model successfully. However, the following error appears when I run the evaluation independently.
srun --gres gpu:1 python main.py --batch_size 2 --no_aux_loss --eval --resume https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth --coco_path ../../dataset/
Traceback (most recent call last):
File "main.py", line 248, in
main(args)
File "main.py", line 106, in main
utils.init_distributed_mode(args)
File "/mnt/lustre/chenyuntao1/homes/gaopeng/mask_detr/detr/util/misc.py", line 416, in init_distributed_mode
world_size=args.world_size, rank=args.rank)
File "/mnt/lustre/chenyuntao1/homes/gaopeng/anaconda3/envs/detr/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 400, in init_process_group
store, rank, world_size = next(rendezvous(url))
File "/mnt/lustre/chenyuntao1/homes/gaopeng/anaconda3/envs/detr/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 130, in _env_rendezvous_handler
raise _env_error("MASTER_ADDR")
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable MASTER_ADDR expected, but not set
Describe what you want to do, including:
Will code be added/released to generate the decoder attention heatmaps like in the paper (i.e. the zebra and elephant images)?
I've found heatmaps to be very useful for debugging training and understanding model performance, so I'm hoping the code used in the paper will be released so we can generate them for our own DETR models.
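In the meantime, here is a minimal sketch of one way to grab the decoder cross-attention weights with a forward hook; the torch.hub entry point and the module path transformer.decoder.layers[-1].multihead_attn are my assumptions, so adapt as needed.
import torch

# Assumption: the standard detr_resnet50 hub entry point.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True).eval()

attn_maps = []
# nn.MultiheadAttention returns (output, attention_weights); the hook keeps the weights,
# which have shape (batch, num_queries, H*W) over the flattened conv feature map.
hook = model.transformer.decoder.layers[-1].multihead_attn.register_forward_hook(
    lambda module, inputs, outputs: attn_maps.append(outputs[1]))

x = torch.rand(1, 3, 800, 1066)   # stand-in for a normalized input image
with torch.no_grad():
    model(x)
hook.remove()

# Reshape H*W back to the conv feature map size (roughly H/32 x W/32) to plot one heatmap per query.
print(attn_maps[0].shape)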
NOTE:
Only general answers are provided.
If you want to ask about "why X did not work", please use the
Unexpected behaviors issue template.
About how to implement new models / new dataloader / new training logic, etc., check documentation first.
We do not answer general machine learning / computer vision questions that are not specific to DETR, such as how a model works, how to improve your training/make it converge, or what algorithm/methods can be used to achieve X.
Hi, DETR teams,
According to the implementation in ultralytics/yolov3#310 (comment), and a similar discussion in AlexeyAB/darknet#3114 (comment), it seems that augmentations such as the mosaic technique help with detecting smaller objects. I quote Jocher's conclusion below.
The smaller cars are detected earlier with less blinking and cars of all sizes show better behaved bounding boxes.
I checked make_coco_transforms in this repo and visualized the augmented images and labels on the VOC dataset (using the same config as make_coco_transforms here). Because of RandomSizeCrop, all the labels associated with an image may be cropped away. (So this repo supports training with no targets in an image?)
I would like to know whether there are any plans regarding data augmentation.
Thank you!
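For reference, here is a small illustration (my own sketch, not the repo's transform code) of why a random crop can leave an image with zero targets: boxes are clipped to the crop window and any box with no remaining area is dropped.
import torch

def crop_boxes(boxes_xyxy, x0, y0, crop_w, crop_h):
    # Shift boxes into the crop's coordinate frame and clamp them to its borders.
    boxes = boxes_xyxy - torch.tensor([x0, y0, x0, y0], dtype=boxes_xyxy.dtype)
    boxes[:, 0::2] = boxes[:, 0::2].clamp(min=0, max=crop_w)
    boxes[:, 1::2] = boxes[:, 1::2].clamp(min=0, max=crop_h)
    # Keep only boxes that still have positive area inside the crop.
    keep = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
    return boxes[keep]

# A crop taken far away from the only object leaves an empty target tensor.
boxes = torch.tensor([[10., 10., 50., 50.]])
print(crop_boxes(boxes, 300, 300, 200, 200))   # tensor of shape (0, 4)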
Sorry for the naive question.
When scaling from 8 GPUs to 16 GPUs, I guess we need to double the learning rate accordingly? However, this is never made clear in the paper.
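For context, the usual linear scaling rule would look like this; this is the common heuristic, not something stated in the paper, so treat it as an assumption to be validated.
base_lr = 1e-4            # learning rate tuned for 8 GPUs x batch_size 2 = 16 images
ref_batch = 8 * 2
new_batch = 16 * 2        # 16 GPUs with the same per-GPU batch size
scaled_lr = base_lr * new_batch / ref_batch
print(scaled_lr)          # 2e-04, i.e. simply doubled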
Hi Team,
I am working on a custom dataset with 7 classes and 1500 images, and I want to train DETR on it. Could you help me with how to train the model?
Thanks in advance
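A minimal sketch of one common fine-tuning recipe, under the assumption that the torch.hub detr_resnet50 entry point is used (my own sketch, not an official recipe): start from the pretrained COCO weights and swap the classification head for one sized to your classes plus the no-object slot.
import torch
from torch import nn

num_classes = 7   # your categories; DETR adds one extra "no object" logit

# Assumption: the standard torch.hub entry point for the ResNet-50 model.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)

# Replace the classification head; in_features is 256 for the default model.
hidden_dim = model.class_embed.in_features
model.class_embed = nn.Linear(hidden_dim, num_classes + 1)

# From here, convert the 1500 images to COCO-style annotations and either point
# main.py at them via --coco_path or plug this model into your own training loop.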
How do I train a new model for custom object detection in Google Colab?
Very nice repo!
I want to do a simple inference with the panoptic segmentation model. How can I visualize the output of the panoptic model after the "panoptic post-processing"?
Thanks ;)
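A minimal sketch of one way to turn the post-processed output into a color image, assuming the panoptic post-processor returns a dict with a 'png_string' field encoding segment ids (my reading of the repo, treat it as an illustration rather than the official demo). `out`, `postprocessor` and `img_tensor` are placeholders for the model output, the panoptic post-processor and the preprocessed image tensor from your own script.
import io
import numpy as np
import torch
from PIL import Image
from panopticapi.utils import rgb2id

result = postprocessor(out, torch.as_tensor(img_tensor.shape[-2:]).unsqueeze(0))[0]

# The post-processed prediction carries the segment ids as a PNG string.
panoptic_seg = np.array(Image.open(io.BytesIO(result['png_string'])), dtype=np.uint8)
segment_ids = rgb2id(panoptic_seg)

# Give every segment a random color and display the result.
palette = np.random.randint(0, 256, size=(segment_ids.max() + 1, 3), dtype=np.uint8)
Image.fromarray(palette[segment_ids]).show()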
Hi, how do I run inference on a single image with DETR?
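A minimal single-image sketch along the lines of the official demo; the image path, the 0.9 threshold and the normalization constants are assumptions on my side.
import torch
import torchvision.transforms as T
from PIL import Image

model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True).eval()

transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open('my_image.jpg').convert('RGB')   # hypothetical local image
x = transform(img).unsqueeze(0)

with torch.no_grad():
    out = model(x)

# Keep predictions whose best non-"no object" class score passes a threshold.
probs = out['pred_logits'].softmax(-1)[0, :, :-1]
keep = probs.max(-1).values > 0.9
print(probs[keep].argmax(-1), out['pred_boxes'][0, keep])   # labels and normalized cxcywh boxes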
I ran torch.onnx.export on the demo model provided here and on a model from torch.hub. The demo model is exported successfully while the other model fails.
# works
torch.onnx.export(detr_demo, sample_input, 'detr_demo.onnx', opset_version=10)

# does not work
detr = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
detr.eval()
torch.onnx.export(detr, sample_input, 'detr.onnx', opset_version=10)
see full code here
The error log is as follows:
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:59: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:60: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
/usr/local/lib/python3.6/dist-packages/torch/tensor.py:467: RuntimeWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
'incorrect results).', category=RuntimeWarning)
/root/.cache/torch/hub/facebookresearch_detr_master/util/misc.py:294: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
batch_shape = (len(tensor_list),) + max_size
/root/.cache/torch/hub/facebookresearch_detr_master/util/misc.py:301: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
/root/.cache/torch/hub/facebookresearch_detr_master/util/misc.py:302: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
m[: img.shape[1], :img.shape[2]] = False
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-19-968e97398387> in <module>()
11
12 torch.onnx.export(detr_demo, sample_input, 'detr_demo.onnx', opset_version = 10)
---> 13 torch.onnx.export(detr, sample_input, 'detr.onnx', opset_version = 10)
8 frames
/usr/local/lib/python3.6/dist-packages/torch/onnx/__init__.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
166 do_constant_folding, example_outputs,
167 strip_doc_string, dynamic_axes, keep_initializers_as_inputs,
--> 168 custom_opsets, enable_onnx_checker, use_external_data_format)
169
170
/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
67 dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs,
68 custom_opsets=custom_opsets, enable_onnx_checker=enable_onnx_checker,
---> 69 use_external_data_format=use_external_data_format)
70
71
/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, example_outputs, propagate, opset_version, _retain_param_name, do_constant_folding, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, enable_onnx_checker, use_external_data_format)
486 example_outputs, propagate,
487 _retain_param_name, val_do_constant_folding,
--> 488 fixed_batch_size=fixed_batch_size)
489
490 # TODO: Don't allocate a in-memory string for the protobuf
/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py in _model_to_graph(model, args, verbose, training, input_names, output_names, operator_export_type, example_outputs, propagate, _retain_param_name, do_constant_folding, _disable_torch_constant_prop, fixed_batch_size)
349 graph = _optimize_graph(graph, operator_export_type,
350 _disable_torch_constant_prop=_disable_torch_constant_prop,
--> 351 fixed_batch_size=fixed_batch_size, params_dict=params_dict)
352
353 if isinstance(model, torch.jit.ScriptModule) or isinstance(model, torch.jit.ScriptFunction):
/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py in _optimize_graph(graph, operator_export_type, _disable_torch_constant_prop, fixed_batch_size, params_dict)
152 torch._C._jit_pass_erase_number_types(graph)
153
--> 154 graph = torch._C._jit_pass_onnx(graph, operator_export_type)
155 torch._C._jit_pass_lint(graph)
156
/usr/local/lib/python3.6/dist-packages/torch/onnx/__init__.py in _run_symbolic_function(*args, **kwargs)
197 def _run_symbolic_function(*args, **kwargs):
198 from torch.onnx import utils
--> 199 return utils._run_symbolic_function(*args, **kwargs)
200
201
/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py in _run_symbolic_function(g, n, inputs, env, operator_export_type)
738 .format(op_name, opset_version, op_name))
739 op_fn = sym_registry.get_registered_op(op_name, '', opset_version)
--> 740 return op_fn(g, *inputs, **attrs)
741
742 elif ns == "prim":
/usr/local/lib/python3.6/dist-packages/torch/onnx/symbolic_helper.py in wrapper(g, *args)
127 assert len(arg_descriptors) >= len(args)
128 args = [_parse_arg(arg, arg_desc) for arg, arg_desc in zip(args, arg_descriptors)]
--> 129 return fn(g, *args)
130 # In Python 2 functools.wraps chokes on partially applied functions, so we need this as a workaround
131 try:
/usr/local/lib/python3.6/dist-packages/torch/onnx/symbolic_opset9.py in ones(g, sizes, dtype, layout, device, pin_memory)
1409 dtype = 6 # float
1410 return g.op("ConstantOfShape", sizes,
-> 1411 value_t=torch.tensor([1], dtype=sym_help.scalar_type_to_pytorch_type[dtype]))
1412
1413
IndexError: list index out of range
It should be possible to export a model from torch.hub just like the demo model.
Google colab
Hi.
In original paper, it mentioned in Sec. 4 that
To optimize for AP, we override the prediction of these slots with the second highest scoring class, using the corresponding confidence. This improves AP by 2 points compared to filtering out empty slots.
But I didn't see any corresponding code in this repo. Did I miss something, or is it not implemented here?
Thank you.
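For what it's worth, here is a minimal sketch of what that trick could look like on the output logits (my own illustration of the sentence quoted above, not code from this repo; I assume the last logit index is the no-object class).
import torch

def override_no_object(logits):
    # logits: (batch, num_queries, num_classes + 1), last index = "no object"
    prob = logits.softmax(-1)
    top2_scores, top2_labels = prob.topk(2, dim=-1)
    no_object_idx = logits.shape[-1] - 1
    is_no_object = top2_labels[..., 0] == no_object_idx
    # Where "no object" wins, fall back to the second-highest class and its confidence.
    scores = torch.where(is_no_object, top2_scores[..., 1], top2_scores[..., 0])
    labels = torch.where(is_no_object, top2_labels[..., 1], top2_labels[..., 0])
    return scores, labels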
@alexholdenmiller @leaderj1001 @alcinos @snyxan
Thank you for your hard work,
Seeing the transformers learn to understand instances was truly amazing; great work.
Further research into optimization is vital in order to make training and inference feasible for the average person.
Is there a plan for optimizing DETR: pruning, distillation, searching for better students, etc.?
https://github.com/mit-han-lab/hardware-aware-transformers
https://github.com/mit-han-lab/gan-compression
http://news.mit.edu/2020/foolproof-way-shrink-deep-learning-models-0430
First of all, excellent work!
I know that integrating with detectron2 is probably not automatic, since the training setup differs from the default detectron2 procedure. But are there any plans to integrate DETR into detectron2?
Thank you!
I'm trying to make use of the plot_logs function in util/plot_utils.py.
In Jupyter, I'm passing a pathlib Path to the directory containing my log.txt, but that immediately raises TypeError: 'PosixPath' object is not iterable, which makes sense: I'm passing in just the directory of the single log.txt, so there is nothing to iterate over.
I changed the code not to iterate and just read the single file into a DataFrame, and ultimately got the graphs to print, but clearly I'm not calling it correctly?
Is there a preferred way to call/plot a single log file without removing all the list comprehensions?
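If it helps, my understanding (an assumption from reading the source, so double-check against plot_utils.py) is that the function expects an iterable of experiment directories, each containing a log.txt, so a single run can simply be wrapped in a list:
from pathlib import Path
from util.plot_utils import plot_logs

log_dirs = [Path('outputs/my_run')]   # hypothetical output directory holding log.txt
plot_logs(log_dirs)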
Hi, so I've tried training with a personal dataset and with COCO 2017 as a sanity check. My class_error stays at 100.00 for most of the training, with only a few batches in the 75-100 range. After a couple of epochs I average around 99 class error for both training and validation (on both my dataset and COCO 2017). Has anyone experienced similar issues?
To add, I only changed the num_queries flag for my personal dataset; COCO 2017 kept its original arguments. My loss does seem to drop, however. Any direction would be greatly appreciated!
Can you share the code to run 16 GPUs over 2 nodes using Slurm?
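For what it's worth, the repo ships a run_with_submitit.py launcher for Slurm; something along these lines should request 2 nodes with 8 GPUs each (the flag names are from memory and may differ, so check the script's argparse):
python run_with_submitit.py --ngpus 8 --nodes 2 --coco_path /path/to/coco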
I have a quick question: is the background class id 0 or 91? (DETR uses 91 COCO categories for training.)
It seems the targets object returned by the dataloader uses 1-91 for all the object categories, but the loss_labels function uses 91 instead of 0 for the background. I am not sure if I missed something.
Thanks.
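My reading (an assumption, so please correct me) is that loss_labels fills every unmatched query with index num_classes, i.e. the no-object class sits at the end rather than at 0. A stripped-down sketch of that idea:
import torch

num_classes = 91                                  # COCO object categories use ids 1..91 here
logits = torch.randn(2, 100, num_classes + 1)     # (batch, queries, classes + "no object")

# Every query starts as "no object" (index num_classes); only the queries matched
# by the Hungarian matcher get a real category id written in.
target_classes = torch.full(logits.shape[:2], num_classes, dtype=torch.int64)
# target_classes[batch_idx, query_idx] = matched_category_id  ... for matched pairs
loss = torch.nn.functional.cross_entropy(logits.transpose(1, 2), target_classes)
print(loss)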
How can I leverage this architecture for image classification tasks? I tried using the example in the colab notebook but had trouble with batch sizes. The example is intended for batch_size = 1 but gives errors when using a larger batch size. How can I overcome this?
Great paper and repo btw, congrats!
I wanted to do a simple single-image inference with the panoptic segmentation model. I achieved that in a very 'hacky' way by editing the main.sh file (can be seen here).
Are you planning to release a demo notebook like the one for object detection, or to upload the .pth files to torch.hub?
Can someone please explain to me how you calculate the positional encoding?
I know what positional encoding is, but models/position_encoding.py is a bit overwhelming. I want to know what is considered a positional encoding when working with images. Is it calculated over the feature maps or over something else?
How do you calculate masks when using images in transformers?
I know what masks are, but how do we compute them when dealing with images?
I found no answers to these questions anywhere, so I'm posting them here.
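For what it's worth, here is a stripped-down sketch of a 2D sine encoding computed over a padded feature map, in the spirit of the repo's PositionEmbeddingSine; this is a simplified illustration rather than a copy of the repo's code, and the mask is simply True wherever the batch padding is.
import torch

def sine_position_encoding(mask, num_pos_feats=128, temperature=10000):
    # mask: (batch, H, W) bool tensor, True on padded pixels of the feature map.
    not_mask = ~mask
    y_embed = not_mask.cumsum(1, dtype=torch.float32)   # running row index per image
    x_embed = not_mask.cumsum(2, dtype=torch.float32)   # running column index per image

    dim_t = torch.arange(num_pos_feats, dtype=torch.float32)
    dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats)

    pos_x = x_embed[:, :, :, None] / dim_t
    pos_y = y_embed[:, :, :, None] / dim_t
    # Interleave sin/cos over the frequency dimension, then concatenate the y and x parts.
    pos_x = torch.stack((pos_x[..., 0::2].sin(), pos_x[..., 1::2].cos()), dim=4).flatten(3)
    pos_y = torch.stack((pos_y[..., 0::2].sin(), pos_y[..., 1::2].cos()), dim=4).flatten(3)
    return torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)  # (batch, 2*num_pos_feats, H, W)

# Example: one 25x34 feature map whose last 4 columns are padding.
mask = torch.zeros(1, 25, 34, dtype=torch.bool)
mask[:, :, 30:] = True
print(sine_position_encoding(mask).shape)   # torch.Size([1, 256, 25, 34])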
Have you experimented with landmark/joint regression rather than bounding boxes?
Some of the methods mentioned in the paper, like "Objects as Points", have also been applied to these joint tasks.
I am trying to adapt this repository to an OCR task and I am facing the same dilemma.
During training there are three different image sizes encoded in the dataset.
So if you try to print the boxes of a dataset element with a batch size bigger than 1 (I checked it with 5),
you will see that, because of random batch sampling, the same picture can end up with different box coordinates.
Look below. This function draws the boxes on the image correctly, but only if batch_size=1, or if all pictures in your dataset are the same size, or if you take W and H for scaling from target["size"], which is wrong.
# img - PIL image recovered from the (3, H, W) tensor in the batch (all padded to the same H and W)
# target - labels for this particular image
def showImageFromBatch(img, target):
    from PIL import ImageDraw, ImageFont
    from util.box_ops import box_cxcywh_to_xyxy

    draw = ImageDraw.Draw(img)
    boxes = target['boxes']
    cl = target['labels']
    if True:  # boxes.max() <= 1: boxes are normalized cxcywh
        boxes = box_cxcywh_to_xyxy(boxes)
    print('Image:', (img.height, img.width), target['size'], target['orig_size'])
    H, W = target['size']            # <<< works well only with this
    # W, H = img.width, img.height   # <<< but it should work with this!!!
    boxes[:, 0::2] *= W
    boxes[:, 1::2] *= H
    for i in range(len(boxes)):
        x1, y1, x2, y2 = boxes[i]
        draw.rectangle((x1, y1, x2, y2), outline=(0, 255, 0) if cl[i] >= 0 else (0, 0, 0), width=3)
        draw.text((x1, y1), str(cl[i].item()), (0, 255, 0) if cl[i] >= 0 else (0, 0, 0),
                  font=ImageFont.truetype("DejaVuSansMono.ttf", 20))
    img.show()
Please clarify this situation. Thank you in advance.
I'm trying to run the example as-is, and I'm running into this issue. I did have to adjust the number of GPUs because the VM I'm working on only has 1. I'm also working on a Windows 10 machine with PyTorch 1.5.0, CUDA version 10.1, and CUDA compiler driver v10.0.130.
| distributed init (rank 0): env://
Traceback (most recent call last):
File "main.py", line 248, in <module>
main(args)
File "main.py", line 106, in main
utils.init_distributed_mode(args)
File "C:\Users\-user-\Documents\Projects\detr\util\misc.py", line 374, in init_distributed_mode
torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
AttributeError: module 'torch.distributed' has no attribute 'init_process_group'
Traceback (most recent call last):
File "C:\Anaconda\envs\detr\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Anaconda\envs\detr\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Anaconda\envs\detr\lib\site-packages\torch\distributed\launch.py", line 263, in <module>
main()
File "C:\Anaconda\envs\detr\lib\site-packages\torch\distributed\launch.py", line 258, in main
raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['C:\\Anaconda\\envs\\detr\\python.exe', '-u', 'main.py', '--coco_path', 'F:/coco-data']' returned non-zero exit status 1.
Have you experimented with any techniques for learning the loss coefficients (from the multi-task learning literature) that are hard-coded at https://github.com/facebookresearch/detr/blob/master/main.py#L73?
Edit:
E.g., to give a recent example: https://arxiv.org/abs/2001.02223
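A minimal sketch of one classic option from that literature, uncertainty-based weighting (Kendall et al., CVPR 2018); this is my own illustration, not something this repo implements.
import torch
from torch import nn

class UncertaintyWeighting(nn.Module):
    """Learn one log-variance per loss term instead of hard-coding the coefficients."""
    def __init__(self, num_losses):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_losses))

    def forward(self, losses):
        # losses: iterable of scalar loss tensors (e.g. ce, bbox, giou)
        total = 0.0
        for loss, log_var in zip(losses, self.log_vars):
            total = total + torch.exp(-log_var) * loss + log_var
        return total

weighting = UncertaintyWeighting(num_losses=3)
total_loss = weighting([torch.tensor(0.7), torch.tensor(1.4), torch.tensor(1.8)])
print(total_loss)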
Hi!
The colab demo doesn't seem to work with images that have a wide aspect ratio (e.g. 16:9). The resulting bounding boxes are shifted to the right a bit, and sometimes the inference crashes with a RuntimeError. Please see this colab notebook.
The bounding boxes look good after I change T.Resize(800) to something explicit, like T.Resize((800, 600)). But I'm not sure if that's the correct way of addressing this (see the giraffe detections in my notebook). What would be a correct way of dealing with different aspect ratios?
The only things I changed are the URLs of the input images and the transformation pipeline (in the second case).
As in your code, the tgt of the decoder layer is first initialized with zeros, and these zeros are used as v to compute a new output with the qkv attention operation; take the pre-norm forward path as an example:
def forward_pre(self, tgt, memory,
                tgt_mask: Optional[Tensor] = None,
                memory_mask: Optional[Tensor] = None,
                tgt_key_padding_mask: Optional[Tensor] = None,
                memory_key_padding_mask: Optional[Tensor] = None,
                pos: Optional[Tensor] = None,
                query_pos: Optional[Tensor] = None):
    tgt2 = self.norm1(tgt)
    q = k = self.with_pos_embed(tgt2, query_pos)
    tgt2 = self.self_attn(q, k, value=tgt2, attn_mask=tgt_mask,
                          key_padding_mask=tgt_key_padding_mask)[0]
    tgt = tgt + self.dropout1(tgt2)
    tgt2 = self.norm2(tgt)
    tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt2, query_pos),
                               key=self.with_pos_embed(memory, pos),
                               value=memory, attn_mask=memory_mask,
                               key_padding_mask=memory_key_padding_mask)[0]
    tgt = tgt + self.dropout2(tgt2)
    tgt2 = self.norm3(tgt)
    tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))
    tgt = tgt + self.dropout3(tgt2)
    return tgt
I mean, if it is the first decoder layer, tgt is zero for every token, so tgt2 will be identical across all tokens after the first layer norm. How does it make sense to get a weighted output from this tgt2? No matter what q and k are, nothing but a featureless bias will be learned, I think.
Hi,
Thank you for your great work.
I have a question about training on my own dataset. The eval result is always all zeros, like this:
(base) [detr]$ python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --lr 1e-3 --batch_size 4 --epochs 10 --coco_path datasets/shape/coco
| distributed init (rank 0): env://
git:
sha: 0af41930d1b6c2244e33bbef76dff6c537dd53c0, status: clean, branch: master
Namespace(aux_loss=True, backbone='resnet50', batch_size=4, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path='datasets/shape/coco', dataset_file='coco', dec_layers=6, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=2048, dist_backend='nccl', dist_url='env://', distributed=True, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=10, eval=False, frozen_weights=None, giou_loss_coef=2, gpu=0, hidden_dim=256, lr=0.001, lr_backbone=1e-05, lr_drop=200, mask_loss_coef=1, masks=False, nheads=8, num_queries=100, num_workers=2, output_dir='', position_embedding='sine', pre_norm=False, rank=0, remove_difficult=False, resume='', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, weight_decay=0.0001, world_size=1)
number of params: 41302368
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Start training
Epoch: [0] [ 0/225] eta: 0:02:30 lr: 0.001000 class_error: 100.00 loss: 75.9316 (75.9316) loss_ce: 4.8402 (4.8402) loss_bbox: 5.6168 (5.6168) loss_giou: 2.2340 (2.2340) loss_ce_0: 4.4001 (4.4001) loss_bbox_0: 5.4950 (5.4950) loss_giou_0: 2.2311 (2.2311) loss_ce_1: 4.8179 (4.8179) loss_bbox_1: 5.6163 (5.6163) loss_giou_1: 2.2393 (2.2393) loss_ce_2: 4.7843 (4.7843) loss_bbox_2: 5.6247 (5.6247) loss_giou_2: 2.2343 (2.2343) loss_ce_3: 4.9645 (4.9645) loss_bbox_3: 5.6222 (5.6222) loss_giou_3: 2.2467 (2.2467) loss_ce_4: 4.9737 (4.9737) loss_bbox_4: 5.7800 (5.7800) loss_giou_4: 2.2105 (2.2105) loss_ce_unscaled: 4.8402 (4.8402) class_error_unscaled: 100.0000 (100.0000) loss_bbox_unscaled: 1.1234 (1.1234) loss_giou_unscaled: 1.1170 (1.1170) cardinality_error_unscaled: 96.7500 (96.7500) loss_ce_0_unscaled: 4.4001 (4.4001) loss_bbox_0_unscaled: 1.0990 (1.0990) loss_giou_0_unscaled: 1.1155 (1.1155) cardinality_error_0_unscaled: 96.7500 (96.7500) loss_ce_1_unscaled: 4.8179 (4.8179) loss_bbox_1_unscaled: 1.1233 (1.1233) loss_giou_1_unscaled: 1.1197 (1.1197) cardinality_error_1_unscaled: 96.7500 (96.7500) loss_ce_2_unscaled: 4.7843 (4.7843) loss_bbox_2_unscaled: 1.1249 (1.1249) loss_giou_2_unscaled: 1.1172 (1.1172) cardinality_error_2_unscaled: 96.7500 (96.7500) loss_ce_3_unscaled: 4.9645 (4.9645) loss_bbox_3_unscaled: 1.1244 (1.1244) loss_giou_3_unscaled: 1.1234 (1.1234) cardinality_error_3_unscaled: 96.7500 (96.7500) loss_ce_4_unscaled: 4.9737 (4.9737) loss_bbox_4_unscaled: 1.1560 (1.1560) loss_giou_4_unscaled: 1.1053 (1.1053) cardinality_error_4_unscaled: 96.7500 (96.7500) time: 0.6674 data: 0.2966 max mem: 2899
Epoch: [0] [ 10/225] eta: 0:01:16 lr: 0.001000 class_error: 100.00 loss: 40.5536 (44.2869) loss_ce: 0.8244 (1.3583) loss_bbox: 3.0293 (3.3114) loss_giou: 2.7750 (2.7253) loss_ce_0: 0.8599 (1.2429) loss_bbox_0: 3.0818 (3.3239) loss_giou_0: 2.7982 (2.7456) loss_ce_1: 0.8457 (1.3100) loss_bbox_1: 3.1305 (3.3230) loss_giou_1: 2.7961 (2.7431) loss_ce_2: 0.8787 (1.3171) loss_bbox_2: 3.0785 (3.3198) loss_giou_2: 2.8003 (2.7389) loss_ce_3: 0.8455 (1.3657) loss_bbox_3: 3.0552 (3.3092) loss_giou_3: 2.7829 (2.7311) loss_ce_4: 0.8526 (1.3903) loss_bbox_4: 3.0473 (3.3076) loss_giou_4: 2.7943 (2.7239) loss_ce_unscaled: 0.8244 (1.3583) class_error_unscaled: 100.0000 (93.9394) loss_bbox_unscaled: 0.6059 (0.6623) loss_giou_unscaled: 1.3875 (1.3626) cardinality_error_unscaled: 3.5000 (19.9773) loss_ce_0_unscaled: 0.8599 (1.2429) loss_bbox_0_unscaled: 0.6164 (0.6648) loss_giou_0_unscaled: 1.3991 (1.3728) cardinality_error_0_unscaled: 3.0000 (11.3636) loss_ce_1_unscaled: 0.8457 (1.3100) loss_bbox_1_unscaled: 0.6261 (0.6646) loss_giou_1_unscaled: 1.3981 (1.3716) cardinality_error_1_unscaled: 3.5000 (12.5909) loss_ce_2_unscaled: 0.8787 (1.3171) loss_bbox_2_unscaled: 0.6157 (0.6640) loss_giou_2_unscaled: 1.4001 (1.3694) cardinality_error_2_unscaled: 3.5000 (19.2500) loss_ce_3_unscaled: 0.8455 (1.3657) loss_bbox_3_unscaled: 0.6110 (0.6618) loss_giou_3_unscaled: 1.3915 (1.3656) cardinality_error_3_unscaled: 3.5000 (19.9773) loss_ce_4_unscaled: 0.8526 (1.3903) loss_bbox_4_unscaled: 0.6095 (0.6615) loss_giou_4_unscaled: 1.3971 (1.3619) cardinality_error_4_unscaled: 3.5000 (19.9773) time: 0.3566 data: 0.0417 max mem: 4100
Epoch: [0] [ 20/225] eta: 0:01:11 lr: 0.001000 class_error: 100.00 loss: 38.5593 (40.8082) loss_ce: 0.7449 (1.0389) loss_bbox: 2.5801 (2.9656) loss_giou: 2.8444 (2.8061) loss_ce_0: 0.7395 (0.9770) loss_bbox_0: 2.5890 (2.9779) loss_giou_0: 2.8093 (2.8087) loss_ce_1: 0.7514 (1.0102) loss_bbox_1: 2.5790 (2.9720) loss_giou_1: 2.8446 (2.8075) loss_ce_2: 0.7420 (1.0158) loss_bbox_2: 2.5879 (2.9603) loss_giou_2: 2.8726 (2.8131) loss_ce_3: 0.7451 (1.0461) loss_bbox_3: 2.5694 (2.9725) loss_giou_3: 2.8119 (2.8037) loss_ce_4: 0.7562 (1.0560) loss_bbox_4: 2.5707 (2.9682) loss_giou_4: 2.8531 (2.8084) loss_ce_unscaled: 0.7449 (1.0389) class_error_unscaled: 100.0000 (96.8254) loss_bbox_unscaled: 0.5160 (0.5931) loss_giou_unscaled: 1.4222 (1.4031) cardinality_error_unscaled: 2.7500 (11.7857) loss_ce_0_unscaled: 0.7395 (0.9770) loss_bbox_0_unscaled: 0.5178 (0.5956) loss_giou_0_unscaled: 1.4046 (1.4043) cardinality_error_0_unscaled: 2.7500 (7.2738) loss_ce_1_unscaled: 0.7514 (1.0102) loss_bbox_1_unscaled: 0.5158 (0.5944) loss_giou_1_unscaled: 1.4223 (1.4037) cardinality_error_1_unscaled: 2.7500 (7.9167) loss_ce_2_unscaled: 0.7420 (1.0158) loss_bbox_2_unscaled: 0.5176 (0.5921) loss_giou_2_unscaled: 1.4363 (1.4066) cardinality_error_2_unscaled: 2.7500 (11.4048) loss_ce_3_unscaled: 0.7451 (1.0461) loss_bbox_3_unscaled: 0.5139 (0.5945) loss_giou_3_unscaled: 1.4059 (1.4019) cardinality_error_3_unscaled: 2.7500 (11.7857) loss_ce_4_unscaled: 0.7562 (1.0560) loss_bbox_4_unscaled: 0.5141 (0.5936) loss_giou_4_unscaled: 1.4265 (1.4042) cardinality_error_4_unscaled: 2.7500 (11.7857) time: 0.3333 data: 0.0155 max mem: 4763
Epoch: [0] [ 30/225] eta: 0:01:06 lr: 0.001000 class_error: 100.00 loss: 36.1775 (39.1106) loss_ce: 0.6629 (0.9151) loss_bbox: 2.5098 (2.8222) loss_giou: 2.8313 (2.7752) loss_ce_0: 0.6502 (0.8715) loss_bbox_0: 2.5458 (2.8372) loss_giou_0: 2.8088 (2.7885) loss_ce_1: 0.6497 (0.8934) loss_bbox_1: 2.5484 (2.8363) loss_giou_1: 2.7917 (2.7771) loss_ce_2: 0.6553 (0.9005) loss_bbox_2: 2.4740 (2.8460) loss_giou_2: 2.8586 (2.7895) loss_ce_3: 0.6577 (0.9152) loss_bbox_3: 2.5694 (2.8280) loss_giou_3: 2.8119 (2.7908) loss_ce_4: 0.6433 (0.9227) loss_bbox_4: 2.5183 (2.8121) loss_giou_4: 2.8352 (2.7893) loss_ce_unscaled: 0.6629 (0.9151) class_error_unscaled: 100.0000 (97.8495) loss_bbox_unscaled: 0.5020 (0.5644) loss_giou_unscaled: 1.4156 (1.3876) cardinality_error_unscaled: 2.7500 (8.8468) loss_ce_0_unscaled: 0.6502 (0.8715) loss_bbox_0_unscaled: 0.5092 (0.5674) loss_giou_0_unscaled: 1.4044 (1.3943) cardinality_error_0_unscaled: 2.7500 (5.7903) loss_ce_1_unscaled: 0.6497 (0.8934) loss_bbox_1_unscaled: 0.5097 (0.5673) loss_giou_1_unscaled: 1.3958 (1.3885) cardinality_error_1_unscaled: 2.7500 (6.2258) loss_ce_2_unscaled: 0.6553 (0.9005) loss_bbox_2_unscaled: 0.4948 (0.5692) loss_giou_2_unscaled: 1.4293 (1.3948) cardinality_error_2_unscaled: 2.7500 (8.5887) loss_ce_3_unscaled: 0.6577 (0.9152) loss_bbox_3_unscaled: 0.5139 (0.5656) loss_giou_3_unscaled: 1.4059 (1.3954) cardinality_error_3_unscaled: 2.7500 (8.8468) loss_ce_4_unscaled: 0.6433 (0.9227) loss_bbox_4_unscaled: 0.5037 (0.5624) loss_giou_4_unscaled: 1.4176 (1.3947) cardinality_error_4_unscaled: 2.7500 (8.8468) time: 0.3356 data: 0.0145 max mem: 5477
Epoch: [0] [ 40/225] eta: 0:01:03 lr: 0.001000 class_error: 100.00 loss: 32.6019 (36.9908) loss_ce: 0.6956 (0.8742) loss_bbox: 2.1999 (2.5978) loss_giou: 2.5164 (2.6636) loss_ce_0: 0.6839 (0.8402) loss_bbox_0: 2.2153 (2.6161) loss_giou_0: 2.4194 (2.6766) loss_ce_1: 0.7205 (0.8589) loss_bbox_1: 2.1482 (2.6176) loss_giou_1: 2.4094 (2.6681) loss_ce_2: 0.6871 (0.8615) loss_bbox_2: 2.3961 (2.6406) loss_giou_2: 2.5275 (2.6903) loss_ce_3: 0.6944 (0.8725) loss_bbox_3: 2.1964 (2.6045) loss_giou_3: 2.5396 (2.6689) loss_ce_4: 0.6961 (0.8798) loss_bbox_4: 2.2572 (2.6533) loss_giou_4: 2.6218 (2.7062) loss_ce_unscaled: 0.6956 (0.8742) class_error_unscaled: 100.0000 (98.3740) loss_bbox_unscaled: 0.4400 (0.5196) loss_giou_unscaled: 1.2582 (1.3318) cardinality_error_unscaled: 2.7500 (7.5122) loss_ce_0_unscaled: 0.6839 (0.8402) loss_bbox_0_unscaled: 0.4431 (0.5232) loss_giou_0_unscaled: 1.2097 (1.3383) cardinality_error_0_unscaled: 2.7500 (5.2012) loss_ce_1_unscaled: 0.7205 (0.8589) loss_bbox_1_unscaled: 0.4296 (0.5235) loss_giou_1_unscaled: 1.2047 (1.3341) cardinality_error_1_unscaled: 2.7500 (5.5305) loss_ce_2_unscaled: 0.6871 (0.8615) loss_bbox_2_unscaled: 0.4792 (0.5281) loss_giou_2_unscaled: 1.2638 (1.3451) cardinality_error_2_unscaled: 2.7500 (7.3171) loss_ce_3_unscaled: 0.6944 (0.8725) loss_bbox_3_unscaled: 0.4393 (0.5209) loss_giou_3_unscaled: 1.2698 (1.3345) cardinality_error_3_unscaled: 2.7500 (7.5122) loss_ce_4_unscaled: 0.6961 (0.8798) loss_bbox_4_unscaled: 0.4514 (0.5307) loss_giou_4_unscaled: 1.3109 (1.3531) cardinality_error_4_unscaled: 2.7500 (7.5122) time: 0.3336 data: 0.0145 max mem: 5477
Epoch: [0] [ 50/225] eta: 0:00:59 lr: 0.001000 class_error: 100.00 loss: 27.3266 (34.7739) loss_ce: 0.7699 (0.8480) loss_bbox: 1.7696 (2.3824) loss_giou: 2.1386 (2.5234) loss_ce_0: 0.7753 (0.8237) loss_bbox_0: 1.7433 (2.4192) loss_giou_0: 2.1408 (2.5614) loss_ce_1: 0.7667 (0.8363) loss_bbox_1: 1.7529 (2.4163) loss_giou_1: 2.1323 (2.5346) loss_ce_2: 0.7698 (0.8400) loss_bbox_2: 1.7657 (2.4392) loss_giou_2: 2.2232 (2.5671) loss_ce_3: 0.7478 (0.8485) loss_bbox_3: 1.6155 (2.3823) loss_giou_3: 2.0623 (2.5389) loss_ce_4: 0.7658 (0.8536) loss_bbox_4: 1.6977 (2.4118) loss_giou_4: 2.0993 (2.5472) loss_ce_unscaled: 0.7699 (0.8480) class_error_unscaled: 100.0000 (98.6928) loss_bbox_unscaled: 0.3539 (0.4765) loss_giou_unscaled: 1.0693 (1.2617) cardinality_error_unscaled: 3.5000 (6.6765) loss_ce_0_unscaled: 0.7753 (0.8237) loss_bbox_0_unscaled: 0.3487 (0.4838) loss_giou_0_unscaled: 1.0704 (1.2807) cardinality_error_0_unscaled: 3.5000 (4.8186) loss_ce_1_unscaled: 0.7667 (0.8363) loss_bbox_1_unscaled: 0.3506 (0.4833) loss_giou_1_unscaled: 1.0662 (1.2673) cardinality_error_1_unscaled: 3.5000 (5.0833) loss_ce_2_unscaled: 0.7698 (0.8400) loss_bbox_2_unscaled: 0.3531 (0.4878) loss_giou_2_unscaled: 1.1116 (1.2836) cardinality_error_2_unscaled: 3.5000 (6.5196) loss_ce_3_unscaled: 0.7478 (0.8485) loss_bbox_3_unscaled: 0.3231 (0.4765) loss_giou_3_unscaled: 1.0311 (1.2694) cardinality_error_3_unscaled: 3.5000 (6.6765) loss_ce_4_unscaled: 0.7658 (0.8536) loss_bbox_4_unscaled: 0.3395 (0.4824) loss_giou_4_unscaled: 1.0496 (1.2736) cardinality_error_4_unscaled: 3.5000 (6.6765) time: 0.3331 data: 0.0146 max mem: 5477
Epoch: [0] [ 60/225] eta: 0:00:55 lr: 0.001000 class_error: 100.00 loss: 23.1808 (32.6448) loss_ce: 0.6915 (0.8191) loss_bbox: 1.2613 (2.2106) loss_giou: 1.8641 (2.3880) loss_ce_0: 0.7082 (0.7989) loss_bbox_0: 1.3390 (2.2320) loss_giou_0: 1.7792 (2.4135) loss_ce_1: 0.7016 (0.8092) loss_bbox_1: 1.2421 (2.2059) loss_giou_1: 1.7174 (2.3772) loss_ce_2: 0.6996 (0.8123) loss_bbox_2: 1.5305 (2.2715) loss_giou_2: 1.8414 (2.4372) loss_ce_3: 0.7185 (0.8226) loss_bbox_3: 1.3289 (2.1920) loss_giou_3: 1.7667 (2.3846) loss_ce_4: 0.6862 (0.8221) loss_bbox_4: 1.3238 (2.2314) loss_giou_4: 1.8016 (2.4168) loss_ce_unscaled: 0.6915 (0.8191) class_error_unscaled: 100.0000 (98.9071) loss_bbox_unscaled: 0.2523 (0.4421) loss_giou_unscaled: 0.9320 (1.1940) cardinality_error_unscaled: 3.0000 (6.0123) loss_ce_0_unscaled: 0.7082 (0.7989) loss_bbox_0_unscaled: 0.2678 (0.4464) loss_giou_0_unscaled: 0.8896 (1.2068) cardinality_error_0_unscaled: 3.0000 (4.4590) loss_ce_1_unscaled: 0.7016 (0.8092) loss_bbox_1_unscaled: 0.2484 (0.4412) loss_giou_1_unscaled: 0.8587 (1.1886) cardinality_error_1_unscaled: 3.0000 (4.6803) loss_ce_2_unscaled: 0.6996 (0.8123) loss_bbox_2_unscaled: 0.3061 (0.4543) loss_giou_2_unscaled: 0.9207 (1.2186) cardinality_error_2_unscaled: 3.0000 (5.8811) loss_ce_3_unscaled: 0.7185 (0.8226) loss_bbox_3_unscaled: 0.2658 (0.4384) loss_giou_3_unscaled: 0.8833 (1.1923) cardinality_error_3_unscaled: 3.0000 (6.0123) loss_ce_4_unscaled: 0.6862 (0.8221) loss_bbox_4_unscaled: 0.2648 (0.4463) loss_giou_4_unscaled: 0.9008 (1.2084) cardinality_error_4_unscaled: 3.0000 (6.0123) time: 0.3338 data: 0.0146 max mem: 5477
Epoch: [0] [ 70/225] eta: 0:00:52 lr: 0.001000 class_error: 100.00 loss: 21.3303 (30.9936) loss_ce: 0.6637 (0.8001) loss_bbox: 1.1000 (2.0557) loss_giou: 1.5853 (2.2778) loss_ce_0: 0.6661 (0.7836) loss_bbox_0: 1.1435 (2.0696) loss_giou_0: 1.6672 (2.3055) loss_ce_1: 0.6682 (0.7913) loss_bbox_1: 1.0864 (2.0510) loss_giou_1: 1.5471 (2.2684) loss_ce_2: 0.6681 (0.7951) loss_bbox_2: 1.1686 (2.0983) loss_giou_2: 1.5572 (2.3080) loss_ce_3: 0.6903 (0.8043) loss_bbox_3: 1.1644 (2.0551) loss_giou_3: 1.6332 (2.2966) loss_ce_4: 0.6541 (0.8009) loss_bbox_4: 1.2588 (2.1030) loss_giou_4: 1.7398 (2.3293) loss_ce_unscaled: 0.6637 (0.8001) class_error_unscaled: 100.0000 (99.0610) loss_bbox_unscaled: 0.2200 (0.4111) loss_giou_unscaled: 0.7926 (1.1389) cardinality_error_unscaled: 2.2500 (5.5563) loss_ce_0_unscaled: 0.6661 (0.7836) loss_bbox_0_unscaled: 0.2287 (0.4139) loss_giou_0_unscaled: 0.8336 (1.1528) cardinality_error_0_unscaled: 2.2500 (4.2218) loss_ce_1_unscaled: 0.6682 (0.7913) loss_bbox_1_unscaled: 0.2173 (0.4102) loss_giou_1_unscaled: 0.7736 (1.1342) cardinality_error_1_unscaled: 2.2500 (4.4120) loss_ce_2_unscaled: 0.6681 (0.7951) loss_bbox_2_unscaled: 0.2337 (0.4197) loss_giou_2_unscaled: 0.7786 (1.1540) cardinality_error_2_unscaled: 2.2500 (5.4401) loss_ce_3_unscaled: 0.6903 (0.8043) loss_bbox_3_unscaled: 0.2329 (0.4110) loss_giou_3_unscaled: 0.8166 (1.1483) cardinality_error_3_unscaled: 2.2500 (5.5563) loss_ce_4_unscaled: 0.6541 (0.8009) loss_bbox_4_unscaled: 0.2518 (0.4206) loss_giou_4_unscaled: 0.8699 (1.1647) cardinality_error_4_unscaled: 2.2500 (5.5563) time: 0.3378 data: 0.0146 max mem: 5477
Epoch: [0] [ 80/225] eta: 0:00:48 lr: 0.001000 class_error: 100.00 loss: 20.5233 (29.6911) loss_ce: 0.6906 (0.7851) loss_bbox: 1.0481 (1.9269) loss_giou: 1.5147 (2.1781) loss_ce_0: 0.7025 (0.7712) loss_bbox_0: 1.0714 (1.9364) loss_giou_0: 1.4821 (2.1900) loss_ce_1: 0.6821 (0.7767) loss_bbox_1: 1.1409 (1.9650) loss_giou_1: 1.6625 (2.2206) loss_ce_2: 0.6765 (0.7826) loss_bbox_2: 1.0265 (1.9785) loss_giou_2: 1.4500 (2.2144) loss_ce_3: 0.6955 (0.7901) loss_bbox_3: 1.0413 (1.9257) loss_giou_3: 1.5327 (2.1867) loss_ce_4: 0.6909 (0.7869) loss_bbox_4: 1.2588 (2.0151) loss_giou_4: 1.7385 (2.2610) loss_ce_unscaled: 0.6906 (0.7851) class_error_unscaled: 100.0000 (99.1770) loss_bbox_unscaled: 0.2096 (0.3854) loss_giou_unscaled: 0.7574 (1.0890) cardinality_error_unscaled: 3.0000 (5.2191) loss_ce_0_unscaled: 0.7025 (0.7712) loss_bbox_0_unscaled: 0.2143 (0.3873) loss_giou_0_unscaled: 0.7411 (1.0950) cardinality_error_0_unscaled: 3.0000 (4.0494) loss_ce_1_unscaled: 0.6821 (0.7767) loss_bbox_1_unscaled: 0.2282 (0.3930) loss_giou_1_unscaled: 0.8313 (1.1103) cardinality_error_1_unscaled: 3.0000 (4.2160) loss_ce_2_unscaled: 0.6765 (0.7826) loss_bbox_2_unscaled: 0.2053 (0.3957) loss_giou_2_unscaled: 0.7250 (1.1072) cardinality_error_2_unscaled: 3.0000 (5.1173) loss_ce_3_unscaled: 0.6955 (0.7901) loss_bbox_3_unscaled: 0.2083 (0.3851) loss_giou_3_unscaled: 0.7663 (1.0933) cardinality_error_3_unscaled: 3.0000 (5.2191) loss_ce_4_unscaled: 0.6909 (0.7869) loss_bbox_4_unscaled: 0.2518 (0.4030) loss_giou_4_unscaled: 0.8693 (1.1305) cardinality_error_4_unscaled: 3.0000 (5.2191) time: 0.3316 data: 0.0145 max mem: 5477
Epoch: [0] [ 90/225] eta: 0:00:45 lr: 0.001000 class_error: 100.00 loss: 20.0966 (28.7047) loss_ce: 0.7220 (0.7781) loss_bbox: 0.9469 (1.8428) loss_giou: 1.4894 (2.1132) loss_ce_0: 0.7117 (0.7650) loss_bbox_0: 1.0125 (1.8465) loss_giou_0: 1.4151 (2.1172) loss_ce_1: 0.7015 (0.7705) loss_bbox_1: 1.2264 (1.8691) loss_giou_1: 1.6625 (2.1449) loss_ce_2: 0.7058 (0.7745) loss_bbox_2: 1.1000 (1.8894) loss_giou_2: 1.5367 (2.1538) loss_ce_3: 0.7138 (0.7822) loss_bbox_3: 0.9728 (1.8292) loss_giou_3: 1.3945 (2.1083) loss_ce_4: 0.7190 (0.7797) loss_bbox_4: 1.2304 (1.9368) loss_giou_4: 1.6800 (2.2033) loss_ce_unscaled: 0.7220 (0.7781) class_error_unscaled: 100.0000 (99.2674) loss_bbox_unscaled: 0.1894 (0.3686) loss_giou_unscaled: 0.7447 (1.0566) cardinality_error_unscaled: 3.0000 (4.9945) loss_ce_0_unscaled: 0.7117 (0.7650) loss_bbox_0_unscaled: 0.2025 (0.3693) loss_giou_0_unscaled: 0.7075 (1.0586) cardinality_error_0_unscaled: 3.0000 (3.9533) loss_ce_1_unscaled: 0.7015 (0.7705) loss_bbox_1_unscaled: 0.2453 (0.3738) loss_giou_1_unscaled: 0.8313 (1.0725) cardinality_error_1_unscaled: 3.0000 (4.1016) loss_ce_2_unscaled: 0.7058 (0.7745) loss_bbox_2_unscaled: 0.2200 (0.3779) loss_giou_2_unscaled: 0.7684 (1.0769) cardinality_error_2_unscaled: 3.0000 (4.9038) loss_ce_3_unscaled: 0.7138 (0.7822) loss_bbox_3_unscaled: 0.1946 (0.3658) loss_giou_3_unscaled: 0.6972 (1.0541) cardinality_error_3_unscaled: 3.0000 (4.9945) loss_ce_4_unscaled: 0.7190 (0.7797) loss_bbox_4_unscaled: 0.2461 (0.3874) loss_giou_4_unscaled: 0.8400 (1.1017) cardinality_error_4_unscaled: 3.0000 (4.9918) time: 0.3271 data: 0.0144 max mem: 5477
Epoch: [0] [100/225] eta: 0:00:42 lr: 0.001000 class_error: 100.00 loss: 20.1649 (27.9541) loss_ce: 0.7003 (0.7687) loss_bbox: 1.0745 (1.7630) loss_giou: 1.5262 (2.0532) loss_ce_0: 0.7128 (0.7564) loss_bbox_0: 1.1028 (1.7936) loss_giou_0: 1.5624 (2.0816) loss_ce_1: 0.7015 (0.7603) loss_bbox_1: 1.1178 (1.7996) loss_giou_1: 1.4671 (2.0949) loss_ce_2: 0.6817 (0.7625) loss_bbox_2: 1.2027 (1.8460) loss_giou_2: 1.6791 (2.1249) loss_ce_3: 0.7115 (0.7715) loss_bbox_3: 1.0009 (1.7556) loss_giou_3: 1.4495 (2.0527) loss_ce_4: 0.7190 (0.7703) loss_bbox_4: 1.1550 (1.8591) loss_giou_4: 1.5835 (2.1402) loss_ce_unscaled: 0.7003 (0.7687) class_error_unscaled: 100.0000 (99.3399) loss_bbox_unscaled: 0.2149 (0.3526) loss_giou_unscaled: 0.7631 (1.0266) cardinality_error_unscaled: 3.0000 (4.7797) loss_ce_0_unscaled: 0.7128 (0.7564) loss_bbox_0_unscaled: 0.2206 (0.3587) loss_giou_0_unscaled: 0.7812 (1.0408) cardinality_error_0_unscaled: 3.0000 (3.8416) loss_ce_1_unscaled: 0.7015 (0.7603) loss_bbox_1_unscaled: 0.2236 (0.3599) loss_giou_1_unscaled: 0.7335 (1.0474) cardinality_error_1_unscaled: 3.0000 (3.9752) loss_ce_2_unscaled: 0.6817 (0.7625) loss_bbox_2_unscaled: 0.2405 (0.3692) loss_giou_2_unscaled: 0.8396 (1.0625) cardinality_error_2_unscaled: 3.0000 (4.6980) loss_ce_3_unscaled: 0.7115 (0.7715) loss_bbox_3_unscaled: 0.2002 (0.3511) loss_giou_3_unscaled: 0.7248 (1.0264) cardinality_error_3_unscaled: 3.0000 (4.7797) loss_ce_4_unscaled: 0.7190 (0.7703) loss_bbox_4_unscaled: 0.2310 (0.3718) loss_giou_4_unscaled: 0.7918 (1.0701) cardinality_error_4_unscaled: 3.0000 (4.7772) time: 0.3336 data: 0.0146 max mem: 5637
Epoch: [0] [110/225] eta: 0:00:38 lr: 0.001000 class_error: 100.00 loss: 21.6342 (27.3816) loss_ce: 0.7064 (0.7642) loss_bbox: 0.9988 (1.6882) loss_giou: 1.4457 (1.9938) loss_ce_0: 0.7238 (0.7540) loss_bbox_0: 1.2679 (1.7486) loss_giou_0: 1.6559 (2.0487) loss_ce_1: 0.6962 (0.7567) loss_bbox_1: 1.1178 (1.7325) loss_giou_1: 1.5664 (2.0472) loss_ce_2: 0.7001 (0.7578) loss_bbox_2: 1.2249 (1.8120) loss_giou_2: 1.7109 (2.0992) loss_ce_3: 0.7018 (0.7668) loss_bbox_3: 1.1599 (1.7221) loss_giou_3: 1.5990 (2.0310) loss_ce_4: 0.7098 (0.7668) loss_bbox_4: 1.1236 (1.7990) loss_giou_4: 1.4905 (2.0932) loss_ce_unscaled: 0.7064 (0.7642) class_error_unscaled: 100.0000 (99.3994) loss_bbox_unscaled: 0.1998 (0.3376) loss_giou_unscaled: 0.7229 (0.9969) cardinality_error_unscaled: 3.0000 (4.6374) loss_ce_0_unscaled: 0.7238 (0.7540) loss_bbox_0_unscaled: 0.2536 (0.3497) loss_giou_0_unscaled: 0.8279 (1.0243) cardinality_error_0_unscaled: 3.0000 (3.7838) loss_ce_1_unscaled: 0.6962 (0.7567) loss_bbox_1_unscaled: 0.2236 (0.3465) loss_giou_1_unscaled: 0.7832 (1.0236) cardinality_error_1_unscaled: 3.0000 (3.9054) loss_ce_2_unscaled: 0.7001 (0.7578) loss_bbox_2_unscaled: 0.2450 (0.3624) loss_giou_2_unscaled: 0.8554 (1.0496) cardinality_error_2_unscaled: 3.0000 (4.5631) loss_ce_3_unscaled: 0.7018 (0.7668) loss_bbox_3_unscaled: 0.2320 (0.3444) loss_giou_3_unscaled: 0.7995 (1.0155) cardinality_error_3_unscaled: 3.0000 (4.6374) loss_ce_4_unscaled: 0.7098 (0.7668) loss_bbox_4_unscaled: 0.2247 (0.3598) loss_giou_4_unscaled: 0.7453 (1.0466) cardinality_error_4_unscaled: 3.0000 (4.6351) time: 0.3367 data: 0.0147 max mem: 5637
Epoch: [0] [120/225] eta: 0:00:35 lr: 0.001000 class_error: 100.00 loss: 21.1659 (26.8506) loss_ce: 0.7089 (0.7577) loss_bbox: 0.9988 (1.6532) loss_giou: 1.4848 (1.9689) loss_ce_0: 0.7133 (0.7496) loss_bbox_0: 1.0947 (1.7004) loss_giou_0: 1.5849 (2.0108) loss_ce_1: 0.7111 (0.7519) loss_bbox_1: 1.0310 (1.6767) loss_giou_1: 1.4821 (1.9990) loss_ce_2: 0.7007 (0.7512) loss_bbox_2: 1.2147 (1.7573) loss_giou_2: 1.5715 (2.0571) loss_ce_3: 0.7257 (0.7607) loss_bbox_3: 1.3461 (1.6945) loss_giou_3: 1.7193 (2.0082) loss_ce_4: 0.7094 (0.7603) loss_bbox_4: 1.0727 (1.7453) loss_giou_4: 1.4593 (2.0478) loss_ce_unscaled: 0.7089 (0.7577) class_error_unscaled: 100.0000 (99.4490) loss_bbox_unscaled: 0.1998 (0.3306) loss_giou_unscaled: 0.7424 (0.9845) cardinality_error_unscaled: 3.2500 (4.5124) loss_ce_0_unscaled: 0.7133 (0.7496) loss_bbox_0_unscaled: 0.2189 (0.3401) loss_giou_0_unscaled: 0.7924 (1.0054) cardinality_error_0_unscaled: 3.2500 (3.7273) loss_ce_1_unscaled: 0.7111 (0.7519) loss_bbox_1_unscaled: 0.2062 (0.3353) loss_giou_1_unscaled: 0.7411 (0.9995) cardinality_error_1_unscaled: 3.2500 (3.8409) loss_ce_2_unscaled: 0.7007 (0.7512) loss_bbox_2_unscaled: 0.2429 (0.3515) loss_giou_2_unscaled: 0.7857 (1.0286) cardinality_error_2_unscaled: 3.2500 (4.4442) loss_ce_3_unscaled: 0.7257 (0.7607) loss_bbox_3_unscaled: 0.2692 (0.3389) loss_giou_3_unscaled: 0.8596 (1.0041) cardinality_error_3_unscaled: 3.2500 (4.5124) loss_ce_4_unscaled: 0.7094 (0.7603) loss_bbox_4_unscaled: 0.2145 (0.3491) loss_giou_4_unscaled: 0.7297 (1.0239) cardinality_error_4_unscaled: 3.2500 (4.5103) time: 0.3324 data: 0.0146 max mem: 5637
Epoch: [0] [130/225] eta: 0:00:31 lr: 0.001000 class_error: 100.00 loss: 19.8474 (26.3022) loss_ce: 0.6661 (0.7493) loss_bbox: 1.1333 (1.6103) loss_giou: 1.5970 (1.9413) loss_ce_0: 0.6654 (0.7415) loss_bbox_0: 1.0947 (1.6567) loss_giou_0: 1.5989 (1.9873) loss_ce_1: 0.6688 (0.7446) loss_bbox_1: 1.0219 (1.6198) loss_giou_1: 1.4521 (1.9542) loss_ce_2: 0.6613 (0.7427) loss_bbox_2: 1.0290 (1.6975) loss_giou_2: 1.5071 (2.0150) loss_ce_3: 0.6708 (0.7522) loss_bbox_3: 1.0442 (1.6452) loss_giou_3: 1.5472 (1.9718) loss_ce_4: 0.6684 (0.7518) loss_bbox_4: 1.0742 (1.7012) loss_giou_4: 1.4883 (2.0199) loss_ce_unscaled: 0.6661 (0.7493) class_error_unscaled: 100.0000 (99.4911) loss_bbox_unscaled: 0.2267 (0.3221) loss_giou_unscaled: 0.7985 (0.9706) cardinality_error_unscaled: 2.7500 (4.3664) loss_ce_0_unscaled: 0.6654 (0.7415) loss_bbox_0_unscaled: 0.2189 (0.3313) loss_giou_0_unscaled: 0.7994 (0.9936) cardinality_error_0_unscaled: 2.7500 (3.6412) loss_ce_1_unscaled: 0.6688 (0.7446) loss_bbox_1_unscaled: 0.2044 (0.3240) loss_giou_1_unscaled: 0.7261 (0.9771) cardinality_error_1_unscaled: 2.7500 (3.7462) loss_ce_2_unscaled: 0.6613 (0.7427) loss_bbox_2_unscaled: 0.2058 (0.3395) loss_giou_2_unscaled: 0.7535 (1.0075) cardinality_error_2_unscaled: 2.7500 (4.3034) loss_ce_3_unscaled: 0.6708 (0.7522) loss_bbox_3_unscaled: 0.2088 (0.3290) loss_giou_3_unscaled: 0.7736 (0.9859) cardinality_error_3_unscaled: 2.7500 (4.3664) loss_ce_4_unscaled: 0.6684 (0.7518) loss_bbox_4_unscaled: 0.2148 (0.3402) loss_giou_4_unscaled: 0.7441 (1.0100) cardinality_error_4_unscaled: 2.7500 (4.3645) time: 0.3296 data: 0.0146 max mem: 5637
Epoch: [0] [140/225] eta: 0:00:28 lr: 0.001000 class_error: 100.00 loss: 19.8021 (25.8832) loss_ce: 0.6661 (0.7441) loss_bbox: 1.1773 (1.5948) loss_giou: 1.6169 (1.9355) loss_ce_0: 0.6654 (0.7385) loss_bbox_0: 1.0968 (1.6210) loss_giou_0: 1.6052 (1.9583) loss_ce_1: 0.6688 (0.7396) loss_bbox_1: 1.0471 (1.5973) loss_giou_1: 1.5270 (1.9388) loss_ce_2: 0.6613 (0.7373) loss_bbox_2: 1.0705 (1.6604) loss_giou_2: 1.5409 (1.9847) loss_ce_3: 0.6688 (0.7472) loss_bbox_3: 0.9242 (1.5927) loss_giou_3: 1.4197 (1.9273) loss_ce_4: 0.6690 (0.7476) loss_bbox_4: 0.9728 (1.6452) loss_giou_4: 1.4794 (1.9728) loss_ce_unscaled: 0.6661 (0.7441) class_error_unscaled: 100.0000 (99.5272) loss_bbox_unscaled: 0.2355 (0.3190) loss_giou_unscaled: 0.8085 (0.9678) cardinality_error_unscaled: 2.7500 (4.2766) loss_ce_0_unscaled: 0.6654 (0.7385) loss_bbox_0_unscaled: 0.2194 (0.3242) loss_giou_0_unscaled: 0.8026 (0.9792) cardinality_error_0_unscaled: 2.7500 (3.6028) loss_ce_1_unscaled: 0.6688 (0.7396) loss_bbox_1_unscaled: 0.2094 (0.3195) loss_giou_1_unscaled: 0.7635 (0.9694) cardinality_error_1_unscaled: 2.7500 (3.7004) loss_ce_2_unscaled: 0.6613 (0.7373) loss_bbox_2_unscaled: 0.2141 (0.3321) loss_giou_2_unscaled: 0.7705 (0.9923) cardinality_error_2_unscaled: 2.7500 (4.2181) loss_ce_3_unscaled: 0.6688 (0.7472) loss_bbox_3_unscaled: 0.1848 (0.3185) loss_giou_3_unscaled: 0.7098 (0.9637) cardinality_error_3_unscaled: 2.7500 (4.2766) loss_ce_4_unscaled: 0.6690 (0.7476) loss_bbox_4_unscaled: 0.1946 (0.3290) loss_giou_4_unscaled: 0.7397 (0.9864) cardinality_error_4_unscaled: 2.7500 (4.2748) time: 0.3280 data: 0.0146 max mem: 5637
Epoch: [0] [150/225] eta: 0:00:25 lr: 0.001000 class_error: 100.00 loss: 19.4377 (25.4237) loss_ce: 0.6484 (0.7362) loss_bbox: 1.1630 (1.5634) loss_giou: 1.6169 (1.9105) loss_ce_0: 0.6593 (0.7312) loss_bbox_0: 1.0526 (1.5859) loss_giou_0: 1.5327 (1.9289) loss_ce_1: 0.6352 (0.7314) loss_bbox_1: 1.0940 (1.5660) loss_giou_1: 1.5832 (1.9111) loss_ce_2: 0.6494 (0.7303) loss_bbox_2: 1.0556 (1.6159) loss_giou_2: 1.4842 (1.9477) loss_ce_3: 0.6339 (0.7397) loss_bbox_3: 0.8654 (1.5534) loss_giou_3: 1.3514 (1.8963) loss_ce_4: 0.6644 (0.7394) loss_bbox_4: 0.9673 (1.6013) loss_giou_4: 1.3806 (1.9351) loss_ce_unscaled: 0.6484 (0.7362) class_error_unscaled: 100.0000 (99.5585) loss_bbox_unscaled: 0.2326 (0.3127) loss_giou_unscaled: 0.8085 (0.9552) cardinality_error_unscaled: 2.7500 (4.1556) loss_ce_0_unscaled: 0.6593 (0.7312) loss_bbox_0_unscaled: 0.2105 (0.3172) loss_giou_0_unscaled: 0.7664 (0.9644) cardinality_error_0_unscaled: 2.7500 (3.5265) loss_ce_1_unscaled: 0.6352 (0.7314) loss_bbox_1_unscaled: 0.2188 (0.3132) loss_giou_1_unscaled: 0.7916 (0.9556) cardinality_error_1_unscaled: 2.7500 (3.6175) loss_ce_2_unscaled: 0.6494 (0.7303) loss_bbox_2_unscaled: 0.2111 (0.3232) loss_giou_2_unscaled: 0.7421 (0.9738) cardinality_error_2_unscaled: 2.7500 (4.1010) loss_ce_3_unscaled: 0.6339 (0.7397) loss_bbox_3_unscaled: 0.1731 (0.3107) loss_giou_3_unscaled: 0.6757 (0.9481) cardinality_error_3_unscaled: 2.7500 (4.1556) loss_ce_4_unscaled: 0.6644 (0.7394) loss_bbox_4_unscaled: 0.1935 (0.3203) loss_giou_4_unscaled: 0.6903 (0.9675) cardinality_error_4_unscaled: 2.7500 (4.1540) time: 0.3254 data: 0.0145 max mem: 5637
Epoch: [0] [160/225] eta: 0:00:21 lr: 0.001000 class_error: 100.00 loss: 18.5955 (24.9967) loss_ce: 0.6175 (0.7305) loss_bbox: 1.0154 (1.5311) loss_giou: 1.4537 (1.8840) loss_ce_0: 0.6441 (0.7263) loss_bbox_0: 1.0167 (1.5470) loss_giou_0: 1.4343 (1.8967) loss_ce_1: 0.6136 (0.7256) loss_bbox_1: 0.9467 (1.5288) loss_giou_1: 1.4095 (1.8783) loss_ce_2: 0.6178 (0.7251) loss_bbox_2: 0.9770 (1.5762) loss_giou_2: 1.4386 (1.9168) loss_ce_3: 0.6290 (0.7335) loss_bbox_3: 0.9467 (1.5208) loss_giou_3: 1.4775 (1.8709) loss_ce_4: 0.6210 (0.7343) loss_bbox_4: 0.9768 (1.5656) loss_giou_4: 1.3806 (1.9053) loss_ce_unscaled: 0.6175 (0.7305) class_error_unscaled: 100.0000 (99.5859) loss_bbox_unscaled: 0.2031 (0.3062) loss_giou_unscaled: 0.7269 (0.9420) cardinality_error_unscaled: 2.5000 (4.0590) loss_ce_0_unscaled: 0.6441 (0.7263) loss_bbox_0_unscaled: 0.2033 (0.3094) loss_giou_0_unscaled: 0.7172 (0.9484) cardinality_error_0_unscaled: 2.5000 (3.4689) loss_ce_1_unscaled: 0.6136 (0.7256) loss_bbox_1_unscaled: 0.1893 (0.3058) loss_giou_1_unscaled: 0.7048 (0.9391) cardinality_error_1_unscaled: 2.5000 (3.5543) loss_ce_2_unscaled: 0.6178 (0.7251) loss_bbox_2_unscaled: 0.1954 (0.3152) loss_giou_2_unscaled: 0.7193 (0.9584) cardinality_error_2_unscaled: 2.5000 (4.0078) loss_ce_3_unscaled: 0.6290 (0.7335) loss_bbox_3_unscaled: 0.1893 (0.3042) loss_giou_3_unscaled: 0.7387 (0.9354) cardinality_error_3_unscaled: 2.5000 (4.0590) loss_ce_4_unscaled: 0.6210 (0.7343) loss_bbox_4_unscaled: 0.1954 (0.3131) loss_giou_4_unscaled: 0.6903 (0.9526) cardinality_error_4_unscaled: 2.5000 (4.0575) time: 0.3258 data: 0.0145 max mem: 5637
Epoch: [0] [170/225] eta: 0:00:18 lr: 0.001000 class_error: 100.00 loss: 18.7155 (24.6388) loss_ce: 0.6814 (0.7295) loss_bbox: 0.9805 (1.4954) loss_giou: 1.4094 (1.8550) loss_ce_0: 0.6808 (0.7260) loss_bbox_0: 0.9178 (1.5164) loss_giou_0: 1.4553 (1.8778) loss_ce_1: 0.6558 (0.7249) loss_bbox_1: 0.9360 (1.4940) loss_giou_1: 1.3779 (1.8533) loss_ce_2: 0.6761 (0.7245) loss_bbox_2: 0.9238 (1.5357) loss_giou_2: 1.3853 (1.8836) loss_ce_3: 0.6566 (0.7324) loss_bbox_3: 0.9437 (1.4854) loss_giou_3: 1.4492 (1.8438) loss_ce_4: 0.6907 (0.7334) loss_bbox_4: 0.9971 (1.5383) loss_giou_4: 1.5085 (1.8894) loss_ce_unscaled: 0.6814 (0.7295) class_error_unscaled: 100.0000 (99.6101) loss_bbox_unscaled: 0.1961 (0.2991) loss_giou_unscaled: 0.7047 (0.9275) cardinality_error_unscaled: 2.7500 (4.0073) loss_ce_0_unscaled: 0.6808 (0.7260) loss_bbox_0_unscaled: 0.1836 (0.3033) loss_giou_0_unscaled: 0.7277 (0.9389) cardinality_error_0_unscaled: 2.7500 (3.4488) loss_ce_1_unscaled: 0.6558 (0.7249) loss_bbox_1_unscaled: 0.1872 (0.2988) loss_giou_1_unscaled: 0.6889 (0.9267) cardinality_error_1_unscaled: 2.7500 (3.5322) loss_ce_2_unscaled: 0.6761 (0.7245) loss_bbox_2_unscaled: 0.1848 (0.3071) loss_giou_2_unscaled: 0.6927 (0.9418) cardinality_error_2_unscaled: 2.7500 (3.9576) loss_ce_3_unscaled: 0.6566 (0.7324) loss_bbox_3_unscaled: 0.1887 (0.2971) loss_giou_3_unscaled: 0.7246 (0.9219) cardinality_error_3_unscaled: 2.7500 (4.0073) loss_ce_4_unscaled: 0.6907 (0.7334) loss_bbox_4_unscaled: 0.1994 (0.3077) loss_giou_4_unscaled: 0.7542 (0.9447) cardinality_error_4_unscaled: 2.7500 (4.0015) time: 0.3224 data: 0.0144 max mem: 5637
Epoch: [0] [180/225] eta: 0:00:14 lr: 0.001000 class_error: 100.00 loss: 18.4054 (24.3400) loss_ce: 0.7040 (0.7282) loss_bbox: 0.9830 (1.4731) loss_giou: 1.3940 (1.8341) loss_ce_0: 0.7026 (0.7243) loss_bbox_0: 1.0940 (1.4929) loss_giou_0: 1.4768 (1.8555) loss_ce_1: 0.7101 (0.7233) loss_bbox_1: 0.9232 (1.4649) loss_giou_1: 1.4000 (1.8267) loss_ce_2: 0.7053 (0.7232) loss_bbox_2: 0.9237 (1.5075) loss_giou_2: 1.3560 (1.8598) loss_ce_3: 0.7139 (0.7305) loss_bbox_3: 0.9044 (1.4594) loss_giou_3: 1.3482 (1.8174) loss_ce_4: 0.7048 (0.7320) loss_bbox_4: 1.0585 (1.5171) loss_giou_4: 1.5085 (1.8700) loss_ce_unscaled: 0.7040 (0.7282) class_error_unscaled: 100.0000 (99.6317) loss_bbox_unscaled: 0.1966 (0.2946) loss_giou_unscaled: 0.6970 (0.9170) cardinality_error_unscaled: 3.2500 (3.9599) loss_ce_0_unscaled: 0.7026 (0.7243) loss_bbox_0_unscaled: 0.2188 (0.2986) loss_giou_0_unscaled: 0.7384 (0.9278) cardinality_error_0_unscaled: 3.2500 (3.4309) loss_ce_1_unscaled: 0.7101 (0.7233) loss_bbox_1_unscaled: 0.1846 (0.2930) loss_giou_1_unscaled: 0.7000 (0.9134) cardinality_error_1_unscaled: 3.2500 (3.5110) loss_ce_2_unscaled: 0.7053 (0.7232) loss_bbox_2_unscaled: 0.1847 (0.3015) loss_giou_2_unscaled: 0.6780 (0.9299) cardinality_error_2_unscaled: 3.2500 (3.9130) loss_ce_3_unscaled: 0.7139 (0.7305) loss_bbox_3_unscaled: 0.1809 (0.2919) loss_giou_3_unscaled: 0.6741 (0.9087) cardinality_error_3_unscaled: 3.2500 (3.9586) loss_ce_4_unscaled: 0.7048 (0.7320) loss_bbox_4_unscaled: 0.2117 (0.3034) loss_giou_4_unscaled: 0.7542 (0.9350) cardinality_error_4_unscaled: 3.2500 (3.9517) time: 0.3212 data: 0.0148 max mem: 5637
Epoch: [0] [190/225] eta: 0:00:11 lr: 0.001000 class_error: 100.00 loss: 18.0909 (23.9890) loss_ce: 0.7036 (0.7262) loss_bbox: 0.9683 (1.4462) loss_giou: 1.4349 (1.8160) loss_ce_0: 0.7026 (0.7226) loss_bbox_0: 0.9071 (1.4586) loss_giou_0: 1.3566 (1.8274) loss_ce_1: 0.6941 (0.7210) loss_bbox_1: 0.8661 (1.4328) loss_giou_1: 1.3329 (1.8017) loss_ce_2: 0.7035 (0.7214) loss_bbox_2: 0.8918 (1.4733) loss_giou_2: 1.3383 (1.8350) loss_ce_3: 0.6843 (0.7281) loss_bbox_3: 0.8703 (1.4305) loss_giou_3: 1.3343 (1.7969) loss_ce_4: 0.7020 (0.7305) loss_bbox_4: 0.9180 (1.4799) loss_giou_4: 1.3109 (1.8409) loss_ce_unscaled: 0.7036 (0.7262) class_error_unscaled: 100.0000 (99.6510) loss_bbox_unscaled: 0.1937 (0.2892) loss_giou_unscaled: 0.7174 (0.9080) cardinality_error_unscaled: 3.2500 (3.9110) loss_ce_0_unscaled: 0.7026 (0.7226) loss_bbox_0_unscaled: 0.1814 (0.2917) loss_giou_0_unscaled: 0.6783 (0.9137) cardinality_error_0_unscaled: 3.2500 (3.4097) loss_ce_1_unscaled: 0.6941 (0.7210) loss_bbox_1_unscaled: 0.1732 (0.2866) loss_giou_1_unscaled: 0.6665 (0.9009) cardinality_error_1_unscaled: 3.2500 (3.4856) loss_ce_2_unscaled: 0.7035 (0.7214) loss_bbox_2_unscaled: 0.1784 (0.2947) loss_giou_2_unscaled: 0.6692 (0.9175) cardinality_error_2_unscaled: 3.2500 (3.8665) loss_ce_3_unscaled: 0.6843 (0.7281) loss_bbox_3_unscaled: 0.1741 (0.2861) loss_giou_3_unscaled: 0.6672 (0.8984) cardinality_error_3_unscaled: 3.2500 (3.9097) loss_ce_4_unscaled: 0.7020 (0.7305) loss_bbox_4_unscaled: 0.1836 (0.2960) loss_giou_4_unscaled: 0.6555 (0.9204) cardinality_error_4_unscaled: 3.2500 (3.9031) time: 0.3258 data: 0.0149 max mem: 5640
Epoch: [0] [200/225] eta: 0:00:08 lr: 0.001000 class_error: 100.00 loss: 17.8010 (23.7049) loss_ce: 0.6675 (0.7230) loss_bbox: 0.9316 (1.4328) loss_giou: 1.4737 (1.8085) loss_ce_0: 0.6540 (0.7194) loss_bbox_0: 0.8879 (1.4402) loss_giou_0: 1.3566 (1.8152) loss_ce_1: 0.6669 (0.7185) loss_bbox_1: 0.8515 (1.4078) loss_giou_1: 1.3605 (1.7799) loss_ce_2: 0.6703 (0.7191) loss_bbox_2: 0.8717 (1.4446) loss_giou_2: 1.3383 (1.8108) loss_ce_3: 0.6720 (0.7254) loss_bbox_3: 0.8708 (1.4028) loss_giou_3: 1.3336 (1.7713) loss_ce_4: 0.6676 (0.7272) loss_bbox_4: 0.7845 (1.4471) loss_giou_4: 1.2801 (1.8113) loss_ce_unscaled: 0.6675 (0.7230) class_error_unscaled: 100.0000 (99.6683) loss_bbox_unscaled: 0.1863 (0.2866) loss_giou_unscaled: 0.7369 (0.9042) cardinality_error_unscaled: 2.7500 (3.8507) loss_ce_0_unscaled: 0.6540 (0.7194) loss_bbox_0_unscaled: 0.1776 (0.2880) loss_giou_0_unscaled: 0.6783 (0.9076) cardinality_error_0_unscaled: 2.7500 (3.3769) loss_ce_1_unscaled: 0.6669 (0.7185) loss_bbox_1_unscaled: 0.1703 (0.2816) loss_giou_1_unscaled: 0.6803 (0.8900) cardinality_error_1_unscaled: 2.7500 (3.4490) loss_ce_2_unscaled: 0.6703 (0.7191) loss_bbox_2_unscaled: 0.1743 (0.2889) loss_giou_2_unscaled: 0.6692 (0.9054) cardinality_error_2_unscaled: 2.7500 (3.8109) loss_ce_3_unscaled: 0.6720 (0.7254) loss_bbox_3_unscaled: 0.1742 (0.2806) loss_giou_3_unscaled: 0.6668 (0.8857) cardinality_error_3_unscaled: 2.7500 (3.8520) loss_ce_4_unscaled: 0.6676 (0.7272) loss_bbox_4_unscaled: 0.1569 (0.2894) loss_giou_4_unscaled: 0.6400 (0.9056) cardinality_error_4_unscaled: 2.7500 (3.8458) time: 0.3301 data: 0.0145 max mem: 5640
Epoch: [0] [210/225] eta: 0:00:04 lr: 0.001000 class_error: 100.00 loss: 18.0540 (23.4763) loss_ce: 0.6480 (0.7208) loss_bbox: 1.0048 (1.4134) loss_giou: 1.5002 (1.7948) loss_ce_0: 0.6452 (0.7170) loss_bbox_0: 1.0194 (1.4193) loss_giou_0: 1.5324 (1.8009) loss_ce_1: 0.6662 (0.7162) loss_bbox_1: 0.9526 (1.3892) loss_giou_1: 1.4151 (1.7668) loss_ce_2: 0.6622 (0.7172) loss_bbox_2: 0.9372 (1.4226) loss_giou_2: 1.3879 (1.7945) loss_ce_3: 0.6688 (0.7231) loss_bbox_3: 0.8915 (1.3805) loss_giou_3: 1.3420 (1.7544) loss_ce_4: 0.6564 (0.7250) loss_bbox_4: 0.8851 (1.4247) loss_giou_4: 1.3233 (1.7959) loss_ce_unscaled: 0.6480 (0.7208) class_error_unscaled: 100.0000 (99.6840) loss_bbox_unscaled: 0.2010 (0.2827) loss_giou_unscaled: 0.7501 (0.8974) cardinality_error_unscaled: 2.5000 (3.8021) loss_ce_0_unscaled: 0.6452 (0.7170) loss_bbox_0_unscaled: 0.2039 (0.2839) loss_giou_0_unscaled: 0.7662 (0.9004) cardinality_error_0_unscaled: 2.5000 (3.3507) loss_ce_1_unscaled: 0.6662 (0.7162) loss_bbox_1_unscaled: 0.1905 (0.2778) loss_giou_1_unscaled: 0.7075 (0.8834) cardinality_error_1_unscaled: 2.5000 (3.4194) loss_ce_2_unscaled: 0.6622 (0.7172) loss_bbox_2_unscaled: 0.1874 (0.2845) loss_giou_2_unscaled: 0.6940 (0.8972) cardinality_error_2_unscaled: 2.5000 (3.7642) loss_ce_3_unscaled: 0.6688 (0.7231) loss_bbox_3_unscaled: 0.1783 (0.2761) loss_giou_3_unscaled: 0.6710 (0.8772) cardinality_error_3_unscaled: 2.5000 (3.8033) loss_ce_4_unscaled: 0.6564 (0.7250) loss_bbox_4_unscaled: 0.1770 (0.2849) loss_giou_4_unscaled: 0.6616 (0.8979) cardinality_error_4_unscaled: 2.5000 (3.7974) time: 0.3318 data: 0.0148 max mem: 5640
Epoch: [0] [220/225] eta: 0:00:01 lr: 0.001000 class_error: 100.00 loss: 18.4685 (23.2550) loss_ce: 0.7104 (0.7212) loss_bbox: 1.0054 (1.3937) loss_giou: 1.4747 (1.7770) loss_ce_0: 0.7039 (0.7182) loss_bbox_0: 0.8646 (1.3952) loss_giou_0: 1.3435 (1.7785) loss_ce_1: 0.7096 (0.7172) loss_bbox_1: 0.9061 (1.3666) loss_giou_1: 1.4124 (1.7470) loss_ce_2: 0.7033 (0.7182) loss_bbox_2: 1.0047 (1.4031) loss_giou_2: 1.3926 (1.7766) loss_ce_3: 0.7101 (0.7243) loss_bbox_3: 0.8724 (1.3616) loss_giou_3: 1.3556 (1.7367) loss_ce_4: 0.7159 (0.7252) loss_bbox_4: 0.9905 (1.4106) loss_giou_4: 1.4590 (1.7839) loss_ce_unscaled: 0.7104 (0.7212) class_error_unscaled: 100.0000 (99.6983) loss_bbox_unscaled: 0.2011 (0.2787) loss_giou_unscaled: 0.7374 (0.8885) cardinality_error_unscaled: 3.2500 (3.7896) loss_ce_0_unscaled: 0.7039 (0.7182) loss_bbox_0_unscaled: 0.1729 (0.2790) loss_giou_0_unscaled: 0.6718 (0.8893) cardinality_error_0_unscaled: 3.0000 (3.3575) loss_ce_1_unscaled: 0.7096 (0.7172) loss_bbox_1_unscaled: 0.1812 (0.2733) loss_giou_1_unscaled: 0.7062 (0.8735) cardinality_error_1_unscaled: 3.2500 (3.4231) loss_ce_2_unscaled: 0.7033 (0.7182) loss_bbox_2_unscaled: 0.2009 (0.2806) loss_giou_2_unscaled: 0.6963 (0.8883) cardinality_error_2_unscaled: 3.2500 (3.7534) loss_ce_3_unscaled: 0.7101 (0.7243) loss_bbox_3_unscaled: 0.1745 (0.2723) loss_giou_3_unscaled: 0.6778 (0.8684) cardinality_error_3_unscaled: 3.2500 (3.7919) loss_ce_4_unscaled: 0.7159 (0.7252) loss_bbox_4_unscaled: 0.1981 (0.2821) loss_giou_4_unscaled: 0.7295 (0.8919) cardinality_error_4_unscaled: 3.2500 (3.7862) time: 0.3244 data: 0.0147 max mem: 5640
Epoch: [0] [224/225] eta: 0:00:00 lr: 0.001000 class_error: 100.00 loss: 18.4685 (23.1879) loss_ce: 0.6992 (0.7193) loss_bbox: 1.0062 (1.3909) loss_giou: 1.4650 (1.7726) loss_ce_0: 0.7027 (0.7163) loss_bbox_0: 0.8961 (1.3926) loss_giou_0: 1.3753 (1.7748) loss_ce_1: 0.7094 (0.7157) loss_bbox_1: 0.9046 (1.3604) loss_giou_1: 1.3026 (1.7393) loss_ce_2: 0.6922 (0.7161) loss_bbox_2: 1.0614 (1.4012) loss_giou_2: 1.3909 (1.7740) loss_ce_3: 0.7083 (0.7224) loss_bbox_3: 0.9956 (1.3571) loss_giou_3: 1.3279 (1.7304) loss_ce_4: 0.7025 (0.7232) loss_bbox_4: 1.0279 (1.4045) loss_giou_4: 1.4262 (1.7770) loss_ce_unscaled: 0.6992 (0.7193) class_error_unscaled: 100.0000 (99.7037) loss_bbox_unscaled: 0.2012 (0.2782) loss_giou_unscaled: 0.7325 (0.8863) cardinality_error_unscaled: 3.0000 (3.7622) loss_ce_0_unscaled: 0.7027 (0.7163) loss_bbox_0_unscaled: 0.1792 (0.2785) loss_giou_0_unscaled: 0.6876 (0.8874) cardinality_error_0_unscaled: 3.0000 (3.3389) loss_ce_1_unscaled: 0.7094 (0.7157) loss_bbox_1_unscaled: 0.1809 (0.2721) loss_giou_1_unscaled: 0.6513 (0.8697) cardinality_error_1_unscaled: 3.0000 (3.4033) loss_ce_2_unscaled: 0.6922 (0.7161) loss_bbox_2_unscaled: 0.2123 (0.2802) loss_giou_2_unscaled: 0.6955 (0.8870) cardinality_error_2_unscaled: 3.0000 (3.7278) loss_ce_3_unscaled: 0.7083 (0.7224) loss_bbox_3_unscaled: 0.1991 (0.2714) loss_giou_3_unscaled: 0.6639 (0.8652) cardinality_error_3_unscaled: 3.0000 (3.7656) loss_ce_4_unscaled: 0.7025 (0.7232) loss_bbox_4_unscaled: 0.2056 (0.2809) loss_giou_4_unscaled: 0.7131 (0.8885) cardinality_error_4_unscaled: 3.0000 (3.7600) time: 0.3175 data: 0.0142 max mem: 5640
Epoch: [0] Total time: 0:01:14 (0.3314 s / it)
Averaged stats: lr: 0.001000 class_error: 100.00 loss: 18.4685 (23.1879) loss_ce: 0.6992 (0.7193) loss_bbox: 1.0062 (1.3909) loss_giou: 1.4650 (1.7726) loss_ce_0: 0.7027 (0.7163) loss_bbox_0: 0.8961 (1.3926) loss_giou_0: 1.3753 (1.7748) loss_ce_1: 0.7094 (0.7157) loss_bbox_1: 0.9046 (1.3604) loss_giou_1: 1.3026 (1.7393) loss_ce_2: 0.6922 (0.7161) loss_bbox_2: 1.0614 (1.4012) loss_giou_2: 1.3909 (1.7740) loss_ce_3: 0.7083 (0.7224) loss_bbox_3: 0.9956 (1.3571) loss_giou_3: 1.3279 (1.7304) loss_ce_4: 0.7025 (0.7232) loss_bbox_4: 1.0279 (1.4045) loss_giou_4: 1.4262 (1.7770) loss_ce_unscaled: 0.6992 (0.7193) class_error_unscaled: 100.0000 (99.7037) loss_bbox_unscaled: 0.2012 (0.2782) loss_giou_unscaled: 0.7325 (0.8863) cardinality_error_unscaled: 3.0000 (3.7622) loss_ce_0_unscaled: 0.7027 (0.7163) loss_bbox_0_unscaled: 0.1792 (0.2785) loss_giou_0_unscaled: 0.6876 (0.8874) cardinality_error_0_unscaled: 3.0000 (3.3389) loss_ce_1_unscaled: 0.7094 (0.7157) loss_bbox_1_unscaled: 0.1809 (0.2721) loss_giou_1_unscaled: 0.6513 (0.8697) cardinality_error_1_unscaled: 3.0000 (3.4033) loss_ce_2_unscaled: 0.6922 (0.7161) loss_bbox_2_unscaled: 0.2123 (0.2802) loss_giou_2_unscaled: 0.6955 (0.8870) cardinality_error_2_unscaled: 3.0000 (3.7278) loss_ce_3_unscaled: 0.7083 (0.7224) loss_bbox_3_unscaled: 0.1991 (0.2714) loss_giou_3_unscaled: 0.6639 (0.8652) cardinality_error_3_unscaled: 3.0000 (3.7656) loss_ce_4_unscaled: 0.7025 (0.7232) loss_bbox_4_unscaled: 0.2056 (0.2809) loss_giou_4_unscaled: 0.7131 (0.8885) cardinality_error_4_unscaled: 3.0000 (3.7600)
Test: [ 0/25] eta: 0:00:11 class_error: 100.00 loss: 39.7441 (39.7441) loss_ce: 0.8244 (0.8244) loss_bbox: 2.4085 (2.4085) loss_giou: 2.7545 (2.7545) loss_ce_0: 0.8306 (0.8306) loss_bbox_0: 3.6152 (3.6152) loss_giou_0: 3.1071 (3.1071) loss_ce_1: 0.8320 (0.8320) loss_bbox_1: 2.8095 (2.8095) loss_giou_1: 3.0116 (3.0116) loss_ce_2: 0.8365 (0.8365) loss_bbox_2: 3.6158 (3.6158) loss_giou_2: 3.1490 (3.1490) loss_ce_3: 0.8286 (0.8286) loss_bbox_3: 2.3526 (2.3526) loss_giou_3: 2.6735 (2.6735) loss_ce_4: 0.8391 (0.8391) loss_bbox_4: 2.5136 (2.5136) loss_giou_4: 2.7420 (2.7420) loss_ce_unscaled: 0.8244 (0.8244) class_error_unscaled: 100.0000 (100.0000) loss_bbox_unscaled: 0.4817 (0.4817) loss_giou_unscaled: 1.3773 (1.3773) cardinality_error_unscaled: 4.0000 (4.0000) loss_ce_0_unscaled: 0.8306 (0.8306) loss_bbox_0_unscaled: 0.7230 (0.7230) loss_giou_0_unscaled: 1.5536 (1.5536) cardinality_error_0_unscaled: 4.0000 (4.0000) loss_ce_1_unscaled: 0.8320 (0.8320) loss_bbox_1_unscaled: 0.5619 (0.5619) loss_giou_1_unscaled: 1.5058 (1.5058) cardinality_error_1_unscaled: 4.0000 (4.0000) loss_ce_2_unscaled: 0.8365 (0.8365) loss_bbox_2_unscaled: 0.7232 (0.7232) loss_giou_2_unscaled: 1.5745 (1.5745) cardinality_error_2_unscaled: 4.0000 (4.0000) loss_ce_3_unscaled: 0.8286 (0.8286) loss_bbox_3_unscaled: 0.4705 (0.4705) loss_giou_3_unscaled: 1.3368 (1.3368) cardinality_error_3_unscaled: 4.0000 (4.0000) loss_ce_4_unscaled: 0.8391 (0.8391) loss_bbox_4_unscaled: 0.5027 (0.5027) loss_giou_4_unscaled: 1.3710 (1.3710) cardinality_error_4_unscaled: 4.0000 (4.0000) time: 0.4770 data: 0.2986 max mem: 5640
Test: [10/25] eta: 0:00:03 class_error: 100.00 loss: 41.7285 (42.7447) loss_ce: 0.7112 (0.7057) loss_bbox: 2.8246 (2.9384) loss_giou: 2.9177 (2.9712) loss_ce_0: 0.7185 (0.7087) loss_bbox_0: 3.9275 (4.0503) loss_giou_0: 3.3074 (3.3029) loss_ce_1: 0.7018 (0.7094) loss_bbox_1: 3.0020 (3.0818) loss_giou_1: 3.0118 (3.0638) loss_ce_2: 0.7163 (0.7117) loss_bbox_2: 3.7587 (3.9167) loss_giou_2: 3.3040 (3.2931) loss_ce_3: 0.7120 (0.7075) loss_bbox_3: 2.7522 (2.8755) loss_giou_3: 2.8856 (2.9507) loss_ce_4: 0.7222 (0.7131) loss_bbox_4: 2.9204 (3.0646) loss_giou_4: 2.9303 (2.9795) loss_ce_unscaled: 0.7112 (0.7057) class_error_unscaled: 100.0000 (100.0000) loss_bbox_unscaled: 0.5649 (0.5877) loss_giou_unscaled: 1.4588 (1.4856) cardinality_error_unscaled: 3.0000 (3.0682) loss_ce_0_unscaled: 0.7185 (0.7087) loss_bbox_0_unscaled: 0.7855 (0.8101) loss_giou_0_unscaled: 1.6537 (1.6514) cardinality_error_0_unscaled: 3.0000 (3.0682) loss_ce_1_unscaled: 0.7018 (0.7094) loss_bbox_1_unscaled: 0.6004 (0.6164) loss_giou_1_unscaled: 1.5059 (1.5319) cardinality_error_1_unscaled: 3.0000 (3.0682) loss_ce_2_unscaled: 0.7163 (0.7117) loss_bbox_2_unscaled: 0.7517 (0.7833) loss_giou_2_unscaled: 1.6520 (1.6465) cardinality_error_2_unscaled: 3.0000 (3.0682) loss_ce_3_unscaled: 0.7120 (0.7075) loss_bbox_3_unscaled: 0.5504 (0.5751) loss_giou_3_unscaled: 1.4428 (1.4753) cardinality_error_3_unscaled: 3.0000 (3.0682) loss_ce_4_unscaled: 0.7222 (0.7131) loss_bbox_4_unscaled: 0.5841 (0.6129) loss_giou_4_unscaled: 1.4651 (1.4898) cardinality_error_4_unscaled: 3.0000 (3.0682) time: 0.2017 data: 0.0414 max mem: 5640
Test: [20/25] eta: 0:00:00 class_error: 100.00 loss: 41.2935 (41.5291) loss_ce: 0.6903 (0.7185) loss_bbox: 2.7155 (2.8024) loss_giou: 2.9177 (2.9212) loss_ce_0: 0.6937 (0.7217) loss_bbox_0: 3.8261 (3.8550) loss_giou_0: 3.2795 (3.2636) loss_ce_1: 0.7012 (0.7231) loss_bbox_1: 2.8832 (2.9342) loss_giou_1: 2.9980 (2.9970) loss_ce_2: 0.6953 (0.7253) loss_bbox_2: 3.6141 (3.7033) loss_giou_2: 3.2122 (3.2223) loss_ce_3: 0.6916 (0.7206) loss_bbox_3: 2.6811 (2.7424) loss_giou_3: 2.8825 (2.9045) loss_ce_4: 0.6974 (0.7266) loss_bbox_4: 2.8724 (2.9167) loss_giou_4: 2.9303 (2.9306) loss_ce_unscaled: 0.6903 (0.7185) class_error_unscaled: 100.0000 (100.0000) loss_bbox_unscaled: 0.5431 (0.5605) loss_giou_unscaled: 1.4588 (1.4606) cardinality_error_unscaled: 3.0000 (3.1548) loss_ce_0_unscaled: 0.6937 (0.7217) loss_bbox_0_unscaled: 0.7652 (0.7710) loss_giou_0_unscaled: 1.6397 (1.6318) cardinality_error_0_unscaled: 3.0000 (3.1548) loss_ce_1_unscaled: 0.7012 (0.7231) loss_bbox_1_unscaled: 0.5766 (0.5868) loss_giou_1_unscaled: 1.4990 (1.4985) cardinality_error_1_unscaled: 3.0000 (3.1548) loss_ce_2_unscaled: 0.6953 (0.7253) loss_bbox_2_unscaled: 0.7228 (0.7407) loss_giou_2_unscaled: 1.6061 (1.6111) cardinality_error_2_unscaled: 3.0000 (3.1548) loss_ce_3_unscaled: 0.6916 (0.7206) loss_bbox_3_unscaled: 0.5362 (0.5485) loss_giou_3_unscaled: 1.4412 (1.4522) cardinality_error_3_unscaled: 3.0000 (3.1548) loss_ce_4_unscaled: 0.6974 (0.7266) loss_bbox_4_unscaled: 0.5745 (0.5833) loss_giou_4_unscaled: 1.4651 (1.4653) cardinality_error_4_unscaled: 3.0000 (3.1548) time: 0.1793 data: 0.0158 max mem: 5640
Test: [24/25] eta: 0:00:00 class_error: 100.00 loss: 41.2935 (41.5760) loss_ce: 0.7405 (0.7177) loss_bbox: 2.8130 (2.8165) loss_giou: 2.9177 (2.9297) loss_ce_0: 0.7413 (0.7207) loss_bbox_0: 3.7636 (3.8434) loss_giou_0: 3.2680 (3.2679) loss_ce_1: 0.7550 (0.7228) loss_bbox_1: 2.8832 (2.9293) loss_giou_1: 3.0061 (3.0050) loss_ce_2: 0.7492 (0.7245) loss_bbox_2: 3.6117 (3.6901) loss_giou_2: 3.2321 (3.2339) loss_ce_3: 0.7438 (0.7199) loss_bbox_3: 2.6811 (2.7477) loss_giou_3: 2.8856 (2.9175) loss_ce_4: 0.7476 (0.7257) loss_bbox_4: 2.8724 (2.9254) loss_giou_4: 2.9195 (2.9384) loss_ce_unscaled: 0.7405 (0.7177) class_error_unscaled: 100.0000 (100.0000) loss_bbox_unscaled: 0.5626 (0.5633) loss_giou_unscaled: 1.4588 (1.4649) cardinality_error_unscaled: 3.0000 (3.1400) loss_ce_0_unscaled: 0.7413 (0.7207) loss_bbox_0_unscaled: 0.7527 (0.7687) loss_giou_0_unscaled: 1.6340 (1.6339) cardinality_error_0_unscaled: 3.0000 (3.1400) loss_ce_1_unscaled: 0.7550 (0.7228) loss_bbox_1_unscaled: 0.5766 (0.5859) loss_giou_1_unscaled: 1.5031 (1.5025) cardinality_error_1_unscaled: 3.0000 (3.1400) loss_ce_2_unscaled: 0.7492 (0.7245) loss_bbox_2_unscaled: 0.7223 (0.7380) loss_giou_2_unscaled: 1.6160 (1.6169) cardinality_error_2_unscaled: 3.0000 (3.1400) loss_ce_3_unscaled: 0.7438 (0.7199) loss_bbox_3_unscaled: 0.5362 (0.5495) loss_giou_3_unscaled: 1.4428 (1.4588) cardinality_error_3_unscaled: 3.0000 (3.1400) loss_ce_4_unscaled: 0.7476 (0.7257) loss_bbox_4_unscaled: 0.5745 (0.5851) loss_giou_4_unscaled: 1.4597 (1.4692) cardinality_error_4_unscaled: 3.0000 (3.1400) time: 0.1772 data: 0.0158 max mem: 5640
Test: Total time: 0:00:04 (0.1923 s / it)
Averaged stats: class_error: 100.00 loss: 41.2935 (41.5760) loss_ce: 0.7405 (0.7177) loss_bbox: 2.8130 (2.8165) loss_giou: 2.9177 (2.9297) loss_ce_0: 0.7413 (0.7207) loss_bbox_0: 3.7636 (3.8434) loss_giou_0: 3.2680 (3.2679) loss_ce_1: 0.7550 (0.7228) loss_bbox_1: 2.8832 (2.9293) loss_giou_1: 3.0061 (3.0050) loss_ce_2: 0.7492 (0.7245) loss_bbox_2: 3.6117 (3.6901) loss_giou_2: 3.2321 (3.2339) loss_ce_3: 0.7438 (0.7199) loss_bbox_3: 2.6811 (2.7477) loss_giou_3: 2.8856 (2.9175) loss_ce_4: 0.7476 (0.7257) loss_bbox_4: 2.8724 (2.9254) loss_giou_4: 2.9195 (2.9384) loss_ce_unscaled: 0.7405 (0.7177) class_error_unscaled: 100.0000 (100.0000) loss_bbox_unscaled: 0.5626 (0.5633) loss_giou_unscaled: 1.4588 (1.4649) cardinality_error_unscaled: 3.0000 (3.1400) loss_ce_0_unscaled: 0.7413 (0.7207) loss_bbox_0_unscaled: 0.7527 (0.7687) loss_giou_0_unscaled: 1.6340 (1.6339) cardinality_error_0_unscaled: 3.0000 (3.1400) loss_ce_1_unscaled: 0.7550 (0.7228) loss_bbox_1_unscaled: 0.5766 (0.5859) loss_giou_1_unscaled: 1.5031 (1.5025) cardinality_error_1_unscaled: 3.0000 (3.1400) loss_ce_2_unscaled: 0.7492 (0.7245) loss_bbox_2_unscaled: 0.7223 (0.7380) loss_giou_2_unscaled: 1.6160 (1.6169) cardinality_error_2_unscaled: 3.0000 (3.1400) loss_ce_3_unscaled: 0.7438 (0.7199) loss_bbox_3_unscaled: 0.5362 (0.5495) loss_giou_3_unscaled: 1.4428 (1.4588) cardinality_error_3_unscaled: 3.0000 (3.1400) loss_ce_4_unscaled: 0.7476 (0.7257) loss_bbox_4_unscaled: 0.5745 (0.5851) loss_giou_4_unscaled: 1.4597 (1.4692) cardinality_error_4_unscaled: 3.0000 (3.1400)
Accumulating evaluation results...
DONE (t=0.08s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.002
Could you explain this to me, please?
Which parameters do I need to change?
Thank you.
Thank you so much for sharing the log for 150 epochs.
Can you share the log for 300 epochs as well?
class Joiner(nn.Sequential):
    def __init__(self, backbone, position_embedding):
        super().__init__(backbone, position_embedding)

    def forward(self, tensor_list):
        xs = self[0](tensor_list)
        out = []
        pos = []
        for name, x in xs.items():
            out.append(x)
            # position encoding
            pos.append(self[1](x).to(x.tensors.dtype))
        return out, pos
What is the meaning of self[0] and self[1] here?
Many thanks.
I am trying to train the resnet50 model with one more class on top of the coco dataset. So I loaded the pretrained model like this -
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
and then I am unfreezing class_embed and bbox_embed:
for param in model.parameters():
    param.requires_grad = False

classifier_class = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(128, 93),
    # nn.LogSoftmax(dim=1)
)
model.class_embed = classifier_class

classifier_bbox = nn.Sequential(
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(256, 4),
    nn.Sigmoid()
)
And I am using build_model to get my criterion and postprocessors:
dummy, criterion, postprocessors = build_model(data_args)
Optimizer:
optimizer = torch.optim.Adam(
    [{'params': model.class_embed.parameters()},
     {'params': model.bbox_embed.parameters()}],
    lr=data_args.lr, weight_decay=data_args.weight_decay)
Now I am loading only the 'skyscraper' class using the data_loader.
Unfortunately I am getting this error:
RuntimeError: weight tensor should be defined either for all or no classes at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:27
Here is the entire code:
https://colab.research.google.com/drive/1L3PLEiOVICgmjyK6JIDjEBFmraVEQYhz?usp=sharing
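For reference, here is a minimal sketch of why that error usually appears (this is my own reading, assuming the criterion is still built as DETR's SetCriterion): the criterion registers a class-weight buffer of size num_classes + 1, so the replaced classification head must output exactly that many logits. The values below are hypothetical, not this repo's code.

import torch
import torch.nn.functional as F

# SetCriterion keeps a weight buffer of num_classes + 1 entries
# (the last one corresponds to the "no object" class).
num_classes = 92                                  # assumed value passed to build_model
empty_weight = torch.ones(num_classes + 1)        # 93 entries

# The replaced class_embed must therefore produce 93 logits per query; otherwise
# F.cross_entropy raises "weight tensor should be defined either for all or no classes".
logits = torch.randn(2, 100, num_classes + 1)     # (batch, queries, classes)
targets = torch.randint(0, num_classes + 1, (2, 100))
loss = F.cross_entropy(logits.transpose(1, 2), targets, weight=empty_weight)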
How can parallel decoding be realized in the transformer decoder?
Very impressed with the all new innovative architecture in Detr!
Can you clarify recommendations for training on a custom dataset?
Should we build a model similar to the demo and train it from scratch, or is it better to fine-tune a full COCO-pretrained model and adjust the linear layer to the desired class count?
Thanks in advance for any input.
First of all, thanks for presenting a great paper. It's one of the most innovative papers I've read recently in computer vision and sure many works will follow.
I was interested in the mAP performance with nms in Fig.4.
Does stronger NMS (like nms=0.5) produce similar mAP performance curves?
Maybe the mAP gets worse, since more positive predictions will be deleted.
In EfficientDet there was an improvement from switching to Distance-IoU, and I suspect the same would hold for DETR with either Distance-IoU or Complete-IoU.
By incorporating DIoU and CIoU losses into state-of-the-art object detection algorithms, e.g., YOLO v3, SSD and Faster RCNN, we achieve notable performance gains in terms of not only IoU metric but also GIoU metric.
we consider three geometric factors, i.e., overlap area, normalized central point distance and aspect ratio, which are crucial for measuring bounding box regression in object detection and instance segmentation.
The three geometric factors are then incorporated into CIoU loss for better distinguishing difficult regression cases. The training of deep models using CIoU loss results in consistent AP and AR improvements in comparison to widely adopted ℓn-norm loss and IoU-based loss.
Here's the paper discussing CIoU:
https://arxiv.org/abs/2005.03572
and Distance IoU:
https://arxiv.org/abs/1911.08287
and most importantly code:
https://github.com/Zzh-tju/CIoU
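For anyone curious, here is a rough sketch of what the Distance-IoU term from those papers computes. This is not part of this repo: it pairs boxes element-wise in (x0, y0, x1, y1) format, unlike the matched-pair GIoU already used here, and is only meant to illustrate the formula 1 - IoU + d²/c².

import torch

def diou_loss(boxes1, boxes2):
    # intersection over union
    lt = torch.max(boxes1[:, :2], boxes2[:, :2])
    rb = torch.min(boxes1[:, 2:], boxes2[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    iou = inter / (area1 + area2 - inter)

    # squared distance between box centers
    centers1 = (boxes1[:, :2] + boxes1[:, 2:]) / 2
    centers2 = (boxes2[:, :2] + boxes2[:, 2:]) / 2
    center_dist = ((centers1 - centers2) ** 2).sum(dim=1)

    # squared diagonal of the smallest enclosing box
    enc_lt = torch.min(boxes1[:, :2], boxes2[:, :2])
    enc_rb = torch.max(boxes1[:, 2:], boxes2[:, 2:])
    diag = ((enc_rb - enc_lt) ** 2).sum(dim=1)

    return 1 - iou + center_dist / diag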
Hi, just a question on speed: which GPU were the reported inference speeds measured on? A Tesla V100 or something less powerful? Thanks
Thanks for the amazing work!
I noticed the training time for DETR is 3 days on multiple GPUs. I believe this setup is too hard to reproduce for most end users.
I would like to know: did you try transfer learning with DETR in your study? If so, could you provide a related module for it?
tl;dr:
Thanks for the amazing work!
I'm very intrigued by the simplicity of DETR, especially the inference demo code. I was wondering how the demo model was trained, since you guys do provide pretrained weights for it. I'm asking this particularly because the inference code says that it only supports a batch size of 1. Does the batch size have to be 1 during training? Also, why does it have to be 1, either in training or inference?
Thank you so much for your time!
Describe what you want to do, including:
python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --num_queries 2000 --pre_norm --masks --output_dir output --eval --num_workers 4 --enc_layers 2 --dec_layers 2 --dim_feedforward 512 --backbone resnet18 --hidden_dim 128
Could you, please, help me run with resnet18? Any advice regarding optimal parameters to start for my task are appreciated!
Traceback:
File "main.py", line 248, in <module>
main(args)
File "main.py", line 186, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/home/user/Documents/repos/detr/engine.py", line 92, in evaluate
outputs = model(samples)
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 445, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/Documents/repos/detr/models/segmentation.py", line 57, in forward
seg_masks = self.mask_head(src_proj, bbox_mask, [features[2].tensors, features[1].tensors, features[0].tensors])
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/Documents/repos/detr/models/segmentation.py", line 110, in forward
cur_fpn = self.adapter1(fpns[0])
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 349, in forward
return self._conv_forward(input, self.weight)
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 346, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 1024, 1, 1], expected input[2, 256, 50, 50] to have 1024 channels, but got 256 channels instead
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.7.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/user/.pyenv/versions/3.7.7/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
In your README, it seems that the final model is trained for 300 epochs with a learning rate drop at 200 epochs.
However, in the following link, it seems like the 42.0 AP model is trained for 500 epochs with a learning rate drop at 400 epochs.
Can you clarify?
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --lr_drop 400 --epochs 500 \
    --coco_path /path/to/coco
https://gist.github.com/szagoruyko/9c9ebb8455610958f7deaa27845d7918
Hi, I am trying to run DETR on my local machine, but the training process gets stuck at the beginning stage, as follows.
I am using PyTorch 1.5 and torchvision 0.6, and a Faster R-CNN model can be trained on the COCO dataset without this problem.
I am wondering whether the problem comes from the DataLoader part. Could you provide some hints on this? Thanks!
Hi and thanks for the code!
When I try to load detr it gives:
from detr.models import detr
7 from torch import nn
8
----> 9 from util import box_ops
10 from util.misc import (NestedTensor, accuracy, get_world_size, interpolate,
11 is_dist_avail_and_initialized)
ModuleNotFoundError: No module named 'util'
or
from detr.engine import evaluate
10 import torch
11
---> 12 import util.misc as utils
13 from datasets.coco_eval import CocoEvaluator
14 from datasets.panoptic_eval import PanopticEvaluator
ModuleNotFoundError: No module named 'util'
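A workaround that has worked for me (an assumption on my side, not an official fix): DETR's modules import util and datasets as top-level packages, so the repository root needs to be on sys.path before importing anything from it.

import sys
sys.path.insert(0, "/path/to/detr")   # hypothetical path to the local clone

from models import detr               # instead of `from detr.models import detr`
from engine import evaluate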
One idea to push DETR's impressive results further might be to swap in the new ResNeSt-50 backbone (released last month by Amazon AI and UC Davis).
In all of the architectures they tested, it immediately provided a 3-4% AP boost on COCO.
This improvement also helps downstream tasks including object detection, instance segmentation and semantic segmentation. For example, by simply replace the ResNet-50 backbone with ResNeSt-50, we improve the mAP of Faster-RCNN on MS-COCO from 39.3% to 42.3% and the mIoU for DeeplabV3 on ADE20K from 42.1% to 45.1%.
It should plug in and play right away. I've been using it for classification work and it was a nice improvement there, and the concept of better global context maps well to the improvements DETR is providing in the head architecture.
https://arxiv.org/abs/2004.08955v1
https://github.com/zhanghang1989/ResNeSt
(I plan to test this out on my own datasets, but will not have time to train it on Coco proper and I think conceptually it's a great match for DETR regardless).
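In case it helps, a rough sketch of what the swap could look like (untested against this repo; the hub entry point and the drop-in compatibility are assumptions on my part, mirroring what DETR's BackboneBase does for torchvision ResNets):

import torch
from torchvision.models._utils import IntermediateLayerGetter

# Hypothetical sketch: load ResNeSt-50 from the authors' hub and expose its last stage.
resnest = torch.hub.load('zhanghang1989/ResNeSt', 'resnest50', pretrained=True)
body = IntermediateLayerGetter(resnest, return_layers={'layer4': '0'})
num_channels = 2048  # ResNeSt-50 keeps ResNet-50's final stage width

features = body(torch.randn(1, 3, 800, 800))['0']  # -> shape (1, 2048, 25, 25)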
Hi, thank you so much for your work!
I have one question about the self-attention implementation. In the paper Attention Is All You Need, the residual connection is made over the input embeddings + positional encoding, and the figure in your paper appears to match this. However, in the code, it looks to me like the residual connection is made over the input embeddings only (the src). Is this a mistake, or is there a reason for this modification? Thank you!
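For context, here is a condensed, self-contained sketch of the pattern I am referring to (paraphrased, assuming the encoder layer in models/transformer.py adds the positional encoding via a with_pos_embed helper): the positional encoding only enters the queries and keys, while the value and the residual path use the plain src features.

import torch
from torch import nn

class EncoderLayerSketch(nn.Module):
    def __init__(self, d_model=256, nhead=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, src, pos):
        q = k = src + pos                        # pos is added to queries and keys only
        src2 = self.self_attn(q, k, value=src)[0]
        src = src + src2                         # residual over src, not src + pos
        return self.norm(src)

# usage: src and pos are (sequence_length, batch, d_model)
out = EncoderLayerSketch()(torch.randn(10, 2, 256), torch.randn(10, 2, 256))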
I get an AccessDenied error when I try to download the panoptic model .pth files linked in the README. The URLs for the normal models work fine.
Hi, great work.
I read your code and found that you set num_classes=91 for COCO detection.
But COCO detection has 80 categories. Could you explain why you set it to 91?
Thanks very much~
I'm trying to get my custom dataset working, but I can't get past 8 or so images via __getitem__, and it keeps asserting that my bboxes are bad. I pull that one, it flags the next one; I pull that one, it flags the next...
From reading the code it wants to check that x1 and y1 are larger than x0 and y0 which is a great check.
55 assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
But it keeps flagging images that, when I unwind from COCO format, should be fine... any insights? I was not able to print the boxes1 (200, 4) and boxes2 (12, 4) tensors for some reason, so I couldn't see what it was actually calculating (it threw an odd GPU issue with 'formatting').
Example: it flagged this image as being bad. Here's the JSON for it in COCO format, 6 classes. One box surrounds all the other 5 objects, by the way, as it's a malaria reader, so I'm not sure if that box encompassing other boxes is really the issue:
{"id": "c33c3539-8bd1-48e0-8065-831709e5e64d", "image_id": 3091210, "category_id": 2905442, "segmentation": null, "area": 0, "bbox": **[499, 121, 177, 80]**, "iscrowd": 0},
{"id": "0023d71e-e1e9-4862-a0b8-6e2bc3982b3b", "image_id": 3091210, "category_id": 2905422, "segmentation": null, "area": 0, "bbox": **[492, 523, 187, 163]**, "iscrowd": 0},
{"id": "726fdfbc-3801-409d-ab75-ccf951e74316", "image_id": 3091210, "category_id": 2905421, "segmentation": null, "area": 0, "bbox": **[496, 428, 181, 93],** "iscrowd": 0},
{"id": "2bf85a8e-108d-4875-b0f5-47c8e5cb13e0", "image_id": 3091210, "category_id": 2905420, "segmentation": null, "area": 0, "bbox": **[494, 272, 186, 169]**, "iscrowd": 0},
{"id": "8669c13a-1205-4e94-a645-18e2ffa491d0", "image_id": 3091210, "category_id": 2905419, "segmentation": null, "area": 0, "bbox": **[489, 127, 193, 557]**, "iscrowd": 0},
{"id": "d9619859-e0ef-4632-ad51-7237a5760a5e", "image_id": 3091210, "category_id": 2905418, "segmentation": null, "area": 0, "bbox": **[495, 203, 182, 73]**, "iscrowd": 0},
And as a check for me, here's coco format:
The COCO bounding box format is [top left x position, top left y position, width, height].
All the bboxes it flags have positive widths and heights, so x1 and y1 must be larger than x0 and y0; only a negative width or height added to the original x0 or y0 could make them smaller... so I'm unclear what it is asserting on or for.
But it asserts here:
~/detr/util/box_ops.py in generalized_box_iou(boxes1, boxes2)
53 #print(boxes1)
54 #print(boxes2)
---> 55 assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
56 assert (boxes2[:, 2:] >= boxes2[:, :2]).all()
57 iou, union = box_iou(boxes1, boxes2)
I've removed 15+ images trying to get it to actually train, but it just keeps flagging more and more as invalid bboxes. I remove one image, then it asserts on the next one... and in reviewing the ones it flags vs the ones it lets pass, I don't see any real difference. (I have trained with this same dataset on EfficientDet, so I know the dataset is reasonable.)
Thus any help with debugging, or ideas about what might be awry, would be appreciated.
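In case it helps others hitting the same assertion, here is a small sanity-check helper I put together (my own code, not part of this repo): it converts COCO [x, y, w, h] boxes to [x0, y0, x1, y1], clamps them to the image size, and reports any annotation that would end up degenerate.

def find_bad_boxes(coco_annotations, img_w, img_h):
    """Return ids of annotations whose boxes collapse after conversion/clamping."""
    bad = []
    for ann in coco_annotations:
        x, y, w, h = ann["bbox"]
        x0, y0 = max(0.0, x), max(0.0, y)
        x1, y1 = min(float(img_w), x + w), min(float(img_h), y + h)
        if x1 <= x0 or y1 <= y0:
            bad.append(ann["id"])
    return bad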
Thanks!
Hi, I didn't see NMS in the postprocess step. Why don't you use NMS, and could you please explain how the postprocessing works?