facebookresearch / detr
End-to-End Object Detection with Transformers
License: Apache License 2.0
Environment
pytorch 1.3.1
torchvision 0.4.2
I am able to train the model successfully. However, the following error appears when I run the evaluation independently.
srun --gres gpu:1 python main.py --batch_size 2 --no_aux_loss --eval --resume https://dl.fbaipublicfiles.com/detr/detr-r50-e632da11.pth --coco_path ../../dataset/
Traceback (most recent call last):
File "main.py", line 248, in
main(args)
File "main.py", line 106, in main
utils.init_distributed_mode(args)
File "/mnt/lustre/chenyuntao1/homes/gaopeng/mask_detr/detr/util/misc.py", line 416, in init_distributed_mode
world_size=args.world_size, rank=args.rank)
File "/mnt/lustre/chenyuntao1/homes/gaopeng/anaconda3/envs/detr/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 400, in init_process_group
store, rank, world_size = next(rendezvous(url))
File "/mnt/lustre/chenyuntao1/homes/gaopeng/anaconda3/envs/detr/lib/python3.7/site-packages/torch/distributed/rendezvous.py", line 130, in _env_rendezvous_handler
raise _env_error("MASTER_ADDR")
ValueError: Error initializing torch.distributed using env:// rendezvous: environment variable MASTER_ADDR expected, but not set
Describe what you want to do, including:
Will code be added/released to generate the decoder attention heatmaps like in the paper (i.e. the zebra and elephant images)?
I've found heatmaps to be very useful for debugging training and understanding model performance, so I'm hoping the code used in the paper will be released so we can generate them for our own DETR models.
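In the meantime, here is a minimal sketch of one way to grab the decoder cross-attention weights with a forward hook; the torch.hub entry point and the module path transformer.decoder.layers[-1].multihead_attn are my assumptions, so adapt as needed.
import torch

# Assumption: the standard detr_resnet50 hub entry point.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True).eval()

attn_maps = []
# nn.MultiheadAttention returns (output, attention_weights); the hook keeps the weights,
# which have shape (batch, num_queries, H*W) over the flattened conv feature map.
hook = model.transformer.decoder.layers[-1].multihead_attn.register_forward_hook(
    lambda module, inputs, outputs: attn_maps.append(outputs[1]))

x = torch.rand(1, 3, 800, 1066)   # stand-in for a normalized input image
with torch.no_grad():
    model(x)
hook.remove()

# Reshape H*W back to the conv feature map size (roughly H/32 x W/32) to plot one heatmap per query.
print(attn_maps[0].shape)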
NOTE:
Only general answers are provided.
If you want to ask about "why X did not work", please use the
Unexpected behaviors issue template.
About how to implement new models / new dataloader / new training logic, etc., check documentation first.
We do not answer general machine learning / computer vision questions that are not specific to DETR, such as how a model works, how to improve your training/make it converge, or what algorithm/methods can be used to achieve X.
Hi, DETR teams,
According to the implementation in ultralytics/yolov3#310 (comment), and a similar discussion in AlexeyAB/darknet#3114 (comment), it seems that augmentations such as the mosaic technique help with detecting smaller objects. I quote Jocher's conclusion below.
The smaller cars are detected earlier with less blinking and cars of all sizes show better behaved bounding boxes.
I checked make_coco_transforms in this repo and visualized the augmented images and labels on the VOC dataset (using the same config as make_coco_transforms here). Because of RandomSizeCrop, all the labels associated with an image may be cropped away. (So this repo supports training with no targets in an image?)
I would like to know whether there are any plans regarding data augmentation.
Thank you!
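For reference, here is a small illustration (my own sketch, not the repo's transform code) of why a random crop can leave an image with zero targets: boxes are clipped to the crop window and any box with no remaining area is dropped.
import torch

def crop_boxes(boxes_xyxy, x0, y0, crop_w, crop_h):
    # Shift boxes into the crop's coordinate frame and clamp them to its borders.
    boxes = boxes_xyxy - torch.tensor([x0, y0, x0, y0], dtype=boxes_xyxy.dtype)
    boxes[:, 0::2] = boxes[:, 0::2].clamp(min=0, max=crop_w)
    boxes[:, 1::2] = boxes[:, 1::2].clamp(min=0, max=crop_h)
    # Keep only boxes that still have positive area inside the crop.
    keep = (boxes[:, 2] > boxes[:, 0]) & (boxes[:, 3] > boxes[:, 1])
    return boxes[keep]

# A crop taken far away from the only object leaves an empty target tensor.
boxes = torch.tensor([[10., 10., 50., 50.]])
print(crop_boxes(boxes, 300, 300, 200, 200))   # tensor of shape (0, 4)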
Sorry for the naive question.
When scaling from 8 GPUs to 16 GPUs, I guess we need to double the learning rate accordingly? However, this is never made clear in the paper.
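For context, the usual linear scaling rule would look like this; this is the common heuristic, not something stated in the paper, so treat it as an assumption to be validated.
base_lr = 1e-4            # learning rate tuned for 8 GPUs x batch_size 2 = 16 images
ref_batch = 8 * 2
new_batch = 16 * 2        # 16 GPUs with the same per-GPU batch size
scaled_lr = base_lr * new_batch / ref_batch
print(scaled_lr)          # 2e-04, i.e. simply doubled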
Hi Team,
I am working on a custom dataset with 7 classes and 1500 images, and I want to train DETR on it. Could you help me with how to train the model?
Thanks in advance
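A minimal sketch of one common fine-tuning recipe, under the assumption that the torch.hub detr_resnet50 entry point is used (my own sketch, not an official recipe): start from the pretrained COCO weights and swap the classification head for one sized to your classes plus the no-object slot.
import torch
from torch import nn

num_classes = 7   # your categories; DETR adds one extra "no object" logit

# Assumption: the standard torch.hub entry point for the ResNet-50 model.
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)

# Replace the classification head; in_features is 256 for the default model.
hidden_dim = model.class_embed.in_features
model.class_embed = nn.Linear(hidden_dim, num_classes + 1)

# From here, convert the 1500 images to COCO-style annotations and either point
# main.py at them via --coco_path or plug this model into your own training loop.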
How do I train a new model for custom object detection in Google Colab?
Very nice repo!
I want to do a simple inference with the panoptic segmentation model. How can I visualize the output of the panoptic model after the "panoptic post-processing"?
Thanks ;)
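A minimal sketch of one way to turn the post-processed output into a color image, assuming the panoptic post-processor returns a dict with a 'png_string' field encoding segment ids (my reading of the repo, treat it as an illustration rather than the official demo). `out`, `postprocessor` and `img_tensor` are placeholders for the model output, the panoptic post-processor and the preprocessed image tensor from your own script.
import io
import numpy as np
import torch
from PIL import Image
from panopticapi.utils import rgb2id

result = postprocessor(out, torch.as_tensor(img_tensor.shape[-2:]).unsqueeze(0))[0]

# The post-processed prediction carries the segment ids as a PNG string.
panoptic_seg = np.array(Image.open(io.BytesIO(result['png_string'])), dtype=np.uint8)
segment_ids = rgb2id(panoptic_seg)

# Give every segment a random color and display the result.
palette = np.random.randint(0, 256, size=(segment_ids.max() + 1, 3), dtype=np.uint8)
Image.fromarray(palette[segment_ids]).show()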
Hi, how do I run inference on a single image with DETR?
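A minimal single-image sketch along the lines of the official demo; the image path, the 0.9 threshold and the normalization constants are assumptions on my side.
import torch
import torchvision.transforms as T
from PIL import Image

model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True).eval()

transform = T.Compose([
    T.Resize(800),
    T.ToTensor(),
    T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
])

img = Image.open('my_image.jpg').convert('RGB')   # hypothetical local image
x = transform(img).unsqueeze(0)

with torch.no_grad():
    out = model(x)

# Keep predictions whose best non-"no object" class score passes a threshold.
probs = out['pred_logits'].softmax(-1)[0, :, :-1]
keep = probs.max(-1).values > 0.9
print(probs[keep].argmax(-1), out['pred_boxes'][0, keep])   # labels and normalized cxcywh boxes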
I ran torch.onnx.export on the demo model provided here and on a model from torch.hub. The demo model is exported successfully while the other model fails.
# works
torch.onnx.export(detr_demo, sample_input, 'detr_demo.onnx', opset_version=10)

# does not work
detr = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
detr.eval()
torch.onnx.export(detr, sample_input, 'detr.onnx', opset_version=10)
see full code here
The error log is as follows:
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:59: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
/usr/local/lib/python3.6/dist-packages/ipykernel_launcher.py:60: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
/usr/local/lib/python3.6/dist-packages/torch/tensor.py:467: RuntimeWarning: Iterating over a tensor might cause the trace to be incorrect. Passing a tensor of different shape won't change the number of iterations executed (and might lead to errors or silently give incorrect results).
'incorrect results).', category=RuntimeWarning)
/root/.cache/torch/hub/facebookresearch_detr_master/util/misc.py:294: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
batch_shape = (len(tensor_list),) + max_size
/root/.cache/torch/hub/facebookresearch_detr_master/util/misc.py:301: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
pad_img[: img.shape[0], : img.shape[1], : img.shape[2]].copy_(img)
/root/.cache/torch/hub/facebookresearch_detr_master/util/misc.py:302: TracerWarning: Converting a tensor to a Python index might cause the trace to be incorrect. We can't record the data flow of Python values, so this value will be treated as a constant in the future. This means that the trace might not generalize to other inputs!
m[: img.shape[1], :img.shape[2]] = False
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
<ipython-input-19-968e97398387> in <module>()
11
12 torch.onnx.export(detr_demo, sample_input, 'detr_demo.onnx', opset_version = 10)
---> 13 torch.onnx.export(detr, sample_input, 'detr.onnx', opset_version = 10)
8 frames
/usr/local/lib/python3.6/dist-packages/torch/onnx/__init__.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
166 do_constant_folding, example_outputs,
167 strip_doc_string, dynamic_axes, keep_initializers_as_inputs,
--> 168 custom_opsets, enable_onnx_checker, use_external_data_format)
169
170
/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py in export(model, args, f, export_params, verbose, training, input_names, output_names, aten, export_raw_ir, operator_export_type, opset_version, _retain_param_name, do_constant_folding, example_outputs, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, custom_opsets, enable_onnx_checker, use_external_data_format)
67 dynamic_axes=dynamic_axes, keep_initializers_as_inputs=keep_initializers_as_inputs,
68 custom_opsets=custom_opsets, enable_onnx_checker=enable_onnx_checker,
---> 69 use_external_data_format=use_external_data_format)
70
71
/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py in _export(model, args, f, export_params, verbose, training, input_names, output_names, operator_export_type, export_type, example_outputs, propagate, opset_version, _retain_param_name, do_constant_folding, strip_doc_string, dynamic_axes, keep_initializers_as_inputs, fixed_batch_size, custom_opsets, add_node_names, enable_onnx_checker, use_external_data_format)
486 example_outputs, propagate,
487 _retain_param_name, val_do_constant_folding,
--> 488 fixed_batch_size=fixed_batch_size)
489
490 # TODO: Don't allocate a in-memory string for the protobuf
/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py in _model_to_graph(model, args, verbose, training, input_names, output_names, operator_export_type, example_outputs, propagate, _retain_param_name, do_constant_folding, _disable_torch_constant_prop, fixed_batch_size)
349 graph = _optimize_graph(graph, operator_export_type,
350 _disable_torch_constant_prop=_disable_torch_constant_prop,
--> 351 fixed_batch_size=fixed_batch_size, params_dict=params_dict)
352
353 if isinstance(model, torch.jit.ScriptModule) or isinstance(model, torch.jit.ScriptFunction):
/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py in _optimize_graph(graph, operator_export_type, _disable_torch_constant_prop, fixed_batch_size, params_dict)
152 torch._C._jit_pass_erase_number_types(graph)
153
--> 154 graph = torch._C._jit_pass_onnx(graph, operator_export_type)
155 torch._C._jit_pass_lint(graph)
156
/usr/local/lib/python3.6/dist-packages/torch/onnx/__init__.py in _run_symbolic_function(*args, **kwargs)
197 def _run_symbolic_function(*args, **kwargs):
198 from torch.onnx import utils
--> 199 return utils._run_symbolic_function(*args, **kwargs)
200
201
/usr/local/lib/python3.6/dist-packages/torch/onnx/utils.py in _run_symbolic_function(g, n, inputs, env, operator_export_type)
738 .format(op_name, opset_version, op_name))
739 op_fn = sym_registry.get_registered_op(op_name, '', opset_version)
--> 740 return op_fn(g, *inputs, **attrs)
741
742 elif ns == "prim":
/usr/local/lib/python3.6/dist-packages/torch/onnx/symbolic_helper.py in wrapper(g, *args)
127 assert len(arg_descriptors) >= len(args)
128 args = [_parse_arg(arg, arg_desc) for arg, arg_desc in zip(args, arg_descriptors)]
--> 129 return fn(g, *args)
130 # In Python 2 functools.wraps chokes on partially applied functions, so we need this as a workaround
131 try:
/usr/local/lib/python3.6/dist-packages/torch/onnx/symbolic_opset9.py in ones(g, sizes, dtype, layout, device, pin_memory)
1409 dtype = 6 # float
1410 return g.op("ConstantOfShape", sizes,
-> 1411 value_t=torch.tensor([1], dtype=sym_help.scalar_type_to_pytorch_type[dtype]))
1412
1413
IndexError: list index out of range
It should be possible to export a model from torch.hub just like the demo model.
Google colab
Hi.
In original paper, it mentioned in Sec. 4 that
To optimize for AP, we override the prediction of these slots with the second highest scoring class, using the corresponding confidence. This improves AP by 2 points compared to filtering out empty slots.
But I didn't see any corresponding code in this repo. Did I miss something, or is it not implemented here?
Thank you.
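For what it's worth, here is a minimal sketch of what that trick could look like on the output logits (my own illustration of the sentence quoted above, not code from this repo; I assume the last logit index is the no-object class).
import torch

def override_no_object(logits):
    # logits: (batch, num_queries, num_classes + 1), last index = "no object"
    prob = logits.softmax(-1)
    top2_scores, top2_labels = prob.topk(2, dim=-1)
    no_object_idx = logits.shape[-1] - 1
    is_no_object = top2_labels[..., 0] == no_object_idx
    # Where "no object" wins, fall back to the second-highest class and its confidence.
    scores = torch.where(is_no_object, top2_scores[..., 1], top2_scores[..., 0])
    labels = torch.where(is_no_object, top2_labels[..., 1], top2_labels[..., 0])
    return scores, labels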
@alexholdenmiller @leaderj1001 @alcinos @snyxan
Thank you for your hard work,
Seeing the transformers learn to understand instances was truly amazing; great work.
Further research into optimization is vital in order to make training and inference feasible for the average person.
Is there a plan for optimizing DETR: pruning, distillation, searching for better students, etc.?
https://github.com/mit-han-lab/hardware-aware-transformers
https://github.com/mit-han-lab/gan-compression
http://news.mit.edu/2020/foolproof-way-shrink-deep-learning-models-0430
First of all, excellent work!
I know that integrating with detectron2 is probably not automatic, since the training setup differs from the default detectron2 procedure. But are there any plans to integrate DETR into detectron2?
Thank you!
I'm trying to make use of the plot_logs function in util/plot_utils.py.
In Jupyter, I'm passing a pathlib Path to the directory containing my log.txt, but that immediately raises TypeError: 'PosixPath' object is not iterable, which makes sense: I'm passing in just the directory of the single log.txt, so there is nothing to iterate over.
I changed the code not to iterate and just read the single file into a DataFrame, and ultimately got the graphs to print, but clearly I'm not calling it correctly?
Is there a preferred way to call/plot a single log file without removing all the list comprehensions?
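If it helps, my understanding (an assumption from reading the source, so double-check against plot_utils.py) is that the function expects an iterable of experiment directories, each containing a log.txt, so a single run can simply be wrapped in a list:
from pathlib import Path
from util.plot_utils import plot_logs

log_dirs = [Path('outputs/my_run')]   # hypothetical output directory holding log.txt
plot_logs(log_dirs)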
Hi, so I've tried training with a personal dataset and with COCO 2017 as a sanity check. My class_error stays at 100.00 for most of the training, with only a few batches in the 75-100 range. After a couple of epochs I average around 99 class error for both training and validation (on both my dataset and COCO 2017). Has anyone experienced similar issues?
To add, I only changed the num_queries flag for my personal dataset; COCO 2017 kept its original arguments. My loss does seem to drop, however. Any direction would be greatly appreciated!
Can you share the code to run 16 GPUs over 2 nodes using Slurm?
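For what it's worth, the repo ships a run_with_submitit.py launcher for Slurm; something along these lines should request 2 nodes with 8 GPUs each (the flag names are from memory and may differ, so check the script's argparse):
python run_with_submitit.py --ngpus 8 --nodes 2 --coco_path /path/to/coco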
I have a quick question: is the background class id 0 or 91? (DETR uses 91 COCO categories for training.)
It seems the targets object returned by the dataloader uses 1-91 for all the object categories, but the loss_labels function uses 91 instead of 0 for the background. I am not sure if I missed something.
Thanks.
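My reading (an assumption, so please correct me) is that loss_labels fills every unmatched query with index num_classes, i.e. the no-object class sits at the end rather than at 0. A stripped-down sketch of that idea:
import torch

num_classes = 91                                  # COCO object categories use ids 1..91 here
logits = torch.randn(2, 100, num_classes + 1)     # (batch, queries, classes + "no object")

# Every query starts as "no object" (index num_classes); only the queries matched
# by the Hungarian matcher get a real category id written in.
target_classes = torch.full(logits.shape[:2], num_classes, dtype=torch.int64)
# target_classes[batch_idx, query_idx] = matched_category_id  ... for matched pairs
loss = torch.nn.functional.cross_entropy(logits.transpose(1, 2), target_classes)
print(loss)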
How can I leverage this architecture for image classification tasks? I tried using the example in the colab notebook but had trouble with batch sizes. The example is intended for batch_size = 1 but gives errors when using a larger batch size. How can I overcome this?
Great paper and repo btw, congrats!
I wanted to do a simple single-image inference with the panoptic segmentation model. I achieved that in a very 'hacky' way by editing the main.sh file (can be seen here).
Are you planning to release a demo notebook like the one for object detection, or to upload the .pth files to torch.hub?
Can someone please explain to me how you calculate the positional encoding?
I know what positional encoding is, but models/position_encoding.py is a bit overwhelming. I want to know what is considered a positional encoding when working with images. Is it calculated over the feature maps or over something else?
How do you calculate masks when using images in transformers?
I know what masks are, but how do we compute them when dealing with images?
I found no answers to these questions anywhere, so I'm posting them here.
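For what it's worth, here is a stripped-down sketch of a 2D sine encoding computed over a padded feature map, in the spirit of the repo's PositionEmbeddingSine; this is a simplified illustration rather than a copy of the repo's code, and the mask is simply True wherever the batch padding is.
import torch

def sine_position_encoding(mask, num_pos_feats=128, temperature=10000):
    # mask: (batch, H, W) bool tensor, True on padded pixels of the feature map.
    not_mask = ~mask
    y_embed = not_mask.cumsum(1, dtype=torch.float32)   # running row index per image
    x_embed = not_mask.cumsum(2, dtype=torch.float32)   # running column index per image

    dim_t = torch.arange(num_pos_feats, dtype=torch.float32)
    dim_t = temperature ** (2 * (dim_t // 2) / num_pos_feats)

    pos_x = x_embed[:, :, :, None] / dim_t
    pos_y = y_embed[:, :, :, None] / dim_t
    # Interleave sin/cos over the frequency dimension, then concatenate the y and x parts.
    pos_x = torch.stack((pos_x[..., 0::2].sin(), pos_x[..., 1::2].cos()), dim=4).flatten(3)
    pos_y = torch.stack((pos_y[..., 0::2].sin(), pos_y[..., 1::2].cos()), dim=4).flatten(3)
    return torch.cat((pos_y, pos_x), dim=3).permute(0, 3, 1, 2)  # (batch, 2*num_pos_feats, H, W)

# Example: one 25x34 feature map whose last 4 columns are padding.
mask = torch.zeros(1, 25, 34, dtype=torch.bool)
mask[:, :, 30:] = True
print(sine_position_encoding(mask).shape)   # torch.Size([1, 256, 25, 34])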
Have you experimented with landmark/joint regression rather than bounding boxes?
Some of the methods mentioned in the paper, like "Objects as Points", have also been applied to these joint tasks.
I am trying to adapt this repository to an OCR task and I am facing the same dilemma.
During training there are three different image sizes encoded in the dataset.
So if you try to print the boxes of a dataset element with a batch size bigger than 1 (I checked it with 5),
you will see that, because of random batch sampling, the same picture can end up with different box coordinates.
Look below. This function draws the boxes on the image correctly, but only if batch_size=1, or if all pictures in your dataset are the same size, or if you take W and H for scaling from target["size"], which is wrong.
# img - PIL image recovered from the (3, H, W) tensor in the batch (all padded to the same H and W)
# target - labels for this particular image
def showImageFromBatch(img, target):
    from PIL import ImageDraw, ImageFont
    from util.box_ops import box_cxcywh_to_xyxy

    draw = ImageDraw.Draw(img)
    boxes = target['boxes']
    cl = target['labels']
    if True:  # boxes.max() <= 1: boxes are normalized cxcywh
        boxes = box_cxcywh_to_xyxy(boxes)
    print('Image:', (img.height, img.width), target['size'], target['orig_size'])
    H, W = target['size']            # <<< works well only with this
    # W, H = img.width, img.height   # <<< but it should work with this!!!
    boxes[:, 0::2] *= W
    boxes[:, 1::2] *= H
    for i in range(len(boxes)):
        x1, y1, x2, y2 = boxes[i]
        draw.rectangle((x1, y1, x2, y2), outline=(0, 255, 0) if cl[i] >= 0 else (0, 0, 0), width=3)
        draw.text((x1, y1), str(cl[i].item()), (0, 255, 0) if cl[i] >= 0 else (0, 0, 0),
                  font=ImageFont.truetype("DejaVuSansMono.ttf", 20))
    img.show()
Please clarify this situation. Thank you in advance.
I'm trying to run the example as-is, and I'm running into this issue. I did have to adjust the number of GPUs because the VM I'm working on only has 1. I'm also working on a Windows 10 machine with PyTorch 1.5.0, CUDA version 10.1, and CUDA compiler driver v10.0.130.
| distributed init (rank 0): env://
Traceback (most recent call last):
File "main.py", line 248, in <module>
main(args)
File "main.py", line 106, in main
utils.init_distributed_mode(args)
File "C:\Users\-user-\Documents\Projects\detr\util\misc.py", line 374, in init_distributed_mode
torch.distributed.init_process_group(backend=args.dist_backend, init_method=args.dist_url,
AttributeError: module 'torch.distributed' has no attribute 'init_process_group'
Traceback (most recent call last):
File "C:\Anaconda\envs\detr\lib\runpy.py", line 194, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Anaconda\envs\detr\lib\runpy.py", line 87, in _run_code
exec(code, run_globals)
File "C:\Anaconda\envs\detr\lib\site-packages\torch\distributed\launch.py", line 263, in <module>
main()
File "C:\Anaconda\envs\detr\lib\site-packages\torch\distributed\launch.py", line 258, in main
raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['C:\\Anaconda\\envs\\detr\\python.exe', '-u', 'main.py', '--coco_path', 'F:/coco-data']' returned non-zero exit status 1.
Have you experimented with any techniques for learning the loss coefficients (from the multi-task learning literature) that are hard-coded at https://github.com/facebookresearch/detr/blob/master/main.py#L73?
Edit:
E.g., to give a recent example: https://arxiv.org/abs/2001.02223
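A minimal sketch of one classic option from that literature, uncertainty-based weighting (Kendall et al., CVPR 2018); this is my own illustration, not something this repo implements.
import torch
from torch import nn

class UncertaintyWeighting(nn.Module):
    """Learn one log-variance per loss term instead of hard-coding the coefficients."""
    def __init__(self, num_losses):
        super().__init__()
        self.log_vars = nn.Parameter(torch.zeros(num_losses))

    def forward(self, losses):
        # losses: iterable of scalar loss tensors (e.g. ce, bbox, giou)
        total = 0.0
        for loss, log_var in zip(losses, self.log_vars):
            total = total + torch.exp(-log_var) * loss + log_var
        return total

weighting = UncertaintyWeighting(num_losses=3)
total_loss = weighting([torch.tensor(0.7), torch.tensor(1.4), torch.tensor(1.8)])
print(total_loss)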
Hi!
The colab demo doesn't seem to work with images that have a wide aspect ratio (e.g. 16:9). The resulting bounding boxes are shifted to the right a bit, and sometimes the inference crashes with a RuntimeError. Please see this colab notebook.
The bounding boxes look good after I change T.Resize(800) to something explicit, like T.Resize((800, 600)). But I'm not sure if that's the correct way of addressing this (see the giraffe detections in my notebook). What would be a correct way of dealing with different aspect ratios?
The only things I changed are the URLs of the input images and the transformation pipeline (in the second case).
As in your code, the tgt of the decoder layer is first initialized with zeros, and these zeros are used as v to compute a new output with the qkv attention operation; take the pre-norm forward path as an example:
def forward_pre(self, tgt, memory,
                tgt_mask: Optional[Tensor] = None,
                memory_mask: Optional[Tensor] = None,
                tgt_key_padding_mask: Optional[Tensor] = None,
                memory_key_padding_mask: Optional[Tensor] = None,
                pos: Optional[Tensor] = None,
                query_pos: Optional[Tensor] = None):
    tgt2 = self.norm1(tgt)
    q = k = self.with_pos_embed(tgt2, query_pos)
    tgt2 = self.self_attn(q, k, value=tgt2, attn_mask=tgt_mask,
                          key_padding_mask=tgt_key_padding_mask)[0]
    tgt = tgt + self.dropout1(tgt2)
    tgt2 = self.norm2(tgt)
    tgt2 = self.multihead_attn(query=self.with_pos_embed(tgt2, query_pos),
                               key=self.with_pos_embed(memory, pos),
                               value=memory, attn_mask=memory_mask,
                               key_padding_mask=memory_key_padding_mask)[0]
    tgt = tgt + self.dropout2(tgt2)
    tgt2 = self.norm3(tgt)
    tgt2 = self.linear2(self.dropout(self.activation(self.linear1(tgt2))))
    tgt = tgt + self.dropout3(tgt2)
    return tgt
I mean, if it is the first decoder layer, tgt is zero for every token, so tgt2 will be identical across all tokens after the first layer norm. How does it make sense to get a weighted output from this tgt2? No matter what q and k are, nothing but a featureless bias will be learned, I think.
Hi,
Thank you for your great work.
I have a question about training on my own dataset. The eval result is always all zeros, like this:
(base) [detr]$ python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --lr 1e-3 --batch_size 4 --epochs 10 --coco_path datasets/shape/coco
| distributed init (rank 0): env://
git:
sha: 0af41930d1b6c2244e33bbef76dff6c537dd53c0, status: clean, branch: master
Namespace(aux_loss=True, backbone='resnet50', batch_size=4, bbox_loss_coef=5, clip_max_norm=0.1, coco_panoptic_path=None, coco_path='datasets/shape/coco', dataset_file='coco', dec_layers=6, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=2048, dist_backend='nccl', dist_url='env://', distributed=True, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=10, eval=False, frozen_weights=None, giou_loss_coef=2, gpu=0, hidden_dim=256, lr=0.001, lr_backbone=1e-05, lr_drop=200, mask_loss_coef=1, masks=False, nheads=8, num_queries=100, num_workers=2, output_dir='', position_embedding='sine', pre_norm=False, rank=0, remove_difficult=False, resume='', seed=42, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, start_epoch=0, weight_decay=0.0001, world_size=1)
number of params: 41302368
loading annotations into memory...
Done (t=0.02s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Start training
Epoch: [0] [ 0/225] eta: 0:02:30 lr: 0.001000 class_error: 100.00 loss: 75.9316 (75.9316) loss_ce: 4.8402 (4.8402) loss_bbox: 5.6168 (5.6168) loss_giou: 2.2340 (2.2340) loss_ce_0: 4.4001 (4.4001) loss_bbox_0: 5.4950 (5.4950) loss_giou_0: 2.2311 (2.2311) loss_ce_1: 4.8179 (4.8179) loss_bbox_1: 5.6163 (5.6163) loss_giou_1: 2.2393 (2.2393) loss_ce_2: 4.7843 (4.7843) loss_bbox_2: 5.6247 (5.6247) loss_giou_2: 2.2343 (2.2343) loss_ce_3: 4.9645 (4.9645) loss_bbox_3: 5.6222 (5.6222) loss_giou_3: 2.2467 (2.2467) loss_ce_4: 4.9737 (4.9737) loss_bbox_4: 5.7800 (5.7800) loss_giou_4: 2.2105 (2.2105) loss_ce_unscaled: 4.8402 (4.8402) class_error_unscaled: 100.0000 (100.0000) loss_bbox_unscaled: 1.1234 (1.1234) loss_giou_unscaled: 1.1170 (1.1170) cardinality_error_unscaled: 96.7500 (96.7500) loss_ce_0_unscaled: 4.4001 (4.4001) loss_bbox_0_unscaled: 1.0990 (1.0990) loss_giou_0_unscaled: 1.1155 (1.1155) cardinality_error_0_unscaled: 96.7500 (96.7500) loss_ce_1_unscaled: 4.8179 (4.8179) loss_bbox_1_unscaled: 1.1233 (1.1233) loss_giou_1_unscaled: 1.1197 (1.1197) cardinality_error_1_unscaled: 96.7500 (96.7500) loss_ce_2_unscaled: 4.7843 (4.7843) loss_bbox_2_unscaled: 1.1249 (1.1249) loss_giou_2_unscaled: 1.1172 (1.1172) cardinality_error_2_unscaled: 96.7500 (96.7500) loss_ce_3_unscaled: 4.9645 (4.9645) loss_bbox_3_unscaled: 1.1244 (1.1244) loss_giou_3_unscaled: 1.1234 (1.1234) cardinality_error_3_unscaled: 96.7500 (96.7500) loss_ce_4_unscaled: 4.9737 (4.9737) loss_bbox_4_unscaled: 1.1560 (1.1560) loss_giou_4_unscaled: 1.1053 (1.1053) cardinality_error_4_unscaled: 96.7500 (96.7500) time: 0.6674 data: 0.2966 max mem: 2899
Epoch: [0] [ 10/225] eta: 0:01:16 lr: 0.001000 class_error: 100.00 loss: 40.5536 (44.2869) loss_ce: 0.8244 (1.3583) loss_bbox: 3.0293 (3.3114) loss_giou: 2.7750 (2.7253) loss_ce_0: 0.8599 (1.2429) loss_bbox_0: 3.0818 (3.3239) loss_giou_0: 2.7982 (2.7456) loss_ce_1: 0.8457 (1.3100) loss_bbox_1: 3.1305 (3.3230) loss_giou_1: 2.7961 (2.7431) loss_ce_2: 0.8787 (1.3171) loss_bbox_2: 3.0785 (3.3198) loss_giou_2: 2.8003 (2.7389) loss_ce_3: 0.8455 (1.3657) loss_bbox_3: 3.0552 (3.3092) loss_giou_3: 2.7829 (2.7311) loss_ce_4: 0.8526 (1.3903) loss_bbox_4: 3.0473 (3.3076) loss_giou_4: 2.7943 (2.7239) loss_ce_unscaled: 0.8244 (1.3583) class_error_unscaled: 100.0000 (93.9394) loss_bbox_unscaled: 0.6059 (0.6623) loss_giou_unscaled: 1.3875 (1.3626) cardinality_error_unscaled: 3.5000 (19.9773) loss_ce_0_unscaled: 0.8599 (1.2429) loss_bbox_0_unscaled: 0.6164 (0.6648) loss_giou_0_unscaled: 1.3991 (1.3728) cardinality_error_0_unscaled: 3.0000 (11.3636) loss_ce_1_unscaled: 0.8457 (1.3100) loss_bbox_1_unscaled: 0.6261 (0.6646) loss_giou_1_unscaled: 1.3981 (1.3716) cardinality_error_1_unscaled: 3.5000 (12.5909) loss_ce_2_unscaled: 0.8787 (1.3171) loss_bbox_2_unscaled: 0.6157 (0.6640) loss_giou_2_unscaled: 1.4001 (1.3694) cardinality_error_2_unscaled: 3.5000 (19.2500) loss_ce_3_unscaled: 0.8455 (1.3657) loss_bbox_3_unscaled: 0.6110 (0.6618) loss_giou_3_unscaled: 1.3915 (1.3656) cardinality_error_3_unscaled: 3.5000 (19.9773) loss_ce_4_unscaled: 0.8526 (1.3903) loss_bbox_4_unscaled: 0.6095 (0.6615) loss_giou_4_unscaled: 1.3971 (1.3619) cardinality_error_4_unscaled: 3.5000 (19.9773) time: 0.3566 data: 0.0417 max mem: 4100
Epoch: [0] [ 20/225] eta: 0:01:11 lr: 0.001000 class_error: 100.00 loss: 38.5593 (40.8082) loss_ce: 0.7449 (1.0389) loss_bbox: 2.5801 (2.9656) loss_giou: 2.8444 (2.8061) loss_ce_0: 0.7395 (0.9770) loss_bbox_0: 2.5890 (2.9779) loss_giou_0: 2.8093 (2.8087) loss_ce_1: 0.7514 (1.0102) loss_bbox_1: 2.5790 (2.9720) loss_giou_1: 2.8446 (2.8075) loss_ce_2: 0.7420 (1.0158) loss_bbox_2: 2.5879 (2.9603) loss_giou_2: 2.8726 (2.8131) loss_ce_3: 0.7451 (1.0461) loss_bbox_3: 2.5694 (2.9725) loss_giou_3: 2.8119 (2.8037) loss_ce_4: 0.7562 (1.0560) loss_bbox_4: 2.5707 (2.9682) loss_giou_4: 2.8531 (2.8084) loss_ce_unscaled: 0.7449 (1.0389) class_error_unscaled: 100.0000 (96.8254) loss_bbox_unscaled: 0.5160 (0.5931) loss_giou_unscaled: 1.4222 (1.4031) cardinality_error_unscaled: 2.7500 (11.7857) loss_ce_0_unscaled: 0.7395 (0.9770) loss_bbox_0_unscaled: 0.5178 (0.5956) loss_giou_0_unscaled: 1.4046 (1.4043) cardinality_error_0_unscaled: 2.7500 (7.2738) loss_ce_1_unscaled: 0.7514 (1.0102) loss_bbox_1_unscaled: 0.5158 (0.5944) loss_giou_1_unscaled: 1.4223 (1.4037) cardinality_error_1_unscaled: 2.7500 (7.9167) loss_ce_2_unscaled: 0.7420 (1.0158) loss_bbox_2_unscaled: 0.5176 (0.5921) loss_giou_2_unscaled: 1.4363 (1.4066) cardinality_error_2_unscaled: 2.7500 (11.4048) loss_ce_3_unscaled: 0.7451 (1.0461) loss_bbox_3_unscaled: 0.5139 (0.5945) loss_giou_3_unscaled: 1.4059 (1.4019) cardinality_error_3_unscaled: 2.7500 (11.7857) loss_ce_4_unscaled: 0.7562 (1.0560) loss_bbox_4_unscaled: 0.5141 (0.5936) loss_giou_4_unscaled: 1.4265 (1.4042) cardinality_error_4_unscaled: 2.7500 (11.7857) time: 0.3333 data: 0.0155 max mem: 4763
Epoch: [0] [ 30/225] eta: 0:01:06 lr: 0.001000 class_error: 100.00 loss: 36.1775 (39.1106) loss_ce: 0.6629 (0.9151) loss_bbox: 2.5098 (2.8222) loss_giou: 2.8313 (2.7752) loss_ce_0: 0.6502 (0.8715) loss_bbox_0: 2.5458 (2.8372) loss_giou_0: 2.8088 (2.7885) loss_ce_1: 0.6497 (0.8934) loss_bbox_1: 2.5484 (2.8363) loss_giou_1: 2.7917 (2.7771) loss_ce_2: 0.6553 (0.9005) loss_bbox_2: 2.4740 (2.8460) loss_giou_2: 2.8586 (2.7895) loss_ce_3: 0.6577 (0.9152) loss_bbox_3: 2.5694 (2.8280) loss_giou_3: 2.8119 (2.7908) loss_ce_4: 0.6433 (0.9227) loss_bbox_4: 2.5183 (2.8121) loss_giou_4: 2.8352 (2.7893) loss_ce_unscaled: 0.6629 (0.9151) class_error_unscaled: 100.0000 (97.8495) loss_bbox_unscaled: 0.5020 (0.5644) loss_giou_unscaled: 1.4156 (1.3876) cardinality_error_unscaled: 2.7500 (8.8468) loss_ce_0_unscaled: 0.6502 (0.8715) loss_bbox_0_unscaled: 0.5092 (0.5674) loss_giou_0_unscaled: 1.4044 (1.3943) cardinality_error_0_unscaled: 2.7500 (5.7903) loss_ce_1_unscaled: 0.6497 (0.8934) loss_bbox_1_unscaled: 0.5097 (0.5673) loss_giou_1_unscaled: 1.3958 (1.3885) cardinality_error_1_unscaled: 2.7500 (6.2258) loss_ce_2_unscaled: 0.6553 (0.9005) loss_bbox_2_unscaled: 0.4948 (0.5692) loss_giou_2_unscaled: 1.4293 (1.3948) cardinality_error_2_unscaled: 2.7500 (8.5887) loss_ce_3_unscaled: 0.6577 (0.9152) loss_bbox_3_unscaled: 0.5139 (0.5656) loss_giou_3_unscaled: 1.4059 (1.3954) cardinality_error_3_unscaled: 2.7500 (8.8468) loss_ce_4_unscaled: 0.6433 (0.9227) loss_bbox_4_unscaled: 0.5037 (0.5624) loss_giou_4_unscaled: 1.4176 (1.3947) cardinality_error_4_unscaled: 2.7500 (8.8468) time: 0.3356 data: 0.0145 max mem: 5477
Epoch: [0] [ 40/225] eta: 0:01:03 lr: 0.001000 class_error: 100.00 loss: 32.6019 (36.9908) loss_ce: 0.6956 (0.8742) loss_bbox: 2.1999 (2.5978) loss_giou: 2.5164 (2.6636) loss_ce_0: 0.6839 (0.8402) loss_bbox_0: 2.2153 (2.6161) loss_giou_0: 2.4194 (2.6766) loss_ce_1: 0.7205 (0.8589) loss_bbox_1: 2.1482 (2.6176) loss_giou_1: 2.4094 (2.6681) loss_ce_2: 0.6871 (0.8615) loss_bbox_2: 2.3961 (2.6406) loss_giou_2: 2.5275 (2.6903) loss_ce_3: 0.6944 (0.8725) loss_bbox_3: 2.1964 (2.6045) loss_giou_3: 2.5396 (2.6689) loss_ce_4: 0.6961 (0.8798) loss_bbox_4: 2.2572 (2.6533) loss_giou_4: 2.6218 (2.7062) loss_ce_unscaled: 0.6956 (0.8742) class_error_unscaled: 100.0000 (98.3740) loss_bbox_unscaled: 0.4400 (0.5196) loss_giou_unscaled: 1.2582 (1.3318) cardinality_error_unscaled: 2.7500 (7.5122) loss_ce_0_unscaled: 0.6839 (0.8402) loss_bbox_0_unscaled: 0.4431 (0.5232) loss_giou_0_unscaled: 1.2097 (1.3383) cardinality_error_0_unscaled: 2.7500 (5.2012) loss_ce_1_unscaled: 0.7205 (0.8589) loss_bbox_1_unscaled: 0.4296 (0.5235) loss_giou_1_unscaled: 1.2047 (1.3341) cardinality_error_1_unscaled: 2.7500 (5.5305) loss_ce_2_unscaled: 0.6871 (0.8615) loss_bbox_2_unscaled: 0.4792 (0.5281) loss_giou_2_unscaled: 1.2638 (1.3451) cardinality_error_2_unscaled: 2.7500 (7.3171) loss_ce_3_unscaled: 0.6944 (0.8725) loss_bbox_3_unscaled: 0.4393 (0.5209) loss_giou_3_unscaled: 1.2698 (1.3345) cardinality_error_3_unscaled: 2.7500 (7.5122) loss_ce_4_unscaled: 0.6961 (0.8798) loss_bbox_4_unscaled: 0.4514 (0.5307) loss_giou_4_unscaled: 1.3109 (1.3531) cardinality_error_4_unscaled: 2.7500 (7.5122) time: 0.3336 data: 0.0145 max mem: 5477
Epoch: [0] [ 50/225] eta: 0:00:59 lr: 0.001000 class_error: 100.00 loss: 27.3266 (34.7739) loss_ce: 0.7699 (0.8480) loss_bbox: 1.7696 (2.3824) loss_giou: 2.1386 (2.5234) loss_ce_0: 0.7753 (0.8237) loss_bbox_0: 1.7433 (2.4192) loss_giou_0: 2.1408 (2.5614) loss_ce_1: 0.7667 (0.8363) loss_bbox_1: 1.7529 (2.4163) loss_giou_1: 2.1323 (2.5346) loss_ce_2: 0.7698 (0.8400) loss_bbox_2: 1.7657 (2.4392) loss_giou_2: 2.2232 (2.5671) loss_ce_3: 0.7478 (0.8485) loss_bbox_3: 1.6155 (2.3823) loss_giou_3: 2.0623 (2.5389) loss_ce_4: 0.7658 (0.8536) loss_bbox_4: 1.6977 (2.4118) loss_giou_4: 2.0993 (2.5472) loss_ce_unscaled: 0.7699 (0.8480) class_error_unscaled: 100.0000 (98.6928) loss_bbox_unscaled: 0.3539 (0.4765) loss_giou_unscaled: 1.0693 (1.2617) cardinality_error_unscaled: 3.5000 (6.6765) loss_ce_0_unscaled: 0.7753 (0.8237) loss_bbox_0_unscaled: 0.3487 (0.4838) loss_giou_0_unscaled: 1.0704 (1.2807) cardinality_error_0_unscaled: 3.5000 (4.8186) loss_ce_1_unscaled: 0.7667 (0.8363) loss_bbox_1_unscaled: 0.3506 (0.4833) loss_giou_1_unscaled: 1.0662 (1.2673) cardinality_error_1_unscaled: 3.5000 (5.0833) loss_ce_2_unscaled: 0.7698 (0.8400) loss_bbox_2_unscaled: 0.3531 (0.4878) loss_giou_2_unscaled: 1.1116 (1.2836) cardinality_error_2_unscaled: 3.5000 (6.5196) loss_ce_3_unscaled: 0.7478 (0.8485) loss_bbox_3_unscaled: 0.3231 (0.4765) loss_giou_3_unscaled: 1.0311 (1.2694) cardinality_error_3_unscaled: 3.5000 (6.6765) loss_ce_4_unscaled: 0.7658 (0.8536) loss_bbox_4_unscaled: 0.3395 (0.4824) loss_giou_4_unscaled: 1.0496 (1.2736) cardinality_error_4_unscaled: 3.5000 (6.6765) time: 0.3331 data: 0.0146 max mem: 5477
Epoch: [0] [ 60/225] eta: 0:00:55 lr: 0.001000 class_error: 100.00 loss: 23.1808 (32.6448) loss_ce: 0.6915 (0.8191) loss_bbox: 1.2613 (2.2106) loss_giou: 1.8641 (2.3880) loss_ce_0: 0.7082 (0.7989) loss_bbox_0: 1.3390 (2.2320) loss_giou_0: 1.7792 (2.4135) loss_ce_1: 0.7016 (0.8092) loss_bbox_1: 1.2421 (2.2059) loss_giou_1: 1.7174 (2.3772) loss_ce_2: 0.6996 (0.8123) loss_bbox_2: 1.5305 (2.2715) loss_giou_2: 1.8414 (2.4372) loss_ce_3: 0.7185 (0.8226) loss_bbox_3: 1.3289 (2.1920) loss_giou_3: 1.7667 (2.3846) loss_ce_4: 0.6862 (0.8221) loss_bbox_4: 1.3238 (2.2314) loss_giou_4: 1.8016 (2.4168) loss_ce_unscaled: 0.6915 (0.8191) class_error_unscaled: 100.0000 (98.9071) loss_bbox_unscaled: 0.2523 (0.4421) loss_giou_unscaled: 0.9320 (1.1940) cardinality_error_unscaled: 3.0000 (6.0123) loss_ce_0_unscaled: 0.7082 (0.7989) loss_bbox_0_unscaled: 0.2678 (0.4464) loss_giou_0_unscaled: 0.8896 (1.2068) cardinality_error_0_unscaled: 3.0000 (4.4590) loss_ce_1_unscaled: 0.7016 (0.8092) loss_bbox_1_unscaled: 0.2484 (0.4412) loss_giou_1_unscaled: 0.8587 (1.1886) cardinality_error_1_unscaled: 3.0000 (4.6803) loss_ce_2_unscaled: 0.6996 (0.8123) loss_bbox_2_unscaled: 0.3061 (0.4543) loss_giou_2_unscaled: 0.9207 (1.2186) cardinality_error_2_unscaled: 3.0000 (5.8811) loss_ce_3_unscaled: 0.7185 (0.8226) loss_bbox_3_unscaled: 0.2658 (0.4384) loss_giou_3_unscaled: 0.8833 (1.1923) cardinality_error_3_unscaled: 3.0000 (6.0123) loss_ce_4_unscaled: 0.6862 (0.8221) loss_bbox_4_unscaled: 0.2648 (0.4463) loss_giou_4_unscaled: 0.9008 (1.2084) cardinality_error_4_unscaled: 3.0000 (6.0123) time: 0.3338 data: 0.0146 max mem: 5477
Epoch: [0] [ 70/225] eta: 0:00:52 lr: 0.001000 class_error: 100.00 loss: 21.3303 (30.9936) loss_ce: 0.6637 (0.8001) loss_bbox: 1.1000 (2.0557) loss_giou: 1.5853 (2.2778) loss_ce_0: 0.6661 (0.7836) loss_bbox_0: 1.1435 (2.0696) loss_giou_0: 1.6672 (2.3055) loss_ce_1: 0.6682 (0.7913) loss_bbox_1: 1.0864 (2.0510) loss_giou_1: 1.5471 (2.2684) loss_ce_2: 0.6681 (0.7951) loss_bbox_2: 1.1686 (2.0983) loss_giou_2: 1.5572 (2.3080) loss_ce_3: 0.6903 (0.8043) loss_bbox_3: 1.1644 (2.0551) loss_giou_3: 1.6332 (2.2966) loss_ce_4: 0.6541 (0.8009) loss_bbox_4: 1.2588 (2.1030) loss_giou_4: 1.7398 (2.3293) loss_ce_unscaled: 0.6637 (0.8001) class_error_unscaled: 100.0000 (99.0610) loss_bbox_unscaled: 0.2200 (0.4111) loss_giou_unscaled: 0.7926 (1.1389) cardinality_error_unscaled: 2.2500 (5.5563) loss_ce_0_unscaled: 0.6661 (0.7836) loss_bbox_0_unscaled: 0.2287 (0.4139) loss_giou_0_unscaled: 0.8336 (1.1528) cardinality_error_0_unscaled: 2.2500 (4.2218) loss_ce_1_unscaled: 0.6682 (0.7913) loss_bbox_1_unscaled: 0.2173 (0.4102) loss_giou_1_unscaled: 0.7736 (1.1342) cardinality_error_1_unscaled: 2.2500 (4.4120) loss_ce_2_unscaled: 0.6681 (0.7951) loss_bbox_2_unscaled: 0.2337 (0.4197) loss_giou_2_unscaled: 0.7786 (1.1540) cardinality_error_2_unscaled: 2.2500 (5.4401) loss_ce_3_unscaled: 0.6903 (0.8043) loss_bbox_3_unscaled: 0.2329 (0.4110) loss_giou_3_unscaled: 0.8166 (1.1483) cardinality_error_3_unscaled: 2.2500 (5.5563) loss_ce_4_unscaled: 0.6541 (0.8009) loss_bbox_4_unscaled: 0.2518 (0.4206) loss_giou_4_unscaled: 0.8699 (1.1647) cardinality_error_4_unscaled: 2.2500 (5.5563) time: 0.3378 data: 0.0146 max mem: 5477
Epoch: [0] [ 80/225] eta: 0:00:48 lr: 0.001000 class_error: 100.00 loss: 20.5233 (29.6911) loss_ce: 0.6906 (0.7851) loss_bbox: 1.0481 (1.9269) loss_giou: 1.5147 (2.1781) loss_ce_0: 0.7025 (0.7712) loss_bbox_0: 1.0714 (1.9364) loss_giou_0: 1.4821 (2.1900) loss_ce_1: 0.6821 (0.7767) loss_bbox_1: 1.1409 (1.9650) loss_giou_1: 1.6625 (2.2206) loss_ce_2: 0.6765 (0.7826) loss_bbox_2: 1.0265 (1.9785) loss_giou_2: 1.4500 (2.2144) loss_ce_3: 0.6955 (0.7901) loss_bbox_3: 1.0413 (1.9257) loss_giou_3: 1.5327 (2.1867) loss_ce_4: 0.6909 (0.7869) loss_bbox_4: 1.2588 (2.0151) loss_giou_4: 1.7385 (2.2610) loss_ce_unscaled: 0.6906 (0.7851) class_error_unscaled: 100.0000 (99.1770) loss_bbox_unscaled: 0.2096 (0.3854) loss_giou_unscaled: 0.7574 (1.0890) cardinality_error_unscaled: 3.0000 (5.2191) loss_ce_0_unscaled: 0.7025 (0.7712) loss_bbox_0_unscaled: 0.2143 (0.3873) loss_giou_0_unscaled: 0.7411 (1.0950) cardinality_error_0_unscaled: 3.0000 (4.0494) loss_ce_1_unscaled: 0.6821 (0.7767) loss_bbox_1_unscaled: 0.2282 (0.3930) loss_giou_1_unscaled: 0.8313 (1.1103) cardinality_error_1_unscaled: 3.0000 (4.2160) loss_ce_2_unscaled: 0.6765 (0.7826) loss_bbox_2_unscaled: 0.2053 (0.3957) loss_giou_2_unscaled: 0.7250 (1.1072) cardinality_error_2_unscaled: 3.0000 (5.1173) loss_ce_3_unscaled: 0.6955 (0.7901) loss_bbox_3_unscaled: 0.2083 (0.3851) loss_giou_3_unscaled: 0.7663 (1.0933) cardinality_error_3_unscaled: 3.0000 (5.2191) loss_ce_4_unscaled: 0.6909 (0.7869) loss_bbox_4_unscaled: 0.2518 (0.4030) loss_giou_4_unscaled: 0.8693 (1.1305) cardinality_error_4_unscaled: 3.0000 (5.2191) time: 0.3316 data: 0.0145 max mem: 5477
Epoch: [0] [ 90/225] eta: 0:00:45 lr: 0.001000 class_error: 100.00 loss: 20.0966 (28.7047) loss_ce: 0.7220 (0.7781) loss_bbox: 0.9469 (1.8428) loss_giou: 1.4894 (2.1132) loss_ce_0: 0.7117 (0.7650) loss_bbox_0: 1.0125 (1.8465) loss_giou_0: 1.4151 (2.1172) loss_ce_1: 0.7015 (0.7705) loss_bbox_1: 1.2264 (1.8691) loss_giou_1: 1.6625 (2.1449) loss_ce_2: 0.7058 (0.7745) loss_bbox_2: 1.1000 (1.8894) loss_giou_2: 1.5367 (2.1538) loss_ce_3: 0.7138 (0.7822) loss_bbox_3: 0.9728 (1.8292) loss_giou_3: 1.3945 (2.1083) loss_ce_4: 0.7190 (0.7797) loss_bbox_4: 1.2304 (1.9368) loss_giou_4: 1.6800 (2.2033) loss_ce_unscaled: 0.7220 (0.7781) class_error_unscaled: 100.0000 (99.2674) loss_bbox_unscaled: 0.1894 (0.3686) loss_giou_unscaled: 0.7447 (1.0566) cardinality_error_unscaled: 3.0000 (4.9945) loss_ce_0_unscaled: 0.7117 (0.7650) loss_bbox_0_unscaled: 0.2025 (0.3693) loss_giou_0_unscaled: 0.7075 (1.0586) cardinality_error_0_unscaled: 3.0000 (3.9533) loss_ce_1_unscaled: 0.7015 (0.7705) loss_bbox_1_unscaled: 0.2453 (0.3738) loss_giou_1_unscaled: 0.8313 (1.0725) cardinality_error_1_unscaled: 3.0000 (4.1016) loss_ce_2_unscaled: 0.7058 (0.7745) loss_bbox_2_unscaled: 0.2200 (0.3779) loss_giou_2_unscaled: 0.7684 (1.0769) cardinality_error_2_unscaled: 3.0000 (4.9038) loss_ce_3_unscaled: 0.7138 (0.7822) loss_bbox_3_unscaled: 0.1946 (0.3658) loss_giou_3_unscaled: 0.6972 (1.0541) cardinality_error_3_unscaled: 3.0000 (4.9945) loss_ce_4_unscaled: 0.7190 (0.7797) loss_bbox_4_unscaled: 0.2461 (0.3874) loss_giou_4_unscaled: 0.8400 (1.1017) cardinality_error_4_unscaled: 3.0000 (4.9918) time: 0.3271 data: 0.0144 max mem: 5477
Epoch: [0] [100/225] eta: 0:00:42 lr: 0.001000 class_error: 100.00 loss: 20.1649 (27.9541) loss_ce: 0.7003 (0.7687) loss_bbox: 1.0745 (1.7630) loss_giou: 1.5262 (2.0532) loss_ce_0: 0.7128 (0.7564) loss_bbox_0: 1.1028 (1.7936) loss_giou_0: 1.5624 (2.0816) loss_ce_1: 0.7015 (0.7603) loss_bbox_1: 1.1178 (1.7996) loss_giou_1: 1.4671 (2.0949) loss_ce_2: 0.6817 (0.7625) loss_bbox_2: 1.2027 (1.8460) loss_giou_2: 1.6791 (2.1249) loss_ce_3: 0.7115 (0.7715) loss_bbox_3: 1.0009 (1.7556) loss_giou_3: 1.4495 (2.0527) loss_ce_4: 0.7190 (0.7703) loss_bbox_4: 1.1550 (1.8591) loss_giou_4: 1.5835 (2.1402) loss_ce_unscaled: 0.7003 (0.7687) class_error_unscaled: 100.0000 (99.3399) loss_bbox_unscaled: 0.2149 (0.3526) loss_giou_unscaled: 0.7631 (1.0266) cardinality_error_unscaled: 3.0000 (4.7797) loss_ce_0_unscaled: 0.7128 (0.7564) loss_bbox_0_unscaled: 0.2206 (0.3587) loss_giou_0_unscaled: 0.7812 (1.0408) cardinality_error_0_unscaled: 3.0000 (3.8416) loss_ce_1_unscaled: 0.7015 (0.7603) loss_bbox_1_unscaled: 0.2236 (0.3599) loss_giou_1_unscaled: 0.7335 (1.0474) cardinality_error_1_unscaled: 3.0000 (3.9752) loss_ce_2_unscaled: 0.6817 (0.7625) loss_bbox_2_unscaled: 0.2405 (0.3692) loss_giou_2_unscaled: 0.8396 (1.0625) cardinality_error_2_unscaled: 3.0000 (4.6980) loss_ce_3_unscaled: 0.7115 (0.7715) loss_bbox_3_unscaled: 0.2002 (0.3511) loss_giou_3_unscaled: 0.7248 (1.0264) cardinality_error_3_unscaled: 3.0000 (4.7797) loss_ce_4_unscaled: 0.7190 (0.7703) loss_bbox_4_unscaled: 0.2310 (0.3718) loss_giou_4_unscaled: 0.7918 (1.0701) cardinality_error_4_unscaled: 3.0000 (4.7772) time: 0.3336 data: 0.0146 max mem: 5637
Epoch: [0] [110/225] eta: 0:00:38 lr: 0.001000 class_error: 100.00 loss: 21.6342 (27.3816) loss_ce: 0.7064 (0.7642) loss_bbox: 0.9988 (1.6882) loss_giou: 1.4457 (1.9938) loss_ce_0: 0.7238 (0.7540) loss_bbox_0: 1.2679 (1.7486) loss_giou_0: 1.6559 (2.0487) loss_ce_1: 0.6962 (0.7567) loss_bbox_1: 1.1178 (1.7325) loss_giou_1: 1.5664 (2.0472) loss_ce_2: 0.7001 (0.7578) loss_bbox_2: 1.2249 (1.8120) loss_giou_2: 1.7109 (2.0992) loss_ce_3: 0.7018 (0.7668) loss_bbox_3: 1.1599 (1.7221) loss_giou_3: 1.5990 (2.0310) loss_ce_4: 0.7098 (0.7668) loss_bbox_4: 1.1236 (1.7990) loss_giou_4: 1.4905 (2.0932) loss_ce_unscaled: 0.7064 (0.7642) class_error_unscaled: 100.0000 (99.3994) loss_bbox_unscaled: 0.1998 (0.3376) loss_giou_unscaled: 0.7229 (0.9969) cardinality_error_unscaled: 3.0000 (4.6374) loss_ce_0_unscaled: 0.7238 (0.7540) loss_bbox_0_unscaled: 0.2536 (0.3497) loss_giou_0_unscaled: 0.8279 (1.0243) cardinality_error_0_unscaled: 3.0000 (3.7838) loss_ce_1_unscaled: 0.6962 (0.7567) loss_bbox_1_unscaled: 0.2236 (0.3465) loss_giou_1_unscaled: 0.7832 (1.0236) cardinality_error_1_unscaled: 3.0000 (3.9054) loss_ce_2_unscaled: 0.7001 (0.7578) loss_bbox_2_unscaled: 0.2450 (0.3624) loss_giou_2_unscaled: 0.8554 (1.0496) cardinality_error_2_unscaled: 3.0000 (4.5631) loss_ce_3_unscaled: 0.7018 (0.7668) loss_bbox_3_unscaled: 0.2320 (0.3444) loss_giou_3_unscaled: 0.7995 (1.0155) cardinality_error_3_unscaled: 3.0000 (4.6374) loss_ce_4_unscaled: 0.7098 (0.7668) loss_bbox_4_unscaled: 0.2247 (0.3598) loss_giou_4_unscaled: 0.7453 (1.0466) cardinality_error_4_unscaled: 3.0000 (4.6351) time: 0.3367 data: 0.0147 max mem: 5637
Epoch: [0] [120/225] eta: 0:00:35 lr: 0.001000 class_error: 100.00 loss: 21.1659 (26.8506) loss_ce: 0.7089 (0.7577) loss_bbox: 0.9988 (1.6532) loss_giou: 1.4848 (1.9689) loss_ce_0: 0.7133 (0.7496) loss_bbox_0: 1.0947 (1.7004) loss_giou_0: 1.5849 (2.0108) loss_ce_1: 0.7111 (0.7519) loss_bbox_1: 1.0310 (1.6767) loss_giou_1: 1.4821 (1.9990) loss_ce_2: 0.7007 (0.7512) loss_bbox_2: 1.2147 (1.7573) loss_giou_2: 1.5715 (2.0571) loss_ce_3: 0.7257 (0.7607) loss_bbox_3: 1.3461 (1.6945) loss_giou_3: 1.7193 (2.0082) loss_ce_4: 0.7094 (0.7603) loss_bbox_4: 1.0727 (1.7453) loss_giou_4: 1.4593 (2.0478) loss_ce_unscaled: 0.7089 (0.7577) class_error_unscaled: 100.0000 (99.4490) loss_bbox_unscaled: 0.1998 (0.3306) loss_giou_unscaled: 0.7424 (0.9845) cardinality_error_unscaled: 3.2500 (4.5124) loss_ce_0_unscaled: 0.7133 (0.7496) loss_bbox_0_unscaled: 0.2189 (0.3401) loss_giou_0_unscaled: 0.7924 (1.0054) cardinality_error_0_unscaled: 3.2500 (3.7273) loss_ce_1_unscaled: 0.7111 (0.7519) loss_bbox_1_unscaled: 0.2062 (0.3353) loss_giou_1_unscaled: 0.7411 (0.9995) cardinality_error_1_unscaled: 3.2500 (3.8409) loss_ce_2_unscaled: 0.7007 (0.7512) loss_bbox_2_unscaled: 0.2429 (0.3515) loss_giou_2_unscaled: 0.7857 (1.0286) cardinality_error_2_unscaled: 3.2500 (4.4442) loss_ce_3_unscaled: 0.7257 (0.7607) loss_bbox_3_unscaled: 0.2692 (0.3389) loss_giou_3_unscaled: 0.8596 (1.0041) cardinality_error_3_unscaled: 3.2500 (4.5124) loss_ce_4_unscaled: 0.7094 (0.7603) loss_bbox_4_unscaled: 0.2145 (0.3491) loss_giou_4_unscaled: 0.7297 (1.0239) cardinality_error_4_unscaled: 3.2500 (4.5103) time: 0.3324 data: 0.0146 max mem: 5637
Epoch: [0] [130/225] eta: 0:00:31 lr: 0.001000 class_error: 100.00 loss: 19.8474 (26.3022) loss_ce: 0.6661 (0.7493) loss_bbox: 1.1333 (1.6103) loss_giou: 1.5970 (1.9413) loss_ce_0: 0.6654 (0.7415) loss_bbox_0: 1.0947 (1.6567) loss_giou_0: 1.5989 (1.9873) loss_ce_1: 0.6688 (0.7446) loss_bbox_1: 1.0219 (1.6198) loss_giou_1: 1.4521 (1.9542) loss_ce_2: 0.6613 (0.7427) loss_bbox_2: 1.0290 (1.6975) loss_giou_2: 1.5071 (2.0150) loss_ce_3: 0.6708 (0.7522) loss_bbox_3: 1.0442 (1.6452) loss_giou_3: 1.5472 (1.9718) loss_ce_4: 0.6684 (0.7518) loss_bbox_4: 1.0742 (1.7012) loss_giou_4: 1.4883 (2.0199) loss_ce_unscaled: 0.6661 (0.7493) class_error_unscaled: 100.0000 (99.4911) loss_bbox_unscaled: 0.2267 (0.3221) loss_giou_unscaled: 0.7985 (0.9706) cardinality_error_unscaled: 2.7500 (4.3664) loss_ce_0_unscaled: 0.6654 (0.7415) loss_bbox_0_unscaled: 0.2189 (0.3313) loss_giou_0_unscaled: 0.7994 (0.9936) cardinality_error_0_unscaled: 2.7500 (3.6412) loss_ce_1_unscaled: 0.6688 (0.7446) loss_bbox_1_unscaled: 0.2044 (0.3240) loss_giou_1_unscaled: 0.7261 (0.9771) cardinality_error_1_unscaled: 2.7500 (3.7462) loss_ce_2_unscaled: 0.6613 (0.7427) loss_bbox_2_unscaled: 0.2058 (0.3395) loss_giou_2_unscaled: 0.7535 (1.0075) cardinality_error_2_unscaled: 2.7500 (4.3034) loss_ce_3_unscaled: 0.6708 (0.7522) loss_bbox_3_unscaled: 0.2088 (0.3290) loss_giou_3_unscaled: 0.7736 (0.9859) cardinality_error_3_unscaled: 2.7500 (4.3664) loss_ce_4_unscaled: 0.6684 (0.7518) loss_bbox_4_unscaled: 0.2148 (0.3402) loss_giou_4_unscaled: 0.7441 (1.0100) cardinality_error_4_unscaled: 2.7500 (4.3645) time: 0.3296 data: 0.0146 max mem: 5637
Epoch: [0] [140/225] eta: 0:00:28 lr: 0.001000 class_error: 100.00 loss: 19.8021 (25.8832) loss_ce: 0.6661 (0.7441) loss_bbox: 1.1773 (1.5948) loss_giou: 1.6169 (1.9355) loss_ce_0: 0.6654 (0.7385) loss_bbox_0: 1.0968 (1.6210) loss_giou_0: 1.6052 (1.9583) loss_ce_1: 0.6688 (0.7396) loss_bbox_1: 1.0471 (1.5973) loss_giou_1: 1.5270 (1.9388) loss_ce_2: 0.6613 (0.7373) loss_bbox_2: 1.0705 (1.6604) loss_giou_2: 1.5409 (1.9847) loss_ce_3: 0.6688 (0.7472) loss_bbox_3: 0.9242 (1.5927) loss_giou_3: 1.4197 (1.9273) loss_ce_4: 0.6690 (0.7476) loss_bbox_4: 0.9728 (1.6452) loss_giou_4: 1.4794 (1.9728) loss_ce_unscaled: 0.6661 (0.7441) class_error_unscaled: 100.0000 (99.5272) loss_bbox_unscaled: 0.2355 (0.3190) loss_giou_unscaled: 0.8085 (0.9678) cardinality_error_unscaled: 2.7500 (4.2766) loss_ce_0_unscaled: 0.6654 (0.7385) loss_bbox_0_unscaled: 0.2194 (0.3242) loss_giou_0_unscaled: 0.8026 (0.9792) cardinality_error_0_unscaled: 2.7500 (3.6028) loss_ce_1_unscaled: 0.6688 (0.7396) loss_bbox_1_unscaled: 0.2094 (0.3195) loss_giou_1_unscaled: 0.7635 (0.9694) cardinality_error_1_unscaled: 2.7500 (3.7004) loss_ce_2_unscaled: 0.6613 (0.7373) loss_bbox_2_unscaled: 0.2141 (0.3321) loss_giou_2_unscaled: 0.7705 (0.9923) cardinality_error_2_unscaled: 2.7500 (4.2181) loss_ce_3_unscaled: 0.6688 (0.7472) loss_bbox_3_unscaled: 0.1848 (0.3185) loss_giou_3_unscaled: 0.7098 (0.9637) cardinality_error_3_unscaled: 2.7500 (4.2766) loss_ce_4_unscaled: 0.6690 (0.7476) loss_bbox_4_unscaled: 0.1946 (0.3290) loss_giou_4_unscaled: 0.7397 (0.9864) cardinality_error_4_unscaled: 2.7500 (4.2748) time: 0.3280 data: 0.0146 max mem: 5637
Epoch: [0] [150/225] eta: 0:00:25 lr: 0.001000 class_error: 100.00 loss: 19.4377 (25.4237) loss_ce: 0.6484 (0.7362) loss_bbox: 1.1630 (1.5634) loss_giou: 1.6169 (1.9105) loss_ce_0: 0.6593 (0.7312) loss_bbox_0: 1.0526 (1.5859) loss_giou_0: 1.5327 (1.9289) loss_ce_1: 0.6352 (0.7314) loss_bbox_1: 1.0940 (1.5660) loss_giou_1: 1.5832 (1.9111) loss_ce_2: 0.6494 (0.7303) loss_bbox_2: 1.0556 (1.6159) loss_giou_2: 1.4842 (1.9477) loss_ce_3: 0.6339 (0.7397) loss_bbox_3: 0.8654 (1.5534) loss_giou_3: 1.3514 (1.8963) loss_ce_4: 0.6644 (0.7394) loss_bbox_4: 0.9673 (1.6013) loss_giou_4: 1.3806 (1.9351) loss_ce_unscaled: 0.6484 (0.7362) class_error_unscaled: 100.0000 (99.5585) loss_bbox_unscaled: 0.2326 (0.3127) loss_giou_unscaled: 0.8085 (0.9552) cardinality_error_unscaled: 2.7500 (4.1556) loss_ce_0_unscaled: 0.6593 (0.7312) loss_bbox_0_unscaled: 0.2105 (0.3172) loss_giou_0_unscaled: 0.7664 (0.9644) cardinality_error_0_unscaled: 2.7500 (3.5265) loss_ce_1_unscaled: 0.6352 (0.7314) loss_bbox_1_unscaled: 0.2188 (0.3132) loss_giou_1_unscaled: 0.7916 (0.9556) cardinality_error_1_unscaled: 2.7500 (3.6175) loss_ce_2_unscaled: 0.6494 (0.7303) loss_bbox_2_unscaled: 0.2111 (0.3232) loss_giou_2_unscaled: 0.7421 (0.9738) cardinality_error_2_unscaled: 2.7500 (4.1010) loss_ce_3_unscaled: 0.6339 (0.7397) loss_bbox_3_unscaled: 0.1731 (0.3107) loss_giou_3_unscaled: 0.6757 (0.9481) cardinality_error_3_unscaled: 2.7500 (4.1556) loss_ce_4_unscaled: 0.6644 (0.7394) loss_bbox_4_unscaled: 0.1935 (0.3203) loss_giou_4_unscaled: 0.6903 (0.9675) cardinality_error_4_unscaled: 2.7500 (4.1540) time: 0.3254 data: 0.0145 max mem: 5637
Epoch: [0] [160/225] eta: 0:00:21 lr: 0.001000 class_error: 100.00 loss: 18.5955 (24.9967) loss_ce: 0.6175 (0.7305) loss_bbox: 1.0154 (1.5311) loss_giou: 1.4537 (1.8840) loss_ce_0: 0.6441 (0.7263) loss_bbox_0: 1.0167 (1.5470) loss_giou_0: 1.4343 (1.8967) loss_ce_1: 0.6136 (0.7256) loss_bbox_1: 0.9467 (1.5288) loss_giou_1: 1.4095 (1.8783) loss_ce_2: 0.6178 (0.7251) loss_bbox_2: 0.9770 (1.5762) loss_giou_2: 1.4386 (1.9168) loss_ce_3: 0.6290 (0.7335) loss_bbox_3: 0.9467 (1.5208) loss_giou_3: 1.4775 (1.8709) loss_ce_4: 0.6210 (0.7343) loss_bbox_4: 0.9768 (1.5656) loss_giou_4: 1.3806 (1.9053) loss_ce_unscaled: 0.6175 (0.7305) class_error_unscaled: 100.0000 (99.5859) loss_bbox_unscaled: 0.2031 (0.3062) loss_giou_unscaled: 0.7269 (0.9420) cardinality_error_unscaled: 2.5000 (4.0590) loss_ce_0_unscaled: 0.6441 (0.7263) loss_bbox_0_unscaled: 0.2033 (0.3094) loss_giou_0_unscaled: 0.7172 (0.9484) cardinality_error_0_unscaled: 2.5000 (3.4689) loss_ce_1_unscaled: 0.6136 (0.7256) loss_bbox_1_unscaled: 0.1893 (0.3058) loss_giou_1_unscaled: 0.7048 (0.9391) cardinality_error_1_unscaled: 2.5000 (3.5543) loss_ce_2_unscaled: 0.6178 (0.7251) loss_bbox_2_unscaled: 0.1954 (0.3152) loss_giou_2_unscaled: 0.7193 (0.9584) cardinality_error_2_unscaled: 2.5000 (4.0078) loss_ce_3_unscaled: 0.6290 (0.7335) loss_bbox_3_unscaled: 0.1893 (0.3042) loss_giou_3_unscaled: 0.7387 (0.9354) cardinality_error_3_unscaled: 2.5000 (4.0590) loss_ce_4_unscaled: 0.6210 (0.7343) loss_bbox_4_unscaled: 0.1954 (0.3131) loss_giou_4_unscaled: 0.6903 (0.9526) cardinality_error_4_unscaled: 2.5000 (4.0575) time: 0.3258 data: 0.0145 max mem: 5637
Epoch: [0] [170/225] eta: 0:00:18 lr: 0.001000 class_error: 100.00 loss: 18.7155 (24.6388) loss_ce: 0.6814 (0.7295) loss_bbox: 0.9805 (1.4954) loss_giou: 1.4094 (1.8550) loss_ce_0: 0.6808 (0.7260) loss_bbox_0: 0.9178 (1.5164) loss_giou_0: 1.4553 (1.8778) loss_ce_1: 0.6558 (0.7249) loss_bbox_1: 0.9360 (1.4940) loss_giou_1: 1.3779 (1.8533) loss_ce_2: 0.6761 (0.7245) loss_bbox_2: 0.9238 (1.5357) loss_giou_2: 1.3853 (1.8836) loss_ce_3: 0.6566 (0.7324) loss_bbox_3: 0.9437 (1.4854) loss_giou_3: 1.4492 (1.8438) loss_ce_4: 0.6907 (0.7334) loss_bbox_4: 0.9971 (1.5383) loss_giou_4: 1.5085 (1.8894) loss_ce_unscaled: 0.6814 (0.7295) class_error_unscaled: 100.0000 (99.6101) loss_bbox_unscaled: 0.1961 (0.2991) loss_giou_unscaled: 0.7047 (0.9275) cardinality_error_unscaled: 2.7500 (4.0073) loss_ce_0_unscaled: 0.6808 (0.7260) loss_bbox_0_unscaled: 0.1836 (0.3033) loss_giou_0_unscaled: 0.7277 (0.9389) cardinality_error_0_unscaled: 2.7500 (3.4488) loss_ce_1_unscaled: 0.6558 (0.7249) loss_bbox_1_unscaled: 0.1872 (0.2988) loss_giou_1_unscaled: 0.6889 (0.9267) cardinality_error_1_unscaled: 2.7500 (3.5322) loss_ce_2_unscaled: 0.6761 (0.7245) loss_bbox_2_unscaled: 0.1848 (0.3071) loss_giou_2_unscaled: 0.6927 (0.9418) cardinality_error_2_unscaled: 2.7500 (3.9576) loss_ce_3_unscaled: 0.6566 (0.7324) loss_bbox_3_unscaled: 0.1887 (0.2971) loss_giou_3_unscaled: 0.7246 (0.9219) cardinality_error_3_unscaled: 2.7500 (4.0073) loss_ce_4_unscaled: 0.6907 (0.7334) loss_bbox_4_unscaled: 0.1994 (0.3077) loss_giou_4_unscaled: 0.7542 (0.9447) cardinality_error_4_unscaled: 2.7500 (4.0015) time: 0.3224 data: 0.0144 max mem: 5637
Epoch: [0] [180/225] eta: 0:00:14 lr: 0.001000 class_error: 100.00 loss: 18.4054 (24.3400) loss_ce: 0.7040 (0.7282) loss_bbox: 0.9830 (1.4731) loss_giou: 1.3940 (1.8341) loss_ce_0: 0.7026 (0.7243) loss_bbox_0: 1.0940 (1.4929) loss_giou_0: 1.4768 (1.8555) loss_ce_1: 0.7101 (0.7233) loss_bbox_1: 0.9232 (1.4649) loss_giou_1: 1.4000 (1.8267) loss_ce_2: 0.7053 (0.7232) loss_bbox_2: 0.9237 (1.5075) loss_giou_2: 1.3560 (1.8598) loss_ce_3: 0.7139 (0.7305) loss_bbox_3: 0.9044 (1.4594) loss_giou_3: 1.3482 (1.8174) loss_ce_4: 0.7048 (0.7320) loss_bbox_4: 1.0585 (1.5171) loss_giou_4: 1.5085 (1.8700) loss_ce_unscaled: 0.7040 (0.7282) class_error_unscaled: 100.0000 (99.6317) loss_bbox_unscaled: 0.1966 (0.2946) loss_giou_unscaled: 0.6970 (0.9170) cardinality_error_unscaled: 3.2500 (3.9599) loss_ce_0_unscaled: 0.7026 (0.7243) loss_bbox_0_unscaled: 0.2188 (0.2986) loss_giou_0_unscaled: 0.7384 (0.9278) cardinality_error_0_unscaled: 3.2500 (3.4309) loss_ce_1_unscaled: 0.7101 (0.7233) loss_bbox_1_unscaled: 0.1846 (0.2930) loss_giou_1_unscaled: 0.7000 (0.9134) cardinality_error_1_unscaled: 3.2500 (3.5110) loss_ce_2_unscaled: 0.7053 (0.7232) loss_bbox_2_unscaled: 0.1847 (0.3015) loss_giou_2_unscaled: 0.6780 (0.9299) cardinality_error_2_unscaled: 3.2500 (3.9130) loss_ce_3_unscaled: 0.7139 (0.7305) loss_bbox_3_unscaled: 0.1809 (0.2919) loss_giou_3_unscaled: 0.6741 (0.9087) cardinality_error_3_unscaled: 3.2500 (3.9586) loss_ce_4_unscaled: 0.7048 (0.7320) loss_bbox_4_unscaled: 0.2117 (0.3034) loss_giou_4_unscaled: 0.7542 (0.9350) cardinality_error_4_unscaled: 3.2500 (3.9517) time: 0.3212 data: 0.0148 max mem: 5637
Epoch: [0] [190/225] eta: 0:00:11 lr: 0.001000 class_error: 100.00 loss: 18.0909 (23.9890) loss_ce: 0.7036 (0.7262) loss_bbox: 0.9683 (1.4462) loss_giou: 1.4349 (1.8160) loss_ce_0: 0.7026 (0.7226) loss_bbox_0: 0.9071 (1.4586) loss_giou_0: 1.3566 (1.8274) loss_ce_1: 0.6941 (0.7210) loss_bbox_1: 0.8661 (1.4328) loss_giou_1: 1.3329 (1.8017) loss_ce_2: 0.7035 (0.7214) loss_bbox_2: 0.8918 (1.4733) loss_giou_2: 1.3383 (1.8350) loss_ce_3: 0.6843 (0.7281) loss_bbox_3: 0.8703 (1.4305) loss_giou_3: 1.3343 (1.7969) loss_ce_4: 0.7020 (0.7305) loss_bbox_4: 0.9180 (1.4799) loss_giou_4: 1.3109 (1.8409) loss_ce_unscaled: 0.7036 (0.7262) class_error_unscaled: 100.0000 (99.6510) loss_bbox_unscaled: 0.1937 (0.2892) loss_giou_unscaled: 0.7174 (0.9080) cardinality_error_unscaled: 3.2500 (3.9110) loss_ce_0_unscaled: 0.7026 (0.7226) loss_bbox_0_unscaled: 0.1814 (0.2917) loss_giou_0_unscaled: 0.6783 (0.9137) cardinality_error_0_unscaled: 3.2500 (3.4097) loss_ce_1_unscaled: 0.6941 (0.7210) loss_bbox_1_unscaled: 0.1732 (0.2866) loss_giou_1_unscaled: 0.6665 (0.9009) cardinality_error_1_unscaled: 3.2500 (3.4856) loss_ce_2_unscaled: 0.7035 (0.7214) loss_bbox_2_unscaled: 0.1784 (0.2947) loss_giou_2_unscaled: 0.6692 (0.9175) cardinality_error_2_unscaled: 3.2500 (3.8665) loss_ce_3_unscaled: 0.6843 (0.7281) loss_bbox_3_unscaled: 0.1741 (0.2861) loss_giou_3_unscaled: 0.6672 (0.8984) cardinality_error_3_unscaled: 3.2500 (3.9097) loss_ce_4_unscaled: 0.7020 (0.7305) loss_bbox_4_unscaled: 0.1836 (0.2960) loss_giou_4_unscaled: 0.6555 (0.9204) cardinality_error_4_unscaled: 3.2500 (3.9031) time: 0.3258 data: 0.0149 max mem: 5640
Epoch: [0] [200/225] eta: 0:00:08 lr: 0.001000 class_error: 100.00 loss: 17.8010 (23.7049) loss_ce: 0.6675 (0.7230) loss_bbox: 0.9316 (1.4328) loss_giou: 1.4737 (1.8085) loss_ce_0: 0.6540 (0.7194) loss_bbox_0: 0.8879 (1.4402) loss_giou_0: 1.3566 (1.8152) loss_ce_1: 0.6669 (0.7185) loss_bbox_1: 0.8515 (1.4078) loss_giou_1: 1.3605 (1.7799) loss_ce_2: 0.6703 (0.7191) loss_bbox_2: 0.8717 (1.4446) loss_giou_2: 1.3383 (1.8108) loss_ce_3: 0.6720 (0.7254) loss_bbox_3: 0.8708 (1.4028) loss_giou_3: 1.3336 (1.7713) loss_ce_4: 0.6676 (0.7272) loss_bbox_4: 0.7845 (1.4471) loss_giou_4: 1.2801 (1.8113) loss_ce_unscaled: 0.6675 (0.7230) class_error_unscaled: 100.0000 (99.6683) loss_bbox_unscaled: 0.1863 (0.2866) loss_giou_unscaled: 0.7369 (0.9042) cardinality_error_unscaled: 2.7500 (3.8507) loss_ce_0_unscaled: 0.6540 (0.7194) loss_bbox_0_unscaled: 0.1776 (0.2880) loss_giou_0_unscaled: 0.6783 (0.9076) cardinality_error_0_unscaled: 2.7500 (3.3769) loss_ce_1_unscaled: 0.6669 (0.7185) loss_bbox_1_unscaled: 0.1703 (0.2816) loss_giou_1_unscaled: 0.6803 (0.8900) cardinality_error_1_unscaled: 2.7500 (3.4490) loss_ce_2_unscaled: 0.6703 (0.7191) loss_bbox_2_unscaled: 0.1743 (0.2889) loss_giou_2_unscaled: 0.6692 (0.9054) cardinality_error_2_unscaled: 2.7500 (3.8109) loss_ce_3_unscaled: 0.6720 (0.7254) loss_bbox_3_unscaled: 0.1742 (0.2806) loss_giou_3_unscaled: 0.6668 (0.8857) cardinality_error_3_unscaled: 2.7500 (3.8520) loss_ce_4_unscaled: 0.6676 (0.7272) loss_bbox_4_unscaled: 0.1569 (0.2894) loss_giou_4_unscaled: 0.6400 (0.9056) cardinality_error_4_unscaled: 2.7500 (3.8458) time: 0.3301 data: 0.0145 max mem: 5640
Epoch: [0] [210/225] eta: 0:00:04 lr: 0.001000 class_error: 100.00 loss: 18.0540 (23.4763) loss_ce: 0.6480 (0.7208) loss_bbox: 1.0048 (1.4134) loss_giou: 1.5002 (1.7948) loss_ce_0: 0.6452 (0.7170) loss_bbox_0: 1.0194 (1.4193) loss_giou_0: 1.5324 (1.8009) loss_ce_1: 0.6662 (0.7162) loss_bbox_1: 0.9526 (1.3892) loss_giou_1: 1.4151 (1.7668) loss_ce_2: 0.6622 (0.7172) loss_bbox_2: 0.9372 (1.4226) loss_giou_2: 1.3879 (1.7945) loss_ce_3: 0.6688 (0.7231) loss_bbox_3: 0.8915 (1.3805) loss_giou_3: 1.3420 (1.7544) loss_ce_4: 0.6564 (0.7250) loss_bbox_4: 0.8851 (1.4247) loss_giou_4: 1.3233 (1.7959) loss_ce_unscaled: 0.6480 (0.7208) class_error_unscaled: 100.0000 (99.6840) loss_bbox_unscaled: 0.2010 (0.2827) loss_giou_unscaled: 0.7501 (0.8974) cardinality_error_unscaled: 2.5000 (3.8021) loss_ce_0_unscaled: 0.6452 (0.7170) loss_bbox_0_unscaled: 0.2039 (0.2839) loss_giou_0_unscaled: 0.7662 (0.9004) cardinality_error_0_unscaled: 2.5000 (3.3507) loss_ce_1_unscaled: 0.6662 (0.7162) loss_bbox_1_unscaled: 0.1905 (0.2778) loss_giou_1_unscaled: 0.7075 (0.8834) cardinality_error_1_unscaled: 2.5000 (3.4194) loss_ce_2_unscaled: 0.6622 (0.7172) loss_bbox_2_unscaled: 0.1874 (0.2845) loss_giou_2_unscaled: 0.6940 (0.8972) cardinality_error_2_unscaled: 2.5000 (3.7642) loss_ce_3_unscaled: 0.6688 (0.7231) loss_bbox_3_unscaled: 0.1783 (0.2761) loss_giou_3_unscaled: 0.6710 (0.8772) cardinality_error_3_unscaled: 2.5000 (3.8033) loss_ce_4_unscaled: 0.6564 (0.7250) loss_bbox_4_unscaled: 0.1770 (0.2849) loss_giou_4_unscaled: 0.6616 (0.8979) cardinality_error_4_unscaled: 2.5000 (3.7974) time: 0.3318 data: 0.0148 max mem: 5640
Epoch: [0] [220/225] eta: 0:00:01 lr: 0.001000 class_error: 100.00 loss: 18.4685 (23.2550) loss_ce: 0.7104 (0.7212) loss_bbox: 1.0054 (1.3937) loss_giou: 1.4747 (1.7770) loss_ce_0: 0.7039 (0.7182) loss_bbox_0: 0.8646 (1.3952) loss_giou_0: 1.3435 (1.7785) loss_ce_1: 0.7096 (0.7172) loss_bbox_1: 0.9061 (1.3666) loss_giou_1: 1.4124 (1.7470) loss_ce_2: 0.7033 (0.7182) loss_bbox_2: 1.0047 (1.4031) loss_giou_2: 1.3926 (1.7766) loss_ce_3: 0.7101 (0.7243) loss_bbox_3: 0.8724 (1.3616) loss_giou_3: 1.3556 (1.7367) loss_ce_4: 0.7159 (0.7252) loss_bbox_4: 0.9905 (1.4106) loss_giou_4: 1.4590 (1.7839) loss_ce_unscaled: 0.7104 (0.7212) class_error_unscaled: 100.0000 (99.6983) loss_bbox_unscaled: 0.2011 (0.2787) loss_giou_unscaled: 0.7374 (0.8885) cardinality_error_unscaled: 3.2500 (3.7896) loss_ce_0_unscaled: 0.7039 (0.7182) loss_bbox_0_unscaled: 0.1729 (0.2790) loss_giou_0_unscaled: 0.6718 (0.8893) cardinality_error_0_unscaled: 3.0000 (3.3575) loss_ce_1_unscaled: 0.7096 (0.7172) loss_bbox_1_unscaled: 0.1812 (0.2733) loss_giou_1_unscaled: 0.7062 (0.8735) cardinality_error_1_unscaled: 3.2500 (3.4231) loss_ce_2_unscaled: 0.7033 (0.7182) loss_bbox_2_unscaled: 0.2009 (0.2806) loss_giou_2_unscaled: 0.6963 (0.8883) cardinality_error_2_unscaled: 3.2500 (3.7534) loss_ce_3_unscaled: 0.7101 (0.7243) loss_bbox_3_unscaled: 0.1745 (0.2723) loss_giou_3_unscaled: 0.6778 (0.8684) cardinality_error_3_unscaled: 3.2500 (3.7919) loss_ce_4_unscaled: 0.7159 (0.7252) loss_bbox_4_unscaled: 0.1981 (0.2821) loss_giou_4_unscaled: 0.7295 (0.8919) cardinality_error_4_unscaled: 3.2500 (3.7862) time: 0.3244 data: 0.0147 max mem: 5640
Epoch: [0] [224/225] eta: 0:00:00 lr: 0.001000 class_error: 100.00 loss: 18.4685 (23.1879) loss_ce: 0.6992 (0.7193) loss_bbox: 1.0062 (1.3909) loss_giou: 1.4650 (1.7726) loss_ce_0: 0.7027 (0.7163) loss_bbox_0: 0.8961 (1.3926) loss_giou_0: 1.3753 (1.7748) loss_ce_1: 0.7094 (0.7157) loss_bbox_1: 0.9046 (1.3604) loss_giou_1: 1.3026 (1.7393) loss_ce_2: 0.6922 (0.7161) loss_bbox_2: 1.0614 (1.4012) loss_giou_2: 1.3909 (1.7740) loss_ce_3: 0.7083 (0.7224) loss_bbox_3: 0.9956 (1.3571) loss_giou_3: 1.3279 (1.7304) loss_ce_4: 0.7025 (0.7232) loss_bbox_4: 1.0279 (1.4045) loss_giou_4: 1.4262 (1.7770) loss_ce_unscaled: 0.6992 (0.7193) class_error_unscaled: 100.0000 (99.7037) loss_bbox_unscaled: 0.2012 (0.2782) loss_giou_unscaled: 0.7325 (0.8863) cardinality_error_unscaled: 3.0000 (3.7622) loss_ce_0_unscaled: 0.7027 (0.7163) loss_bbox_0_unscaled: 0.1792 (0.2785) loss_giou_0_unscaled: 0.6876 (0.8874) cardinality_error_0_unscaled: 3.0000 (3.3389) loss_ce_1_unscaled: 0.7094 (0.7157) loss_bbox_1_unscaled: 0.1809 (0.2721) loss_giou_1_unscaled: 0.6513 (0.8697) cardinality_error_1_unscaled: 3.0000 (3.4033) loss_ce_2_unscaled: 0.6922 (0.7161) loss_bbox_2_unscaled: 0.2123 (0.2802) loss_giou_2_unscaled: 0.6955 (0.8870) cardinality_error_2_unscaled: 3.0000 (3.7278) loss_ce_3_unscaled: 0.7083 (0.7224) loss_bbox_3_unscaled: 0.1991 (0.2714) loss_giou_3_unscaled: 0.6639 (0.8652) cardinality_error_3_unscaled: 3.0000 (3.7656) loss_ce_4_unscaled: 0.7025 (0.7232) loss_bbox_4_unscaled: 0.2056 (0.2809) loss_giou_4_unscaled: 0.7131 (0.8885) cardinality_error_4_unscaled: 3.0000 (3.7600) time: 0.3175 data: 0.0142 max mem: 5640
Epoch: [0] Total time: 0:01:14 (0.3314 s / it)
Averaged stats: lr: 0.001000 class_error: 100.00 loss: 18.4685 (23.1879) loss_ce: 0.6992 (0.7193) loss_bbox: 1.0062 (1.3909) loss_giou: 1.4650 (1.7726) loss_ce_0: 0.7027 (0.7163) loss_bbox_0: 0.8961 (1.3926) loss_giou_0: 1.3753 (1.7748) loss_ce_1: 0.7094 (0.7157) loss_bbox_1: 0.9046 (1.3604) loss_giou_1: 1.3026 (1.7393) loss_ce_2: 0.6922 (0.7161) loss_bbox_2: 1.0614 (1.4012) loss_giou_2: 1.3909 (1.7740) loss_ce_3: 0.7083 (0.7224) loss_bbox_3: 0.9956 (1.3571) loss_giou_3: 1.3279 (1.7304) loss_ce_4: 0.7025 (0.7232) loss_bbox_4: 1.0279 (1.4045) loss_giou_4: 1.4262 (1.7770) loss_ce_unscaled: 0.6992 (0.7193) class_error_unscaled: 100.0000 (99.7037) loss_bbox_unscaled: 0.2012 (0.2782) loss_giou_unscaled: 0.7325 (0.8863) cardinality_error_unscaled: 3.0000 (3.7622) loss_ce_0_unscaled: 0.7027 (0.7163) loss_bbox_0_unscaled: 0.1792 (0.2785) loss_giou_0_unscaled: 0.6876 (0.8874) cardinality_error_0_unscaled: 3.0000 (3.3389) loss_ce_1_unscaled: 0.7094 (0.7157) loss_bbox_1_unscaled: 0.1809 (0.2721) loss_giou_1_unscaled: 0.6513 (0.8697) cardinality_error_1_unscaled: 3.0000 (3.4033) loss_ce_2_unscaled: 0.6922 (0.7161) loss_bbox_2_unscaled: 0.2123 (0.2802) loss_giou_2_unscaled: 0.6955 (0.8870) cardinality_error_2_unscaled: 3.0000 (3.7278) loss_ce_3_unscaled: 0.7083 (0.7224) loss_bbox_3_unscaled: 0.1991 (0.2714) loss_giou_3_unscaled: 0.6639 (0.8652) cardinality_error_3_unscaled: 3.0000 (3.7656) loss_ce_4_unscaled: 0.7025 (0.7232) loss_bbox_4_unscaled: 0.2056 (0.2809) loss_giou_4_unscaled: 0.7131 (0.8885) cardinality_error_4_unscaled: 3.0000 (3.7600)
Test: [ 0/25] eta: 0:00:11 class_error: 100.00 loss: 39.7441 (39.7441) loss_ce: 0.8244 (0.8244) loss_bbox: 2.4085 (2.4085) loss_giou: 2.7545 (2.7545) loss_ce_0: 0.8306 (0.8306) loss_bbox_0: 3.6152 (3.6152) loss_giou_0: 3.1071 (3.1071) loss_ce_1: 0.8320 (0.8320) loss_bbox_1: 2.8095 (2.8095) loss_giou_1: 3.0116 (3.0116) loss_ce_2: 0.8365 (0.8365) loss_bbox_2: 3.6158 (3.6158) loss_giou_2: 3.1490 (3.1490) loss_ce_3: 0.8286 (0.8286) loss_bbox_3: 2.3526 (2.3526) loss_giou_3: 2.6735 (2.6735) loss_ce_4: 0.8391 (0.8391) loss_bbox_4: 2.5136 (2.5136) loss_giou_4: 2.7420 (2.7420) loss_ce_unscaled: 0.8244 (0.8244) class_error_unscaled: 100.0000 (100.0000) loss_bbox_unscaled: 0.4817 (0.4817) loss_giou_unscaled: 1.3773 (1.3773) cardinality_error_unscaled: 4.0000 (4.0000) loss_ce_0_unscaled: 0.8306 (0.8306) loss_bbox_0_unscaled: 0.7230 (0.7230) loss_giou_0_unscaled: 1.5536 (1.5536) cardinality_error_0_unscaled: 4.0000 (4.0000) loss_ce_1_unscaled: 0.8320 (0.8320) loss_bbox_1_unscaled: 0.5619 (0.5619) loss_giou_1_unscaled: 1.5058 (1.5058) cardinality_error_1_unscaled: 4.0000 (4.0000) loss_ce_2_unscaled: 0.8365 (0.8365) loss_bbox_2_unscaled: 0.7232 (0.7232) loss_giou_2_unscaled: 1.5745 (1.5745) cardinality_error_2_unscaled: 4.0000 (4.0000) loss_ce_3_unscaled: 0.8286 (0.8286) loss_bbox_3_unscaled: 0.4705 (0.4705) loss_giou_3_unscaled: 1.3368 (1.3368) cardinality_error_3_unscaled: 4.0000 (4.0000) loss_ce_4_unscaled: 0.8391 (0.8391) loss_bbox_4_unscaled: 0.5027 (0.5027) loss_giou_4_unscaled: 1.3710 (1.3710) cardinality_error_4_unscaled: 4.0000 (4.0000) time: 0.4770 data: 0.2986 max mem: 5640
Test: [10/25] eta: 0:00:03 class_error: 100.00 loss: 41.7285 (42.7447) loss_ce: 0.7112 (0.7057) loss_bbox: 2.8246 (2.9384) loss_giou: 2.9177 (2.9712) loss_ce_0: 0.7185 (0.7087) loss_bbox_0: 3.9275 (4.0503) loss_giou_0: 3.3074 (3.3029) loss_ce_1: 0.7018 (0.7094) loss_bbox_1: 3.0020 (3.0818) loss_giou_1: 3.0118 (3.0638) loss_ce_2: 0.7163 (0.7117) loss_bbox_2: 3.7587 (3.9167) loss_giou_2: 3.3040 (3.2931) loss_ce_3: 0.7120 (0.7075) loss_bbox_3: 2.7522 (2.8755) loss_giou_3: 2.8856 (2.9507) loss_ce_4: 0.7222 (0.7131) loss_bbox_4: 2.9204 (3.0646) loss_giou_4: 2.9303 (2.9795) loss_ce_unscaled: 0.7112 (0.7057) class_error_unscaled: 100.0000 (100.0000) loss_bbox_unscaled: 0.5649 (0.5877) loss_giou_unscaled: 1.4588 (1.4856) cardinality_error_unscaled: 3.0000 (3.0682) loss_ce_0_unscaled: 0.7185 (0.7087) loss_bbox_0_unscaled: 0.7855 (0.8101) loss_giou_0_unscaled: 1.6537 (1.6514) cardinality_error_0_unscaled: 3.0000 (3.0682) loss_ce_1_unscaled: 0.7018 (0.7094) loss_bbox_1_unscaled: 0.6004 (0.6164) loss_giou_1_unscaled: 1.5059 (1.5319) cardinality_error_1_unscaled: 3.0000 (3.0682) loss_ce_2_unscaled: 0.7163 (0.7117) loss_bbox_2_unscaled: 0.7517 (0.7833) loss_giou_2_unscaled: 1.6520 (1.6465) cardinality_error_2_unscaled: 3.0000 (3.0682) loss_ce_3_unscaled: 0.7120 (0.7075) loss_bbox_3_unscaled: 0.5504 (0.5751) loss_giou_3_unscaled: 1.4428 (1.4753) cardinality_error_3_unscaled: 3.0000 (3.0682) loss_ce_4_unscaled: 0.7222 (0.7131) loss_bbox_4_unscaled: 0.5841 (0.6129) loss_giou_4_unscaled: 1.4651 (1.4898) cardinality_error_4_unscaled: 3.0000 (3.0682) time: 0.2017 data: 0.0414 max mem: 5640
Test: [20/25] eta: 0:00:00 class_error: 100.00 loss: 41.2935 (41.5291) loss_ce: 0.6903 (0.7185) loss_bbox: 2.7155 (2.8024) loss_giou: 2.9177 (2.9212) loss_ce_0: 0.6937 (0.7217) loss_bbox_0: 3.8261 (3.8550) loss_giou_0: 3.2795 (3.2636) loss_ce_1: 0.7012 (0.7231) loss_bbox_1: 2.8832 (2.9342) loss_giou_1: 2.9980 (2.9970) loss_ce_2: 0.6953 (0.7253) loss_bbox_2: 3.6141 (3.7033) loss_giou_2: 3.2122 (3.2223) loss_ce_3: 0.6916 (0.7206) loss_bbox_3: 2.6811 (2.7424) loss_giou_3: 2.8825 (2.9045) loss_ce_4: 0.6974 (0.7266) loss_bbox_4: 2.8724 (2.9167) loss_giou_4: 2.9303 (2.9306) loss_ce_unscaled: 0.6903 (0.7185) class_error_unscaled: 100.0000 (100.0000) loss_bbox_unscaled: 0.5431 (0.5605) loss_giou_unscaled: 1.4588 (1.4606) cardinality_error_unscaled: 3.0000 (3.1548) loss_ce_0_unscaled: 0.6937 (0.7217) loss_bbox_0_unscaled: 0.7652 (0.7710) loss_giou_0_unscaled: 1.6397 (1.6318) cardinality_error_0_unscaled: 3.0000 (3.1548) loss_ce_1_unscaled: 0.7012 (0.7231) loss_bbox_1_unscaled: 0.5766 (0.5868) loss_giou_1_unscaled: 1.4990 (1.4985) cardinality_error_1_unscaled: 3.0000 (3.1548) loss_ce_2_unscaled: 0.6953 (0.7253) loss_bbox_2_unscaled: 0.7228 (0.7407) loss_giou_2_unscaled: 1.6061 (1.6111) cardinality_error_2_unscaled: 3.0000 (3.1548) loss_ce_3_unscaled: 0.6916 (0.7206) loss_bbox_3_unscaled: 0.5362 (0.5485) loss_giou_3_unscaled: 1.4412 (1.4522) cardinality_error_3_unscaled: 3.0000 (3.1548) loss_ce_4_unscaled: 0.6974 (0.7266) loss_bbox_4_unscaled: 0.5745 (0.5833) loss_giou_4_unscaled: 1.4651 (1.4653) cardinality_error_4_unscaled: 3.0000 (3.1548) time: 0.1793 data: 0.0158 max mem: 5640
Test: [24/25] eta: 0:00:00 class_error: 100.00 loss: 41.2935 (41.5760) loss_ce: 0.7405 (0.7177) loss_bbox: 2.8130 (2.8165) loss_giou: 2.9177 (2.9297) loss_ce_0: 0.7413 (0.7207) loss_bbox_0: 3.7636 (3.8434) loss_giou_0: 3.2680 (3.2679) loss_ce_1: 0.7550 (0.7228) loss_bbox_1: 2.8832 (2.9293) loss_giou_1: 3.0061 (3.0050) loss_ce_2: 0.7492 (0.7245) loss_bbox_2: 3.6117 (3.6901) loss_giou_2: 3.2321 (3.2339) loss_ce_3: 0.7438 (0.7199) loss_bbox_3: 2.6811 (2.7477) loss_giou_3: 2.8856 (2.9175) loss_ce_4: 0.7476 (0.7257) loss_bbox_4: 2.8724 (2.9254) loss_giou_4: 2.9195 (2.9384) loss_ce_unscaled: 0.7405 (0.7177) class_error_unscaled: 100.0000 (100.0000) loss_bbox_unscaled: 0.5626 (0.5633) loss_giou_unscaled: 1.4588 (1.4649) cardinality_error_unscaled: 3.0000 (3.1400) loss_ce_0_unscaled: 0.7413 (0.7207) loss_bbox_0_unscaled: 0.7527 (0.7687) loss_giou_0_unscaled: 1.6340 (1.6339) cardinality_error_0_unscaled: 3.0000 (3.1400) loss_ce_1_unscaled: 0.7550 (0.7228) loss_bbox_1_unscaled: 0.5766 (0.5859) loss_giou_1_unscaled: 1.5031 (1.5025) cardinality_error_1_unscaled: 3.0000 (3.1400) loss_ce_2_unscaled: 0.7492 (0.7245) loss_bbox_2_unscaled: 0.7223 (0.7380) loss_giou_2_unscaled: 1.6160 (1.6169) cardinality_error_2_unscaled: 3.0000 (3.1400) loss_ce_3_unscaled: 0.7438 (0.7199) loss_bbox_3_unscaled: 0.5362 (0.5495) loss_giou_3_unscaled: 1.4428 (1.4588) cardinality_error_3_unscaled: 3.0000 (3.1400) loss_ce_4_unscaled: 0.7476 (0.7257) loss_bbox_4_unscaled: 0.5745 (0.5851) loss_giou_4_unscaled: 1.4597 (1.4692) cardinality_error_4_unscaled: 3.0000 (3.1400) time: 0.1772 data: 0.0158 max mem: 5640
Test: Total time: 0:00:04 (0.1923 s / it)
Averaged stats: class_error: 100.00 loss: 41.2935 (41.5760) loss_ce: 0.7405 (0.7177) loss_bbox: 2.8130 (2.8165) loss_giou: 2.9177 (2.9297) loss_ce_0: 0.7413 (0.7207) loss_bbox_0: 3.7636 (3.8434) loss_giou_0: 3.2680 (3.2679) loss_ce_1: 0.7550 (0.7228) loss_bbox_1: 2.8832 (2.9293) loss_giou_1: 3.0061 (3.0050) loss_ce_2: 0.7492 (0.7245) loss_bbox_2: 3.6117 (3.6901) loss_giou_2: 3.2321 (3.2339) loss_ce_3: 0.7438 (0.7199) loss_bbox_3: 2.6811 (2.7477) loss_giou_3: 2.8856 (2.9175) loss_ce_4: 0.7476 (0.7257) loss_bbox_4: 2.8724 (2.9254) loss_giou_4: 2.9195 (2.9384) loss_ce_unscaled: 0.7405 (0.7177) class_error_unscaled: 100.0000 (100.0000) loss_bbox_unscaled: 0.5626 (0.5633) loss_giou_unscaled: 1.4588 (1.4649) cardinality_error_unscaled: 3.0000 (3.1400) loss_ce_0_unscaled: 0.7413 (0.7207) loss_bbox_0_unscaled: 0.7527 (0.7687) loss_giou_0_unscaled: 1.6340 (1.6339) cardinality_error_0_unscaled: 3.0000 (3.1400) loss_ce_1_unscaled: 0.7550 (0.7228) loss_bbox_1_unscaled: 0.5766 (0.5859) loss_giou_1_unscaled: 1.5031 (1.5025) cardinality_error_1_unscaled: 3.0000 (3.1400) loss_ce_2_unscaled: 0.7492 (0.7245) loss_bbox_2_unscaled: 0.7223 (0.7380) loss_giou_2_unscaled: 1.6160 (1.6169) cardinality_error_2_unscaled: 3.0000 (3.1400) loss_ce_3_unscaled: 0.7438 (0.7199) loss_bbox_3_unscaled: 0.5362 (0.5495) loss_giou_3_unscaled: 1.4428 (1.4588) cardinality_error_3_unscaled: 3.0000 (3.1400) loss_ce_4_unscaled: 0.7476 (0.7257) loss_bbox_4_unscaled: 0.5745 (0.5851) loss_giou_4_unscaled: 1.4597 (1.4692) cardinality_error_4_unscaled: 3.0000 (3.1400)
Accumulating evaluation results...
DONE (t=0.08s).
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = -1.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.002
Could you explain this to me, please?
Which parameters do I need to change?
Thank you.
Thank you so much for sharing the log for 150 epochs.
Can you share the log for 300 epochs as well?
class Joiner(nn.Sequential):
    def __init__(self, backbone, position_embedding):
        super().__init__(backbone, position_embedding)

    def forward(self, tensor_list):
        xs = self[0](tensor_list)
        out = []
        pos = []
        for name, x in xs.items():
            out.append(x)
            # position encoding
            pos.append(self[1](x).to(x.tensors.dtype))
        return out, pos
What is the meaning of self[0] and self[1] here?
Many thanks.
I am trying to train the resnet50 model with one more class on top of the coco dataset. So I loaded the pretrained model like this -
model = torch.hub.load('facebookresearch/detr', 'detr_resnet50', pretrained=True)
and then I am unfreezing class_embed and bbox_embed:
for param in model.parameters():
    param.requires_grad = False

classifier_class = nn.Sequential(
    nn.Linear(256, 128),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(128, 93),
    # nn.LogSoftmax(dim=1)
)
model.class_embed = classifier_class

classifier_bbox = nn.Sequential(
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(256, 256),
    nn.ReLU(),
    nn.Dropout(p=0.2),
    nn.Linear(256, 4),
    nn.Sigmoid()
)
And I am using build_model to get my criterion and postprocessors:
dummy, criterion, postprocessors = build_model(data_args)
Optimizer:
optimizer = torch.optim.Adam(
    [{'params': model.class_embed.parameters()},
     {'params': model.bbox_embed.parameters()}],
    lr=data_args.lr, weight_decay=data_args.weight_decay)
Now I am loading only the 'skyscraper' class using the data_loader.
Unfortunately I am getting this error:
RuntimeError: weight tensor should be defined either for all or no classes at /pytorch/aten/src/THCUNN/generic/SpatialClassNLLCriterion.cu:27
Here is the entire code:
https://colab.research.google.com/drive/1L3PLEiOVICgmjyK6JIDjEBFmraVEQYhz?usp=sharing
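For reference, here is a minimal sketch of why that error usually appears (this is my own reading, assuming the criterion is still built as DETR's SetCriterion): the criterion registers a class-weight buffer of size num_classes + 1, so the replaced classification head must output exactly that many logits. The values below are hypothetical, not this repo's code.

import torch
import torch.nn.functional as F

# SetCriterion keeps a weight buffer of num_classes + 1 entries
# (the last one corresponds to the "no object" class).
num_classes = 92                                  # assumed value passed to build_model
empty_weight = torch.ones(num_classes + 1)        # 93 entries

# The replaced class_embed must therefore produce 93 logits per query; otherwise
# F.cross_entropy raises "weight tensor should be defined either for all or no classes".
logits = torch.randn(2, 100, num_classes + 1)     # (batch, queries, classes)
targets = torch.randint(0, num_classes + 1, (2, 100))
loss = F.cross_entropy(logits.transpose(1, 2), targets, weight=empty_weight)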
How can parallel decoding be realized in the transformer decoder?
Very impressed with the all new innovative architecture in Detr!
Can you clarify recommendations for training on a custom dataset?
Should we build a model similar to the demo and train it from scratch, or is it better to fine-tune a full COCO-pretrained model and adjust the linear layer to the desired class count?
Thanks in advance for any input.
First of all, thanks for presenting a great paper. It's one of the most innovative papers I've read recently in computer vision and sure many works will follow.
I was interested in the mAP performance with nms in Fig.4.
Does stronger NMS (like nms=0.5) produce similar mAP performance curves?
Maybe the mAP gets worse, since more positive predictions will be deleted.
In EfficientDet there was an improvement from switching to Distance-IoU, and I suspect the same would hold for DETR with either Distance-IoU or Complete-IoU.
By incorporating DIoU and CIoU losses into state-of-the-art object detection algorithms, e.g., YOLO v3, SSD and Faster RCNN, we achieve notable performance gains in terms of not only IoU metric but also GIoU metric.
we consider three geometric factors, i.e., overlap area, normalized central point distance and aspect ratio, which are crucial for measuring bounding box regression in object detection and instance segmentation.
The three geometric factors are then incorporated into CIoU loss for better distinguishing difficult regression cases. The training of deep models using CIoU loss results in consistent AP and AR improvements in comparison to widely adopted ℓn-norm loss and IoU-based loss.
Here's the paper discussing CIoU:
https://arxiv.org/abs/2005.03572
and Distance IoU:
https://arxiv.org/abs/1911.08287
and most importantly code:
https://github.com/Zzh-tju/CIoU
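For anyone curious, here is a rough sketch of what the Distance-IoU term from those papers computes. This is not part of this repo: it pairs boxes element-wise in (x0, y0, x1, y1) format, unlike the matched-pair GIoU already used here, and is only meant to illustrate the formula 1 - IoU + d²/c².

import torch

def diou_loss(boxes1, boxes2):
    # intersection over union
    lt = torch.max(boxes1[:, :2], boxes2[:, :2])
    rb = torch.min(boxes1[:, 2:], boxes2[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area1 = (boxes1[:, 2] - boxes1[:, 0]) * (boxes1[:, 3] - boxes1[:, 1])
    area2 = (boxes2[:, 2] - boxes2[:, 0]) * (boxes2[:, 3] - boxes2[:, 1])
    iou = inter / (area1 + area2 - inter)

    # squared distance between box centers
    centers1 = (boxes1[:, :2] + boxes1[:, 2:]) / 2
    centers2 = (boxes2[:, :2] + boxes2[:, 2:]) / 2
    center_dist = ((centers1 - centers2) ** 2).sum(dim=1)

    # squared diagonal of the smallest enclosing box
    enc_lt = torch.min(boxes1[:, :2], boxes2[:, :2])
    enc_rb = torch.max(boxes1[:, 2:], boxes2[:, 2:])
    diag = ((enc_rb - enc_lt) ** 2).sum(dim=1)

    return 1 - iou + center_dist / diag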
Hi, just a question on speed: which GPU were the reported inference speeds measured on? A Tesla V100 or something less powerful? Thanks
Thanks for the amazing work!
I noticed the training time for DETR is 3 days on multiple GPUs. I believe this setup is too hard to reproduce for most end users.
I would like to know: did you try transfer learning with DETR in your study? If so, could you provide a related module for it?
tl;dr:
Thanks for the amazing work!
I'm very intrigued by the simplicity of DETR, especially the inference demo code. I was wondering how the demo model was trained, since you guys do provide pretrained weights for it. I'm asking this particularly because the inference code says that it only supports a batch size of 1. Does the batch size have to be 1 during training? Also, why does it have to be 1, either in training or inference?
Thank you so much for your time!
Describe what you want to do, including:
python -m torch.distributed.launch --nproc_per_node=1 --use_env main.py --num_queries 2000 --pre_norm --masks --output_dir output --eval --num_workers 4 --enc_layers 2 --dec_layers 2 --dim_feedforward 512 --backbone resnet18 --hidden_dim 128
Could you, please, help me run with resnet18? Any advice regarding optimal parameters to start for my task are appreciated!
Traceback:
File "main.py", line 248, in <module>
main(args)
File "main.py", line 186, in main
data_loader_val, base_ds, device, args.output_dir)
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 15, in decorate_context
return func(*args, **kwargs)
File "/home/user/Documents/repos/detr/engine.py", line 92, in evaluate
outputs = model(samples)
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 445, in forward
output = self.module(*inputs[0], **kwargs[0])
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/Documents/repos/detr/models/segmentation.py", line 57, in forward
seg_masks = self.mask_head(src_proj, bbox_mask, [features[2].tensors, features[1].tensors, features[0].tensors])
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/Documents/repos/detr/models/segmentation.py", line 110, in forward
cur_fpn = self.adapter1(fpns[0])
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
result = self.forward(*input, **kwargs)
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 349, in forward
return self._conv_forward(input, self.weight)
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/nn/modules/conv.py", line 346, in _conv_forward
self.padding, self.dilation, self.groups)
RuntimeError: Given groups=1, weight of size [64, 1024, 1, 1], expected input[2, 256, 50, 50] to have 1024 channels, but got 256 channels instead
Traceback (most recent call last):
File "/home/user/.pyenv/versions/3.7.7/lib/python3.7/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
File "/home/user/.pyenv/versions/3.7.7/lib/python3.7/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/distributed/launch.py", line 263, in <module>
main()
File "/home/user/.virtualenvs/jupyter/lib/python3.7/site-packages/torch/distributed/launch.py", line 259, in main
cmd=cmd)
In your README, it seems that the final model is trained for 300 epochs with a learning rate drop at 200 epochs.
However, in the following link, it seems like the 42.0 AP model is trained for 500 epochs with a learning rate drop at 400 epochs.
Can you clarify?
python -m torch.distributed.launch --nproc_per_node=8 --use_env main.py \
    --lr_drop 400 --epochs 500 \
    --coco_path /path/to/coco
https://gist.github.com/szagoruyko/9c9ebb8455610958f7deaa27845d7918
Hi, I am trying to run DETR on my local machine, but the training process gets stuck at the beginning stage, as follows.
I am using PyTorch 1.5 and torchvision 0.6, and a Faster R-CNN model can be trained on the COCO dataset without this problem.
I am wondering whether the problem comes from the DataLoader part. Could you provide some hints on this? Thanks!
Hi and thanks for the code!
When I try to load detr it gives:
from detr.models import detr
7 from torch import nn
8
----> 9 from util import box_ops
10 from util.misc import (NestedTensor, accuracy, get_world_size, interpolate,
11 is_dist_avail_and_initialized)
ModuleNotFoundError: No module named 'util'
or
from detr.engine import evaluate
10 import torch
11
---> 12 import util.misc as utils
13 from datasets.coco_eval import CocoEvaluator
14 from datasets.panoptic_eval import PanopticEvaluator
ModuleNotFoundError: No module named 'util'
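A workaround that has worked for me (an assumption on my side, not an official fix): DETR's modules import util and datasets as top-level packages, so the repository root needs to be on sys.path before importing anything from it.

import sys
sys.path.insert(0, "/path/to/detr")   # hypothetical path to the local clone

from models import detr               # instead of `from detr.models import detr`
from engine import evaluate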
One idea to push DETR's impressive results further might be to swap in the new ResNeSt-50 backbone (released last month by Amazon AI and UC Davis).
In all of the architectures they tested, it immediately provided a 3-4% AP boost on COCO.
This improvement also helps downstream tasks including object detection, instance segmentation and semantic segmentation. For example, by simply replace the ResNet-50 backbone with ResNeSt-50, we improve the mAP of Faster-RCNN on MS-COCO from 39.3% to 42.3% and the mIoU for DeeplabV3 on ADE20K from 42.1% to 45.1%.
It should plug in and play right away. I've been using it for classification work and it was a nice improvement there, and the concept of better global context maps well to the improvements DETR is providing in the head architecture.
https://arxiv.org/abs/2004.08955v1
https://github.com/zhanghang1989/ResNeSt
(I plan to test this out on my own datasets, but will not have time to train it on Coco proper and I think conceptually it's a great match for DETR regardless).
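In case it helps, a rough sketch of what the swap could look like (untested against this repo; the hub entry point and the drop-in compatibility are assumptions on my part, mirroring what DETR's BackboneBase does for torchvision ResNets):

import torch
from torchvision.models._utils import IntermediateLayerGetter

# Hypothetical sketch: load ResNeSt-50 from the authors' hub and expose its last stage.
resnest = torch.hub.load('zhanghang1989/ResNeSt', 'resnest50', pretrained=True)
body = IntermediateLayerGetter(resnest, return_layers={'layer4': '0'})
num_channels = 2048  # ResNeSt-50 keeps ResNet-50's final stage width

features = body(torch.randn(1, 3, 800, 800))['0']  # -> shape (1, 2048, 25, 25)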
Hi, thank you so much for your work!
I have one question about the self-attention implementation. In the paper Attention Is All You Need, the residual connection is made over the input embeddings + positional encoding, and the figure in your paper appears to match this. However, in the code, it looks to me like the residual connection is made over the input embeddings only (the src). Is this a mistake, or is there a reason for this modification? Thank you!
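For context, here is a condensed, self-contained sketch of the pattern I am referring to (paraphrased, assuming the encoder layer in models/transformer.py adds the positional encoding via a with_pos_embed helper): the positional encoding only enters the queries and keys, while the value and the residual path use the plain src features.

import torch
from torch import nn

class EncoderLayerSketch(nn.Module):
    def __init__(self, d_model=256, nhead=8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(d_model, nhead)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, src, pos):
        q = k = src + pos                        # pos is added to queries and keys only
        src2 = self.self_attn(q, k, value=src)[0]
        src = src + src2                         # residual over src, not src + pos
        return self.norm(src)

# usage: src and pos are (sequence_length, batch, d_model)
out = EncoderLayerSketch()(torch.randn(10, 2, 256), torch.randn(10, 2, 256))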
I get an AccessDenied error when I try to download the panoptic model .pth files linked in the README. The URLs for the normal models work fine.
Hi, great work.
I read your code and found that you set num_classes=91 for COCO detection.
But COCO detection has 80 categories. Could you explain why you set it to 91?
Thanks very much~
I'm trying to get my custom dataset working, but I can't get past 8 or so images via __getitem__, and it keeps asserting that my bboxes are bad. I pull that one, it flags the next one; I pull that one, it flags the next...
From reading the code it wants to check that x1 and y1 are larger than x0 and y0 which is a great check.
55 assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
But it keeps flagging images that, when I unwind from COCO format, should be fine... any insights? I was not able to print the boxes1 (200, 4) and boxes2 (12, 4) tensors for some reason, so I couldn't see what it was actually calculating (it threw an odd GPU issue with 'formatting').
Example: it flagged this image as being bad. Here's the JSON for it in COCO format, 6 classes. One box surrounds all the other 5 objects, by the way, as it's a malaria reader, so I'm not sure if that box encompassing other boxes is really the issue:
{"id": "c33c3539-8bd1-48e0-8065-831709e5e64d", "image_id": 3091210, "category_id": 2905442, "segmentation": null, "area": 0, "bbox": **[499, 121, 177, 80]**, "iscrowd": 0},
{"id": "0023d71e-e1e9-4862-a0b8-6e2bc3982b3b", "image_id": 3091210, "category_id": 2905422, "segmentation": null, "area": 0, "bbox": **[492, 523, 187, 163]**, "iscrowd": 0},
{"id": "726fdfbc-3801-409d-ab75-ccf951e74316", "image_id": 3091210, "category_id": 2905421, "segmentation": null, "area": 0, "bbox": **[496, 428, 181, 93],** "iscrowd": 0},
{"id": "2bf85a8e-108d-4875-b0f5-47c8e5cb13e0", "image_id": 3091210, "category_id": 2905420, "segmentation": null, "area": 0, "bbox": **[494, 272, 186, 169]**, "iscrowd": 0},
{"id": "8669c13a-1205-4e94-a645-18e2ffa491d0", "image_id": 3091210, "category_id": 2905419, "segmentation": null, "area": 0, "bbox": **[489, 127, 193, 557]**, "iscrowd": 0},
{"id": "d9619859-e0ef-4632-ad51-7237a5760a5e", "image_id": 3091210, "category_id": 2905418, "segmentation": null, "area": 0, "bbox": **[495, 203, 182, 73]**, "iscrowd": 0},
And as a check for me, here's coco format:
The COCO bounding box format is [top left x position, top left y position, width, height].
All the bboxes it flags have positive widths and heights, so x1 and y1 must be larger than x0 and y0; only a negative width or height added to the original x0 or y0 could make them smaller... so I'm unclear what it is asserting on or for.
But it asserts here:
~/detr/util/box_ops.py in generalized_box_iou(boxes1, boxes2)
53 #print(boxes1)
54 #print(boxes2)
---> 55 assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
56 assert (boxes2[:, 2:] >= boxes2[:, :2]).all()
57 iou, union = box_iou(boxes1, boxes2)
I've removed 15+ images trying to get it to actually train, but it just keeps flagging more and more as invalid bboxes. I remove one image, then it asserts on the next one... and in reviewing the ones it flags vs the ones it lets pass, I don't see any real difference. (I have trained with this same dataset on EfficientDet, so I know the dataset is reasonable.)
Thus any help with debugging, or ideas about what might be awry, would be appreciated.
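In case it helps others hitting the same assertion, here is a small sanity-check helper I put together (my own code, not part of this repo): it converts COCO [x, y, w, h] boxes to [x0, y0, x1, y1], clamps them to the image size, and reports any annotation that would end up degenerate.

def find_bad_boxes(coco_annotations, img_w, img_h):
    """Return ids of annotations whose boxes collapse after conversion/clamping."""
    bad = []
    for ann in coco_annotations:
        x, y, w, h = ann["bbox"]
        x0, y0 = max(0.0, x), max(0.0, y)
        x1, y1 = min(float(img_w), x + w), min(float(img_h), y + h)
        if x1 <= x0 or y1 <= y0:
            bad.append(ann["id"])
    return bad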
Thanks!
Hi, I didn't see NMS in the postprocess step. Why don't you use NMS, and could you please explain how the postprocessing works?