
dn-detr's People

Contributors

fengli-ust, haozhang534, lymdlut, sangbumchoi


dn-detr's Issues

Sizes of tensors must match except in dimension 1.

Traceback (most recent call last):
File "main.py", line 428, in
main(args)
File "main.py", line 388, in main
wo_class_error=wo_class_error, args=args, logger=(logger if args.save_log else None)
File "/home/cxq/.conda/envs/torch10/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/cxq/dp_work/objectdetection/DN-DETR/engine.py", line 221, in evaluate
res_info = torch.cat((_res_bbox, _res_prob.unsqueeze(-1), _res_label.unsqueeze(-1)), 1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 900 but got size 300 for tensor number 1 in the list

This problem occurs when testing the model after training.

The error occurs when using the --save_results argument.
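For context (not a fix), torch.cat only lets tensors differ along the concatenation dimension; a minimal, hypothetical reproduction of the error with shapes mirroring the traceback:

import torch

# Hypothetical shapes taken from the traceback: 900 boxes but only 300 scores/labels.
_res_bbox = torch.zeros(900, 4)
_res_prob = torch.zeros(300)
_res_label = torch.zeros(300)

# torch.cat along dim=1 requires all other dimensions (here dim 0) to match,
# so this line raises: "Sizes of tensors must match except in dimension 1."
res_info = torch.cat((_res_bbox, _res_prob.unsqueeze(-1), _res_label.unsqueeze(-1)), 1)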

Object Detection and Inference Image

Hi there,

Amazing job! Thanks, guys!

I am wondering how to use this model for object detection on a single image. Could you release your inference code? Thanks for your help!

Batch size effects

On my machine I can only run a batch size of 1. How much will this degrade the results? I ran with exactly the same parameters as your best configuration, except for the batch size, and the quality is much worse than Mask R-CNN.

details about the implementation

Hi, thanks for bringing new insights to the DETR series. DN-DETR is really an excellent work that can get such high performance with only 12 epochs.

After reading the paper, I have several questions about the detailed implementation of DN-DETR.

  1. About the class embedding. According to the description of the class embedding in the paper and the discussion in issue #3, the class embedding can be obtained in two ways: (1) use a pre-trained language model to generate an embedding for each class name (the COCO classes plus an unknown class: [person], [bicycle], [car], ..., [toothbrush], [unknown]); (2) represent each class as a one-hot vector and project it into the latent space with a linear layer or MLP (see the sketch below). Could you give more details about the implementation?
  2. About the learning rate. I notice that DN-DETR uses an initial learning rate of 1e-5 with a batch size of 16 (Sec 5.1), which differs from DAB-DETR (lr: 1e-4, lr_backbone: 1e-5 with a batch size of 16). Is this a typo or intended? If the learning rate was adjusted in DN-DETR, could you kindly report the gains from adjusting it?

Looking forward to a reply. Thanks in advance!
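For reference, a minimal sketch of way (2) above, with hypothetical sizes (80 COCO classes plus an unknown class, hidden_dim = 256); this is an illustration, not the authors' implementation:

import torch
import torch.nn as nn

num_classes, hidden_dim = 81, 256        # 80 COCO classes + 1 "unknown" class (assumption)

# Way (2): one-hot class vectors projected into the latent space by a linear layer.
class_proj = nn.Linear(num_classes, hidden_dim)
labels = torch.tensor([3, 17, 80])                               # 80 = unknown
one_hot = nn.functional.one_hot(labels, num_classes).float()     # (3, 81)
label_embedding = class_proj(one_hot)                            # (3, 256)

# An equivalent, more direct form is a learnable embedding table:
label_enc = nn.Embedding(num_classes, hidden_dim)
label_embedding_alt = label_enc(labels)                          # (3, 256)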

dilation convolution and two-stage strategy

Thanks for your great work,
In your pre-trained model DN-DAB-Deformable-DETR-R50-v24, you did not use dilated convolution or the two-stage strategy. What I want to know is: would using these two strategies further improve performance?
Looking forward to your reply!

RuntimeError: "ms_deform_attn_forward_cuda" not implemented for 'Half'

When I try to use mixed precision training, the program reports an error:
Traceback (most recent call last):
File "main.py", line 414, in
main(args)
File "main.py", line 335, in main
args.clip_max_norm, wo_class_error=wo_class_error, lr_scheduler=lr_scheduler, args=args, logger=(logger if args.save_log else None))
File "/home/lyz/DN-DETR/engine.py", line 48, in train_one_epoch
outputs, mask_dict = model(samples, dn_args=(targets, args.scalar, args.label_noise_scale, args.box_noise_scale, args.num_patterns))
File "/home/lyz/anaconda3/envs/pytorch-1.8.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lyz/DN-DETR/models/dn_dab_deformable_detr/dab_deformable_detr.py", line 225, in forward
hs, init_reference, inter_references, enc_outputs_class, enc_outputs_coord_unact = self.transformer(srcs, masks, pos, query_embeds, attn_mask)
File "/home/lyz/anaconda3/envs/pytorch-1.8.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lyz/DN-DETR/models/dn_dab_deformable_detr/deformable_transformer.py", line 173, in forward
memory = self.encoder(src_flatten, spatial_shapes, level_start_index, valid_ratios, lvl_pos_embed_flatten, mask_flatten)
File "/home/lyz/anaconda3/envs/pytorch-1.8.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lyz/DN-DETR/models/dn_dab_deformable_detr/deformable_transformer.py", line 281, in forward
output = layer(output, pos, reference_points, spatial_shapes, level_start_index, padding_mask)
File "/home/lyz/anaconda3/envs/pytorch-1.8.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lyz/DN-DETR/models/dn_dab_deformable_detr/deformable_transformer.py", line 232, in forward
src2 = self.self_attn(self.with_pos_embed(src, pos), reference_points, src, spatial_shapes, level_start_index, padding_mask)
File "/home/lyz/anaconda3/envs/pytorch-1.8.1/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/lyz/DN-DETR/models/dn_dab_deformable_detr/ops/modules/ms_deform_attn.py", line 113, in forward
value, input_spatial_shapes, input_level_start_index, sampling_locations, attention_weights, self.im2col_step)
File "/home/lyz/DN-DETR/models/dn_dab_deformable_detr/ops/functions/ms_deform_attn_func.py", line 26, in forward
value, value_spatial_shapes, value_level_start_index, sampling_locations, attention_weights, ctx.im2col_step)
RuntimeError: "ms_deform_attn_forward_cuda" not implemented for 'Half'
May I ask why?
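Not an official answer, but a common workaround when a custom CUDA op (here the deformable-attention kernel) has no half-precision implementation is to disable autocast around the call and run it in float32, casting the result back afterwards. A generic, hypothetical sketch of that pattern; applying it would mean wrapping the MSDeformAttnFunction.apply call in ms_deform_attn.py (an untested assumption):

import torch

def call_in_fp32(op, *tensors, **kwargs):
    # Generic workaround sketch: run an op that has no half-precision kernel in
    # float32 with autocast disabled, then cast the result back to the input dtype.
    with torch.cuda.amp.autocast(enabled=False):
        out = op(*[t.float() if torch.is_tensor(t) and t.is_floating_point() else t
                   for t in tensors], **kwargs)
    return out.to(tensors[0].dtype)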

Why is pad_size set to int(max(known_num))?

Hi, I printed the shapes of some intermediate variables and there is one point I don't understand.
My understanding is as follows (considering only the tgt part): the 300 dimension is the learnable embedding, and the pad part stores the noised labels.

(screenshot of intermediate tensor shapes)

As shown in the figure, with a batch size of 2 the two images have 4 and 16 labels respectively, so after repeating the noised-label tensor scalar times its shape becomes 20 × 5 = 100.
But if pad_size is only set to the maximum of known_num, the pad part has size 16 × 5 = 80.
In that case the new tgt has size 380, while there are 100 noised labels, so they would occupy 20 slots of the non-denoising part.

Of course, with the training parameter batch_size=1 that you provide this problem does not occur, but batch_size=1 is a bit slow. For batch_size > 1, could pad_size be set to sum(known_num) instead? Would this change affect the overall performance of the model? Thanks.
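Restating the arithmetic from the question as a small script; whether the premise holds depends on how the denoising groups are laid out per image in dn_components.py, so treat this only as the question's own calculation:

# Numbers taken from the question: batch size 2, 4 and 16 GT labels, scalar = 5.
known_num = [4, 16]
scalar = 5
num_queries = 300

noised_labels = sum(known_num) * scalar          # 20 * 5 = 100 noised queries in total
pad_from_max = int(max(known_num)) * scalar      # 16 * 5 = 80  (current code)
pad_from_sum = sum(known_num) * scalar           # 20 * 5 = 100 (proposed)

print(num_queries + pad_from_max)   # 380: per the question, 100 noised labels overflow by 20
print(num_queries + pad_from_sum)   # 400: every noised label would get its own slot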

Segmentation head

Hi,

Please tell me how to run your code with a segmentation head. The --masks flag doesn't work.

The CPU load is high when training

Hello, thanks for your wonderful work!
I modified a little code to train on my own dataset (the same format as COCO), but the CPU load seems a little high (over 50%). When using DAB-DETR to train on the same dataset, the CPU load is very low.
Is this normal, or does it need some improvement?

Some details about implementation

  1. Does DN-DETR add the class label embedding to the content queries (tgt in the code) only in the cross-attention module of the first decoder layer, as Conditional DETR and DAB-DETR do?

class_embed

self.class_embed = nn.Linear(hidden_dim, num_classes)
self.bbox_embed = MLP(hidden_dim, hidden_dim, 4, 3)
self.num_feature_levels = num_feature_levels
self.use_dab = use_dab
self.num_patterns = num_patterns
self.random_refpoints_xy = random_refpoints_xy
self.label_enc = nn.Embedding(num_classes + 1, hidden_dim - 1)

Why does class_embed use 91 classes here while label_enc uses 92? In standard DETR, class_embed seems to be

nn.Linear(hidden_dim, num_classes + 1)

Adding the DINO component to DN-DETR

Hi, authors,

Thank you for opening your fantastic project.

I was very impressed by your successive projects DN-DETR and DINO,

so I have merged the DINO components into this earlier Deformable-DETR-based DN-DETR, which is a little different from the official DINO.

Would you authors, by any chance, be interested in merging DINO into this DN-DETR?

If so, please let me know and I will prepare the code for sharing.
Since you already have your own official DINO repo, maybe you don't want to mix DN-DETR with another DINO implementation.
That's OK; in that case I am considering another way to open-source my implementation.

Thanks.

Denoising part

First, thanks for your excellent work, but I have a question.
About Figure 3b in the paper: does it contain the denoising part, or is it omitted in this figure?
I read your previous answer in the issues: the class label embedding is the noised label, right? But does it also include the noised box?

The normalization of sine positional embedding

hi,
First, thanks for your perfect work.
When I was reading your code, I found a small question about the sine positional embedding. I think the - 0.5 here should be outside the brackets, and I don't know whether this has some influence on the temperature tuning experiment in the DAB-DETR paper.

(screenshot of the sine positional embedding code)

Parameters from the article

Hi,

First of all, thanks for uploading your code.

Please tell me which parameters I need to run your code with in order to reproduce the results from the article.

AP = 0

I trained dn_dab_detr for 12 epochs on coco2017, but the result is AP = 0. Could someone tell me where the problem is?
Here are the parameters and results:
detr --coco_path ../datasets/coco2017 --use_dn --amp --dilation
IoU metric: bbox
Average Precision (AP) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.75 | area= all | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Precision (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 1 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets= 10 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= all | maxDets=100 ] = 0.001
Average Recall (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.000
Average Recall (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.003
Training time 6 days, 0:10:43
Now time: 2022-12-08 12:12:42.811224

How to append the indicator to the label embedding?

Thanks for your excellent work! I have two questions about the label embedding:

  1. Does each query have its own one-hot vector with 81 dimensions (80 classes in the COCO dataset plus 1 for the unknown class)?
    Then can we embed the one-hot vector into a label embedding with an MLP?
  2. The indicator used to differentiate between a denoising-part query and a matching-part query is 1 or 0.
    How do we append the indicator to the label embedding? Just concatenate the scalar to the end of the label embedding? (See the sketch after this list.)
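A minimal sketch of the concatenation asked about in question 2, with sizes assumed from the label_enc definition quoted in another issue on this page (nn.Embedding(num_classes + 1, hidden_dim - 1)); it is an illustration, not the authors' exact code:

import torch
import torch.nn as nn

hidden_dim, num_classes = 256, 91                    # assumed sizes
label_enc = nn.Embedding(num_classes + 1, hidden_dim - 1)

labels = torch.tensor([3, 17, num_classes])          # last one uses the extra "unknown" index
indicator = torch.tensor([1.0, 1.0, 0.0])            # 1 = denoising query, 0 = matching query

label_embed = label_enc(labels)                                  # (3, hidden_dim - 1)
query_content = torch.cat([label_embed,
                           indicator.unsqueeze(-1)], dim=-1)     # (3, hidden_dim)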

How is Known Labels Detection implemented?

Thanks for your excellent work.
Could you give more details on how Known Labels Detection is implemented? How do you make the decoder output all boxes of a specific class c using only the label embedding of class c?

How to add DN to a vanilla-DETR-like model

Thanks for this amazing work! I have some questions about adding DN to a vanilla-DETR-like model.

Could you explain more about how I can use DN for a vanilla-DETR-like algorithm?
Since vanilla DETR's object queries are not anchor-like, I don't know how to turn a denoised GT (dim = a) into an object query (dim = b, where b != a); see the sketch below.
Would a learnable nn.Linear or some other operator work?

Looking forward to your reply!
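Not an authoritative answer, but one way to realize the mapping described above is a learnable linear projection from the denoised GT representation (dimension a) to the query dimension (b). A rough, hypothetical sketch:

import torch
import torch.nn as nn

a = 256 + 4      # e.g. a noised label embedding (256) concatenated with a noised box (4)
b = 256          # vanilla DETR content-query dimension

dn_proj = nn.Linear(a, b)                      # the learnable operator the question suggests

label_embed = torch.randn(100, 256)            # 100 noised GT label embeddings (dummy data)
noised_box = torch.rand(100, 4)                # noised cx, cy, w, h (dummy data)
dn_queries = dn_proj(torch.cat([label_embed, noised_box], dim=-1))   # (100, b)

# These denoising queries would then be concatenated with the learnable object queries,
# together with an attention mask that blocks information flow between the two groups.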

About inference.py

How can I solve the following problem when using the inference script?
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cpu and cuda:0! (when checking argument for argument index in method wrapper__index_select)

About plot_logs

Hello, thanks for your wonderful work!

When I finished training and got log.txt, I wanted to visualize it using plot_logs, as follows:

(screenshot of the plot_logs call)

But I get an ERROR on this line:
https://github.com/IDEA-opensource/DN-DETR/blob/f41c276fe0af61a8acfbd32dfdde5d00291b3cf9/util/plot_utils.py#L65

Traceback (most recent call last):
  File "H:/yjs/code/DN-DETR-main/tmp.py", line 53, in <module>
    fig, axs = plot_logs(log_path)
  File "H:\yjs\code\DN-DETR-main\util\plot_utils.py", line 65, in plot_logs
    df.interpolate().ewm(com=ewm_col).mean().plot(
  File "H:\Anaconda\lib\site-packages\pandas\util\_decorators.py", line 311, in wrapper
    return func(*args, **kwargs)
  File "H:\Anaconda\lib\site-packages\pandas\core\frame.py", line 10712, in interpolate
    return super().interpolate(
  File "H:\Anaconda\lib\site-packages\pandas\core\generic.py", line 6899, in interpolate
    new_data = obj._mgr.interpolate(
  File "H:\Anaconda\lib\site-packages\pandas\core\internals\managers.py", line 377, in interpolate
    return self.apply("interpolate", **kwargs)
  File "H:\Anaconda\lib\site-packages\pandas\core\internals\managers.py", line 327, in apply
    applied = getattr(b, f)(**kwargs)
  File "H:\Anaconda\lib\site-packages\pandas\core\internals\blocks.py", line 1369, in interpolate
    new_values = values.fillna(value=fill_value, method=method, limit=limit)
  File "H:\Anaconda\lib\site-packages\pandas\core\arrays\_mixins.py", line 218, in fillna
    value, method = validate_fillna_kwargs(
  File "H:\Anaconda\lib\site-packages\pandas\util\_validators.py", line 372, in validate_fillna_kwargs
    method = clean_fill_method(method)
  File "H:\Anaconda\lib\site-packages\pandas\core\missing.py", line 120, in clean_fill_method
    raise ValueError(f"Invalid fill method. Expecting {expecting}. Got {method}")
ValueError: Invalid fill method. Expecting pad (ffill) or backfill (bfill). Got linear

My version of pandas is 1.3.5

I don't know whether I am using it in the wrong way or whether it is a bug in pandas. How can I fix it?

About the loss

Where is the reconstruction loss? I cannot find it in DABDETR.py. Thanks.

Trained checkpoints for DN-Detr

(Model zoo, row 6) DN-DAB-Deformable-DETR-R50-v24 | R50 | 49.5 (48.4 in 24 epochs) | Google Drive / BaiDu | Optimized implementation with deformable attention in both encoder and decoder. See DAB-DETR for more details.

I downloaded checkpoint0049.pth from this URL. Judging by the name, it seems to be the model trained for 50 epochs. But I got the results below when testing:

 Average Precision  (AP) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.484
 Average Precision  (AP) @[ IoU=0.50      | area=   all | maxDets=100 ] = 0.665
 Average Precision  (AP) @[ IoU=0.75      | area=   all | maxDets=100 ] = 0.526
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.301
 Average Precision  (AP) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.517
 Average Precision  (AP) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.639
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=  1 ] = 0.361
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets= 10 ] = 0.590
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=   all | maxDets=100 ] = 0.626
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= small | maxDets=100 ] = 0.436
 Average Recall     (AR) @[ IoU=0.50:0.95 | area=medium | maxDets=100 ] = 0.673
 Average Recall     (AR) @[ IoU=0.50:0.95 | area= large | maxDets=100 ] = 0.780

Is this normal?

Why call it DN-DETR rather than DN-DAB-DETR?

Glad to see the great work, and I'm excitedly awaiting the release of the code, but I still have a concern:
Why call it DN-DETR rather than DN-DAB-DETR?
Since the main method of the paper was built on DAB-DETR and the SOTA numbers were achieved based on DAB-DETR, I would guess it should be called DN-DAB-DETR. The name DN-DETR sounds as if the work were based on vanilla DETR without changing the original Transformer decoder. I saw that your experiments based on Deformable DETR are called DN-Deformable-DETR, which makes the name DN-DETR even more misleading.
Also, the paper says the code will be organized as a plug-in module that can be applied to any DETR-like model, including vanilla DETR, so what should it be called when applying the denoising method to vanilla DETR? The result for that case also does not seem to appear in the results table.
Many thanks

how to add DN to Anchor DETR with 2D Anchors

Thanks for this amazing work!
The paper DN-DETR: Accelerate DETR Training by Introducing Query DeNoising mentions that DN can be added to Anchor DETR, but I didn't find the relevant code in this project.
Looking forward to your reply!

How to calculate flops

Hi! Thanks for your excellent work. I'm wondering how to evaluate the FLOPs of the DN-DETR model.
I can't directly use the DETR script below because of the dn_components.
facebookresearch/detr#110
Could you please share your Python script?
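Not the authors' script, but one possible approach is fvcore's FlopCountAnalysis with a thin wrapper that fixes dn_args, so the model can be profiled from a single image tensor. The dn_args layout below is copied from a traceback earlier on this page and should be treated as an assumption; custom ops such as the deformable-attention kernel will not be counted:

import torch
from fvcore.nn import FlopCountAnalysis

class FlopsWrapper(torch.nn.Module):
    # Hypothetical wrapper: freeze the extra dn arguments so the model can be
    # called with a single positional tensor, as FlopCountAnalysis expects.
    def __init__(self, model, dn_args):
        super().__init__()
        self.model = model
        self.dn_args = dn_args

    def forward(self, samples):
        return self.model(samples, dn_args=self.dn_args)

# model, dn_args = ...  # build them exactly as main.py / engine.py do
# dummy = torch.randn(1, 3, 800, 1333).cuda()
# flops = FlopCountAnalysis(FlopsWrapper(model, dn_args).cuda().eval(), dummy)
# print(flops.total() / 1e9, "GFLOPs")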

The collate_fn function in util/misc.py

def collate_fn(batch):
    # import ipdb; ipdb.set_trace()
    batch = list(zip(*batch))
    batch[0] = nested_tensor_from_tensor_list(batch[0])
    return tuple(batch)

I see that nested_tensor_from_tensor_list pads every image in batch[0] (the training images) to the largest size in the batch, so that all images in a batch have the same size. But don't the boxes need to be corrected accordingly? It looks like the cx, cy, w, h of the boxes still use the image coordinates from before the padding.

How to use --drop_lr_now

Thank you for your excellent work! I wonder how and when to use --drop_lr_now.
Thank you!

Error when using a pre-trained model

Hi! I'm running into trouble.
I'm training on my own dataset with the pre-trained model:
--pretrain_model_path checkpoint.pth
and get the following error:
Traceback (most recent call last):
File "C:/Users/20825/Desktop/detr_code/DN-DETR-main/main.py", line 426, in
main(args)
File "C:/Users/20825/Desktop/detr_code/DN-DETR-main/main.py", line 352, in main
train_stats = train_one_epoch(
File "C:\Users\20825\Desktop\detr_code\DN-DETR-main\engine.py", line 52, in train_one_epoch
outputs = model(samples)
File "C:\anaconda\envs\Detr\lib\site-packages\torch\nn\modules\module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "C:\Users\20825\Desktop\detr_code\DN-DETR-main\models\DN_DAB_DETR\DABDETR.py", line 176, in forward
prepare_for_dn(dn_args, embedweight, src.size(0), self.training, self.num_queries, self.num_classes,
File "C:\Users\20825\Desktop\detr_code\DN-DETR-main\models\DN_DAB_DETR\dn_components.py", line 61, in prepare_for_dn
targets, scalar, label_noise_scale, box_noise_scale, num_patterns = dn_args
TypeError: cannot unpack non-iterable NoneType object
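For comparison, the train-time call shown in a traceback earlier on this page passes dn_args explicitly, whereas the failing line in this traceback calls model(samples) without it; the earlier call (quoted from that traceback, variable names are context-dependent) looks like this:

outputs, mask_dict = model(samples, dn_args=(targets, args.scalar, args.label_noise_scale,
                                             args.box_noise_scale, args.num_patterns))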

Thank you for your answer

assert (boxes1[:, 2:] >= boxes1[:, :2]).all()

Traceback (most recent call last):
File "main.py", line 427, in
main(args)
File "main.py", line 355, in main
args.clip_max_norm, wo_class_error=wo_class_error, lr_scheduler=lr_scheduler, args=args, logger=(logger if args.save_log else None))
File "/home/cxq/dp_work/objectdetection/DN-DETR/engine.py", line 50, in train_one_epoch
loss_dict = criterion(outputs, targets, mask_dict)
File "/home/cxq/.conda/envs/torch10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cxq/dp_work/objectdetection/DN-DETR/models/DN_DAB_DETR/DABDETR.py", line 371, in forward
indices = self.matcher(outputs_without_aux, targets)
File "/home/cxq/.conda/envs/torch10/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1102, in _call_impl
return forward_call(*input, **kwargs)
File "/home/cxq/.conda/envs/torch10/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
return func(*args, **kwargs)
File "/home/cxq/dp_work/objectdetection/DN-DETR/models/DN_DAB_DETR/matcher.py", line 83, in forward
cost_giou = -generalized_box_iou(box_cxcywh_to_xyxy(out_bbox), box_cxcywh_to_xyxy(tgt_bbox))
File "/home/cxq/dp_work/objectdetection/DN-DETR/util/box_ops.py", line 52, in generalized_box_iou
assert (boxes1[:, 2:] >= boxes1[:, :2]).all()

I'm not sure why this error is reported during training; I had already trained up to Epoch: [38] [12440/14785] eta: 0:17:40.

TypeError: cannot unpack non-iterable NoneType object

D:\anaconda3.9\envs\zj\python.exe F:/1chen/DETR/jin/dn/DN-DETR/main.py
Not using distributed mode
[08/13 08:31:54.869]: git:
sha: a59a5de, status: has uncommited changes, branch: main

[08/13 08:31:54.869]: Command: F:/1chen/DETR/jin/dn/DN-DETR/main.py
[08/13 08:31:54.869]: Full config saved to log/r50\config.json
[08/13 08:31:54.869]: world size: 1
[08/13 08:31:54.869]: rank: 0
[08/13 08:31:54.869]: local_rank: 0
[08/13 08:31:54.870]: args: Namespace(amp=False, aux_loss=True, backbone='resnet50', backbone_freeze_keywords=None, batch_norm_type='FrozenBatchNorm2d', batch_size=2, bbox_loss_coef=5, box_noise_scale=0.4, clip_max_norm=0.1, cls_loss_coef=1, coco_panoptic_path=None, coco_path='COCODIR', dataset_file='coco', debug=False, dec_layers=6, dec_n_points=4, device='cuda', dice_loss_coef=1, dilation=False, dim_feedforward=2048, dist_url='env://', distributed=False, drop_lr_now=False, dropout=0.0, enc_layers=6, enc_n_points=4, eos_coef=0.1, epochs=10, eval=False, find_unused_params=False, finetune_ignore=None, fix_size=False, focal_alpha=0.25, frozen_weights=None, giou_loss_coef=2, hidden_dim=256, label_noise_scale=0.2, local_rank=0, lr=0.0001, lr_backbone=1e-05, lr_drop=40, mask_loss_coef=1, masks=False, modelname='dn_dab_deformable_detr', nheads=8, note='', num_feature_levels=4, num_patterns=0, num_queries=300, num_select=300, num_workers=10, output_dir='log/r50', pe_temperatureH=20, pe_temperatureW=20, position_embedding='sine', pre_norm=False, pretrain_model_path=None, random_refpoints_xy=False, rank=0, remove_difficult=False, resume='', return_interm_layers=False, save_checkpoint_interval=10, save_log=False, save_results=False, scalar=5, seed=42, set_cost_bbox=5, set_cost_class=2, set_cost_giou=2, start_epoch=0, transformer_activation='prelu', two_stage=False, use_dn=False, weight_decay=0.0001, world_size=1)

[08/13 08:31:55.431]: number of params:47206754
[08/13 08:31:55.433]: params:
{
"transformer.level_embed": 1024,
"transformer.encoder.layers.0.self_attn.sampling_offsets.weight": 65536,
"transformer.encoder.layers.0.self_attn.sampling_offsets.bias": 256,
"transformer.encoder.layers.0.self_attn.attention_weights.weight": 32768,
"transformer.encoder.layers.0.self_attn.attention_weights.bias": 128,
"transformer.encoder.layers.0.self_attn.value_proj.weight": 65536,
"transformer.encoder.layers.0.self_attn.value_proj.bias": 256,
"transformer.encoder.layers.0.self_attn.output_proj.weight": 65536,
"transformer.encoder.layers.0.self_attn.output_proj.bias": 256,
"transformer.encoder.layers.0.norm1.weight": 256,
"transformer.encoder.layers.0.norm1.bias": 256,
"transformer.encoder.layers.0.linear1.weight": 524288,
"transformer.encoder.layers.0.linear1.bias": 2048,
"transformer.encoder.layers.0.linear2.weight": 524288,
"transformer.encoder.layers.0.linear2.bias": 256,
"transformer.encoder.layers.0.norm2.weight": 256,
"transformer.encoder.layers.0.norm2.bias": 256,
"transformer.encoder.layers.1.self_attn.sampling_offsets.weight": 65536,
"transformer.encoder.layers.1.self_attn.sampling_offsets.bias": 256,
"transformer.encoder.layers.1.self_attn.attention_weights.weight": 32768,
"transformer.encoder.layers.1.self_attn.attention_weights.bias": 128,
"transformer.encoder.layers.1.self_attn.value_proj.weight": 65536,
"transformer.encoder.layers.1.self_attn.value_proj.bias": 256,
"transformer.encoder.layers.1.self_attn.output_proj.weight": 65536,
"transformer.encoder.layers.1.self_attn.output_proj.bias": 256,
"transformer.encoder.layers.1.norm1.weight": 256,
"transformer.encoder.layers.1.norm1.bias": 256,
"transformer.encoder.layers.1.linear1.weight": 524288,
"transformer.encoder.layers.1.linear1.bias": 2048,
"transformer.encoder.layers.1.linear2.weight": 524288,
"transformer.encoder.layers.1.linear2.bias": 256,
"transformer.encoder.layers.1.norm2.weight": 256,
"transformer.encoder.layers.1.norm2.bias": 256,
"transformer.encoder.layers.2.self_attn.sampling_offsets.weight": 65536,
"transformer.encoder.layers.2.self_attn.sampling_offsets.bias": 256,
"transformer.encoder.layers.2.self_attn.attention_weights.weight": 32768,
"transformer.encoder.layers.2.self_attn.attention_weights.bias": 128,
"transformer.encoder.layers.2.self_attn.value_proj.weight": 65536,
"transformer.encoder.layers.2.self_attn.value_proj.bias": 256,
"transformer.encoder.layers.2.self_attn.output_proj.weight": 65536,
"transformer.encoder.layers.2.self_attn.output_proj.bias": 256,
"transformer.encoder.layers.2.norm1.weight": 256,
"transformer.encoder.layers.2.norm1.bias": 256,
"transformer.encoder.layers.2.linear1.weight": 524288,
"transformer.encoder.layers.2.linear1.bias": 2048,
"transformer.encoder.layers.2.linear2.weight": 524288,
"transformer.encoder.layers.2.linear2.bias": 256,
"transformer.encoder.layers.2.norm2.weight": 256,
"transformer.encoder.layers.2.norm2.bias": 256,
"transformer.encoder.layers.3.self_attn.sampling_offsets.weight": 65536,
"transformer.encoder.layers.3.self_attn.sampling_offsets.bias": 256,
"transformer.encoder.layers.3.self_attn.attention_weights.weight": 32768,
"transformer.encoder.layers.3.self_attn.attention_weights.bias": 128,
"transformer.encoder.layers.3.self_attn.value_proj.weight": 65536,
"transformer.encoder.layers.3.self_attn.value_proj.bias": 256,
"transformer.encoder.layers.3.self_attn.output_proj.weight": 65536,
"transformer.encoder.layers.3.self_attn.output_proj.bias": 256,
"transformer.encoder.layers.3.norm1.weight": 256,
"transformer.encoder.layers.3.norm1.bias": 256,
"transformer.encoder.layers.3.linear1.weight": 524288,
"transformer.encoder.layers.3.linear1.bias": 2048,
"transformer.encoder.layers.3.linear2.weight": 524288,
"transformer.encoder.layers.3.linear2.bias": 256,
"transformer.encoder.layers.3.norm2.weight": 256,
"transformer.encoder.layers.3.norm2.bias": 256,
"transformer.encoder.layers.4.self_attn.sampling_offsets.weight": 65536,
"transformer.encoder.layers.4.self_attn.sampling_offsets.bias": 256,
"transformer.encoder.layers.4.self_attn.attention_weights.weight": 32768,
"transformer.encoder.layers.4.self_attn.attention_weights.bias": 128,
"transformer.encoder.layers.4.self_attn.value_proj.weight": 65536,
"transformer.encoder.layers.4.self_attn.value_proj.bias": 256,
"transformer.encoder.layers.4.self_attn.output_proj.weight": 65536,
"transformer.encoder.layers.4.self_attn.output_proj.bias": 256,
"transformer.encoder.layers.4.norm1.weight": 256,
"transformer.encoder.layers.4.norm1.bias": 256,
"transformer.encoder.layers.4.linear1.weight": 524288,
"transformer.encoder.layers.4.linear1.bias": 2048,
"transformer.encoder.layers.4.linear2.weight": 524288,
"transformer.encoder.layers.4.linear2.bias": 256,
"transformer.encoder.layers.4.norm2.weight": 256,
"transformer.encoder.layers.4.norm2.bias": 256,
"transformer.encoder.layers.5.self_attn.sampling_offsets.weight": 65536,
"transformer.encoder.layers.5.self_attn.sampling_offsets.bias": 256,
"transformer.encoder.layers.5.self_attn.attention_weights.weight": 32768,
"transformer.encoder.layers.5.self_attn.attention_weights.bias": 128,
"transformer.encoder.layers.5.self_attn.value_proj.weight": 65536,
"transformer.encoder.layers.5.self_attn.value_proj.bias": 256,
"transformer.encoder.layers.5.self_attn.output_proj.weight": 65536,
"transformer.encoder.layers.5.self_attn.output_proj.bias": 256,
"transformer.encoder.layers.5.norm1.weight": 256,
"transformer.encoder.layers.5.norm1.bias": 256,
"transformer.encoder.layers.5.linear1.weight": 524288,
"transformer.encoder.layers.5.linear1.bias": 2048,
"transformer.encoder.layers.5.linear2.weight": 524288,
"transformer.encoder.layers.5.linear2.bias": 256,
"transformer.encoder.layers.5.norm2.weight": 256,
"transformer.encoder.layers.5.norm2.bias": 256,
"transformer.decoder.layers.0.cross_attn.sampling_offsets.weight": 65536,
"transformer.decoder.layers.0.cross_attn.sampling_offsets.bias": 256,
"transformer.decoder.layers.0.cross_attn.attention_weights.weight": 32768,
"transformer.decoder.layers.0.cross_attn.attention_weights.bias": 128,
"transformer.decoder.layers.0.cross_attn.value_proj.weight": 65536,
"transformer.decoder.layers.0.cross_attn.value_proj.bias": 256,
"transformer.decoder.layers.0.cross_attn.output_proj.weight": 65536,
"transformer.decoder.layers.0.cross_attn.output_proj.bias": 256,
"transformer.decoder.layers.0.norm1.weight": 256,
"transformer.decoder.layers.0.norm1.bias": 256,
"transformer.decoder.layers.0.self_attn.in_proj_weight": 196608,
"transformer.decoder.layers.0.self_attn.in_proj_bias": 768,
"transformer.decoder.layers.0.self_attn.out_proj.weight": 65536,
"transformer.decoder.layers.0.self_attn.out_proj.bias": 256,
"transformer.decoder.layers.0.norm2.weight": 256,
"transformer.decoder.layers.0.norm2.bias": 256,
"transformer.decoder.layers.0.linear1.weight": 524288,
"transformer.decoder.layers.0.linear1.bias": 2048,
"transformer.decoder.layers.0.linear2.weight": 524288,
"transformer.decoder.layers.0.linear2.bias": 256,
"transformer.decoder.layers.0.norm3.weight": 256,
"transformer.decoder.layers.0.norm3.bias": 256,
"transformer.decoder.layers.1.cross_attn.sampling_offsets.weight": 65536,
"transformer.decoder.layers.1.cross_attn.sampling_offsets.bias": 256,
"transformer.decoder.layers.1.cross_attn.attention_weights.weight": 32768,
"transformer.decoder.layers.1.cross_attn.attention_weights.bias": 128,
"transformer.decoder.layers.1.cross_attn.value_proj.weight": 65536,
"transformer.decoder.layers.1.cross_attn.value_proj.bias": 256,
"transformer.decoder.layers.1.cross_attn.output_proj.weight": 65536,
"transformer.decoder.layers.1.cross_attn.output_proj.bias": 256,
"transformer.decoder.layers.1.norm1.weight": 256,
"transformer.decoder.layers.1.norm1.bias": 256,
"transformer.decoder.layers.1.self_attn.in_proj_weight": 196608,
"transformer.decoder.layers.1.self_attn.in_proj_bias": 768,
"transformer.decoder.layers.1.self_attn.out_proj.weight": 65536,
"transformer.decoder.layers.1.self_attn.out_proj.bias": 256,
"transformer.decoder.layers.1.norm2.weight": 256,
"transformer.decoder.layers.1.norm2.bias": 256,
"transformer.decoder.layers.1.linear1.weight": 524288,
"transformer.decoder.layers.1.linear1.bias": 2048,
"transformer.decoder.layers.1.linear2.weight": 524288,
"transformer.decoder.layers.1.linear2.bias": 256,
"transformer.decoder.layers.1.norm3.weight": 256,
"transformer.decoder.layers.1.norm3.bias": 256,
"transformer.decoder.layers.2.cross_attn.sampling_offsets.weight": 65536,
"transformer.decoder.layers.2.cross_attn.sampling_offsets.bias": 256,
"transformer.decoder.layers.2.cross_attn.attention_weights.weight": 32768,
"transformer.decoder.layers.2.cross_attn.attention_weights.bias": 128,
"transformer.decoder.layers.2.cross_attn.value_proj.weight": 65536,
"transformer.decoder.layers.2.cross_attn.value_proj.bias": 256,
"transformer.decoder.layers.2.cross_attn.output_proj.weight": 65536,
"transformer.decoder.layers.2.cross_attn.output_proj.bias": 256,
"transformer.decoder.layers.2.norm1.weight": 256,
"transformer.decoder.layers.2.norm1.bias": 256,
"transformer.decoder.layers.2.self_attn.in_proj_weight": 196608,
"transformer.decoder.layers.2.self_attn.in_proj_bias": 768,
"transformer.decoder.layers.2.self_attn.out_proj.weight": 65536,
"transformer.decoder.layers.2.self_attn.out_proj.bias": 256,
"transformer.decoder.layers.2.norm2.weight": 256,
"transformer.decoder.layers.2.norm2.bias": 256,
"transformer.decoder.layers.2.linear1.weight": 524288,
"transformer.decoder.layers.2.linear1.bias": 2048,
"transformer.decoder.layers.2.linear2.weight": 524288,
"transformer.decoder.layers.2.linear2.bias": 256,
"transformer.decoder.layers.2.norm3.weight": 256,
"transformer.decoder.layers.2.norm3.bias": 256,
"transformer.decoder.layers.3.cross_attn.sampling_offsets.weight": 65536,
"transformer.decoder.layers.3.cross_attn.sampling_offsets.bias": 256,
"transformer.decoder.layers.3.cross_attn.attention_weights.weight": 32768,
"transformer.decoder.layers.3.cross_attn.attention_weights.bias": 128,
"transformer.decoder.layers.3.cross_attn.value_proj.weight": 65536,
"transformer.decoder.layers.3.cross_attn.value_proj.bias": 256,
"transformer.decoder.layers.3.cross_attn.output_proj.weight": 65536,
"transformer.decoder.layers.3.cross_attn.output_proj.bias": 256,
"transformer.decoder.layers.3.norm1.weight": 256,
"transformer.decoder.layers.3.norm1.bias": 256,
"transformer.decoder.layers.3.self_attn.in_proj_weight": 196608,
"transformer.decoder.layers.3.self_attn.in_proj_bias": 768,
"transformer.decoder.layers.3.self_attn.out_proj.weight": 65536,
"transformer.decoder.layers.3.self_attn.out_proj.bias": 256,
"transformer.decoder.layers.3.norm2.weight": 256,
"transformer.decoder.layers.3.norm2.bias": 256,
"transformer.decoder.layers.3.linear1.weight": 524288,
"transformer.decoder.layers.3.linear1.bias": 2048,
"transformer.decoder.layers.3.linear2.weight": 524288,
"transformer.decoder.layers.3.linear2.bias": 256,
"transformer.decoder.layers.3.norm3.weight": 256,
"transformer.decoder.layers.3.norm3.bias": 256,
"transformer.decoder.layers.4.cross_attn.sampling_offsets.weight": 65536,
"transformer.decoder.layers.4.cross_attn.sampling_offsets.bias": 256,
"transformer.decoder.layers.4.cross_attn.attention_weights.weight": 32768,
"transformer.decoder.layers.4.cross_attn.attention_weights.bias": 128,
"transformer.decoder.layers.4.cross_attn.value_proj.weight": 65536,
"transformer.decoder.layers.4.cross_attn.value_proj.bias": 256,
"transformer.decoder.layers.4.cross_attn.output_proj.weight": 65536,
"transformer.decoder.layers.4.cross_attn.output_proj.bias": 256,
"transformer.decoder.layers.4.norm1.weight": 256,
"transformer.decoder.layers.4.norm1.bias": 256,
"transformer.decoder.layers.4.self_attn.in_proj_weight": 196608,
"transformer.decoder.layers.4.self_attn.in_proj_bias": 768,
"transformer.decoder.layers.4.self_attn.out_proj.weight": 65536,
"transformer.decoder.layers.4.self_attn.out_proj.bias": 256,
"transformer.decoder.layers.4.norm2.weight": 256,
"transformer.decoder.layers.4.norm2.bias": 256,
"transformer.decoder.layers.4.linear1.weight": 524288,
"transformer.decoder.layers.4.linear1.bias": 2048,
"transformer.decoder.layers.4.linear2.weight": 524288,
"transformer.decoder.layers.4.linear2.bias": 256,
"transformer.decoder.layers.4.norm3.weight": 256,
"transformer.decoder.layers.4.norm3.bias": 256,
"transformer.decoder.layers.5.cross_attn.sampling_offsets.weight": 65536,
"transformer.decoder.layers.5.cross_attn.sampling_offsets.bias": 256,
"transformer.decoder.layers.5.cross_attn.attention_weights.weight": 32768,
"transformer.decoder.layers.5.cross_attn.attention_weights.bias": 128,
"transformer.decoder.layers.5.cross_attn.value_proj.weight": 65536,
"transformer.decoder.layers.5.cross_attn.value_proj.bias": 256,
"transformer.decoder.layers.5.cross_attn.output_proj.weight": 65536,
"transformer.decoder.layers.5.cross_attn.output_proj.bias": 256,
"transformer.decoder.layers.5.norm1.weight": 256,
"transformer.decoder.layers.5.norm1.bias": 256,
"transformer.decoder.layers.5.self_attn.in_proj_weight": 196608,
"transformer.decoder.layers.5.self_attn.in_proj_bias": 768,
"transformer.decoder.layers.5.self_attn.out_proj.weight": 65536,
"transformer.decoder.layers.5.self_attn.out_proj.bias": 256,
"transformer.decoder.layers.5.norm2.weight": 256,
"transformer.decoder.layers.5.norm2.bias": 256,
"transformer.decoder.layers.5.linear1.weight": 524288,
"transformer.decoder.layers.5.linear1.bias": 2048,
"transformer.decoder.layers.5.linear2.weight": 524288,
"transformer.decoder.layers.5.linear2.bias": 256,
"transformer.decoder.layers.5.norm3.weight": 256,
"transformer.decoder.layers.5.norm3.bias": 256,
"transformer.decoder.query_scale.layers.0.weight": 65536,
"transformer.decoder.query_scale.layers.0.bias": 256,
"transformer.decoder.query_scale.layers.1.weight": 65536,
"transformer.decoder.query_scale.layers.1.bias": 256,
"transformer.decoder.ref_point_head.layers.0.weight": 131072,
"transformer.decoder.ref_point_head.layers.0.bias": 256,
"transformer.decoder.ref_point_head.layers.1.weight": 65536,
"transformer.decoder.ref_point_head.layers.1.bias": 256,
"transformer.decoder.bbox_embed.0.layers.0.weight": 65536,
"transformer.decoder.bbox_embed.0.layers.0.bias": 256,
"transformer.decoder.bbox_embed.0.layers.1.weight": 65536,
"transformer.decoder.bbox_embed.0.layers.1.bias": 256,
"transformer.decoder.bbox_embed.0.layers.2.weight": 1024,
"transformer.decoder.bbox_embed.0.layers.2.bias": 4,
"transformer.decoder.bbox_embed.1.layers.0.weight": 65536,
"transformer.decoder.bbox_embed.1.layers.0.bias": 256,
"transformer.decoder.bbox_embed.1.layers.1.weight": 65536,
"transformer.decoder.bbox_embed.1.layers.1.bias": 256,
"transformer.decoder.bbox_embed.1.layers.2.weight": 1024,
"transformer.decoder.bbox_embed.1.layers.2.bias": 4,
"transformer.decoder.bbox_embed.2.layers.0.weight": 65536,
"transformer.decoder.bbox_embed.2.layers.0.bias": 256,
"transformer.decoder.bbox_embed.2.layers.1.weight": 65536,
"transformer.decoder.bbox_embed.2.layers.1.bias": 256,
"transformer.decoder.bbox_embed.2.layers.2.weight": 1024,
"transformer.decoder.bbox_embed.2.layers.2.bias": 4,
"transformer.decoder.bbox_embed.3.layers.0.weight": 65536,
"transformer.decoder.bbox_embed.3.layers.0.bias": 256,
"transformer.decoder.bbox_embed.3.layers.1.weight": 65536,
"transformer.decoder.bbox_embed.3.layers.1.bias": 256,
"transformer.decoder.bbox_embed.3.layers.2.weight": 1024,
"transformer.decoder.bbox_embed.3.layers.2.bias": 4,
"transformer.decoder.bbox_embed.4.layers.0.weight": 65536,
"transformer.decoder.bbox_embed.4.layers.0.bias": 256,
"transformer.decoder.bbox_embed.4.layers.1.weight": 65536,
"transformer.decoder.bbox_embed.4.layers.1.bias": 256,
"transformer.decoder.bbox_embed.4.layers.2.weight": 1024,
"transformer.decoder.bbox_embed.4.layers.2.bias": 4,
"transformer.decoder.bbox_embed.5.layers.0.weight": 65536,
"transformer.decoder.bbox_embed.5.layers.0.bias": 256,
"transformer.decoder.bbox_embed.5.layers.1.weight": 65536,
"transformer.decoder.bbox_embed.5.layers.1.bias": 256,
"transformer.decoder.bbox_embed.5.layers.2.weight": 1024,
"transformer.decoder.bbox_embed.5.layers.2.bias": 4,
"class_embed.0.weight": 23296,
"class_embed.0.bias": 91,
"class_embed.1.weight": 23296,
"class_embed.1.bias": 91,
"class_embed.2.weight": 23296,
"class_embed.2.bias": 91,
"class_embed.3.weight": 23296,
"class_embed.3.bias": 91,
"class_embed.4.weight": 23296,
"class_embed.4.bias": 91,
"class_embed.5.weight": 23296,
"class_embed.5.bias": 91,
"label_enc.weight": 23460,
"tgt_embed.weight": 76500,
"refpoint_embed.weight": 1200,
"input_proj.0.0.weight": 131072,
"input_proj.0.0.bias": 256,
"input_proj.0.1.weight": 256,
"input_proj.0.1.bias": 256,
"input_proj.1.0.weight": 262144,
"input_proj.1.0.bias": 256,
"input_proj.1.1.weight": 256,
"input_proj.1.1.bias": 256,
"input_proj.2.0.weight": 524288,
"input_proj.2.0.bias": 256,
"input_proj.2.1.weight": 256,
"input_proj.2.1.bias": 256,
"input_proj.3.0.weight": 4718592,
"input_proj.3.0.bias": 256,
"input_proj.3.1.weight": 256,
"input_proj.3.1.bias": 256,
"backbone.0.body.layer2.0.conv1.weight": 32768,
"backbone.0.body.layer2.0.conv2.weight": 147456,
"backbone.0.body.layer2.0.conv3.weight": 65536,
"backbone.0.body.layer2.0.downsample.0.weight": 131072,
"backbone.0.body.layer2.1.conv1.weight": 65536,
"backbone.0.body.layer2.1.conv2.weight": 147456,
"backbone.0.body.layer2.1.conv3.weight": 65536,
"backbone.0.body.layer2.2.conv1.weight": 65536,
"backbone.0.body.layer2.2.conv2.weight": 147456,
"backbone.0.body.layer2.2.conv3.weight": 65536,
"backbone.0.body.layer2.3.conv1.weight": 65536,
"backbone.0.body.layer2.3.conv2.weight": 147456,
"backbone.0.body.layer2.3.conv3.weight": 65536,
"backbone.0.body.layer3.0.conv1.weight": 131072,
"backbone.0.body.layer3.0.conv2.weight": 589824,
"backbone.0.body.layer3.0.conv3.weight": 262144,
"backbone.0.body.layer3.0.downsample.0.weight": 524288,
"backbone.0.body.layer3.1.conv1.weight": 262144,
"backbone.0.body.layer3.1.conv2.weight": 589824,
"backbone.0.body.layer3.1.conv3.weight": 262144,
"backbone.0.body.layer3.2.conv1.weight": 262144,
"backbone.0.body.layer3.2.conv2.weight": 589824,
"backbone.0.body.layer3.2.conv3.weight": 262144,
"backbone.0.body.layer3.3.conv1.weight": 262144,
"backbone.0.body.layer3.3.conv2.weight": 589824,
"backbone.0.body.layer3.3.conv3.weight": 262144,
"backbone.0.body.layer3.4.conv1.weight": 262144,
"backbone.0.body.layer3.4.conv2.weight": 589824,
"backbone.0.body.layer3.4.conv3.weight": 262144,
"backbone.0.body.layer3.5.conv1.weight": 262144,
"backbone.0.body.layer3.5.conv2.weight": 589824,
"backbone.0.body.layer3.5.conv3.weight": 262144,
"backbone.0.body.layer4.0.conv1.weight": 524288,
"backbone.0.body.layer4.0.conv2.weight": 2359296,
"backbone.0.body.layer4.0.conv3.weight": 1048576,
"backbone.0.body.layer4.0.downsample.0.weight": 2097152,
"backbone.0.body.layer4.1.conv1.weight": 1048576,
"backbone.0.body.layer4.1.conv2.weight": 2359296,
"backbone.0.body.layer4.1.conv3.weight": 1048576,
"backbone.0.body.layer4.2.conv1.weight": 1048576,
"backbone.0.body.layer4.2.conv2.weight": 2359296,
"backbone.0.body.layer4.2.conv3.weight": 1048576
}
loading annotations into memory...
Done (t=0.03s)
creating index...
index created!
loading annotations into memory...
Done (t=0.00s)
creating index...
index created!
Start training
F:\1chen\DETR\jin\dn\DN-DETR\models\dn_dab_deformable_detr\position_encoding.py:53: UserWarning: floordiv is deprecated, and its behavior will change in a future version of pytorch. It currently rounds toward 0 (like the 'trunc' function NOT 'floor'). This results in incorrect rounding for negative values. To keep the current behavior, use torch.div(a, b, rounding_mode='trunc'), or for actual floor division, use torch.div(a, b, rounding_mode='floor').
dim_t = self.temperature ** (2 * (dim_t // 2) / self.num_pos_feats)
Traceback (most recent call last):
File "F:/1chen/DETR/jin/dn/DN-DETR/main.py", line 426, in
main(args)
File "F:/1chen/DETR/jin/dn/DN-DETR/main.py", line 352, in main
train_stats = train_one_epoch(
File "F:\1chen\DETR\jin\dn\DN-DETR\engine.py", line 52, in train_one_epoch
outputs = model(samples)
File "D:\anaconda3.9\envs\zj\lib\site-packages\torch\nn\modules\module.py", line 1110, in _call_impl
return forward_call(*input, **kwargs)
File "F:\1chen\DETR\jin\dn\DN-DETR\models\dn_dab_deformable_detr\dab_deformable_detr.py", line 206, in forward
prepare_for_dn(dn_args, tgt_all_embed, refanchor, src.size(0), self.training, self.num_queries, self.num_classes,
File "F:\1chen\DETR\jin\dn\DN-DETR\models\dn_dab_deformable_detr\dn_components.py", line 61, in prepare_for_dn
targets, scalar, label_noise_scale, box_noise_scale, num_patterns = dn_args
TypeError: cannot unpack non-iterable NoneType object

Process finished with exit code 1

Mismatched shapes of tgt_embed and pat_embed?

In the forward part of dab_deformable_detr.py

if self.two_stage:
    assert NotImplementedError
elif self.use_dab:
    if self.num_patterns == 0:
        tgt_all_embed = tgt_embed = self.tgt_embed.weight           # nq, 256
        refanchor = self.refpoint_embed.weight      # nq, 4
        # query_embeds = torch.cat((tgt_embed, refanchor), dim=1)
    else:
        # multi patterns
        tgt_embed = self.tgt_embed.weight           # nq, 256
        pat_embed = self.patterns_embed.weight      # num_pat, 256
        tgt_embed = tgt_embed.repeat(self.num_patterns, 1)  # nq*num_pat, 256
        pat_embed = pat_embed[:, None, :].repeat(1, self.num_queries, 1).flatten(0, 1)  # nq*num_pat, 256
        tgt_all_embed = tgt_embed + pat_embed
        refanchor = self.refpoint_embed.weight.repeat(self.num_patterns, 1)      # nq*num_pat, 4
        # query_embeds = torch.cat((tgt_all_embed, refanchor), dim=1)
else:
    assert NotImplementedError

Isn't tgt_embed of shape (nq, hidden_dim - 1)? How can you add tgt_embed to pat_embed?

Inverse sigmoid

Hi, I have a question: what is the role of the inverse sigmoid in your code? I notice that inverse_sigmoid is used in many places.
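For readers unfamiliar with the function, the usual form of inverse_sigmoid in DETR-style codebases is a clamped logit; the sketch below follows that convention and is not copied from this repository:

import torch

def inverse_sigmoid(x, eps=1e-5):
    # Maps values in (0, 1) back to logit (pre-sigmoid) space, clamped for numerical stability.
    # Typical use: boxes are kept in sigmoid space, layer-wise offsets are added in logit
    # space, and the result is squashed back with sigmoid (iterative box refinement).
    x = x.clamp(min=0, max=1)
    x1 = x.clamp(min=eps)
    x2 = (1 - x).clamp(min=eps)
    return torch.log(x1 / x2)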
