
hoitransformer's People

Contributors

bbepoch, iloveat


hoitransformer's Issues

num_classes = 91?

Doesn't HICO have 80 object classes? Why is num_classes set to 91 in the code?
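
For reference, this likely mirrors DETR's convention of indexing the classifier by raw COCO category id rather than by a compacted 0..79 label space; a minimal sketch of that reading (an assumption about this repo, not a maintainer's answer):

# Sketch: COCO labels 80 object categories, but the official category ids
# run from 1 to 90 with ten unused ids, so a classifier indexed directly
# by category id needs 91 slots (0..90).
missing_coco_ids = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}
num_labeled = 90 - len(missing_coco_ids)  # 80 actual categories
num_classes = 91                          # id range 0..90, gaps included
print(num_labeled, num_classes)           # -> 80 91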

V-COCO: 150 epochs or 250 epochs?

Thanks for sharing this great work.

I have one question:
when training V-COCO with ResNet-50,
should I train for 150 or 250 epochs to reproduce the 51 AP result?

  • During testing with test.py, is --prior used?

some questions about the result

Hi,
Thanks for your great work, and sorry to bother you. I noticed some differences between the repo and the paper.

In the repo:
[screenshot of the results reported in the repo]

In the paper:
[screenshot of the results reported in the paper]

Maybe I overlooked something, but I really cannot understand the reason. Could you please explain this?

About action queries

Thanks for such great work. I have a general question regarding the action queries that are fed into the decoder of the transformer network.
I want to clarify whether an action query is the input feature of the sample itself, a feature extracted from the bounding box of the respective interaction, or something else.
In this work, num_queries=100 by default. How are the features selected based on the number of queries?

Thanks in advance
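
For context, in the DETR-style design this repo builds on, the queries are learned embeddings rather than image features; a minimal sketch of that pattern (the module names here are illustrative, not copied from the repo):

from torch import nn

# The action queries are NOT features of the input image. They are a
# learned embedding table, shared across all images, that the decoder
# refines by cross-attending to the encoder's image features.
num_queries, hidden_dim = 100, 256
query_embed = nn.Embedding(num_queries, hidden_dim)

# The same 100 learned vectors are fed for every image; training lets
# each query specialize to particular interaction patterns/regions.
queries = query_embed.weight.unsqueeze(1)
print(queries.shape)  # torch.Size([100, 1, 256])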

ONNX conversion

@bbepoch thanks for open-sourcing the code base. I just wanted to know whether support will be added for ONNX conversion of the pre-trained models. Thanks in advance.
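
For anyone attempting this in the meantime, here is a hedged sketch of the generic torch.onnx.export flow, shown on a plain torchvision backbone; DETR-style models that take NestedTensor inputs usually need their forward pass adapted to plain tensors before export succeeds, so this is not a confirmed recipe for this repo:

import torch
import torchvision

# Export a standard backbone to ONNX as an illustration of the API only.
model = torchvision.models.resnet50(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "backbone.onnx", opset_version=11,
                  input_names=["image"], output_names=["features"])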

Datasets Split

Hi,

Thank you so much for the great paper!

I'm confused about the data split for V-COCO after going through the code: did you use train2014 and val2014 for training and testing respectively? I mean, only training and testing sets are used, without a validation set; or should I think of val2014 as being used for both validation and testing? Thank you in advance.

Replace DETR with YOLOR

@bbepoch hi, thanks for open-sourcing the code base. I had one query regarding the architecture: the current HoiTransformer uses DETR as its base; can we change it to YOLOR? If so, what is the process of changing the code base to YOLOR? Could you share your thoughts on this?
Thanks in advance

How to evaluate the model with custom images?

First of all, thank you so much for such amazing work.
I have two things to clarify:

  1. I downloaded some images from Google and put them in the same 'images' folder, replacing the existing test folder with the new one, but on executing the program it still fetches the images from the original test data and not from the new one. Even if I provide only a subset of images, it fetches the whole set. Can we test the model on a new set of test images that is not present in the database, and if so, where should I make the changes?
  2. I tried to visualize the images, or save them with the bounding boxes drawn, by setting save_image=True in the test.py script, but it throws a "NotImplementedError". How can I visualize the test images with the detected bounding boxes and labels?

A question about your V-COCO dataset.

First of all, thanks for sharing the source code of the paper (End-to-End Human Object Interaction Detection with HOI Transformer), which was accepted at CVPR. I have a question about your V-COCO dataset. The original V-COCO dataset consists of 5,400 images in the trainval set and 4,946 images in the test set. But the retagged dataset I downloaded from your Google Drive consists of 4,971 images in the trainval set and 4,539 images in the test set.

May I ask whether I downloaded the wrong data, or whether the data was specially processed?

I look forward to receiving your reply.

Training on a Swin-B backbone

I found some Swin-B code in the repository and tried to train this model on Swin-B, but the training loss converges at around 70 and does not drop with any further training. Can you share how to train this model on Swin-B, or tell me if there is any code in this repository that still needs to be completed? If so, I will try to complete it myself. I would appreciate your response; thank you so much.

Question about training parameters.

Thanks for your nice work! I know --epochs=250 is used for the HICO-DET dataset. How many epochs did you use for the V-COCO dataset: 100 epochs, or fewer? And what about the other parameters, such as --lr_drop=200 and --batch_size=2; are they the same as for HICO-DET? Thanks a lot.

A little question about the paper

It's great work; thanks for your contribution.
I have a question about the paper. In the paper, you state that "All one-layer MLP branches for predicting confidence use a softmax function." But as far as I know, the verbs in the V-COCO dataset are multi-label, and I wonder how softmax can be used to predict multiple labels.
I hope you can reply.
Thanks
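
For reference, the usual contrast between the two head designs looks like the sketch below (the general pattern, not the authors' confirmed implementation; the verb-set size of 29 is illustrative):

import torch
import torch.nn.functional as F

logits = torch.randn(2, 29)  # batch of 2 queries, 29-way verb head

# Single-label head: softmax + cross-entropy, exactly one verb per query.
single_label = torch.tensor([3, 7])
ce = F.cross_entropy(logits, single_label)

# Multi-label head: per-verb sigmoid + binary cross-entropy, so several
# verbs can fire for the same human-object pair.
multi_hot = torch.zeros_like(logits)
multi_hot[0, [3, 5]] = 1.0
multi_hot[1, 7] = 1.0
bce = F.binary_cross_entropy_with_logits(logits, multi_hot)
print(ce.item(), bce.item())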

Training error on my own dataset using the HICO format and odgt annotations

Error message: /opt/conda/conda-bld/pytorch_1591914895884/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: operator(): block: [0,0,0], thread: [62,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1591914895884/work/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=59 : device-side assert triggered
In hoitr.py, does num_humans represent the number of people? The number of people per image in the HICO dataset is not fixed, so how should this parameter be set?

About the number of object categories

Excuse me!

It's great work; thanks for your contribution.

The paper says that HICO-DET has 80 object classes, but num_classes is 91 at line 345 of models/hoitr.py. Why is it set like this?

And could you tell me how to generate 'hico_train_retag_hoitr.odgt' from the HICO-DET dataset files?

thank you very much!

Problem with requirements

Hi, I installed the latest versions of the requirements, but I get this error message:

ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops'

If I install older versions of the requirements, I get the same error. Also, some of the pinned requirement versions are no longer available.
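
For reference, _new_empty_tensor is a private op that newer torchvision releases removed; DETR-derived code bases commonly guard the import behind a version check. A sketch of that pattern (not this repo's exact code):

import torchvision

# Only import the private op on old torchvision versions where it exists;
# newer versions provide public equivalents and never hit this branch.
major, minor = (int(x) for x in
                torchvision.__version__.split('+')[0].split('.')[:2])
if (major, minor) < (0, 7):
    from torchvision.ops import _new_empty_tensor
    from torchvision.ops.misc import _output_size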

DETR training

I plan to use HoiTransformer on my own dataset, but it seems I should train a DETR model first. I have trained DETR (Facebook's implementation) on my own dataset, but it didn't work. Can you give me some suggestions on DETR training?

V-COCO annotation

Hello, in your repo, vcoco_test.odgt contains 4,953 images, but the original V-COCO test set contains over 5,000 images. Should V-COCO performance be evaluated on 4,953 images or on the 5,000+ images? Looking forward to your reply, thanks!

A question about odgt annotations.

Hello Author, thank you for your great work!
Recently, I tried to use HoiTransformer to train on my own dataset. I have some questions about the odgt annotations.
Taking HICO_train2015_00000001.jpg for example, my questions are as follows:

  1. In the original HICO annotation of this picture, there is only one pair of boxes for the motorcycle and the person. However, in the ODGT annotation, I found that this picture has more than one pair of boxes for the motorcycle and the person (even though there is only one person and one motorcycle in the picture), and the coordinates of the boxes are not the same, although the differences are small. I do not understand this point. (Why are the coordinates of the boxes for the same person or motorcycle different?)

  2. The coordinates of the person box in this picture in the HICO annotations are [207,32,426,299], but the coordinates in the ODGT annotations are [207,32,220,268]. My understanding is that in the HICO annotations the coordinates are the upper-left and lower-right corners of the box, whereas in your ODGT annotations the coordinates are the upper-left corner plus the width and height of the box. Although this explanation seems reasonable, by this reading the width and height of the person box should be [426-207=219, 299-32=267], yet they are [220,268] in the ODGT annotations. Please tell me why (see the sketch after this issue).

Thanks again for your excellent work! Your answer will be of great help to me.
Looking forward to your reply!
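
For reference, the off-by-one in item 2 is consistent with an inclusive-pixel convention, w = x2 - x1 + 1 (an assumption, not a maintainer's answer); a minimal sketch:

# Convert corner coordinates to [x, y, w, h] with inclusive pixel counts.
def corners_to_xywh_inclusive(x1, y1, x2, y2):
    return [x1, y1, x2 - x1 + 1, y2 - y1 + 1]

# Reproduces the numbers from the question exactly:
print(corners_to_xywh_inclusive(207, 32, 426, 299))  # -> [207, 32, 220, 268]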

The hyperparameters of the HOI match loss

[screenshot of the default matcher costs in hoi_matcher.py]
Hello, in your paper, beta1 = 2 and beta2 = 1 lead to the best results, which suggests the classification loss is more important than the bbox loss. But in the code shown in this screenshot, the defaults are self.cost_class = 1, self.cost_bbox = 5, and self.cost_giou = 2. Why did you set these parameters this way? Thanks!

nori2

What third-party library is nori2? What is the pip command to install it?

viz_hoi_result draws the object box rectangle at an incorrect position

Hi there,

I trained a model on a small dataset (person raising hand) without any problem. But when I run test_on_images to check the predictions, it draws the object box at an incorrect position (see the attached image). Could you help me figure out what I've done incorrectly?

[attached image: img_1-ch01_20210331111737_0013.jpg_000000]

Best regards,
MT.

Some questions about single-GPU training

Hello author, I trained on the V-COCO dataset with a ResNet-50 backbone using a single 3090, but I did not get the reported 51.9 test result, only around 46. What could be causing this? I trained with the model's default parameters, changing only num_worker to 8.

error when training

I am trying to train on V-COCO.
But when I run

python main.py --epochs=250 --lr_drop=110 --dataset_file=vcoco --batch_size=16 --backbone=resnet50

I get the following error:

  File "E:\project\HoiTransformer-master\models\hoi_matcher.py", line 80, in forward
    human_cost_class = -human_out_prob[:, human_tgt_ids]
IndexError: tensors used as indices must be long, byte or bool tensors

How can I solve this error?
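
For reference, a minimal reproduction and a common fix (an assumption about the cause, not a confirmed patch): PyTorch only accepts integer or boolean tensors as indices, so target ids that arrive as floats raise exactly this IndexError.

import torch

# Indexing columns with a float tensor reproduces the error; casting the
# ids to int64 resolves it.
human_out_prob = torch.randn(200, 3).softmax(-1)
human_tgt_ids = torch.tensor([1.0, 0.0, 2.0, 1.0])           # float ids -> IndexError
human_cost_class = -human_out_prob[:, human_tgt_ids.long()]  # cast fixes it
print(human_cost_class.shape)  # torch.Size([200, 4])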

corre_hico.npy not found?

Hello!

I am running the evaluation code and it throws an error while trying to load corre_hico.npy. Is this file provided in the repo?

Thanks.

The script to generate ODGT annotation files

The ODGT annotations are indeed much easier to understand for HOI detection. I was wondering if the script to convert V-COCO's raw annotations to the ODGT format could be shared. Thank you.
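
For context, ODGT is a JSON-lines format: one JSON record per line, one record per image. A minimal reader sketch (the field names inside each record are whatever the repo's annotation files define, and the path below is hypothetical):

import json

# Read an .odgt file: each non-empty line is a standalone JSON object.
def load_odgt(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

records = load_odgt('data/vcoco/vcoco_trainval_retag_hoitr.odgt')
print(len(records))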

dimension mismatch in hoi_matcher.py

At line 80 in hoi_matcher.py, human_cost_class = -human_out_prob[:, human_tgt_ids], where human_out_prob has shape [200, 3] but human_tgt_ids has shape [4].

Person activity without object detection

@bbepoch hi, thanks for sharing the code base, great work. But I had one query: when I tested the model on scenes such as a person running on a beach with no other object present, there are no detections/activities in the output. Is there any way to get results like people walking, fighting, or waving without depending on an object being present in the scene?

Thanks in advance

Setting num_worker to 1, 2, or another value gives RuntimeError: unable to mmap 32 bytes from file </torch_239305_1669300387>: Cannot allocate memory (12)

Traceback (most recent call last):
  File "slurm_main.py", line 243, in <module>
    main(args)
  File "slurm_main.py", line 200, in main
    args.clip_max_norm)
  File "/mnt/lustre/penghuan/HoiTransformer/engine.py", line 32, in train_one_epoch
    for samples, targets in metric_logger.log_every(data_loader, print_freq, header):
  File "/mnt/lustre/penghuan/HoiTransformer/util/misc.py", line 223, in log_every
    for obj in iterable:
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 355, in __iter__
    return self._get_iterator()
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 301, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 914, in __init__
    w.start()
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 321, in reduce_storage
    fd, size = storage.share_fd()
RuntimeError: unable to mmap 32 bytes from file </torch_239305_1669300387>: Cannot allocate memory (12)
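
For reference, a common workaround for this class of error (assuming the cause is DataLoader workers exhausting shared memory) is to switch PyTorch's tensor-sharing strategy, or to enlarge /dev/shm; a minimal sketch:

import torch.multiprocessing as mp

# Switch from file-descriptor sharing backed by /dev/shm to the
# file_system strategy before any DataLoader is created.
mp.set_sharing_strategy('file_system')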

Demo Code

Hi,

Will you release demo code for this work? I really want to try out your work, but I'm stuck with limited resources. It'd be great if you could release your well-trained models. Thank you so much.
