
hoitransformer's People

Contributors

bbepoch, iloveat


hoitransformer's Issues

num_classes = 91?

Doesn't HICO have 80 object classes? Why is num_classes set to 91 in the code?
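
For reference, this likely mirrors DETR's convention of indexing the classifier by raw COCO category id rather than by a compacted 0..79 label space; a minimal sketch of that reading (an assumption about this repo, not a maintainer's answer):

# Sketch: COCO labels 80 object categories, but the official category ids
# run from 1 to 90 with ten unused ids, so a classifier indexed directly
# by category id needs 91 slots (0..90).
missing_coco_ids = {12, 26, 29, 30, 45, 66, 68, 69, 71, 83}
num_labeled = 90 - len(missing_coco_ids)  # 80 actual categories
num_classes = 91                          # id range 0..90, gaps included
print(num_labeled, num_classes)           # -> 80 91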

V-COCO: 150 epochs or 250 epochs?

Thanks for sharing this great work.

I have one question:
when training V-COCO with ResNet-50,
should I train for 150 or 250 epochs to reproduce the 51 AP result?

  • During testing with test.py, is --prior used?

some questions about the result

Hi,
Thanks for your great work, and sorry to bother you. I noticed some differences between the repo and the paper.

In the repo:
[screenshot of the results reported in the repo]

In the paper:
[screenshot of the results reported in the paper]

Maybe I overlooked something, but I really cannot understand the reason. Could you please explain this?

About action queries

Thanks for such great work. I have a general question regarding the action queries that are fed into the decoder of the transformer network.
I want to clarify whether an action query is the input feature of the sample itself, a feature extracted from the bounding box of the respective interaction, or something else.
In this work, num_queries=100 by default. How are the features selected based on the number of queries?

Thanks in advance
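
For context, in the DETR-style design this repo builds on, the queries are learned embeddings rather than image features; a minimal sketch of that pattern (the module names here are illustrative, not copied from the repo):

from torch import nn

# The action queries are NOT features of the input image. They are a
# learned embedding table, shared across all images, that the decoder
# refines by cross-attending to the encoder's image features.
num_queries, hidden_dim = 100, 256
query_embed = nn.Embedding(num_queries, hidden_dim)

# The same 100 learned vectors are fed for every image; training lets
# each query specialize to particular interaction patterns/regions.
queries = query_embed.weight.unsqueeze(1)
print(queries.shape)  # torch.Size([100, 1, 256])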

ONNX conversion

@bbepoch thanks for open-sourcing the code base. I just wanted to know whether support will be added for ONNX conversion of the pre-trained models. Thanks in advance.
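
For anyone attempting this in the meantime, here is a hedged sketch of the generic torch.onnx.export flow, shown on a plain torchvision backbone; DETR-style models that take NestedTensor inputs usually need their forward pass adapted to plain tensors before export succeeds, so this is not a confirmed recipe for this repo:

import torch
import torchvision

# Export a standard backbone to ONNX as an illustration of the API only.
model = torchvision.models.resnet50(pretrained=True).eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "backbone.onnx", opset_version=11,
                  input_names=["image"], output_names=["features"])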

Datasets Split

Hi,

Thank you so much for the great paper!

I'm confused about the data split for V-COCO after going through the code: did you use train2014 and val2014 for training and testing respectively? I mean, only training and testing sets are used, without a validation set; or should I think of val2014 as being used for both validation and testing? Thank you in advance.

Replace DETR with YOLOR

@bbepoch hi, thanks for open-sourcing the code base. I had one query regarding the architecture: the current HoiTransformer uses DETR as its base; can we change it to YOLOR? If so, what is the process of changing the code base to YOLOR? Could you share your thoughts on this?
Thanks in advance

How to evaluate the model with custom images?

First of all, thank you so much for such amazing work.
I have two things to clarify:

  1. I downloaded some images from Google and put them in the same 'images' folder, replacing the existing test folder with the new one, but on executing the program it still fetches the images from the original test data and not from the new one. Even if I provide only a subset of images, it fetches the whole set. Can we test the model on a new set of test images that is not present in the database, and if so, where should I make the changes?
  2. I tried to visualize the images, or save them with the bounding boxes drawn, by setting save_image=True in the test.py script, but it throws a "NotImplementedError". How can I visualize the test images with the detected bounding boxes and labels?

A question about your V-COCO dataset.

First of all, thanks for sharing the source code of the paper (End-to-End Human Object Interaction Detection with HOI Transformer), which was accepted at CVPR. I have a question about your V-COCO dataset. The original V-COCO dataset consists of 5,400 images in the trainval set and 4,946 images in the test set. But the retagged dataset I downloaded from your Google Drive consists of 4,971 images in the trainval set and 4,539 images in the test set.

May I ask whether I downloaded the wrong data, or whether the data was specially processed?

I look forward to receiving your reply.

Training on a Swin-B backbone

I found some Swin-B code in the repository and tried to train this model on Swin-B, but the training loss converges at around 70 and does not drop with any further training. Can you share how to train this model on Swin-B, or tell me if there is any code in this repository that still needs to be completed? If so, I will try to complete it myself. I would appreciate your response; thank you so much.

Question about training parameters.

Thanks for your nice work! I know --epochs=250 is used for the HICO-DET dataset. How many epochs did you use for the V-COCO dataset: 100 epochs, or fewer? And what about the other parameters, such as --lr_drop=200 and --batch_size=2; are they the same as for HICO-DET? Thanks a lot.

A little question about the paper

It's great work; thanks for your contribution.
I have a question about the paper. In the paper, you state that "All one-layer MLP branches for predicting confidence use a softmax function." But as far as I know, the verbs in the V-COCO dataset are multi-label, and I wonder how softmax can be used to predict multiple labels.
I hope you can reply.
Thanks
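
For reference, the usual contrast between the two head designs looks like the sketch below (the general pattern, not the authors' confirmed implementation; the verb-set size of 29 is illustrative):

import torch
import torch.nn.functional as F

logits = torch.randn(2, 29)  # batch of 2 queries, 29-way verb head

# Single-label head: softmax + cross-entropy, exactly one verb per query.
single_label = torch.tensor([3, 7])
ce = F.cross_entropy(logits, single_label)

# Multi-label head: per-verb sigmoid + binary cross-entropy, so several
# verbs can fire for the same human-object pair.
multi_hot = torch.zeros_like(logits)
multi_hot[0, [3, 5]] = 1.0
multi_hot[1, 7] = 1.0
bce = F.binary_cross_entropy_with_logits(logits, multi_hot)
print(ce.item(), bce.item())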

Training error on my own dataset using the HICO format and odgt annotations

Error message: /opt/conda/conda-bld/pytorch_1591914895884/work/aten/src/ATen/native/cuda/IndexKernel.cu:60: operator(): block: [0,0,0], thread: [62,0,0] Assertion index >= -sizes[i] && index < sizes[i] && "index out of bounds" failed.
THCudaCheck FAIL file=/opt/conda/conda-bld/pytorch_1591914895884/work/aten/src/THC/THCCachingHostAllocator.cpp line=278 error=59 : device-side assert triggered
In hoitr.py, does num_humans represent the number of people? The number of people per image in the HICO dataset is not fixed, so how should this parameter be set?

About the number of object categories

Excuse me!

It's great work; thanks for your contribution.

The paper says that HICO-DET has 80 object classes, but num_classes is 91 at line 345 of models/hoitr.py. Why is it set like this?

And could you tell me how to generate 'hico_train_retag_hoitr.odgt' from the HICO-DET dataset files?

thank you very much!

Problem with requirements

Hi, I installed the latest versions of the requirements, but I get this error message:

ImportError: cannot import name '_new_empty_tensor' from 'torchvision.ops'

If I install older versions of the requirements, I get the same error. Also, some of the pinned requirement versions are no longer available.
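
For reference, _new_empty_tensor is a private op that newer torchvision releases removed; DETR-derived code bases commonly guard the import behind a version check. A sketch of that pattern (not this repo's exact code):

import torchvision

# Only import the private op on old torchvision versions where it exists;
# newer versions provide public equivalents and never hit this branch.
major, minor = (int(x) for x in
                torchvision.__version__.split('+')[0].split('.')[:2])
if (major, minor) < (0, 7):
    from torchvision.ops import _new_empty_tensor
    from torchvision.ops.misc import _output_size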

DETR training

I plan to use HoiTransformer on my own dataset, but it seems I should train a DETR model first. I have trained DETR (Facebook's implementation) on my own dataset, but it didn't work. Can you give me some suggestions on DETR training?

V-COCO annotation

Hello, in your repo, vcoco_test.odgt contains 4,953 images, but the original V-COCO test set contains over 5,000 images. Should V-COCO performance be evaluated on 4,953 images or on the 5,000+ images? Looking forward to your reply, thanks!

A question about odgt annotations.

Hello Author, thank you for your great work!
Recently, I tried to use HoiTransformer to train on my own dataset. I have some questions about the odgt annotations.
Taking HICO_train2015_00000001.jpg for example, my questions are as follows:

  1. In the original HICO annotation of this picture, there is only one pair of boxes for the motorcycle and the person. However, in the ODGT annotation, I found that this picture has more than one pair of boxes for the motorcycle and the person (even though there is only one person and one motorcycle in the picture), and the coordinates of the boxes are not the same, although the differences are small. I do not understand this point. (Why are the coordinates of the boxes for the same person or motorcycle different?)

  2. The coordinates of the person box in this picture in the HICO annotations are [207,32,426,299], but the coordinates in the ODGT annotations are [207,32,220,268]. My understanding is that in the HICO annotations the coordinates are the upper-left and lower-right corners of the box, whereas in your ODGT annotations the coordinates are the upper-left corner plus the width and height of the box. Although this explanation seems reasonable, by this reading the width and height of the person box should be [426-207=219, 299-32=267], yet they are [220,268] in the ODGT annotations. Please tell me why (see the sketch after this issue).

Thanks again for your excellent work! Your answer will be of great help to me.
Looking forward to your reply!
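
For reference, the off-by-one in item 2 is consistent with an inclusive-pixel convention, w = x2 - x1 + 1 (an assumption, not a maintainer's answer); a minimal sketch:

# Convert corner coordinates to [x, y, w, h] with inclusive pixel counts.
def corners_to_xywh_inclusive(x1, y1, x2, y2):
    return [x1, y1, x2 - x1 + 1, y2 - y1 + 1]

# Reproduces the numbers from the question exactly:
print(corners_to_xywh_inclusive(207, 32, 426, 299))  # -> [207, 32, 220, 268]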

The hyperparameters of the HOI match loss

[screenshot of the default matcher costs in hoi_matcher.py]
Hello, in your paper, beta1 = 2 and beta2 = 1 lead to the best results, which suggests the classification loss is more important than the bbox loss. But in the code shown in this screenshot, the defaults are self.cost_class = 1, self.cost_bbox = 5, and self.cost_giou = 2. Why did you set these parameters this way? Thanks!

nori2

What third-party library is nori2? What is the pip command to install it?

viz_hoi_result draws the object box rectangle at an incorrect position

Hi there,

I trained a model on a small dataset (person raising hand) without any problem. But when I run test_on_images to check the predictions, it draws the object box at an incorrect position (see the attached image). Could you help me figure out what I've done incorrectly?

[attached image: img_1-ch01_20210331111737_0013.jpg_000000]

Best regards,
MT.

Some questions about single-GPU training

Hello author, I trained on the V-COCO dataset with a ResNet-50 backbone using a single 3090, but I did not get the reported 51.9 test result, only around 46. What could be causing this? I trained with the model's default parameters, changing only num_worker to 8.

error when training

I am trying to train on V-COCO.
But when I run

python main.py --epochs=250 --lr_drop=110 --dataset_file=vcoco --batch_size=16 --backbone=resnet50

I get the following error:

  File "E:\project\HoiTransformer-master\models\hoi_matcher.py", line 80, in forward
    human_cost_class = -human_out_prob[:, human_tgt_ids]
IndexError: tensors used as indices must be long, byte or bool tensors

How can I solve this error?
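
For reference, a minimal reproduction and a common fix (an assumption about the cause, not a confirmed patch): PyTorch only accepts integer or boolean tensors as indices, so target ids that arrive as floats raise exactly this IndexError.

import torch

# Indexing columns with a float tensor reproduces the error; casting the
# ids to int64 resolves it.
human_out_prob = torch.randn(200, 3).softmax(-1)
human_tgt_ids = torch.tensor([1.0, 0.0, 2.0, 1.0])           # float ids -> IndexError
human_cost_class = -human_out_prob[:, human_tgt_ids.long()]  # cast fixes it
print(human_cost_class.shape)  # torch.Size([200, 4])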

corre_hico.npy not found?

Hello!

I am running the evaluation code and it throws an error while trying to load corre_hico.npy. Is this file provided in the repo?

Thanks.

The script to generate ODGT annotation files

The ODGT annotations are indeed much easier to understand for HOI detection. I was wondering if the script to convert V-COCO's raw annotations to the ODGT format could be shared. Thank you.
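
For context, ODGT is a JSON-lines format: one JSON record per line, one record per image. A minimal reader sketch (the field names inside each record are whatever the repo's annotation files define, and the path below is hypothetical):

import json

# Read an .odgt file: each non-empty line is a standalone JSON object.
def load_odgt(path):
    with open(path) as f:
        return [json.loads(line) for line in f if line.strip()]

records = load_odgt('data/vcoco/vcoco_trainval_retag_hoitr.odgt')
print(len(records))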

dimension mismatch in hoi_matcher.py

At line 80 in hoi_matcher.py, human_cost_class = -human_out_prob[:, human_tgt_ids], where human_out_prob has shape [200, 3] but human_tgt_ids has shape [4].

Person activity without object detection

@bbepoch hi, thanks for sharing the code base, great work. But I had one query: when I tested the model on scenes such as a person running on a beach with no other object present, there are no detections/activities in the output. Is there any way to get results like people walking, fighting, or waving without depending on an object being present in the scene?

Thanks in advance

Setting num_worker to 1, 2, or another value gives RuntimeError: unable to mmap 32 bytes from file </torch_239305_1669300387>: Cannot allocate memory (12)

Traceback (most recent call last):
  File "slurm_main.py", line 243, in <module>
    main(args)
  File "slurm_main.py", line 200, in main
    args.clip_max_norm)
  File "/mnt/lustre/penghuan/HoiTransformer/engine.py", line 32, in train_one_epoch
    for samples, targets in metric_logger.log_every(data_loader, print_freq, header):
  File "/mnt/lustre/penghuan/HoiTransformer/util/misc.py", line 223, in log_every
    for obj in iterable:
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 355, in __iter__
    return self._get_iterator()
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 301, in _get_iterator
    return _MultiProcessingDataLoaderIter(self)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/site-packages/torch/utils/data/dataloader.py", line 914, in __init__
    w.start()
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
  File "/mnt/cache/share/spring/conda_envs/miniconda3/envs/r1024/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 321, in reduce_storage
    fd, size = storage.share_fd()
RuntimeError: unable to mmap 32 bytes from file </torch_239305_1669300387>: Cannot allocate memory (12)
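
For reference, a common workaround for this class of error (assuming the cause is DataLoader workers exhausting shared memory) is to switch PyTorch's tensor-sharing strategy, or to enlarge /dev/shm; a minimal sketch:

import torch.multiprocessing as mp

# Switch from file-descriptor sharing backed by /dev/shm to the
# file_system strategy before any DataLoader is created.
mp.set_sharing_strategy('file_system')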

Demo Code

Hi,

Will you release demo code for this work? I really want to try out your work, but I'm stuck with limited resources. It'd be great if you could release your well-trained models. Thank you so much.
