dawn-lx / vidsgg-big
PyTorch implementation of our paper "Classification-Then-Grounding: Reformulating Video Scene Graphs as Temporal Bipartite Graphs", accepted by CVPR 2022.
Hi, how were the .npy files in the prepared_data folder generated? Specifically pred_bias_matrix_vidvrd.npy and vidvrd_EntiNameEmb.npy. I did not find the corresponding code for these. Thank you.
When I loaded all the cached VidOR data and the dataloader started to fork, the program crashed with OSError: [Errno 12] Cannot allocate memory. I tried lowering the number of worker threads, but it did not help.
I am wondering whether my memory is too small (189 GB; about 150+ GB was occupied before the crash). Could you please share your device information?
Thank you!
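One workaround I have seen for fork-time OOM (my assumption, not advice from the authors) is to keep large read-only caches out of the Python heap, e.g. by memory-mapping the file, so forked workers share OS pages instead of duplicating refcounted objects. A minimal sketch with a toy file:

```python
import mmap
import os
import tempfile

# Hypothetical illustration (not the repo's code): memory-map a large cache
# file so forked DataLoader workers read shared pages instead of copying RAM.
path = os.path.join(tempfile.gettempdir(), "toy_cache.bin")
with open(path, "wb") as f:
    f.write(b"\x00" * 4096)  # stand-in for a multi-GB cache file

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    chunk = mm[:16]  # slicing pulls bytes from the OS page cache on demand
    mm.close()

os.remove(path)
print(len(chunk))  # 16
```

In practice this would mean storing the per-part caches in a mmap-friendly format (e.g. numpy memmap) rather than one big in-memory pickle.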
Hi, could you describe the structure of the pre-prepared cache data for VidOR, for example MEGAv7_VidORtrain_freq1_part01_th_15-180-200-0.40.pkl?
Besides, when training the grounding stage, I have a question about the DEBUG code.
In models/grd_model_v5.py, line 268, does inter_dura mean the time slot during which both the subject and the object appear in the trajectory? And what is the meaning of index_map?
Thank you very much!
Hello,
The link for downloading the tracklet data VidVRD_test_every1frames is invalid. Could you please provide a new one?
Thank you very much.
According to tools_draft/extract_classme.py, I ran tools_draft/construct_CatName2vec.py first, but no file named vidor_CatName2vec_dict.pkl was generated. Could you help me?
Thank you!
Thank you for sharing the source code!
I would like to know how we can run the experiments from Figure 6 in the paper.
Also, for the VidVRD dataset, can we get qualitative results as visualizations?
I found this snippet in grd_model_v5.py, which the DEBUG model uses to predict time boundaries. In the regression head, why is the left (right) offset mapped to [0, 1] by a sigmoid? And how do you transform time boundaries into video frame feature sequences?
```python
# Prediction heads in grd_model_v5.py: four depth-wise separable conv blocks,
# followed by a final projection specific to each head.
temp = nn.Sequential(
    DepthWiseSeparableConv1d(self.dim_hidden, self.dim_hidden, 3),
    nn.ReLU()
)
temp2 = [copy.deepcopy(temp) for _ in range(4)] \
    + [DepthWiseSeparableConv1d(self.dim_hidden, self.num_bins, 3)]
temp3 = [copy.deepcopy(temp) for _ in range(4)] \
    + [DepthWiseSeparableConv1d(self.dim_hidden, 2 * self.num_bins, 3), nn.Sigmoid()]
self.cls_head = nn.Sequential(*temp2)          # classification logits per bin
self.conf_head = copy.deepcopy(self.cls_head)  # confidence scores per bin
self.regr_head = nn.Sequential(*temp3)         # left/right offsets squashed to [0, 1]
```
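On the sigmoid question: a common convention in anchor-free temporal localization (an assumption on my part, not confirmed from the repo) is to predict the left/right offsets as fractions of the clip length, so the [0, 1] outputs can be rescaled back to frame indices:

```python
def decode_boundaries(center_idx, left_off, right_off, clip_len):
    """Hypothetical decoding: sigmoid offsets in [0, 1] are fractions of clip_len.

    center_idx: anchor position (frame index); left_off/right_off: head outputs.
    """
    start = center_idx - left_off * clip_len
    end = center_idx + right_off * clip_len
    # Clamp to the valid frame range of the clip.
    return max(0.0, start), min(float(clip_len), end)

print(decode_boundaries(50, 0.2, 0.1, 100))  # (30.0, 60.0)
```

Under this convention the sigmoid simply guarantees the predicted segment stays within one clip length of the anchor; the actual normalization in the repo may differ.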
Hello, could you provide your extracted tracklet data VidVRD_test_every1frames? Thank you very much.
Hello, when I downloaded the prepared data, I found that the VidVRD test data from ZJU requires permission, and the PKU data is no longer available. Could you please update the links? Thanks a lot.
Hi, when I evaluate the grounding stage on VidOR by running eval_vidor.py, I find that an inference result is loaded before evaluation, i.e., VidORval_infer_results_topk3_epoch60_debug.pkl. Could you describe the structure of the data in this file?
Thank you very much!
Hi, I find that the cat_ids of the trajectory proposals differ from the traj_cat_ids of the paired gt_graph.
For example:
proposal.cat_ids: tensor([ 4, 4, 24, 4, 31, 31]), gt_graph.traj_cat_ids: tensor([11, 11, 7, 35, 35])
There are no common trajectory categories between the video and gt_graph pairs, but I think there should be shared categories.
So I want to know how to obtain the concrete category of a trajectory proposal or gt_graph.
Thanks!
Tips from @Dawn-LX:
This problem originates from VidSGG-BIG/dataloaders/dataloader_vidor.py, lines 488 to 508 in eaf7578.
Here, we notice that the tracking results for each box at one specific frame consist of either a 6-dim vector or a (12+dim_boxfeature)-dim vector:
- If the 6-dim vector appears, the corresponding box is viewed as background.
- The first 12 dims of box_info, consisting of frame_id, tracklet_id, 4-dim bbox coordinates, confidence, category_id, and another 4-dim bbox coordinates, are used to determine the final location of the bbox.
The first 4-dim bbox coordinates (box_info[2:6]) are generated by the tracker, and the second ones (box_info[8:12]) are generated by our video object detector. The reason boxes shift is that we compute the average of these two sets of coordinates. Because the detected object location may be inconsistent with the current tracklet, while the tracker-generated one is more precise, this averaging may merge two boxes into a background one.
Specifically, the box generated by the tracker is much more precise, since it considers boxes in previous frames, the currently detected box, and visual similarity. But the box from the video object detector may be wrongly linked to the current tracklet (which does not mean it is a background box itself). So the averaging manner is not strictly correct in these cases, and that is why we only use the tracker-generated box (box_info[2:6]) in VidSGG-BIG/dataloaders/dataloader_vidor_v3.py, lines 414 to 421 in eaf7578.
However, tracklet mAP does not improve when switching from the averaging manner to the tracker-only manner. The reasons may be
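To illustrate the failure mode described above, here is a toy sketch (my own illustration, not the repo's code) of how averaging a tracker box with a mis-linked detector box lands the final bbox on neither object:

```python
# Toy boxes in (x1, y1, x2, y2) format.
tracker_box = [100.0, 100.0, 200.0, 200.0]   # box_info[2:6], from the tracker
detector_box = [400.0, 400.0, 500.0, 500.0]  # box_info[8:12], wrongly linked detection

# The averaging manner: coordinate-wise mean of the two boxes.
averaged = [(t + d) / 2 for t, d in zip(tracker_box, detector_box)]
print(averaged)  # [250.0, 250.0, 350.0, 350.0] -- overlaps neither object

# The v3 dataloader's fix: keep only the tracker-generated box.
final_box = tracker_box
```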
Hi!
I found some negative values in the bounding boxes of the proposals. Do they have some special meaning?
Thank you!
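If the negative values are just regression overshoot past the frame border (an assumption on my side, not confirmed by the authors), a common fix is to clamp the coordinates to the image bounds before using them:

```python
def clamp_box(box, width, height):
    """Clamp an (x1, y1, x2, y2) box to the image bounds; negatives become 0."""
    x1, y1, x2, y2 = box
    return (max(0.0, x1), max(0.0, y1),
            min(float(width), x2), min(float(height), y2))

print(clamp_box((-3.5, 10.0, 650.0, 400.0), 640, 480))  # (0.0, 10.0, 640.0, 400.0)
```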
Hello, where can I get the files prepared_data/vidvrd_EntiNameEmb_pku.npy and prepared_data/pred_bias_matrix_vidvrd_pku.npy? I found no way to generate them myself. Thank you very much!
Hello! I tried to download the tracklets with features from the author's website http://www.muyadong.com/publication.html under the paper "Beyond Short-Term Snippet: Video Relation Detection with Spatio-Temporal Global Context", but the link has expired.
Is there any way to get the data? I have read that you trained your own RCNN to get the tracklets with features for one file that was missing. Should I do the same? Maybe you have the downloaded feature dataset saved somewhere?
Hi, I have a question.
Could you please explain the choice of applying only the classification stage on the VidVRD dataset?
Thank you!
Hello, I would like to do some ablation experiments on trajectories. Could you provide the parameters you used with the deepSORT method, as well as a link to the model used for extracting the objects' deepSORT features?
refer to #2 (comment)
Hello, when I try to inspect the structure of "MEGAv7_VidORtrain_freq1_part01_th_15-180-200-0.40.pkl" with pickle.load, I get the error "_pickle.UnpicklingError: pickle data was truncated". I want to know how to use these prepared data and what their structure is. Thanks a lot.
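For what it's worth, "pickle data was truncated" usually indicates an incomplete file (e.g. a partial download) rather than a wrong loading call. A sketch (my assumption, not the repo's code) that reproduces the symptom with a deliberately cut-off pickle:

```python
import pickle

# Serialize a toy object, then simulate a half-downloaded file by truncating it.
blob = pickle.dumps({"tracklets": list(range(100))})
try:
    pickle.loads(blob[: len(blob) // 2])  # incomplete bytes
    loaded = True
except Exception as exc:                  # typically UnpicklingError or EOFError
    loaded = False
    print(type(exc).__name__)
```

So the first thing to check is whether the .pkl file's size on disk matches the size at the download source; if not, re-download and open it in binary mode ("rb") as usual.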