group-free-3d's Issues

5-times evaluation

Hi, thank you for releasing your codebase!

I wanted to ask: the SUN RGB-D results seem to be unstable. Did you train a single model and evaluate it with 5 seeds, or did you train 5 models with different seeds?

Also, did you notice much variation between training runs?

Question about results reproduction

Hi, thanks for the nice work.

I trained your network on the SUN RGB-D dataset with the following training script:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port 2222 --nproc_per_node 4 train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 --num_decoder_layers 6 --size_cls_agnostic --size_delta 0.0625 --heading_delta 0.04 --center_delta 0.1111111111111 --learning_rate 0.004 --decoder_learning_rate 0.0002 --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 --dataset sunrgbd --data_root .

I obtained the following results:

[08/24 23:51:00 group-free]: IoU[0.25]: 0head_: 0.6363  1head_: 0.6320  2head_: 0.6202  3head_: 0.6132  4head_: 0.6163  last_: 0.6164   proposal_: 0.6108
[08/24 23:51:00 group-free]: IoU[0.5]:  0head_: 0.4328  1head_: 0.4388  2head_: 0.4095  3head_: 0.4329  4head_: 0.4441  last_: 0.4282   proposal_: 0.3599

Question 1: There are several results (0head_, 1head_, 2head_, 3head_, 4head_, proposal_); which one corresponds to the number reported in the paper?
Question 2: These results fall short of the results in your paper (IoU[0.25] 63.0, IoU[0.5] 45.2), and I'm not sure what is going wrong.

Thank you, and I look forward to your reply.

Code Question

First of all, thank you for sharing your work. I've been working with your code recently, modifying a few sections, and I noticed a few things I don't quite understand.

In your paper you state that you use random scaling between 0.9 and 1.1 as augmentation on the ScanNet dataset; however, the augmentation code provided only applies random flipping and rotation. Did I miss the section where the scaling is applied?
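
For reference, here is a minimal sketch of the kind of scaling augmentation the paper describes; the 0.9 to 1.1 range comes from the paper, while the function name and the assumed box layout (center in columns 0:3, size in columns 3:6) are mine, not taken from this repo:

    import numpy as np

    def random_scale(point_cloud, bboxes, low=0.9, high=1.1):
        # Draw one scale factor per scene and apply it to the point coordinates
        # as well as to the box centers and sizes (assumed column layout).
        scale = np.random.uniform(low, high)
        point_cloud[:, 0:3] *= scale
        bboxes[:, 0:3] *= scale   # box centers (assumed)
        bboxes[:, 3:6] *= scale   # box sizes (assumed)
        return point_cloud, bboxes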

Secondly, I wasn't able to reproduce your results on the ScanNet dataset, coming about 1% short on mAP at both 0.25 and 0.5 IoU. I'm now wondering whether this may be because I'm training on a single GPU. As far as I understand, you do not sync batch norm across GPUs, so the smaller per-GPU batch may actually benefit training?
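
As a side note on the batch norm point, synchronizing batch norm statistics across GPUs is a standard PyTorch utility; a minimal sketch is shown below (this is generic PyTorch, not code from this repo, and local_rank is assumed to be provided by the launcher):

    import torch

    # Convert every BatchNorm layer to SyncBatchNorm so statistics are computed
    # over the global batch, then wrap the model for distributed training.
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])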

Finally, when I looked at the transformer code, I noticed that each attention layer uses 288-dimensional features. Is there a specific reason for choosing this value? It seems quite low to me, and I would have expected a power of 2 to be more in line with most architectures.

I would really appreciate it if I could gain your insights on this.

Eval: Some classes output NaN because of Npos=0

Hi!
I am trying to evaluate on all 485 ScanNet classes. Since some classes are very rare, running eval_det_cls on them produces NaN because npos=0. Can you recommend a fix for this?
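
Not an official fix, but one simple workaround is to skip classes with no ground-truth instances before calling the per-class evaluation; the sketch below assumes a VoteNet-style interface where ground truth is grouped per class and per scan, which may differ from this repo's exact signatures:

    # gt_per_class: {classname: {scan_id: [gt boxes]}}; pred_per_class analogous (assumed layout)
    ap_per_class = {}
    for classname, gt in gt_per_class.items():
        npos = sum(len(boxes) for boxes in gt.values())
        if npos == 0:
            continue  # no ground truth for this class: skip instead of producing NaN
        rec, prec, ap = eval_det_cls(pred_per_class.get(classname, {}), gt, ovthresh=0.25)
        ap_per_class[classname] = ap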

Question about the size_cls_agnostic

I found that 'size_gts' is used to supervise the predicted object size when size_cls_agnostic is set to True. Could you explain why 'size_gts' is used as the supervision signal instead of 'box3d_size'?

Some files missing on SUNRGBD

I have followed the data preparation process under sunrgbd, and the resulting dataset runs with VoteNet.
However, it fails to run with Group-Free-3D.
The following files are missing:
all_obbs_modified_nearest_has_empty.pkl
all_pc_modified_nearest_has_empty.pkl
all_point_labels_nearest_has_empty.pkl
Can you provide these files?
Thanks 😀

Loss becomes NaN at around epoch 300

Thanks for your excellent work!

I encountered a problem during the training.
Since I only have one GPU, I modified train_dist.py into a single-GPU version (I just removed the code related to distributed training).
[Screenshot from 2021-05-11: training log showing the loss becoming NaN]

I would like to know whether anything else needs to be modified, and whether you have any suggestions about this problem. Thanks very much!
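
Not an answer from the authors, but one thing worth checking when the loss turns NaN late in training is gradient clipping; a minimal sketch of the training-step change, assuming the single-GPU script does not already do this (the max_norm value is an arbitrary example, not a setting from this repo):

    # Inside the training loop:
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
    optimizer.step()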

Eval AP lower than the AP during training

Dear author, the result of running eval_avg.py is not as good as the evaluation performed during training; the score drops by about 4%. Is this due to overfitting?

About voting

Thanks for your great work. You mention in Appendix A1.2 that you integrated voting into the framework, but no corresponding experiment or code seems to be available in the paper or in this repo.

There is no "demo.py"

I wonder how the results from different stages are ensembled in this method. This part of the code is not provided, even though it should be very important according to the paper. Even in the evaluation and test stages, the reported loss is an average of the losses from the different stages, rather than the loss of a single final estimate, which does not seem reasonable to me.
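
For context, one plausible way to pool the per-stage predictions into a single candidate set is sketched below; the key names come from the end_points printout quoted in another issue on this page, and this is only a guess at what such an ensemble could look like, not the authors' code:

    import torch

    # Concatenate box predictions from the proposal stage and every decoder stage
    # along the proposal dimension, then score/NMS them together.
    prefixes = ['proposal_'] + ['%dhead_' % i for i in range(5)] + ['last_']
    centers = torch.cat([end_points[p + 'center'] for p in prefixes], dim=1)      # (B, S*K, 3)
    sizes = torch.cat([end_points[p + 'pred_size'] for p in prefixes], dim=1)     # (B, S*K, 3)
    obj_scores = torch.cat([end_points[p + 'objectness_scores'] for p in prefixes], dim=1)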

Question about iterative object box prediction

Hi, thanks for sharing your work. I found that for each decoder layer, you use cluster_xyz as the initial location instead of the updated base_xyz:

base_xyz, base_size = self.prediction_heads[i](query,
                                               base_xyz=cluster_xyz,
                                               end_points=end_points,
                                               prefix=prefix)
base_xyz = base_xyz.detach().clone()
base_size = base_size.detach().clone()

My question is: since each layer uses the box location from the previous layer to produce the spatial encoding, why does each layer predict the offset to the ground-truth box location relative to the initial cluster center rather than relative to the updated center from the previous layer? In other words, why not:

base_xyz, base_size = self.prediction_heads[i](query,
                                               base_xyz=base_xyz,
                                               end_points=end_points,
                                               prefix=prefix)

Also, under your setting, I think the "auxiliary loss" is necessary: if no auxiliary loss were applied, the prediction heads of the first N-1 decoder layers would get no supervision for the center_residual, so the updated box prediction and the spatial encoding for the next decoder layer would be meaningless. Am I correct?

Best,
Xuyang

Question about label assign

Hi, thanks for sharing this excellent work. In the paper, you mention that you manually assign the object candidates to the ground truth. Could you please explain this in a bit more detail and point out where the corresponding code is?

Inference Queries

@stupidZZ Thanks for open-sourcing the code base. I have a few queries:

  1. How do I run the model on a custom point cloud dataset? Should it be preprocessed into one of the supported formats?
  2. How do I visualize the results shown in the paper? Can you please share the visualization code?
  3. I am able to run the model and get the output below. How do I validate the results via metrics and visualization? (See the sketch after this list.)
    dict_keys(['sa1_inds', 'sa1_xyz', 'sa1_features', 'sa2_inds', 'sa2_xyz', 'sa2_features', 'sa3_xyz', 'sa3_features', 'sa4_xyz', 'sa4_features', 'fp2_features', 'fp2_xyz', 'fp2_inds', 'seed_inds', 'seed_xyz', 'seed_features', 'seeds_obj_cls_logits', 'query_points_xyz', 'query_points_feature', 'query_points_sample_inds', 'proposal_base_xyz', 'proposal_objectness_scores', 'proposal_center', 'proposal_heading_scores', 'proposal_heading_residuals_normalized', 'proposal_heading_residuals', 'proposal_pred_size', 'proposal_sem_cls_scores', '0head_base_xyz', '0head_objectness_scores', '0head_center', '0head_heading_scores', '0head_heading_residuals_normalized', '0head_heading_residuals', '0head_pred_size', '0head_sem_cls_scores', '1head_base_xyz', '1head_objectness_scores', '1head_center', '1head_heading_scores', '1head_heading_residuals_normalized', '1head_heading_residuals', '1head_pred_size', '1head_sem_cls_scores', '2head_base_xyz', '2head_objectness_scores', '2head_center', '2head_heading_scores', '2head_heading_residuals_normalized', '2head_heading_residuals', '2head_pred_size', '2head_sem_cls_scores', '3head_base_xyz', '3head_objectness_scores', '3head_center', '3head_heading_scores', '3head_heading_residuals_normalized', '3head_heading_residuals', '3head_pred_size', '3head_sem_cls_scores', '4head_base_xyz', '4head_objectness_scores', '4head_center', '4head_heading_scores', '4head_heading_residuals_normalized', '4head_heading_residuals', '4head_pred_size', '4head_sem_cls_scores', 'last_base_xyz', 'last_objectness_scores', 'last_center', 'last_heading_scores', 'last_heading_residuals_normalized', 'last_heading_residuals', 'last_pred_size', 'last_sem_cls_scores', 'point_clouds', 'center_label', 'heading_class_label', 'heading_residual_label', 'size_class_label', 'size_residual_label', 'size_gts', 'sem_cls_label', 'box_label_mask', 'point_obj_mask', 'point_instance_label', 'scan_idx', 'max_gt_bboxes', 'points_hard_topk4_pos_ratio', 'points_hard_topk4_neg_ratio', 'points_hard_topk4_upper_recall_ratio', 'query_points_generation_loss', 'proposal_objectness_label', 'proposal_objectness_mask', 'proposal_object_assignment', 'proposal_pos_ratio', 'proposal_neg_ratio', 'proposal_objectness_loss', 'last_objectness_label', 'last_objectness_mask', 'last_object_assignment', 'last_pos_ratio', 'last_neg_ratio', 'last_objectness_loss', '0head_objectness_label', '0head_objectness_mask', '0head_object_assignment', '0head_pos_ratio', '0head_neg_ratio', '0head_objectness_loss', '1head_objectness_label', '1head_objectness_mask', '1head_object_assignment', '1head_pos_ratio', '1head_neg_ratio', '1head_objectness_loss', '2head_objectness_label', '2head_objectness_mask', '2head_object_assignment', '2head_pos_ratio', '2head_neg_ratio', '2head_objectness_loss', '3head_objectness_label', '3head_objectness_mask', '3head_object_assignment', '3head_pos_ratio', '3head_neg_ratio', '3head_objectness_loss', '4head_objectness_label', '4head_objectness_mask', '4head_object_assignment', '4head_pos_ratio', '4head_neg_ratio', '4head_objectness_loss', 'sum_heads_objectness_loss', 'proposal_center_loss', 'proposal_heading_cls_loss', 'proposal_heading_reg_loss', 'proposal_size_reg_loss', 'proposal_box_loss', 'proposal_sem_cls_loss', 'last_center_loss', 'last_heading_cls_loss', 'last_heading_reg_loss', 'last_size_reg_loss', 'last_box_loss', 'last_sem_cls_loss', '0head_center_loss', '0head_heading_cls_loss', '0head_heading_reg_loss', '0head_size_reg_loss', '0head_box_loss', '0head_sem_cls_loss', '1head_center_loss', 
'1head_heading_cls_loss', '1head_heading_reg_loss', '1head_size_reg_loss', '1head_box_loss', '1head_sem_cls_loss', '2head_center_loss', '2head_heading_cls_loss', '2head_heading_reg_loss', '2head_size_reg_loss', '2head_box_loss', '2head_sem_cls_loss', '3head_center_loss', '3head_heading_cls_loss', '3head_heading_reg_loss', '3head_size_reg_loss', '3head_box_loss', '3head_sem_cls_loss', '4head_center_loss', '4head_heading_cls_loss', '4head_heading_reg_loss', '4head_size_reg_loss', '4head_box_loss', '4head_sem_cls_loss', 'sum_heads_box_loss', 'sum_heads_sem_cls_loss', 'loss', 'batch_gt_map_cls'])
    Thanks in advance
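
On query 3, a minimal sketch of pulling the final-stage boxes out of that dictionary for a quick sanity check; it assumes the 'last_' entries hold the final decoder outputs and that the objectness scores are two-way logits, which is an interpretation of the key names above rather than documented behavior:

    import torch

    centers = end_points['last_center']                # (B, K, 3) box centers
    sizes = end_points['last_pred_size']               # (B, K, 3) box sizes
    sem_scores = end_points['last_sem_cls_scores']     # (B, K, num_classes)
    obj_logits = end_points['last_objectness_scores']  # (B, K, 2), assumed

    obj_prob = torch.softmax(obj_logits, dim=-1)[..., 1]  # probability of being an object
    keep = obj_prob > 0.5                                  # arbitrary threshold for inspection
    classes = sem_scores.argmax(dim=-1)
    print(centers[keep], sizes[keep], classes[keep])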

Training without Instance Segmentation

Is it possible to train the Group-Free network without access to per-point instance labels, i.e., using only 3D bounding boxes? As far as I can tell, the loss calculation depends on instance labels.
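
Not an official answer, but if only boxes are available, one rough substitute for per-point instance labels is to assign each point to a ground-truth box that contains it; the sketch below uses an axis-aligned approximation (it ignores heading angles) and is an assumption, not something supported by the repo:

    import numpy as np

    def pseudo_instance_labels(points, box_centers, box_sizes):
        # points: (N, 3); box_centers, box_sizes: (K, 3).
        # Each point gets the index of the first box containing it, or -1 for background.
        labels = -np.ones(points.shape[0], dtype=np.int64)
        for k in range(box_centers.shape[0]):
            half = box_sizes[k] / 2.0
            inside = np.all(np.abs(points - box_centers[k]) <= half, axis=1)
            labels[np.logical_and(inside, labels < 0)] = k
        return labels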

why 'end_points['fp2_inds'] = end_points['sa1_inds'][:, 0:num_seed]' ?

In models/backbone_module.py, you select the first 1024 of the 2048 sa1_inds as fp2_inds. I understand that the intention is to obtain the indices of these 1024 seed points within the entire point cloud so that they can participate in the loss calculation in compute_points_obj_cls_loss_hard_topk.

However, directly selecting the first 1024 of the 2048 sa1_inds does not correspond one-to-one with fp2_xyz. This mismatch would cause the euclidean_dist1 and object_assignment_one_hot variables in compute_points_obj_cls_loss_hard_topk to be misaligned. Doesn't this introduce an error in the supervision signal for KPS?

models/backbone_module.py:

        # --------- 2 FEATURE UPSAMPLING LAYERS --------
        features = self.fp1(end_points['sa3_xyz'], end_points['sa4_xyz'], end_points['sa3_features'],
                            end_points['sa4_features'])
        features = self.fp2(end_points['sa2_xyz'], end_points['sa3_xyz'], end_points['sa2_features'], features)
        end_points['fp2_features'] = features
        end_points['fp2_xyz'] = end_points['sa2_xyz']
        num_seed = end_points['fp2_xyz'].shape[1]
        end_points['fp2_inds'] = end_points['sa1_inds'][:, 0:num_seed]  # indices among the entire input point clouds

        return end_points

models/loss_helper.py:

def compute_points_obj_cls_loss_hard_topk(end_points, topk):
    box_label_mask = end_points['box_label_mask']
    seed_inds = end_points['seed_inds'].long()  # B, K
    seed_xyz = end_points['seed_xyz']  # B, K, 3
    seeds_obj_cls_logits = end_points['seeds_obj_cls_logits']  # B, 1, K
    gt_center = end_points['center_label'][:, :, 0:3]  # B, K2, 3
    gt_size = end_points['size_gts'][:, :, 0:3]  # B, K2, 3
    B = gt_center.shape[0]
    K = seed_xyz.shape[1]
    K2 = gt_center.shape[1]

    point_instance_label = end_points['point_instance_label']  # B, num_points
    object_assignment = torch.gather(point_instance_label, 1, seed_inds)  # B, num_seed
    object_assignment[object_assignment < 0] = K2 - 1  # set background points to the last gt bbox
    object_assignment_one_hot = torch.zeros((B, K, K2)).to(seed_xyz.device)
    object_assignment_one_hot.scatter_(2, object_assignment.unsqueeze(-1), 1)  # (B, K, K2)
    delta_xyz = seed_xyz.unsqueeze(2) - gt_center.unsqueeze(1)  # (B, K, K2, 3)
    delta_xyz = delta_xyz / (gt_size.unsqueeze(1) + 1e-6)  # (B, K, K2, 3)
    new_dist = torch.sum(delta_xyz ** 2, dim=-1)
    euclidean_dist1 = torch.sqrt(new_dist + 1e-6)  # BxKxK2
    euclidean_dist1 = euclidean_dist1 * object_assignment_one_hot + 100 * (1 - object_assignment_one_hot)  # BxKxK2
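
For reference, the one-to-one index chaining that the question seems to expect would look like the sketch below; it assumes sa2_inds (which appears in the end_points keys printed in another issue) indexes into the 2048 points kept by SA1, and whether such a change is actually needed is exactly what this issue asks the authors to clarify:

    import torch

    # Hypothetical alternative to taking the first num_seed entries of sa1_inds:
    # map each seed point back to the raw point cloud by chaining the sampling indices.
    sa1_inds = end_points['sa1_inds']         # (B, 2048) indices into the input cloud
    sa2_inds = end_points['sa2_inds'].long()  # (B, 1024) indices into SA1's points (assumed)
    end_points['fp2_inds'] = torch.gather(sa1_inds, 1, sa2_inds)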

Inference on custom scans

Hi,
Can you please guide me on how to use your trained model on a custom scanned 3D point cloud for object detection?

Thanks
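
Not an official answer, but as a starting point, here is a minimal sketch of shaping a raw scan into a fixed-size (num_point, 3) float32 array like the ones the dataloaders produce; 20000 points matches the SUN RGB-D training command quoted in another issue, and any additional steps (axis alignment, color or height features) depend on the configuration used:

    import numpy as np

    def prepare_custom_scan(xyz, num_point=20000):
        # Randomly subsample (or resample with replacement if too small) to num_point x 3.
        xyz = np.asarray(xyz, dtype=np.float32)[:, 0:3]
        replace = xyz.shape[0] < num_point
        choice = np.random.choice(xyz.shape[0], num_point, replace=replace)
        return xyz[choice]

    # The network then expects a batch dimension, e.g. an array of shape (1, num_point, 3).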
