group-free-3d's Issues

5-times evaluation

Hi, thank you for releasing your codebase!

I wanted to ask: the SUN RGB-D results seem to be unstable. Did you train a single model and evaluate it with 5 seeds, or did you train 5 models with different seeds?

Also, did you notice much variation between training runs?

Question about results reproduction

Hi, thanks for the nice work.

I trained your network on the SUN RGB-D dataset with the following training script:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port 2222 --nproc_per_node 4 train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 --num_decoder_layers 6 --size_cls_agnostic --size_delta 0.0625 --heading_delta 0.04 --center_delta 0.1111111111111 --learning_rate 0.004 --decoder_learning_rate 0.0002 --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 --dataset sunrgbd --data_root .

I obtained the following results:

[08/24 23:51:00 group-free]: IoU[0.25]: 0head_: 0.6363  1head_: 0.6320  2head_: 0.6202  3head_: 0.6132  4head_: 0.6163  last_: 0.6164   proposal_: 0.6108
[08/24 23:51:00 group-free]: IoU[0.5]:  0head_: 0.4328  1head_: 0.4388  2head_: 0.4095  3head_: 0.4329  4head_: 0.4441  last_: 0.4282   proposal_: 0.3599

Question 1: There are several results (0head_, 1head_, 2head_, 3head_, 4head_, proposal_); which one corresponds to the number reported in the paper?
Question 2: These results fall short of the results in your paper (IoU[0.25] 63.0, IoU[0.5] 45.2), and I'm not sure what is going wrong.

Thank you, and I look forward to your reply.

Code Question

First of all, thank you for sharing your work. I've been working with your code recently, modifying a few sections, and I noticed a few things I don't quite understand.

In your paper you state that you use random scaling between 0.9 and 1.1 as augmentation on the ScanNet dataset; however, the augmentation code provided only applies random flipping and rotation. Did I miss the section where the scaling is applied?
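
For reference, here is a minimal sketch of the kind of scaling augmentation the paper describes; the 0.9 to 1.1 range comes from the paper, while the function name and the assumed box layout (center in columns 0:3, size in columns 3:6) are mine, not taken from this repo:

    import numpy as np

    def random_scale(point_cloud, bboxes, low=0.9, high=1.1):
        # Draw one scale factor per scene and apply it to the point coordinates
        # as well as to the box centers and sizes (assumed column layout).
        scale = np.random.uniform(low, high)
        point_cloud[:, 0:3] *= scale
        bboxes[:, 0:3] *= scale   # box centers (assumed)
        bboxes[:, 3:6] *= scale   # box sizes (assumed)
        return point_cloud, bboxes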

Secondly, I wasn't able to reproduce your results on the ScanNet dataset, coming about 1% short on mAP at both 0.25 and 0.5 IoU. I'm now wondering whether this may be because I'm training on a single GPU. As far as I understand, you do not sync batch norm across GPUs, so the smaller per-GPU batch may actually benefit training?
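
As a side note on the batch norm point, synchronizing batch norm statistics across GPUs is a standard PyTorch utility; a minimal sketch is shown below (this is generic PyTorch, not code from this repo, and local_rank is assumed to be provided by the launcher):

    import torch

    # Convert every BatchNorm layer to SyncBatchNorm so statistics are computed
    # over the global batch, then wrap the model for distributed training.
    model = torch.nn.SyncBatchNorm.convert_sync_batchnorm(model)
    model = torch.nn.parallel.DistributedDataParallel(model, device_ids=[local_rank])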

Finally, when I looked at the transformer code, I noticed that each attention layer uses 288-dimensional features. Is there a specific reason for choosing this value? It seems quite low to me, and I would have expected a power of 2 to be more in line with most architectures.

I would really appreciate it if I could gain your insights on this.

Eval: Some classes output NaN because of Npos=0

Hi!
I am trying to evaluate on all 485 ScanNet classes. Since some classes are very rare, running eval_det_cls on them produces NaN because npos=0. Can you recommend a fix for this?
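
Not an official fix, but one simple workaround is to skip classes with no ground-truth instances before calling the per-class evaluation; the sketch below assumes a VoteNet-style interface where ground truth is grouped per class and per scan, which may differ from this repo's exact signatures:

    # gt_per_class: {classname: {scan_id: [gt boxes]}}; pred_per_class analogous (assumed layout)
    ap_per_class = {}
    for classname, gt in gt_per_class.items():
        npos = sum(len(boxes) for boxes in gt.values())
        if npos == 0:
            continue  # no ground truth for this class: skip instead of producing NaN
        rec, prec, ap = eval_det_cls(pred_per_class.get(classname, {}), gt, ovthresh=0.25)
        ap_per_class[classname] = ap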

Question about the size_cls_agnostic

I found that 'size_gts' is used to supervise the predicted object size when size_cls_agnostic is set to True. Could you explain why 'size_gts' is used as the supervision signal instead of 'box3d_size'?

Some files missing on SUNRGBD

I have followed the data preparation process under sunrgbd, and the resulting dataset runs with VoteNet.
However, it fails to run with Group-Free-3D.
The following files are missing:
all_obbs_modified_nearest_has_empty.pkl
all_pc_modified_nearest_has_empty.pkl
all_point_labels_nearest_has_empty.pkl
Can you provide these files?
Thanks 😀

Loss becomes NaN at around epoch 300

Thanks for your excellent work!

I encountered a problem during the training.
Since I only have one GPU, I modified train_dist.py into a single-GPU version (I just removed the code related to distributed training).
[Screenshot from 2021-05-11: training log showing the loss becoming NaN]

I would like to know whether anything else needs to be modified, and whether you have any suggestions about this problem. Thanks very much!
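
Not an answer from the authors, but one thing worth checking when the loss turns NaN late in training is gradient clipping; a minimal sketch of the training-step change, assuming the single-GPU script does not already do this (the max_norm value is an arbitrary example, not a setting from this repo):

    # Inside the training loop:
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=0.1)
    optimizer.step()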

Eval AP lower than the AP during training

Dear author, the result of running eval_avg.py is not as good as the evaluation performed during training; the score drops by about 4%. Is this due to overfitting?

About voting

Thanks for your great work. You mention in Appendix A1.2 that you integrated voting into the framework, but no corresponding experiment or code seems to be available in the paper or in this repo.

There is no "demo.py"

I wonder how the results from different stages are ensembled in this method. This part of the code is not provided, even though it should be very important according to the paper. Even in the evaluation and test stages, the reported loss is an average of the losses from the different stages, rather than the loss of a single final estimate, which does not seem reasonable to me.
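
For context, one plausible way to pool the per-stage predictions into a single candidate set is sketched below; the key names come from the end_points printout quoted in another issue on this page, and this is only a guess at what such an ensemble could look like, not the authors' code:

    import torch

    # Concatenate box predictions from the proposal stage and every decoder stage
    # along the proposal dimension, then score/NMS them together.
    prefixes = ['proposal_'] + ['%dhead_' % i for i in range(5)] + ['last_']
    centers = torch.cat([end_points[p + 'center'] for p in prefixes], dim=1)      # (B, S*K, 3)
    sizes = torch.cat([end_points[p + 'pred_size'] for p in prefixes], dim=1)     # (B, S*K, 3)
    obj_scores = torch.cat([end_points[p + 'objectness_scores'] for p in prefixes], dim=1)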

Question about iterative object box prediction

Hi, thanks for sharing your work. I found that for each decoder layer, you use cluster_xyz as the initial location instead of the updated base_xyz:

base_xyz, base_size = self.prediction_heads[i](query,
                                               base_xyz=cluster_xyz,
                                               end_points=end_points,
                                               prefix=prefix)
base_xyz = base_xyz.detach().clone()
base_size = base_size.detach().clone()

My question is: since each layer uses the box location from the previous layer to produce the spatial encoding, why does each layer predict the offset to the ground-truth box location relative to the initial cluster center rather than relative to the updated center from the previous layer? In other words, why not:

base_xyz, base_size = self.prediction_heads[i](query,
                                               base_xyz=base_xyz,
                                               end_points=end_points,
                                               prefix=prefix)

Also, under your setting, I think the "auxiliary loss" is necessary: if no auxiliary loss were applied, the prediction heads of the first N-1 decoder layers would get no supervision for the center_residual, so the updated box prediction and the spatial encoding for the next decoder layer would be meaningless. Am I correct?

Best,
Xuyang

Question about label assign

Hi, thanks for sharing this excellent work. In the paper, you mention that you manually assign the object candidates to the ground truth. Could you please explain this in a bit more detail and point out where the corresponding code is?

Inference Queries

@stupidZZ Thanks for open-sourcing the code base. I have a few queries:

  1. How do I run the model on a custom point cloud dataset? Should it be preprocessed into one of the supported formats?
  2. How do I visualize the results shown in the paper? Can you please share the visualization code?
  3. I am able to run the model and get the output below. How do I validate the results via metrics and visualization? (See the sketch after this list.)
    dict_keys(['sa1_inds', 'sa1_xyz', 'sa1_features', 'sa2_inds', 'sa2_xyz', 'sa2_features', 'sa3_xyz', 'sa3_features', 'sa4_xyz', 'sa4_features', 'fp2_features', 'fp2_xyz', 'fp2_inds', 'seed_inds', 'seed_xyz', 'seed_features', 'seeds_obj_cls_logits', 'query_points_xyz', 'query_points_feature', 'query_points_sample_inds', 'proposal_base_xyz', 'proposal_objectness_scores', 'proposal_center', 'proposal_heading_scores', 'proposal_heading_residuals_normalized', 'proposal_heading_residuals', 'proposal_pred_size', 'proposal_sem_cls_scores', '0head_base_xyz', '0head_objectness_scores', '0head_center', '0head_heading_scores', '0head_heading_residuals_normalized', '0head_heading_residuals', '0head_pred_size', '0head_sem_cls_scores', '1head_base_xyz', '1head_objectness_scores', '1head_center', '1head_heading_scores', '1head_heading_residuals_normalized', '1head_heading_residuals', '1head_pred_size', '1head_sem_cls_scores', '2head_base_xyz', '2head_objectness_scores', '2head_center', '2head_heading_scores', '2head_heading_residuals_normalized', '2head_heading_residuals', '2head_pred_size', '2head_sem_cls_scores', '3head_base_xyz', '3head_objectness_scores', '3head_center', '3head_heading_scores', '3head_heading_residuals_normalized', '3head_heading_residuals', '3head_pred_size', '3head_sem_cls_scores', '4head_base_xyz', '4head_objectness_scores', '4head_center', '4head_heading_scores', '4head_heading_residuals_normalized', '4head_heading_residuals', '4head_pred_size', '4head_sem_cls_scores', 'last_base_xyz', 'last_objectness_scores', 'last_center', 'last_heading_scores', 'last_heading_residuals_normalized', 'last_heading_residuals', 'last_pred_size', 'last_sem_cls_scores', 'point_clouds', 'center_label', 'heading_class_label', 'heading_residual_label', 'size_class_label', 'size_residual_label', 'size_gts', 'sem_cls_label', 'box_label_mask', 'point_obj_mask', 'point_instance_label', 'scan_idx', 'max_gt_bboxes', 'points_hard_topk4_pos_ratio', 'points_hard_topk4_neg_ratio', 'points_hard_topk4_upper_recall_ratio', 'query_points_generation_loss', 'proposal_objectness_label', 'proposal_objectness_mask', 'proposal_object_assignment', 'proposal_pos_ratio', 'proposal_neg_ratio', 'proposal_objectness_loss', 'last_objectness_label', 'last_objectness_mask', 'last_object_assignment', 'last_pos_ratio', 'last_neg_ratio', 'last_objectness_loss', '0head_objectness_label', '0head_objectness_mask', '0head_object_assignment', '0head_pos_ratio', '0head_neg_ratio', '0head_objectness_loss', '1head_objectness_label', '1head_objectness_mask', '1head_object_assignment', '1head_pos_ratio', '1head_neg_ratio', '1head_objectness_loss', '2head_objectness_label', '2head_objectness_mask', '2head_object_assignment', '2head_pos_ratio', '2head_neg_ratio', '2head_objectness_loss', '3head_objectness_label', '3head_objectness_mask', '3head_object_assignment', '3head_pos_ratio', '3head_neg_ratio', '3head_objectness_loss', '4head_objectness_label', '4head_objectness_mask', '4head_object_assignment', '4head_pos_ratio', '4head_neg_ratio', '4head_objectness_loss', 'sum_heads_objectness_loss', 'proposal_center_loss', 'proposal_heading_cls_loss', 'proposal_heading_reg_loss', 'proposal_size_reg_loss', 'proposal_box_loss', 'proposal_sem_cls_loss', 'last_center_loss', 'last_heading_cls_loss', 'last_heading_reg_loss', 'last_size_reg_loss', 'last_box_loss', 'last_sem_cls_loss', '0head_center_loss', '0head_heading_cls_loss', '0head_heading_reg_loss', '0head_size_reg_loss', '0head_box_loss', '0head_sem_cls_loss', '1head_center_loss', 
'1head_heading_cls_loss', '1head_heading_reg_loss', '1head_size_reg_loss', '1head_box_loss', '1head_sem_cls_loss', '2head_center_loss', '2head_heading_cls_loss', '2head_heading_reg_loss', '2head_size_reg_loss', '2head_box_loss', '2head_sem_cls_loss', '3head_center_loss', '3head_heading_cls_loss', '3head_heading_reg_loss', '3head_size_reg_loss', '3head_box_loss', '3head_sem_cls_loss', '4head_center_loss', '4head_heading_cls_loss', '4head_heading_reg_loss', '4head_size_reg_loss', '4head_box_loss', '4head_sem_cls_loss', 'sum_heads_box_loss', 'sum_heads_sem_cls_loss', 'loss', 'batch_gt_map_cls'])
    Thanks in advance
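
On query 3, a minimal sketch of pulling the final-stage boxes out of that dictionary for a quick sanity check; it assumes the 'last_' entries hold the final decoder outputs and that the objectness scores are two-way logits, which is an interpretation of the key names above rather than documented behavior:

    import torch

    centers = end_points['last_center']                # (B, K, 3) box centers
    sizes = end_points['last_pred_size']               # (B, K, 3) box sizes
    sem_scores = end_points['last_sem_cls_scores']     # (B, K, num_classes)
    obj_logits = end_points['last_objectness_scores']  # (B, K, 2), assumed

    obj_prob = torch.softmax(obj_logits, dim=-1)[..., 1]  # probability of being an object
    keep = obj_prob > 0.5                                  # arbitrary threshold for inspection
    classes = sem_scores.argmax(dim=-1)
    print(centers[keep], sizes[keep], classes[keep])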

Training without Instance Segmentation

Is it possible to train the Group-Free network without access to per-point instance labels, i.e., using only 3D bounding boxes? As far as I can tell, the loss calculation depends on instance labels.
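
Not an official answer, but if only boxes are available, one rough substitute for per-point instance labels is to assign each point to a ground-truth box that contains it; the sketch below uses an axis-aligned approximation (it ignores heading angles) and is an assumption, not something supported by the repo:

    import numpy as np

    def pseudo_instance_labels(points, box_centers, box_sizes):
        # points: (N, 3); box_centers, box_sizes: (K, 3).
        # Each point gets the index of the first box containing it, or -1 for background.
        labels = -np.ones(points.shape[0], dtype=np.int64)
        for k in range(box_centers.shape[0]):
            half = box_sizes[k] / 2.0
            inside = np.all(np.abs(points - box_centers[k]) <= half, axis=1)
            labels[np.logical_and(inside, labels < 0)] = k
        return labels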

why 'end_points['fp2_inds'] = end_points['sa1_inds'][:, 0:num_seed]' ?

In models/backbone_module.py, you select the first 1024 of the 2048 sa1_inds as fp2_inds. I understand that the intention is to obtain the indices of these 1024 seed points within the entire point cloud so that they can participate in the loss calculation in compute_points_obj_cls_loss_hard_topk.

However, directly selecting the first 1024 of the 2048 sa1_inds does not correspond one-to-one with fp2_xyz. This mismatch would cause the euclidean_dist1 and object_assignment_one_hot variables in compute_points_obj_cls_loss_hard_topk to be misaligned. Doesn't this introduce an error in the supervision signal for KPS?

models/backbone_module.py:

        # --------- 2 FEATURE UPSAMPLING LAYERS --------
        features = self.fp1(end_points['sa3_xyz'], end_points['sa4_xyz'], end_points['sa3_features'],
                            end_points['sa4_features'])
        features = self.fp2(end_points['sa2_xyz'], end_points['sa3_xyz'], end_points['sa2_features'], features)
        end_points['fp2_features'] = features
        end_points['fp2_xyz'] = end_points['sa2_xyz']
        num_seed = end_points['fp2_xyz'].shape[1]
        end_points['fp2_inds'] = end_points['sa1_inds'][:, 0:num_seed]  # indices among the entire input point clouds

        return end_points

models/loss_helper.py:

def compute_points_obj_cls_loss_hard_topk(end_points, topk):
    box_label_mask = end_points['box_label_mask']
    seed_inds = end_points['seed_inds'].long()  # B, K
    seed_xyz = end_points['seed_xyz']  # B, K, 3
    seeds_obj_cls_logits = end_points['seeds_obj_cls_logits']  # B, 1, K
    gt_center = end_points['center_label'][:, :, 0:3]  # B, K2, 3
    gt_size = end_points['size_gts'][:, :, 0:3]  # B, K2, 3
    B = gt_center.shape[0]
    K = seed_xyz.shape[1]
    K2 = gt_center.shape[1]

    point_instance_label = end_points['point_instance_label']  # B, num_points
    object_assignment = torch.gather(point_instance_label, 1, seed_inds)  # B, num_seed
    object_assignment[object_assignment < 0] = K2 - 1  # set background points to the last gt bbox
    object_assignment_one_hot = torch.zeros((B, K, K2)).to(seed_xyz.device)
    object_assignment_one_hot.scatter_(2, object_assignment.unsqueeze(-1), 1)  # (B, K, K2)
    delta_xyz = seed_xyz.unsqueeze(2) - gt_center.unsqueeze(1)  # (B, K, K2, 3)
    delta_xyz = delta_xyz / (gt_size.unsqueeze(1) + 1e-6)  # (B, K, K2, 3)
    new_dist = torch.sum(delta_xyz ** 2, dim=-1)
    euclidean_dist1 = torch.sqrt(new_dist + 1e-6)  # BxKxK2
    euclidean_dist1 = euclidean_dist1 * object_assignment_one_hot + 100 * (1 - object_assignment_one_hot)  # BxKxK2
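
For reference, the one-to-one index chaining that the question seems to expect would look like the sketch below; it assumes sa2_inds (which appears in the end_points keys printed in another issue) indexes into the 2048 points kept by SA1, and whether such a change is actually needed is exactly what this issue asks the authors to clarify:

    import torch

    # Hypothetical alternative to taking the first num_seed entries of sa1_inds:
    # map each seed point back to the raw point cloud by chaining the sampling indices.
    sa1_inds = end_points['sa1_inds']         # (B, 2048) indices into the input cloud
    sa2_inds = end_points['sa2_inds'].long()  # (B, 1024) indices into SA1's points (assumed)
    end_points['fp2_inds'] = torch.gather(sa1_inds, 1, sa2_inds)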

Inference on custom scans

Hi,
Can you please guide me on how to use your trained model on a custom scanned 3D point cloud for object detection?

Thanks
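
Not an official answer, but as a starting point, here is a minimal sketch of shaping a raw scan into a fixed-size (num_point, 3) float32 array like the ones the dataloaders produce; 20000 points matches the SUN RGB-D training command quoted in another issue, and any additional steps (axis alignment, color or height features) depend on the configuration used:

    import numpy as np

    def prepare_custom_scan(xyz, num_point=20000):
        # Randomly subsample (or resample with replacement if too small) to num_point x 3.
        xyz = np.asarray(xyz, dtype=np.float32)[:, 0:3]
        replace = xyz.shape[0] < num_point
        choice = np.random.choice(xyz.shape[0], num_point, replace=replace)
        return xyz[choice]

    # The network then expects a batch dimension, e.g. an array of shape (1, num_point, 3).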
