zeliu98 / group-free-3d
Group-Free 3D Object Detection via Transformers
License: MIT License
Hi, thank you for releasing your codebase!
I wanted to ask: the SUN RGB-D results seem to be unstable. Did you train a single model and evaluate it with 5 seeds, or did you train 5 models with different seeds?
Also, did you notice much variation between training runs?
Your work is very good. Could you please share the code for visualizing the predicted boxes during validation?
Dear author,
I want to know how to visualize the bounding boxes.
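For reference, a minimal sketch of one way to do this, assuming Open3D and boxes parameterized as (cx, cy, cz, dx, dy, dz, heading); this is not the authors' visualization code, and the sample data is a stand-in:

import numpy as np
import open3d as o3d

points = np.random.rand(2000, 3)                  # stand-in for a real scene
boxes = [(0.5, 0.5, 0.5, 0.4, 0.3, 0.2, 0.3)]     # stand-in predictions (cx, cy, cz, dx, dy, dz, heading)

geoms = [o3d.geometry.PointCloud(o3d.utility.Vector3dVector(points))]
for cx, cy, cz, dx, dy, dz, heading in boxes:
    # Heading is assumed to be a yaw angle about the z axis.
    R = o3d.geometry.get_rotation_matrix_from_axis_angle(np.array([0.0, 0.0, heading]))
    obb = o3d.geometry.OrientedBoundingBox([cx, cy, cz], R, [dx, dy, dz])
    obb.color = (1.0, 0.0, 0.0)                   # draw predicted boxes in red
    geoms.append(obb)
o3d.visualization.draw_geometries(geoms)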
Hi, thanks for the nice work.
I trained your network on the SUN RGB-D dataset with the following script:
CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --master_port 2222 --nproc_per_node 4 train_dist.py --max_epoch 600 --lr_decay_epochs 420 480 540 --num_point 20000 --num_decoder_layers 6 --size_cls_agnostic --size_delta 0.0625 --heading_delta 0.04 --center_delta 0.1111111111111 --learning_rate 0.004 --decoder_learning_rate 0.0002 --weight_decay 0.00000001 --query_points_generator_loss_coef 0.2 --obj_loss_coef 0.4 --dataset sunrgbd --data_root .
I obtained the following results:
[08/24 23:51:00 group-free]: IoU[0.25]: 0head_: 0.6363 1head_: 0.6320 2head_: 0.6202 3head_: 0.6132 4head_: 0.6163 last_: 0.6164 proposal_: 0.6108
[08/24 23:51:00 group-free]: IoU[0.5]: 0head_: 0.4328 1head_: 0.4388 2head_: 0.4095 3head_: 0.4329 4head_: 0.4441 last_: 0.4282 proposal_: 0.3599
Question 1: There are several results (0head_, 1head_, 2head_, 3head_, 4head_, last_, proposal_); which one should be reported in the paper?
Question 2: These results fall short of the results in your paper (IoU[0.25] 63.0, IoU[0.5] 45.2). I'm not sure what's going wrong.
Thank you; I look forward to your reply.
Dear author,
Thank you for your good work.
I want to know how to visualize the results in Figure 5. Can you provide the corresponding visualization code?
First of all, thank you for sharing your work. I've been working with your code recently, modifying a few sections, and I noticed a few things I don't quite understand.
In your paper you state that you used random scaling from 0.9 to 1.1 as augmentation on the ScanNet dataset; however, the augmentation code provided only applies random flipping and rotation. Did I miss the section where the scaling is applied? (A sketch of such an augmentation follows below.)
Secondly, I wasn't able to reproduce your results on the ScanNet dataset, coming 1% short on mAP@0.25 and mAP@0.5. I'm now wondering whether this may be because I'm only using a single GPU for training. As far as I understand, you do not sync batch norm across GPUs, so the smaller per-GPU batch may actually be beneficial to training?
Finally, when I looked at the transformer code, I noticed that each attention layer uses 288-dimensional features. I was wondering whether there is a specific reason for choosing this value; it seems quite low to me, and I would have thought that a power of 2 would be more in line with most architectures.
I would really appreciate your insights on this.
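Regarding the scaling question above, a hedged sketch of what a [0.9, 1.1] random-scaling augmentation could look like (illustrative, not the repo's code); the same factor must be applied to both the points and the box parameters:

import numpy as np

def random_scale(points, boxes, lo=0.9, hi=1.1):
    """points: (N, 3) xyz; boxes: (M, 6) as (cx, cy, cz, dx, dy, dz)."""
    s = np.random.uniform(lo, hi)        # one factor per scene
    return points * s, boxes * s         # centers and sizes scale together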
Hi!
I am trying to evaluate on all ScanNet classes (485). Since some classes are very rare, running eval_det_cls for them throws NaN because of npos=0. Can you recommend a fix for this?
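One possible guard, sketched against a VoteNet-style AP helper (eval_fn and its return shape are assumptions, not the repo's confirmed API): skip classes with no ground-truth positives instead of dividing by npos = 0.

import numpy as np

def eval_det_cls_safe(pred, gt, ovthresh=0.25, eval_fn=None):
    """Return (rec, prec, ap), or None for classes with no ground-truth boxes."""
    npos = sum(len(boxes) for boxes in gt.values())   # total GT instances of this class
    if npos == 0:
        return None                                   # AP is undefined; drop from the mean
    return eval_fn(pred, gt, ovthresh)

Classes returning None can then simply be excluded when averaging mAP.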
I found that 'size_gts' is used to supervise the predicted size of an object when size_cls_agnostic is set to True. May I ask why 'size_gts' is used instead of 'box3d_size' as the supervision signal?
I have followed the data preparation process under sunrgbd, and the resulting dataset runs with VoteNet, but it fails to run with Group-Free-3D.
The following files are missing:
all_obbs_modified_nearest_has_empty.pkl
all_pc_modified_nearest_has_empty.pkl
all_point_labels_nearest_has_empty.pkl
Can you provide the related files?
Thanks 😀
How should the code be modified?
Thanks for your excellent work!
I encountered a problem during training.
Since I only have one GPU, I modified train_dist.py into a single-GPU version (I just removed the code related to distributed training).
I want to know whether anything else needs to be modified, and whether you have any suggestions about this problem. Thanks very much!
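A hedged sketch of the usual single-GPU changes (illustrative names, not a drop-in patch for train_dist.py): drop the process-group setup, the DistributedSampler, and the DDP wrapper, and keep everything else identical.

import torch
from torch.utils.data import DataLoader

def build_single_gpu(model, dataset, batch_size, lr):
    device = torch.device('cuda:0')
    model = model.to(device)                          # no DistributedDataParallel wrapper
    loader = DataLoader(dataset, batch_size=batch_size,
                        shuffle=True, num_workers=4)  # plain shuffle, no DistributedSampler
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    return model, loader, optimizer

Note that the effective batch size shrinks by the former GPU count, so the learning rate may need rescaling accordingly.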
I downloaded the SUN RGB-D dataset, but it does not match the format the program expects!
Thank you for your great work. Could you please share your processed ScanNet v2 data with me?
I encountered a problem when I ran train_dist.py. I want to know how to solve it. I look forward to your reply!
Dear author, the result of running eval_avg.py is not as good as the result of evaluating during training; performance drops by 4%. Is this due to overfitting?
Thanks for your great work. You mention in Appendix A1.2 that you implemented voting in the framework, but it seems that no corresponding experiment or code can be found in the paper or in this repo.
I wonder how the results of the different stages are ensembled in this method. This part of the code is not given, although it should be very important according to the paper. Even in the evaluation and test stages, the loss is an average of the losses of the different stages, not the loss of a final estimated result. I don't think this is reasonable.
Good work! I have run training on my own data and want to visualize the results. Where can I find the code?
Many thanks!
It will be very helpful for us to get more insight from your work, thanks!
Dear author, have you tried your code on outdoor datasets such as KITTI? How is the performance?
Hi, thanks for your sharing. I find that for each decoder layer, you use cluster_xyz as the initial location instead of the updated base_xyz:
Group-Free-3D/models/detector.py
Lines 221 to 227 in ef8b7bb
My question is: since each layer uses the box location from the previous layer to produce the spatial encoding, why does each layer predict the offset to the gt box location relative to the initial cluster center instead of relative to the updated center from the previous layer? In other words, why not:
base_xyz, base_size = self.prediction_heads[i](query,
                                               base_xyz=base_xyz,
                                               end_points=end_points,
                                               prefix=prefix)
Also, under your setting, I think the "auxiliary loss" is necessary. The reason is that if no auxiliary loss is applied, the prediction heads of the first N-1 decoder layers will get no supervision for the center_residual, so the updated box predictions and the spatial encodings for the next decoder layer will be meaningless. Am I correct? (A toy sketch of the two schemes follows below.)
Best,
Xuyang
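For reference, a toy sketch contrasting the two update schemes discussed in this issue (illustrative names and shapes, not the repo's exact code):

import torch
import torch.nn as nn

B, N, C, num_layers = 2, 256, 288, 6
query = torch.randn(B, N, C)
cluster_xyz = torch.rand(B, N, 3)                       # initial KPS query centers
offset_heads = nn.ModuleList(nn.Linear(C, 3) for _ in range(num_layers))

base_xyz = cluster_xyz
for i in range(num_layers):
    # Repo behaviour: every layer regresses an offset from the *initial* centers.
    base_xyz = cluster_xyz + offset_heads[i](query)
    # Alternative raised in this issue: chain from the previous layer's centers.
    # base_xyz = base_xyz + offset_heads[i](query)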
Hi, thanks for sharing this excellent work. In the paper, you mention that you manually assign the object candidates to the ground truth. Could you please explain the details a little and point out where the corresponding code is?
@stupidZZ Thanks for open-sourcing the code base. I have a few queries:
Is it possible to train the group-free network without access to per-point instance labels, i.e., using only 3D bounding boxes? As far as I can tell, the loss calculation seems to depend on instance labels.
Hello. I ran the script sunrgbd_data.py but could not obtain "all_point_labels_nearest_has_empty.pkl". Thank you for your help.
It raised an error when I ran train_dist.py:
AttributeError: Can't pickle local object 'get_loader.<locals>.my_worker_init_fn'
Is anyone else in the same situation?
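A common fix, sketched here with a stand-in dataset (not the repo's code): a DataLoader's worker_init_fn must be picklable when workers are spawned, so define my_worker_init_fn at module level instead of inside get_loader (or set num_workers=0).

import numpy as np
import torch
from torch.utils.data import DataLoader, TensorDataset

def my_worker_init_fn(worker_id):
    # Module-level (hence picklable) replacement for the closure inside get_loader.
    np.random.seed(np.random.get_state()[1][0] + worker_id)

dataset = TensorDataset(torch.randn(64, 3))   # stand-in dataset for the sketch
loader = DataLoader(dataset, batch_size=8, num_workers=4,
                    worker_init_fn=my_worker_init_fn)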
In models/backbone_module.py, you select the first 1024 of the 2048 sa1_inds as fp2_inds. I understand that the intention is to obtain the indices of these 1024 seed points within the entire point cloud, so that they can participate in the loss calculation in compute_points_obj_cls_loss_hard_topk.
However, directly selecting the first 1024 of the 2048 sa1_inds does not correspond one-to-one with fp2_xyz. This mismatch would cause the euclidean_dist1 and object_assignment_one_hot variables in compute_points_obj_cls_loss_hard_topk to be misaligned. Doesn't this introduce an error in the supervision signal for KPS?
models/backbone_module.py:
# --------- 2 FEATURE UPSAMPLING LAYERS --------
features = self.fp1(end_points['sa3_xyz'], end_points['sa4_xyz'],
                    end_points['sa3_features'], end_points['sa4_features'])
features = self.fp2(end_points['sa2_xyz'], end_points['sa3_xyz'], end_points['sa2_features'], features)
end_points['fp2_features'] = features
end_points['fp2_xyz'] = end_points['sa2_xyz']
num_seed = end_points['fp2_xyz'].shape[1]
end_points['fp2_inds'] = end_points['sa1_inds'][:, 0:num_seed]  # indices among the entire input point clouds
return end_points
models/loss_helper.py:
def compute_points_obj_cls_loss_hard_topk(end_points, topk):
    box_label_mask = end_points['box_label_mask']
    seed_inds = end_points['seed_inds'].long()  # B, K
    seed_xyz = end_points['seed_xyz']  # B, K, 3
    seeds_obj_cls_logits = end_points['seeds_obj_cls_logits']  # B, 1, K
    gt_center = end_points['center_label'][:, :, 0:3]  # B, K2, 3
    gt_size = end_points['size_gts'][:, :, 0:3]  # B, K2, 3
    B = gt_center.shape[0]
    K = seed_xyz.shape[1]
    K2 = gt_center.shape[1]

    point_instance_label = end_points['point_instance_label']  # B, num_points
    object_assignment = torch.gather(point_instance_label, 1, seed_inds)  # B, num_seed
    object_assignment[object_assignment < 0] = K2 - 1  # set background points to the last gt bbox
    object_assignment_one_hot = torch.zeros((B, K, K2)).to(seed_xyz.device)
    object_assignment_one_hot.scatter_(2, object_assignment.unsqueeze(-1), 1)  # (B, K, K2)
    delta_xyz = seed_xyz.unsqueeze(2) - gt_center.unsqueeze(1)  # (B, K, K2, 3)
    delta_xyz = delta_xyz / (gt_size.unsqueeze(1) + 1e-6)  # (B, K, K2, 3)
    new_dist = torch.sum(delta_xyz ** 2, dim=-1)
    euclidean_dist1 = torch.sqrt(new_dist + 1e-6)  # BxKxK2
    euclidean_dist1 = euclidean_dist1 * object_assignment_one_hot + 100 * (1 - object_assignment_one_hot)  # BxKxK2
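If the mismatch described above is real, one hedged fix (assuming sa2_inds indexes into the sa1 point set, which is how pointnet2-style modules typically report sampled indices; this is not confirmed by the repo) would be to compose the two sampling steps instead of taking the first num_seed entries of sa1_inds:

import torch

def fp2_inds_aligned(end_points):
    """Indices of fp2_xyz (== sa2_xyz) within the raw input cloud."""
    sa1_inds = end_points['sa1_inds'].long()    # (B, 2048) indices into the raw cloud
    sa2_inds = end_points['sa2_inds'].long()    # (B, 1024) indices into the sa1 set
    return torch.gather(sa1_inds, 1, sa2_inds)  # (B, 1024) indices into the raw cloud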
Hi,
Could you please guide me on how I can use your trained model on a custom scanned 3D point cloud for object detection?
Thanks
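A generic, hedged sketch of inference on a custom scan (the input dict key and num_point default are assumptions based on the training flags above, not the repo's confirmed API):

import numpy as np
import torch

def detect(model, points, num_point=20000, device='cuda'):
    """points: (N, 3) xyz from a custom scan, in the same frame/units as the training data."""
    idx = np.random.choice(len(points), num_point, replace=len(points) < num_point)
    pc = torch.from_numpy(points[idx]).float().unsqueeze(0).to(device)  # (1, num_point, 3)
    model.eval()
    with torch.no_grad():
        end_points = model({'point_clouds': pc})   # assumed input format
    return end_points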
Hi, nice repo!
I would like to know where the ensemble code is.
Thanks!