hongfz16 / ds-net Goto Github PK

View Code? Open in Web Editor NEW

240.0 10.0 29.0 553 KB

[CVPR 2021/TPAMI 2023] Rank 1st in the public leaderboard of SemanticKITTI Panoptic Segmentation (2020-11-16)

License: MIT License

Python 97.24% Shell 2.76%

semantickitti panoptic-segmentation

ds-net's Issues

loss functiond of training ds-net

DS-Net/network/model_zoo.py

Line 365 in 4a73e2f

def calc_loss(self, sem_logits, pred_offsets, inputs, need_minus_one=True):

HI, if I want to train the ds-net, which if-else line should I use to get the offset_loss_list? And what is the cfg.MODEL.INS_LOSS here?

Difference between original and latest version of Cylinder3D

Hi,

in the README you say that DS-Net is based on the original version of Cylinder3D and link to the paper that reports 64.3% mIoU on the SemanticKITTI val set.
I compared the layers in the pretrained weights you provide here with the pretrained weights of the current Cylinder3D version that are provided in their repo.
It seems that the only difference is that MODEL.VFE.OUT_CHANNEL = 64 in the "original" version and 256 in the current version.
With this parameter change I evaluated the "sem_pretrain.pth" on the SemanticKITTI val set and got an mIoU of 58.5% which is very far away from the 64.3% reported in the paper.

Can you explain how you trained the pretrained model of step 1 and what exactly is the difference to the latest version of Cylinder3D?

Thank you

Input files for inference

Hi! Thanks for sharing your great work. I've done some inference tests using the DS-Net model and realized that the calibration and odometry files of the Kitti dataset (poses/time/calib) are required for running the code without error.
I was wondering if the method uses the information of these files for anything. If that's the case, what is it for? I don't find the information in the paper (LiDAR-based Panoptic Segmentation via Dynamic Shifting Network). Thank you in advance!

Error: Caught Exception in DataLoader worker process 0

Hi, thanks for you excellent work firstly.

I got the error like

TypeError: Caught TypeError in DataLoader worker process 0.
TypeError: expected dtype object, got 'numpy.dtype[uint8]'

when running into the line for i_iter, inputs in enumerate(val_dataset_loader): in cfg_train.py script, I have googled different solutions but not worked.

Hope you could do me a favor, thanks in advance.

Looking forward to the code

Exact formula to get weights for weighted cross-entropy loss

Hi, in network/model_zoo.py, there are weights for weighted cross-entropy loss(in PolarSpconv class). Could you tell me the exact formula to get each weight in the cross-entropy loss?

Semantic segmentation training code release?

When would you like to release the training code of Semantic segmentation?

About ins_labels

Thank you open you code!

dataloader -> dataset.py line 79:
"ins_labels = annotated_data" should be changed to "ins_labels = annotated_data >> 16"?

About Consensus-driven Fusion module

Hi, thanks for your great work. I wonder to know that how the Consensus-driven Fusion module implementing？I found the merge_ins_sem in common_utils. But I donot find how to use the 'merge_ins_sem' in other code.

KITTI classes used for training

Hi! First of all, thanks for sharing this excellent work.
In the semantic-kitti.yaml configuration file, you list 34 labels, most of them (28) corresponding to the KITTI classes. The rest of them (6) are not included in the KITTI classes. Also, in your paper, you explain that you conducted an experiment with the KITTI dataset that has 28 annotated semantic classes. Where does this discrepancy come from? Did you use this class list for training the model, or you just used the KITTIs?
Thanks in advance!!

Number of GPUs for training of 4D-DS-Net

Hi! Thanks for sharing your great project!
I have a question regarding 4D-DS-Net. I tried to finetune the backbone of 4D-DS-Net. I have the possibility to train on 4 Geforce GTX 1080Ti. Unfortunately even on 4 Geforce GTX 1080Ti I am getting a "CUDA OUT OF MEMORY ERROR". On how many GPUs you trained the 4D-DS-Net?
Is there a possibility that you share your pretrained models for 4D-DS-Net?
Thanks in advance!

Validation script error: Net/network/modules/pytorch_meanshift.py(95)calc_shifted_matrix_flat_kernel_bandwidth_weight()

Hi!

I'm trying to reproduce the validation results from your work using the validation script for pytorch. I changed the path to the dataset and ran the command sh ./scripts/release/dsnet/val_dsnet_pytorch_dist_custom.sh

It stats to execute correctly, but at 7th of 4071 steps of validation it freezes and outputs:

fname=08/velodyne/000007.bin, ins_num=29]> /home/fkitashov/Documents/repositories/DS-Net/network/modules/pytorch_meanshift.py(95)calc_shifted_matrix_flat_kernel_bandwidth_weight() -> if self.data_mode == 'offset': (Pdb)

Have you ever encountered such a problem? If so, how did you resolve it? Thank you

Terminal output:

/home/fkitashov/anaconda3/lib/python3.8/site-packages/torch/distributed/launch.py:163: DeprecationWarning: The 'warn' method is deprecated, use 'warning' instead
logger.warn(
The module torch.distributed.launch is deprecated and going to be removed in future.Migrate to torch.distributed.run
WARNING:torch.distributed.run:--use_env is deprecated and will be removed in future releases.
Please read local_rank from os.environ('LOCAL_RANK') instead.
INFO:torch.distributed.launcher.api:Starting elastic_operator with launch configs:
entrypoint : cfg_train.py
min_nodes : 1
max_nodes : 1
nproc_per_node : 1
run_id : none
rdzv_backend : static
rdzv_endpoint : 127.0.0.1:29500
rdzv_configs : {'rank': 0, 'timeout': 900}
max_restarts : 3
monitor_interval : 5
log_dir : None
metrics_cfg : {}

INFO:torch.distributed.elastic.agent.server.local_elastic_agent:log directory set to: /tmp/torchelastic_c3imu6dz/none_n8t9sofz
INFO:torch.distributed.elastic.agent.server.api:[default] starting workers for entrypoint: python
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous'ing worker group
/home/fkitashov/anaconda3/lib/python3.8/site-packages/torch/distributed/elastic/utils/store.py:52: FutureWarning: This is an experimental API and will be changed in future.
warnings.warn(
INFO:torch.distributed.elastic.agent.server.api:[default] Rendezvous complete for workers. Result:
restart_count=0
master_addr=127.0.0.1
master_port=29500
group_rank=0
group_world_size=1
local_ranks=[0]
role_ranks=[0]
global_ranks=[0]
role_world_sizes=[1]
global_world_sizes=[1]

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_c3imu6dz/none_n8t9sofz/attempt_0/0/error.json
2021-08-04 18:59:39,727 INFO Start logging
2021-08-04 18:59:39,727 INFO CUDA_VISIBLE_DEVICES=ALL
2021-08-04 18:59:39,727 INFO total_batch_size: 1
2021-08-04 18:59:39,727 INFO config cfgs/release/dsnet_custom.yaml
2021-08-04 18:59:39,727 INFO ckpt_name PolarOffset.pth
2021-08-04 18:59:39,727 INFO launcher pytorch
2021-08-04 18:59:39,727 INFO batch_size 1
2021-08-04 18:59:39,727 INFO tcp_port 12345
2021-08-04 18:59:39,727 INFO local_rank 0
2021-08-04 18:59:39,727 INFO sync_bn False
2021-08-04 18:59:39,727 INFO tag val_dsnet_pytorch_dist
2021-08-04 18:59:39,727 INFO onlyval True
2021-08-04 18:59:39,727 INFO saveval False
2021-08-04 18:59:39,727 INFO onlytest False
2021-08-04 18:59:39,727 INFO pretrained_ckpt pretrained_weight/dsnet_pretrain_pq_0.577.pth
2021-08-04 18:59:39,727 INFO nofix False
2021-08-04 18:59:39,727 INFO fix_semantic_instance True
2021-08-04 18:59:39,727 INFO cfg.ROOT_DIR: /home/fkitashov/Documents/repositories/DS-Net
2021-08-04 18:59:39,728 INFO cfg.LOCAL_RANK: 0
2021-08-04 18:59:39,728 INFO
cfg.DATA_CONFIG = edict()
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATASET_NAME: SemanticKitti
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATASET_PATH: /datasets/KITTI/dataset/
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.NCLASS: 20
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.RETURN_REF: True
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.RETURN_INS_ID: True
2021-08-04 18:59:39,728 INFO
cfg.DATA_CONFIG.DATALOADER = edict()
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.VOXEL_TYPE: Spherical
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.GRID_SIZE: [480, 360, 32]
2021-08-04 18:59:39,728 INFO
cfg.DATA_CONFIG.DATALOADER.AUGMENTATION = edict()
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.AUGMENTATION.ROTATE: True
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.AUGMENTATION.FLIP: True
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.AUGMENTATION.TRANSFORM: True
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.AUGMENTATION.TRANSFORM_STD: [0.1, 0.1, 0.1]
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.AUGMENTATION.SCALE: True
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.IGNORE_LABEL: 255
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.CONVERT_IGNORE_LABEL: 0
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.FIXED_VOLUME_SPACE: True
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.MAX_VOLUME_SPACE: [50, 3.141592653589793, 1.5]
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.MIN_VOLUME_SPACE: [3, -3.141592653589793, -3]
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.CENTER_TYPE: Axis_center
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.DATA_DIM: 9
2021-08-04 18:59:39,728 INFO cfg.DATA_CONFIG.DATALOADER.NUM_WORKER: 1
2021-08-04 18:59:39,728 INFO
cfg.OPTIMIZE = edict()
2021-08-04 18:59:39,728 INFO cfg.OPTIMIZE.LR: 0.002
2021-08-04 18:59:39,728 INFO cfg.OPTIMIZE.MAX_EPOCH: 50
2021-08-04 18:59:39,728 INFO
cfg.MODEL = edict()
2021-08-04 18:59:39,728 INFO cfg.MODEL.NAME: PolarOffsetSpconvPytorchMeanshift
2021-08-04 18:59:39,728 INFO
cfg.MODEL.MODEL_FN = edict()
2021-08-04 18:59:39,729 INFO cfg.MODEL.MODEL_FN.PT_POOLING: max
2021-08-04 18:59:39,729 INFO cfg.MODEL.MODEL_FN.MAX_PT_PER_ENCODE: 256
2021-08-04 18:59:39,729 INFO cfg.MODEL.MODEL_FN.PT_SELECTION: random
2021-08-04 18:59:39,729 INFO cfg.MODEL.MODEL_FN.FEATURE_COMPRESSION: 16
2021-08-04 18:59:39,729 INFO
cfg.MODEL.VFE = edict()
2021-08-04 18:59:39,729 INFO cfg.MODEL.VFE.NAME: PointNet
2021-08-04 18:59:39,729 INFO cfg.MODEL.VFE.OUT_CHANNEL: 64
2021-08-04 18:59:39,729 INFO
cfg.MODEL.BACKBONE = edict()
2021-08-04 18:59:39,729 INFO cfg.MODEL.BACKBONE.NAME: Spconv_salsaNet_res_cfg
2021-08-04 18:59:39,729 INFO cfg.MODEL.BACKBONE.INIT_SIZE: 32
2021-08-04 18:59:39,729 INFO
cfg.MODEL.SEM_HEAD = edict()
2021-08-04 18:59:39,729 INFO cfg.MODEL.SEM_HEAD.NAME: Spconv_sem_logits_head_cfg
2021-08-04 18:59:39,729 INFO
cfg.MODEL.INS_HEAD = edict()
2021-08-04 18:59:39,729 INFO cfg.MODEL.INS_HEAD.NAME: Spconv_ins_offset_concatxyz_threelayers_head_cfg
2021-08-04 18:59:39,729 INFO cfg.MODEL.INS_HEAD.EMBEDDING_CHANNEL: 3
2021-08-04 18:59:39,729 INFO
cfg.MODEL.MEANSHIFT = edict()
2021-08-04 18:59:39,729 INFO cfg.MODEL.MEANSHIFT.NAME: pytorch_meanshift
2021-08-04 18:59:39,729 INFO cfg.MODEL.MEANSHIFT.BANDWIDTH: [0.2, 1.7, 3.2]
2021-08-04 18:59:39,729 INFO cfg.MODEL.MEANSHIFT.ITERATION: 4
2021-08-04 18:59:39,729 INFO cfg.MODEL.MEANSHIFT.DATA_MODE: offset
2021-08-04 18:59:39,729 INFO cfg.MODEL.MEANSHIFT.SHIFT_MODE: matrix_flat_kernel_bandwidth_weight
2021-08-04 18:59:39,729 INFO cfg.MODEL.MEANSHIFT.DOWNSAMPLE_MODE: xyz
2021-08-04 18:59:39,729 INFO cfg.MODEL.MEANSHIFT.POINT_NUM_TH: 10000
2021-08-04 18:59:39,729 INFO cfg.MODEL.SEM_LOSS: Lovasz_loss
2021-08-04 18:59:39,729 INFO cfg.MODEL.INS_LOSS: offset_loss_regress_vec
2021-08-04 18:59:39,729 INFO
cfg.MODEL.POST_PROCESSING = edict()
2021-08-04 18:59:39,729 INFO cfg.MODEL.POST_PROCESSING.CLUSTER_ALGO: MeanShift_embedding_cluster
2021-08-04 18:59:39,729 INFO cfg.MODEL.POST_PROCESSING.BANDWIDTH: 0.65
2021-08-04 18:59:39,729 INFO cfg.MODEL.POST_PROCESSING.MERGE_FUNC: merge_ins_sem
2021-08-04 18:59:39,729 INFO cfg.DIST_TRAIN: True
2021-08-04 18:59:39,731 INFO Building dataloader for val set.
2021-08-04 18:59:39,765 INFO Flip Augmentation: False
2021-08-04 18:59:39,765 INFO Scale Augmentation: False
2021-08-04 18:59:39,765 INFO Transform Augmentation: False
2021-08-04 18:59:39,765 INFO Rotate Augmentation: False
2021-08-04 18:59:39,765 INFO Shuffle: False
2021-08-04 18:59:41,624 INFO ==> Loading parameters from pre-trained checkpoint pretrained_weight/dsnet_pretrain_pq_0.577.pth to CPU
2021-08-04 18:59:42,101 INFO Freezing backbone, semantic and instance part of the model.
2021-08-04 18:59:42,101 INFO Not using lr scheduler
2021-08-04 18:59:42,135 INFO DistributedDataParallel(
(module): PolarOffsetSpconvPytorchMeanshift(
(fea_compression): Sequential(
(0): Linear(in_features=64, out_features=16, bias=True)
(1): ReLU()
)
(backbone): Spconv_salsaNet_res_cfg(
(downCntx): ResContextBlock(
(conv1): SubMConv3d()
(bn0): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): LeakyReLU(negative_slope=0.01)
(conv1_2): SubMConv3d()
(bn0_2): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1_2): LeakyReLU(negative_slope=0.01)
(conv2): SubMConv3d()
(act2): LeakyReLU(negative_slope=0.01)
(bn1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): SubMConv3d()
(act3): LeakyReLU(negative_slope=0.01)
(bn2): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
)
(resBlock2): ResBlock(
(conv1): SubMConv3d()
(act1): LeakyReLU(negative_slope=0.01)
(bn0): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv1_2): SubMConv3d()
(act1_2): LeakyReLU(negative_slope=0.01)
(bn0_2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): SubMConv3d()
(act2): LeakyReLU(negative_slope=0.01)
(bn1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): SubMConv3d()
(act3): LeakyReLU(negative_slope=0.01)
(bn2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(pool): SparseConv3d()
)
(resBlock3): ResBlock(
(conv1): SubMConv3d()
(act1): LeakyReLU(negative_slope=0.01)
(bn0): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv1_2): SubMConv3d()
(act1_2): LeakyReLU(negative_slope=0.01)
(bn0_2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): SubMConv3d()
(act2): LeakyReLU(negative_slope=0.01)
(bn1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): SubMConv3d()
(act3): LeakyReLU(negative_slope=0.01)
(bn2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(pool): SparseConv3d()
)
(resBlock4): ResBlock(
(conv1): SubMConv3d()
(act1): LeakyReLU(negative_slope=0.01)
(bn0): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv1_2): SubMConv3d()
(act1_2): LeakyReLU(negative_slope=0.01)
(bn0_2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): SubMConv3d()
(act2): LeakyReLU(negative_slope=0.01)
(bn1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): SubMConv3d()
(act3): LeakyReLU(negative_slope=0.01)
(bn2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(pool): SparseConv3d()
)
(resBlock5): ResBlock(
(conv1): SubMConv3d()
(act1): LeakyReLU(negative_slope=0.01)
(bn0): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv1_2): SubMConv3d()
(act1_2): LeakyReLU(negative_slope=0.01)
(bn0_2): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): SubMConv3d()
(act2): LeakyReLU(negative_slope=0.01)
(bn1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): SubMConv3d()
(act3): LeakyReLU(negative_slope=0.01)
(bn2): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(pool): SparseConv3d()
)
(upBlock0): UpBlock(
(trans_dilao): SubMConv3d()
(trans_act): LeakyReLU(negative_slope=0.01)
(trans_bn): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv1): SubMConv3d()
(act1): LeakyReLU(negative_slope=0.01)
(bn1): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): SubMConv3d()
(act2): LeakyReLU(negative_slope=0.01)
(bn2): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): SubMConv3d()
(act3): LeakyReLU(negative_slope=0.01)
(bn3): BatchNorm1d(512, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(up_subm): SparseInverseConv3d()
)
(upBlock1): UpBlock(
(trans_dilao): SubMConv3d()
(trans_act): LeakyReLU(negative_slope=0.01)
(trans_bn): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv1): SubMConv3d()
(act1): LeakyReLU(negative_slope=0.01)
(bn1): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): SubMConv3d()
(act2): LeakyReLU(negative_slope=0.01)
(bn2): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): SubMConv3d()
(act3): LeakyReLU(negative_slope=0.01)
(bn3): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(up_subm): SparseInverseConv3d()
)
(upBlock2): UpBlock(
(trans_dilao): SubMConv3d()
(trans_act): LeakyReLU(negative_slope=0.01)
(trans_bn): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv1): SubMConv3d()
(act1): LeakyReLU(negative_slope=0.01)
(bn1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): SubMConv3d()
(act2): LeakyReLU(negative_slope=0.01)
(bn2): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): SubMConv3d()
(act3): LeakyReLU(negative_slope=0.01)
(bn3): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(up_subm): SparseInverseConv3d()
)
(upBlock3): UpBlock(
(trans_dilao): SubMConv3d()
(trans_act): LeakyReLU(negative_slope=0.01)
(trans_bn): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv1): SubMConv3d()
(act1): LeakyReLU(negative_slope=0.01)
(bn1): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv2): SubMConv3d()
(act2): LeakyReLU(negative_slope=0.01)
(bn2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(conv3): SubMConv3d()
(act3): LeakyReLU(negative_slope=0.01)
(bn3): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(up_subm): SparseInverseConv3d()
)
(ReconNet): ReconBlock(
(conv1): SubMConv3d()
(bn0): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): Sigmoid()
(conv1_2): SubMConv3d()
(bn0_2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1_2): Sigmoid()
(conv1_3): SubMConv3d()
(bn0_3): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1_3): Sigmoid()
)
)
(sem_head): Spconv_sem_logits_head_cfg(
(logits): SubMConv3d()
)
(ins_head): Spconv_ins_offset_concatxyz_threelayers_head_cfg(
(conv1): SubMConv3d()
(bn1): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act1): LeakyReLU(negative_slope=0.01)
(conv2): SubMConv3d()
(bn2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act2): LeakyReLU(negative_slope=0.01)
(conv3): SubMConv3d()
(bn3): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(act3): LeakyReLU(negative_slope=0.01)
(offset): Sequential(
(0): Linear(in_features=35, out_features=32, bias=True)
(1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
)
(offset_linear): Linear(in_features=32, out_features=3, bias=True)
)
(vfe_model): PointNet(
(PPmodel): Sequential(
(0): BatchNorm1d(9, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(1): Linear(in_features=9, out_features=64, bias=True)
(2): BatchNorm1d(64, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(3): ReLU()
(4): Linear(in_features=64, out_features=128, bias=True)
(5): BatchNorm1d(128, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(6): ReLU()
(7): Linear(in_features=128, out_features=256, bias=True)
(8): BatchNorm1d(256, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(9): ReLU()
(10): Linear(in_features=256, out_features=64, bias=True)
)
)
(sem_loss): CrossEntropyLoss()
(pytorch_meanshift): PytorchMeanshift(
(learnable_bandwidth_weights_layer_list): ModuleList(
(0): Sequential(
(0): Linear(in_features=32, out_features=32, bias=True)
(1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Linear(in_features=32, out_features=3, bias=True)
)
(1): Sequential(
(0): Linear(in_features=32, out_features=32, bias=True)
(1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Linear(in_features=32, out_features=3, bias=True)
)
(2): Sequential(
(0): Linear(in_features=32, out_features=32, bias=True)
(1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Linear(in_features=32, out_features=3, bias=True)
)
(3): Sequential(
(0): Linear(in_features=32, out_features=32, bias=True)
(1): BatchNorm1d(32, eps=1e-05, momentum=0.1, affine=True, track_running_stats=True)
(2): ReLU()
(3): Linear(in_features=32, out_features=3, bias=True)
)
)
)
)
)
2021-08-04 18:59:42.231665: I tensorflow/stream_executor/platform/default/dso_loader.cc:53] Successfully opened dynamic library libcudart.so.11.0
2021-08-04 18:59:42,821 INFO Start Training
2021-08-04 18:59:42,822 INFO ----EPOCH -1 Evaluating----
New evaluator with min_points of 50
New evaluator with min_points of 50
0%|▏ | 8/4071 [00:12<1:33:04, 1.37s/it, loss=3.99, fname=08/velodyne/000007.bin, ins_num=29]> /home/fkitashov/Documents/repositories/DS-Net/network/modules/pytorch_meanshift.py(95)calc_shifted_matrix_flat_kernel_bandwidth_weight()
-> if self.data_mode == 'offset':
(Pdb)
(Pdb)

Training the semantic head together with instance head

Thank you for open-sourcing your work.

Did you try the no-fix setting in cfg-train.py? What was the takeaway there? Is it better the complete network end-to-end or just fine-tune the regression heads as done for the released models?
I tried to run this setting, but ran into an error:
RuntimeError: Function 'SubMConvFunctionBackward' returned nan values in its 1th output.

Starting worker group stuck

In config yaml files i see that num_worker are all set to 1. But during my testing, it tries to reach 4 workers and got a stuck as there's only one gpu on my server. Does anyone know where's the problem?
Feedback:

INFO:torch.distributed.elastic.agent.server.api:[default] Starting worker group
INFO:torch.distributed.elastic.multiprocessing:Setting worker0 reply file to: /tmp/torchelastic_eslpage1/none_0vzsx60e/attempt_0/0/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker1 reply file to: /tmp/torchelastic_eslpage1/none_0vzsx60e/attempt_0/1/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker2 reply file to: /tmp/torchelastic_eslpage1/none_0vzsx60e/attempt_0/2/error.json
INFO:torch.distributed.elastic.multiprocessing:Setting worker3 reply file to: /tmp/torchelastic_eslpage1/none_0vzsx60e/attempt_0/3/error.json

Stuck here.

About the inference time

Hi, thanks for your sharing what's an amazing work in lidar panoptic segmentation field.

I wonder what the exact inference time in Sementic KITTI dataset and nuScenes dataset?

shuffle makes no sense

Hi,
in the file function voxelize_spconv, which is in model_zoom.py, you shuffle all points by
cat_pt_ind = cat_pt_ind[shuffled_ind,:]

and then the output unq and processed_pooled_data is still in the order of the grid, which is done automatically by scatter_max function.
pooled_data = torch_scatter.scatter_max(processed_cat_pt_fea, unq_inv, dim=0)[0]

It seems this step makes no point in the function.

Usually, we shuffle the data in the dataloader, which has been done automatically, instead of shuffle the date within the batch.

Recommend Projects

React

A declarative, efficient, and flexible JavaScript library for building user interfaces.
Vue.js

🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
Typescript

TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
TensorFlow

An Open Source Machine Learning Framework for Everyone
Django

The Web framework for perfectionists with deadlines.
Laravel

A PHP framework for web artisans
D3

Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

javascript

JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
web

Some thing interesting about web. New door for the world.
server

A server is a program made to process requests and deliver data to clients.
Machine learning

Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Visualization

Some thing interesting about visualization, use data art
Game

Some thing interesting about game, make everyone happy.

Recommend Org

Facebook

We are working to build community through open source technology. NB: members must have two-factor auth.
Microsoft

Open source projects and samples from Microsoft.
Google

Google ❤️ Open Source for everyone.
Alibaba

Alibaba Open Source for everyone
D3

Data-Driven Documents codes.
Tencent

China tencent open source team.