
pointgroup's People

Contributors

llijiang

pointgroup's Issues

Some experiment doubts

Thanks for your amazing PointGroup.

I've run some experiments with it, and I have a few questions:

  1. I tested your pretrained model on the validation set. The performance on input data with dataAugment [35.2/57.4/71.8] is higher than on the original input without dataAugment (all parameters set to False) [33.6/55.1/70.0] (numbers are [mAP/mAP50/mAP25]).

  2. I shortened the backbone ([m, ..., 7*m] -> [m, ..., 5*m]) and set m=32 instead of m=16, which slightly improved performance.

  3. Training is time-consuming, so I sometimes test the performance of each frequently saved epoch (your save step is 16 epochs, but the latest epoch is also saved). I found performance fluctuations between two adjacent epochs, sometimes large (around 2 percent); for some saved epoch pairs (like 320/368 or 320/384), the later epoch's performance can even be worse. Your tensorboard loss curve shows the ScoreNet loss increasing in some epochs (issue #7).

  4. Your pretrained model's performance matches the results in your ablation study, but the improvement from validation set to test set is huge. A gap between validation and test sets is common in other methods, but around 7 percent improvement in AP50 is surprising. I think you may have done some fine-tuning before reporting your test set result, as you mention in issue #9. Could you please share some fine-tuning details?

  5. Is the elastic distortion necessary? With this distortion and your original settings, it is difficult to train the network to your pretrained model's performance on the validation set. I've visualized the distorted coordinates, and I wonder whether this treatment really helps training.

Have you met situations 1, 2 and 3 in your experiments, and could you please answer questions 4 and 5?
Thanks!

Getting this to work on Windows

Has anyone managed to get this working on Windows?

  • The first issue I encountered while following the installation instructions was a PackagesNotFoundError at the "conda install -c bioconda google-sparsehash" step. I solved it by finding another channel that provides the google-sparsehash package; specifically, I used "conda install -c jithinpr2 google-sparsehash".
  • The next issue was again a PackagesNotFoundError, this time at the "conda install -c daleydeng gcc-5" step. My workaround was again to use another channel: "conda install -c jithinpr2 libgcc-5". What is available there, however, is gcc 5.2.0 (not 5.4). It remains to be seen whether 5.2.0 suffices.
  • The issue I am currently stuck on is the "python setup.py bdist_wheel" step to compile spconv. I am getting a "NotImplementedError" on line 36 of setup.py, where the script states the code isn't implemented for Windows. Has anyone managed to fix this?

Thanks.

Architecture question

Hi, thanks a lot for the great work!
I am currently trying to implement this as part of Torch Points3D. Diving deeper into the architecture, I was wondering why you use a U-Net for the scoring module as opposed to a simple encoder. I tried an encoder (simpler to implement, and it seemed logical), but as soon as the score loss kicks in, the model diverges, and the score loss does not really decrease over time. I was wondering whether that might be the cause, or something else.
Thanks!
Nicolas

Error while compiling pointgroup_ops library

I met an error when compiling the pointgroup_ops library:

running build_ext
building 'PG_OP' extension
gcc -pthread -B /home/xiaojiwei/anaconda2/envs/pointgroup1/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/home/xiaojiwei/anaconda2/envs/pointgroup1/lib/python3.7/site-packages/torch/include -I/home/xiaojiwei/anaconda2/envs/pointgroup1/lib/python3.7/site-packages/torch/include/torch/csrc/api/include -I/home/xiaojiwei/anaconda2/envs/pointgroup1/lib/python3.7/site-packages/torch/include/TH -I/home/xiaojiwei/anaconda2/envs/pointgroup1/lib/python3.7/site-packages/torch/include/THC -I/usr/local/cuda/include -I/home/xiaojiwei/3Dseg/PointGroup/lib/pointgroup_ops/google -I/home/xiaojiwei/anaconda2/envs/pointgroup1/include/python3.7m -c src/pointgroup_ops_api.cpp -o build/temp.linux-x86_64-3.7/src/pointgroup_ops_api.o -g -DTORCH_API_INCLUDE_EXTENSION_H -DTORCH_EXTENSION_NAME=PG_OP -D_GLIBCXX_USE_CXX11_ABI=0 -std=c++11
cc1plus: warning: command line option ‘-Wstrict-prototypes’ is valid for C/ObjC but not for C++
In file included from src/pointgroup_ops.h:3:0,
from src/pointgroup_ops_api.cpp:4:
src/datatype/datatype.h:7:33: fatal error: google/dense_hash_map: No such file or directory
compilation terminated.
error: command 'gcc' failed with exit status 1

I tried adding the --include-dirs option following your instructions, but it didn't seem to work.

voxelize bug?

The intention seems to be that the "coords" passed to the voxelize_idx function in voxelize.cpp may or may not include the batch indices. The assert "assert(coords.size(1) >= dimension and coords.size(1) <= dimension + 1)" confirms that intention. But in voxelize_outputmap, there is the following code:
LongInt *coord = coords + inputIdx * (dimension + 1);
This assumes that the batch index is always present.
The way I see it, it would be better to pass coords.size(1) as a parameter to voxelize_outputmap and use that instead of (dimension + 1).
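
A numpy illustration of the misaligned read (a sketch of the indexing hazard, not the repo's code):

import numpy as np

dimension = 3
coords = np.arange(12).reshape(4, 3)   # 4 points, NO batch-index column
flat = coords.ravel()

i = 2                                  # look up point 2
wrong = flat[i * (dimension + 1) : i * (dimension + 1) + dimension]
right = flat[i * dimension : i * dimension + dimension]
print(wrong)   # [ 8  9 10] -- straddles rows 2 and 3
print(right)   # [6 7 8]    -- the actual coordinates of point 2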

bugs

When I run "python setup.py bdist_wheel" (and I have set set(CMAKE_CUDA_COMPILER "/usr/local/cuda/bin/nvcc")), I get:

-- The CUDA compiler identification is unknown
CMake Error at CMakeLists.txt:2 (project):
No CMAKE_CUDA_COMPILER could be found.

Tell CMake where to find the compiler by setting either the environment
variable "CUDACXX" or the CMake cache entry CMAKE_CUDA_COMPILER to the full
path to the compiler, or to the compiler name if it is in the PATH.

-- Configuring incomplete, errors occurred!

Conversion of LAS files

Is there a way to convert labeled point cloud LAS files into the format required for training (*_vh_clean_2.ply, *_vh_clean_2.labels.ply, *.segs.json, and *.aggregation.json)?

how about applying to sparse pointcloud?

Hi, nice work!
I'm wondering how your method would work when applied to sparse point clouds such as Velodyne-64 scans. Have you done any experiments on such data? If so, I'm looking forward to the results!

Cannot import PG_OP

Hi,

Thank you for sharing the code.
I followed the instructions to set up the environment and successfully compiled the files. However, I get the following error when I try to import PG_OP.

 import PG_OP
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ImportError: /home//PointGroup/lib/pointgroup_ops/PG_OP.cpython-37m-x86_64-linux-gnu.so: undefined symbol: _ZN6caffe26detail36_typeMetaDataInstance_preallocated_7E

I am using CUDA 10.2, PyTorch 1.1.0 and Python 3.7.
Thank you in advance.

validation mAP does not match benchmark

Thanks for sharing the training and evaluation scripts. They are clear, useful and easy to follow.

I successfully followed the instructions in the readme, trained on scannetv2 for 384 epochs and ran the evaluation script. However, there seems to be a gap (~5%-10%) in mAP compared to the version shown on the ScanNet benchmark.

Is this normal, e.g., due to the difference between the validation and testing sets? If not, any suggestions to further improve performance and bridge the gap?

Thanks again!

instance centroid

Hello, I want to ask how to find the instance centroid. Is the instance centroid on the object or outside the object? Thanks!
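
As far as I understand, the centroid used for the offset targets is the mean of the instance's point coordinates, so it can fall off the surface for non-convex objects. A minimal sketch of that computation (my own, assuming numpy arrays for coords and instance labels; not the repo's code):

import numpy as np

def instance_centroids(coords, instance_labels):
    # coords: (N, 3) float, instance_labels: (N,) int, negative for ignored points
    centroids = {}
    for inst_id in np.unique(instance_labels):
        if inst_id < 0:                       # skip ignore labels such as -100
            continue
        centroids[inst_id] = coords[instance_labels == inst_id].mean(0)
    return centroids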

Compiling error

Hi Jia,
Thanks for your interesting project. I wonder whether you could help me get through the compiling & installation steps.
I was stuck at the following instructions; any detailed comments to help me understand them would be much appreciated. I suppose this part leads to the failing "python setup.py bdist_wheel":
"Add the $INCLUDE_PATH$ that contains boost in lib/spconv/CMakeLists.txt. (Not necessary if it could be found.)
include_directories($INCLUDE_PATH$)"

Thanks for your time!

How much GPU memory is needed at inference time?

Maybe you can share an example from your experience, e.g. with the number of voxels involved.

I am interested because we are about to buy a new GPU for a project where segmentation will be one of the tasks in a real-time pipeline.

Error when running test.py on custom dataset

Hello! I'm trying to train PointGroup on my own dataset, and I've been encountering an error when running the test.py script. The output is:

File "test.py", line 227, in
test(model, model_fn, data_name, cfg.test_epoch)
File "test.py", line 73, in test
preds = model_fn(batch, model, epoch)
File "/home/vlab/pg3/model/pointgroup/pointgroup.py", line 360, in test_model_fn
ret = model(input_, p2v_map, coords_float, coords[:, 0].int(), batch_offsets, epoch)
File "/home/vlab/anaconda3/envs/pg3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/home/vlab/pg3/model/pointgroup/pointgroup.py", line 319, in forward
input_feats, inp_map = self.clusters_voxelization(proposals_idx, proposals_offset, output_feats, coords, self.score_fullscale, self.score_scale, self.mode)
File "/home/vlab/pg3/model/pointgroup/pointgroup.py", line 227, in clusters_voxelization
clusters_scale = 1 / ((clusters_coords_max - clusters_coords_min) / fullscale).max(1)[0] - 0.01 # (nCluster), float
RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity

Any help with this error would be much appreciated! Thank you!

Batch training actually mixes points together?

By looking at the code, it seems that when batch_size > 1, point cloud coords and features from different scenes are stacked together; that is, the data in each batch looks like multiple scenes overlapped. Although there is a batch_idx in the first column, coords[:, 0], this information does not seem to be used in the spconv backbone.

Then, by applying spconv, wouldn't it be computing features over an overlapped point cloud? I assume spconv should capture local features in the point cloud, but this doesn't make sense if it is applied to an overlapped point cloud. Can anyone check whether my understanding is correct? Thanks.
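
For what it's worth, spconv's sparse tensor carries the batch index as part of each voxel coordinate, so voxels from different scenes occupy different sites and the convolutions never mix them. A minimal sketch of the construction (assuming spconv 1.0's API):

import torch
import spconv

N, C = 1000, 16
features = torch.randn(N, C).cuda()
indices = torch.randint(0, 100, (N, 4)).int().cuda()   # columns: [batch_idx, z, y, x]
indices[:, 0] = torch.randint(0, 4, (N,)).int().cuda() # batch ids 0..3

x = spconv.SparseConvTensor(features, indices, spatial_shape=[100, 100, 100], batch_size=4)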

Scannet v2

Hello
Your work is outstanding!
I have encountered a difficulty: how big is the ScanNet v2 dataset used in your paper? I saw 1TB on the ScanNet official website, which is a daunting number. Looking forward to your reply, thank you!

negative training loss

Hi Liang, thanks for sharing your well-written code. Just a kind reminder: when exploring your repo, I noticed negative training losses appearing after ~50 epochs. I guess this might be caused by a typo somewhere, and it should not affect the algorithm's correctness (the validation loss looks fine). Looking forward to your comments, and all the best.

Some log info:
epoch: 58/384, train loss: 0.0207
Saving exp/scannetv2/pointgroup/pointgroup_run2_scannet/pointgroup_run2_scannet-000000058.pth
epoch: 59/384, train loss: 0.0062
Saving exp/scannetv2/pointgroup/pointgroup_run2_scannet/pointgroup_run2_scannet-000000059.pth
epoch: 60/384, train loss: -0.0002
Saving exp/scannetv2/pointgroup/pointgroup_run2_scannet/pointgroup_run2_scannet-000000060.pth
epoch: 61/384, train loss: -0.0171
Saving exp/scannetv2/pointgroup/pointgroup_run2_scannet/pointgroup_run2_scannet-000000061.pth
epoch: 62/384, train loss: -0.0477
Saving exp/scannetv2/pointgroup/pointgroup_run2_scannet/pointgroup_run2_scannet-000000062.pth
epoch: 63/384, train loss: -0.0609
Saving exp/scannetv2/pointgroup/pointgroup_run2_scannet/pointgroup_run2_scannet-000000063.pth
epoch: 64/384, train loss: -0.0646
Saving exp/scannetv2/pointgroup/pointgroup_run2_scannet/pointgroup_run2_scannet-000000064.pth
Start Evaluation: epoch: 64/384, val loss: 2.0182

Training Time

Hi Jiang,
It seems it will take a long time to train the whole model on a single GPU, so I want to know how long the training took you. Have you used any methods to speed up training, such as multi-GPU training?
Thanks!

GPU

Does the code support model = torch.nn.DataParallel(model, device_ids=[0, 1])? Or how else can I train on several GPUs? Please advise.

RuntimeError: CUDA error: an illegal memory access was encountered

I keep running into this problem; any ideas? It seems to hit the wall every time.
Illegal memory access happens again when I do
These are the dependencies I have:

numpy        1.20.2
PG-OP        0.0.0
Pillow       8.2.0
pip          21.0.1
plyfile      0.7.4
protobuf     3.16.0
PyYAML       5.4.1
scipy        1.6.3
setuptools   52.0.0.post20210125
six          1.16.0
spconv       1.0
tensorboardX 2.2
torch        1.1.0
torchvision  0.3.0
wheel        0.36.2

With CUDA 10.2 and cuDNN 7.6.5, I was met with the following error:

/home/ubuntu/pointgroup-hs/PointGroup/util/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
[2021-05-08 11:11:43,432  INFO  log.py  line 40  14354]  ************************ Start Logging ************************
[2021-05-08 11:11:43,471  INFO  train.py  line 26  14354]  Namespace(TEST_NMS_THRESH=0.3, TEST_NPOINT_THRESH=100, TEST_SCORE_THRESH=0.09, batch_size=4, bg_thresh=0.25, block_reps=2, block_residual=True, classes=20, cluster_meanActive=50, cluster_npoint_thre=50, cluster_radius=0.03, cluster_shift_meanActive=300, config='config/pointgroup_run1_scannet.yaml', data_root='dataset', dataset='scannetv2', dataset_dir='data/scannetv2_inst.py', epochs=384, eval=True, exp_path='exp/scannetv2/pointgroup/pointgroup_run1_scannet', fg_thresh=0.75, filename_suffix='_inst_nostuff.pth', fix_module=[], full_scale=[128, 512], ignore_label=-100, input_channel=3, loss_weight=[1.0, 1.0, 1.0, 1.0], lr=0.001, m=16, manual_seed=123, max_npoint=250000, mode=4, model_dir='model/pointgroup/pointgroup.py', model_name='pointgroup', momentum=0.9, multiplier=0.5, optim='Adam', prepare_epochs=128, pretrain='', pretrain_module=[], pretrain_path=None, save_freq=16, save_instance=False, save_pt_offsets=False, save_semantic=False, scale=50, score_fullscale=14, score_mode=4, score_scale=50, split='val', step_epoch=384, task='train', test_epoch=384, test_seed=567, test_workers=16, train_workers=16, use_coords=True, weight_decay=0.0001)
[2021-05-08 11:11:43,478  INFO  train.py  line 135  14354]  => creating model ...
[2021-05-08 11:11:43,610  INFO  train.py  line 147  14354]  cuda available: True
[2021-05-08 11:11:46,651  INFO  train.py  line 152  14354]  #classifier parameters: 7715016
[2021-05-08 11:12:34,348  INFO  scannetv2_inst.py  line 43  14354]  Training samples: 1201
[2021-05-08 11:12:46,605  INFO  scannetv2_inst.py  line 54  14354]  Validation samples: 312
[2021-05-08 11:12:46,665  INFO  utils.py  line 61  14354]  Restore from exp/scannetv2/pointgroup/pointgroup_run1_scannet/pointgroup_run1_scannet-000000001.pth
THCudaCheck FAIL file=/pytorch/aten/src/THC/THCCachingHostAllocator.cpp line=265 error=77 : an illegal memory access was encountered
Traceback (most recent call last):
  File "train.py", line 179, in <module>
    train_epoch(dataset.train_data_loader, model, model_fn, optimizer, epoch)
  File "train.py", line 54, in train_epoch
    loss, _, visual_dict, meter_dict = model_fn(batch, model, epoch)
  File "/home/ubuntu/pointgroup-hs/PointGroup/model/pointgroup/pointgroup.py", line 398, in model_fn
    ret = model(input_, p2v_map, coords_float, coords[:, 0].int(), batch_offsets, epoch)
  File "/home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/pointgroup-hs/PointGroup/model/pointgroup/pointgroup.py", line 264, in forward
    output = self.input_conv(input)
  File "/home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/spconv/modules.py", line 123, in forward
    input = module(input)
  File "/home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/spconv/conv.py", line 157, in forward
    outids.shape[0])
  File "/home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/spconv/functional.py", line 83, in forward
    return ops.indice_conv(features, filters, indice_pairs, indice_pair_num, num_activate_out, False, True)
  File "/home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/spconv/ops.py", line 112, in indice_conv
    int(inverse), int(subm))
RuntimeError: CUDA error: an illegal memory access was encountered (copy_to_cpu at /pytorch/aten/src/ATen/native/cuda/Copy.cu:199)
frame #0: std::function<std::string ()>::operator()() const + 0x11 (0x7ff926443441 in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #1: c10::Error::Error(c10::SourceLocation, std::string const&) + 0x2a (0x7ff926442d7a in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libc10.so)
frame #2: (anonymous namespace)::copy_to_cpu(at::Tensor&, at::Tensor const&) + 0xa45 (0x7ff8c41b2a65 in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libcaffe2_gpu.so)
frame #3: void (anonymous namespace)::_copy__cuda<int>(at::Tensor&, at::Tensor const&, bool) + 0x5ae (0x7ff8c425335e in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libcaffe2_gpu.so)
frame #4: at::native::_s_copy__cuda(at::Tensor&, at::Tensor const&, bool) + 0x378 (0x7ff8c41b45d8 in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libcaffe2_gpu.so)
frame #5: at::native::_s_copy_from_cuda(at::Tensor const&, at::Tensor const&, bool) + 0x32 (0x7ff8c41b4c62 in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libcaffe2_gpu.so)
frame #6: at::CUDAType::_s_copy_from(at::Tensor const&, at::Tensor const&, bool) const + 0xdd (0x7ff8c30bc78d in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libcaffe2_gpu.so)
frame #7: at::native::_s_copy__cpu(at::Tensor&, at::Tensor const&, bool) + 0x5f (0x7ff8b8003e6f in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #8: <unknown function> + 0xb8cb9f (0x7ff8b82c5b9f in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #9: at::native::copy_(at::Tensor&, at::Tensor const&, bool) + 0x26d (0x7ff8b800333d in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #10: torch::autograd::VariableType::copy_(at::Tensor&, at::Tensor const&, bool) const + 0x629 (0x7ff92532cdc9 in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #11: at::native::to(at::Tensor const&, c10::TensorOptions const&, bool, bool) + 0x86c (0x7ff8b81459cc in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #12: at::TypeDefault::to(at::Tensor const&, c10::TensorOptions const&, bool, bool) const + 0x17 (0x7ff8b83c4857 in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libcaffe2.so)
frame #13: torch::autograd::VariableType::to(at::Tensor const&, c10::TensorOptions const&, bool, bool) const + 0x2c2 (0x7ff925102b52 in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libtorch.so.1)
frame #14: at::Tensor spconv::indiceConv<float>(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long) + 0x1be (0x7ff912386efe in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/spconv/libspconv.so)
frame #15: void torch::jit::detail::callOperatorWithTuple<at::Tensor (* const)(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long), at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long, 0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul>(c10::FunctionSchema const&, at::Tensor (* const&&)(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long), std::vector<c10::IValue, std::allocator<c10::IValue> >&, std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long>&, torch::Indices<0ul, 1ul, 2ul, 3ul, 4ul, 5ul, 6ul>) + 0x267 (0x7ff91238e157 in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/spconv/libspconv.so)
frame #16: std::_Function_handler<int (std::vector<c10::IValue, std::allocator<c10::IValue> >&), torch::jit::createOperator<at::Tensor (*)(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long)>(std::string const&, at::Tensor (*&&)(at::Tensor, at::Tensor, at::Tensor, at::Tensor, long, long, long))::{lambda(std::vector<c10::IValue, std::allocator<c10::IValue> >&)#1}>::_M_invoke(std::_Any_data const&, std::vector<c10::IValue, std::allocator<c10::IValue> >&) + 0x61 (0x7ff91238e3c1 in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/spconv/libspconv.so)
frame #17: <unknown function> + 0x3d93a5 (0x7ff926a353a5 in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
frame #18: <unknown function> + 0x130fac (0x7ff92678cfac in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libtorch_python.so)
<omitting python frames>
frame #26: THPFunction_apply(_object*, _object*) + 0x6b1 (0x7ff926a10301 in /home/ubuntu/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/lib/libtorch_python.so)

So, can anyone tell me which part is wrong?

Evaluating the pre-trained model on the ScanNet test set yields NaN results.

This is what I encounter when I run CUDA_VISIBLE_DEVICES=0 python test.py --config config/pointgroup_default_scannet.yaml --pretrain pointgroup.pth (note that the log below reports "Testing samples (val): 0", which presumably explains the NaN averages):

/app/util/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
[2021-04-19 10:14:10,618  INFO  log.py  line 40  36]  ************************ Start Logging ************************
[2021-04-19 10:14:10,630  INFO  test.py  line 32  36]  Namespace(TEST_NMS_THRESH=0.3, TEST_NPOINT_THRESH=100, TEST_SCORE_THRESH=0.09, batch_size=4, bg_thresh=0.25, block_reps=2, block_residual=True, classes=20, cluster_meanActive=50, cluster_npoint_thre=50, cluster_radius=0.03, cluster_shift_meanActive=300, config='config/pointgroup_default_scannet.yaml', data_root='dataset', dataset='scannetv2', dataset_dir='data/scannetv2_inst.py', epochs=384, eval=True, exp_path='exp/scannetv2/pointgroup/pointgroup_default_scannet', fg_thresh=0.75, filename_suffix='_inst_nostuff.pth', fix_module=[], full_scale=[128, 512], ignore_label=-100, input_channel=3, loss_weight=[1.0, 1.0, 1.0, 1.0], lr=0.001, m=16, manual_seed=123, max_npoint=250000, mode=4, model_dir='model/pointgroup/pointgroup.py', model_name='pointgroup', momentum=0.9, multiplier=0.5, optim='Adam', prepare_epochs=128, pretrain='pointgroup.pth', pretrain_module=[], pretrain_path=None, save_freq=16, save_instance=False, save_pt_offsets=False, save_semantic=False, scale=50, score_fullscale=14, score_mode=4, score_scale=50, split='val', step_epoch=384, task='test', test_epoch=384, test_seed=567, test_workers=16, train_workers=16, use_coords=True, weight_decay=0.0001)
[2021-04-19 10:14:10,631  INFO  test.py  line 188  36]  => creating model ...
[2021-04-19 10:14:10,631  INFO  test.py  line 189  36]  Classes: 20
[2021-04-19 10:14:11,659  INFO  test.py  line 200  36]  cuda available: True
[2021-04-19 10:25:29,652  INFO  test.py  line 205  36]  #classifier parameters (model): 7715016
[2021-04-19 10:25:29,661  INFO  utils.py  line 61  36]  Restore from pointgroup.pth
[2021-04-19 10:25:29,762  INFO  test.py  line 41  36]  >>>>>>>>>>>>>>>> Start Evaluation >>>>>>>>>>>>>>>>
[2021-04-19 10:25:29,844  INFO  scannetv2_inst.py  line 65  36]  Testing samples (val): 0
/app/util/eval.py:190: RuntimeWarning: Mean of empty slice
  avg_dict['all_ap']     = np.nanmean(aps[ d_inf,:,oAllBut25])
/app/util/eval.py:191: RuntimeWarning: Mean of empty slice
  avg_dict['all_ap_50%'] = np.nanmean(aps[ d_inf,:,o50])
/app/util/eval.py:192: RuntimeWarning: Mean of empty slice
  avg_dict['all_ap_25%'] = np.nanmean(aps[ d_inf,:,o25])
[2021-04-19 10:25:30,212  INFO  eval.py  line 274  36]  
[2021-04-19 10:25:30,212  INFO  eval.py  line 275  36]  ################################################################
[2021-04-19 10:25:30,212  INFO  eval.py  line 281  36]  what           :             AP         AP_50%         AP_25%
[2021-04-19 10:25:30,212  INFO  eval.py  line 282  36]  ################################################################
[2021-04-19 10:25:30,212  INFO  eval.py  line 292  36]  cabinet        :            nan            nan            nan
[2021-04-19 10:25:30,212  INFO  eval.py  line 292  36]  bed            :            nan            nan            nan
[2021-04-19 10:25:30,212  INFO  eval.py  line 292  36]  chair          :            nan            nan            nan
[2021-04-19 10:25:30,212  INFO  eval.py  line 292  36]  sofa           :            nan            nan            nan
[2021-04-19 10:25:30,212  INFO  eval.py  line 292  36]  table          :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 292  36]  door           :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 292  36]  window         :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 292  36]  bookshelf      :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 292  36]  picture        :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 292  36]  counter        :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 292  36]  desk           :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 292  36]  curtain        :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 292  36]  refrigerator   :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 292  36]  shower curtain :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 292  36]  toilet         :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 292  36]  sink           :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 292  36]  bathtub        :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 292  36]  otherfurniture :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 298  36]  ----------------------------------------------------------------
[2021-04-19 10:25:30,213  INFO  eval.py  line 303  36]  average        :            nan            nan            nan
[2021-04-19 10:25:30,213  INFO  eval.py  line 304  36]  

grad_output.contiguous()

I noticed that the author @llijiang changed "grad_output" to "grad_output.contiguous()" in spconv's functional.py. I am using SparseConvNet instead of spconv. Does anybody know what the equivalent change in the SparseConvNet code would be?
Thanks.

Crash when no proposals are found

Hey @llijiang !
Thanks for sharing this implementation with us; it has really helped my project.
I noticed that the training crashes when no proposals are found before calling clusters_voxelization; specifically, the crash occurs at the following line, as it is given an empty tensor:
https://github.com/Jia-Research-Lab/PointGroup/blob/a41c2cd22808861f0d4986b2825875c10bb661e5/model/pointgroup/pointgroup.py#L223

Printed Error : RuntimeError: cannot perform reduction function max on tensor with no elements because the operation does not have an identity
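
The workaround I am considering is a guard inside forward, around the call above (my own sketch, not the authors' fix; it assumes proposals_idx / proposals_offset are the tensors produced by the clustering step):

if proposals_idx.shape[0] == 0:
    # no proposals in this batch: return empty scores and skip voxelization
    scores = torch.zeros((0, 1), device=output_feats.device)
else:
    input_feats, inp_map = self.clusters_voxelization(
        proposals_idx, proposals_offset, output_feats, coords,
        self.score_fullscale, self.score_scale, self.mode)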

Have you ever experienced this issue? If yes, what are your recommendations for overcoming it?

Thanks,
Chayan

Training with own data

Hi, thank you for giving me new insights into my research.

I'm trying to train on my own point cloud data (with only one channel, not RGB), but I couldn't find what needs to be modified.

Actually, I don't have the ScanNet data right now, so I can't inspect the structure of the ScanNet data.

So, could you let me know the proper form of the data (label) structure, and what should be modified in the code to train on 1-channel data?
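
From reading the code, my understanding (a sketch from memory, so please verify against model/pointgroup/pointgroup.py) is that the per-point features are assembled roughly as below; a 1-channel input would mean feats of width 1, with input_channel set to 1 in the yaml (the 3 coordinate channels are added on top when use_coords is True):

import torch

N = 1000
intensity = torch.randn(N, 1)      # my 1-channel input instead of (N, 3) RGB
coords_float = torch.randn(N, 3)
use_coords = True

feats = intensity
if use_coords:
    feats = torch.cat((feats, coords_float), 1)   # feats width becomes 1 + 3 = 4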

Error when building PG_OP

Hi all,

I met an error when compiling the pointgroup_ops library:

$ python setup.py develop                                                              
running develop
running egg_info
writing PG_OP.egg-info/PKG-INFO
writing dependency_links to PG_OP.egg-info/dependency_links.txt
writing top-level names to PG_OP.egg-info/top_level.txt
reading manifest file 'PG_OP.egg-info/SOURCES.txt'
writing manifest file 'PG_OP.egg-info/SOURCES.txt'
running build_ext
Traceback (most recent call last):
  File "setup.py", line 14, in <module>
    cmdclass={'build_ext': BuildExtension}
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/site-packages/setuptools/__init__.py", line 153, in setup
    return distutils.core.setup(**attrs)
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/distutils/core.py", line 148, in setup
    dist.run_commands()
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/distutils/dist.py", line 966, in run_commands
    self.run_command(cmd)
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/site-packages/setuptools/command/develop.py", line 34, in run
    self.install_for_development()
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/site-packages/setuptools/command/develop.py", line 136, in install_for_development
    self.run_command('build_ext')
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/distutils/cmd.py", line 313, in run_command
    self.distribution.run_command(command)
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/distutils/dist.py", line 985, in run_command
    cmd_obj.run()
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/site-packages/setuptools/command/build_ext.py", line 79, in run
    _build_ext.run(self)
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/distutils/command/build_ext.py", line 340, in run
    self.build_extensions()
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 232, in build_extensions
    self._check_abi()
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 370, in _check_abi
    check_compiler_abi_compatibility(compiler)
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 162, in check_compiler_abi_compatibility
    if not check_compiler_ok_for_platform(compiler):
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/site-packages/torch/utils/cpp_extension.py", line 138, in check_compiler_ok_for_platform
    which = subprocess.check_output(['which', compiler], stderr=subprocess.STDOUT)
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/subprocess.py", line 411, in check_output
    **kwargs).stdout
  File "/home/helinxu/anaconda3/envs/partnet/lib/python3.7/subprocess.py", line 512, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['which', '/home/helinxu/anaconda3/envs/partnet/bin/x86_64-conda_cos6-linux-gnu-c++']' returned non-zero exit status 1.

I've tried everything I could find, including conda install gxx_linux-64, which led to another error.

Thanks in advance!

CUDNN_STATUS_EXECUTION_FAILED

I ran the training code and met the following bug:

RuntimeError: cuDNN error: CUDNN_STATUS_EXECUTION_FAILED

When I add the line torch.backends.cudnn.enabled = False to the training code, it runs OK.

What's wrong? I have tried PyTorch 1.1 + CUDA 9.0 and PyTorch 1.4 + CUDA 10.1, with the same error in both cases.

Also, the total training time is predicted to be 53 hours according to the printed information, and GPU memory consumption jumps between about 2000 MB and 4500 MB. Is this normal?

Thanks!

How much GPU memory do I need to train the model?

RuntimeError: CUDA out of memory. Tried to allocate 93.50 GiB (GPU 0; 7.79 GiB total capacity; 31.26 MiB already allocated; 6.08 GiB free; 12.74 MiB cached) (malloc at /pytorch/c10/cuda/CUDACachingAllocator.cpp:267)
I have changed the batch size to 1, but I still run out of CUDA memory.

Support PyTorch=1.5

Hi.
Currently the pointgroup library doesn't compile correctly with PyTorch 1.5.
I get the following errors when I try to compile:

In file included from src/pointgroup_ops.cpp:8:0:
src/bfs_cluster/bfs_cluster.cpp:32:58: error: ‘THCState_getCurrentStream’ was not declared in this scope

src/bfs_cluster/bfs_cluster.cpp:9:80: error: ‘AT_CHECK’ was not declared in this scope

These two errors can be easily solved to make the pointgroup library compatible with PyTorch 1.5. In bfs_cluster.cpp, the following changes need to be made:

  1. Replace THCState_getCurrentStream with at::cuda::getCurrentCUDAStream().
    https://stackoverflow.com/questions/55919123/cuda-for-pytorch-cuda-c-stream-and-state
  2. Replace AT_CHECK with TORCH_CHECK.

With the above-mentioned changes, I was able to compile the pointgroup library with PyTorch 1.5 and CUDA 10.2.
Thanks.

CUDA memory problem when applying to outdoor scenes

Hello, I'm on the staff of a research institute. I read your paper and find it a great model for indoor instance segmentation. When I try to apply it to outdoor road point cloud data, where the bounding box (spatial extent) of a scene is about 200~300 meters, I run into several problems. The fatal one: when I set the scale as large as your indoor setting of 50 (voxel size 1/50), CUDA memory runs out; when I run with scale = 1 instead, the results are poor and low-resolution. I tried to crop our scenes down to room size, but then semantic context is lost and performance is not good either. Could you give some advice? If I solve this problem and get it working on outdoor data, I'm happy to share it on GitHub if you are willing too. Thanks.
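
For concreteness, here is a rough sketch of the overlapped block cropping I tried (block and overlap sizes are illustrative, not tuned values):

import numpy as np

def crop_blocks(points, block=20.0, overlap=5.0):
    # points: (N, C) with xyz in the first three columns; yields xy blocks of
    # size `block` metres, stepping by block - overlap to keep some context
    xy_min = points[:, :2].min(0)
    xy_max = points[:, :2].max(0)
    step = block - overlap
    for x0 in np.arange(xy_min[0], xy_max[0], step):
        for y0 in np.arange(xy_min[1], xy_max[1], step):
            mask = ((points[:, 0] >= x0) & (points[:, 0] < x0 + block) &
                    (points[:, 1] >= y0) & (points[:, 1] < y0 + block))
            if mask.any():
                yield points[mask]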

Why are the predicted_masks txt files all 0?

I tested the given pretrained model on the ScanNet v2 dataset with save_instance=True. However, the content of all the generated txt files is 0, without exception.

Where are the instance segmentation results for the input point cloud?

fnv-1 hash definition

I found the following code snippet in 'datatype.h':

template <class Point> struct IntArrayHash {
    std::size_t operator()(Point const &p) const {
        Int hash = 16777619;
        for (auto x : p) {
            hash *= 2166136261;
            hash ^= x;
        }
        return hash;
    }
};

According to the definition of the FNV-1 hash, shouldn't the offset basis be 2166136261 and the prime be 16777619?
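
For comparison, a minimal FNV-1 (32-bit) reference in Python with the standard constants (offset basis 2166136261, prime 16777619):

def fnv1_32(data: bytes) -> int:
    h = 2166136261                        # offset basis
    for b in data:
        h = (h * 16777619) & 0xFFFFFFFF   # FNV-1 multiplies by the prime first,
        h ^= b                            # then XORs in the next byte
    return h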

RuntimeError: cublas runtime error

Hi all,

Has anyone used a 2080 Ti for this project?

environment

  • 2080 Ti
  • python 3.7.0
  • pytorch 1.1.0
  • CUDA 9.0
  • gcc 5.4

I set up the project exactly as suggested in the README and prepared the data properly, but received this error when running the default train script:

$ CUDA_VISIBLE_DEVICES=0 python train.py --config config/pointgroup_run1_scannet.yaml 

/data2/helin/1/PointGroup/util/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
  config = yaml.load(f)
[2021-09-04 20:51:03,613  INFO  log.py  line 40  907024]  ************************ Start Logging ************************
[2021-09-04 20:51:03,641  INFO  train.py  line 26  907024]  Namespace(TEST_NMS_THRESH=0.3, TEST_NPOINT_THRESH=100, TEST_SCORE_THRESH=0.09, batch_size=4, bg_thresh=0.25, block_reps=2, block_residual=True, classes=22, cluster_meanActive=50, cluster_npoint_thre=50, cluster_radius=0.03, cluster_shift_meanActive=300, config='config/pointgroup_run1_scannet.yaml', data_root='dataset', dataset='scannetv2', dataset_dir='data/scannetv2_inst.py', epochs=384, eval=True, exp_path='exp/scannetv2/pointgroup/pointgroup_run1_scannet', fg_thresh=0.75, filename_suffix='_inst_nostuff.pth', fix_module=[], full_scale=[128, 512], ignore_label=-100, input_channel=3, loss_weight=[1.0, 1.0, 1.0, 1.0], lr=0.001, m=16, manual_seed=123, max_npoint=250000, mode=4, model_dir='model/pointgroup/pointgroup.py', model_name='pointgroup', momentum=0.9, multiplier=0.5, optim='Adam', prepare_epochs=128, pretrain='', pretrain_module=[], pretrain_path=None, save_freq=16, save_instance=False, save_pt_offsets=False, save_semantic=False, scale=50, score_fullscale=14, score_mode=4, score_scale=50, split='val', step_epoch=384, task='train', test_epoch=384, test_seed=567, test_workers=16, train_workers=16, use_coords=True, weight_decay=0.0001)
[2021-09-04 20:51:03,648  INFO  train.py  line 135  907024]  => creating model ...
[2021-09-04 20:51:03,946  INFO  train.py  line 148  907024]  cuda available: False
[2021-09-04 20:51:08,453  INFO  train.py  line 153  907024]  #classifier parameters: 7715050
[2021-09-04 20:51:08,759  INFO  scannetv2_inst.py  line 43  907024]  Training samples: 4
[2021-09-04 20:51:08,784  INFO  scannetv2_inst.py  line 54  907024]  Validation samples: 1
Traceback (most recent call last):
  File "train.py", line 180, in <module>
    train_epoch(dataset.train_data_loader, model, model_fn, optimizer, epoch)
  File "train.py", line 54, in train_epoch
    loss, _, visual_dict, meter_dict = model_fn(batch, model, epoch)
  File "/data2/helin/1/PointGroup/model/pointgroup/pointgroup.py", line 398, in model_fn
    ret = model(input_, p2v_map, coords_float, coords[:, 0].int(), batch_offsets, epoch)
  File "/home/helinxu/anaconda3/envs/pointgroup2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/data2/helin/1/PointGroup/model/pointgroup/pointgroup.py", line 264, in forward
    output = self.input_conv(input)
  File "/home/helinxu/anaconda3/envs/pointgroup2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/helinxu/anaconda3/envs/pointgroup2/lib/python3.7/site-packages/spconv/modules.py", line 123, in forward
    input = module(input)
  File "/home/helinxu/anaconda3/envs/pointgroup2/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/helinxu/anaconda3/envs/pointgroup2/lib/python3.7/site-packages/spconv/conv.py", line 157, in forward
    outids.shape[0])
  File "/home/helinxu/anaconda3/envs/pointgroup2/lib/python3.7/site-packages/spconv/functional.py", line 83, in forward
    return ops.indice_conv(features, filters, indice_pairs, indice_pair_num, num_activate_out, False, True)
  File "/home/helinxu/anaconda3/envs/pointgroup2/lib/python3.7/site-packages/spconv/ops.py", line 112, in indice_conv
    int(inverse), int(subm))
RuntimeError: cublas runtime error : the GPU program failed to execute at /tmp/pip-req-build-cbsmv48q/aten/src/THC/THCBlas.cu:259

I looked up the error message RuntimeError: cublas runtime error : the GPU program failed to execute at /tmp/pip-req-build-cbsmv48q/aten/src/THC/THCBlas.cu:259 a bit, and I suppose it has something to do with the GPU I am using, a 2080 Ti. (However, I'm not quite sure about that; it may have been caused by other reasons.)

Could anyone give me some suggestions? Also, which GPU are you using to get it through?

Thanks!

RuntimeError: cuda runtime error (9)

Dear Li Jiang,

Thanks a lot for your great work!
I successfully ran the training code, but when I run "python test.py --config config/pointgroup_run1_scannet.yaml", I get the following error:

/cluster/scratch/bxiang/PointGroup/util/config.py:20: YAMLLoadWarning: calling yaml.load() without Loader=... is deprecated, as the default Loader is unsafe. Please read https://msg.pyyaml.org/load for full details.
config = yaml.load(f)
[2020-12-12 23:02:48,713 INFO log.py line 40 12439] ************************ Start Logging ************************
[2020-12-12 23:02:49,155 INFO test.py line 32 12439] Namespace(TEST_NMS_THRESH=0.3, TEST_NPOINT_THRESH=100, TEST_SCORE_THRESH=0.09, batch_size=4, bg_thresh=0.25, block_reps=2, block_residual=True, classes=20, cluster_meanActive=50, cluster_npoint_thre=50, cluster_radius=0.03, cluster_shift_meanActive=300, config='config/pointgroup_run1_scannet.yaml', data_root='dataset', dataset='scannetv2', dataset_dir='data/scannetv2_inst.py', epochs=384, eval=True, exp_path='exp/scannetv2/pointgroup/pointgroup_run1_scannet', fg_thresh=0.75, filename_suffix='_inst_nostuff.pth', fix_module=[], full_scale=[128, 512], ignore_label=-100, input_channel=3, loss_weight=[1.0, 1.0, 1.0, 1.0], lr=0.001, m=16, manual_seed=123, max_npoint=250000, mode=4, model_dir='model/pointgroup/pointgroup.py', model_name='pointgroup', momentum=0.9, multiplier=0.5, optim='Adam', prepare_epochs=128, pretrain='', pretrain_module=[], pretrain_path=None, save_freq=16, save_instance=False, save_pt_offsets=False, save_semantic=False, scale=50, score_fullscale=14, score_mode=4, score_scale=50, split='val', step_epoch=384, task='test', test_epoch=182, test_seed=567, test_workers=16, train_workers=16, use_coords=True, weight_decay=0.0001)
[2020-12-12 23:02:49,158 INFO test.py line 188 12439] => creating model ...
[2020-12-12 23:02:49,159 INFO test.py line 189 12439] Classes: 20
[2020-12-12 23:02:49,791 INFO test.py line 200 12439] cuda available: True
[2020-12-12 23:02:57,609 INFO test.py line 205 12439] #classifier parameters (model): 7715016
[2020-12-12 23:02:57,650 INFO utils.py line 63 12439] Restore from exp/scannetv2/pointgroup/pointgroup_run1_scannet/pointgroup_run1_scannet-000000182.pth
[2020-12-12 23:02:58,077 INFO test.py line 41 12439] >>>>>>>>>>>>>>>> Start Evaluation >>>>>>>>>>>>>>>>
[2020-12-12 23:03:56,509 INFO scannetv2_inst.py line 65 12439] Testing samples (val): 312
exp/scannetv2/pointgroup/pointgroup_run1_scannet
pointgroup_run1_scannet
[2020-12-12 23:04:02,410 INFO test.py line 157 12439] instance iter: 1/312 point_num: 237360 ncluster: 25 time: total 5.90s inference 0.68s save 0.00s
[2020-12-12 23:04:03,179 INFO test.py line 157 12439] instance iter: 2/312 point_num: 239261 ncluster: 27 time: total 0.77s inference 0.51s save 0.00s
[2020-12-12 23:04:03,852 INFO test.py line 157 12439] instance iter: 3/312 point_num: 217086 ncluster: 24 time: total 0.67s inference 0.41s save 0.00s
...........
...........
...........
[2020-12-12 23:05:44,345 INFO test.py line 157 12439] instance iter: 253/312 point_num: 100286 ncluster: 11 time: total 0.43s inference 0.21s save 0.00s
[2020-12-12 23:05:44,831 INFO test.py line 157 12439] instance iter: 254/312 point_num: 284951 ncluster: 20 time: total 0.49s inference 0.32s save 0.00s
[2020-12-12 23:05:45,443 INFO test.py line 157 12439] instance iter: 255/312 point_num: 331565 ncluster: 19 time: total 0.61s inference 0.36s save 0.00s
[2020-12-12 23:05:45,739 INFO test.py line 157 12439] instance iter: 256/312 point_num: 139138 ncluster: 9 time: total 0.30s inference 0.21s save 0.00s
[2020-12-12 23:05:46,136 INFO test.py line 157 12439] instance iter: 257/312 point_num: 159475 ncluster: 24 time: total 0.40s inference 0.22s save 0.00s
[2020-12-12 23:05:46,485 INFO test.py line 157 12439] instance iter: 258/312 point_num: 133857 ncluster: 10 time: total 0.35s inference 0.23s save 0.00s
[2020-12-12 23:05:46,776 INFO test.py line 157 12439] instance iter: 259/312 point_num: 129951 ncluster: 6 time: total 0.29s inference 0.22s save 0.00s
[2020-12-12 23:05:47,052 INFO test.py line 157 12439] instance iter: 260/312 point_num: 110960 ncluster: 7 time: total 0.28s inference 0.22s save 0.00s
[2020-12-12 23:05:47,452 INFO test.py line 157 12439] instance iter: 261/312 point_num: 184428 ncluster: 14 time: total 0.40s inference 0.30s save 0.00s
[2020-12-12 23:05:47,903 INFO test.py line 157 12439] instance iter: 262/312 point_num: 258402 ncluster: 13 time: total 0.45s inference 0.29s save 0.00s
THCudaCheck FAIL file=/pytorch/aten/src/THC/generic/THCTensorMath.cu line=35 error=9 : invalid configuration argument
Traceback (most recent call last):
File "test.py", line 214, in
test(model, model_fn, data_name, cfg.test_epoch)
File "test.py", line 63, in test
preds = model_fn(batch, model, epoch)
File "/cluster/scratch/bxiang/PointGroup/model/pointgroup/pointgroup.py", line 351, in test_model_fn
ret = model(input_, p2v_map, coords_float, coords[:, 0].int(), batch_offsets, epoch)
File "/cluster/home/bxiang/miniconda3/envs/pointgroup/lib/python3.7/site-packages/torch/nn/modules/module.py", line 493, in call
result = self.forward(*input, **kwargs)
File "/cluster/scratch/bxiang/PointGroup/model/pointgroup/pointgroup.py", line 310, in forward
input_feats, inp_map = self.clusters_voxelization(proposals_idx, proposals_offset, output_feats, coords, self.score_fullscale, self.score_scale, self.mode)
File "/cluster/scratch/bxiang/PointGroup/model/pointgroup/pointgroup.py", line 220, in clusters_voxelization
clusters_coords_min = pointgroup_ops.sec_min(clusters_coords, clusters_offset.cuda()) # (nCluster, 3), float
File "/cluster/scratch/bxiang/PointGroup/lib/pointgroup_ops/functions/pointgroup_ops.py", line 299, in forward
out = torch.cuda.FloatTensor(nProposal, C).zero_()
RuntimeError: cuda runtime error (9) : invalid configuration argument at /pytorch/aten/src/THC/generic/THCTensorMath.cu:35

Could you please give me some idea of what the problem is and how to fix it? Thanks a lot!

Link error while building pointgroup_ops library

After crossing a few hurdles, I have reached a stage on Windows where all three files listed in setup.py compile fine. Now only the following link error remains. @llijiang or others: any idea about the cause or a fix?
Thanks!

Creating library build\temp.win-amd64-3.7\Release\src\PG_OP.cp37-win_amd64.lib and object build\temp.win-amd64-3.7\Release\src\PG_OP.cp37-win_amd64.exp
pointgroup_ops.obj : error LNK2001: unresolved external symbol "public: long * __cdecl at::Tensor::data_ptr<long>(void)const " (??$data_ptr@J@Tensor@at@@QEBAPEAJXZ)
build\lib.win-amd64-3.7\PG_OP.cp37-win_amd64.pyd : fatal error LNK1120: 1 unresolved externals

Support spconv=1.1 ?

Hi.
I was wondering if you are planning to add support for spconv 1.1. spconv 1.1 seems to compile smoothly with Python 3.7, CUDA 10.2 and PyTorch 1.5.
spconv 1.0 doesn't seem to work with PyTorch 1.5.
Perhaps the changes in the spconv/functional.py code are sufficient to ensure compatibility with the PointGroup code.
Thanks.

Query Loading .pcd files for Dataset

Hi,

I am quite new to this field (undergrad). From what I can tell, the net result of loading the four required ScanNet files is xyz, rgb, labels and instance ids.

I would like to make my own dataset using the Hitachi Automotive Labs editor, which produces labelled .pcd files (basically an array of x, y, z, r, g, b, label, instance); that seems perfect to go into PointGroup. However, I'm a bit lost in the overall code structure.

I would appreciate it if anyone could tell me where/how I could do away with the ScanNet format and load .pcd files instead.

Thank you for your response.
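
For anyone in the same spot, here is the direction I am considering: a hypothetical converter that turns the (N, 8) per-point array into the per-scene tuple that the ScanNet preprocessing appears to save (the tuple layout and normalization are my assumptions from reading the prepare_data scripts, so please verify them):

import numpy as np
import torch

def convert_scene(points: np.ndarray, out_path: str) -> None:
    # points: (N, 8) array of [x, y, z, r, g, b, label, instance]
    coords = np.ascontiguousarray(points[:, :3] - points[:, :3].mean(0))  # centred xyz
    colors = np.ascontiguousarray(points[:, 3:6]) / 127.5 - 1.0           # rgb -> [-1, 1]
    sem_labels = points[:, 6].astype(np.int64)                            # semantic label per point
    instance_labels = points[:, 7].astype(np.int64)                       # instance id per point
    torch.save((coords, colors, sem_labels, instance_labels), out_path)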
