pct's Introduction

PCT: Point Cloud Transformer

This is a Jittor implementation of PCT: Point Cloud Transformer.

Paper link: https://arxiv.org/pdf/2012.09688.pdf

News :

  • 2021.3.31: We added a simple position embedding to each self-attention layer, which gives a more stable training process and 93.3% accuracy (best of 5 runs) on the ModelNet40 dataset. The classification network code has been updated; a sketch of the idea follows below.
  • 2021.3.29: PCT has been accepted by the Computational Visual Media Journal (CVMJ).
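
For reference, here is a minimal sketch (in PyTorch, for illustration; the repo itself uses Jittor) of what a simple position embedding in each self-attention layer might look like, assuming it is a learned pointwise projection of the xyz coordinates added to the layer input. The pos_xyz name mirrors an attribute mentioned in the issues below; the exact implementation may differ.

    import torch
    import torch.nn as nn

    class PositionEmbedding(nn.Module):
        """Project raw xyz coordinates to the feature width and add them
        to the features entering a self-attention (SA) layer."""
        def __init__(self, channels):
            super().__init__()
            self.pos_xyz = nn.Conv1d(3, channels, kernel_size=1)

        def forward(self, x, xyz):
            # x:   (B, C, N) point features entering the SA layer
            # xyz: (B, 3, N) raw coordinates of the same points
            return x + self.pos_xyz(xyz)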

Abstract

The irregular domain and lack of ordering make it challenging to design deep neural networks for point cloud processing. This paper presents a novel framework named Point Cloud Transformer (PCT) for point cloud learning. PCT is based on the Transformer, which has achieved huge success in natural language processing and displays great potential in image processing. It is inherently permutation invariant when processing a sequence of points, making it well suited to point cloud learning. To better capture local context within the point cloud, we enhance input embedding with the support of farthest point sampling and nearest neighbor search. Extensive experiments demonstrate that PCT achieves state-of-the-art performance on shape classification, part segmentation and normal estimation tasks.

Architecture

[Figure: the PCT architecture]

Jittor

Jittor is a high-performance deep learning framework that is easy to learn and use. It provides PyTorch-like interfaces.

You can learn how to use Jittor from the following links:

Jittor homepage: https://cg.cs.tsinghua.edu.cn/jittor/

Jittor github: https://github.com/Jittor/jittor

If you have any questions about Jittor, you can ask in the Jittor developer QQ group: 761222083

Other implementations

Version 1: https://github.com/Strawberry-Eat-Mango/PCT_Pytorch (PyTorch version with classification acc 93.2% on ModelNet40)
Version 2: https://github.com/qq456cvb/Point-Transformers (PyTorch version with classification acc 92.6% on ModelNet40)

If you want to reproduce the part segmentation results, you can refer to this: https://github.com/AnTao97/dgcnn.pytorch

Citation

If it is helpful for your work, please cite this paper:

@article{Guo_2021,
   title={PCT: Point cloud transformer},
   volume={7},
   ISSN={2096-0662},
   url={http://dx.doi.org/10.1007/s41095-021-0229-5},
   DOI={10.1007/s41095-021-0229-5},
   number={2},
   journal={Computational Visual Media},
   publisher={Springer Science and Business Media LLC},
   author={Guo, Meng-Hao and Cai, Jun-Xiong and Liu, Zheng-Ning and Mu, Tai-Jiang and Martin, Ralph R. and Hu, Shi-Min},
   year={2021},
   month={Apr},
   pages={187–199}
}

pct's Issues

Question about weight value of key and query in SA Layer

Hello,

I really appreciate your creative work.

However, I would like to know why you used the same weights when initializing the q_conv and k_conv kernels.

As far as I know, they don't have to be identical.

Is there any reason?

Thank you for your work again.

The parameter of the CosineAnnealingLR

Hi, thank you for releasing the code! However, I could not find the details of the CosineAnnealingLR. Could you tell me how to set T_max and other attributes for this scheduler?
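
For anyone else wondering, a common configuration (an assumption based on the training setups reported in the issues below, not the authors' confirmed settings) is to set T_max to the total number of training epochs and step the scheduler once per epoch; in PyTorch:

    import torch

    model = torch.nn.Linear(3, 40)   # stand-in for the PCT classifier
    epochs = 250                     # assumed total epoch count

    optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                                momentum=0.9, weight_decay=1e-4)
    # The LR follows a cosine curve from 0.01 down to eta_min over T_max epochs.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs, eta_min=1e-4)

    for epoch in range(epochs):
        # ... run one training epoch here ...
        scheduler.step()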

Question about the seg file seemingly lacking the local feature representation

Dear MenghaoGuo,

Thank you for your code. In Section 3.4 of your paper I see the local feature representation, but it seems this operation only exists in the cls code; the seg file does not contain the SG layer mentioned in the paper.
Is the local feature unnecessary for the segmentation task, or is the global feature already sufficient for segmentation?
I hope you can answer my question.

Thanks again!

Question about training

Hello Guo! I found that many more epochs were needed in the part_seg task when I replaced the model in PointNet++ with PCT. (PointNet++ code from https://github.com/yanx27/Pointnet_Pointnet2_pytorch)

PointNet++ needs fewer than 50 epochs, while PCT needs more than 500. Is this caused by model complexity? Did you observe the same issue when training?

I want to know how you visualize your attention map

Hi,
First of all, I want to thank you for the proposed method, which benefited me a lot. I reproduced your code in PyTorch and tried to visualize the attention map in the part segmentation task, but when I use the right wing as the query point, it does not attend to the left wing as in the visualization in your paper. So I would like to know how you produced the visualization results shown in the paper.

In addition, another issue points out that the softmax dimension may be wrong: since the multiplication is Value * Attention, I think the softmax dimension in Attention should be 1, not -1 (or 2); please correct me if I am mistaken. Also, the softmax and the L1 norm are applied along different dimensions (softmax uses -1 but the L1 norm uses 1); why?
Line 211: self.softmax = nn.Softmax(dim=-1)
Line 220: attention = attention / (1e-9 + attention.sum(dim=1, keepdims=True))
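
For context, here is a simplified sketch (in PyTorch, my reading of the released SA layer rather than the verbatim code) of the two normalizations being discussed: a softmax over the last dimension of the scores, followed by the L1 renormalization over dim 1 from line 220:

    import torch
    import torch.nn.functional as F

    B, C, N = 2, 64, 128
    q = torch.randn(B, N, C // 4)   # queries (B, N, C/4)
    k = torch.randn(B, C // 4, N)   # keys    (B, C/4, N)
    v = torch.randn(B, C, N)        # values  (B, C, N)

    energy = torch.bmm(q, k)                  # (B, N, N) raw attention scores
    attention = F.softmax(energy, dim=-1)     # softmax over the last dim (line 211)
    # L1 renormalization over dim 1 so each column sums to ~1 (line 220):
    attention = attention / (1e-9 + attention.sum(dim=1, keepdim=True))
    x_r = torch.bmm(v, attention)             # Value * Attention -> (B, C, N)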

Also, I would like to know how you do neighbor embedding in part segmentation. The paper says the number of output points is N, which suggests you did not downsample the points yet still applied the SG (sampling and grouping) module twice. But when I reproduced the same method, I got a CUDA out-of-memory error on an RTX 2080 Ti (12 GB VRAM). Is my VRAM not big enough, or do I misunderstand the paper's description?

I'm looking forward to your reply, and thank you for your contribution.

What is `self.pos_xyz(xyz)`

Hello, thanks for your work!
I am reading the code but get confused about some details.
In the class Point_Transformer_Last, self.pos_xyz is not defined anywhere in the file.
What does this operation do specifically, and where can I find its definition?
I'd appreciate your help!

Queries about permutation invariance of Transformer

Hi Menghao,

Thank you for sharing the interesting paper!

I have some queries about the definition of permutation invariance. In my opinion, although self-attention computes global contextual information and aggregates features via a weighted summation, the resulting features are still tied to the point order (though the feature of any given point seems to be invariant). I also notice the implementation of the max-pooling strategy, which was proposed in PointNet to guarantee invariance.

So I wonder how you define permutation invariance, because it seems attention by itself can hardly guarantee global invariance. Thank you very much!

Rui
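
As a concrete check of the distinction being raised, the following sketch (my own illustration with identity projections, not the PCT code) shows that self-attention is permutation-equivariant, i.e. per-point outputs are reordered along with the inputs, while a max-pool over points yields an identical, permutation-invariant global feature:

    import torch
    import torch.nn.functional as F

    torch.manual_seed(0)
    x = torch.randn(1, 64, 128)                    # (B, C, N) point features
    perm = torch.randperm(128)

    def self_attention(x):
        q = x.transpose(1, 2)                      # (B, N, C), identity projections
        attn = F.softmax(torch.bmm(q, x), dim=-1)  # (B, N, N)
        return torch.bmm(x, attn.transpose(1, 2))  # (B, C, N)

    y, y_p = self_attention(x), self_attention(x[:, :, perm])
    print(torch.allclose(y[:, :, perm], y_p, atol=1e-5))      # True: equivariant
    print(torch.allclose(y.max(dim=2).values,
                         y_p.max(dim=2).values, atol=1e-5))   # True: invariant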

local feature

I found there isn't any description of how the local feature embedding layer combines with the SA layer. What does the final PCT architecture look like?

The performance of segmentation

Thank you for selflessly sharing the model code! I recently reproduced segmentation based on the provided seg code; however, the class mean IoU is only 78.8%. Could you please offer some suggestions?

About the license

Dear authors,

Thank you for sharing your code and contributions. I would like to use your code in my project, but there is no LICENSE present in your repository. May I ask in what ways we may use or modify the code? It would be great if you could add a license to the repository.

Thank you so much!

Reshape operation in Local_op

In the Local_op module, it seems you reshape the 4-dimensional feature from the sample_and_group module, [batch, npoint, nsample, features], to [batch * npoint, features, nsample], then feed it into a 1D convolution. After that, you keep the largest feature, [batch * npoint, features, 1], and reshape it to [batch, features, npoint].
In my opinion, this is the most significant difference between your work and other PointNet-series articles. Could you explain the effectiveness of this?
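
To make the operation concrete, here is a sketch of the reshape-and-pool pattern as described above (shapes follow the question; this is an illustration, not the verbatim repo code):

    import torch
    import torch.nn as nn

    B, npoint, nsample, C = 8, 256, 32, 64
    feat = torch.randn(B, npoint, nsample, C)   # output of sample_and_group

    conv = nn.Conv1d(C, 128, kernel_size=1)

    # Fold the npoint axis into the batch so every local neighborhood becomes
    # an independent length-nsample "sequence" for the 1D convolution:
    x = feat.permute(0, 1, 3, 2).reshape(B * npoint, C, nsample)
    x = torch.relu(conv(x))                          # (B * npoint, 128, nsample)
    x = x.max(dim=-1).values                         # keep the strongest response
    x = x.reshape(B, npoint, 128).permute(0, 2, 1)   # back to (B, 128, npoint)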

Segmentation model detail

Hello, thank you for releasing the code!
I was wondering if you could tell me the model size (MB) and the number of parameters for PCT-2L and PCT-3L?
Thank you!

About SA_Layer

When the convolution weights of q and k are initialized to be equal,
self.q_conv.conv.weight = self.k_conv.conv.weight
will it cause q and k to always remain the same as they are updated?
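
For what it's worth, assigning the Parameter object itself (rather than copying its values) ties the two layers to one underlying tensor, so q and k do remain identical as training proceeds. A minimal PyTorch check:

    import torch
    import torch.nn as nn

    q_conv = nn.Conv1d(64, 16, 1, bias=False)
    k_conv = nn.Conv1d(64, 16, 1, bias=False)
    q_conv.weight = k_conv.weight   # ties the parameters, not just the init

    x = torch.randn(2, 64, 128)
    (q_conv(x).sum() - 2 * k_conv(x).sum()).backward()

    # Both branches accumulate gradients into the single shared Parameter,
    # so any optimizer step updates q and k identically.
    print(q_conv.weight is k_conv.weight)            # True
    print(q_conv.weight.grad is k_conv.weight.grad)  # True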

Visualization

Thanks for your great work!
Here, I still do not understand the steps you mentioned: (1) choose a query point i;
(2) convert the attention value A[i, j] to a color intensity and save both the position and color of every point j as a .txt file;
(3) open the saved .txt file in MeshLab to render the point cloud.

Could you please share your visualization.py?

Many Thanks!

About Segmentation Classification Label

Thanks for your open source.
I recently used your code to train a tooth segmentation model: I want to segment a whole set of teeth (gum plus teeth) into 4 separate tooth parts. Since teeth are just one object class, can I drop your one-hot label conversion part during training? Right now the IoU is only 80%; can you give me any suggestions?

THANKS!

bugs

When running PCT:

self.sa1 = SA_Layer(channels)
  File "/home/ssd/zjl/zjl/point/PointCloudLib/networks/cls/pct.py", line 205, in __init__
    self.q_conv.conv.weight = self.k_conv.conv.weight
AttributeError: 'Conv1d' object has no attribute 'conv'
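
A likely cause (my assumption from the traceback, not a confirmed fix): this SA_Layer was written for a wrapper module that keeps its convolution in a .conv attribute, while a plain Conv1d exposes its kernel directly as .weight. Dropping the intermediate attribute makes the weight tie work, e.g. in PyTorch:

    import torch.nn as nn

    class SA_Layer(nn.Module):
        def __init__(self, channels):
            super().__init__()
            self.q_conv = nn.Conv1d(channels, channels // 4, 1, bias=False)
            self.k_conv = nn.Conv1d(channels, channels // 4, 1, bias=False)
            # A plain nn.Conv1d stores its kernel directly in .weight:
            self.q_conv.weight = self.k_conv.weight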

Model Question

Hi, thanks for your wonderful work.
And I have some questions about models in /networks/cls/pct.py.

  1. Does Point_Transformer on line 86 correspond to a model without neighbor embedding?
  2. Does Point_Transformer2 on line 34 correspond to the final model of the paper?
  3. Does pos_xyz correspond to the positional embedding, and why do we use it?

Maybe I have a wrong understanding; could you give me some advice? Thanks!

Use of bias in value layer

Hello,

very interesting paper, and nice to publish parts of the code along with it!

A couple of questions:

  • I was wondering why the layer calculating the values, self.v_conv, has a bias attached to it. Looking at other attention implementations, it seems those mostly exclude the bias (as you also do for the keys and queries). Did you see any improvement from adding a bias there?

  • Is there any reason for setting the initial weights of the key and query layers to be equal?

  • In the paper, you mention making use of Farthest Point Sampling (FPS) for the neighbor embedding module, but before sampling you embed the raw 3-dimensional point coordinates in a higher-dimensional space. Do you perform FPS in the full 64-dimensional space, or in the 3-dimensional one?
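
On that last point, note that greedy FPS is dimension-agnostic, so it can run on either the raw xyz coordinates or the 64-dimensional embeddings; which one the repo uses is exactly the open question. A minimal NumPy sketch of FPS (my own illustration, not the repo's implementation):

    import numpy as np

    def farthest_point_sampling(points, n_samples):
        """Greedy FPS: repeatedly pick the point farthest from those chosen.

        points: (N, D) array; D = 3 for raw xyz, D = 64 for embedded features.
        """
        n = points.shape[0]
        chosen = np.zeros(n_samples, dtype=np.int64)
        min_dist = np.full(n, np.inf)
        chosen[0] = np.random.randint(n)
        for i in range(1, n_samples):
            # squared distance from every point to the newest chosen point
            d = np.sum((points - points[chosen[i - 1]]) ** 2, axis=1)
            min_dist = np.minimum(min_dist, d)    # distance to the chosen set
            chosen[i] = int(np.argmax(min_dist))  # farthest remaining point
        return chosen

    idx = farthest_point_sampling(np.random.rand(1024, 3), 256)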

Kind regards,
steven

the code about attention map

Thank you very much for your work. Could you please provide me with the code for drawing the attention map, like the one shown in the paper?

Questions about segmentation

Thanks for sharing! I benefited a lot from the code. However, I also met some problems.
Based on the released segmentation code, I conducted an experiment on the S3DIS dataset and evaluated the network on Area 5. The training pipeline follows train_semseg.py.
I got a training accuracy of 94.3%, but the test mIoU only reaches 54.3%, about 7% lower than in the paper. I have the following questions.

  1. I used the SGD optimizer with LR = 0.01 and the CosineAnnealing strategy to adjust the LR every epoch. I also used random scaling and rotation around the z-axis for data augmentation. Did I use the correct training tricks?
  2. When I experimented with the ShapeNet Part dataset for the part-segmentation task, I found that testing with a multi-scale strategy brings about a 5% improvement. Did you use the multi-scale test strategy on Area 5 for semantic segmentation?
  3. I also found that the DGCNN result in Table 4 is from 6-fold cross-validation rather than Area 5.

partseg and semseg

Hi, I had some problems reproducing your paper. Can you release the complete code? The partseg and semseg results I reproduced with PyTorch are poor.

Pre-trained models

Hi,

amazing work and great results, thanks for making it available here! I was wondering whether you plan to release the pre-trained models? I work on robotic grasping and it would be interesting to see how your architecture performs for this task compared to other state-of-the-art models.

The question of training

I'm very sorry to bother you.
I trained PCT on a 2080 Ti, but I can't reach 93.2% accuracy, and the results fluctuate a lot.
I only reach 92.8%.
Did you use any other particular parameters?
Best regards,

about a real example code

Hi, @MenghaoGuo ,

Thanks for releasing the package. The current package only provides the code for network construction. Could you provide complete example code for point cloud classification or segmentation?

Thanks~

What is cls_label in the partseg network?

I am trying to run a simple optimization using this segmentation network, but I cannot fully understand what the cls_label parameter in the partseg network is.

Do I have to run classification before conducting segmentation or something like that?

Thank you.
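
For anyone hitting the same question: my understanding (an assumption based on the ShapeNet part-segmentation convention and the one-hot conversion mentioned in another issue above, not confirmed by the authors) is that cls_label is the one-hot object-category vector that ships with each ShapeNet sample, so no classifier has to be run first:

    import numpy as np

    NUM_CATEGORIES = 16  # ShapeNet part benchmark: airplane, chair, table, ...

    def to_one_hot(category_id, num_categories=NUM_CATEGORIES):
        # cls_label: the sample's ground-truth object category as a one-hot
        # vector, fed to the partseg network alongside the points.
        one_hot = np.zeros(num_categories, dtype=np.float32)
        one_hot[category_id] = 1.0
        return one_hot

    cls_label = to_one_hot(0)  # e.g. category 0 = airplane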

PCT code for the PartSeg

Hi,

Thanks for releasing the code. Can you release the PCT code for the PartSeg or simply show the parameters of each layer, please?

Best

How is the Global Feature concatenated with Point Feature in segmentation?

Hi, thanks for your great work!
However, I have a problem with the concatenation in segmentation. As shown in Fig. 2, the global feature is obtained by repeating its preceding vector, so it should be (batch_size, 1024, point_num), where point_num may be 1024 in ModelNet. But the point feature should be (batch_size, 1024, sampled_point_num); for example, sampled_point_num is 256 in the implementation at https://github.com/Strawberry-Eat-Mango/PCT_Pytorch.
So how can these two features be concatenated? Or do we just segment those sampled points?
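
For reference, the usual resolution (a sketch under my assumptions, not the authors' confirmed answer) is that the global vector is repeated to match whatever point count the per-point features actually have, so the two tensors always share the same last dimension before concatenation:

    import torch

    B, C, N = 8, 1024, 256                       # N = sampled point count
    point_feat = torch.randn(B, C, N)            # per-point features

    global_feat = point_feat.max(dim=2, keepdim=True).values   # (B, 1024, 1)
    global_rep = global_feat.repeat(1, 1, N)                   # (B, 1024, N)

    # Concatenate along channels; predictions are then made per sampled point
    # (and can be interpolated back to all input points if needed).
    fused = torch.cat([point_feat, global_rep], dim=1)         # (B, 2048, N)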

Feature map

Thank you very much for your work. I would like to ask how to generate the feature map of the point cloud. If it is convenient, can you provide an example?
thanks!
