Comments (8)
Camera models are always a pain to get right. I haven't bothered with the intrinsics at all, so I cannot help you with that. Regarding the world-to-pixel transform, though, I think I can help.
First of all, you have the world_view_transform, which simply transforms points from world to view/camera space. Then you have the projection_matrix, which transforms points from view/camera to NDC space. The full_proj_transform is just the combination of the two: it transforms a point from world to NDC space.
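That chain can be sketched in a few lines. This is only an illustration of the idea, not the repo's actual code (the real implementation is in PyTorch and stores these matrices transposed, multiplying row vectors); here I use plain NumPy with column vectors:

```python
import numpy as np

def to_ndc(point_world, world_view_transform, projection_matrix):
    """world -> view -> clip space, then the perspective divide to NDC."""
    p = np.append(np.asarray(point_world, dtype=float), 1.0)  # homogeneous [x, y, z, 1]
    full_proj_transform = projection_matrix @ world_view_transform
    clip = full_proj_transform @ p
    return clip[:3] / clip[3]  # divide by w
```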
Regarding getProjectionMatrix, the code uses the OpenGL projection matrix. A very nice blog about that can be found here. However, there are two differences from the matrix shown on that blog. First, the sign used is the positive one (so entries [2, 2] and [3, 2] have a positive sign). Second, instead of using a cube spanning [-1, 1] in all coordinates for the NDC space, the z coordinate spans just [0, 1]. So at entry [2, 2], instead of (f + n) / (f - n), you have f / (f - n).
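Putting those two differences together, the matrix can be sketched like this (a sketch based on the description above, not the repo's exact function; the parameter names are my own):

```python
import math
import numpy as np

def projection_matrix_sketch(znear, zfar, fov_x, fov_y):
    """OpenGL-style perspective matrix, but with positive [2, 2] / [3, 2]
    entries and z mapped to [0, 1] instead of [-1, 1]."""
    right = math.tan(fov_x / 2) * znear
    top = math.tan(fov_y / 2) * znear
    left, bottom = -right, -top

    P = np.zeros((4, 4))
    P[0, 0] = 2 * znear / (right - left)
    P[1, 1] = 2 * znear / (top - bottom)
    P[0, 2] = (right + left) / (right - left)
    P[1, 2] = (top + bottom) / (top - bottom)
    P[2, 2] = zfar / (zfar - znear)            # f / (f - n), positive sign
    P[2, 3] = -(zfar * znear) / (zfar - znear)
    P[3, 2] = 1.0                              # positive sign: w = +z
    return P
```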
As for the 4x4 shape, it is that way to handle homogeneous coordinates.
I also want to point out that OpenGL uses column-major matrices, so they might be the transposed version of what you would expect (this is covered in the blog too).
Hope I helped
from gaussian-splatting.
If we put z = -n into your matrix, then we get (f / (f - n) · (-n) - fn / (f - n)) / (-n) = 2f / (f - n) ≠ 0. So I think your projection is not meant to map z from (-n, -f) into (0, 1). Instead, it maps (n, f) into (0, 1). In other words, your camera z-axis is opposite to the OpenGL camera z-axis.
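A quick numeric sanity check of this point, using the [2, 2], [2, 3], and [3, 2] entries discussed above (n and f are arbitrary values I picked):

```python
n, f = 1.0, 10.0

def z_ndc(z):
    # third row of the code's projection matrix, then the divide by w = +z
    return (f / (f - n) * z - f * n / (f - n)) / z

print(z_ndc(-n))  # 2f/(f-n): z = -n is NOT mapped to 0
print(z_ndc(n))   # 0: the near plane sits at z = +n
print(z_ndc(f))   # 1: the far plane sits at z = +f
```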
Ok, I see where the mix-up is. You are right: the camera space here, unlike OpenGL, has a positive z-axis, so the boundaries are [n, f]. I'm sorry, I didn't even know that OpenGL had a negative z-axis even for camera space; I thought it was only the NDC/clip space.
Thank you for the answer, it is much clearer now.
It is still not clear to me, though, what the point of projecting into NDC space is.
Does the rasterizer need the Gaussians in NDC in order to work?
Because points can still be projected from image space to world space without passing through NDC.
Thanks a lot for your availability.
NDC is nothing more than a 3D representation of the distorted (post-perspective-transform) space. You could definitely skip it, but in general I find it useful because it is still a 3D representation (you keep a sense of depth) in which you have the comforts of orthographic projection: rays are parallel to each other, the ray direction is just [0, 0, ±1], hit and occlusion tests are trivial, etc.
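As a toy example of why those tests become trivial (my own sketch, nothing to do with the repo's actual rasterizer): once two points are in NDC, "does a occlude b" reduces to comparing x/y for the pixel and z for depth.

```python
def occludes(p_ndc_a, p_ndc_b, eps=1e-3):
    """In NDC all rays point along [0, 0, 1], so a occludes b iff they
    land on (roughly) the same x, y and a is nearer in z."""
    same_pixel = (abs(p_ndc_a[0] - p_ndc_b[0]) < eps
                  and abs(p_ndc_a[1] - p_ndc_b[1]) < eps)
    return same_pixel and p_ndc_a[2] < p_ndc_b[2]
```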
I think you missed the third difference: your camera z-axis is opposite to that of the OpenGL camera.
Otherwise, the projection matrix should be this, I think.
The sign I mentioned affects just the z-axis. The code is pretty clear on how the sign is used.
To make it clear, the projection matrix that post mentions is

    | 2n/(r-l)   0          (r+l)/(r-l)    0          |
    | 0          2n/(t-b)   (t+b)/(t-b)    0          |
    | 0          0          -(f+n)/(f-n)   -2fn/(f-n) |
    | 0          0          -1             0          |

while this code uses:

    | 2n/(r-l)   0          (r+l)/(r-l)    0          |
    | 0          2n/(t-b)   (t+b)/(t-b)    0          |
    | 0          0          f/(f-n)        -fn/(f-n)  |
    | 0          0          1              0          |
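One way to see the difference concretely is to evaluate the third row of each version (followed by the divide by w) on its own near and far planes. A small check, with arbitrary n and f of my choosing:

```python
n, f = 0.5, 50.0

def z_ndc_opengl(z):
    """OpenGL third row with w = -z: maps z in [-n, -f] to [-1, 1]."""
    return (-(f + n) / (f - n) * z - 2 * f * n / (f - n)) / (-z)

def z_ndc_code(z):
    """This code's third row with w = +z: maps z in [n, f] to [0, 1]."""
    return (f / (f - n) * z - f * n / (f - n)) / z
```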
I found a way to understand your projection matrix.
The OpenGL projection matrix is responsible for mapping the frustum into the NDC cube:
x: [l, r] -> [-1, 1]
y: [b, t] -> [-1, 1]
z: [-n, -f] -> [-1, 1].
Your projection matrix, on the other hand, is responsible for mapping a frustum that is symmetric about the origin into the NDC cube, and maps z only to [0, 1]:
x: [-r, -l] -> [-1, 1]
y: [-t, -b] -> [-1, 1]
z: [n, f] -> [0, 1].
I tried many ways to derive your projection matrix, and only this way of understanding it works. Please help me if I have misunderstood something.
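The mappings listed above can be checked numerically. The following sketch (my own, with an arbitrary asymmetric frustum) applies the code's matrix row by row and confirms that at the near plane, [-r, -l] and [-t, -b] go to [-1, 1] while z goes to [0, 1]:

```python
n, f = 1.0, 10.0
l, r = -0.5, 2.0   # deliberately asymmetric so the x/y flip is visible
b, t = -1.0, 0.25

def ndc(x, y, z):
    """Apply the code's projection matrix row by row, then divide by w = z."""
    xn = (2 * n / (r - l) * x + (r + l) / (r - l) * z) / z
    yn = (2 * n / (t - b) * y + (t + b) / (t - b) * z) / z
    zn = (f / (f - n) * z - f * n / (f - n)) / z
    return xn, yn, zn
```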