
volsdf's People

Contributors

lioryariv, ykasten


volsdf's Issues

DTU's coordinate system convention

Hi, thanks for the wonderful work!

I'm using a NeRF synthetic dataset, which follows the OpenGL coordinate-system convention (x-axis to the right, y-axis upward, and z-axis backward along the camera's focal axis). When I apply this dataset to VolSDF directly, the computed ray_dir is incorrect.

I think the problem is in the rotation matrix; DTU/BlendedMVS might follow a different convention. But I couldn't find anything about the coordinate-system convention of the DTU dataset. Do you know about this?

I also mention this in #12 .

Thank you very much!
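For what it's worth, cameras in DTU-style pipelines typically follow the OpenCV convention (x right, y down, z forward along the focal axis), since they come from decomposed K[R|t] projection matrices. If that is the case here, a NeRF-synthetic (OpenGL/Blender) camera-to-world pose can be converted with a generic axis flip. A sketch, not code from this repo:

import numpy as np

def opengl_to_opencv_c2w(c2w_gl: np.ndarray) -> np.ndarray:
    # Convert a camera-to-world pose from the OpenGL/Blender convention
    # (x right, y up, z backward) to the OpenCV convention (x right, y down,
    # z forward) by flipping the camera's y and z axes.
    flip = np.diag([1.0, -1.0, -1.0, 1.0])
    return c2w_gl @ flip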

LaplaceDensity

Hello! Thank you for sharing this wonderful work.
I wonder why the authors used alpha * (0.5 + 0.5 * sdf.sign() * torch.expm1(-sdf.abs() / beta)) instead of psi = torch.where(sdf > 0, exp, 1 - exp) with exp = 0.5 * torch.expm1(-sdf.abs() / beta).
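For reference, the two expressions agree numerically if exp in the branch-based form is defined with torch.exp rather than torch.expm1 (which I assume is what was intended). A quick check, my own sketch rather than the repo's code:

import torch

beta = 0.1
alpha = 1.0 / beta
sdf = torch.linspace(-1.0, 1.0, 101)

# Expression used in LaplaceDensity
d1 = alpha * (0.5 + 0.5 * sdf.sign() * torch.expm1(-sdf.abs() / beta))

# Branch-based form; note the plain exp here (assumed intent of the question)
exp = 0.5 * torch.exp(-sdf.abs() / beta)
d2 = alpha * torch.where(sdf > 0, exp, 1.0 - exp)

print(torch.allclose(d1, d2))  # True: the two forms are algebraically identical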

Some questions about rend_util.py

Hi, thank you for your decent work. I have been following your work recently and ran into some problems that I hope to get answers to in this issue.

  1. First question:
    In function load_K_Rt_from_P at line 48 in rend_util.py:
    pose = np.eye(4, dtype=np.float32)
    pose[:3, :3] = R.transpose()
    pose[:3,3] = (t[:3] / t[3])[:,0]

    This code really confuses me and I am not able to explain it.
    I read the following code at line 78 in rend_util.py:
    pixel_points_cam = lift(x_cam, y_cam, z_cam, intrinsics=intrinsics)
    # permute for batch matrix product
    pixel_points_cam = pixel_points_cam.permute(0, 2, 1)
    world_coords = torch.bmm(p, pixel_points_cam).permute(0, 2, 1)[:, :, :3]

    It seems that you use pose as a cameraToWorld matrix.
    I did an experiment; the following code is from Stack Overflow:
k = np.array([[631,   0, 384],
              [  0, 631, 288],
              [  0,   0,   1]])
r = np.array([[-0.30164902,  0.68282439, -0.66540117],
              [-0.63417301,  0.37743435,  0.67480953],
              [ 0.71192167,  0.6255351 ,  0.3191761 ]])
t = np.array([ 3.75082481, -1.18089565,  1.06138781])

C = np.eye(4)
C[:3, :3] = k @ r
C[:3, 3] = k @ r @ t

out = cv2.decomposeProjectionMatrix(C[:3, :])

If I convert r and t into homogeneous coordinates and take R@T, which is the worldToCamera matrix, I get:

>>> T=np.eye(4)
>>> T[:3,3]=t
>>> R=np.eye(4)
>>> R[:3,:3]=r
>>> R@T
array([[-0.30164902,  0.68282439, -0.66540117, -2.64402567],
       [-0.63417301,  0.37743435,  0.67480953, -2.10814783],
       [ 0.71192167,  0.6255351 ,  0.3191761 ,  2.27037141],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

Then if I take the inverse of R@T, which I think is the cameraToWorld matrix, I get:

>>> np.linalg.inv((R@T))
array([[-0.30164902, -0.63417301,  0.71192166, -3.75082481],
       [ 0.6828244 ,  0.37743435,  0.6255351 ,  1.18089565],
       [-0.66540117,  0.67480953,  0.3191761 , -1.06138781],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

This result suggests that, to get the cameraToWorld matrix, we should concatenate R^(-1) and -T, instead of R^(-1) and T as done at line 31 in rend_util.py:

pose = np.eye(4, dtype=np.float32)
pose[:3, :3] = R.transpose()
pose[:3,3] = (t[:3] / t[3])[:,0]

I don't know why it takes R^(-1) and T here (see also the small check at the end of this issue).

  2. Second question:
    In function lift in line 96 in rend_util.py:
    def lift(x, y, z, intrinsics):
    # parse intrinsics
    intrinsics = intrinsics.cuda()
    fx = intrinsics[:, 0, 0]
    fy = intrinsics[:, 1, 1]
    cx = intrinsics[:, 0, 2]
    cy = intrinsics[:, 1, 2]
    sk = intrinsics[:, 0, 1]
    x_lift = (x - cx.unsqueeze(-1) + cy.unsqueeze(-1)*sk.unsqueeze(-1)/fy.unsqueeze(-1) - sk.unsqueeze(-1)*y/fy.unsqueeze(-1)) / fx.unsqueeze(-1) * z
    y_lift = (y - cy.unsqueeze(-1)) / fy.unsqueeze(-1) * z
    # homogeneous
    return torch.stack((x_lift, y_lift, z, torch.ones_like(z).cuda()), dim=-1)

    I don't know why x_lift takes y and fy into consideration.
    It seems that sk should be 0, but I tested it at runtime and got:
intrinsics
tensor([[[ 2.8923e+03, -2.1742e-04,  8.2320e+02,  0.0000e+00],
         [ 0.0000e+00,  2.8832e+03,  6.1907e+02,  0.0000e+00],
         [ 0.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  1.0000e+00]]],
       device='cuda:0')

It seems that sk is not 0. So the transformation becomes:

$$ \begin{bmatrix} x'\\y'\\z \end{bmatrix}= \begin{bmatrix} f_x&sk&c_x&0\\ 0&f_y&c_y&0\\ 0&0&1&0 \end{bmatrix} \begin{bmatrix} x\_lift\\y\_lift\\z\\1 \end{bmatrix} $$

Here [x,y,z,1] is the point in the camera coordinates.
I find that:

$$ x'=f_x \cdot x\_lift + sk \cdot y\_lift + c_x \cdot z $$

The actual result of x_lift is:

$$ x\_lift = \cfrac{x'-c_x \cdot z}{f_x} - sk \cdot y\_lift $$

But in rend_util.py, x_lift is computed as:

$$ x\_lift = \cfrac{(x'-c_x)\cdot z}{f_x} - sk \cdot y\_lift $$

So the code is correct when z = 1. Would it be better to simply change it to:

x_lift = (x / z - cx.unsqueeze(-1) + cy.unsqueeze(-1)*sk.unsqueeze(-1)/fy.unsqueeze(-1) - sk.unsqueeze(-1)*y/fy.unsqueeze(-1)) / fx.unsqueeze(-1) * z

(a division by z is added to x)

The first question matters more to me than the second. Would you please explain the logic of the pose matrix to me?

I hope this issue helps other people as well.

I have tried my best to express my questions as clearly as possible. If anything is unclear or wrong, please let me know.
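Regarding the first question, a small numerical check (my own sketch, not an authoritative answer) with the Stack Overflow numbers above suggests that cv2.decomposeProjectionMatrix returns the camera center in homogeneous coordinates rather than the worldToCamera translation, so pose[:3, 3] = (t[:3] / t[3])[:, 0] already holds the camera position and pose = [R^T | C] is a cameraToWorld matrix:

import cv2
import numpy as np

k = np.array([[631., 0., 384.], [0., 631., 288.], [0., 0., 1.]])
r = np.array([[-0.30164902, 0.68282439, -0.66540117],
              [-0.63417301, 0.37743435, 0.67480953],
              [0.71192167, 0.6255351, 0.3191761]])
t = np.array([3.75082481, -1.18089565, 1.06138781])

# Build the 3x4 projection matrix P = K [R | R t] from the Stack Overflow example above
P = k @ np.concatenate([r, (r @ t)[:, None]], axis=1)

out = cv2.decomposeProjectionMatrix(P)
K_dec, R_dec, t_dec = out[0], out[1], out[2]   # t_dec is a homogeneous 4-vector

C = (t_dec[:3] / t_dec[3])[:, 0]               # same expression as in load_K_Rt_from_P
print(C)                                        # approx. [-3.7508, 1.1809, -1.0614]

# C matches the translation column of the cameraToWorld matrix computed in the issue,
# i.e. it is the camera center -R^(-1) t_w2c, not the worldToCamera translation.
print(np.allclose(C, -r.T @ (r @ t)))           # True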

pre-trained model

Hi! Is there a pre-trained model? I would also like to know how long it takes to train a model and what computer configuration you used.
Hoping for your reply.

Evaluation on BlendedMVS

Dear @lioryariv

Thank you so much for your work. I have not found how to evaluate VolSDF on the BlendedMVS dataset, i.e. computing the chamfer distance. The BlendedMVS repo doesn't seem to provide such a script.

Would you mind pointing me to the right place? Thanks!
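In case it helps, here is a generic two-sided chamfer distance between sampled point clouds (my own sketch, not the official DTU/BlendedMVS evaluation protocol):

import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pts_a: np.ndarray, pts_b: np.ndarray) -> float:
    # Symmetric chamfer distance between two (N, 3) and (M, 3) point clouds.
    d_ab, _ = cKDTree(pts_b).query(pts_a)   # nearest-neighbor distance from each point of A to B
    d_ba, _ = cKDTree(pts_a).query(pts_b)   # nearest-neighbor distance from each point of B to A
    return 0.5 * (d_ab.mean() + d_ba.mean())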

about sampling implementation.

Thanks for your great work.

I saw that you compute d_star using a triangle rule. However, in your paper you state that d_star = max(0, |d_{i+1}| + |d_i| - (t_{i+1} - t_i)).
I am confused about this. Could you tell me why you use the triangle rule instead of d_star = max(0, |d_{i+1}| + |d_i| - (t_{i+1} - t_i))?
In my opinion, using max(0, |d_{i+1}| + |d_i| - (t_{i+1} - t_i)) would make the code simpler than the triangle rule.

Model not training, evaluation generate blank, lower resolution is not handled

After running your source code, I found these issues:

  • Your source code does not handle lower resolutions: e.g. for the DTU dataset, if you lower the resolution to something like [300, 400], the code crashes, and the default resolution [1200, 1600] is hard-coded in the source code.

  • The first issue also crashes the code in the PSNR computation, for the same reason.

  • The renderings during evaluation come out as blank squares. Below you can see the renderings from scan 114 of the DTU dataset:

[Screenshot: blank evaluation renderings from DTU scan 114]

  • It seems that your model is not training at all: the loss and the PSNR value do not change from the beginning and remain at roughly 0.1 and 12, respectively.

  • Your source code will not run without lowering the config; you immediately get an OOM error.

  • Your code generates messages like face_normals incorrect shape, ignoring! and face_normals all zero, ignoring! all the time. I still don't know what they mean, or whether they are the cause of the issues above.

Some of the issues I mentioned above have already been reported by others in other issues. I did not see any response from the authors, so I am writing them here. I am looking forward to the authors' response.

Question about normalization

Great work!
When I tried running the model on the Blender lego scene, I found that the local_matrix is:
0.00000,0.00000,0.00000,0.00000
0.00000,0.00000,0.00000,0.00000
0.00000,0.00000,0.00000,-4.03113
0.00000,0.00000,0.00000,1.00000
The focal is set to 1111, while fx = fy = 400.

Is that result reasonable or not?

Preprocess script for the official DTU dataset

Hi authors, thanks for the inspiring and awesome work!

One question about the DTU dataset: where can I find the code to process the raw official DTU dataset (https://roboimagedata.compute.dtu.dk/?page_id=36) into the one you provided (https://www.dropbox.com/s/s6psnh1q91m4kgo/DTU.zip)? I'm curious about which set of images (cleaned or rectified) and what lighting you used. Sorry if I missed the code somewhere in the repo; thanks in advance for your help!

using multiple image resolution and intrinsics

Hi, thanks for sharing this cool work! Is it possible to configure multiple image resolutions and camera intrinsics? If I change scene_dataset.py to support this, will the rest of the training succeed?


about training on other datasets

Dear authors,
Very nice work!
But when I run the program, I met some "problems".
After loading the dataset, I found that the program displays "face_normals incorrect shape, ignoring!". However, I couldn't see where this sentence is printed in the code, so I want to know whether I am missing something and what is going on when I see "face_normals incorrect shape...".
Regards
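For what it's worth, this warning is most likely emitted by the trimesh library when it loads or saves meshes, not by the repo's own code. Assuming trimesh logs through Python's standard logging module under the "trimesh" logger, it can be silenced like this (a sketch):

import logging

# Hide trimesh warnings such as "face_normals incorrect shape, ignoring!"
logging.getLogger("trimesh").setLevel(logging.ERROR)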

about implementation of eq (6)

Hello!
In eq. (6), it shows tau(t) = density * transparency (or transmittance),
but in the code, tau(t) (i.e. the weights) = alpha * transmittance, where alpha = 1.0 - torch.exp(-dists * density).
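For context, in the standard discretization the weight of interval i is alpha_i * T_i with alpha_i = 1 - exp(-sigma_i * delta_i); this is exactly the integral of sigma(t) * T(t) over the interval when sigma is held constant inside it, which is why eq. (6)'s density * transmittance appears as alpha * transmittance in discrete code. A minimal sketch of that quadrature (my own, not necessarily the repo's exact code):

import torch

def weights_from_density(density, dists):
    # density: (..., N) sigma values per sample, dists: (..., N) interval lengths.
    alpha = 1.0 - torch.exp(-dists * density)          # per-interval opacity
    # Transmittance up to the start of each interval: product of (1 - alpha) of earlier intervals
    transmittance = torch.cumprod(
        torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha + 1e-10], dim=-1), dim=-1
    )[..., :-1]
    return alpha * transmittance                        # discrete counterpart of sigma(t) * T(t) dt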

project page

Hi, I find your project page very good. Could you share its source code?

Question about normalization matrix

Hello! Thank you for sharing this wonderful work.
VolSDF is fantastic work, at least for me.
I have a minor question about the code in preprocess.py, though.

I wonder why the multiplication below is needed rather than only the inverse of K (see also the sketch after the snippet).
It would be really appreciated if you could answer it.
Thanks a lot!

=========================================

  def get_center_point(num_cams, cameras):

      ....
      v = np.linalg.inv(K) @ np.array([800, 600, 1])  # why is it needed?
      v = v / np.linalg.norm(v)

      ....
      return soll, camera_centers

=======================================
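For reference, my own reading of that line (not an authoritative answer): inv(K) @ [800, 600, 1] back-projects the pixel (800, 600), presumably near the center of the 1600x1200 DTU images, into a direction in camera coordinates, and the normalization turns it into the unit viewing ray through that pixel. A minimal sketch with example intrinsics (values similar to those printed in the rend_util.py issue above):

import numpy as np

# Example intrinsics (values similar to those in the rend_util.py issue above)
K = np.array([[2892.3, 0.0, 823.2],
              [0.0, 2883.2, 619.1],
              [0.0, 0.0, 1.0]])

pixel = np.array([800.0, 600.0, 1.0])    # homogeneous pixel coordinate, roughly the image center
v = np.linalg.inv(K) @ pixel             # back-project: ray direction through this pixel, in camera coords
v = v / np.linalg.norm(v)                # unit viewing direction
print(v)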

Questions about the camera-normalization step

Hi! Thanks for the amazing work!

I've got a question and hope you could give me a hint. To my understanding, the "camera normalization" step (as described in supplementary material B.1) is only used for placing the cameras inside a sphere in world coordinates. In the .conf file there is a parameter scene_bounding_sphere, and according to the class ImplicitNetwork, if scene_bounding_sphere is set to 0, the step "Clamping the SDF with the scene bounding sphere" is skipped.

My question is whether I can skip this step. In my experimental setting, the cameras are placed on the surface of a sphere (rather than at its center). If I do not use normalize_cameras.py to convert the data, and instead set the parameter scene_bounding_sphere to the radius of that sphere, I'm not sure whether it would affect the opacity approximation error bound.

Could you explain more about the purpose of the "camera normalization" step? Thank you very much! 😆
