
volsdf's People

Contributors

lioryariv, ykasten


volsdf's Issues

DTU's coordinate system convention

Hi, thanks for the wonderful work!

I'm using a NeRF synthetic dataset, which follows the OpenGL coordinate-system convention (x-axis to the right, y-axis upward, and z-axis backward along the camera's focal axis). When I apply this dataset to VolSDF directly, the computed ray_dir is incorrect.

I think the problem is in the rotation matrix; DTU/BlendedMVS might follow a different convention. But I couldn't find anything about the coordinate-system convention of the DTU dataset. Do you know about this?

I also mention this in #12 .

Thank you very much!
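For what it's worth, cameras in DTU-style pipelines typically follow the OpenCV convention (x right, y down, z forward along the focal axis), since they come from decomposed K[R|t] projection matrices. If that is the case here, a NeRF-synthetic (OpenGL/Blender) camera-to-world pose can be converted with a generic axis flip. A sketch, not code from this repo:

import numpy as np

def opengl_to_opencv_c2w(c2w_gl: np.ndarray) -> np.ndarray:
    # Convert a camera-to-world pose from the OpenGL/Blender convention
    # (x right, y up, z backward) to the OpenCV convention (x right, y down,
    # z forward) by flipping the camera's y and z axes.
    flip = np.diag([1.0, -1.0, -1.0, 1.0])
    return c2w_gl @ flip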

LaplaceDensity

Hello! Thank you for sharing this wonderful work.
I wonder why the authors used alpha * (0.5 + 0.5 * sdf.sign() * torch.expm1(-sdf.abs() / beta)) instead of psi = torch.where(sdf > 0, exp, 1 - exp) with exp = 0.5 * torch.expm1(-sdf.abs() / beta).
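For reference, the two expressions agree numerically if exp in the branch-based form is defined with torch.exp rather than torch.expm1 (which I assume is what was intended). A quick check, my own sketch rather than the repo's code:

import torch

beta = 0.1
alpha = 1.0 / beta
sdf = torch.linspace(-1.0, 1.0, 101)

# Expression used in LaplaceDensity
d1 = alpha * (0.5 + 0.5 * sdf.sign() * torch.expm1(-sdf.abs() / beta))

# Branch-based form; note the plain exp here (assumed intent of the question)
exp = 0.5 * torch.exp(-sdf.abs() / beta)
d2 = alpha * torch.where(sdf > 0, exp, 1.0 - exp)

print(torch.allclose(d1, d2))  # True: the two forms are algebraically identical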

Some questions about rend_util.py

Hi, thank you for your decent work. I have been following your work recently and ran into some problems that I hope to get answers to in this issue.

  1. First question:
    In function load_K_Rt_from_P at line 48 in rend_util.py:
    pose = np.eye(4, dtype=np.float32)
    pose[:3, :3] = R.transpose()
    pose[:3,3] = (t[:3] / t[3])[:,0]

    This code really confuses me and I am not able to explain it.
    I read the following code at line 78 in rend_util.py:
    pixel_points_cam = lift(x_cam, y_cam, z_cam, intrinsics=intrinsics)
    # permute for batch matrix product
    pixel_points_cam = pixel_points_cam.permute(0, 2, 1)
    world_coords = torch.bmm(p, pixel_points_cam).permute(0, 2, 1)[:, :, :3]

    It seems that you use pose as a cameraToWorld matrix.
    I did an experiment; the following code is from Stack Overflow:
k = np.array([[631,   0, 384],
              [  0, 631, 288],
              [  0,   0,   1]])
r = np.array([[-0.30164902,  0.68282439, -0.66540117],
              [-0.63417301,  0.37743435,  0.67480953],
              [ 0.71192167,  0.6255351 ,  0.3191761 ]])
t = np.array([ 3.75082481, -1.18089565,  1.06138781])

C = np.eye(4)
C[:3, :3] = k @ r
C[:3, 3] = k @ r @ t

out = cv2.decomposeProjectionMatrix(C[:3, :])

If I convert r and t into homogeneous coordinates and take R@T, which is the worldToCamera matrix, I get:

>>> T=np.eye(4)
>>> T[:3,3]=t
>>> R=np.eye(4)
>>> R[:3,:3]=r
>>> R@T
array([[-0.30164902,  0.68282439, -0.66540117, -2.64402567],
       [-0.63417301,  0.37743435,  0.67480953, -2.10814783],
       [ 0.71192167,  0.6255351 ,  0.3191761 ,  2.27037141],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

Then if I take the inverse of R@T, which I think is the cameraToWorld matrix, I get:

>>> np.linalg.inv((R@T))
array([[-0.30164902, -0.63417301,  0.71192166, -3.75082481],
       [ 0.6828244 ,  0.37743435,  0.6255351 ,  1.18089565],
       [-0.66540117,  0.67480953,  0.3191761 , -1.06138781],
       [ 0.        ,  0.        ,  0.        ,  1.        ]])

This result suggests that, to get the cameraToWorld matrix, we should concatenate R^(-1) and -T, instead of R^(-1) and T as done at line 31 in rend_util.py:

pose = np.eye(4, dtype=np.float32)
pose[:3, :3] = R.transpose()
pose[:3,3] = (t[:3] / t[3])[:,0]

I don't know why it takes R^(-1) and T here (see also the small check at the end of this issue).

  2. Second question:
    In function lift in line 96 in rend_util.py:
    def lift(x, y, z, intrinsics):
    # parse intrinsics
    intrinsics = intrinsics.cuda()
    fx = intrinsics[:, 0, 0]
    fy = intrinsics[:, 1, 1]
    cx = intrinsics[:, 0, 2]
    cy = intrinsics[:, 1, 2]
    sk = intrinsics[:, 0, 1]
    x_lift = (x - cx.unsqueeze(-1) + cy.unsqueeze(-1)*sk.unsqueeze(-1)/fy.unsqueeze(-1) - sk.unsqueeze(-1)*y/fy.unsqueeze(-1)) / fx.unsqueeze(-1) * z
    y_lift = (y - cy.unsqueeze(-1)) / fy.unsqueeze(-1) * z
    # homogeneous
    return torch.stack((x_lift, y_lift, z, torch.ones_like(z).cuda()), dim=-1)

    I don't know why x_lift takes y and fy into consideration.
    It seems that sk should be 0, but I tested it at runtime and got:
intrinsics
tensor([[[ 2.8923e+03, -2.1742e-04,  8.2320e+02,  0.0000e+00],
         [ 0.0000e+00,  2.8832e+03,  6.1907e+02,  0.0000e+00],
         [ 0.0000e+00,  0.0000e+00,  1.0000e+00,  0.0000e+00],
         [ 0.0000e+00,  0.0000e+00,  0.0000e+00,  1.0000e+00]]],
       device='cuda:0')

It seems that sk is not 0. So the transformation becomes:

$$ \begin{bmatrix} x'\\y'\\z \end{bmatrix}= \begin{bmatrix} f_x&sk&c_x&0\\ 0&f_y&c_y&0\\ 0&0&1&0 \end{bmatrix} \begin{bmatrix} x\_lift\\y\_lift\\z\\1 \end{bmatrix} $$

Here [x,y,z,1] is the point in the camera coordinates.
I find that:

$$ x'=f_x \cdot x\_lift + sk \cdot y\_lift + c_x \cdot z $$

The actual result of x_lift is:

$$ x\_lift = \cfrac{x'-c_x \cdot z}{f_x} - sk \cdot y\_lift $$

But in rend_util.py, x_lift is computed as:

$$ x\_lift = \cfrac{(x'-c_x)\cdot z}{f_x} - sk \cdot y\_lift $$

So the code is correct when z = 1. Would it be better to simply change it to:

x_lift = (x / z - cx.unsqueeze(-1) + cy.unsqueeze(-1)*sk.unsqueeze(-1)/fy.unsqueeze(-1) - sk.unsqueeze(-1)*y/fy.unsqueeze(-1)) / fx.unsqueeze(-1) * z

(a division by z is added to x)

The first question matters more to me than the second. Would you please explain the logic of the pose matrix to me?

I hope this issue helps other people as well.

I have tried my best to express my questions as clearly as possible. If anything is unclear or wrong, please let me know.
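Regarding the first question, a small numerical check (my own sketch, not an authoritative answer) with the Stack Overflow numbers above suggests that cv2.decomposeProjectionMatrix returns the camera center in homogeneous coordinates rather than the worldToCamera translation, so pose[:3, 3] = (t[:3] / t[3])[:, 0] already holds the camera position and pose = [R^T | C] is a cameraToWorld matrix:

import cv2
import numpy as np

k = np.array([[631., 0., 384.], [0., 631., 288.], [0., 0., 1.]])
r = np.array([[-0.30164902, 0.68282439, -0.66540117],
              [-0.63417301, 0.37743435, 0.67480953],
              [0.71192167, 0.6255351, 0.3191761]])
t = np.array([3.75082481, -1.18089565, 1.06138781])

# Build the 3x4 projection matrix P = K [R | R t] from the Stack Overflow example above
P = k @ np.concatenate([r, (r @ t)[:, None]], axis=1)

out = cv2.decomposeProjectionMatrix(P)
K_dec, R_dec, t_dec = out[0], out[1], out[2]   # t_dec is a homogeneous 4-vector

C = (t_dec[:3] / t_dec[3])[:, 0]               # same expression as in load_K_Rt_from_P
print(C)                                        # approx. [-3.7508, 1.1809, -1.0614]

# C matches the translation column of the cameraToWorld matrix computed in the issue,
# i.e. it is the camera center -R^(-1) t_w2c, not the worldToCamera translation.
print(np.allclose(C, -r.T @ (r @ t)))           # True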

pre-trained model

Hi! Is there a pre-trained model? I would also like to know how long it takes to train a model and what computer configuration you used.
Hoping for your reply.

Evaluation on BlendedMVS

Dear @lioryariv

Thank you so much for your work. I have not found how to evaluate VolSDF on the BlendedMVS dataset, i.e. computing the chamfer distance. The BlendedMVS repo doesn't seem to provide such a script.

Would you mind pointing me to the right place? Thanks!
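In case it helps, here is a generic two-sided chamfer distance between sampled point clouds (my own sketch, not the official DTU/BlendedMVS evaluation protocol):

import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pts_a: np.ndarray, pts_b: np.ndarray) -> float:
    # Symmetric chamfer distance between two (N, 3) and (M, 3) point clouds.
    d_ab, _ = cKDTree(pts_b).query(pts_a)   # nearest-neighbor distance from each point of A to B
    d_ba, _ = cKDTree(pts_a).query(pts_b)   # nearest-neighbor distance from each point of B to A
    return 0.5 * (d_ab.mean() + d_ba.mean())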

about sampling implementation.

Thanks for your great work.

I saw that you compute d_star using a triangle rule. However, in your paper you state that d_star = max(0, |d_{i+1}| + |d_i| - (t_{i+1} - t_i)).
I am confused about this. Could you tell me why you use the triangle rule instead of d_star = max(0, |d_{i+1}| + |d_i| - (t_{i+1} - t_i))?
In my opinion, using max(0, |d_{i+1}| + |d_i| - (t_{i+1} - t_i)) would make the code simpler than the triangle rule.

Model not training, evaluation generate blank, lower resolution is not handled

After running your source code, I found these issues:

  • Your source code does not handle lower resolutions: e.g. for the DTU dataset, if you lower the resolution to something like [300, 400], the code crashes, and the default resolution [1200, 1600] is hard-coded in the source code.

  • The first issue also crashes the code in the PSNR computation, for the same reason.

  • The renderings during evaluation come out as blank squares. Below you can see the renderings from scan 114 of the DTU dataset:

[Screenshot: blank evaluation renderings from DTU scan 114]

  • It seems that your model is not training at all: the loss and the PSNR value do not change from the beginning and remain at roughly 0.1 and 12, respectively.

  • Your source code will not run without lowering the config; you immediately get an OOM error.

  • Your code generates messages like face_normals incorrect shape, ignoring! and face_normals all zero, ignoring! all the time. I still don't know what they mean, or whether they are the cause of the issues above.

Some of the issues I mentioned above have already been reported by others in other issues. I did not see any response from the authors, so I am writing them here. I am looking forward to the authors' response.

Question about normalization

Great work!
When I tried running the model on the Blender lego scene, I found that the local_matrix is:
0.00000,0.00000,0.00000,0.00000
0.00000,0.00000,0.00000,0.00000
0.00000,0.00000,0.00000,-4.03113
0.00000,0.00000,0.00000,1.00000
The focal is set to 1111, while fx = fy = 400.

Is that result reasonable or not?

Preprocess script for the official DTU dataset

Hi authors, thanks for the inspiring and awesome work!

One question about the DTU dataset: where can I find the code to process the raw official DTU dataset (https://roboimagedata.compute.dtu.dk/?page_id=36) into the one you provided (https://www.dropbox.com/s/s6psnh1q91m4kgo/DTU.zip)? I'm curious about which set of images (cleaned or rectified) and what lighting you used. Sorry if I missed the code somewhere in the repo; thanks in advance for your help!

using multiple image resolution and intrinsics

Hi, thanks for sharing this cool work! Is it possible to configure multiple image resolutions and camera intrinsics? If I change scene_dataset.py to support this, will the rest of the training succeed?


about training on other datasets

Dear authors,
Very nice work!
But when I run the program, I met some "problems".
After loading the dataset, I found that the program displays "face_normals incorrect shape, ignoring!". However, I couldn't see where this sentence is printed in the code, so I want to know whether I am missing something and what is going on when I see "face_normals incorrect shape...".
Regards
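For what it's worth, this warning is most likely emitted by the trimesh library when it loads or saves meshes, not by the repo's own code. Assuming trimesh logs through Python's standard logging module under the "trimesh" logger, it can be silenced like this (a sketch):

import logging

# Hide trimesh warnings such as "face_normals incorrect shape, ignoring!"
logging.getLogger("trimesh").setLevel(logging.ERROR)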

about implementation of eq (6)

Hello!
In eq. (6), it shows tau(t) = density * transparency (or transmittance),
but in the code, tau(t) (i.e. the weights) = alpha * transmittance, where alpha = 1.0 - torch.exp(-dists * density).
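For context, in the standard discretization the weight of interval i is alpha_i * T_i with alpha_i = 1 - exp(-sigma_i * delta_i); this is exactly the integral of sigma(t) * T(t) over the interval when sigma is held constant inside it, which is why eq. (6)'s density * transmittance appears as alpha * transmittance in discrete code. A minimal sketch of that quadrature (my own, not necessarily the repo's exact code):

import torch

def weights_from_density(density, dists):
    # density: (..., N) sigma values per sample, dists: (..., N) interval lengths.
    alpha = 1.0 - torch.exp(-dists * density)          # per-interval opacity
    # Transmittance up to the start of each interval: product of (1 - alpha) of earlier intervals
    transmittance = torch.cumprod(
        torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha + 1e-10], dim=-1), dim=-1
    )[..., :-1]
    return alpha * transmittance                        # discrete counterpart of sigma(t) * T(t) dt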

project page

Hi, I find your project page very good. Could you share its source code?

Question about normalization matrix

Hello! Thank you for sharing this wonderful work.
VolSDF is fantastic work, at least for me.
I have a minor question about the code in preprocess.py, though.

I wonder why the multiplication below is needed rather than only the inverse of K (see also the sketch after the snippet).
It would be really appreciated if you could answer it.
Thanks a lot!

=========================================

  def get_center_point(num_cams, cameras):

      ....
      v = np.linalg.inv(K) @ np.array([800, 600, 1])  # why is it needed?
      v = v / np.linalg.norm(v)

      ....
      return soll, camera_centers

=======================================
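For reference, my own reading of that line (not an authoritative answer): inv(K) @ [800, 600, 1] back-projects the pixel (800, 600), presumably near the center of the 1600x1200 DTU images, into a direction in camera coordinates, and the normalization turns it into the unit viewing ray through that pixel. A minimal sketch with example intrinsics (values similar to those printed in the rend_util.py issue above):

import numpy as np

# Example intrinsics (values similar to those in the rend_util.py issue above)
K = np.array([[2892.3, 0.0, 823.2],
              [0.0, 2883.2, 619.1],
              [0.0, 0.0, 1.0]])

pixel = np.array([800.0, 600.0, 1.0])    # homogeneous pixel coordinate, roughly the image center
v = np.linalg.inv(K) @ pixel             # back-project: ray direction through this pixel, in camera coords
v = v / np.linalg.norm(v)                # unit viewing direction
print(v)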

Questions about the camera-normalization step

Hi! Thanks for the amazing work!

I've got a question and hope you could give me a hint. To my understanding, the "camera normalization" step (as described in supplementary material B.1) is only used for placing the cameras inside a sphere in world coordinates. In the .conf file there is a parameter scene_bounding_sphere, and according to the class ImplicitNetwork, if scene_bounding_sphere is set to 0, the step "Clamping the SDF with the scene bounding sphere" is skipped.

My question is whether I can skip this step. In my experimental setting, the cameras are placed on the surface of a sphere (rather than at its center). If I do not use normalize_cameras.py to convert the data, and instead set the parameter scene_bounding_sphere to the radius of that sphere, I'm not sure whether it would affect the opacity approximation error bound.

Could you explain more about the purpose of the "camera normalization" step? Thank you very much! 😆
