lioryariv / volsdf (MIT License)
Hi, thanks for the wonderful work!
I'm using a NeRF synthetic dataset, which follows the OpenGL coordinate-system convention (x-axis to the right, y-axis upward, and z-axis backward along the camera's focal axis). When I apply the dataset to VolSDF directly, the computed ray_dir is incorrect.
I think the problem is in the rotation matrix: DTU/BlendedMVS might follow a different convention. But I couldn't find anything about the coordinate-system convention of the DTU dataset; do you know about this?
I also mentioned this in #12.
Thank you very much!
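For reference, this is the kind of axis flip I tried (a sketch only; it assumes the loader expects an OpenCV-style convention with x right, y down, z forward, which is exactly the part I'm unsure about):

```python
import numpy as np

def opengl_to_opencv_pose(c2w_gl):
    """Convert a camera-to-world pose from the OpenGL/NeRF convention
    (x right, y up, z backward) to an OpenCV-style one (x right, y down,
    z forward) by flipping the camera's y and z axes.
    Assumption: the VolSDF loader expects OpenCV-style poses."""
    flip = np.diag([1.0, -1.0, -1.0, 1.0])
    return c2w_gl @ flip
```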
Hello! Thank you for sharing this wonderful work
I wonder why the author used alpha * (0.5 + 0.5 * sdf.sign() * torch.expm1(-sdf.abs() / beta)) instead of psi = torch.where(sdf > 0, exp, 1 - exp) with exp = 0.5 * torch.exp(-sdf.abs() / beta).
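For concreteness, here is the quick numerical check I did (my own sketch) that the two forms agree:

```python
import torch

alpha, beta = 1.0, 0.1
sdf = torch.linspace(-0.5, 0.5, 11)

# Branch-free form used in the code.
branch_free = alpha * (0.5 + 0.5 * sdf.sign() * torch.expm1(-sdf.abs() / beta))

# Piecewise Laplace-CDF form from the question above.
exp = 0.5 * torch.exp(-sdf.abs() / beta)
piecewise = alpha * torch.where(sdf > 0, exp, 1 - exp)

print(torch.allclose(branch_free, piecewise))  # True
```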
Hi, thank you for your decent work. I have been trying to follow your work recently and have run into some problems, which I hope can be answered in this issue.
1. About load_K_Rt_from_P at line 48 in rend_util.py (volsdf/code/utils/rend_util.py, lines 48 to 50 in a974c88): the code at lines 73 to 78 of the same file (also a974c88) then uses the returned pose as a cameraToWorld matrix. Consider this example:
k = np.array([[631, 0, 384],
[ 0, 631, 288],
[ 0, 0, 1]])
r = np.array([[-0.30164902, 0.68282439, -0.66540117],
[-0.63417301, 0.37743435, 0.67480953],
[ 0.71192167, 0.6255351 , 0.3191761 ]])
t = np.array([ 3.75082481, -1.18089565, 1.06138781])
C = np.eye(4)
C[:3, :3] = k @ r
C[:3, 3] = k @ r @ t
out = cv2.decomposeProjectionMatrix(C[:3, :])
If I convert r and t to homogeneous form and compute R@T, which is the worldToCamera matrix, I get:
>>> T=np.eye(4)
>>> T[:3,3]=t
>>> R=np.eye(4)
>>> R[:3,:3]=r
>>> R@T
array([[-0.30164902, 0.68282439, -0.66540117, -2.64402567],
[-0.63417301, 0.37743435, 0.67480953, -2.10814783],
[ 0.71192167, 0.6255351 , 0.3191761 , 2.27037141],
[ 0. , 0. , 0. , 1. ]])
Then if I take the inverse of R@T, which I think is the cameraToWorld matrix, I get:
>>> np.linalg.inv((R@T))
array([[-0.30164902, -0.63417301, 0.71192166, -3.75082481],
[ 0.6828244 , 0.37743435, 0.6255351 , 1.18089565],
[-0.66540117, 0.67480953, 0.3191761 , -1.06138781],
[ 0. , 0. , 0. , 1. ]])
This result suggests that, to get the cameraToWorld matrix, we should concatenate R^(-1) and -t, instead of the R^(-1) and t used in load_K_Rt_from_P (volsdf/code/utils/rend_util.py, lines 48 to 50 in a974c88).
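For reference, this is the small numerical check I ran (my own sketch, continuing from the snippet above) comparing the output of cv2.decomposeProjectionMatrix with the manual inverse:

```python
import cv2
import numpy as np

# Decompose the same projection matrix C built above.
K_out, R_out, t_out = cv2.decomposeProjectionMatrix(C[:3, :])[:3]

# decomposeProjectionMatrix returns the camera centre as a homogeneous 4-vector.
center = (t_out[:3] / t_out[3]).ravel()

print(R_out)                          # compare with r
print(center)                         # compare with -t
print(np.linalg.inv(R @ T)[:3, 3])    # translation column of the manual inverse
```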
2. About lift at line 96 in rend_util.py (volsdf/code/utils/rend_util.py, lines 96 to 109 in a974c88): x_lift takes y and fy into consideration. sk should be 0, but when I test at runtime I get the following intrinsics:
tensor([[[ 2.8923e+03, -2.1742e-04, 8.2320e+02, 0.0000e+00],
[ 0.0000e+00, 2.8832e+03, 6.1907e+02, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 1.0000e+00, 0.0000e+00],
[ 0.0000e+00, 0.0000e+00, 0.0000e+00, 1.0000e+00]]],
device='cuda:0')
It seems that sk is not 0. So the transformation becomes [fx*x + sk*y + cx*z, fy*y + cy*z, z, 1], where [x, y, z, 1] is the point in camera coordinates. Working backwards from this, I find that the actual x_lift should divide the pixel x by z, but in rend_util.py x_lift is computed as:
x_lift = (x - cx.unsqueeze(-1) + cy.unsqueeze(-1)*sk.unsqueeze(-1)/fy.unsqueeze(-1) - sk.unsqueeze(-1)*y/fy.unsqueeze(-1)) / fx.unsqueeze(-1) * z
So when z=1, the code is correct. Would it be better if it were simply changed to:
x_lift = (x / z - cx.unsqueeze(-1) + cy.unsqueeze(-1)*sk.unsqueeze(-1)/fy.unsqueeze(-1) - sk.unsqueeze(-1)*y/fy.unsqueeze(-1)) / fx.unsqueeze(-1) * z
(i.e. / z is added to the x)?
The first question matters more to me than the second one. Would you please explain the logic of the pose matrix to me?
I hope this issue helps other people as well.
I have tried my best to express my questions as clearly as possible. If anything is unclear or mistaken on my part, please let me know.
Hi~ Is there a pre-trained model? I would also like to know how long it takes to train a model and what hardware configuration you used.
Hoping for your reply.
Dear @lioryariv
Thank you so much for your work. I have not found how to evaluate VolSDF on the BlendedMVS dataset, i.e. computing the Chamfer distance. The BlendedMVS repo doesn't seem to provide such a script.
Would you mind pointing me to the right place? Thanks!
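In case it helps others, this is the fallback I would use for now (a plain textbook symmetric Chamfer distance between sampled point clouds, not an official DTU/BlendedMVS evaluation protocol):

```python
import numpy as np
from scipy.spatial import cKDTree

def chamfer_distance(pts_pred, pts_gt):
    """Symmetric Chamfer distance between two (N, 3) / (M, 3) point clouds.
    Points are assumed to be sampled from the predicted and GT meshes."""
    d_pred_to_gt, _ = cKDTree(pts_gt).query(pts_pred)
    d_gt_to_pred, _ = cKDTree(pts_pred).query(pts_gt)
    return d_pred_to_gt.mean() + d_gt_to_pred.mean()
```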
Thanks for your great work.
I saw that you compute d_star using a triangle rule. However, in your paper you say that d_star = max(0, |d_{i+1}| + |d_i| - (t_{i+1} - t_i)). I am confused about this: could you tell me why you use the triangle rule instead of that formula? In my opinion, the code would be simpler with max(0, |d_{i+1}| + |d_i| - (t_{i+1} - t_i)) than with the triangle rule.
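For concreteness, this is the formula I mean (my own sketch; the indexing convention may not match the repo's code exactly):

```python
import torch

def d_star_simple(d, t):
    """d: (..., N) signed distances at the samples, t: (..., N) sample depths.
    Implements max(0, |d_i| + |d_{i+1}| - (t_{i+1} - t_i)) per interval."""
    delta = t[..., 1:] - t[..., :-1]
    return torch.clamp(d[..., :-1].abs() + d[..., 1:].abs() - delta, min=0.0)
```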
May I ask if this project has a requirements.txt file? If not, can anyone provide one?
After running your source code, I found these issues:
1. Your source code does not handle lower resolutions. For example, for the DTU dataset, if you lower the resolution to something like [300, 400], the code crashes; the default resolution [1200, 1600] is hard-coded in the source code (see the intrinsics-scaling sketch after this list).
2. The first issue also crashes the code in the psnr computation, for the same reason.
3. The renderings during evaluation contain blank squares. Below you can see the renderings from scan 114 of the DTU dataset:
4. It seems that the model is not training at all: the loss and psnr values do not change from the beginning and remain at roughly 0.1 and 12, respectively.
5. Your source code will not run without lowering the config; you immediately get an OOM error.
6. Your code constantly generates messages like face_normals incorrect shape, ignoring! and face_normals all zero, ignoring!. I still don't know what they mean, or whether they are the cause of the issues above.
Some of the issues mentioned above have already been reported by others in other issues. I did not see any response from the authors, so I am writing them here. I am looking forward to the authors' response.
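The sketch referenced in item 1, showing how I would rescale the intrinsics when lowering the resolution (my own workaround, not code from the repo; K is assumed to be a 3x3 intrinsics matrix for the full [1200, 1600] images):

```python
import numpy as np

def scale_intrinsics(K, new_hw, old_hw=(1200, 1600)):
    """Rescale a 3x3 intrinsics matrix when images are resized to new_hw."""
    scale_h = new_hw[0] / old_hw[0]
    scale_w = new_hw[1] / old_hw[1]
    K_scaled = K.astype(np.float64).copy()
    K_scaled[0, :] *= scale_w   # fx, skew, cx scale with image width
    K_scaled[1, :] *= scale_h   # fy, cy scale with image height
    return K_scaled
```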
Great work!
When I tried running the model on the Blender lego scene, I found that local_matrix looks like:
0.00000,0.00000,0.00000,0.00000
0.00000,0.00000,0.00000,0.00000
0.00000,0.00000,0.00000,-4.03113
0.00000,0.00000,0.00000,1.00000
The focal length is set to 1111, while fx = fy = 400. Is that result reasonable or not?
Hi authors, thanks for the inspiring and awesome work!
One question about the DTU dataset: where can I find the code used to process the raw official DTU dataset (https://roboimagedata.compute.dtu.dk/?page_id=36) into the one you provided (https://www.dropbox.com/s/s6psnh1q91m4kgo/DTU.zip)? I'm curious about which set of images (cleaned or rectified) and which lighting you used. Sorry if I missed the code somewhere in the repo; thanks in advance for your help!
Hi, thanks for sharing this cool work! Is it possible to configure multiple image resolutions and camera intrinsics? If I change scene_dataset.py to support this, will the rest of the training succeed?
Dear authors,
Very nice work! But when I run the program, I run into some "problems".
After loading the dataset, the program displays "face_normals incorrect shape, ignoring!", yet I couldn't find where this sentence is printed in the code. So I want to know whether I am missing something, and what is going on when I see "face_normals incorrect shape...".
Regards
Nice work.
Where can I get the supplementary material? I couldn't find it on the GitHub page, the project page, arXiv, or anywhere else.
Thank you.
Great job! Can you share the mesh (.ply format) generated by NeRF for each scene on the DTU dataset?
Hello!
Eq. (6) shows tau(t) = density * transparency (i.e. transmittance),
but in the code, tau(t) (the weights) = alpha * transmittance, where alpha = 1.0 - torch.exp(-dists * density).
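For context, this is my understanding of how the code discretizes Eq. (6) (a sketch of the standard discretization, not the authors' exact implementation):

```python
import torch

def render_weights(density, dists):
    """weights_i = alpha_i * T_i, a discrete version of tau(t) = sigma(t) * T(t).
    density: (..., N) sigma values at the samples, dists: (..., N) interval lengths."""
    alpha = 1.0 - torch.exp(-dists * density)
    # T_i = prod_{j < i} (1 - alpha_j), with T_0 = 1.
    transmittance = torch.cumprod(
        torch.cat([torch.ones_like(alpha[..., :1]), 1.0 - alpha + 1e-10], dim=-1),
        dim=-1)[..., :-1]
    return alpha * transmittance
```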
Hi, I find your project page very nice. Can you share its source code?
Hello, is the PSNR compared per image? Thank you!
Dear author,
How can I reconstruct the texture after generating the mesh? Can you give me any suggestions?
Thanks!
Hello! Thank you for sharing this wonderful work
VolSDF is fantastic work, at least for me.
I have a minor question about the code in preprocess.py, though.
I wonder why the multiplication below is needed rather than only the inverse of K.
It would be really appreciated if you could answer this.
Thanks a lot!
=========================================
def get_center_point(num_cams,cameras):
....
v = np.linalg.inv(K) @ np.array([800, 600, 1]) #why is it needed?
v = v / np.linalg.norm(v)
....
return soll,camera_centers
=======================================
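For reference, this is how I currently read that line geometrically (my own sketch with made-up intrinsics, not an answer):

```python
import numpy as np

# inv(K) @ [u, v, 1] back-projects pixel (u, v) to a viewing direction in
# camera coordinates; normalizing then gives a unit ray direction.
K = np.array([[500.0,   0.0, 400.0],    # made-up intrinsics for illustration
              [  0.0, 500.0, 300.0],
              [  0.0,   0.0,   1.0]])
v = np.linalg.inv(K) @ np.array([800.0, 600.0, 1.0])
v = v / np.linalg.norm(v)
print(v)  # unit direction through pixel (800, 600), in camera coordinates
```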
Hi! Thanks for the amazing work!
I've got a question and hope you could give me a hint. To my understanding, the "camera normalization" step (as stated in supplementary material B.1) is only used for placing the cameras inside a sphere in world coordinates. In the .conf file there is a parameter scene_bounding_sphere, and according to the class ImplicitNetwork, if scene_bounding_sphere is set to 0, the step "Clamping the SDF with the scene bounding sphere" is skipped.
My question is whether I can skip this step. In my experimental setting, the cameras are placed on the surface of a sphere (rather than at its center). If I do not use normalize_cameras.py to perform this data conversion, and instead set the parameter scene_bounding_sphere to the radius of that sphere, I'm not sure whether it would affect the opacity approximation error bound.
Could you explain more about the purpose of the "camera normalization" step? Thank you very much! 😆
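For clarity, this is my current understanding of what the bounding-sphere clamp does (a sketch only, not the exact code from ImplicitNetwork):

```python
import torch

def clamp_sdf_to_sphere(sdf, points, radius):
    """Upper-bound the network SDF by the signed distance to a bounding sphere
    of the given radius, so the implicit surface cannot extend beyond the
    sphere. A radius of 0 would mean: no clamping at all."""
    sphere_sdf = radius - points.norm(dim=-1, keepdim=True)
    return torch.minimum(sdf, sphere_sdf)
```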