Comments (12)

pwais commented on May 12, 2024

Ah, if you have camera-to-world transforms (RT matrices in homogeneous form), you can just embed them in the json as is done in the script:

f["transform_matrix"][0:3,3]-=centroid
camera_angle_x is the key NeRF intrinsic parameter, deduced from the focal length (fovx = angle_x * 180 / math.pi)... in this project, the input can contain more intrinsic parameters, as shown in the script.
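
For concreteness, here is a minimal sketch of how you might write such a json yourself, assuming you already have per-image 4x4 camera-to-world matrices and a focal length in pixels (variable names and file paths are illustrative; the key names follow the NeRF "blender" convention):

    import json
    import math
    import numpy as np

    w = 1920                   # image width in pixels (illustrative)
    fl_x = 1200.0              # focal length in pixels (illustrative)
    c2w_list = [np.eye(4)]     # stand-in: your real camera-to-world matrices

    # Horizontal field of view in radians, deduced from the focal length
    camera_angle_x = 2.0 * math.atan(w / (2.0 * fl_x))

    out = {
        "camera_angle_x": camera_angle_x,
        "frames": [
            {
                "file_path": "./images/{:04d}.png".format(i),
                "transform_matrix": c2w.tolist(),
            }
            for i, c2w in enumerate(c2w_list)
        ],
    }
    with open("transforms.json", "w") as f:
        json.dump(out, f, indent=2)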

However, note that the script tries to re-center the whole scene. The nerf implementation in this project uses an occupancy estimation pass centered around (0, 0, 0), so your scene needs to be centered around that point no matter what your actual camera poses are. You also likely need to re-scale your scene so it roughly fits into the unit cube, as is also done in the script:

f["transform_matrix"][0:3,3]*=3./avglen # scale to "nerf sized"

Let's say you have a scene of a large boat, as shown in one of the demo videos, and your camera poses are in meters in a coordinate frame centered at the very first captured image (dead reckoning). The boat is several meters long and your system's origin is at one corner of the boat, so you need to rigidly translate your scene so that the center of the boat is at (0, 0, 0), and re-scale it so the boat fits closer to the unit cube. That's what the script is trying to do. Note that your reconstruction will vary a bit depending on how you scale and translate your scene: in particular, stuff outside of the unit cube gets estimated much more coarsely (see section C.1 at the very end of the paper, https://nvlabs.github.io/instant-ngp/assets/mueller2022instant.pdf). You may need to fiddle with how you re-scale your scene to get good results.
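
Putting those two steps together, a minimal sketch of that kind of normalization (mirroring the intent of the script's heuristics; the function and variable names here are illustrative):

    import numpy as np

    def normalize_poses(c2w_list):
        # Recenter: move the centroid of the camera positions to the origin
        positions = np.stack([m[0:3, 3] for m in c2w_list])
        centroid = positions.mean(axis=0)
        for m in c2w_list:
            m[0:3, 3] -= centroid
        # Rescale: bring the cameras to roughly "nerf sized", i.e. sitting
        # around the unit cube (scale computed after centering, as in the script)
        avglen = np.mean([np.linalg.norm(m[0:3, 3]) for m in c2w_list])
        for m in c2w_list:
            m[0:3, 3] *= 3.0 / avglen
        return c2w_list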

CaffreyR commented on May 12, 2024

"Ah, if you have camera-to-world transforms (RT matrices in homogeneous form), you can just embed them in the json as is done in the script ... You may need to fiddle with how you re-scale your scene to get good results."

Hi @pwais @mmalex, thanks for your explanation. So the script is trying to fit the scene into a unit cube. Do we have to frame the object within a unit cube when we photograph it? And if COLMAP cannot generate a satisfactory json file, are there any alternative methods? Thanks!!!

pwais commented on May 12, 2024

You need COLMAP or something similar to get transforms / camera poses. You can install COLMAP in Docker pretty easily: #20 (comment)

For the transforms file, instant-ngp uses a format that's an extension of the typical NeRF "blender" format (which only has the camera field of view and per-frame camera pose RTs). Check out the COLMAP importer script in this repo to see how the full json info is generated: https://github.com/NVlabs/instant-ngp/blob/de507662d4b3398163e426fd426d48ff8f2895f6/scripts/colmap2nerf.py
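
The json that script emits looks roughly like this (values here are illustrative; the intrinsics block is the extension over the plain "blender" format, and camera_angle_x / camera_angle_y are in radians):

    {
      "camera_angle_x": 0.87,
      "camera_angle_y": 0.62,
      "fl_x": 1100.0,
      "fl_y": 1100.0,
      "k1": 0.0,
      "k2": 0.0,
      "p1": 0.0,
      "p2": 0.0,
      "cx": 960.0,
      "cy": 540.0,
      "w": 1920,
      "h": 1080,
      "aabb_scale": 4,
      "frames": [
        {
          "file_path": "./images/0001.png",
          "transform_matrix": [[1,0,0,0],[0,1,0,0],[0,0,1,0],[0,0,0,1]]
        }
      ]
    }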

andrew-arkhipov commented on May 12, 2024

You need COLMAP or something similar to get transforms / camera poses. You can install COLMAP in docker pretty easily: #20 (comment)

I was under the impression that COLMAP isn't required if the camera intrinsic and extrinsic parameters are known. I would assume that knowing those parameters outright would produce much better results than estimating them with COLMAP.
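
If the parameters are known, the conversion is mostly a matter of conventions; a minimal sketch, assuming OpenCV/COLMAP-style world-to-camera extrinsics (x right, y down, z forward). The y/z axis flips mirror what colmap2nerf.py does to reach the NeRF camera convention; the function name is illustrative, not an instant-ngp API:

    import numpy as np

    def known_extrinsics_to_nerf_c2w(R, t):
        # R (3x3) and t (3,) give a world-to-camera transform
        w2c = np.eye(4)
        w2c[0:3, 0:3] = R
        w2c[0:3, 3] = t
        c2w = np.linalg.inv(w2c)   # camera-to-world
        c2w[0:3, 1] *= -1          # flip camera y axis (down -> up)
        c2w[0:3, 2] *= -1          # flip camera z axis (forward -> backward)
        return c2w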

As for the parameters in the json file, I was hoping to get a more intuitive/physical understanding of what they represent, not necessarily just how they were calculated.

mmalex commented on May 12, 2024

@pwais thank you for this detailed and accurate writeup! The key really is to ensure that 1) your transforms.json is accurate with respect to the training images - colmap does a great job sometimes, but sometimes fails catastrophically; and 2) you get your cameras 'nice and centered' around the unit cube of interest. I am investigating an issue where, on a simple dataset, the heuristics in colmap2nerf that attempt to find a center point fail when I would expect them to work; I am not yet sure if this is a regression or if there are just cases I hadn't noticed before where they are not good enough. Either way, I'll update here if I can find a way to improve the automatic behaviour. In the end, the transforms that are fed to the nerf are somewhat 'out of scope', in the sense that the results in the paper rely on good transforms in the first place; the colmap2nerf.py script is meant to do as good a job as possible, alongside colmap, as a sort of 'pipeline into the main nerf technique' - but it may sometimes fail, at which point (for now) you are somewhat on your own (and pwais's notes are great guidance).

andrew-arkhipov commented on May 12, 2024

@mmalex Thanks for the response. Do you know if forward-facing data also performs well in comparison to 360 degree data? My toy use case is to get an accurate 3D reconstruction of a person's face, where the only images I have are of said person's face (rather than their entire head), so I'm wondering if instant-ngp is viable for that.

pwais commented on May 12, 2024

@mmalex I haven't played with your script a ton, but the normalization done here is pretty comparable to other NeRF projects I've used. One thing to try for your scene is to make sure COLMAP is using "single camera" mode and to use exhaustive matching, if appropriate for your scene. That can sometimes help.
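
For reference, a rough sketch of driving COLMAP that way from Python, similar in spirit to what colmap2nerf.py does (paths are illustrative; the flags are COLMAP's own):

    import subprocess

    # Extract features, treating all images as coming from one physical camera
    subprocess.run([
        "colmap", "feature_extractor",
        "--database_path", "colmap.db",
        "--image_path", "images",
        "--ImageReader.single_camera", "1",
    ], check=True)

    # Exhaustive matching: tries all image pairs; slower but more robust
    subprocess.run([
        "colmap", "exhaustive_matcher",
        "--database_path", "colmap.db",
    ], check=True)

    # Sparse reconstruction to recover the camera poses
    subprocess.run([
        "colmap", "mapper",
        "--database_path", "colmap.db",
        "--image_path", "images",
        "--output_path", "sparse",
    ], check=True)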

@andrew-arkhipov for forward-facing, you might need to use NDC (normalized device coordinates). See some discussion here in the context of original NeRF: bmild/nerf#35
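
For what it's worth, the NDC mapping is short; a numpy version adapted from the ndc_rays helper in bmild/nerf (for forward-facing scenes, rays are warped so the visible frustum fills a bounded volume):

    import numpy as np

    def ndc_rays(H, W, focal, near, rays_o, rays_d):
        # Shift ray origins onto the near plane
        t = -(near + rays_o[..., 2]) / rays_d[..., 2]
        rays_o = rays_o + t[..., None] * rays_d

        # Project origins and directions into normalized device coordinates
        o0 = -1.0 / (W / (2.0 * focal)) * rays_o[..., 0] / rays_o[..., 2]
        o1 = -1.0 / (H / (2.0 * focal)) * rays_o[..., 1] / rays_o[..., 2]
        o2 = 1.0 + 2.0 * near / rays_o[..., 2]

        d0 = -1.0 / (W / (2.0 * focal)) * (rays_d[..., 0] / rays_d[..., 2]
                                           - rays_o[..., 0] / rays_o[..., 2])
        d1 = -1.0 / (H / (2.0 * focal)) * (rays_d[..., 1] / rays_d[..., 2]
                                           - rays_o[..., 1] / rays_o[..., 2])
        d2 = -2.0 * near / rays_o[..., 2]

        return np.stack([o0, o1, o2], -1), np.stack([d0, d1, d2], -1)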

mmalex commented on May 12, 2024

Well, this is embarrassing - I just tried it on an internal dataset that I know used to 'center' perfectly, a non-360, front-facing human dataset (however, the cameras were somewhat inward-facing, i.e. there is still a sense of 'orbiting' the human, but only by maybe 20 degrees or so). And... sure enough, it wasn't centering at all well. It turns out I flipped the sign of something at some point and broke the centering heuristic without noticing (we have lived with the same transforms.json for a long time). I am very sorry about this regression creeping in! I just sneaked in a quick commit that fixes the bug - a backwards comparison around line 134 that... I am not sure how it sneaked in. I also took the opportunity to recompute the scale after the centering, which is more sensible IMO. Anyway... it's all a pile of heuristics, but hopefully a less broken pile of heuristics now. If you can bear to try colmap2nerf again, and you had bad centering before, you may find it better now.
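
For context, the centering heuristic pairs up camera view rays and averages their near-intersection points; a sketch of the helper, mirroring colmap2nerf.py (the sign clamps below are the sort of comparison the fix touched):

    import numpy as np

    def closest_point_2_lines(oa, da, ob, db):
        # Returns the point closest to both rays of form o + t*d, plus a
        # weight that goes to zero when the rays are nearly parallel
        da = da / np.linalg.norm(da)
        db = db / np.linalg.norm(db)
        c = np.cross(da, db)
        denom = np.linalg.norm(c) ** 2
        t = ob - oa
        ta = np.linalg.det([t, db, c]) / (denom + 1e-10)
        tb = np.linalg.det([t, da, c]) / (denom + 1e-10)
        # Clamp so the candidate point lies in front of both cameras
        if ta > 0:
            ta = 0
        if tb > 0:
            tb = 0
        return (oa + ta * da + ob + tb * db) * 0.5, denom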

I should add that a quick way to verify this is to expand the 'debug visualizations' rollup in the GUI and check 'show cameras' and 'show bounding box' - the cameras should sit neatly just outside the box (not too far), facing inwards, with the scene somewhat centered and contained within the box as much as possible. (aabb_scale can then be set as small as possible given the amount of background extent.) Before this fix, it was doing a really poor job of centering the scene in the box; now, it should be better.

Sorry for the regression! I hope it goes better now.

mmalex commented on May 12, 2024

This is an example of the sort of centering you should expect from a non-360, forward-ish facing scene. I've turned down the exposure to hide the actual nerf model, as I don't have permission from the human subject to release their likeness, but hopefully this illustrates what you're looking for (after the fix). This was generated by running colmap2nerf.py with default parameters on a scene where the subject was photographed more or less front-on. Note the cameras are all pointing somewhat towards the center of the box while remaining just outside it.
[image: debug view with 'show cameras' and 'show bounding box' enabled - the cameras sit just outside the bounding box, facing inwards]

andrew-arkhipov commented on May 12, 2024

@mmalex Thanks so much for checking that out for me. I think we can wrap up this issue, but I do have two more questions. I actually haven't tried the model on my face dataset yet, so I'm curious: how accurate is the model at reconstructing areas that require more precision (e.g. eyelashes, hair, nostrils, teeth, etc.)? And are you able to export the resulting reconstruction as a mesh despite it not being fully 360 degrees?

Thanks again!

mmalex commented on May 12, 2024

I think those are extremely dataset-dependent. Very roughly, at best our nerf (and others) resolves details down to around 1/1000th of the bounding box, to an order of magnitude, on a good day, with a fair wind (extremely approximate). So... you can get some fairly good detail, depending on things like the original image resolution/blur/noise, how well colmap does, and how much stuff is in the background... For detail, you really want a compressed depth range (i.e. not too much distant stuff) and thus as small an aabb_scale as you can get away with. I'm afraid you'll need to experiment. With human subjects, a limiting factor is nearly always how much they move between frames! It's very hard to stay still without a multi-camera rig, or a nerf that supports deformation (nerfies, hypernerf, et al.). We do not yet support those sorts of things in instant-ngp.

Likewise, the mesh output is just based on marching cubes over the density of the nerf, so it is predicated on a good nerf fit in the first place. The mesh will always be 'lower quality' than the nerf itself, because the nerf often relies on its 'softness' and volumetric nature to give the impression of detail; the mesh basically walks over a 'hard threshold' in the density, which can give it that melted-wax look that polygonal photogrammetry always has. I'd say that's more a general property of 'nerf -> marching cubes -> mesh' though. Good luck!
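
To illustrate the 'nerf -> marching cubes -> mesh' step generically (this is not the instant-ngp exporter; density_fn is a stand-in for querying your trained model's density on a batch of points):

    import numpy as np
    from skimage import measure  # scikit-image's marching cubes

    def extract_mesh(density_fn, resolution=256, threshold=2.5, bound=1.0):
        # Sample the density field on a regular grid inside the cube
        xs = np.linspace(-bound, bound, resolution)
        grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
        sigma = density_fn(grid.reshape(-1, 3)).reshape((resolution,) * 3)
        # Walk a hard iso-surface through the density; the volumetric
        # "softness" is lost here, hence the melted-wax look
        verts, faces, normals, _ = measure.marching_cubes(sigma, level=threshold)
        # Map voxel indices back to world coordinates
        verts = verts / (resolution - 1) * (2.0 * bound) - bound
        return verts, faces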

andrew-arkhipov commented on May 12, 2024

@mmalex Understood. Thanks!
