
dense_depth_priors_nerf's Introduction

Dense Depth Priors for NeRF from Sparse Input Views

This repository contains the implementation of the CVPR 2022 paper: Dense Depth Priors for Neural Radiance Fields from Sparse Input Views.

Arxiv | Video | Project Page

Step 1: Train Dense Depth Priors

You can skip this step and download the depth completion model trained on ScanNet from here.

Prepare ScanNet

Extract the ScanNet dataset, e.g. using the ScanNet SensReader, and place the files scannetv2_test.txt, scannetv2_train.txt and scannetv2_val.txt from the ScanNet Benchmark into the same directory.

Precompute Sampling Locations

Run the COLMAP feature extractor on all RGB images of ScanNet. For this, the RGB files need to be isolated from the other scene data; for example, create a temporary directory tmp and copy each <scene>/color/<rgb_filename> to tmp/<scene>/color/<rgb_filename>. Then run:

colmap feature_extractor  --database_path scannet_sift_database.db --image_path tmp

When working with different relative paths or filenames, the database reading in scannet_dataset.py needs to be adapted accordingly.
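
For illustration, the sampling locations come from the SIFT keypoints stored in the COLMAP database; the minimal sketch below reads them with sqlite3. It assumes the standard COLMAP schema (tables "images" and "keypoints") and a hypothetical image naming that must match how the database was created, cf. the adaptation note above.

# Sketch: read SIFT keypoint locations from the COLMAP database (standard schema assumed).
import sqlite3
import numpy as np

def load_keypoints(db_path, image_name):
    con = sqlite3.connect(db_path)
    try:
        row = con.execute(
            "SELECT image_id FROM images WHERE name = ?", (image_name,)
        ).fetchone()
        if row is None:
            return None
        image_id = row[0]
        rows, cols, data = con.execute(
            "SELECT rows, cols, data FROM keypoints WHERE image_id = ?", (image_id,)
        ).fetchone()
        kp = np.frombuffer(data, dtype=np.float32).reshape(rows, cols)
        return kp[:, :2]  # (x, y) pixel locations used as sampling positions
    finally:
        con.close()

# Hypothetical usage, assuming images were registered relative to tmp:
# locations = load_keypoints("scannet_sift_database.db", "scene0000_00/color/000000.jpg")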

Download pretrained ResNet

Download the pretrained ResNet from here.

Train

python3 run_depth_completion.py train --dataset_dir <path to ScanNet> --db_path <path to database> --pretrained_resnet_path <path to pretrained resnet> --ckpt_dir <path to write checkpoints>

Checkpoints are written into a subdirectory of the provided checkpoint directory. The subdirectory is named after the training start time in the format yyyymmdd_hhmmss, which also serves as the experiment name in the following.
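
For illustration, a name in this format can be generated as follows (this only demonstrates the format; the actual naming happens inside the training script):

# Sketch: experiment name in the yyyymmdd_hhmmss format.
from datetime import datetime
expname = datetime.now().strftime("%Y%m%d_%H%M%S")  # e.g. "20211027_092436"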

Test

python3 run_depth_completion.py test --expname <experiment name> --dataset_dir <path to ScanNet> --db_path <path to database> --ckpt_dir <path to write checkpoints>

Step 2: Optimizing NeRF with Dense Depth Priors

Prepare scenes

You can skip the scene preparation and directly download the scenes. To prepare a scene and render sparse depth maps from COLMAP sparse reconstructions, run:

cd preprocessing
mkdir build
cd build
cmake ..
make -j
./extract_scannet_scene <path to scene> <path to ScanNet>

The scene directory must contain the following:

  • train.csv: list of training views from the ScanNet scene
  • test.csv: list of test views from the ScanNet scene
  • config.json: parameters for the scene (see the example sketch after this list):
    • name: name of the scene
    • max_depth: maximal depth value in the scene, larger values are invalidated
    • dist2m: scaling factor that scales the sparse reconstruction to meters
    • rgb_only: write RGB only, e.g. to get the input for COLMAP
  • colmap: directory containing two sparse reconstructions:
    • sparse: reconstruction run on the train and test images together to determine the camera poses
    • sparse_train: reconstruction run on the train images alone to determine the sparse depth maps
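
As an illustration, a config.json with hypothetical values could be written like this (field names follow the list above; the values are made up):

# Sketch: writing a hypothetical config.json for a scene (values are made up).
import json

config = {
    "name": "scene0710_00",   # name of the scene
    "max_depth": 4.5,         # depth values above this are invalidated
    "dist2m": 1.0,            # scales the sparse reconstruction to meters
    "rgb_only": False,        # set True to only write the cropped RGB images for COLMAP
}
with open("config.json", "w") as f:
    json.dump(config, f, indent=4)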

Please check the provided scenes as an example. The option rgb_only is used to preprocess the RGB images before running COLMAP. It crops away dark image borders that stem from calibration and harm the NeRF optimization. It is essential to crop them before running COLMAP to ensure that the determined intrinsics match the cropped RGB images.
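
To illustrate why the crop has to happen before COLMAP: cropping shifts the principal point, so intrinsics estimated on uncropped images would no longer match the cropped ones. Below is a minimal sketch of cropping while keeping intrinsics consistent; it is not the repository's preprocessing code, and the crop margins are made up.

# Sketch: crop dark borders and keep the intrinsics consistent (margins are made up).
import cv2

def crop_with_intrinsics(rgb_path, fx, fy, cx, cy, top=8, bottom=8, left=12, right=12):
    img = cv2.imread(rgb_path)
    h, w = img.shape[:2]
    cropped = img[top:h - bottom, left:w - right]
    # Focal lengths are unchanged; the principal point shifts by the crop offset.
    return cropped, (fx, fy, cx - left, cy - top)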

Optimize

python3 run_nerf.py train --scene_id <scene, e.g. scene0710_00> --data_dir <directory containing the scenes> --depth_prior_network_path <path to depth prior checkpoint> --ckpt_dir <path to write checkpoints>

Checkpoints are written into a subdirectory of the provided checkpoint directory. The subdirectory is named after the training start time in the format yyyymmdd_hhmmss, which also serves as the experiment name in the following.

Test

python3 run_nerf.py test --expname <experiment name> --data_dir <directory containing the scenes> --ckpt_dir <path to write checkpoints>

The test results are stored in the experiment directory. Running python3 run_nerf.py test_opt ... performs test time optimization of the latent codes before computing the test metrics.

Render Video

python3 run_nerf.py video  --expname <experiment name> --data_dir <directory containing the scenes> --ckpt_dir <path to write checkpoints>

The video is stored in the experiment directory.


Citation

If you find this repository useful, please cite:

@inproceedings{roessle2022depthpriorsnerf,
    title={Dense Depth Priors for Neural Radiance Fields from Sparse Input Views}, 
    author={Barbara Roessle and Jonathan T. Barron and Ben Mildenhall and Pratul P. Srinivasan and Matthias Nie{\ss}ner},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month={June},
    year={2022}
}

Acknowledgements

We thank nerf-pytorch and CSPN, from which this repository borrows code.

dense_depth_priors_nerf's People

Contributors

barbararoessle

dense_depth_priors_nerf's Issues

How are camera translations scaled?

Hi,

I have a dataset of images with given camera intrinsics and extrinsics (poses), so I am trying to generate the transforms_train.json file without having to run COLMAP and the rest of the preprocessing. To do this, I am trying to figure out how transforms_train.json is created from the COLMAP sparse reconstruction for the ScanNet scenes. I figured out the relation between the rotation matrices, but I cannot figure out how the translation is scaled; I found different scaling factors for the 5 scenes. I tried to understand the C++ code that generates the transforms_train.json file, but I am not used to C++ and could not work it out.

Can you please tell me how to compute the scaling factor for the camera translation?

PS: I also noticed that, unlike the original NeRF which scales translation so that nearest depth becomes unity, you do not do such a scaling here.
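
For reference (this is the generic COLMAP convention, not an answer about the repository's specific scaling): COLMAP stores world-to-camera rotations as quaternions qvec and translations tvec, and the camera-to-world translation is -R^T t, which is also where any global scale factor would act. A minimal numpy sketch of the standard conversion:

# Standard COLMAP world-to-camera (qvec, tvec) -> camera-to-world conversion.
# Generic COLMAP convention, not the repository's C++ preprocessing.
import numpy as np

def qvec2rotmat(qvec):
    w, x, y, z = qvec
    return np.array([
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y)],
        [2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y)],
    ])

def colmap_to_c2w(qvec, tvec):
    R = qvec2rotmat(qvec)                  # world-to-camera rotation
    c2w = np.eye(4)
    c2w[:3, :3] = R.T                      # camera-to-world rotation
    c2w[:3, 3] = -R.T @ np.asarray(tvec)   # camera center in world coordinates
    return c2w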

The error I encountered while preparing my scenes using extract_scannet_scene

Hi, thanks for your code!
I prepare my scenes using extract_scannet_scene, which takes two arguments, "path to scene" and "path to ScanNet". For "path to ScanNet" I passed the path of the trained network downloaded in step 1 (named 20211027_092436.tar), but an error occurs when running the code: "!_src.empty() in function 'cvtColor'". Looking at extract_scannet_scene.cpp, lines 161 and 163 of main() contain "boost::filesystem::path path2scannet(argv[2])" and "boost::filesystem::path path2scannetscene(path2scannet/"scans_test"/config.kname)". The trained model (20211027_092436.tar) does not contain a folder "scans_test", so I would like to know how to deal with this: do I need to provide the ScanNet dataset myself when preparing my own scenes, or is something wrong with my approach? Looking forward to your reply!

Question about depth_prior_network_path and db_path

Hello, thanks for this excellent project.
I have encountered some problems when trying to reproduce it. Could you please tell me how to set the --depth_prior_network_path argument of run_nerf.py? (I downloaded the pretrained depth completion model from the link provided by the author and unzipped it, which gives a folder "Archive".)
Furthermore, I would also like to know how to get the scannet_sift_database.db of step 1.
Thanks for your reply. (Maybe the issue seems a little bit silly, lol.)

fail to build extract_scannet_scene.cpp

After I run the command make install to build the preprocessing code, I get the following error:

preprocessing/io_colmap/src/colmap_reader.cpp:84:117: error: no matching function for call to ‘rotate(glm::mat4, float, glm::vec3)’
    const glm::mat4 rot_cam2cam = glm::rotate(glm::mat4(1.0f), glm::radians(180.0f), glm::vec3(1.0f, 0.0f, 0.0f));
In file included from /usr/include/glm/gtc/quaternion.hpp:434:0,
                 from dense_depth_priors_nerf-master/preprocessing/io_colmap/src/colmap_reader.cpp:6:
/usr/include/glm/gtc/quaternion.inl:560:33: note: candidate: template<class T, glm::qualifier Q> glm::tquat<T, Q> glm::rotate(const glm::tquat<T, Q>&, const T&, const glm::vec<3, T, Q>&)
    GLM_FUNC_QUALIFIER tquat<T, Q> rotate(tquat<T, Q> const& q, T const& angle, vec<3, T, Q> const& v)

I installed glm with sudo apt install libglm-dev, which installs glm 0.9.9~a2-2, but it does not seem to work with this program. Could you tell me how you installed glm, or how you set up the environment for the preprocessing code?

Question about the resolution of the images

I am trying to reuse the depth completion network that you trained on ScanNet (setup screenshots omitted).

When I input my own images, I get an error (screenshot omitted). Does the model have any requirements on the input? The resolution of my images is (1920, 1080).

UnpicklingError: A load persistent id instruction was encountered

Traceback (most recent call last):
  File "/home/lhs/project/nerf/dense_depth_priors_nerf-master/run_nerf.py", line 1104, in <module>
    run_nerf()
  File "/home/lhs/project/nerf/dense_depth_priors_nerf-master/run_nerf.py", line 1070, in run_nerf
    train_nerf(images, depths, valid_depths, poses, intrinsics, i_split, args, scene_sample_params, lpips_alex, gt_depths, gt_valid_depths)
  File "/home/lhs/project/nerf/dense_depth_priors_nerf-master/run_nerf.py", line 788, in train_nerf
    depths, valid_depths = complete_and_check_depth(images, depths, valid_depths, i_train, gt_depths_train, gt_valid_depths_train,
  File "/home/lhs/project/nerf/dense_depth_priors_nerf-master/run_nerf.py", line 726, in complete_and_check_depth
    depths[i_train], valid_depths[i_train] = complete_depth(images[i_train], depths[i_train], valid_depths[i_train],
  File "/home/lhs/project/nerf/dense_depth_priors_nerf-master/run_nerf.py", line 678, in complete_depth
    ckpt = torch.load(model_path)
  File "/home/lhs/.conda/envs/liao/lib/python3.10/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/home/lhs/.conda/envs/liao/lib/python3.10/site-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: A load persistent id instruction was encountered,
but no persistent_load function was specified.

Hello, when reproducing the results, I first downloaded the depth prior network model, then downloaded the scenes listed in the readme, and ran python3 run_nerf.py train --scene_id <scene, e.g. scene0710_00> --data_dir <directory containing the scenes> --depth_prior_network_path <path to depth prior checkpoint> --ckpt_dir <path to write checkpoints> with the above data. Then the problem above occurred. Is there a problem with the version of some files?

The torch version of the depth completion network

Since I use PyTorch 1.13, when I load the depth completion network I get an error that the pkl's version does not match 1.13. Could you tell me the correct PyTorch version for the depth completion network checkpoint (I didn't find the torch version in the readme)?

Error in compute_depth_loss

Hi, thanks for your nice work. However, when I run the code I get the following error in compute_depth_loss:

Traceback (most recent call last):
  File "/public/home/luanzl/.pycharm_helpers/pydev/pydevd.py", line 2195, in <module>
    main()
  File "/public/home/luanzl/.pycharm_helpers/pydev/pydevd.py", line 2177, in main
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/public/home/luanzl/.pycharm_helpers/pydev/pydevd.py", line 1489, in run
    return self._exec(is_module, entry_point_fn, module_name, file, globals, locals)
  File "/public/home/luanzl/.pycharm_helpers/pydev/pydevd.py", line 1496, in _exec
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/public/home/luanzl/.pycharm_helpers/pydev/_pydev_imps/_pydev_execfile.py", line 18, in execfile
    exec(compile(contents+"\n", file, 'exec'), glob, loc)
  File "/public/home/luanzl/WorkSpace/dense_depth_priors_nerf/run_nerf.py", line 1137, in <module>
    run_nerf()
  File "/public/home/luanzl/WorkSpace/dense_depth_priors_nerf/run_nerf.py", line 1100, in run_nerf
    train_nerf(images, depths, valid_depths, poses, intrinsics, i_split, args, scene_sample_params, lpips_alex, gt_depths, gt_valid_depths)
  File "/public/home/luanzl/WorkSpace/dense_depth_priors_nerf/run_nerf.py", line 859, in train_nerf
    depth_loss = compute_depth_loss(extras['depth_map'], extras['z_vals'], extras['weights'], target_d, target_vd)
  File "/public/home/luanzl/WorkSpace/dense_depth_priors_nerf/model/run_nerf_helpers.py", line 44, in compute_depth_loss
    return float(pred_mean.shape[0]) / float(target_valid_depth.shape[0]) * f(pred_mean, target_mean, pred_var)
  File "/public/home/luanzl/anaconda3/envs/dense_depth_priors_nerf/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/public/home/luanzl/anaconda3/envs/dense_depth_priors_nerf/lib/python3.8/site-packages/torch/nn/modules/loss.py", line 372, in forward
    return F.gaussian_nll_loss(input, target, var, full=self.full, eps=self.eps, reduction=self.reduction)
  File "/public/home/luanzl/anaconda3/envs/dense_depth_priors_nerf/lib/python3.8/site-packages/torch/nn/functional.py", line 2804, in gaussian_nll_loss
    raise ValueError("var has negative entry/entries")
ValueError: var has negative entry/entries

It seems that a negative variance value occurs during the loss calculation. Could you tell me how to tackle this issue? Hoping for your reply, thanks!

Could you provide more complete preprocessing scripts?

When I use the preprocessing code, together with some code I added because the provided code seems incomplete, the reconstructed poses come out wrong and strange. I don't know whether the error is in the preprocessing code or in the code I wrote. Could you provide more complete preprocessing code?

How to get the rendered model?

Hi! I have run this project successfully. The result is really good!

I see that this project provides video rendering for visualization. Does it also support exporting a 3D model, e.g. in ".ply" or ".pcd" format? I would like to access the 3D model for further processing.

Thank you!

Hi, I can't figure out how to load "the depth completion model trained on ScanNet", or more specifically, the "data.pkl"

I use torch.load("the path of pretrained model"), but it says "_pickle.UnpicklingError: A load persistent id instruction was encountered, but no persistent_load function was specified."

My PyTorch version is 1.8.1. I would like to know which PyTorch version was used to save "data.pkl".

Sorry to trouble you, this bug has bothered me for a long time. I hope you can resolve my confusion. Thanks!

About the data after processing scannet with colmap

Hi,

Thanks for your nice work! I have a question regarding the depths/poses in the transforms.json files. Is the ground-truth depth consistent with the given camera-to-world poses, i.e. do I need to scale them, or can I use them directly, for example to obtain correspondences?

Thanks a lot!

can't run extract_scannet_scene

It is unable to read the images and keeps reporting "what(): OpenCV(3.4.6) /home/mvs18/Downloads/opencv/opencv-3.4.6/modules/imgproc/src/color.cpp:182: error: (-215:Assertion failed) !_src.empty() in function 'cvtColor'".
May I ask what is going on here?

About depth completion?

This work uses depth completion instead of other depth-predicting networks (such as MVS); is there any difference? Is the depth completion network better?

Question with regards to the target depth map

Hello, can you please tell me what kind of depth is used in the target depth map? As far as I understood from the paper, the RGB-D images of the ScanNet dataset were acquired using a Structure Sensor, is that right?

Depth loss is negative

Hello and thank you for your great work! I noticed that the paper uses GaussianNLLLoss so that the rendering matches the true sampling distribution as closely as possible (formula screenshot omitted). But I find that this loss is always negative; is this expected?

Also, I would like to ask: have you tried applying an MSE loss directly between the rendered depth map and the GT depth?

Sorry to bother you, and I look forward to your reply!
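
For reference, a Gaussian negative log-likelihood can indeed be negative: when the predicted variance is small, the density at the target exceeds 1 and its negative log drops below zero. A minimal PyTorch sketch with made-up values:

# Sketch: Gaussian NLL becomes negative for small variance (values are made up).
import torch

loss_fn = torch.nn.GaussianNLLLoss()
pred = torch.tensor([2.00])
target = torch.tensor([2.01])
var = torch.tensor([0.01])
print(loss_fn(pred, target, var))  # roughly 0.5 * (log(0.01) + 0.01**2 / 0.01) < 0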

Colmap pose

Hi, thanks for sharing the code!
I want to ask for details about generating the poses. I fail to obtain poses for some images when running COLMAP on each scene with train and test images together. Is that because the feature extractor should be run on all ScanNet images? If that is the case, can you share scannet_sift_database.db? Otherwise, can you provide the code for generating the camera poses? Many thanks!

Confused about the coordinate system and COLMAP sparse reconstruction

Hi, thanks for your work. I want to use the ScanNet ground-truth camera poses to compute the depth maps. However, I am confused by the coordinate system and the camera model.

  1. ScanNet provides camera-to-world matrices as camera poses. Do I need to convert them to world-to-camera so that they can be used by COLMAP?
  2. I also don't see any code that transforms COLMAP coordinates to OpenGL coordinates, as the original NeRF does. Do you feed COLMAP coordinates directly to the model?
  3. The cameras.txt in the sparse directory has a WIDTH and HEIGHT of 624x468, while the original ScanNet image size is 1296x968. Does this mean that COLMAP needs to be run on resized images? I thought the resizing is done by extract_scannet_scene.cpp, but the COLMAP sparse reconstruction is needed to run extract_scannet_scene, which is confusing.

Could you help me with these questions? Thanks a lot. Sorry, I am new to NeRF and COLMAP.

Error while making the preprocessing/build

Hey, I keep encountering the following error when I run make -j in preprocessing/build: //usr/lib/x86_64-linux-gnu/libboost_system.so.1.65.1: error adding symbols: DSO missing from command line.
Any idea how I can get around this?
Thanks.

Video generation with less than 3 images

My goal is to create a video from a limited number of images, specifically fewer than three. After researching, I found that dense depth priors with NeRF can be effective for this purpose. However, I am uncertain whether this approach can be applied to my specific problem of working with only one to three images.

cannot load the depth completion model trained on ScanNet

Hello,

I'm trying to optimize NeRF with dense depth priors using the depth completion network trained on ScanNet provided in the corresponding readme section and I get the following error:


Traceback (most recent call last):
  File "run_nerf.py", line 1104, in <module>
    run_nerf()
  File "run_nerf.py", line 1070, in run_nerf
    train_nerf(images, depths, valid_depths, poses, intrinsics, i_split, args, scene_sample_params, lpips_alex, gt_depths, gt_valid_depths)
  File "run_nerf.py", line 789, in train_nerf
    scene_sample_params, args)
  File "run_nerf.py", line 728, in complete_and_check_depth
    invalidate_large_std_threshold=args.invalidate_large_std_threshold)
  File "run_nerf.py", line 678, in complete_depth
    ckpt = torch.load(model_path)
  File "/miniconda/envs/env/lib/python3.7/site-packages/torch/serialization.py", line 713, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/miniconda/envs/env/lib/python3.7/site-packages/torch/serialization.py", line 920, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: A load persistent id instruction was encountered,
but no persistent_load function was specified.

I've tried the following pytorch versions: 1.11.0, 1.10.0, 1.9.1, and 1.9.0.
Could you please confirm that this is the correct link for the depth completion network weights?

Thank you in advance for your help.

About the depth-guided ray sampling

return forward_with_additonal_samples(z_vals, raw, z_vals_2, rays_o, rays_d, viewdirs, embedded_cam, network_fn, network_query_fn, raw_noise_std, pytest)

Why is network_fine not used here (network_fn is used instead) after the depth-guided sampling for the final rendering? network_fine does not seem to be used at all in the rendering process.

What scenes are used for the training of pre-trained depth_prior_network

Hi, thanks for your work. I would like to know the scene list you used to train the depth prior network, because I need to split the train and test datasets. Did you use all ScanNet scenes for depth prior training and only spare a few images from each scene for the few-shot NeRF optimization?
Thanks!

On the standard deviation of depth map

std = (((z_vals - depth.unsqueeze(-1)).pow(2) * weights).sum(-1)).sqrt()
I don't quite understand the concept of the standard deviation here. Could you elaborate on it?
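
For reference, the quoted line treats the rendering weights as a discrete distribution over the sample depths z_vals: the rendered depth is the weighted mean, and the expression is the square root of the weighted variance around that mean. A small numpy illustration with made-up values:

# Sketch: rendering weights as a distribution over sample depths (values are made up).
import numpy as np

z_vals = np.array([1.0, 1.5, 2.0, 2.5])   # sample depths along one ray
weights = np.array([0.1, 0.2, 0.5, 0.2])  # rendering weights (sum to 1)

depth = (weights * z_vals).sum()                             # weighted mean depth
std = np.sqrt((weights * (z_vals - depth) ** 2).sum())       # weighted std around it
print(depth, std)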
