scnerf's Introduction

Self-Calibrating Neural Radiance Fields, ICCV, 2021

Project Page | Paper | Video

Author Information

Types of camera parameters

News

  • 2021-09-02: The first version of Self-Calibrating Neural Radiance Fields is published

Overview

In this work, we propose a camera self-calibration algorithm for generic cameras with arbitrary non-linear distortions. We jointly learn the geometry of the scene and the accurate camera parameters without any calibration objects. Our camera model consists of a pinhole model, radial distortion, and a generic noise model that can learn arbitrary non-linear camera distortions. While traditional self-calibration algorithms mostly rely on geometric constraints, we additionally incorporate photometric consistency. This requires learning the geometry of the scene, for which we use Neural Radiance Fields (NeRF). We also propose a new geometric loss function, viz., the projected ray distance loss, to incorporate geometric consistency for complex non-linear camera models. We validate our approach on standard real-image datasets and demonstrate that our model can learn the camera intrinsics and extrinsics (pose) from scratch without COLMAP initialization. We also show that learning accurate camera models in a differentiable manner allows us to improve PSNR over NeRF, and we experimentally demonstrate that our proposed method is applicable to variants of NeRF. In addition, we use a set of images captured with a fish-eye lens to demonstrate that jointly learning the camera model improves performance significantly over COLMAP initialization.

Method

Generic Camera Model

We provide the definition of our differentiable camera model that combines the pinhole camera model, radial distortion, and a generic non-linear camera distortion for self-calibration. Our differentiable generic camera model consists of four components: intrinsic, extrinsic, radial distortion, and non-linear distortion parameters. We show that modeling the rays more accurately (camera model) results in better neural rendering. The following figure shows the computational steps to generate rays of our proposed learnable generic camera model.

[Figure: computational graph for rays]
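
To make the components concrete, here is a minimal PyTorch sketch in the spirit of this model, not the repository's actual implementation: a pinhole camera with trainable intrinsics plus per-pixel residuals on ray origins and directions standing in for the generic non-linear distortion. Radial distortion and extrinsics optimization are omitted for brevity, and all names are illustrative.

    import torch
    import torch.nn as nn

    class LearnableCamera(nn.Module):
        """Pinhole intrinsics + generic per-pixel ray noise, all trainable."""

        def __init__(self, H, W):
            super().__init__()
            # Focal lengths and principal point, initialized to half the
            # image width/height as described in the curriculum below.
            self.focal = nn.Parameter(torch.tensor([W / 2.0, H / 2.0]))
            self.center = nn.Parameter(torch.tensor([W / 2.0, H / 2.0]))
            # Generic non-linear distortion: per-pixel residuals on ray
            # origins and directions, initialized to zero.
            self.ray_o_noise = nn.Parameter(torch.zeros(H, W, 3))
            self.ray_d_noise = nn.Parameter(torch.zeros(H, W, 3))
            self.H, self.W = H, W

        def forward(self, c2w):
            """c2w: (3, 4) camera-to-world matrix -> per-pixel rays."""
            j, i = torch.meshgrid(
                torch.arange(self.H, dtype=torch.float32),
                torch.arange(self.W, dtype=torch.float32),
                indexing="ij",  # requires PyTorch >= 1.10
            )
            # Unproject pixel centers through the learnable intrinsics
            # (OpenGL-style convention: camera looks down -z).
            dirs = torch.stack([
                (i - self.center[0]) / self.focal[0],
                -(j - self.center[1]) / self.focal[1],
                -torch.ones_like(i),
            ], dim=-1)
            # Rotate directions into world space; origins at camera center.
            rays_d = (dirs[..., None, :] * c2w[:3, :3]).sum(-1)
            rays_o = c2w[:3, 3].expand(rays_d.shape)
            # Apply the learnable residuals, then renormalize directions.
            rays_o = rays_o + self.ray_o_noise
            rays_d = nn.functional.normalize(rays_d + self.ray_d_noise, dim=-1)
            return rays_o, rays_d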

Projected Ray Distance

The generic camera model poses a new challenge in defining a geometric loss. In most traditional work, the geometric loss is defined either as an epipolar constraint that measures the distance between an epipolar line and the corresponding point, or as a reprojection error, where a 3D point is first triangulated from a correspondence and then projected to an image plane to measure the distance between the projection and the correspondence. In this work, rather than requiring a 3D reconstruction to compute an indirect loss like the reprojection error, we propose the projected ray distance loss, which directly measures the discrepancy between rays using our generic camera model.

[Figure: projected ray distance]
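
As a rough sketch of the idea (not the paper's exact formulation; `project_a` and `project_b` are hypothetical world-to-pixel projection functions for the two cameras), one can find the mutually closest points on two correspondence rays in closed form and penalize the pixel-space discrepancy of their projections:

    import torch

    def closest_points_on_rays(o1, d1, o2, d2, eps=1e-8):
        """Least-squares closest points between rays x = o + t * d.

        o*, d*: (..., 3) origins and unit directions. Returns the point on
        each ray nearest to the other ray.
        """
        b = o2 - o1
        d1d2 = (d1 * d2).sum(-1, keepdim=True)
        denom = (1.0 - d1d2 ** 2).clamp_min(eps)  # degenerate if parallel
        b_d1 = (b * d1).sum(-1, keepdim=True)
        b_d2 = (b * d2).sum(-1, keepdim=True)
        t1 = (b_d1 - d1d2 * b_d2) / denom
        t2 = (d1d2 * b_d1 - b_d2) / denom
        return o1 + t1 * d1, o2 + t2 * d2

    def projected_ray_distance(o1, d1, o2, d2, project_a, project_b):
        """Average pixel-space distance between the projections of the two
        closest points, measured in both images."""
        x1, x2 = closest_points_on_rays(o1, d1, o2, d2)
        dist_a = (project_a(x1) - project_a(x2)).norm(dim=-1)
        dist_b = (project_b(x1) - project_b(x2)).norm(dim=-1)
        return 0.5 * (dist_a + dist_b).mean()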

Curriculum Learning

The camera parameters determine the positions and directions of the rays for NeRF learning, and unstable values often result in divergence or sub-optimal results. Thus, we incrementally add subsets of the learnable parameters to the optimization to reduce the complexity of learning cameras and geometry jointly. First, we learn the NeRF network while initializing the camera focal lengths and camera centers to half the image width and height. Learning coarse geometry first is crucial, since it initializes the network parameters in a way that is suitable for learning better camera parameters. Next, we sequentially add camera parameters to the optimization: first the linear camera model, then radial distortion, and finally the non-linear noise on ray directions and ray origins. Progressively making the camera model more complex prevents the camera parameters from overfitting and also allows faster training.

[Figure: curriculum learning]
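
A minimal sketch of such a phased schedule follows; the step thresholds, attribute names, and phase boundaries are hypothetical, not the repository's actual configuration:

    def apply_curriculum(camera, step):
        """Enable camera parameter groups progressively as training proceeds.

        `camera` is assumed to be an nn.Module holding the parameters named
        below; each group stays frozen until its phase begins. NeRF itself
        is trained from step 0, before the first camera phase.
        """
        phases = [
            (10_000, ["focal", "center"]),     # linear pinhole model first
            (20_000, ["radial_distortion"]),   # then radial distortion
            (40_000, ["ray_d_noise"]),         # then non-linear direction noise
            (60_000, ["ray_o_noise"]),         # finally ray-origin noise
        ]
        for start_step, names in phases:
            for name in names:
                param = getattr(camera, name, None)
                if param is not None:
                    param.requires_grad_(step >= start_step)

Calling apply_curriculum(camera, step) once per training iteration, before the optimizer step, keeps the not-yet-enabled groups out of the gradient computation.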

Installation

Requirements

  • Ubuntu 16.04 or higher
  • CUDA 11.1 or higher
  • Python v3.7 or higher
  • PyTorch v1.7 or higher
  • Hardware spec
    • GPUs with 11 GB of memory (2080 Ti) or larger
    • For NeRF++, two GPUs (2080 Ti) are required to reproduce the results
    • For the FishEyeNeRF experiments, we used four GPUs (V100)

Environment Setup

  • We recommend conda for installation. All requirements for both codebases, NeRF and NeRF++, are included in requirements.txt

    conda create -n icn python=3.8
    conda activate icn
    pip install -r requirements.txt
    git submodule update --init --recursive
    

Generating COLMAP poses for a custom image set

  • We further provide a COLMAP pose generator that can be applied to custom image sets. Run the command below to acquire camera information for a custom image set:
bash colmap_utils/colmap.sh [path to image set]

The image collection should be inside the directory "images". Check out the COLMAP documentation for details.

[path]
    |--- images

Pretrained Weights & Qualitative Results

Here, we provide pretrained weights so that users can easily reproduce the results in the paper. You can download the pretrained weights from the following link, which contains the weights of all experiments reported in our paper. To load a pretrained weight, add the corresponding argument at the end of each script. The zip file also includes the qualitative results used in our paper.

Link to download the pretrained weight: [link]

Datasets

We use three datasets for evaluation: the LLFF dataset, the Tanks and Temples dataset, and the FishEyeNeRF dataset (images captured with a fish-eye lens).

Put the data in the directory "data/", then add a soft link with one of the following:

ln -s data NeRF/data
ln -s data nerfplusplus/data
ln -s data nerfplusplus/data/fisheyenerf

Demo Code

The demo code is available in the "demo.sh" file, which runs curriculum learning with the NeRF architecture. Please install the aforementioned requirements before running the code. To run the demo:

sh demo.sh

If you want to reproduce the results that are reported in our main paper, run the scripts in the "scripts" directory.

Main Table 1: Self-Calibration Experiment (LLFF)
Main Table 2: Improvement over NeRF (LLFF)
Main Table 3: Improvement over NeRF++ (Tanks and Temples)
Main Table 4: Improvement over NeRF++ (Images with a fish-eye lens)

Code Example:

sh scripts/main_table_1/fern/main1_fern_ours.sh
sh scripts/main_table_2/fern/main2_fern_ours.sh
sh scripts/main_table_3/main3_m60.sh
sh scripts/main_table_4/globe_ours.sh

Citing Self-Calibrating Neural Radiance Fields

@inproceedings{SCNeRF2021,
    author = {Jeong, Yoonwoo and Ahn, Seokjun and Choy, Christopher and Anandkumar, Animashree and Cho, Minsu and Park, Jaesik},
    title = {Self-Calibrating Neural Radiance Fields},
    booktitle = {ICCV},
    year = {2021},
}

Concurrent Work

We list a few recent concurrent projects that tackle camera extrinsics (pose) optimization in NeRF. Note that our Self-Calibrating NeRF optimizes an extensive set of camera parameters for intrinsics, extrinsics, radial distortion, and non-linear distortion.

Acknowledgements

We appreciate all ICCV reviewers for their valuable comments; their suggestions have helped us improve the paper. We also acknowledge the excellent implementations of NeRF++ (https://github.com/Kai-46/nerfplusplus) and NeRF-pytorch (https://github.com/yenchenlin/nerf-pytorch).


scnerf's Issues

Does `--dataset_type custom` work for 360 scenes with photos only?

Two questions, if you don't mind:

  1. Does the code in the "custom" branch with --dataset_type custom work for 360-degree scenes like the "tanks_and_temples" images, or is it only for forward-facing scenes like the LLFF fern dataset?
  2. If it does work for 360-degree scenes, can you confirm that it doesn't need any COLMAP camera parameters, initialization, etc.?

How to use pretrained weights for inference on custom datasets?

All the scripts mentioned train the NeRF on a dataset and evaluate it.
Is it possible to run the pretrained model on any set of images captured by a camera?
Or does the pipeline require end-to-end training for every set of camera images?

Problems running colmap_utils script

Hi,

Thanks for the repo. I was trying to run SCNeRF with only images, but after looking at the code and related issues, it seems like I need to run the colmap_utils script nonetheless. However, there are several errors when trying to run the script:

File "/home/SCNeRF/colmap_utils/read_sparse_model.py", line 378, in main
    depth_ext = os.listdir(os.path.join(args.working_dir, "depth"))[0][-4:]

and

File "/home/SCNeRF/colmap_utils/post_colmap.py", line 33, in load_colmap_data
    with open(os.path.join(realdir, "train.txt"), "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: '/data/TUM_desk_rgb/train.txt'

I checked the code; I think the error happens because COLMAP does not directly output depth, and I have no idea what train.txt is. Could you double-check that the provided script works for a pure RGB dataset all the way through?

And if possible, it would be very helpful if you could provide a more detailed guide on how to run with only image inputs.

Thanks in advance!

Scripts for paper's results

Thanks for your wonderful code.

I'd like to reproduce the result in supplementary Table 3 of your paper.

But the flower dataset with demo.sh behaves a little differently.
Could I get some advice on the setup to reproduce it?

This is the wandb link from my environment.

Thanks

IndexError: index 0 is out of bounds for axis 0 with size 0

Training Done
Starts Train Rendering
0%| | 0/17 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/home/user/linzejun01/linzejun_mutiply_view01/SCNeRF-master/NeRF/run_nerf.py", line 1047, in
train()
File "/home/user/linzejun01/linzejun_mutiply_view01/SCNeRF-master/NeRF/run_nerf.py", line 973, in train
rgbs, disps = render_path(
File "/home/user/linzejun01/linzejun_mutiply_view01/SCNeRF-master/NeRF/render.py", line 157, in render_path
rgb, disp, acc, _ = render(
File "/home/user/linzejun01/linzejun_mutiply_view01/SCNeRF-master/NeRF/render.py", line 44, in render
idx_in_camera_param=np.where(i_map==image_idx)[0][0]
IndexError: index 0 is out of bounds for axis 0 with size 0

Question about downsampling factor

Hi,
Thanks for sharing your work!

I am confused about the downsampling factor for the LLFF dataset. In your config file, the downsampling factor is set to 8. However, in the original NeRF repo, the downsampling factor is set to 4 according to this. I am curious why you do not follow the original setting; this may be unfair for comparison.

proj_ray_dist_threshold

Hello. Thank you for great paper and code.

I have one small question.

The threshold value of the projected ray distance loss is set to 5; is there a reason you chose this value?

Also, is this threshold used even when the camera parameters are initialized with an identity matrix and a zero vector (the self-calibration experiments)?
When the camera parameters are coarse or have bad values, I think the proj_ray_dist loss will be much larger than 5, but wasn't it?
I wonder if this threshold works.

Thank you!

Possible errors on your setup

Hi

First of all, thank you very much for uploading your code.

I think you have a typo in the requirement.txt file; shouldn't it be named "requirements.txt"?

On the other hand, at least for running demo.sh, the soft link should be created to ./data/ and not to data/nerf_llff_data; otherwise an error will be raised.

Thanks,

How to get more detail about the camera pose?

Hello author,

Thank you for sharing your code. I want to recover the corresponding camera pose for images taken in a circle; for example, I take a picture every 10 degrees. After I train the network, I find that the results in logs are some images and some .tar files. Can I get information about the camera pose, like a 4x4 matrix?

Looking forward to your reply.

Implementing SCNeRF on custom dataset

Hi @jeongyw12382 ,

I have a set of images, and I am aware of the FOV and the θ, Φ 3D angles for each image.
Would it be possible for me to train the NeRF model without COLMAP?
Unfortunately, COLMAP doesn't work well on my dataset. I get an error saying:
ERROR: the correct camera poses for current points cannot be accessed

How to run an experiment with only photos?

Hi,

I would like to run an experiment with your model using a list of pictures of an object to get the estimated camera pose of each picture. How can I set up that experiment?

Thanks in advance,

Parameter setting differs from the paper

In scripts/main_table_2/fern/main2_fern_ours.sh, the last line is:

--ft_path logs/main1_fern_nerf/200000.tar

which means main1_fern_nerf is used to initialize the model.
But those 200,000 iterations in the Table 1 NeRF setting are trained with --run_without_colmap both,
while in the paper the Table 2 result is initialized from COLMAP camera information. So the first 200,000 iterations should be trained with --run_without_colmap none instead of --run_without_colmap both.

According to the description above, there is a conflict,
and I think maybe it should be

--ft_path logs/main2_fern_nerf/200000.tar

?

Bug with a custom image set

First, I generated COLMAP poses for a custom image set. I modified demo.sh for my own dataset, but it runs into the following problem:
[error screenshot]

Error trying the script on custom image set

I have two streams taken from cameras aligned vertically that I have no information about. The streams look like the following:

Camera 1:
[video frame]

Camera 2:

[video frame]

where the plant rotates, so it is the only moving object in the scene.

I wanted to give your method a try to obtain the intrinsics of these cameras (so that I can 3D-reconstruct the plant), so I turned the video streams into images (.png) and executed the following on camera 1's images:

bash colmap_utils/colmap.sh ./images/

However, I received an output that I could not make any sense of:

colmap_utils/colmap.sh: line 7: colmap: command not found
colmap_utils/colmap.sh: line 13: colmap: command not found
colmap_utils/colmap.sh: line 19: colmap: command not found
Traceback (most recent call last):
  File "/home/bla/anaconda3/envs/icn/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/bla/anaconda3/envs/icn/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/bla/Desktop/SCNeRF/colmap_utils/read_sparse_model.py", line 417, in <module>
    main()
  File "/home/bla/Desktop/SCNeRF/colmap_utils/read_sparse_model.py", line 369, in main
    cameras, images, points3D = read_model(path=model_path, ext=".bin")
  File "/home/bla/Desktop/SCNeRF/colmap_utils/read_sparse_model.py", line 305, in read_model
    cameras = read_cameras_binary(os.path.join(path, "cameras" + ext))
  File "/home/bla/Desktop/SCNeRF/colmap_utils/read_sparse_model.py", line 120, in read_cameras_binary
    with open(path_to_model_file, "rb") as fid:
FileNotFoundError: [Errno 2] No such file or directory: './images/sparse/0/cameras.bin'
Post-colmap
Traceback (most recent call last):
  File "/home/bla/anaconda3/envs/icn/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/bla/anaconda3/envs/icn/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/bla/Desktop/SCNeRF/colmap_utils/post_colmap.py", line 266, in <module>
    gen_poses(args.working_dir)
  File "/home/bla/Desktop/SCNeRF/colmap_utils/post_colmap.py", line 247, in gen_poses
    poses, pts3d, perm = load_colmap_data(basedir)
  File "/home/bla/Desktop/SCNeRF/colmap_utils/post_colmap.py", line 13, in load_colmap_data
    camdata = read_model.read_cameras_binary(camerasfile)
  File "/home/bla/Desktop/SCNeRF/colmap_utils/read_sparse_model.py", line 120, in read_cameras_binary
    with open(path_to_model_file, "rb") as fid:
FileNotFoundError: [Errno 2] No such file or directory: './images/sparse/0/cameras.bin'

What am I doing wrong?

Question about the equations in the paper

Q1
[equation screenshot]
Is it true that each element of n is divided by c, not by f?

Also, what is the meaning of the p' value?
An undistorted pixel?

Q2
[equation screenshot]
In this equation, I'm not sure why $z_d$ is multiplied twice.

Q3
[equation screenshot]
In this equation, why divide by $r_{A,d} \cdot r_{A,d}$ instead of $\|r_{A,d}\|$?

Q4
[equation screenshot]
In this equation, should the $\partial L$ in the last term be $\partial r$?

About main Table 1

Thank you for sharing your code.
I'm trying to reproduce the results in main Table 1.
I have now fully trained the NeRF results (not the 'ours' results), and all of the values are slightly worse than those in the table.
Following are the test set results, the train set results, and the results in the paper.

Test set (NeRF):

    scene      PSNR        SSIM        LPIPS       PRD
    flower     13.628      0.2909      0.7835      nan
    fortress   15.618      0.4311      0.6794      nan
    leaves     12.734      0.1451      0.7938      nan
    trex       12.419      0.3743      0.6729      nan

Train set (NeRF):

    scene      PSNR        SSIM        LPIPS       PRD
    flower     13.062      0.2887      0.8028      nan
    fortress   13.539      0.3868      0.7249      nan
    leaves     12.38599    0.143       0.819662    nan
    trex       12.58406    0.425573    0.692024    nan

Paper (NeRF):

    scene      PSNR        SSIM        LPIPS       PRD
    flower     13.8        0.302       0.716       nan
    fortress   16.3        0.524       0.445       nan
    leaves     13.01       0.18        0.687       nan
    trex       15.7        0.409       0.575       nan

Can I get a clue?
Also, I wonder which split (train/val/test) is used for the table.

Demo error

Hello,
I really appreciate your great work!

I tried to run demo.sh as specified (I just changed the number of iterations), but it gave me the following error.

Any advice would be appreciated.
[error screenshot]

ERROR Error while calling W&B API: project not found (<Response [404]>)

Loaded SuperPoint model
Loaded SuperGlue model ("outdoor" weights)
wandb: (1) Create a W&B account
wandb: (2) Use an existing W&B account
wandb: (3) Don't visualize my results
wandb: Enter your choice: 2
wandb: You chose 'Use an existing W&B account'
wandb: You can find your API key in your browser here: https://wandb.ai/authorize
wandb: Paste an API key from your profile and hit enter:
wandb: Appending key for api.wandb.ai to your netrc file: /root/.netrc
wandb: ERROR Error while calling W&B API: project not found (<Response [404]>)
Thread SenderThread:
Traceback (most recent call last):
File "/root/anaconda3/envs/icn/lib/python3.8/site-packages/wandb/sdk/lib/retry.py", line 102, in call
result = self._call_fn(*args, **kwargs)
File "/root/anaconda3/envs/icn/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 138, in execute
six.reraise(*sys.exc_info())
File "/root/anaconda3/envs/icn/lib/python3.8/site-packages/six.py", line 719, in reraise
raise value
File "/root/anaconda3/envs/icn/lib/python3.8/site-packages/wandb/sdk/internal/internal_api.py", line 132, in execute
return self.client.execute(*args, **kwargs)
File "/root/anaconda3/envs/icn/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/client.py", line 52, in execute
result = self._get_result(document, *args, **kwargs)
File "/root/anaconda3/envs/icn/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/client.py", line 60, in _get_result
return self.transport.execute(document, *args, **kwargs)
File "/root/anaconda3/envs/icn/lib/python3.8/site-packages/wandb/vendor/gql-0.2.0/gql/transport/requests.py", line 39, in execute
request.raise_for_status()
File "/root/anaconda3/envs/icn/lib/python3.8/site-packages/requests/models.py", line 953, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://api.wandb.ai/graphql

Suspicious data leakage?

Hi, I found that you used SuperPoint and SuperGlue as the feature extractor and matcher. As far as I know, these two algorithms are trained with supervision, so is there any suspicion of data leakage here? This may affect the fairness of your experiments, because COLMAP-based pose information is not data-driven, while your method indirectly references external data unless your experiments are based entirely on SIFT and a brute-force matcher.

Ablation study on the Tanks and Temples dataset

Hi,
thanks for your great work! I have some questions about applying this work to large-scale datasets.
  1. In your paper, you did an ablation study on the LLFF dataset covering IE, OD, and PRD; did you do the same ablation study on the Tanks and Temples dataset?
  2. Is it feasible to apply this work to large-scale datasets without initial poses from COLMAP?
  3. In the BARF: Bundle-Adjusting Neural Radiance Fields paper, they mention that it is hard to optimize NeRF and poses jointly because of the positional encoding. In your experiments, do you think it is necessary to change the positional encoding function as BARF suggests?
@chrischoy @minsucho @soskek @joonahn
Looking forward to your reply!

Reproducing the results (Table 4)

Hello, thanks for your great work.

I am interested in training the FishEyeNeRF dataset with the NeRF++ model.
Specifically, I would like to reproduce the NeRF++ [RD] result presented in Table 4 of the paper using the code you provided.

Would this be possible?

Thanks
