mip-NeRF

This repository contains the code release for Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields. This implementation is written in JAX, and is a fork of Google's JaxNeRF implementation. Contact Jon Barron if you encounter any issues.

Abstract

The rendering procedure used by neural radiance fields (NeRF) samples a scene with a single ray per pixel and may therefore produce renderings that are excessively blurred or aliased when training or testing images observe scene content at different resolutions. The straightforward solution of supersampling by rendering with multiple rays per pixel is impractical for NeRF, because rendering each ray requires querying a multilayer perceptron hundreds of times. Our solution, which we call "mip-NeRF" (à la "mipmap"), extends NeRF to represent the scene at a continuously-valued scale. By efficiently rendering anti-aliased conical frustums instead of rays, mip-NeRF reduces objectionable aliasing artifacts and significantly improves NeRF's ability to represent fine details, while also being 7% faster than NeRF and half the size. Compared to NeRF, mip-NeRF reduces average error rates by 17% on the dataset presented with NeRF and by 60% on a challenging multiscale variant of that dataset that we present. mip-NeRF is also able to match the accuracy of a brute-force supersampled NeRF on our multiscale dataset while being 22x faster.

Installation

We recommend using Anaconda to set up the environment. Run the following commands:

# Clone the repo
git clone https://github.com/google/mipnerf.git; cd mipnerf
# Create a conda environment. Note that Python 3.6-3.8 is required, as
# one of the dependencies (TensorFlow) doesn't support Python 3.9 yet.
conda create --name mipnerf python=3.6.13; conda activate mipnerf
# Prepare pip
conda install pip; pip install --upgrade pip
# Install requirements
pip install -r requirements.txt

[Optional] Install GPU and TPU support for Jax

# Remember to change cuda101 to your CUDA version, e.g. cuda110 for CUDA 11.0.
pip install --upgrade jax jaxlib==0.1.65+cuda101 -f https://storage.googleapis.com/jax-releases/jax_releases.html

Data

Then you'll need to download the datasets from the official NeRF Google Drive. Please download and unzip nerf_synthetic.zip and nerf_llff_data.zip.

Generate multiscale dataset

You can generate the multiscale dataset used in the paper by running the following command:

python scripts/convert_blender_data.py --blenderdir /nerf_synthetic --outdir /multiscale

Running

Example scripts for training mip-NeRF on individual scenes from the three datasets used in the paper can be found in scripts/; you'll need to change the paths to point to wherever the datasets are located. Gin configuration files for our model and some ablations can be found in configs/. An example script for evaluating on the test set of each scene is also in scripts/; afterwards you can use scripts/summarize.ipynb to produce error metrics across all scenes in the same format as the tables in the paper.
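
For reference, scripts/train_blender.sh boils down to an invocation of the following form (the same command visible in the segfault report among the issues below), with $DATA_DIR and $TRAIN_DIR set inside the script:

python -m train --data_dir=$DATA_DIR --train_dir=$TRAIN_DIR --gin_file=configs/blender.gin --logtostderr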

OOM errors

You may need to reduce the batch size to avoid out-of-memory errors. For example, the model can be run on an NVIDIA RTX 3080 (10 GB) using the following flag.

--gin_param="Config.batch_size = 1024"

Citation

If you use this software package, please cite our paper:

@misc{barron2021mipnerf,
      title={Mip-NeRF: A Multiscale Representation for Anti-Aliasing Neural Radiance Fields},
      author={Jonathan T. Barron and Ben Mildenhall and Matthew Tancik and Peter Hedman and Ricardo Martin-Brualla and Pratul P. Srinivasan},
      year={2021},
      eprint={2103.13415},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgements

Thanks to Boyang Deng for JaxNeRF.


mipnerf's Issues

Need some suggestions for reimplementation in pytorch

Hi, thanks for this remarkable work.

I'm trying to reimplement mip-NeRF in PyTorch on LLFF. Currently, I've finished a pipeline identical to this repo, but training converges to a lower PSNR on the same validation images, e.g. 22.5 in my implementation vs 24.9 in this repo after 200k iterations. Is this possibly caused by the nature of the optimizers in PyTorch? Or maybe I missed some points in the mip-NeRF implementation? Could you please provide some suggestions?

Thank you!

Why positional encoding for viewing direction?

Positional encoding (PE) or IPE is required to allow NeRF/mip-NeRF to learn high frequencies. I understand that volume density and color as a function of the 3D point can have discontinuities, especially at object boundaries. However, for a given 3D position, the color as a function of viewing direction is a smooth function, right? I can't imagine a scenario where it would have high frequencies. So why do we need PE for the viewing direction?

@jonbarron, I've pondered this question for quite some time now, but haven't been able to figure it out. Any intuition would be helpful.

origins and directions issues

Hi, thanks for your exciting work. I have one question.
In the code, the initialization values of directions are (x, y, 1), so I think the values of origins should be (x, y, 0). So the values of directions are Xdirections = R * Xc + t, and the values of origins are Xdirections - R[0:3, 2].

Fails at high resolution on LLFF dataset

Hi, I appreciate your excellent work. I found that when training mip-NeRF on an LLFF scene (fern, specifically), it reconstructs well at the original resolution (252x189) and below.

However, when I try to render higher-resolution images (e.g. 512x378), they contain a lot of noise that vanilla NeRF doesn't produce. What might be the possible reason?

Extracting isosurfaces of Mip-NeRF

Hello!

Thanks for sharing this awesome work! :)

I'm curious, have you thought about the correct way of extracting the isosurfaces from the trained implicit function?

For the vanilla NeRF model it is possible to extract the level surface at two scales using the separately trained coarse and fine networks. Here, as far as I understand, it is possible to extract the level surface at an arbitrary scale, and for that I could just query the network with a positional encoding obtained for a desired point x and some manually selected variance, which determines the scale.

Does this approach make sense to you, or are there some reasons why it could fail?

IPE explained

Hello.
First of all, thank you for sharing this great work.

I would like to kindly ask you to elaborate on how you derived the formulas for IPE.

Why do you concatenate it this way?

How did you get the formula for the y_var?

I cannot quite get it from the article.
Thank you :)
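
For other readers with the same question (this is an interpretation, not an official answer): for z ~ N(mu, sigma^2), E[sin(z)] = sin(mu) * exp(-sigma^2 / 2), and sin(z + pi/2) = cos(z), so concatenating [y, y + pi/2] and applying a single sin yields both the sine and cosine halves of the encoding in one call. y_var is the diagonal of the lifted covariance; scaling a coordinate by 2^l scales its variance by 4^l. A minimal NumPy sketch of a diagonal IPE built from those two facts (the function name and defaults are mine, not the repo's API):

import numpy as np

def ipe_diag(mean, var, min_deg=0, max_deg=16):
    # mean, var: [..., 3] per-coordinate Gaussian mean and variance.
    scales = 2.0 ** np.arange(min_deg, max_deg)                  # 2^l for each frequency l
    y = (mean[..., None, :] * scales[:, None]).reshape(*mean.shape[:-1], -1)
    y_var = (var[..., None, :] * scales[:, None] ** 2).reshape(*var.shape[:-1], -1)
    # E[sin(z)] = sin(mu) * exp(-var / 2); sin(z + pi/2) = cos(z) supplies the cosine half.
    return np.sin(np.concatenate([y, y + 0.5 * np.pi], axis=-1)) * \
        np.exp(-0.5 * np.concatenate([y_var, y_var], axis=-1))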

Question about RGB Activation Padding

Hi, this is super interesting work that seems to solve a key problem with NeRF.
One thing I noticed when reading through your paper was that you used a modified RGB activation function.
I tried using this padded sigmoid with normal NeRF, and I noticed that it tends to cause background pixels to have non-zero density, because they can saturate all the way to black or white. Did you encounter the same thing? I was looking at the acc image returned from volumetric integration of the weights. I'm not sure if it's significant, but I was wondering whether you compared the normal sigmoid to the widened one.

I was also wondering whether you experimented with shrinking the range of the sigmoid. I tried lowering it, and that seems to produce much cleaner accs, at the cost of a (negligibly) reduced RGB range.

Thanks!

Keras version matching TensorFlow

Hi, when trying to reproduce the results, I installed CUDA 10.1, Python 3.6, TensorFlow 2.3.1, and Keras 2.7.0.

Then I hit the following error when running import tensorflow and from keras import optimizers:

"another metric with the same name already exists".

I would like to know which versions you used during your experiments?

Thanks in advance!

Use of sensor size in focal length calculation

Hi,

Thanks for making the mip-NeRF codebase public. For the focal length calculation, why isn't the actual sensor size used here (e.g. 36mm for the Blender dataset)? I believe the formula for calculating focal length from FOV uses the true sensor size.

self.focal = .5 * self.w / np.tan(.5 * camera_angle_x)
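
For context: this formula returns the focal length in pixel units, so the physical sensor size is already folded into the image width. For a pinhole camera with horizontal field of view camera_angle_x, $\tan(\text{camera\_angle\_x}/2) = (W/2) / f_\text{pixels}$, which rearranges to exactly the line above. The sensor width only matters if you want the focal length in physical units, e.g. $f_\text{mm} = f_\text{pixels} \cdot \text{sensor\_width\_mm} / W$.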

How is the variance in 3D space calculated?

Hi, this is a very interesting work that solves a key problem in NeRF. I don’t quite understand your code for calculating variance. Can you explain why it is calculated like this?

d_outer = d[..., :, None] * d[..., None, :]
eye = jnp.eye(d.shape[-1])
null_outer = eye - d[..., :, None] * (d / d_mag_sq)[..., None, :]
t_cov = t_var[..., None, None] * d_outer[..., None, :, :]
xy_cov = r_var[..., None, None] * null_outer[..., None, :, :]
cov = t_cov + xy_cov
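
Reading the snippet (an interpretation, not an official explanation): the variable names suggest it assembles the frustum's Gaussian covariance as a component along the ray direction d plus a component in the plane orthogonal to d, with t_var = $\sigma_t^2$ the variance of the samples along the ray and r_var = $\sigma_r^2$ the variance perpendicular to it:

$\Sigma = \sigma_t^2 \, d d^\top + \sigma_r^2 \left( I - \frac{d d^\top}{\lVert d \rVert^2} \right)$

Here d_outer is $d d^\top$, null_outer is the projector onto the plane orthogonal to d, and the broadcasted sum produces one 3x3 covariance per sampled interval along each ray.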

Is it safe to use unnormalized ray directions while sampling points?

Hi,

While sampling points along rays, the code uses rays.directions for the direction vectors instead of rays.viewdirs.

t_vals, samples = mip.sample_along_rays(
    key,
    rays.origins,
    rays.directions,
    rays.radii,
    self.num_samples,
    rays.near,
    rays.far,
    randomized,
    self.lindisp,
    self.ray_shape,
)

Original NeRF uses normalized direction vectors for the sampling points. Can you clarify if we need to replace rays.directions with rays.viewdirs?

Is dx (or the base_radius of the cone) the same for rays of different pixels in a picture?

# Distance from each unit-norm direction vector to its x-axis neighbor.
dx = [
    np.sqrt(np.sum((v[:-1, :, :] - v[1:, :, :])**2, -1)) for v in directions
]
dx = [np.concatenate([v, v[-2:-1, :]], 0) for v in dx]

# Cut the distance in half, and then round it out so that it's
# halfway between inscribed by / circumscribed about the pixel.
radii = [v[..., None] * 2 / np.sqrt(12) for v in dx]

average error metrics in paper

Hi, thanks for your exciting work. I have two questions about the avg metric.

First, I'm confused about the meaning of the avg metric: what advantages does it offer?

Second, I find that when you compute MSE from PSNR, the implementation in the code is different from that in the paper, so I'm a bit confused. Can you help me?

Question about the radius setting.

Hi, a wonderful work!

I am wondering why the radius of the cone is set to r=2/sqrt(12)*pixel_size.

I know that this setting is to ensure that the variance of the cone matches that of the pixel in world coordinate space. I'm just curious about the derivation. Could you please give me some hint on how to get to this result?

Thanks,
Yu
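
One derivation that lands exactly on 2/sqrt(12), consistent with the variance-matching intuition above (my reading, not an official statement): a square pixel of width $w$, sampled uniformly, has $\mathrm{Var}(x) = w^2 / 12$, while a uniform disc of radius $r$ has $\mathrm{Var}(x) = r^2 / 4$. Matching the two gives

$\frac{r^2}{4} = \frac{w^2}{12} \;\Rightarrow\; r = \frac{2}{\sqrt{12}} \, w = \frac{w}{\sqrt{3}},$

and in the code $w$ is dx, the spacing between neighboring pixel centers. The resulting radius also lies between the pixel's inscribed radius $w/2$ and circumscribed radius $w/\sqrt{2}$, which is what the "halfway between inscribed by / circumscribed about" comment alludes to.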

about shiny_datasets

Sorry to bother you, but after I downloaded shiny.zip, unzipping it failed. It seems the zip file is corrupted?

Use normalized direction vector or not?

As the comment says:

# Distance from each unit-norm direction vector to its x-axis neighbor.

But you use directions (unnormalized) rather than viewdirs (unit-norm):

directions = ((camera_dirs[None, ..., None, :] *
               self.camtoworlds[:, None, None, :3, :3]).sum(axis=-1))
origins = np.broadcast_to(self.camtoworlds[:, None, None, :3, -1],
                          directions.shape)
viewdirs = directions / np.linalg.norm(directions, axis=-1, keepdims=True)

# Distance from each unit-norm direction vector to its x-axis neighbor.
dx = np.sqrt(
    np.sum((directions[:, :-1, :, :] - directions[:, 1:, :, :])**2, -1))

This is different, right?
dx = np.sqrt(
    np.sum((directions[:, :-1, :, :] - directions[:, 1:, :, :])**2, -1))
dx = np.concatenate([dx, dx[:, -2:-1, :]], 1)

# Cut the distance in half, and then round it out so that it's
# halfway between inscribed by / circumscribed about the pixel.
radii = dx[..., None] * 2 / np.sqrt(12)

If you use directions (unnormalized), dx (and radii) will be the same for every pixel ray in the image.
If you use viewdirs (unit-norm), dx (and radii) will be smaller for pixel rays away from the image center and larger for pixel rays near the image center.
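
A quick numerical check of the claim above, assuming (my assumption, purely for illustration) pinhole-style camera_dirs of the form ((i - cx) / f, -(j - cy) / f, -1): x-neighboring unnormalized directions then differ by exactly (1/f, 0, 0) in camera coordinates, and the rotation by camtoworlds preserves that norm, so dx = 1/f for every pixel, whereas normalized viewdirs give a dx that varies across the image:

import numpy as np

H, W, f = 4, 6, 100.0
i, j = np.meshgrid(np.arange(W), np.arange(H), indexing='xy')
# Hypothetical pinhole directions (not necessarily the repo's exact convention).
dirs = np.stack([(i - W / 2 + 0.5) / f,
                 -(j - H / 2 + 0.5) / f,
                 -np.ones_like(i, dtype=np.float64)], -1)
dx_unnorm = np.linalg.norm(dirs[:, 1:] - dirs[:, :-1], axis=-1)
print(np.allclose(dx_unnorm, 1.0 / f))    # True: constant for every pixel
viewdirs = dirs / np.linalg.norm(dirs, axis=-1, keepdims=True)
dx_norm = np.linalg.norm(viewdirs[:, 1:] - viewdirs[:, :-1], axis=-1)
print(dx_norm.std() > 0)                  # True: varies across the image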

scripts/train_blender.sh: line 31: 11426 Segmentation fault (core dumped)

Hi,

I ran the command

bash scripts/train_blender.sh

and the terminal indicated the following error:

scripts/train_blender.sh: line 31: 11426 Segmentation fault (core dumped) python -m train --data_dir=$DATA_DIR --train_dir=$TRAIN_DIR --gin_file=configs/blender.gin --logtostderr

Could you tell me how to address it? Thanks

Are the signs inverted on rgb padding?

After the sigmoid activation I noticed that you are doing this

rgb = rgb * (1 + 2 * self.rgb_padding) - self.rgb_padding

where rgb_padding = 0.001 by default.

Is this intentional or did you mean to move the range to (rgb_padding, 1 - rgb_padding)?
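
For what it's worth, the arithmetic of that line: with $s = \mathrm{sigmoid}(\cdot) \in (0, 1)$ and $p$ = rgb_padding, $s(1 + 2p) - p$ spans $(-p, 1 + p)$. In other words, the output range is widened by $p$ on each side, presumably so the network can produce exactly 0 or 1 without driving the sigmoid into saturation. Mapping into $(p, 1 - p)$ would instead be $s(1 - 2p) + p$.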

small typo in instructions

python scripts/convert_blender_data.py --blenderdir /nerf_synthetic --outdir /multiscale

should be:

python scripts/convert_blender_data.py --blenderdir ./nerf_synthetic --outdir ./multiscale

How to get equation (5)

My understanding of the first condition is: the projection of the vector from o to x onto the direction d should lie between $t_0$ and $t_1$.

But I don't understand how to get the second condition. Can anyone help me?
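
One way to read the second condition (the equation below is my reconstruction of Eq. (5), so double-check it against the paper):

$\frac{d^\top (x - o)}{\lVert d \rVert \, \lVert x - o \rVert} > \frac{1}{\sqrt{1 + (\dot r / \lVert d \rVert)^2}}$

The left-hand side is the cosine of the angle between $x - o$ and the axis $d$. The right-hand side is $\cos\theta$ with $\tan\theta = \dot r / \lVert d \rVert$, the half-angle of a cone whose radius is $\dot r$ at the image plane $o + d$ (a distance $\lVert d \rVert$ from the apex $o$). So the first condition bounds where $x$ projects along the axis, and the second requires $x$ to lie within the cone's opening angle.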

Tiny typo in supplemental material

Hi! Thanks for sharing the great work!

I was just going through the paper and might have found a tiny typo, so I am posting an issue in case you did not already notice. In the 1st section of the supplemental material, I guess it should be (sinθ)^2 here instead of sinθ.


A question about the code

Hello, thank you very much for your high-quality work. At present, I have some questions that I would like your help with.

When I tried to reproduce your code, I found that the environment in requirements.txt was difficult to work with. There were always some dependency issues.

Do you have an updated set of dependencies installed? Looking forward to your reply, thanks!

A confusion about the order of sin and cos in the IPE part

jnp.concatenate([y, y + 0.5 * jnp.pi], axis=-1),

Hello author! I recently tried to reproduce mip-NeRF myself, and ran into a question about the IPE part.

In NeRF it is coded in the order [(sin x, cos x), ...], while mip-NeRF seems to put the sin-related features together, followed by the cos-related ones, [(sin x, ...), (cos x, ...)], as the following equation shows:


So I'm curious: have you tried coding it in the same order as in NeRF, or does the current layout work better? Thank you so much! I just noticed this while writing the code, so I wanted to ask for some advice.
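
A quick check of the identity behind that line, and of the fact that the two layouts differ only by a fixed permutation (which the first MLP layer can absorb by permuting its weight columns):

import numpy as np

y = np.linspace(-3.0, 3.0, 6)
block = np.sin(np.concatenate([y, y + 0.5 * np.pi]))           # [sin(y), cos(y)]
interleaved = np.stack([np.sin(y), np.cos(y)], -1).ravel()     # [sin y0, cos y0, sin y1, ...]
print(np.allclose(block[:6], np.sin(y)), np.allclose(block[6:], np.cos(y)))  # True True
print(np.allclose(np.sort(block), np.sort(interleaved)))       # True: same values, different order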

Question about mipnerf360

Hello! Sorry if this is the wrong place to post this question.
In mip-NeRF, during inference, LLFF scenes have their near and far distances set to 0 and 1 through the use of NDC.
However, in mip-NeRF 360, during training, NDC is not used, so I assume the near and far distances are taken directly from the COLMAP calculations.
If so, how do you set these values during inference? Has it got something to do with the contract(.) operator?
Or perhaps I am approaching this wrongly?

Use of internal packages

Hi,
I think you are using internal packages in scripts/summarize.ipynb.

For example:
from google3.pyglib import gfile
with gfile.Open(filename) as f:

Radii computation with and without NDC

The radii computation code is different for non-NDC and NDC spaces. In particular, without NDC, the radii computation uses only dx derived from directions, but when NDC is enabled, (dx + dy) / 2 is used, which is derived from ndc_origins. Can you please shed some light on why this is done?

Reconstructing using colmap poses

If we estimate the poses of the lego dataset using colmap, will it work? I tried doing the same, and tried training it using train_llff.sh but it showed inconsistent results.

Question about CPU OOM.

I use the following command to train on multiscale datasets, but get "killed" output.

bash ./scripts/train_multiblender.sh

I have generated the multiscale datasets and set the correct path in ./scripts/train_multiblender.sh. It works fine with ./scripts/train_blender.sh on the original datasets.

My computer has 16 GB RAM and 4 GB swap, and I'd like to know the minimum requirements.

Thanks.

Reducing batch_size doesn't reduce GPU memory

Hi, I am using an RTX 3080 for training, and it crashes every 5000 iterations when executing this code:
vis_suite = vis.visualize_suite(pred_distance, pred_acc)
And here is the error message:

Traceback (most recent call last):
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/data/feihu/mipnerf-main/train.py", line 321, in <module>
    app.run(main)
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/data/feihu/mipnerf-main/train.py", line 295, in main
    vis_suite = vis.visualize_suite(pred_distance, pred_acc)
  File "/data/feihu/mipnerf-main/internal/vis.py", line 140, in visualize_suite
    'depth_normals': visualize_normals(depth, acc)
  File "/data/feihu/mipnerf-main/internal/vis.py", line 125, in visualize_normals
    normals = depth_to_normals(scaled_depth)
  File "/data/feihu/mipnerf-main/internal/vis.py", line 38, in depth_to_normals
    dy = convolve2d(depth, f_blur[None, :] * f_edge[:, None])
  File "/data/feihu/mipnerf-main/internal/vis.py", line 30, in convolve2d
    return jsp.signal.convolve2d(
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/_src/scipy/signal.py", line 85, in convolve2d
    return _convolve_nd(in1, in2, mode, precision=precision)
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/_src/scipy/signal.py", line 65, in _convolve_nd
    result = lax.conv_general_dilated(in1[None, None], in2[None, None], strides,
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/_src/lax/convolution.py", line 147, in conv_general_dilated
    return conv_general_dilated_p.bind(
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/core.py", line 323, in bind
    return self.bind_with_trace(find_top_trace(args), args, params)
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/core.py", line 326, in bind_with_trace
    out = trace.process_primitive(self, map(trace.full_raise, args), params)
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/core.py", line 675, in process_primitive
    return primitive.impl(*tracers, **params)
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/_src/dispatch.py", line 98, in apply_primitive
    compiled_fun = xla_primitive_callable(prim, *unsafe_map(arg_spec, args),
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/_src/util.py", line 219, in wrapper
    return cached(config._trace_context(), *args, **kwargs)
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/_src/util.py", line 212, in cached
    return f(*args, **kwargs)
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/_src/dispatch.py", line 148, in xla_primitive_callable
    compiled = _xla_callable_uncached(lu.wrap_init(prim_fun), device, None,
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/_src/dispatch.py", line 230, in _xla_callable_uncached
    return lower_xla_callable(fun, device, backend, name, donated_invars, False,
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/_src/dispatch.py", line 704, in compile
    self._executable = XlaCompiledComputation.from_xla_computation(
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/_src/dispatch.py", line 806, in from_xla_computation
    compiled = compile_or_get_cached(backend, xla_computation, options)
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/_src/dispatch.py", line 768, in compile_or_get_cached
    return backend_compile(backend, computation, compile_options)
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/_src/profiler.py", line 206, in wrapper
    return func(*args, **kwargs)
  File "/home/feihu/.conda/envs/metanerf/lib/python3.9/site-packages/jax/_src/dispatch.py", line 713, in backend_compile
    return backend.compile(built_c, compile_options=options)
jaxlib.xla_extension.XlaRuntimeError: UNKNOWN: Failed to determine best cudnn convolution algorithm for:
%cudnn-conv = (f32[1,1,800,800]{3,2,1,0}, u8[0]{0}) custom-call(f32[1,1,800,800]{3,2,1,0} %Arg_0.1, f32[1,1,3,3]{3,2,1,0} %Arg_1.2), window={size=3x3 pad=1_1x1_1}, dim_labels=bf01_oi01->bf01, custom_call_target="__cudnn$convForward", metadata={op_name="jit(conv_general_dilated)/jit(main)/conv_general_dilated[window_strides=(1, 1) padding=((1, 1), (1, 1)) lhs_dilation=(1, 1) rhs_dilation=(1, 1) dimension_numbers=ConvDimensionNumbers(lhs_spec=(0, 1, 2, 3), rhs_spec=(0, 1, 2, 3), out_spec=(0, 1, 2, 3)) feature_group_count=1 batch_group_count=1 lhs_shape=(1, 1, 800, 800) rhs_shape=(1, 1, 3, 3) precision=(<Precision.HIGHEST: 2>, <Precision.HIGHEST: 2>) preferred_element_type=None]" source_file="/data/feihu/mipnerf-main/internal/vis.py" source_line=30}, backend_config="{\"conv_result_scale\":1,\"activation_mode\":\"0\",\"side_input_scale\":0}"

Original error: UNIMPLEMENTED: DNN library is not found.

To ignore this failure and try to use a fallback algorithm (which may have suboptimal performance), use XLA_FLAGS=--xla_gpu_strict_conv_algorithm_picker=false.  Please also file a bug for the root cause of failing autotuning.

I have found that JAX can show this message when it runs out of memory, so I changed my batch_size from 1024 to 512, but training still uses 10 GB. How can I reduce GPU memory usage?

How to understand "The variance of the conical frustum with respect to its radius r is equal to the variance of the frustum with respect to x or (by symmetry) y. "

How to understand "The variance of the conical frustum with respect to its radius r is equal to the variance of the frustum with respect to x or (by symmetry) y. "in Supplemental Material?
I think:

Var(r) = E(r^2) - E(r)^2, where -R < r < R, so E(r) = 0.
E(r^2) = E(x^2 + y^2) = E(x^2) + E(y^2) = E(x^2) - E(x)^2 + E(y^2) - E(y)^2 = Var(x) + Var(y), where E(x) = 0 and E(y) = 0.
So Var(r) = Var(x) + Var(y), rather than Var(r) = Var(x).

Where is my mistake?
