google / nerfactor



Neural Factorization of Shape and Reflectance Under an Unknown Illumination

Home Page: https://xiuming.info/projects/nerfactor/

License: Apache License 2.0

Python 96.91% Shell 3.09%

relighting view-synthesis shape reflectance illumination neural-rendering nerf

nerfactor's Issues

When I train vanilla nerf, there are countless threads.

Hi, when I train vanilla NeRF, countless threads are created and the server almost freezes.
I found that this is caused by the following line in https://github.com/google/nerfactor/blob/main/nerfactor/networks/embedder.py:
freq_bands = 2. ** tf.linspace(0., log2_max_freq, n_freqs)
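A possible workaround, untested against this repo: the bands depend only on two hyperparameters, so they can be computed once on the host with NumPy and passed in as a constant instead of running `tf.linspace` inside the embedder. If the thread count itself is the problem, TensorFlow's pools can also be capped with `tf.config.threading.set_intra_op_parallelism_threads` and `tf.config.threading.set_inter_op_parallelism_threads` before any op runs. The NumPy sketch below shows NeRF-style positional encoding with precomputed bands; `make_freq_bands` and `positional_encoding` are hypothetical helpers, and the [x, sin, cos] ordering follows the original NeRF embedder rather than anything confirmed in embedder.py.

```python
import numpy as np

def make_freq_bands(log2_max_freq, n_freqs):
    # Same values as 2. ** tf.linspace(0., log2_max_freq, n_freqs),
    # but computed once on the host as a plain NumPy array.
    return 2.0 ** np.linspace(0.0, log2_max_freq, n_freqs)

def positional_encoding(x, freq_bands, include_input=True):
    """NeRF-style encoding: [x, sin(f*x), cos(f*x) for each band f].

    x: (..., D) array. Output: (..., D * (include_input + 2 * len(freq_bands))).
    """
    feats = [x] if include_input else []
    for f in freq_bands:
        feats.append(np.sin(f * x))
        feats.append(np.cos(f * x))
    return np.concatenate(feats, axis=-1)

freq_bands = make_freq_bands(log2_max_freq=9, n_freqs=10)  # 1, 2, 4, ..., 512
pts = np.random.rand(1024, 3)
enc = positional_encoding(pts, freq_bands)
print(enc.shape)  # (1024, 63)
```

Because `freq_bands` is an ordinary array, it can be wrapped once with `tf.constant` and reused for every call.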

about spherify_poses

import numpy as np


def normalize(x):
    """Return x scaled to unit length."""
    return x / np.linalg.norm(x)


def spherify_poses(poses):
    """poses: Nx3x5 (final column contains H, W, and focal length)."""
    rays_d = poses[:, :3, 2:3]
    rays_o = poses[:, :3, 3:4]  # because pose is camera-to-world

    def p34_to_44(p):
        """p: Nx3x4."""
        return np.concatenate((
            p,
            np.tile(
                np.reshape(np.eye(4)[-1, :], (1, 1, 4)),
                (p.shape[0], 1, 1)),
        ), 1)

    def min_line_dist(rays_o, rays_d):
        # Least-squares point closest to all camera optical axes.
        a_i = np.eye(3) - rays_d * np.transpose(rays_d, [0, 2, 1])
        b_i = -a_i @ rays_o
        pt_mindist = np.squeeze(-np.linalg.inv(
            (np.transpose(a_i, [0, 2, 1]) @ a_i).mean(0)) @ (b_i).mean(0))
        return pt_mindist

    pt_mindist = min_line_dist(rays_o, rays_d)
    center = pt_mindist
    up = (poses[:, :3, 3] - center).mean(0)
    vec0 = normalize(up)
    vec1 = normalize(np.cross([.1, .2, .3], vec0))
    vec2 = normalize(np.cross(vec0, vec1))
    pos = center
    c2w = np.stack([vec1, vec2, vec0, pos], 1)
    poses_reset = (
        np.linalg.inv(p34_to_44(c2w[None])) @ p34_to_44(poses[:, :3, :4]))
    rad = np.sqrt(np.mean(np.sum(np.square(poses_reset[:, :3, 3]), -1)))
    sc = 1. / rad
    poses_reset[:, :3, 3] *= sc
    rad *= sc
    centroid = np.mean(poses_reset[:, :3, 3], 0)
    zh = centroid[2]
    radcircle = np.sqrt(rad ** 2 - zh ** 2)

    new_poses = []
    for th in np.linspace(0., 2. * np.pi, 120):
        camorigin = np.array([
            radcircle * np.cos(th), radcircle * np.sin(th), zh])
        up = np.array([0, 0, -1.])
        vec2 = normalize(camorigin)
        vec0 = normalize(np.cross(vec2, up))
        vec1 = normalize(np.cross(vec2, vec0))
        pos = camorigin
        p = np.stack([vec0, vec1, vec2, pos], 1)
        new_poses.append(p)
    new_poses = np.stack(new_poses, 0)
    new_poses = np.concatenate([
        new_poses,
        np.broadcast_to(poses[0, :3, -1:], new_poses[:, :3, -1:].shape)
    ], -1)
    poses_reset = np.concatenate([
        poses_reset[:, :3, :4],
        np.broadcast_to(poses[0, :3, -1:], poses_reset[:, :3, -1:].shape)
    ], -1)
    return poses_reset, new_poses

Can you explain a little bit about min_line_dist in this function?
I think this problem is about "Find the point minimizing the distance from a set of N lines".

I don't think this is an easy problem, yet min_line_dist seems to solve it quite simply. Could you explain a little?
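One way to read `min_line_dist` is as the closed-form linear least-squares solution. For a line through o_i with unit direction d_i, P_i = I - d_i d_i^T projects onto the plane orthogonal to the line, so the distance from a point p to that line is ||P_i (p - o_i)||. Minimizing sum_i ||P_i (p - o_i)||^2 and setting the gradient to zero gives (sum_i P_i^T P_i) p = sum_i P_i^T P_i o_i; since each P_i is symmetric and idempotent, P_i^T P_i = P_i, and this reduces to (sum_i P_i) p = sum_i P_i o_i. The function solves the same system with means instead of sums (the 1/N cancels), where `a_i` plays the role of P_i and `b_i = -P_i o_i`. It implicitly assumes unit-length directions and that the lines are not all parallel, otherwise the matrix is singular. The standalone NumPy sketch below (`closest_point_to_lines` is a hypothetical name) solves that system directly and checks it on lines that meet at a known point.

```python
import numpy as np

def closest_point_to_lines(origins, directions):
    """Point minimizing the summed squared distance to N 3D lines.

    origins, directions: (N, 3) arrays. Directions are normalized here, since
    P_i = I - d_i d_i^T is only a projection when d_i has unit length.
    """
    d = directions / np.linalg.norm(directions, axis=1, keepdims=True)
    proj = np.eye(3)[None] - d[:, :, None] * d[:, None, :]  # (N, 3, 3), P_i
    lhs = proj.sum(axis=0)                                  # sum_i P_i
    rhs = np.einsum('nij,nj->i', proj, origins)             # sum_i P_i o_i
    return np.linalg.solve(lhs, rhs)  # raises LinAlgError if all lines are parallel

# Three lines that all pass through (1, 2, 3), each with an origin offset along it.
target = np.array([1.0, 2.0, 3.0])
dirs = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [1.0, 1.0, 1.0]])
origs = target - 5.0 * dirs
p = closest_point_to_lines(origs, dirs)  # recovers (1, 2, 3) up to rounding
```

With noisy, non-intersecting camera axes, as in real captures, the same solve returns the point that is jointly closest to all of them, which is what spherify_poses uses as the scene center.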

It is slow to render my own synthetic data, can we use gpu to render?

Thank you for open-sourcing your code; it is great work. I have a question about its usage. I used your code to render my own dataset, but rendering is relatively slow, about 2 minutes per image. Do you have any suggestions on how to accelerate rendering using a GPU?

Can we extract mesh from the system by marching cubes?

Thanks for releasing the code! Can we extract a mesh with marching cubes? Is this implemented? If not, how can we access NeRF's density field in the model? From a quick look at the code, the system seems to read xyz, normals, etc. directly from the input, so it is not straightforward to get density values.
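A generic route, assuming you can wrap the trained NeRF MLP in a function that maps an (M, 3) array of points to (M,) densities (called `density_fn` below, a hypothetical name, not part of the repo): sample the field on a regular grid over the scene bounding box, in chunks to bound memory, then run marching cubes on the grid, for example with `skimage.measure.marching_cubes(grid, level=...)` from scikit-image. The NumPy sketch below covers the grid-sampling part only. The bounding-box order (xmin, xmax, ymin, ymax, zmin, zmax) is assumed to match the `--scene_bbox` flag of the geometry script, and the density threshold passed as `level` has to be tuned per scene.

```python
import numpy as np

def sample_density_grid(density_fn, bbox, res=128, chunk=65536):
    """Evaluate density_fn on a res^3 grid inside bbox.

    density_fn: hypothetical callable mapping (M, 3) points to (M,) densities.
    bbox: (xmin, xmax, ymin, ymax, zmin, zmax), assumed layout.
    Returns a (res, res, res) array indexed [ix, iy, iz].
    """
    xmin, xmax, ymin, ymax, zmin, zmax = bbox
    xs = np.linspace(xmin, xmax, res)
    ys = np.linspace(ymin, ymax, res)
    zs = np.linspace(zmin, zmax, res)
    grid = np.stack(np.meshgrid(xs, ys, zs, indexing='ij'), axis=-1)
    pts = grid.reshape(-1, 3)
    out = np.empty(len(pts), dtype=np.float32)
    for start in range(0, len(pts), chunk):
        # Evaluate in chunks so the network never sees more than `chunk` points.
        out[start:start + chunk] = density_fn(pts[start:start + chunk])
    return out.reshape(res, res, res)

# Toy stand-in for a trained NeRF: high density inside a sphere of radius 0.2.
def toy_density(pts):
    return np.where(np.linalg.norm(pts, axis=-1) < 0.2, 50.0, 0.0)

sigma = sample_density_grid(toy_density, (-0.3, 0.3, -0.3, 0.3, -0.3, 0.3), res=64)
# verts, faces, normals, values = skimage.measure.marching_cubes(sigma, level=25.0)
```

Marching cubes returns vertices in voxel-index units; passing `spacing` (the grid step per axis) and adding the bbox minimum maps them back to scene coordinates.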

NerFactor generated images from the quantitative evaluations

Congrats on a nice paper!

Would it be possible to access the NeRFactor output images for the view synthesis comparison (column III, row "NeRFactor") in Table 1 (the eight validation images for each of the four scenes)? I would like to compute other metrics and check the metrics and visual quality per image and per scene.

About the intrinsic matrix in data

Thanks for the great work and data!

I am converting the camera pose data to another format, but neither metadata.json nor *_camera.json contains intrinsic data (focal length and cx, cy).

I also tried the focal length from the .blend file, but neither approach worked for me; maybe this is because the .blend file uses a different extrinsic matrix convention.

So can you tell me your intrinsic values, or how to obtain them? Thanks.
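If the scenes were rendered in Blender the same way as the NeRF synthetic data, a pinhole intrinsic matrix can be recovered from the horizontal field of view: focal = 0.5 * W / tan(0.5 * fov_x), with the principal point at the image center. NeRF's `transforms_*.json` stores that angle as `camera_angle_x`; whether NeRFactor's metadata carries an equivalent field is an assumption you would need to check. The extrinsic mismatch may also be a convention issue: NeRF/Blender camera-to-world matrices use the OpenGL camera convention (y up, looking down -z), whereas OpenCV/COLMAP-style tools expect y down and +z forward. The helper names below are hypothetical.

```python
import numpy as np

def intrinsics_from_fov(width, height, fov_x):
    """Pinhole K from a horizontal field of view in radians.

    Assumes square pixels (fy == fx) and a principal point at the image center.
    """
    focal = 0.5 * width / np.tan(0.5 * fov_x)
    return np.array([
        [focal, 0.0, 0.5 * width],
        [0.0, focal, 0.5 * height],
        [0.0, 0.0, 1.0],
    ])

def opengl_c2w_to_opencv(c2w):
    """Convert a 4x4 camera-to-world matrix from the OpenGL/Blender camera
    convention (x right, y up, looking down -z) to the OpenCV one
    (x right, y down, looking down +z) by negating the camera's y and z axes."""
    return c2w @ np.diag([1.0, -1.0, -1.0, 1.0])

# Example value: camera_angle_x as stored by the NeRF synthetic scenes.
K = intrinsics_from_fov(800, 800, 0.6911112070083618)  # focal is roughly 1111.1
```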

Rendering results are all white after training the vanilla NeRF in step1

Hi @xiumingzhang, thank you for your great work.
I just tested the code on the Shiny Blender dataset from the Ref-NeRF paper, but got weird rendering results (all white) while training the vanilla NeRF.
Shiny Blender teapot:

Here is the dataset in NeRFactor format.

NeRF Synthetic ficus:

I use the same settings (near, far, learning rate...) as NeRF synthetic dataset in your README.
I am wondering whether I am missing anything or feeding the wrong config settings to the network.
Thank you in advance!
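One thing worth checking, offered as a guess rather than a confirmed diagnosis: Shiny Blender images are presumably RGBA PNGs like the NeRF synthetic ones, and NeRF-style loaders typically composite them onto a white background as rgb * alpha + (1 - alpha). If the alpha channel is empty or misread during conversion, every training pixel becomes white and the network can simply learn an all-white scene. The NumPy sketch below shows that compositing plus a quick check of how much of an image is foreground; `composite_on_white` and `foreground_fraction` are hypothetical helpers. The scene's near/far bounds may also need to differ from the NeRF synthetic defaults.

```python
import numpy as np

def composite_on_white(rgba):
    """rgba: (H, W, 4) floats in [0, 1]. Returns (H, W, 3) over a white background."""
    rgb, alpha = rgba[..., :3], rgba[..., 3:4]
    return rgb * alpha + (1.0 - alpha)

def foreground_fraction(rgba, thres=0.5):
    """Fraction of pixels whose alpha exceeds `thres`; ~0 means the mask is empty."""
    return float((rgba[..., 3] > thres).mean())

img = np.zeros((4, 4, 4))
img[1:3, 1:3] = [0.2, 0.4, 0.6, 1.0]  # opaque 2x2 object, transparent elsewhere
out = composite_on_white(img)
```

Running `foreground_fraction` on a few converted training images is a cheap way to rule the mask in or out before changing any training config.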

Shape pre-trained stage error

I'm trying to run the code on a DTU scan without the MVS shape, and I encountered a shape mismatch when enumerating the dataset. I followed the instructions for training vanilla NeRF and computing geometry buffers for the DTU scan.

`2023-10-28 23:21:29.816339: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2023-10-28 23:21:29.863132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:85:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2023-10-28 23:21:29.874349: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2023-10-28 23:21:30.084292: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2023-10-28 23:21:30.248526: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2023-10-28 23:21:30.293372: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2023-10-28 23:21:30.489140: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2023-10-28 23:21:30.528962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2023-10-28 23:21:30.827244: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2023-10-28 23:21:30.828401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2023-10-28 23:21:30.829422: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2023-10-28 23:21:30.857546: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2400105000 Hz
2023-10-28 23:21:30.859352: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c210dd2680 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-10-28 23:21:30.859387: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2023-10-28 23:21:30.961714: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c210dd5100 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-10-28 23:21:30.961809: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-SXM2-16GB, Compute Capability 7.0
2023-10-28 23:21:30.963299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:85:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2023-10-28 23:21:30.963357: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2023-10-28 23:21:30.963389: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2023-10-28 23:21:30.963417: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2023-10-28 23:21:30.963444: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2023-10-28 23:21:30.963470: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2023-10-28 23:21:30.963504: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2023-10-28 23:21:30.963531: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2023-10-28 23:21:30.964088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2023-10-28 23:21:30.964137: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2023-10-28 23:21:30.967326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-28 23:21:30.967377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2023-10-28 23:21:30.967403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2023-10-28 23:21:30.968545: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14902 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:85:00.0, compute capability: 7.0)
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I1028 23:21:30.974194 47943187769408 mirrored_strategy.py:500] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
[util/io] Output directory already exisits:
/project2/tsui/nerfactor/output/train/scan37_shape/lr1e-2
[util/io] Output directory wiped:
/project2/tsui/nerfactor/output/train/scan37_shape/lr1e-2
[trainvali] For results, see:
/project2/tsui/nerfactor/output/train/scan37_shape/lr1e-2
[datasets/nerf_shape] Number of 'train' views: 47
[datasets/nerf_shape] Number of 'vali' views: 2
[models/base] Trainable layers registered:
['net_normal_mlp_layer0', 'net_normal_mlp_layer1', 'net_normal_mlp_layer2', 'net_normal_mlp_layer3', 'net_normal_out_layer0', 'net_lvis_mlp_layer0', 'net_lvis_mlp_layer1', 'net_lvis_mlp_layer2', 'net_lvis_mlp_layer3', 'net_lvis_out_layer0']
[trainvali] Started from scratch

Training epochs: 0%| | 0/200 [00:00<?, ?it/s]2023-10-28 23:21:35.393537: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[1022] = [60, 497] does not index into param shape [256,341,512]
2023-10-28 23:21:35.393675: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[1022] = [60, 497] does not index into param shape [256,341,3]
2023-10-28 23:21:35.393760: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[1022] = [60, 497] does not index into param shape [256,341,3]
2023-10-28 23:21:35.393836: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[1022] = [60, 497] does not index into param shape [256,341]
2023-10-28 23:21:35.393917: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[1022] = [60, 497] does not index into param shape [256,341,3]
2023-10-28 23:21:35.393997: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[1022] = [60, 497] does not index into param shape [256,341,3]

Training epochs: 0%| | 0/200 [00:00<?, ?it/s]
shape: <tensorflow.python.distribute.input_lib.DistributedDataset object at 0x2b9b1fe38518>
Traceback (most recent call last):
File "/project2/tsui/nerfactor/code/nerfactor/trainvali.py", line 342, in
app.run(main)
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/project2/tsui/nerfactor/code/nerfactor/trainvali.py", line 179, in main
for batch_i, batch in enumerate(datapipe_train):
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/input_lib.py", line 296, in next
return self.get_next()
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/input_lib.py", line 328, in get_next
global_has_value, replicas = _get_next_as_optional(self, self._strategy)
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/input_lib.py", line 192, in _get_next_as_optional
iterator._iterators[i].get_next_as_list(new_name)) # pylint: disable=protected-access
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/input_lib.py", line 1132, in get_next_as_list
data_list = self._iterator.get_next_as_optional()
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py", line 601, in get_next_as_optional
iterator_ops.get_next_as_optional(self._device_iterators[i]))
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 833, in get_next_as_optional
iterator.element_spec)), iterator.element_spec)
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2444, in iterator_get_next_as_optional
_ops.raise_from_not_ok_status(e, name)
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1022] = [60, 497] does not index into param shape [256,341,512]
[[{{node GatherNd_7}}]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]] [Op:IteratorGetNextAsOptional]
`
What could be the problem? Thanks.
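Reading the error itself: the column index 497 is out of range for buffers of shape [256, 341], so the pixel coordinates being gathered were generated for an image at least 498 pixels wide, while the loaded buffers are only 341 wide. With DTU's 1600x1200 images, imh=256 gives a width of about 341, matching the buffers, while imh=384 would give 512. A plausible but unconfirmed cause is therefore that the geometry buffers and the shape stage were run with different `imh` values. The sketch below reproduces that arithmetic and the bounds check; `width_for_imh` and `out_of_bounds` are hypothetical helpers, and the aspect-preserving rounding is an assumption about how the repo resizes.

```python
import numpy as np

def width_for_imh(imh, orig_h=1200, orig_w=1600):
    """Image width after resizing to height `imh`, keeping the aspect ratio (rounding assumed)."""
    return int(round(imh * orig_w / orig_h))

def out_of_bounds(indices, buffer_shape):
    """Rows of (K, 2) [row, col] indices that fall outside an (H, W, ...) buffer."""
    indices = np.asarray(indices)
    h, w = buffer_shape[:2]
    bad = ((indices[:, 0] < 0) | (indices[:, 0] >= h)
           | (indices[:, 1] < 0) | (indices[:, 1] >= w))
    return indices[bad]

print(width_for_imh(256), width_for_imh(384))  # 341 512
bad = out_of_bounds([[60, 497], [10, 20]], (256, 341, 512))  # only the [60, 497] row
```

If the widths disagree, regenerating the geometry buffers with the same `imh` as the shape config would be the first thing to try.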

Geometry buffer script throwing error

Hi, I am trying to run the geometry buffer script on my dataset. It first takes 2-3 hours to show any output and then throws errors.
My command:
bash geometry_from_nerf_run.sh 0 --data_root="/scratch/darthgera123/nerf/woman_data/" --trained_nerf="/scratch/darthgera123/nerf/woman_nerf/lr5e-4/" --out_root="/scratch/darthgera123/nerf/woman_geometry/" --imh=512 --scene_bbox=-0.3,0.3,-0.3,0.3,-0.3,0.3 --occu_thres=0.5 --mlp_chunk=3750
The error:

2021-08-09 16:04:55.509013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10210 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:03:00.0, compute capability: 7.5)
2021-08-09 16:04:55.512131: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f55866a9d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-09 16:04:55.512172: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
Views (train):   0%|          | 0/304 [00:00<?, ?it/s]2021-08-09 16:05:05.594129: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-08-09 17:51:19.840422: E tensorflow/core/grappler/clusters/utils.cc:87] Failed to get device properties, error code: 999

As mentioned, I initially kept mlp_chunk high until I got an OOM error, but now it is throwing this weird error. Please help @xiumingzhang @cdibona @dberlin

gradient error in Joint Optimization

Shape pre-training ran successfully, but I got stuck in joint optimization with the following error:
2022-09-27 02:30:25.358618: E tensorflow/core/kernels/check_numerics_op.cc:289] abnormal_detected_host @0x7f43f6808a00 = {1, 0} Not a number (NaN) or infinity (Inf) values detected in gradient. b'Albedo' tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found. (0) Invalid argument: Not a number (NaN) or infinity (Inf) values detected in gradient. b'Albedo' : Tensor had NaN values [[node gradient_tape/model/CheckNumerics (defined at tmp/tmp398ckawp.py:22) ]] [[Identity_6/_372]] (1) Invalid argument: Not a number (NaN) or infinity (Inf) values detected in gradient. b'Albedo' : Tensor had NaN values [[node gradient_tape/model/CheckNumerics (defined at tmp/tmp398ckawp.py:22) ]] 0 successful operations. 0 derived errors ignored. [Op:__inference_distributed_train_step_45946]

run geometry_from_nerf.py issues

Hi Xiuming,

Great job! I want to run this work, but I ran into some problems when running geometry_from_nerf.py. Could you help me? Also, I didn't find the trained models in the data on the project website. Do you plan to publish trained models for testing?

Thanks! looking forward to your reply.

WARNING:tensorflow:10 out of the last 10 calls to <function pfor..f at 0x7f6d021b9bf8> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
W0911 12:21:18.196791 140106441705280 def_function.py:126] 10 out of the last 10 calls to <function pfor..f at 0x7f6d021b9bf8> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.

MERL-BRDF dataset

Hi, thanks for your impressive work.

I am trying to re-implement your code, but I didn't find a download link for your processed MERL BRDF dataset. The publicly available source database seems to differ from the one used in your code.

I also noticed the code under nerfactor/brdf, but the README says we don't need to run it directly. However, running the shell command directly does lead to errors.

Could you please share the download link for your processed MERL BRDF dataset, or your processing code?
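For reference, the public MERL files use a simple binary layout: three int32 dimensions (90, 90, 180 for the theta_half, theta_diff and phi_diff bins), followed by 3 x 90 x 90 x 180 float64 values stored channel by channel (all red, then green, then blue), which the reference MERL reader scales by 1/1500, 1.15/1500 and 1.66/1500 respectively. The sketch below parses that layout in NumPy. It is not NeRFactor's processing code, which may resample or rescale the data differently, and `read_merl_binary` is a hypothetical name.

```python
import os
import tempfile

import numpy as np

SCALES = np.array([1.0, 1.15, 1.66]) / 1500.0  # red, green, blue

def read_merl_binary(path):
    """Return an array of shape (3, n_theta_h, n_theta_d, n_phi_d) of scaled BRDF values."""
    with open(path, 'rb') as f:
        dims = np.fromfile(f, dtype='<i4', count=3)
        n = int(np.prod(dims))
        vals = np.fromfile(f, dtype='<f8', count=3 * n)
    if vals.size != 3 * n:
        raise ValueError(f'{path}: expected {3 * n} values, got {vals.size}')
    # Negative entries mark invalid (below-horizon) samples in the original data.
    return vals.reshape(3, *dims) * SCALES[:, None, None, None]

# Round-trip demo on a tiny synthetic file (real files use dims 90, 90, 180).
dims = np.array([2, 2, 4], dtype='<i4')
raw = np.arange(3 * 16, dtype='<f8')
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'toy.binary')
    with open(path, 'wb') as f:
        dims.tofile(f)
        raw.tofile(f)
    brdf = read_merl_binary(path)
```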

How to calculate geometry buffers from MVS geometry ?

I am trying to replace NeRF with my own geometry. The script trainvali_mvs_run.sh for joint optimisation uses the shape_mvs dataset, which in turn needs the same alpha, normal, and lvis buffers.
I couldn't find any script that generates these geometry buffers from MVS geometry instead of NeRF. Could you please help?
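One possible route, assuming you can render the MVS mesh (or back-project its depth maps) into each view to get an (H, W, 3) map of surface points and an alpha mask: derive per-pixel normals from finite differences of that point map. This is a sketch of the normal part only; the file names, coordinate frame and resolution that the shape_mvs dataset expects are not shown here and must be matched to the repo, and light visibility would additionally require casting rays against the mesh toward each light direction. `normals_from_xyz` is a hypothetical helper.

```python
import numpy as np

def normals_from_xyz(xyz, alpha=None, eps=1e-8):
    """Per-pixel normals from an (H, W, 3) map of surface points.

    Uses finite differences along image columns (du) and rows (dv) and
    normalizes their cross product. The sign convention depends on your
    camera setup; flip normals that face away from the camera if needed.
    """
    du = np.gradient(xyz, axis=1)
    dv = np.gradient(xyz, axis=0)
    n = np.cross(du, dv)
    n /= np.linalg.norm(n, axis=-1, keepdims=True) + eps
    if alpha is not None:
        n = n * (alpha[..., None] > 0.5)  # zero out background pixels
    return n

# A flat patch in the z = 0 plane: x grows with the column, y with the row.
ys, xs = np.mgrid[0:5, 0:6].astype(float)
xyz = np.stack([xs, ys, np.zeros_like(xs)], axis=-1)
normals = normals_from_xyz(xyz)  # every pixel points along +z
```

Finite differences blur normals across depth discontinuities, so pixels at silhouette edges are usually best masked out or smoothed.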

Scripts

Hi,
Thanks for the very comprehensive repo. However, I don't understand how to run the scripts or how to set the arguments.
@cdibona @xiumingzhang @google-admin please help

Rendering scripts

Hi,

Do you have a Python script to render NeRF's original .blend files?

Best,

How about the light direction and the view direction?

How are the view-direction and light-direction angles in the paper obtained?
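In general terms (this is the standard geometry, not necessarily the exact code path in the repo): for a surface point x, the view direction is the unit vector from x toward the camera center, and each light direction is the unit vector toward that light; for an environment map, each pixel acts as a light at infinity with a fixed direction. Angle-based BRDF parameterizations such as Rusinkiewicz's, which the paper uses for its learned BRDF, are then built from the half vector h = normalize(w_i + w_o): theta_h is the angle between the normal and h, and theta_d the angle between h and w_i. The sketch below computes these quantities; phi_d, which also needs a tangent frame, is omitted, and all function names are hypothetical.

```python
import numpy as np

def unit(v):
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def view_dir(x, cam_origin):
    """Unit vector from surface point x toward the camera center."""
    return unit(cam_origin - x)

def half_angles(normal, w_i, w_o):
    """theta_h (normal vs. half vector) and theta_d (half vector vs. light), in radians."""
    h = unit(w_i + w_o)
    cos_h = np.clip(np.sum(normal * h, axis=-1), -1.0, 1.0)
    cos_d = np.clip(np.sum(h * w_i, axis=-1), -1.0, 1.0)
    return np.arccos(cos_h), np.arccos(cos_d)

n = np.array([0.0, 0.0, 1.0])
w_i = unit(np.array([1.0, 0.0, 1.0]))   # light arriving at 45 degrees
w_o = unit(np.array([-1.0, 0.0, 1.0]))  # viewer in the mirror direction
theta_h, theta_d = half_angles(n, w_i, w_o)  # 0 and pi/4
```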

"ValueError: 'a' cannot be empty unless no samples are taken" in preparation step

Hi,
I was trying to follow the first step here to get BRDF priors, but I am getting the following error:

$ REPO_DIR="$repo_dir" "$repo_dir/nerfactor/trainvali_run.sh" "$gpus" --config='brdf.ini' --config_override="data_root=$data_root,outroot=$outroot,viewer_prefix=$viewer_prefix"

2021-08-18 12:21:52.126973: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-08-18 12:21:52.165148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:68:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2021-08-18 12:21:52.165428: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-18 12:21:52.171059: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-08-18 12:21:52.173348: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-08-18 12:21:52.173750: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-08-18 12:21:52.176352: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-08-18 12:21:52.186737: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-08-18 12:21:52.192472: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-08-18 12:21:52.194714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2021-08-18 12:21:52.195193: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-08-18 12:21:52.203031: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3299990000 Hz
2021-08-18 12:21:52.203763: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f68dc000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-08-18 12:21:52.203782: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): Host, Default Version
2021-08-18 12:21:52.290736: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564ac481a330 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-18 12:21:52.290795: I tensorflow/compiler/xla/service/service.cc:176]   StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2021-08-18 12:21:52.292470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties: 
pciBusID: 0000:68:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2021-08-18 12:21:52.292553: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-18 12:21:52.292584: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-08-18 12:21:52.292611: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-08-18 12:21:52.292637: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-08-18 12:21:52.292663: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-08-18 12:21:52.292690: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-08-18 12:21:52.292717: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-08-18 12:21:52.294454: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2021-08-18 12:21:52.294508: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-18 12:21:52.295132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-18 12:21:52.295142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108]      0 
2021-08-18 12:21:52.295148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0:   N 
2021-08-18 12:21:52.296183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10150 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:68:00.0, compute capability: 7.5)
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0818 12:21:52.299249 140098729092928 mirrored_strategy.py:500] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
[util/io] Output directory already exisits:
	/home/jiwonchoi/nerfactor/output/train/merl/lr1e-2
[util/io] Overwrite is off, so doing nothing
[trainvali] For results, see:
	/home/jiwonchoi/nerfactor/output/train/merl/lr1e-2
Traceback (most recent call last):
  File "/home/jiwonchoi/nerfactor/nerfactor/trainvali.py", line 341, in <module>
    app.run(main)
  File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/jiwonchoi/nerfactor/nerfactor/trainvali.py", line 81, in main
    dataset_train = Dataset(config, 'train', debug=FLAGS.debug)
  File "/home/jiwonchoi/nerfactor/nerfactor/datasets/brdf_merl.py", line 52, in __init__
    mats = np.random.choice(self.brdf_names, n_iden, replace=False)
  File "mtrand.pyx", line 908, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken

I double-checked my paths, so I'm not sure where this error comes from.
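`np.random.choice` raises exactly this error when its first argument is empty, so `self.brdf_names` in `datasets/brdf_merl.py` was empty when the dataset was built, which presumably means no files matching what the loader expects were found under `data_root`. A defensive version of that sampling step could look like the hypothetical helper below, which fails with a message pointing at the data directory instead of NumPy's generic one. The material names in the demo are just example MERL names, not a claim about which ones the repo samples.

```python
import numpy as np

def pick_materials(brdf_names, n_iden, data_root='<data_root>'):
    """Sample n_iden distinct material names, with a clearer error than NumPy's."""
    if len(brdf_names) == 0:
        raise ValueError(
            f'No BRDF files found under {data_root}; check that data_root '
            'points at the processed MERL data.')
    if n_iden > len(brdf_names):
        raise ValueError(
            f'Asked for {n_iden} materials but only {len(brdf_names)} were found.')
    return np.random.choice(brdf_names, n_iden, replace=False)

names = ['alum-bronze', 'blue-acrylic', 'gold-metallic-paint']
print(pick_materials(names, 2))
```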

Trained Weights

Hi, I find NeRFactor results very impressive; it really is an outstanding and inspiring work.
Would it be possible for you to share the trained weights you used to obtain these results?
I would like to include NeRFactor scene representations in my path tracer, and having them available for comparison would surely speed up my work.

Thank you!

brdf_scale

I've seen that there is a constant named brdf_scale used to scale the BRDF value. Where does this constant come from, and what is its value?

Thanks!

How long does it take to run the third part in ./nerfactor?

How to get the result in your paper?

Hi, NeRFactor is truly an outstanding and inspiring work.

However, when I run the code with the default scripts and settings you provided under nerfactor/, the results, especially the test relighting results, are not as good as the figures in your paper:

I set 'ims' and 'imh' to 512 in all of these experiments. Are there any settings that need to be changed, such as the total number of iterations or the learning rates? Or is there anything else that you think may lead to this performance?

Thanks!

About create my own dataset

I can use COLMAP to create LLFF data successfully, but I don't understand what to do next.
Can you explain it in more detail?
Thanks!
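For orientation, the COLMAP-to-LLFF step (`imgs2poses.py` in the LLFF repo) writes `poses_bounds.npy`, an N x 17 array: the first 15 numbers of each row reshape to a 3 x 5 matrix (a 3 x 4 camera-to-world matrix plus an [H, W, focal] column), and the last two are the near/far depth bounds. NeRF's LLFF loader then reorders the rotation columns from LLFF's [down, right, backwards] convention to [right, up, backwards]. The sketch below does both steps and yields N x 3 x 5 poses, the layout `spherify_poses` expects; how NeRFactor's own data-conversion step consumes them is not covered here, and `load_llff_poses` is a hypothetical name.

```python
import numpy as np

def load_llff_poses(poses_bounds):
    """Split an (N, 17) poses_bounds array into (N, 3, 5) poses and (N, 2) bounds."""
    poses = poses_bounds[:, :15].reshape(-1, 3, 5)
    bounds = poses_bounds[:, 15:]
    # LLFF stores rotation columns as [down, right, backwards];
    # reorder them to [right, up, backwards] as NeRF's LLFF loader does.
    poses = np.concatenate(
        [poses[:, :, 1:2], -poses[:, :, 0:1], poses[:, :, 2:]], axis=2)
    return poses, bounds

# In practice: poses_bounds = np.load('poses_bounds.npy')
toy = np.zeros((1, 17))
toy[0, :15] = np.concatenate(
    [np.eye(3), [[0.0], [0.0], [4.0]], [[480.0], [640.0], [500.0]]], axis=1).ravel()
toy[0, 15:] = [1.0, 10.0]
poses, bounds = load_llff_poses(toy)
```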

questions about hdrs.

Hi, I downloaded some HDRs from Poly Haven, but I found some differences between the downloaded HDRs and the provided ones. The maximum value of the sunset HDR I downloaded is about 300,000, but that of the provided HDR is about 80. Is there any preprocessing applied to a new HDR map?
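The exact preprocessing used by the authors is not stated here, but one step that by itself shrinks the peak value a lot is downsampling: NeRFactor represents lighting as a low-resolution latitude-longitude map (16 x 32 in the paper), and area-averaging spreads a tiny, extremely bright sun over a much larger pixel. The NumPy sketch below downsamples by an integer factor; `downsample_envmap` is a hypothetical helper. Whether downsampling alone, or an additional exposure scale, accounts for the 300,000 vs. 80 gap is an open question for the authors.

```python
import numpy as np

def downsample_envmap(envmap, factor):
    """Area-average an (H, W, 3) lat-long HDR map by an integer factor."""
    h, w, c = envmap.shape
    if h % factor or w % factor:
        raise ValueError('map size must be divisible by factor')
    blocks = envmap.reshape(h // factor, factor, w // factor, factor, c)
    return blocks.mean(axis=(1, 3))

env = np.full((64, 128, 3), 0.5)
env[10, 40] = 300000.0  # one tiny, very bright "sun" pixel
small = downsample_envmap(env, 4)  # (16, 32, 3); the sun's block averages 16 pixels
```

Plain block averaging ignores that lat-long pixels near the poles cover less solid angle; a solid-angle-weighted average would be more faithful if energy conservation matters.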

I get error when I train vanilla NeRF.

Hi, thanks for your nice work.
However, I get an error when I try to train vanilla NeRF following the instructions.

The error is printed as follows:
[trainvali] For results, see:
/home/linxiong/nerfactor/output/train/hotdog_nerf/lr5e-4
[datasets/nerf] Number of 'train' views: 100
Traceback (most recent call last):
File "/home/linxiong/nerfactor/nerfactor/trainvali.py", line 348, in
app.run(main)
File "/home/linxiong/.local/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/linxiong/.local/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/linxiong/nerfactor/nerfactor/trainvali.py", line 91, in main
datapipe_train = dataset_train.build_pipeline(no_batch=no_batch)
File "../nerfactor/datasets/base.py", line 115, in build_pipeline
dataset = dataset.map(
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1623, in map
return ParallelMapDataset(
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4016, in init
self._map_func = StructuredFunctionWrapper(
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3221, in init
self._function = wrapper_fn.get_concrete_function()
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2531, in get_concrete_function
graph_function = self._get_concrete_function_garbage_collected(
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2496, in _get_concrete_function_garbage_collected
graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2777, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2657, in _create_graph_function
func_graph_module.func_graph_from_py_func(
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3214, in wrapper_fn
ret = _wrapper_helper(*args)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3156, in _wrapper_helper
ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in call
result = self._call(*args, **kwds)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 627, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 505, in _initialize
self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2446, in _get_concrete_function_internal_garbage_collected
graph_function, _, _ = self._maybe_define_function(args, kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2777, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2657, in _create_graph_function
func_graph_module.func_graph_from_py_func(
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 441, in wrapped_fn
return weak_wrapped_fn().wrapped(*args, **kwds)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3299, in bound_method_wrapper
return wrapped_fn(*args, **kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 968, in wrapper
raise e.ag_error_metadata.to_exception(e)
NotImplementedError: in user code:

/home/linxiong/nerfactor/nerfactor/datasets/nerf.py:111 _process_example_postcache  *
    rayo, rayd, rgb = self._sample_rays(self.rayo, self.rayd, self.rgb)
/home/linxiong/nerfactor/nerfactor/datasets/nerf.py:130 _sample_rays  *
    coords = tf.stack(
/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:3391 meshgrid  **
    mult_fact = ones(shapes, output_dtype)
/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:2967 ones
    output = _constant_if_small(one, shape, dtype, name)
/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:2662 _constant_if_small
    if np.prod(shape) < 1000:
<__array_function__ internals>:5 prod
    
/home/linxiong/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3051 prod
    return _wrapreduction(a, np.multiply, 'prod', axis, dtype, out,
/home/linxiong/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py:86 _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:748 __array__
    raise NotImplementedError("Cannot convert a symbolic Tensor ({}) to a numpy"

NotImplementedError: Cannot convert a symbolic Tensor (meshgrid/Size:0) to a numpy array.

Any suggestions?
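For what it's worth, this particular NotImplementedError is widely reported when TensorFlow 2.4 or older runs with NumPy 1.20 or newer; the traceback above shows _constant_if_small calling np.prod(shape) on a symbolic tensor. The commonly cited workaround is pinning numpy<1.20. The check below is a minimal sketch of that rule of thumb; the version boundaries are an assumption, not something confirmed in this thread, and both helpers are hypothetical.

```python
def parse_major_minor(version):
    """Turn a version string such as '2.4.1' or '1.20.0rc1' into (major, minor)."""
    major, minor = version.split(".")[:2]
    digits = ""
    for ch in minor:  # keep only the leading digits, e.g. '0rc1' -> '0'
        if not ch.isdigit():
            break
        digits += ch
    return int(major), int(digits)


def numpy_tf_mismatch(tf_version, np_version):
    """True for the assumed problem pairing: TensorFlow <= 2.4 with NumPy >= 1.20."""
    return (parse_major_minor(tf_version) <= (2, 4)
            and parse_major_minor(np_version) >= (1, 20))
```

Call it as numpy_tf_mismatch(tf.__version__, np.__version__) inside the environment that produced the error.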

OOM (out of memory) on 2080 Ti (11 GB memory) when running test.py

I really appreciate your work and the open-source code, but when I run the following code, I get an OOM error:

III. Simultaneous Relighting and View Synthesis (testing)

'''
scene='hotdog_2163'
gpus='4,5,6,7'
model='nerfactor'
overwrite='True'

proj_root='/mnt/data1/jy/NeRFactor'
repo_dir="$proj_root/nerfactor" # /mnt/data1/jy/NeRFactor/nerfactor/
outroot="$proj_root/output/train/${scene}_$model"
viewer_prefix='http://vision38.csail.mit.edu' # or just use ''
ckpt="$outroot/lr5e-3/checkpoints/ckpt-10"
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
    # Real scenes: NeRF & DTU
    color_correct_albedo='false'
else
    color_correct_albedo='true'
fi
REPO_DIR="$proj_root" "$proj_root/nerfactor/test_run.sh" "$gpus" --ckpt="$ckpt" --color_correct_albedo="$color_correct_albedo"
'''

My GPU is a 2080 Ti (11 GB memory). When I tried to lower the batch size, I found 'no_batch = True' in 'lr5e-3.ini'. So how can I successfully run 'test.py' on a 2080 Ti?

Very much looking forward to your help

Wrong skip connection in MLPs

Hello,

According to the paper, the MLP skip connection is at layer 2 (the third layer, counting from 0), but I saw strange behavior when using NeRFactor, so I checked the code. Network.__call__() in mlp.py does the following:

    x_ = x + 0 # make a copy
    for i, layer in enumerate(self.layers):
        y = layer(x_)
        if i in self.skip_at:
            y = tf.concat((y, x), -1)
        x_ = y
    return y

So the concatenation is applied after the layer is called, and the true skip connection is therefore at the next layer (the fourth one, index 3).
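To make the difference concrete, the sketch below tracks the input width each layer sees under the quoted loop (concatenate x after layer i) versus concatenating x before layer i. The sizes (63 input features, 128 units, 4 layers, skip_at = (2,)) are illustrative assumptions, not values read from NeRFactor's configs, and layer_input_widths is a hypothetical helper.

```python
def layer_input_widths(x_dim, width, n_layers, skip_at, concat_before_layer):
    """Input width seen by each hidden layer of a skip-connected MLP.

    concat_before_layer=False mirrors the quoted loop: the raw input x is
    concatenated to the *output* of layer i, so layer i + 1 sees it.
    concat_before_layer=True concatenates x to the *input* of layer i.
    """
    widths = []
    in_dim = x_dim  # the first layer always sees x itself
    for i in range(n_layers):
        if concat_before_layer and i in skip_at:
            in_dim += x_dim
        widths.append(in_dim)
        in_dim = width  # every hidden layer outputs `width` features
        if not concat_before_layer and i in skip_at:
            in_dim += x_dim
    return widths


print(layer_input_widths(63, 128, 4, (2,), concat_before_layer=False))  # [63, 128, 128, 191]
print(layer_input_widths(63, 128, 4, (2,), concat_before_layer=True))   # [63, 128, 191, 128]
```

With the quoted loop the widened input lands on index 3; concatenating before the layer puts it on index 2, which is the placement described in the paper.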

OOM at II. Joint Optimization in Training, Validation, and Testing

It keeps running out of memory at II. Joint Optimization in Training, Validation, and Testing. Note that it is NOT GPU memory that runs out but CPU memory, and it seems to happen at for batch_i, batch in enumerate(datapipe_train):
I am running on a machine with one 3090 GPU, 20 CPU cores, and 80 GB of memory. Any suggestions would help!

How to set 'ims'

Hi Xiuming,

I would like to know how to set the value of 'ims' when converting the BRDF dataset.
Is there a relationship between 'ims' and 'imh'?

Thanks in advance!

Relighting Results Background Color

Hi,

I have downloaded the released results and noticed that the background of the relit images is different for each environment map. Where does the background color come from?

Wrong NeRF and surface

Hi Xiuming, thanks for your outstanding work and resources!

When I run the code with the scripts, settings, and data you provided on Google Drive, I get wrong results.

Maybe I set the wrong learning rate or the wrong dataset path? I noticed that the train and validation views have no nn.png. Could you give me some suggestions? Thanks!

My dataset directory:

/home/kf/nerfactor/data/selected/hotdog_2163
│   transforms_test.json
│   transforms_train.json
│   transforms_val.json
│   
├───test_000
│       albedo.png
│       diffuse-color.exr
│       metadata.json
│       nn.png
│       normal.exr
│       normal.png
│       refball-normal.exr
│       refball-normal.png
│       rgba_city.png
│       rgba_courtyard.png
│       rgba_forest.png
│       rgba_interior.png
│       rgba_night.png
│       rgba_olat-0000-0000.png
│       rgba_olat-0000-0008.png
│       rgba_olat-0000-0016.png
│       rgba_olat-0000-0024.png
│       rgba_olat-0004-0000.png
│       rgba_olat-0004-0008.png
│       rgba_olat-0004-0016.png
│       rgba_olat-0004-0024.png
│       rgba_studio.png
│       rgba_sunrise.png
│       rgba_sunset.png
│       rgba.png
│       
├───test_001
│       .......
├───test_199
│       .......
├───train_000
│       albedo.png
│       diffuse-color.exr
│       metadata.json
│       normal.exr
│       normal.png
│       refball-normal.exr
│       refball-normal.png
│       rgba_city.png
│       rgba_courtyard.png
│       rgba_forest.png
│       rgba_interior.png
│       rgba_night.png
│       rgba_olat-0000-0000.png
│       rgba_olat-0000-0008.png
│       rgba_olat-0000-0016.png
│       rgba_olat-0000-0024.png
│       rgba_olat-0004-0000.png
│       rgba_olat-0004-0008.png
│       rgba_olat-0004-0016.png
│       rgba_olat-0004-0024.png
│       rgba_studio.png
│       rgba_sunrise.png
│       rgba_sunset.png
│       rgba.png
│       
├───train_001
│       
├───train_002
│       metadata.json
│       rgba.png
│       
├───train_002
│       .......
├───train_099
│     
├───val_000
│       albedo.png
│       diffuse-color.exr
│       metadata.json
│       normal.exr
│       normal.png
│       refball-normal.exr
│       refball-normal.png
│       rgba_city.png
│       rgba_courtyard.png
│       rgba_forest.png
│       rgba_interior.png
│       rgba_night.png
│       rgba_olat-0000-0000.png
│       rgba_olat-0000-0008.png
│       rgba_olat-0000-0016.png
│       rgba_olat-0000-0024.png
│       rgba_olat-0004-0000.png
│       rgba_olat-0004-0008.png
│       rgba_olat-0004-0016.png
│       rgba_olat-0004-0024.png
│       rgba_studio.png
│       rgba_sunrise.png
│       rgba_sunset.png
│       rgba.png
│ 
├───val_001
│     ........
└───val_007

tensorflow.python.framework.errors_impl.InvalidArgumentError error in synthesis step

Hi,
With your debugging help in other issues, I was able to get to the last step.

I have 3 questions about this last step:

If I use a single 2080 Ti here (as you set gpus='0'), I get an OOM allocation error, so I assigned three 2080 Tis instead. Is this an acceptable approach? I ask because you did not seem to allow multiple GPUs for computing the geometry buffers. Also, should I consider using imh=256 instead of 512 to reduce memory usage?
Error message as follows:

tensorflow.python.framework.errors_impl.ResourceExhaustedError: 
    OOM when allocating tensor with shape[68361728,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Mul]
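For the imh question, the arithmetic is easy to check. A float32 tensor of shape [68361728, 3] needs 68361728 × 3 × 4 bytes, about 782 MiB, for that single allocation. If its row count scales with the number of pixels (an assumption; the rows may be pixels times samples, which still scales with pixel count), halving imh from 512 to 256 cuts it by 4×. The two helpers below are illustrative, not part of NeRFactor.

```python
def tensor_bytes(shape, bytes_per_elem=4):
    """Bytes needed by a dense tensor; float32 uses 4 bytes per element."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_elem


def rows_at_resolution(rows, old_h, new_h):
    """Rescale a per-pixel row count, assuming it grows with h**2 (fixed aspect)."""
    return rows * new_h * new_h // (old_h * old_h)


full = tensor_bytes((68361728, 3))
print(full, round(full / 2**20, 1))  # 820340736 782.3
half_rows = rows_at_resolution(68361728, 512, 256)
print(half_rows, tensor_bytes((half_rows, 3)))  # 17090432 205085184
```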

Initially, I copied the whole script for the final step (steps 1, 2, and 3) and ran it with $ bash ./script.sh. However, this caused an error saying that it could not find the ckpt-2 and ckpt-10 files, which should already exist. So I split it into three scripts and was able to get through shape pre-training and joint optimization. I hope my approach did not cause the following TensorFlow warning:

The calling iterator did not fully read the dataset being cached. 
In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. 
This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. 
You should use `dataset.take(k).cache().repeat()` instead.

I am getting the following error in the very last step and cannot complete your hotdog example ("Simultaneous Relighting and View Synthesis (testing)"):

[test] Restoring trained model
[models/base] Trainable layers registered:
        ['net_normal_mlp_layer0', 'net_normal_mlp_layer1', 'net_normal_mlp_layer2', 'net_normal_mlp_layer3', 'net_normal_out_layer0', 'net_lvis_mlp_layer0', 'net_lvis_mlp_layer1', 'net_lvis_mlp_layer2', 'net_lvis_mlp_layer3', 'net_lvis_out_layer0']
[models/base] Trainable layers registered:
        ['net_brdf_mlp_layer0', 'net_brdf_mlp_layer1', 'net_brdf_mlp_layer2', 'net_brdf_mlp_layer3', 'net_brdf_out_layer0']
[models/base] Trainable layers registered:
        ['net_albedo_mlp_layer0', 'net_albedo_mlp_layer1', 'net_albedo_mlp_layer2', 'net_albedo_mlp_layer3', 'net_albedo_out_layer0', 'net_brdf_z_mlp_layer0', 'net_brdf_z_mlp_layer1', 'net_brdf_z_mlp_layer2', 'net_brdf_z_mlp_layer3', 'net_brdf_z_out_layer0', 'net_normal_mlp_layer0', 'net_normal_mlp_layer1', 'net_normal_mlp_layer2', 'net_normal_mlp_layer3', 'net_normal_out_layer0', 'net_lvis_mlp_layer0', 'net_lvis_mlp_layer1', 'net_lvis_mlp_layer2', 'net_lvis_mlp_layer3', 'net_lvis_out_layer0']
[test] Running inference
Inferring Views:   0%|                                                     | 0/200 [00:00<?, ?it/s]
2021-09-14 01:46:33.905210: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-09-14 01:47:05.401366: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
Inferring Views:   0%|                                                     | 0/200 [02:22<?, ?it/s]
Traceback (most recent call last):
  File "/home/jiwonchoi/code/nerfactor/nerfactor/test.py", line 209, in <module>
    app.run(main)
  File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/jiwonchoi/code/nerfactor/nerfactor/test.py", line 192, in main
    brdf_z_override=brdf_z_override)
  File "/home/jiwonchoi/code/nerfactor/nerfactor/models/nerfactor.py", line 266, in call
    relight_probes=relight_probes)
  File "/home/jiwonchoi/code/nerfactor/nerfactor/models/nerfactor.py", line 362, in _render
    rgb_probes = tf.concat([x[:, None, :] for x in rgb_probes], axis=1)
  File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
    return target(*args, **kwargs)
  File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1606, in concat
    return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
  File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1181, in concat_v2
    _ops.raise_from_not_ok_status(e, name)
  File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
    six.raise_from(core._status_to_exception(e.code, message), None)
  File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: OpKernel 'ConcatV2' has constraint on attr 'T' not in NodeDef '[N=0, Tidx=DT_INT32]', KernelDef: 'op: "ConcatV2" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_INT32 } } } host_memory_arg: "values" host_memory_arg: "axis" host_memory_arg: "output"' [Op:ConcatV2] name: concat

It seems that the line rgb_probes = tf.concat([x[:, None, :] for x in rgb_probes], axis=1) causes the issue. I am not sure how to debug this.
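In the NodeDef, N=0 is the number of tensors passed to ConcatV2, so the list comprehension produced an empty list: rgb_probes had no entries. A plausible cause (a guess, not confirmed here) is that no test light probes were loaded, for example if test_envmap_dir points to a missing or empty directory. The NumPy stand-in below, stack_probe_renders (a hypothetical helper), does the same stacking with an explicit check for the empty case.

```python
import numpy as np


def stack_probe_renders(rgb_probes):
    """Stack a list of (N, 3) per-probe renders into an (N, P, 3) array.

    NumPy stand-in for
        tf.concat([x[:, None, :] for x in rgb_probes], axis=1)
    that fails with a readable message when the list is empty (the N=0 case).
    """
    if len(rgb_probes) == 0:
        raise ValueError(
            "rgb_probes is empty: no relighting probes were rendered. "
            "Check that the test light probes (e.g. test_envmap_dir) were found.")
    return np.concatenate([x[:, None, :] for x in rgb_probes], axis=1)
```

Printing len(rgb_probes) just before the concat in nerfactor.py would confirm or rule out this reading.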

Thank you in advance.

Question about incompatible shapes(0,3) and (100,3) at II. Joint Optimization in Training, Validation, and Testing

Hi, thank you for the inspiring work and your open-source code! When I run the following script, I get a ValueError at step II. Joint Optimization in Training, Validation, and Testing:
I. Shape Pre-Training and II. Joint Optimization (training and validation)

'''

scene='hotdog_2163'
gpus='2'
model='nerfactor'
overwrite='True'
proj_root='/lyy/nerfactor'
repo_dir="$proj_root/nerfactor"
viewer_prefix='' # or just use ''

# I. Shape Pre-Training

data_root="$proj_root/data/selected/$scene"
if [[ "$scene" == scan* ]]; then
    # DTU scenes
    imh='256'
else
    imh='512'
fi
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
    # Real scenes: NeRF & DTU
    near='0.1'; far='2'
else
    near='2'; far='6'
fi
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
    # Real scenes: NeRF & DTU
    use_nerf_alpha='True'
else
    use_nerf_alpha='False'
fi
surf_root="$proj_root/output/surf/$scene"
shape_outdir="$proj_root/output/train/${scene}_shape"
REPO_DIR="$repo_dir" "$repo_dir/nerfactor/trainvali_run.sh" "$gpus" --config='shape.ini' --config_override="data_root=$data_root,imh=$imh,near=$near,far=$far,use_nerf_alpha=$use_nerf_alpha,data_nerf_root=$surf_root,outroot=$shape_outdir,viewer_prefix=$viewer_prefix,overwrite=$overwrite"

# II. Joint Optimization (training and validation)

shape_ckpt="$shape_outdir/lr1e-2/checkpoints/ckpt-2"
brdf_ckpt="$proj_root/output/train/merl/lr1e-2/checkpoints/ckpt-50"
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
    # Real scenes: NeRF & DTU
    xyz_jitter_std=0.001
else
    xyz_jitter_std=0.01
fi
test_envmap_dir="$proj_root/data/envmaps/for-render_h16/test"
shape_mode='finetune'
outroot="$proj_root/output/train/${scene}_$model"
REPO_DIR="$repo_dir" "$repo_dir/nerfactor/trainvali_run.sh" "$gpus" --config="$model.ini" --config_override="data_root=$data_root,imh=$imh,near=$near,far=$far,use_nerf_alpha=$use_nerf_alpha,data_nerf_root=$surf_root,shape_model_ckpt=$shape_ckpt,brdf_model_ckpt=$brdf_ckpt,xyz_jitter_std=$xyz_jitter_std,test_envmap_dir=$test_envmap_dir,shape_mode=$shape_mode,outroot=$outroot,viewer_prefix=$viewer_prefix,overwrite=$overwrite"

# III. Simultaneous Relighting and View Synthesis (testing)

ckpt="$outroot/lr5e-3/checkpoints/ckpt-10"
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
    # Real scenes: NeRF & DTU
    color_correct_albedo='false'
else
    color_correct_albedo='true'
fi
REPO_DIR="$repo_dir" "$repo_dir/nerfactor/test_run.sh" "$gpus" --ckpt="$ckpt" --color_correct_albedo="$color_correct_albedo"

'''

[trainvali] For results, see:
/lyy/nerfactor/output/train/hotdog_2163_nerfactor/lr5e-3
[datasets/nerf_shape] Number of 'train' views: 100
[datasets/nerf_shape] Number of 'vali' views: 8
[models/base] Trainable layers registered:
['net_normal_mlp_layer0', 'net_normal_mlp_layer1', 'net_normal_mlp_layer2', 'net_normal_mlp_layer3', 'net_normal_out_layer0', 'net_lvis_mlp_layer0', 'net_lvis_mlp_layer1', 'net_lvis_mlp_layer2', 'net_lvis_mlp_layer3', 'net_lvis_out_layer0']
[models/base] Trainable layers registered:
['net_brdf_mlp_layer0', 'net_brdf_mlp_layer1', 'net_brdf_mlp_layer2', 'net_brdf_mlp_layer3', 'net_brdf_out_layer0']
Traceback (most recent call last):
File "/lyy/nerfactor/nerfactor/nerfactor/trainvali.py", line 341, in <module>
app.run(main)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/lyy/nerfactor/nerfactor/nerfactor/trainvali.py", line 106, in main
model = Model(config, debug=FLAGS.debug)
File "/lyy/nerfactor/nerfactor/nerfactor/models/nerfactor.py", line 68, in __init__
ioutil.restore_model(self.brdf_model, brdf_ckpt)
File "/lyy/nerfactor/nerfactor/nerfactor/util/io.py", line 48, in restore_model
ckpt.restore(ckpt_path).expect_partial()
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 2009, in restore
status = self._saver.restore(save_path=save_path)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 1304, in restore
checkpoint=checkpoint, proto_id=0).restore(self._graph_view.root)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 209, in restore
restore_ops = trackable._restore_from_checkpoint_position(self) # pylint: disable=protected-access
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 907, in _restore_from_checkpoint_position
tensor_saveables, python_saveables))
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 289, in restore_saveables
validated_saveables).restore(self.save_path_tensor)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/saving/functional_saver.py", line 281, in restore
restore_ops.update(saver.restore(file_prefix))
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/saving/functional_saver.py", line 103, in restore
restored_tensors, restored_shapes=None)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 647, in restore
for v in self._mirrored_variable.values))
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 647, in <genexpr>
for v in self._mirrored_variable.values))
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 392, in _assign_on_device
return variable.assign(tensor)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 846, in assign
self._shape.assert_is_compatible_with(value_tensor.shape)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 1117, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (0, 3) and (100, 3) are incompatible

The shape checkpoints were generated by step I. Shape Pre-Training, and the BRDF checkpoints were downloaded from your page.
Does this mean I need to pre-train the BRDF model myself?

Very much looking forward to your help!

Crash at shape pre-training

I am trying to reproduce the results but ran into problems at 'I. Shape Pre-Training'. The script crashes during validation of shape pre-training. It looks like an OOM issue, because the log says "killed" and it stops crashing if I set shuffle_buffer_size=False in shape.ini. Any suggestions would help!

I am using a machine with four 3090 GPUs, 12 CPU cores, and 60 GB of memory. My dataset has 100 training views and 7 validation views. The surf_root directory contains 120 test, 99 train, and 99 val views.

This may be a bug.

In lines 50-52 of nerfactor/nerfactor/util/vis.py:
#############################
img = xm.io.img.load(path)
img = img[:, :, :3] # discards alpha
hw = img.shape[:2]
#############################
pred_lvis.png is a single-channel (grayscale) image with no third dimension, so img[:, :, :3] fails on it.
So it should be changed as follows:
#############################
img = xm.io.img.load(path)
if len(img.shape) == 2:  # grayscale (H, W), e.g. pred_lvis.png
    img = np.stack((img,) * 3, axis=-1)  # replicate to (H, W, 3)
img = img[:, :, :3] # discards alpha
hw = img.shape[:2]
#############################
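A self-contained variant of the same fix, written as a plain NumPy function so it can be tested without xiuminglib (to_rgb is a hypothetical helper name, not part of vis.py, and it assumes the loader returns a NumPy array). It also covers (H, W, 1) and gray+alpha arrays, which some PNG loaders return.

```python
import numpy as np


def to_rgb(img):
    """Return an (H, W, 3) array from a gray, gray+alpha, RGB or RGBA image."""
    if img.ndim == 2:  # (H, W) grayscale, e.g. a single-channel PNG
        img = img[:, :, None]
    if img.shape[2] in (1, 2):  # gray or gray+alpha: replicate the gray channel
        img = np.repeat(img[:, :, :1], 3, axis=2)
    return img[:, :, :3]  # drop alpha if present
```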

The pre-trained models and data provided are not sufficient to perform tests on the blender dataset

I want to render albedo and relighting results with the pre-trained NeRFactor on the Blender dataset without further training. However, the provided pre-trained models and data are not sufficient to run test.py on the Blender dataset: it requires shape_ckpt, brdf_ckpt, and the processed per-view data (lvis.npy, xyz.npy, alpha.png, normal.npy), which are not provided.
So does this mean I still need to run the data preparation step and train the shape model myself? Are the provided pre-trained models useless?

Question about Camera Models for synthetic/real datasets

Hey, thank you for the awesome work.

I was attempting to replicate your work and realized that I had made the assumption that the camera model for the synthetic and real datasets was identical. I assume this is not true, but I had some trouble finding the camera model for the real data. Where in the code is it, or what's the difference between the synthetic and real camera models?

Running the Data Example

Hello, the results in the paper look very impressive, and I'm excited to try out the repository myself, but I am having trouble getting the scripts in the README to run without errors.

I downloaded the data and went to step 1 of the NeRFactor section (learning data-driven BRDF priors). At this point I don't have data/brdf_merl_npz/ims512_envmaph16_spp1. Is this something that needs to be generated with data_gen/merl/make_dataset_run.sh? If so, when I follow the instructions in the data_gen folder, I get the following error when running data_gen/merl/make_dataset_run.sh:

Training & Validation: 0%| | 0/4 [00:00<?, ?it/s]Loading MERL-BRDF:
/mnt/c/Users/Documents/wrk_dir/data/brdf_merl/Copyright_Notice.txt
Training & Validation: 0%| | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/mnt/c/Users/Documents/wrk_dir/code/nerfactor/data_gen/merl/make_dataset.py", line 144, in <module>
app.run(main)
File "/home/user/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/user/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/mnt/c/Users/Documents/wrk_dir/code/nerfactor/data_gen/merl/make_dataset.py", line 75, in main
brdf = MERL(path=path)
File "/mnt/c/Users/Documents/wrk_dir/code/nerfactor/brdf/merl/merl.py", line 31, in __init__
cube_rgb = merl.readMERLBRDF(path) # (phi_d, theta_h, theta_d, ch)
File "/mnt/c/Users/Documents/wrk_dir/code/nerfactor/third_party/nielsen2015on/merlFunctions.py", line 19, in readMERLBRDF
BRDFVals = np.swapaxes(np.reshape(vals,(dims[2], dims[1], dims[0], 3),'F'),1,2)
File "<__array_function__ internals>", line 6, in reshape
File "/home/user/anaconda3/envs/nerfactor/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 299, in reshape
return _wrapfunc(a, 'reshape', newshape, order=order)
File "/home/user/anaconda3/envs/nerfactor/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
return bound(*args, **kwds)
ValueError: cannot reshape array of size 144 into shape (808591476,1751607666,2037411651,3)

Resolved: I needed to move the brdfs folder out of the downloaded data and move or delete the Readme inside that folder.
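The nonsensical dims in the error are ASCII text: the first 12 bytes of a file starting with "Copyright 20", read as three little-endian int32 values, are exactly 2037411651, 1751607666 and 808591476. That is why make_dataset.py failed right after listing Copyright_Notice.txt. Below is a minimal sketch of a header check, assuming the standard MERL .binary layout (three int32 dims whose product is 90 × 90 × 180, followed by three channels of float64 values); read_merl_header is a hypothetical helper, not part of the repo.

```python
import numpy as np

# MERL BRDF tables: 90 theta_h x 90 theta_d x 180 phi_d bins per color channel.
MERL_N_SAMPLES = 90 * 90 * 180


def read_merl_header(path):
    """Read and sanity-check the 3 x int32 header of a MERL .binary BRDF file.

    Raises ValueError for files that are not MERL BRDFs (e.g. a README or
    Copyright_Notice.txt picked up by a directory glob) instead of letting a
    later reshape fail with nonsense dimensions.
    """
    with open(path, "rb") as f:
        dims = np.fromfile(f, dtype="<i4", count=3)
        n = 1
        for d in dims.tolist():  # Python ints: no overflow on garbage headers
            n *= d
        if dims.size != 3 or n != MERL_N_SAMPLES:
            raise ValueError(f"{path}: not a MERL BRDF file (header {dims.tolist()})")
        vals = np.fromfile(f, dtype="<f8")
    if vals.size != 3 * MERL_N_SAMPLES:
        raise ValueError(f"{path}: expected {3 * MERL_N_SAMPLES} values, got {vals.size}")
    return dims
```

Running it over the brdfs folder before make_dataset_run.sh would flag any non-BRDF file up front.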

Shape error at II. Joint Optimization

I successfully ran Shape Pre-Training but got an error at II. Joint Optimization.

2023-02-23 00:27:17.819403: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:230] Shuffle buffer filled.
INFO:tensorflow:Error reported to Coordinator: in user code:

    /home/maojiahui/conda/nerfactor_b/nerfactor/nerfactor/models/nerfactor.py:209 call  *
        normal_pred = self._pred_normal_at(xyz)
    /home/maojiahui/conda/nerfactor_b/nerfactor/nerfactor/models/shape.py:203 chunk_func  *
        normals = out_layer(mlp_layers(surf_embed))
    /home/maojiahui/conda/nerfactor_b/nerfactor/nerfactor/models/shape.py:191 chunk_apply  *
        y_chunk = func(x_chunk)
    /home/maojiahui/conda/nerfactor_b/nerfactor/nerfactor/networks/mlp.py:46 __call__  *
        y = layer(x_)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py:1023 __call__  **
        self._maybe_build(inputs)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py:2625 _maybe_build
        self.build(input_shapes)  # pylint:disable=not-callable
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/keras/layers/core.py:1198 build
        trainable=True)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py:655 add_weight
        caching_device=caching_device)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py:815 _add_variable_with_custom_getter
        **kwargs_for_getter)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer_utils.py:139 make_variable
        shape=variable_shape if variable_shape else None)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:260 __call__
        return cls._variable_v1_call(*args, **kwargs)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:221 _variable_v1_call
        shape=shape)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:67 getter
        return captured_getter(captured_previous, **kwargs)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/shared_variable_creator.py:69 create_new_variable
        v = next_creator(**kwargs)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:67 getter
        return captured_getter(captured_previous, **kwargs)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:2111 creator_with_resource_vars
        created = self._create_variable(next_creator, **kwargs)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/mirrored_strategy.py:538 _create_variable
        distribute_utils.VARIABLE_POLICY_MAPPING, **kwargs)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_utils.py:306 create_mirrored_variable
        value_list = real_mirrored_creator(**kwargs)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/mirrored_strategy.py:530 _real_mirrored_creator
        v = next_creator(**kwargs)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:67 getter
        return captured_getter(captured_previous, **kwargs)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py:752 variable_capturing_scope
        lifted_initializer_graph=lifted_initializer_graph, **kwds)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:264 __call__
        return super(VariableMetaclass, cls).__call__(*args, **kwargs)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py:293 __init__
        initial_value = initial_value()
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py:87 __call__
        self._checkpoint_position, shape, shard_info=shard_info)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py:122 __init__
        self.wrapped_value.set_shape(shape)
    /home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py:1240 set_shape
        (self.shape, shape))

    ValueError: Tensor's shape (3, 128) is not compatible with supplied shape [63, 128]

I ran this code on a 3090 Ti (TF 2.5). Looking forward to your help.
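The two widths in the error are suggestive: 63 = 3 + 3 × 2 × 10 is the width of a NeRF-style positional encoding of an (x, y, z) point with 10 frequency bands that also keeps the raw input, while 3 is the raw point alone. A plausible (unconfirmed) reading is that the restored normal-MLP weights and the layer being built disagree on whether xyz is encoded, for example because the shape pre-training and joint optimization configs differ. The sketch below reproduces the width arithmetic; positional_encoding and its parameters are illustrative assumptions, not NeRFactor's embedder API.

```python
import numpy as np


def positional_encoding(x, n_freqs, include_input=True):
    """NeRF-style encoding: [x, sin(2^k x), cos(2^k x)] for k = 0 .. n_freqs - 1."""
    freq_bands = 2.0 ** np.arange(n_freqs)  # 1, 2, 4, ..., 2^(n_freqs - 1)
    parts = [x] if include_input else []
    for f in freq_bands:
        parts.append(np.sin(f * x))
        parts.append(np.cos(f * x))
    return np.concatenate(parts, axis=-1)


xyz = np.zeros((5, 3))
print(positional_encoding(xyz, 10).shape)  # (5, 63), i.e. 3 + 3 * 2 * 10
```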

Potential bug in the Microfacet BRDF model

nerfactor/brdf/microfacet/microfacet.py

Line 57 in 0831990

g = self._get_g(pts2c, h, normal, alpha=alpha) # NxL

The shadow term is calculated as g = self._get_g(pts2c, h, normal, alpha=alpha) # NxL

However, according to Eq. (23) in the original BTDF paper, it should be G(v, m) G(l, m).

Here, only one of the two factors (the view term computed from pts2c) is calculated.
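For reference, the separable Smith form that Eq. (23) refers to (assuming the paper meant is Walter et al. 2007, "Microfacet Models for Refraction through Rough Surfaces") multiplies one masking term per direction. The NumPy sketch below uses the GGX G1 from that paper; it is an illustration, not NeRFactor's _get_g (whose body is not shown here), and the function names are hypothetical.

```python
import numpy as np


def ggx_g1(w, m, n, alpha):
    """Smith masking term G1(w, m) for the GGX distribution.

    w: (..., 3) unit direction (light or view), m: (..., 3) unit microfacet
    normal (half vector), n: (..., 3) unit macro normal, alpha: roughness.
    """
    cos_wn = np.sum(w * n, axis=-1)
    cos_wm = np.sum(w * m, axis=-1)
    # chi+(w.m / w.n): same sign test written as a product to avoid dividing by 0.
    chi = cos_wm * cos_wn > 0
    cos2 = np.clip(cos_wn ** 2, 1e-12, 1.0)
    tan2 = (1.0 - cos2) / cos2
    g1 = 2.0 / (1.0 + np.sqrt(1.0 + alpha ** 2 * tan2))
    return np.where(chi, g1, 0.0)


def smith_g(l, v, m, n, alpha):
    """Separable Smith shadowing-masking: G(l, v, m) = G1(l, m) * G1(v, m)."""
    return ggx_g1(l, m, n, alpha) * ggx_g1(v, m, n, alpha)
```

Dropping the light factor, as the issue describes, leaves G equal to G1(v, m) alone, which overestimates G whenever the light direction is oblique.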

NaN or Inf in 'Albedo' at step II. Joint Optimization

Hi,

Great work. I am training your model on my own dataset in the real-data format. However, it always reports the following error during step II. Joint Optimization in Training, Validation, and Testing. Could you give me some insight into what configuration or data format might be wrong?

Error message

Exception has occurred: InvalidArgumentError       (note: full exception trace is shown but execution is paused at: _run_module_as_main)
2 root error(s) found.
  (0) Invalid argument:  Not a number (NaN) or infinity (Inf) values detected in gradient. b'Albedo' : Tensor had NaN values
	 [[{{node cond/else/_1/StatefulPartitionedCall/gradient_tape/model/CheckNumerics_2}}]]
	 [[cond/else/_1/StatefulPartitionedCall/replica_1/model/assert_greater_3/Assert/AssertGuard/branch_executed/_57539/_6203]]
  (1) Invalid argument:  Not a number (NaN) or infinity (Inf) values detected in gradient. b'Albedo' : Tensor had NaN values
	 [[{{node cond/else/_1/StatefulPartitionedCall/gradient_tape/model/CheckNumerics_2}}]]
0 successful operations.
3 derived errors ignored. [Op:__inference_fn_with_cond_190304]

Function call stack:
fn_with_cond -> fn_with_cond
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 598, in call
    ctx=ctx)
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
    self.captured_inputs)
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
    return graph_function._filtered_call(args, kwargs)  # pylint: disable=protected-access
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 708, in _call
    return function_lib.defun(fn_with_cond)(*canon_args, **canon_kwds)
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
    result = self._call(*args, **kwds)
  File "/home/admin/FaceReal/nerfactor/nerfactor/trainvali.py", line 181, in main
    strategy, model, batch, optimizer, global_bs_train)
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
    sys.exit(main(argv))
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 312, in run
    _run_main(main, args)
  File "/home/admin/FaceReal/nerfactor/nerfactor/trainvali.py", line 341, in <module>
    app.run(main)
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/runpy.py", line 193, in _run_module_as_main (Current frame)
    "__main__", mod_spec)
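The failing op is a `CheckNumerics` node inside the gradient tape, labeled 'Albedo'. TensorFlow's `tf.debugging.check_numerics(tensor, message)` raises `InvalidArgumentError` as soon as the tensor contains a NaN or Inf. The message therefore shows only that the albedo gradient became non-finite. It does not show where the NaN first appeared. Calling `tf.debugging.enable_check_numerics()` (available since TensorFlow 2.1) before training should report the first op that produces a non-finite value. Outside TensorFlow, the NumPy sketch below applies the same test to a dictionary of named arrays, which is useful for inspecting intermediate values dumped to disk. The array names are hypothetical, and the zero-length normal only illustrates one common way NaNs arise. It is not a diagnosis of this run.

```python
import numpy as np


def find_non_finite(named_arrays):
    """Mimic a CheckNumerics op over several arrays.

    Returns {name: (num_nan, num_inf)} for every array that contains at least
    one NaN or +/-Inf; arrays that are fully finite are omitted.
    """
    report = {}
    for name, arr in named_arrays.items():
        arr = np.asarray(arr, dtype=np.float64)
        n_nan = int(np.isnan(arr).sum())
        n_inf = int(np.isinf(arr).sum())
        if n_nan or n_inf:
            report[name] = (n_nan, n_inf)
    return report


# Hypothetical intermediate values: normalizing a zero-length normal yields
# 0/0 = NaN in every component, a common source of NaNs in rendering code.
normals = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
with np.errstate(invalid="ignore"):  # silence the 0/0 RuntimeWarning
    unit_normals = normals / np.linalg.norm(normals, axis=-1, keepdims=True)

print(find_non_finite({"normals": normals, "unit_normals": unit_normals}))
# {'unit_normals': (3, 0)}
```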

My dataset directory looks like this:

root
│   transforms_test.json
│   transforms_train.json
│   transforms_val.json
│   
├───test_000
│       metadata.json
│       nn.png
│       rgba.png
│       
├───train_000
│       albedo.png
│       metadata.json
│       rgba.png
│       
├───train_001
│       metadata.json
│       rgba.png
│       
├───train_002
│       metadata.json
│       rgba.png
│       
├───train_003
│       metadata.json
│       rgba.png
│       
├───train_004
│       metadata.json
│       rgba.png
│       
├───train_005
│       metadata.json
│       rgba.png
│       
├───train_006
│       metadata.json
│       rgba.png
│       
├───train_007
│       metadata.json
│       rgba.png
│       
├───train_008
│       metadata.json
│       rgba.png
│       
├───train_009
│       metadata.json
│       rgba.png
│       
└───val_000
        metadata.json
        rgba.png
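The views in this tree are not uniform. train_000 contains an extra albedo.png and test_000 contains an extra nn.png, and no other view has either file. The issue does not say whether NeRFactor's loader reads those files. The stdlib sketch below audits the folder layout per split. It assumes the required set is {metadata.json, rgba.png}, a set inferred from what every folder in this listing has in common rather than from NeRFactor's documentation. `audit_views` is a hypothetical helper.

```python
import os
import re
from collections import defaultdict

# Assumption: every view folder needs these two files. This set is inferred
# from the listing above, not taken from NeRFactor's documentation.
REQUIRED = {"metadata.json", "rgba.png"}
VIEW_DIR = re.compile(r"^(train|val|test)_(\d+)$")


def audit_views(root):
    """Return (counts, missing, extra) for view folders under `root`.

    counts:  {split: number of view folders}
    missing: {folder: sorted required files that are absent}
    extra:   {folder: sorted files beyond REQUIRED}
    """
    counts = defaultdict(int)
    missing, extra = {}, {}
    for name in sorted(os.listdir(root)):
        path = os.path.join(root, name)
        match = VIEW_DIR.match(name)
        if not (match and os.path.isdir(path)):
            continue  # skip transforms_*.json and unrelated entries
        counts[match.group(1)] += 1
        files = set(os.listdir(path))
        if REQUIRED - files:
            missing[name] = sorted(REQUIRED - files)
        if files - REQUIRED:
            extra[name] = sorted(files - REQUIRED)
    return dict(counts), missing, extra
```

On the listing above, `audit_views("root")` would count 10 train, 1 val, and 1 test views, report no missing files, and list the extras `{'test_000': ['nn.png'], 'train_000': ['albedo.png']}`.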
