google / nerfactor
Neural Factorization of Shape and Reflectance Under an Unknown Illumination
Home Page: https://xiuming.info/projects/nerfactor/
License: Apache License 2.0
Hi, when I train vanilla NeRF, countless threads are spawned and the server is about to freeze.
I found that this is caused by the following line in https://github.com/google/nerfactor/blob/main/nerfactor/networks/embedder.py:
freq_bands = 2. ** tf.linspace(0., log2_max_freq, n_freqs)
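The flagged line only builds a small constant: `n_freqs` log-spaced frequencies from 2^0 up to 2^`log2_max_freq`. One way to test whether it is really responsible is to compute the same constant in NumPy, outside TensorFlow. The sketch below does that and applies a NeRF-style positional encoding; whether NeRFactor's embedder includes the raw input or scales by pi is an assumption here. Separately, TensorFlow 2.x lets you cap its thread pools with `tf.config.threading.set_intra_op_parallelism_threads` and `tf.config.threading.set_inter_op_parallelism_threads`, which must be called before the runtime starts executing ops.

```python
import numpy as np


def make_freq_bands(log2_max_freq, n_freqs):
    # Same values as `2. ** tf.linspace(0., log2_max_freq, n_freqs)`,
    # but computed in NumPy so no TensorFlow op is created.
    return 2.0 ** np.linspace(0.0, log2_max_freq, n_freqs)


def positional_encoding(x, log2_max_freq, n_freqs, include_input=True):
    """NeRF-style sketch: concat(x, sin(f * x), cos(f * x)) for each band f."""
    freq_bands = make_freq_bands(log2_max_freq, n_freqs)
    parts = [x] if include_input else []
    for freq in freq_bands:
        parts.append(np.sin(x * freq))
        parts.append(np.cos(x * freq))
    return np.concatenate(parts, axis=-1)
```

For 3D inputs with `n_freqs=4`, the encoding has 3 + 4 * 2 * 3 = 27 channels.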
import numpy as np


def normalize(x):
    # Helper assumed to be defined elsewhere in the codebase: unit-length vector.
    return x / np.linalg.norm(x)


def spherify_poses(poses):
    """poses: Nx3x5 (final column contains H, W, and focal length)."""
    rays_d = poses[:, :3, 2:3]
    rays_o = poses[:, :3, 3:4] # because pose is camera-to-world

    def p34_to_44(p):
        """p: Nx3x4."""
        # Append the homogeneous row [0, 0, 0, 1] to every 3x4 pose.
        return np.concatenate((
            p,
            np.tile(
                np.reshape(np.eye(4)[-1, :], (1, 1, 4)),
                (p.shape[0], 1, 1)),
        ), 1)

    def min_line_dist(rays_o, rays_d):
        # Least-squares point closest to all camera viewing axes.
        a_i = np.eye(3) - rays_d * np.transpose(rays_d, [0, 2, 1])
        b_i = -a_i @ rays_o
        pt_mindist = np.squeeze(-np.linalg.inv(
            (np.transpose(a_i, [0, 2, 1]) @ a_i).mean(0)) @ (b_i).mean(0))
        return pt_mindist

    pt_mindist = min_line_dist(rays_o, rays_d)
    center = pt_mindist
    up = (poses[:, :3, 3] - center).mean(0)
    vec0 = normalize(up)
    vec1 = normalize(np.cross([.1, .2, .3], vec0))
    vec2 = normalize(np.cross(vec0, vec1))
    pos = center
    c2w = np.stack([vec1, vec2, vec0, pos], 1)
    poses_reset = (
        np.linalg.inv(p34_to_44(c2w[None])) @ p34_to_44(poses[:, :3, :4]))
    rad = np.sqrt(np.mean(np.sum(np.square(poses_reset[:, :3, 3]), -1)))
    sc = 1. / rad
    poses_reset[:, :3, 3] *= sc
    rad *= sc
    centroid = np.mean(poses_reset[:, :3, 3], 0)
    zh = centroid[2]
    radcircle = np.sqrt(rad ** 2 - zh ** 2)

    new_poses = []
    for th in np.linspace(0., 2. * np.pi, 120):
        camorigin = np.array([
            radcircle * np.cos(th), radcircle * np.sin(th), zh])
        up = np.array([0, 0, -1.])
        vec2 = normalize(camorigin)
        vec0 = normalize(np.cross(vec2, up))
        vec1 = normalize(np.cross(vec2, vec0))
        pos = camorigin
        p = np.stack([vec0, vec1, vec2, pos], 1)
        new_poses.append(p)
    new_poses = np.stack(new_poses, 0)

    new_poses = np.concatenate([
        new_poses,
        np.broadcast_to(poses[0, :3, -1:], new_poses[:, :3, -1:].shape)
    ], -1)
    poses_reset = np.concatenate([
        poses_reset[:, :3, :4],
        np.broadcast_to(poses[0, :3, -1:], poses_reset[:, :3, -1:].shape)
    ], -1)
    return poses_reset, new_poses
Can you explain a little bit about min_line_dist in this function?
I think this is the problem "find the point minimizing the distance to a set of N lines".
I don't think this is an easy problem, yet the min_line_dist function seems to implement it quite simply. Could you explain a bit?
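A sketch of why the function can be so short: for a line with origin o_i and unit direction d_i, the squared distance from a point p is ||A_i (p - o_i)||^2, where A_i = I - d_i d_i^T projects onto the plane orthogonal to d_i. Summing over all lines and setting the gradient to zero gives the normal equations (sum_i A_i^T A_i) p = sum_i A_i^T A_i o_i. Each A_i is symmetric and idempotent, so A_i^T A_i = A_i, and the solution is p = (sum_i A_i)^-1 sum_i A_i o_i. This is a linear least-squares problem with a closed form; the original code solves the same system with means instead of sums (the 1/N cancels) and with b_i = -A_i o_i. The standalone version below assumes unit-length directions that are not all parallel (otherwise the matrix is singular).

```python
import numpy as np


def min_line_dist(rays_o, rays_d):
    """Least-squares point closest to N lines.

    rays_o: Nx3x1 line origins; rays_d: Nx3x1 unit line directions.
    """
    # A_i = I - d_i d_i^T projects onto the plane orthogonal to d_i.
    a_i = np.eye(3) - rays_d * np.transpose(rays_d, [0, 2, 1])
    # Normal equations (A_i is symmetric and idempotent): (sum A_i) p = sum A_i o_i.
    lhs = a_i.sum(axis=0)
    rhs = (a_i @ rays_o).sum(axis=0)
    return np.linalg.solve(lhs, rhs).squeeze(-1)


if __name__ == "__main__":
    # Build 10 lines that all pass through a known point and recover it.
    rng = np.random.default_rng(0)
    target = np.array([1.0, 2.0, 3.0])
    d = rng.normal(size=(10, 3))
    d /= np.linalg.norm(d, axis=1, keepdims=True)
    o = target - rng.uniform(1.0, 5.0, size=(10, 1)) * d
    print(min_line_dist(o[..., None], d[..., None]))
```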
Thank you for open-sourcing your code; it is indeed great work. I have a question about its usage. I used your code to render my own dataset, but rendering is relatively slow, taking about 2 minutes per image. Do you have any suggestions on how to accelerate rendering on the GPU?
Thanks for releasing the code! I would like to ask whether a mesh can be extracted with marching cubes. Is this implemented? If not, how can we access the NeRF density field in the model? After a quick look at the code, it seems that the system reads xyz, normals, etc. directly from its input, so it is not straightforward to get the density values.
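A minimal sketch of the usual recipe, assuming you can wrap the trained NeRF's density output in a function `density_fn` (a hypothetical name, not an existing NeRFactor function) that maps an (M, 3) array of points to M densities: evaluate it on a regular grid inside the scene bounding box, in chunks to bound memory, and run marching cubes on the resulting volume. The bounding box uses the same min/max-per-axis order as the `--scene_bbox` flag.

```python
import numpy as np


def sample_density_grid(density_fn, bbox, res, chunk=65536):
    """Evaluate a density function on a res^3 grid.

    density_fn: hypothetical callable, (M, 3) float array -> (M,) densities.
    bbox: (xmin, xmax, ymin, ymax, zmin, zmax).
    """
    xs = np.linspace(bbox[0], bbox[1], res)
    ys = np.linspace(bbox[2], bbox[3], res)
    zs = np.linspace(bbox[4], bbox[5], res)
    grid = np.stack(np.meshgrid(xs, ys, zs, indexing="ij"), axis=-1)
    pts = grid.reshape(-1, 3)
    out = np.empty(pts.shape[0], dtype=np.float32)
    # Query in chunks so a large grid does not have to fit in memory at once.
    for start in range(0, pts.shape[0], chunk):
        out[start:start + chunk] = density_fn(pts[start:start + chunk])
    return out.reshape(res, res, res)
```

The volume can then be passed to scikit-image's `skimage.measure.marching_cubes(volume, level=...)`, where `level` is a density threshold you would have to choose; the returned vertices are in grid-index units and need rescaling to the bounding box.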
Congrats on a nice paper!
Would it be possible to access the NeRFactor output images for the view synthesis comparison (column III, row "NeRFactor") in Table 1 (the eight validation images for each of the four scenes)? I would like to compute other metrics and check the metrics and visual quality per image and per scene.
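For the per-image and per-scene checks, a minimal PSNR sketch is below. It assumes images are float arrays in [0, 1] and reports the mean of per-image PSNRs per scene; whether the paper averages per image or pools pixels across images is not stated here, so treat the aggregation as an assumption.

```python
import numpy as np


def psnr(pred, gt, max_val=1.0):
    """Peak signal-to-noise ratio in dB for one image pair."""
    pred = np.asarray(pred, dtype=np.float64)
    gt = np.asarray(gt, dtype=np.float64)
    mse = np.mean((pred - gt) ** 2)
    if mse == 0:
        return float("inf")
    return float(10.0 * np.log10(max_val ** 2 / mse))


def psnr_report(pairs_by_scene):
    """pairs_by_scene: dict scene -> list of (pred, gt) image pairs."""
    report = {}
    for scene, pairs in pairs_by_scene.items():
        per_image = [psnr(pred, gt) for pred, gt in pairs]
        report[scene] = {"per_image": per_image, "mean": float(np.mean(per_image))}
    return report
```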
Thanks for the great work and data!
I am converting the camera pose data to another format, but neither metadata.json nor *_camera.json contains any intrinsics (focal length, cx, cy).
I also tried the focal length from the .blend file, but neither way worked for me; maybe this is because the .blend file uses a different extrinsic convention.
Could you tell me your intrinsic values or how to obtain them? Thanks!
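A sketch under stated assumptions: if a horizontal field of view is available (the original NeRF synthetic `transforms_*.json` files store it as `camera_angle_x`, in radians), a pinhole model with square pixels and the principal point at the image center gives the intrinsics below. On the extrinsic side, NeRF's Blender-derived camera-to-world matrices follow the OpenGL convention (camera looks down -z, y up); converting to the OpenCV convention (looks down +z, y down) flips the camera's y and z axes. Whether NeRFactor's files follow exactly these conventions is an assumption to verify.

```python
import math

import numpy as np


def intrinsics_from_fov(width, height, camera_angle_x):
    """3x3 pinhole K from a horizontal FOV in radians (centered principal point)."""
    focal = 0.5 * width / math.tan(0.5 * camera_angle_x)
    return np.array([[focal, 0.0, 0.5 * width],
                     [0.0, focal, 0.5 * height],
                     [0.0, 0.0, 1.0]])


def opengl_c2w_to_opencv(c2w):
    """Flip the camera y and z axes of a 4x4 (or 3x4) camera-to-world matrix."""
    c2w = np.array(c2w, dtype=np.float64, copy=True)
    c2w[:3, 1:3] *= -1.0
    return c2w
```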
Hi @xiumingzhang, thank you for your great work.
I just tested the code on the Shiny Blender dataset from the Ref-NeRF paper but got weird rendering results (all white) while training vanilla NeRF.
Shiny Blender teapot:
Here is the dataset in NeRFactor format.
I used the same settings (near, far, learning rate, ...) as for the NeRF synthetic dataset in your README.
I am wondering whether I missed anything or passed the wrong config settings to the network.
Thank you in advance!
I'm trying to run the code on a DTU scan without the MVS shape, and I encountered a shape mismatch when enumerating the dataset. I followed the instructions for training vanilla NeRF and computing geometry buffers for the DTU scan.
`2023-10-28 23:21:29.816339: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2023-10-28 23:21:29.863132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:85:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2023-10-28 23:21:29.874349: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2023-10-28 23:21:30.084292: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2023-10-28 23:21:30.248526: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2023-10-28 23:21:30.293372: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2023-10-28 23:21:30.489140: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2023-10-28 23:21:30.528962: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2023-10-28 23:21:30.827244: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2023-10-28 23:21:30.828401: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2023-10-28 23:21:30.829422: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2023-10-28 23:21:30.857546: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 2400105000 Hz
2023-10-28 23:21:30.859352: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c210dd2680 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2023-10-28 23:21:30.859387: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2023-10-28 23:21:30.961714: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55c210dd5100 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2023-10-28 23:21:30.961809: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-SXM2-16GB, Compute Capability 7.0
2023-10-28 23:21:30.963299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:85:00.0 name: Tesla V100-SXM2-16GB computeCapability: 7.0
coreClock: 1.53GHz coreCount: 80 deviceMemorySize: 15.78GiB deviceMemoryBandwidth: 836.37GiB/s
2023-10-28 23:21:30.963357: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2023-10-28 23:21:30.963389: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2023-10-28 23:21:30.963417: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2023-10-28 23:21:30.963444: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2023-10-28 23:21:30.963470: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2023-10-28 23:21:30.963504: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2023-10-28 23:21:30.963531: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2023-10-28 23:21:30.964088: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2023-10-28 23:21:30.964137: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2023-10-28 23:21:30.967326: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2023-10-28 23:21:30.967377: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2023-10-28 23:21:30.967403: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2023-10-28 23:21:30.968545: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14902 MB memory) -> physical GPU (device: 0, name: Tesla V100-SXM2-16GB, pci bus id: 0000:85:00.0, compute capability: 7.0)
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I1028 23:21:30.974194 47943187769408 mirrored_strategy.py:500] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
[util/io] Output directory already exisits:
/project2/tsui/nerfactor/output/train/scan37_shape/lr1e-2
[util/io] Output directory wiped:
/project2/tsui/nerfactor/output/train/scan37_shape/lr1e-2
[trainvali] For results, see:
/project2/tsui/nerfactor/output/train/scan37_shape/lr1e-2
[datasets/nerf_shape] Number of 'train' views: 47
[datasets/nerf_shape] Number of 'vali' views: 2
[models/base] Trainable layers registered:
['net_normal_mlp_layer0', 'net_normal_mlp_layer1', 'net_normal_mlp_layer2', 'net_normal_mlp_layer3', 'net_normal_out_layer0', 'net_lvis_mlp_layer0', 'net_lvis_mlp_layer1', 'net_lvis_mlp_layer2', 'net_lvis_mlp_layer3', 'net_lvis_out_layer0']
[trainvali] Started from scratch
Training epochs: 0%| | 0/200 [00:00<?, ?it/s]2023-10-28 23:21:35.393537: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[1022] = [60, 497] does not index into param shape [256,341,512]
2023-10-28 23:21:35.393675: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[1022] = [60, 497] does not index into param shape [256,341,3]
2023-10-28 23:21:35.393760: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[1022] = [60, 497] does not index into param shape [256,341,3]
2023-10-28 23:21:35.393836: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[1022] = [60, 497] does not index into param shape [256,341]
2023-10-28 23:21:35.393917: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[1022] = [60, 497] does not index into param shape [256,341,3]
2023-10-28 23:21:35.393997: W tensorflow/core/framework/op_kernel.cc:1753] OP_REQUIRES failed at gather_nd_op.cc:47 : Invalid argument: indices[1022] = [60, 497] does not index into param shape [256,341,3]
Training epochs: 0%| | 0/200 [00:00<?, ?it/s]
shape: <tensorflow.python.distribute.input_lib.DistributedDataset object at 0x2b9b1fe38518>
Traceback (most recent call last):
File "/project2/tsui/nerfactor/code/nerfactor/trainvali.py", line 342, in <module>
app.run(main)
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/project2/tsui/nerfactor/code/nerfactor/trainvali.py", line 179, in main
for batch_i, batch in enumerate(datapipe_train):
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/input_lib.py", line 296, in next
return self.get_next()
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/input_lib.py", line 328, in get_next
global_has_value, replicas = _get_next_as_optional(self, self._strategy)
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/input_lib.py", line 192, in _get_next_as_optional
iterator._iterators[i].get_next_as_list(new_name)) # pylint: disable=protected-access
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/input_lib.py", line 1132, in get_next_as_list
data_list = self._iterator.get_next_as_optional()
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/data/ops/multi_device_iterator_ops.py", line 601, in get_next_as_optional
iterator_ops.get_next_as_optional(self._device_iterators[i]))
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/data/ops/iterator_ops.py", line 833, in get_next_as_optional
iterator.element_spec)), iterator.element_spec)
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/gen_dataset_ops.py", line 2444, in iterator_get_next_as_optional
_ops.raise_from_not_ok_status(e, name)
File "/home/tsui/project/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[1022] = [60, 497] does not index into param shape [256,341,512]
[[{{node GatherNd_7}}]]
[[MultiDeviceIteratorGetNextFromShard]]
[[RemoteCall]] [Op:IteratorGetNextAsOptional]
`
What could be the problem? Thanks!
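The log already narrows this down: GatherNd receives the pixel index [60, 497], but the buffers have shape [256, 341, ...], so column 497 is out of range for an image 341 pixels wide. An inference from the log (not a confirmed diagnosis) is that the rays are being sampled for a different, larger image resolution than the one the geometry buffers were stored at, so the resolution settings used for training and for buffer generation are worth comparing. A small NumPy check like the sketch below can confirm which sampled indices fall outside a given shape before they reach `tf.gather_nd`.

```python
import numpy as np


def out_of_range_indices(indices, param_shape):
    """Return row numbers of `indices` that do not index into `param_shape`.

    indices: (K, D) integer array of gather_nd-style indices.
    param_shape: shape of the gathered tensor; its first D dims are checked.
    """
    indices = np.asarray(indices)
    bounds = np.asarray(param_shape[:indices.shape[-1]])
    bad = np.any((indices < 0) | (indices >= bounds), axis=-1)
    return np.nonzero(bad)[0]
```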
Hi, I am trying to run the geometry buffer script on my dataset. It first takes 2-3 hours to show any output and then throws errors.
My command:
bash geometry_from_nerf_run.sh 0 --data_root="/scratch/darthgera123/nerf/woman_data/" --trained_nerf="/scratch/darthgera123/nerf/woman_nerf/lr5e-4/" --out_root="/scratch/darthgera123/nerf/woman_geometry/" --imh=512 --scene_bbox=-0.3,0.3,-0.3,0.3,-0.3,0.3 --occu_thres=0.5 --mlp_chunk=3750
The error:
2021-08-09 16:04:55.509013: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10210 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:03:00.0, compute capability: 7.5)
2021-08-09 16:04:55.512131: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55f55866a9d0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-09 16:04:55.512172: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
Views (train): 0%| | 0/304 [00:00<?, ?it/s]2021-08-09 16:05:05.594129: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-08-09 17:51:19.840422: E tensorflow/core/grappler/clusters/utils.cc:87] Failed to get device properties, error code: 999
As mentioned, I initially kept mlp_chunk high until I got an OOM error, but now it's throwing this weird error. Please help @xiumingzhang @cdibona @dberlin
Shape pre-training runs successfully, but I get stuck in joint optimization with this error:
2022-09-27 02:30:25.358618: E tensorflow/core/kernels/check_numerics_op.cc:289] abnormal_detected_host @0x7f43f6808a00 = {1, 0} Not a number (NaN) or infinity (Inf) values detected in gradient. b'Albedo'
tensorflow.python.framework.errors_impl.InvalidArgumentError: 2 root error(s) found.
(0) Invalid argument: Not a number (NaN) or infinity (Inf) values detected in gradient. b'Albedo' : Tensor had NaN values
[[node gradient_tape/model/CheckNumerics (defined at tmp/tmp398ckawp.py:22) ]]
[[Identity_6/_372]]
(1) Invalid argument: Not a number (NaN) or infinity (Inf) values detected in gradient. b'Albedo' : Tensor had NaN values
[[node gradient_tape/model/CheckNumerics (defined at tmp/tmp398ckawp.py:22) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference_distributed_train_step_45946]
Hi Xiuming,
Great job! I want to run this work, but I ran into some problems when running geometry_from_nerf.py. Could you help me? Also, I didn't find the trained models in the data on the project website; do you plan to publish trained models for testing?
Thanks! Looking forward to your reply.
WARNING:tensorflow:10 out of the last 10 calls to <function pfor..f at 0x7f6d021b9bf8> triggered tf.function retracing. Tracing is expensive and the excessive number of tracings could be due to (1) creating @tf.function repeatedly in a loop, (2) passing tensors with different shapes, (3) passing Python objects instead of tensors. For (1), please define your @tf.function outside of the loop. For (2), @tf.function has experimental_relax_shapes=True option that relaxes argument shapes that can avoid unnecessary retracing. For (3), please refer to https://www.tensorflow.org/guide/function#controlling_retracing and https://www.tensorflow.org/api_docs/python/tf/function for more details.
Hi, thanks for your impressive work.
I am trying to re-implement your code, but I couldn't find a download link for your processed MERL BRDF dataset. It seems the publicly available source database differs from the one used in your code.
I also noticed the code under nerfactor/brdf, but the README says we don't need to run it directly. However, running the shell command directly does lead to errors.
Could you please share the download link for your processed MERL BRDF dataset, or your processing code?
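Until the processed data is shared, the raw MERL files can at least be read. This sketch follows the binary layout used by MERL's reference reader (BRDFRead.cpp): three little-endian int32 dimensions, (90, 90, 180) for the MERL files, followed by 3 * n float64 values stored channel by channel (red, green, blue), each scaled by a fixed per-channel constant. It is not NeRFactor's own preprocessing, which may further resample or convert the data.

```python
import numpy as np

# Per-channel scale factors from MERL's reference reader.
MERL_SCALES = np.array([1.0 / 1500.0, 1.15 / 1500.0, 1.66 / 1500.0])


def parse_merl_bytes(data):
    """Return a (3, theta_h, theta_d, phi_d) array from raw MERL .binary bytes."""
    dims = np.frombuffer(data, dtype="<i4", count=3)
    shape = tuple(int(d) for d in dims)
    n = int(np.prod(shape))
    vals = np.frombuffer(data, dtype="<f8", offset=12)
    if vals.size != 3 * n:
        raise ValueError(f"expected {3 * n} doubles, found {vals.size}")
    # Channel-major layout; within a channel phi_d varies fastest.
    return vals.reshape((3,) + shape) * MERL_SCALES[:, None, None, None]


def read_merl_binary(path):
    with open(path, "rb") as f:
        return parse_merl_bytes(f.read())
```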
I am trying to replace NeRF with my own geometry. The script `trainvali_mvs_run.sh` for joint optimisation uses the `shape_mvs` dataset, which in turn needs the same `alpha`, `normals`, and `lvis` buffers.
I couldn't find any script to generate these geometry buffers from MVS geometry instead of NeRF. Could you please help?
Hi
Thanks for the very comprehensive repo. However, I don't understand how to run the scripts or how to set the arguments.
@cdibona @xiumingzhang @google-admin please help
Hi,
Do you have a Python script to render NeRF's original .blend files?
Best,
How are the view direction and light direction angles in the paper obtained?
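A sketch, assuming the BRDF is parameterized by Rusinkiewicz half/difference angles (phi_d, theta_h, theta_d) computed from the light direction w_i and view direction w_o. In world space, the view direction at a surface point x is typically normalize(camera_origin - x), and for an environment-map light it is the direction of the light-probe pixel. Both vectors must first be expressed in a local frame whose +z axis is the surface normal; building that frame is left out here.

```python
import numpy as np


def _rotate_z(v, angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([c * v[0] - s * v[1], s * v[0] + c * v[1], v[2]])


def _rotate_y(v, angle):
    c, s = np.cos(angle), np.sin(angle)
    return np.array([c * v[0] + s * v[2], v[1], -s * v[0] + c * v[2]])


def rusinkiewicz(wi, wo):
    """wi, wo: unit vectors in the local frame (normal = +z).

    Returns (phi_d, theta_h, theta_d) in radians.
    """
    wi = np.asarray(wi, dtype=np.float64)
    wo = np.asarray(wo, dtype=np.float64)
    h = wi + wo
    h = h / np.linalg.norm(h)  # half vector
    theta_h = np.arccos(np.clip(h[2], -1.0, 1.0))
    phi_h = np.arctan2(h[1], h[0])
    # Rotate wi so the half vector becomes +z; what remains is the difference vector.
    d = _rotate_y(_rotate_z(wi, -phi_h), -theta_h)
    theta_d = np.arccos(np.clip(d[2], -1.0, 1.0))
    phi_d = np.arctan2(d[1], d[0])
    return phi_d, theta_h, theta_d
```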
Hi,
I was trying to follow the first step here to get BRDF priors, but I am getting the following error:
$ REPO_DIR="$repo_dir" "$repo_dir/nerfactor/trainvali_run.sh" "$gpus" --config='brdf.ini' --config_override="data_root=$data_root,outroot=$outroot,viewer_prefix=$viewer_prefix"
2021-08-18 12:21:52.126973: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2021-08-18 12:21:52.165148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:68:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2021-08-18 12:21:52.165428: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-18 12:21:52.171059: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-08-18 12:21:52.173348: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-08-18 12:21:52.173750: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-08-18 12:21:52.176352: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-08-18 12:21:52.186737: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-08-18 12:21:52.192472: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-08-18 12:21:52.194714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2021-08-18 12:21:52.195193: I tensorflow/core/platform/cpu_feature_guard.cc:143] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2021-08-18 12:21:52.203031: I tensorflow/core/platform/profile_utils/cpu_utils.cc:102] CPU Frequency: 3299990000 Hz
2021-08-18 12:21:52.203763: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x7f68dc000b60 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2021-08-18 12:21:52.203782: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2021-08-18 12:21:52.290736: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x564ac481a330 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2021-08-18 12:21:52.290795: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): GeForce RTX 2080 Ti, Compute Capability 7.5
2021-08-18 12:21:52.292470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1561] Found device 0 with properties:
pciBusID: 0000:68:00.0 name: GeForce RTX 2080 Ti computeCapability: 7.5
coreClock: 1.545GHz coreCount: 68 deviceMemorySize: 10.76GiB deviceMemoryBandwidth: 573.69GiB/s
2021-08-18 12:21:52.292553: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-18 12:21:52.292584: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-08-18 12:21:52.292611: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10
2021-08-18 12:21:52.292637: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10
2021-08-18 12:21:52.292663: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10
2021-08-18 12:21:52.292690: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10
2021-08-18 12:21:52.292717: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2021-08-18 12:21:52.294454: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1703] Adding visible gpu devices: 0
2021-08-18 12:21:52.294508: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.1
2021-08-18 12:21:52.295132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1102] Device interconnect StreamExecutor with strength 1 edge matrix:
2021-08-18 12:21:52.295142: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1108] 0
2021-08-18 12:21:52.295148: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1121] 0: N
2021-08-18 12:21:52.296183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1247] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 10150 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080 Ti, pci bus id: 0000:68:00.0, compute capability: 7.5)
INFO:tensorflow:Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
I0818 12:21:52.299249 140098729092928 mirrored_strategy.py:500] Using MirroredStrategy with devices ('/job:localhost/replica:0/task:0/device:GPU:0',)
[util/io] Output directory already exisits:
/home/jiwonchoi/nerfactor/output/train/merl/lr1e-2
[util/io] Overwrite is off, so doing nothing
[trainvali] For results, see:
/home/jiwonchoi/nerfactor/output/train/merl/lr1e-2
Traceback (most recent call last):
File "/home/jiwonchoi/nerfactor/nerfactor/trainvali.py", line 341, in <module>
app.run(main)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/jiwonchoi/nerfactor/nerfactor/trainvali.py", line 81, in main
dataset_train = Dataset(config, 'train', debug=FLAGS.debug)
File "/home/jiwonchoi/nerfactor/nerfactor/datasets/brdf_merl.py", line 52, in __init__
mats = np.random.choice(self.brdf_names, n_iden, replace=False)
File "mtrand.pyx", line 908, in numpy.random.mtrand.RandomState.choice
ValueError: 'a' cannot be empty unless no samples are taken
I double-checked my paths and am not sure where this error originates.
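The traceback ends in `np.random.choice(self.brdf_names, n_iden, replace=False)` raising `'a' cannot be empty unless no samples are taken`, which means `self.brdf_names` was an empty list: the dataset found no BRDF files under the configured `data_root`. A guard like the hypothetical `choose_materials` below (not part of NeRFactor) turns that into a clearer message; it is a sketch of the check, not a fix for the missing data itself.

```python
import numpy as np


def choose_materials(brdf_names, n_iden, seed=0):
    """Hypothetical guard around sampling n_iden distinct materials."""
    if len(brdf_names) == 0:
        raise ValueError("No BRDF files were found; check that data_root "
                         "points at the directory holding the BRDF files.")
    if n_iden > len(brdf_names):
        raise ValueError(f"Asked for {n_iden} materials but only "
                         f"{len(brdf_names)} were found.")
    rng = np.random.default_rng(seed)
    return list(rng.choice(brdf_names, n_iden, replace=False))
```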
Hi, I find NeRFactor results very impressive; it really is an outstanding and inspiring work.
Would it be possible for you to share the trained weights you used to obtain these results?
I would like to include NeRFactor scene representations into my path-tracer and having them available to compare the results would surely speed-up my work.
Thank you!
I've seen there is a constant named brdf_scale that scales the BRDF value. Where does this constant come from, and what is its value?
Thanks!
Hi, NeRFactor is truly an outstanding and inspiring work.
However, when I run the code with the default scripts and settings you provided under nerfactor/, the results, especially the test relighting results, are not as good as the figures in your paper.
I set 'ims' and 'imh' to 512 in all these experiments. Are there any settings that need to be changed when running the code, such as the total number of iterations or the learning rates? Or is there anything else that might explain this performance?
Thanks!
Hi, I downloaded some HDRs from Poly Haven, but I found differences between the downloaded HDRs and the provided ones. The maximum value of the sunset HDR I downloaded is about 300000, whereas the provided one peaks at about 80. Is any preprocessing applied to new HDR maps?
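One factor that may contribute (an assumption, not confirmed by the source): if the environment map is reduced to the low-resolution 16x32 light probe described in the paper, area-averaging spreads a tiny, extremely bright sun over a large pixel, so the peak drops sharply while the total energy is roughly preserved. The sketch below shows the effect with a simple block average; it ignores the solid-angle weighting of equirectangular rows, which a careful resampler would include.

```python
import numpy as np


def block_average(envmap, out_h, out_w):
    """Downsample an (H, W[, C]) map by averaging equal-size pixel blocks."""
    h, w = envmap.shape[:2]
    if h % out_h or w % out_w:
        raise ValueError("this sketch assumes the output size divides the input size")
    bh, bw = h // out_h, w // out_w
    blocks = envmap.reshape(out_h, bh, out_w, bw, *envmap.shape[2:])
    return blocks.mean(axis=(1, 3))


env = np.ones((64, 128))
env[5, 7] = 300000.0  # a single very bright "sun" pixel
small = block_average(env, 16, 32)
print(env.max(), small.max())
```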
Hi, thanks for your nice work.
However, I get an error when preparing to train vanilla NeRF following the instructions.
The error is printed as follows:
[trainvali] For results, see:
/home/linxiong/nerfactor/output/train/hotdog_nerf/lr5e-4
[datasets/nerf] Number of 'train' views: 100
Traceback (most recent call last):
File "/home/linxiong/nerfactor/nerfactor/trainvali.py", line 348, in <module>
app.run(main)
File "/home/linxiong/.local/lib/python3.8/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/linxiong/.local/lib/python3.8/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/linxiong/nerfactor/nerfactor/trainvali.py", line 91, in main
datapipe_train = dataset_train.build_pipeline(no_batch=no_batch)
File "../nerfactor/datasets/base.py", line 115, in build_pipeline
dataset = dataset.map(
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 1623, in map
return ParallelMapDataset(
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 4016, in __init__
self._map_func = StructuredFunctionWrapper(
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3221, in __init__
self._function = wrapper_fn.get_concrete_function()
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2531, in get_concrete_function
graph_function = self._get_concrete_function_garbage_collected(
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2496, in _get_concrete_function_garbage_collected
graph_function, args, kwargs = self._maybe_define_function(args, kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2777, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2657, in _create_graph_function
func_graph_module.func_graph_from_py_func(
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3214, in wrapper_fn
ret = _wrapper_helper(*args)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/data/ops/dataset_ops.py", line 3156, in _wrapper_helper
ret = autograph.tf_convert(func, ag_ctx)(*nested_args)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
result = self._call(*args, **kwds)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 627, in _call
self._initialize(args, kwds, add_initializers_to=initializers)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 505, in _initialize
self._stateful_fn._get_concrete_function_internal_garbage_collected( # pylint: disable=protected-access
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2446, in _get_concrete_function_internal_garbage_collected
graph_function, _, _ = self._maybe_define_function(args, kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2777, in _maybe_define_function
graph_function = self._create_graph_function(args, kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 2657, in _create_graph_function
func_graph_module.func_graph_from_py_func(
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 981, in func_graph_from_py_func
func_outputs = python_func(*func_args, **func_kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/def_function.py", line 441, in wrapped_fn
return weak_wrapped_fn().wrapped(*args, **kwds)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/eager/function.py", line 3299, in bound_method_wrapper
return wrapped_fn(*args, **kwargs)
File "/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/framework/func_graph.py", line 968, in wrapper
raise e.ag_error_metadata.to_exception(e)
NotImplementedError: in user code:
/home/linxiong/nerfactor/nerfactor/datasets/nerf.py:111 _process_example_postcache *
rayo, rayd, rgb = self._sample_rays(self.rayo, self.rayd, self.rgb)
/home/linxiong/nerfactor/nerfactor/datasets/nerf.py:130 _sample_rays *
coords = tf.stack(
/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:3391 meshgrid **
mult_fact = ones(shapes, output_dtype)
/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:2967 ones
output = _constant_if_small(one, shape, dtype, name)
/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/ops/array_ops.py:2662 _constant_if_small
if np.prod(shape) < 1000:
<__array_function__ internals>:5 prod
/home/linxiong/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py:3051 prod
return _wrapreduction(a, np.multiply, 'prod', axis, dtype, out,
/home/linxiong/.local/lib/python3.8/site-packages/numpy/core/fromnumeric.py:86 _wrapreduction
return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
/home/linxiong/.local/lib/python3.8/site-packages/tensorflow/python/framework/ops.py:748 __array__
raise NotImplementedError("Cannot convert a symbolic Tensor ({}) to a numpy"
NotImplementedError: Cannot convert a symbolic Tensor (meshgrid/Size:0) to a numpy array.
Any suggestions?
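The last frames show tf.meshgrid calling np.prod on a symbolic tensor. A commonly reported cause of this exact message is NumPy 1.20 or newer installed alongside an older TensorFlow build, and pinning NumPy to 1.19.x is the usual workaround. The affected range used below (TensorFlow 2.4 or earlier) is an assumption, not something confirmed in this thread, and `is_known_bad_combo` is a hypothetical helper that only compares version strings.

```python
def parse_version(text):
    """Return the leading (major, minor) integers of a version string like '1.20.1'."""
    parts = text.split(".")
    return int(parts[0]), int(parts[1])


def is_known_bad_combo(numpy_version, tf_version):
    """Flag NumPy >= 1.20 together with TensorFlow <= 2.4.

    Both version boundaries are assumptions based on commonly reported
    'Cannot convert a symbolic Tensor ... to a numpy array' errors.
    """
    return parse_version(numpy_version) >= (1, 20) and parse_version(tf_version) <= (2, 4)
```

Usage: call it with numpy.__version__ and tf.__version__ from the training environment; a True result suggests trying a NumPy 1.19.x pin first.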
I really appreciate your work and your open-source code, but when I run the following code, I get an OOM error:
'''
scene='hotdog_2163'
gpus='4,5,6,7'
model='nerfactor'
overwrite='True'
proj_root='/mnt/data1/jy/NeRFactor'
repo_dir="$proj_root/nerfactor" # /mnt/data1/jy/NeRFactor/nerfactor/
outroot="$proj_root/output/train/${scene}_$model"
viewer_prefix='http://vision38.csail.mit.edu' # or just use ''
ckpt="$outroot/lr5e-3/checkpoints/ckpt-10"
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
# Real scenes: NeRF & DTU
color_correct_albedo='false'
else
color_correct_albedo='true'
fi
REPO_DIR="$proj_root" "$proj_root/nerfactor/test_run.sh" "$gpus" --ckpt="$ckpt" --color_correct_albedo="$color_correct_albedo"
'''
My GPU is a 2080 Ti (11 GB of memory). When I tried to lower the batch size, I found 'no_batch = True' in 'lr5e-3.ini'. How can I successfully run 'test.py' on a 2080 Ti?
Looking forward to your help.
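With no_batch = True, each test batch is a whole image, so a single forward pass covers every ray of the view. A common way to bound peak memory in that situation is to split the rays into fixed-size chunks and run the model on one chunk at a time (shape.py already applies some of its networks this way; chunk_apply appears in a traceback further down this page). The plain-Python sketch below illustrates the idea; `apply_in_chunks`, `model_fn` and `chunk_size` are hypothetical names, not NeRFactor options, and with TensorFlow the per-chunk outputs would be joined with tf.concat rather than list concatenation.

```python
from typing import Callable, List, Sequence


def apply_in_chunks(model_fn: Callable[[Sequence], List], rays: Sequence,
                    chunk_size: int) -> List:
    """Run model_fn on consecutive slices of `rays` and join the results.

    Peak memory is governed by chunk_size rather than by the total ray count.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    outputs: List = []
    for start in range(0, len(rays), chunk_size):
        chunk = rays[start:start + chunk_size]
        outputs.extend(model_fn(chunk))  # stands in for tf.concat over chunks
    return outputs
```

The trade-off is more, smaller kernel launches in exchange for a memory ceiling you can tune to the GPU.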
Hello,
According to the paper, the MLP skip connection is at layer 2 (counting from 0, i.e. the third layer), but I've seen strange behaviour when using NeRFactor, so I checked the code. The network's __call__() in mlp.py does the following:
x_ = x + 0  # make a copy
for i, layer in enumerate(self.layers):
    y = layer(x_)
    if i in self.skip_at:
        y = tf.concat((y, x), -1)
    x_ = y
return y
So the concatenation is applied to the layer's output, and therefore the skip connection actually feeds the next layer (the fourth one).
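Replaying the loop with stand-in layers that only track their input width makes the off-by-one visible without TensorFlow. The sketch assumes four layers of width 128 and a 63-wide input (illustrative values; the layer count matches the net_*_mlp_layer0 to layer3 names in the logs on this page), with list concatenation standing in for tf.concat. `trace_input_widths` is a hypothetical helper.

```python
def trace_input_widths(n_layers, width, in_dim, skip_at):
    """Replay the mlp.py loop and return the input width seen by each layer.

    Each stand-in layer maps any input to `width` features; after layer i
    (when i is in skip_at) the original `in_dim` input features are appended.
    """
    x = [0.0] * in_dim  # original input
    x_ = list(x)  # make a copy, as in mlp.py
    seen = []
    for i in range(n_layers):
        seen.append(len(x_))  # input width of layer i
        y = [0.0] * width  # y = layer(x_)
        if i in skip_at:
            y = y + x  # y = tf.concat((y, x), -1)
        x_ = y
    return seen


print(trace_input_widths(4, 128, 63, (2,)))  # [63, 128, 128, 191]
```

With skip_at=(2,), the widened input first reaches layer 3. Under this loop, feeding it to layer 2 would instead require skip_at=(1,), or concatenating before the layer is called.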
It keeps running out of memory at II. Joint Optimization in Training, Validation, and Testing. Note that it is not GPU memory but CPU memory that runs out, and it seems to happen at for batch_i, batch in enumerate(datapipe_train):
I am running on a machine with one 3090 GPU, 20 CPU cores, and 80 GB of memory. Any suggestions would help!
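A back-of-the-envelope estimate helps judge whether the training data can fit in host RAM once it is loaded or cached. The layout below is an assumption, not confirmed in this thread: light visibility kept per view as float32, one value per pixel per light, with a 16 x 32 light probe (512 lights; the 16 matches the envmaph16 naming used elsewhere here, the 2:1 width is assumed) at 512 x 512. `cache_bytes` is a hypothetical helper.

```python
def cache_bytes(n_views, height, width, channels, bytes_per_value=4):
    """Bytes needed to hold one per-pixel buffer for every view in memory."""
    return n_views * height * width * channels * bytes_per_value


GIB = 1024 ** 3

# Assumed light-visibility layout: 16 x 32 = 512 lights per pixel, float32.
lvis = cache_bytes(n_views=100, height=512, width=512, channels=16 * 32)
print(lvis / GIB)  # 50.0
```

Under these assumptions the visibility buffer alone is about 50 GiB for 100 views, before xyz, normals and RGB are counted, which would be consistent with an 80 GB machine running out; at 256 x 256 the same buffer drops to 12.5 GiB.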
Hi, Xiuming,
I want to learn how to set the value of 'ims' when converting the BRDF dataset.
Is there a relationship between 'ims' and 'imh'?
Thanks in advance!
Hi,
I have downloaded the released results, and noticed that the background for the relit images is different for each envmap lighting. Where does the background color come from?
Hi, Xiuming, thanks for your outstanding work and resources!
When I run the code with the scripts, settings, and data you provided on Google Drive, I get wrong results.
Maybe I set a wrong learning rate or a wrong dataset path? I found that the training and validation views do not have nn.png. Could you give me some suggestions? Thanks!
My dataset directory:
/home/kf/nerfactor/data/selected/hotdog_2163
│ transforms_test.json
│ transforms_train.json
│ transforms_val.json
│
├───test_000
│ albedo.png
│ diffuse-color.exr
│ metadata.json
│ nn.png
│ normal.exr
│ normal.png
│ refball-normal.exr
│ refball-normal.png
│ rgba_city.png
│ rgba_courtyard.png
│ rgba_forest.png
│ rgba_interior.png
│ rgba_night.png
│ rgba_olat-0000-0000.png
│ rgba_olat-0000-0008.png
│ rgba_olat-0000-0016.png
│ rgba_olat-0000-0024.png
│ rgba_olat-0004-0000.png
│ rgba_olat-0004-0008.png
│ rgba_olat-0004-0016.png
│ rgba_olat-0004-0024.png
│ rgba_studio.png
│ rgba_sunrise.png
│ rgba_sunset.png
│ rgba.png
│
├───test_001
│ .......
├───test_199
│ .......
├───train_000
│ albedo.png
│ diffuse-color.exr
│ metadata.json
│ normal.exr
│ normal.png
│ refball-normal.exr
│ refball-normal.png
│ rgba_city.png
│ rgba_courtyard.png
│ rgba_forest.png
│ rgba_interior.png
│ rgba_night.png
│ rgba_olat-0000-0000.png
│ rgba_olat-0000-0008.png
│ rgba_olat-0000-0016.png
│ rgba_olat-0000-0024.png
│ rgba_olat-0004-0000.png
│ rgba_olat-0004-0008.png
│ rgba_olat-0004-0016.png
│ rgba_olat-0004-0024.png
│ rgba_studio.png
│ rgba_sunrise.png
│ rgba_sunset.png
│ rgba.png
│
├───train_001
│
├───train_002
│ metadata.json
│ rgba.png
│
├───train_002
│ .......
├───train_099
│
├───val_000
│ albedo.png
│ diffuse-color.exr
│ metadata.json
│ normal.exr
│ normal.png
│ refball-normal.exr
│ refball-normal.png
│ rgba_city.png
│ rgba_courtyard.png
│ rgba_forest.png
│ rgba_interior.png
│ rgba_night.png
│ rgba_olat-0000-0000.png
│ rgba_olat-0000-0008.png
│ rgba_olat-0000-0016.png
│ rgba_olat-0000-0024.png
│ rgba_olat-0004-0000.png
│ rgba_olat-0004-0008.png
│ rgba_olat-0004-0016.png
│ rgba_olat-0004-0024.png
│ rgba_studio.png
│ rgba_sunrise.png
│ rgba_sunset.png
│ rgba.png
│
├───val_001
│ ........
└───val_007
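A quick way to spot incomplete view folders (train_001 is listed empty above) is to check each one for the files the loaders read. The required names below, metadata.json and rgba.png, are simply the two files present in every non-empty folder in this listing; treating them as the loader's minimum is an assumption, not a documented contract. `find_incomplete_views` is a hypothetical helper.

```python
from pathlib import Path
from typing import Dict, List, Sequence

REQUIRED = ("metadata.json", "rgba.png")  # assumed minimum per view


def find_incomplete_views(root: str,
                          required: Sequence[str] = REQUIRED) -> Dict[str, List[str]]:
    """Map each view folder (train_*, val_*, test_*) to the required files it lacks."""
    missing: Dict[str, List[str]] = {}
    for view_dir in sorted(Path(root).iterdir()):
        if not view_dir.is_dir():
            continue
        if not view_dir.name.startswith(("train_", "val_", "test_")):
            continue
        absent = [name for name in required if not (view_dir / name).is_file()]
        if absent:
            missing[view_dir.name] = absent
    return missing
```

Usage: find_incomplete_views("/home/kf/nerfactor/data/selected/hotdog_2163") returns an empty dict when every view folder has both files.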
Hi,
With your debugging help in other issues, I was able to get to the last step.
I have 3 questions about this last step:
With a single GPU (gpus='0'), I am getting an OOM allocation error, so I assigned three 2080 Ti GPUs here. Is this an acceptable approach? I ask because you did not seem to allow allocating multiple GPUs for calculating geometry buffers. Also, should I consider using imh=256 instead of 512 to reduce memory usage?
tensorflow.python.framework.errors_impl.ResourceExhaustedError:
OOM when allocating tensor with shape[68361728,3] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc [Op:Mul]
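For scale, the failing allocation can be sized directly from the shape in the error message, since float32 takes 4 bytes per value. A single tensor of this size is well under 1 GiB, which suggests many such intermediates are alive at once on an 11 GB card. The second estimate assumes, without confirmation, that the leading dimension grows with the pixel count, so going from imh=512 to imh=256 would divide it by four. `tensor_bytes` is a hypothetical helper.

```python
def tensor_bytes(shape, bytes_per_value=4):
    """Size in bytes of a dense tensor with the given shape (float32 by default)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * bytes_per_value


MIB = 1024 ** 2

full = tensor_bytes([68361728, 3])
print(full, round(full / MIB, 1))  # 820340736 782.3

# Assumption: the leading dimension scales with pixel count (imh ** 2).
quarter = tensor_bytes([68361728 // 4, 3])
print(round(quarter / MIB, 1))  # 195.6
```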
When I run all the steps together with $ bash ./script.sh, I get an error saying that it cannot find the ckpt-2 and ckpt-10 files, which should already exist. So I split it into three separate scripts and was able to get through shape pre-training and joint optimization. I hope my execution did not cause the TensorFlow warning below:
The calling iterator did not fully read the dataset being cached.
In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded.
This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`.
You should use `dataset.take(k).cache().repeat()` instead.
[test] Restoring trained model
[models/base] Trainable layers registered:
['net_normal_mlp_layer0', 'net_normal_mlp_layer1', 'net_normal_mlp_layer2', 'net_normal_mlp_layer3', 'net_normal_out_layer0', 'net_lvis_mlp_layer0', 'net_lvis_mlp_layer1', 'net_lvis_mlp_layer2', 'net_lvis_mlp_layer3', 'net_lvis_out_layer0']
[models/base] Trainable layers registered:
['net_brdf_mlp_layer0', 'net_brdf_mlp_layer1', 'net_brdf_mlp_layer2', 'net_brdf_mlp_layer3', 'net_brdf_out_layer0']
[models/base] Trainable layers registered:
['net_albedo_mlp_layer0', 'net_albedo_mlp_layer1', 'net_albedo_mlp_layer2', 'net_albedo_mlp_layer3', 'net_albedo_out_layer0', 'net_brdf_z_mlp_layer0', 'net_brdf_z_mlp_layer1', 'net_brdf_z_mlp_layer2', 'net_brdf_z_mlp_layer3', 'net_brdf_z_out_layer0', 'net_normal_mlp_layer0', 'net_normal_mlp_layer1', 'net_normal_mlp_layer2', 'net_normal_mlp_layer3', 'net_normal_out_layer0', 'net_lvis_mlp_layer0', 'net_lvis_mlp_layer1', 'net_lvis_mlp_layer2', 'net_lvis_mlp_layer3', 'net_lvis_out_layer0']
[test] Running inference
Inferring Views: 0%| | 0/200 [00:00<?, ?it/s]
2021-09-14 01:46:33.905210: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10
2021-09-14 01:47:05.401366: W tensorflow/core/kernels/data/cache_dataset_ops.cc:794] The calling iterator did not fully read the dataset being cached. In order to avoid unexpected truncation of the dataset, the partially cached contents of the dataset will be discarded. This can happen if you have an input pipeline similar to `dataset.cache().take(k).repeat()`. You should use `dataset.take(k).cache().repeat()` instead.
Inferring Views: 0%| | 0/200 [02:22<?, ?it/s]
Traceback (most recent call last):
File "/home/jiwonchoi/code/nerfactor/nerfactor/test.py", line 209, in <module>
app.run(main)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/jiwonchoi/code/nerfactor/nerfactor/test.py", line 192, in main
brdf_z_override=brdf_z_override)
File "/home/jiwonchoi/code/nerfactor/nerfactor/models/nerfactor.py", line 266, in call
relight_probes=relight_probes)
File "/home/jiwonchoi/code/nerfactor/nerfactor/models/nerfactor.py", line 362, in _render
rgb_probes = tf.concat([x[:, None, :] for x in rgb_probes], axis=1)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/util/dispatch.py", line 180, in wrapper
return target(*args, **kwargs)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/array_ops.py", line 1606, in concat
return gen_array_ops.concat_v2(values=values, axis=axis, name=name)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 1181, in concat_v2
_ops.raise_from_not_ok_status(e, name)
File "/home/jiwonchoi/.conda/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 6653, in raise_from_not_ok_status
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: OpKernel 'ConcatV2' has constraint on attr 'T' not in NodeDef '[N=0, Tidx=DT_INT32]', KernelDef: 'op: "ConcatV2" device_type: "GPU" constraint { name: "T" allowed_values { list { type: DT_INT32 } } } host_memory_arg: "values" host_memory_arg: "axis" host_memory_arg: "output"' [Op:ConcatV2] name: concat
It seems like the line rgb_probes = tf.concat([x[:, None, :] for x in rgb_probes], axis=1) causes the issue. I am not sure how to debug this.
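In the error, NodeDef '[N=0, ...]' means tf.concat received zero tensors, i.e. the rgb_probes list was empty when _render tried to stack it. One plausible cause (an assumption, not confirmed in this thread) is that no relighting environment maps were found, for example because test_envmap_dir points at an empty or wrong directory. A pre-flight check such as the hypothetical `list_envmaps` below turns that into an explicit error; the accepted file extensions are a guess.

```python
from pathlib import Path
from typing import List, Sequence


def list_envmaps(envmap_dir: str,
                 extensions: Sequence[str] = (".hdr", ".exr", ".png")) -> List[Path]:
    """Return the environment maps in envmap_dir, failing loudly if there are none."""
    root = Path(envmap_dir)
    if not root.is_dir():
        raise FileNotFoundError(f"test_envmap_dir does not exist: {root}")
    found = sorted(p for p in root.iterdir() if p.suffix.lower() in extensions)
    if not found:
        raise ValueError(f"no environment maps in {root}; relighting would produce "
                         "an empty probe list and tf.concat would fail with N=0")
    return found
```

Usage: run it on the same path passed as test_envmap_dir before launching test_run.sh.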
Thank you in advance.
Hi, thank you for the inspiring work and your open-source code! When I run the following script, I get a ValueError at step II. Joint Optimization in Training, Validation, and Testing:
I. Shape Pre-Training and II. Joint Optimization (training and validation)
'''
scene='hotdog_2163'
gpus='2'
model='nerfactor'
overwrite='True'
proj_root='/lyy/nerfactor'
repo_dir="$proj_root/nerfactor"
viewer_prefix='' # or just use ''
data_root="$proj_root/data/selected/$scene"
if [[ "$scene" == scan* ]]; then
# DTU scenes
imh='256'
else
imh='512'
fi
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
# Real scenes: NeRF & DTU
near='0.1'; far='2'
else
near='2'; far='6'
fi
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
# Real scenes: NeRF & DTU
use_nerf_alpha='True'
else
use_nerf_alpha='False'
fi
surf_root="$proj_root/output/surf/$scene"
shape_outdir="$proj_root/output/train/${scene}_shape"
REPO_DIR="$repo_dir" "$repo_dir/nerfactor/trainvali_run.sh" "$gpus" --config='shape.ini' --config_override="data_root=$data_root,imh=$imh,near=$near,far=$far,use_nerf_alpha=$use_nerf_alpha,data_nerf_root=$surf_root,outroot=$shape_outdir,viewer_prefix=$viewer_prefix,overwrite=$overwrite"
shape_ckpt="$shape_outdir/lr1e-2/checkpoints/ckpt-2"
brdf_ckpt="$proj_root/output/train/merl/lr1e-2/checkpoints/ckpt-50"
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
# Real scenes: NeRF & DTU
xyz_jitter_std=0.001
else
xyz_jitter_std=0.01
fi
test_envmap_dir="$proj_root/data/envmaps/for-render_h16/test"
shape_mode='finetune'
outroot="$proj_root/output/train/${scene}_$model"
REPO_DIR="$repo_dir" "$repo_dir/nerfactor/trainvali_run.sh" "$gpus" --config="$model.ini" --config_override="data_root=$data_root,imh=$imh,near=$near,far=$far,use_nerf_alpha=$use_nerf_alpha,data_nerf_root=$surf_root,shape_model_ckpt=$shape_ckpt,brdf_model_ckpt=$brdf_ckpt,xyz_jitter_std=$xyz_jitter_std,test_envmap_dir=$test_envmap_dir,shape_mode=$shape_mode,outroot=$outroot,viewer_prefix=$viewer_prefix,overwrite=$overwrite"
ckpt="$outroot/lr5e-3/checkpoints/ckpt-10"
if [[ "$scene" == pinecone || "$scene" == vasedeck || "$scene" == scan* ]]; then
# Real scenes: NeRF & DTU
color_correct_albedo='false'
else
color_correct_albedo='true'
fi
REPO_DIR="$repo_dir" "$repo_dir/nerfactor/test_run.sh" "$gpus" --ckpt="$ckpt" --color_correct_albedo="$color_correct_albedo"
'''
[trainvali] For results, see:
/lyy/nerfactor/output/train/hotdog_2163_nerfactor/lr5e-3
[datasets/nerf_shape] Number of 'train' views: 100
[datasets/nerf_shape] Number of 'vali' views: 8
[models/base] Trainable layers registered:
['net_normal_mlp_layer0', 'net_normal_mlp_layer1', 'net_normal_mlp_layer2', 'net_normal_mlp_layer3', 'net_normal_out_layer0', 'net_lvis_mlp_layer0', 'net_lvis_mlp_layer1', 'net_lvis_mlp_layer2', 'net_lvis_mlp_layer3', 'net_lvis_out_layer0']
[models/base] Trainable layers registered:
['net_brdf_mlp_layer0', 'net_brdf_mlp_layer1', 'net_brdf_mlp_layer2', 'net_brdf_mlp_layer3', 'net_brdf_out_layer0']
Traceback (most recent call last):
File "/lyy/nerfactor/nerfactor/nerfactor/trainvali.py", line 341, in <module>
app.run(main)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 308, in run
_run_main(main, args)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 254, in _run_main
sys.exit(main(argv))
File "/lyy/nerfactor/nerfactor/nerfactor/trainvali.py", line 106, in main
model = Model(config, debug=FLAGS.debug)
File "/lyy/nerfactor/nerfactor/nerfactor/models/nerfactor.py", line 68, in __init__
ioutil.restore_model(self.brdf_model, brdf_ckpt)
File "/lyy/nerfactor/nerfactor/nerfactor/util/io.py", line 48, in restore_model
ckpt.restore(ckpt_path).expect_partial()
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 2009, in restore
status = self._saver.restore(save_path=save_path)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 1304, in restore
checkpoint=checkpoint, proto_id=0).restore(self._graph_view.root)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 209, in restore
restore_ops = trackable._restore_from_checkpoint_position(self) # pylint: disable=protected-access
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py", line 907, in _restore_from_checkpoint_position
tensor_saveables, python_saveables))
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/util.py", line 289, in restore_saveables
validated_saveables).restore(self.save_path_tensor)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/saving/functional_saver.py", line 281, in restore
restore_ops.update(saver.restore(file_prefix))
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/saving/functional_saver.py", line 103, in restore
restored_tensors, restored_shapes=None)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 647, in restore
for v in self._mirrored_variable.values))
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 647, in <genexpr>
for v in self._mirrored_variable.values))
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/values.py", line 392, in _assign_on_device
return variable.assign(tensor)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/resource_variable_ops.py", line 846, in assign
self._shape.assert_is_compatible_with(value_tensor.shape)
File "/home/ly/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/framework/tensor_shape.py", line 1117, in assert_is_compatible_with
raise ValueError("Shapes %s and %s are incompatible" % (self, other))
ValueError: Shapes (0, 3) and (100, 3) are incompatible
The shape checkpoints were generated by step I (Shape Pre-Training), and the BRDF checkpoints were downloaded from your page.
Does this mean I need to pre-train the BRDF model myself?
Looking forward to your help!
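The shapes in the error give a hint: the checkpoint holds a (100, 3) variable, which would match one 3-dimensional latent code for each of the 100 MERL materials, while the freshly built model has (0, 3). A plausible reading (an assumption, not confirmed by the maintainers here) is that the BRDF model sizes its latent table by counting training examples in the data directory recorded in its config, finds none, and so creates zero rows; if so, generating the MERL training data at that path would let the restore succeed. The sketch shows that sizing logic in isolation; the train_* naming and `latent_table_shape` are hypothetical.

```python
from pathlib import Path
from typing import Tuple


def latent_table_shape(data_dir: str, z_dim: int = 3,
                       pattern: str = "train_*") -> Tuple[int, int]:
    """Shape of a per-material latent table sized by the training items on disk.

    A missing or empty data_dir yields 0 rows, which cannot receive a
    checkpoint variable saved with 100 rows.
    """
    root = Path(data_dir)
    if not root.is_dir():
        return (0, z_dim)
    n_items = len(list(root.glob(pattern)))
    return (n_items, z_dim)
```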
I am trying to reproduce the results but ran into problems at 'I. Shape Pre-Training'. The script crashes during validation of shape pre-training. It looks like an OOM issue, because the log says "killed", and it stops crashing if I set shuffle_buffer_size=False in shape.ini. Any suggestions would help!
I am using a machine with four 3090 GPUs, 12 CPU cores, and 60 GB of memory. My dataset has 100 training views and 7 validation views. The surf_root directory contains 120 test, 99 train, and 99 val views.
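The memory cost of shuffling comes from its buffer: a tf.data-style shuffle keeps shuffle_buffer_size whole elements in RAM, emits a random one at each step, and refills from the input. If each element is a full view (an assumption about this pipeline), a large buffer can end up holding most views' rays at once. The plain-Python sketch below mirrors that buffered algorithm to show that memory is bounded by the buffer size rather than by the dataset length; it is an illustration, not tf.data's implementation, and `buffered_shuffle` is a hypothetical name.

```python
import random
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def buffered_shuffle(items: Iterable[T], buffer_size: int,
                     seed: int = 0) -> Iterator[T]:
    """Yield items in a partially shuffled order using a fixed-size buffer.

    At most `buffer_size` items are held in memory at any time.
    """
    if buffer_size < 1:
        raise ValueError("buffer_size must be >= 1")
    rng = random.Random(seed)
    buffer: List[T] = []
    for item in items:
        buffer.append(item)
        if len(buffer) == buffer_size:
            # Emit a random buffered element; its slot is refilled next iteration.
            idx = rng.randrange(len(buffer))
            buffer[idx], buffer[-1] = buffer[-1], buffer[idx]
            yield buffer.pop()
    rng.shuffle(buffer)  # drain what is left once the input is exhausted
    yield from buffer
```

A buffer of 1 degenerates to no shuffling; larger buffers mix better but cost proportionally more RAM.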
In lines 50-52 of nerfactor/nerfactor/util/vis.py:
#############################
img = xm.io.img.load(path)
img = img[:, :, :3] # discards alpha
hw = img.shape[:2]
#############################
pred_lvis.png has no third (channel) dimension because it is a grayscale image, so img[:, :, :3] raises an IndexError.
So it should be changed as follows:
#############################
img = xm.io.img.load(path)
if img.ndim == 2:  # single-channel image such as pred_lvis.png
    img = np.stack((img,) * 3, axis=-1)  # replicate to three channels
img = img[:, :, :3]  # discards alpha
hw = img.shape[:2]
#############################
I want to render albedo and relighting results with the pre-trained NeRFactor on the Blender dataset without further training. However, I find that the provided pre-trained models and data are not sufficient to run test.py on the Blender dataset. It requires shape_ckpt, brdf_ckpt, and processed data (lvis.npy, xyz.npy, alpha.png, normal.npy) for each view, which are not provided.
So does this mean I still need to do the Data Preparation step and train the shape model by myself? Are the provided pre-trained models useless?
Hey, thank you for the awesome work.
I was attempting to replicate your work and realized that I had made the assumption that the camera model for the synthetic and real datasets was identical. I assume this is not true, but I had some trouble finding the camera model for the real data. Where in the code is it, or what's the difference between the synthetic and real camera models?
Hello, the results in the paper look very impressive, and I'm excited to try out the repository myself, but I am having trouble getting the scripts in the README to run without errors.
I downloaded the data and went to step 1 of the NeRFactor section (learning data-driven BRDF priors). At this point I don't have data/brdf_merl_npz/ims512_envmaph16_spp1. Is this something that needs to be generated with data_gen/merl/make_dataset_run.sh? If so, when I follow the instructions in the data_gen folder, I get the following error when running data_gen/merl/make_dataset_run.sh:
Training & Validation: 0%| | 0/4 [00:00<?, ?it/s]Loading MERL-BRDF:
/mnt/c/Users/Documents/wrk_dir/data/brdf_merl/Copyright_Notice.txt
Training & Validation: 0%| | 0/4 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/mnt/c/Users/Documents/wrk_dir/code/nerfactor/data_gen/merl/make_dataset.py", line 144, in <module>
app.run(main)
File "/home/user/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/user/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/mnt/c/Users/Documents/wrk_dir/code/nerfactor/data_gen/merl/make_dataset.py", line 75, in main
brdf = MERL(path=path)
File "/mnt/c/Users/Documents/wrk_dir/code/nerfactor/brdf/merl/merl.py", line 31, in __init__
cube_rgb = merl.readMERLBRDF(path) # (phi_d, theta_h, theta_d, ch)
File "/mnt/c/Users/Documents/wrk_dir/code/nerfactor/third_party/nielsen2015on/merlFunctions.py", line 19, in readMERLBRDF
BRDFVals = np.swapaxes(np.reshape(vals,(dims[2], dims[1], dims[0], 3),'F'),1,2)
File "<__array_function__ internals>", line 6, in reshape
File "/home/user/anaconda3/envs/nerfactor/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 299, in reshape
return _wrapfunc(a, 'reshape', newshape, order=order)
File "/home/user/anaconda3/envs/nerfactor/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
return bound(*args, **kwds)
ValueError: cannot reshape array of size 144 into shape (808591476,1751607666,2037411651,3)
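Resolved: I needed to move the brdfs folder out of the downloaded data and move or delete the README inside that folder, since the loader tried to parse the non-BRDF files there (such as Copyright_Notice.txt) as MERL BRDFs.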
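The nonsense shape in the error is the text file's first bytes read as a binary header. Packing the three "dimensions" (dims[0], dims[1], dims[2], read off the reshape call, which uses them in reverse order) back into little-endian 32-bit integers recovers the opening of a copyright notice, confirming that Copyright_Notice.txt was parsed as a BRDF. The size check assumes the MERL .binary layout of three int32 dimensions followed by 3 x prod(dims) float64 values; `looks_like_merl_binary` is a hypothetical helper a loader could use to skip such files.

```python
import os
import struct

# The three "dimensions" from the error message, in file order (dims[0..2]).
BOGUS_DIMS = (2037411651, 1751607666, 808591476)
print(struct.pack("<3i", *BOGUS_DIMS))  # b'Copyright 20'


def looks_like_merl_binary(path):
    """True if the file size matches a 3-int32 header plus 3*prod(dims) doubles."""
    size = os.path.getsize(path)
    if size < 12:
        return False
    with open(path, "rb") as f:
        dims = struct.unpack("<3i", f.read(12))
    if any(d <= 0 for d in dims):
        return False
    n_values = 3 * dims[0] * dims[1] * dims[2]
    return size == 12 + 8 * n_values
```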
I successfully ran Shape Pre-Training but got an error at II. Joint Optimization:
2023-02-23 00:27:17.819403: I tensorflow/core/kernels/data/shuffle_dataset_op.cc:230] Shuffle buffer filled.
INFO:tensorflow:Error reported to Coordinator: in user code:
/home/maojiahui/conda/nerfactor_b/nerfactor/nerfactor/models/nerfactor.py:209 call *
normal_pred = self._pred_normal_at(xyz)
/home/maojiahui/conda/nerfactor_b/nerfactor/nerfactor/models/shape.py:203 chunk_func *
normals = out_layer(mlp_layers(surf_embed))
/home/maojiahui/conda/nerfactor_b/nerfactor/nerfactor/models/shape.py:191 chunk_apply *
y_chunk = func(x_chunk)
/home/maojiahui/conda/nerfactor_b/nerfactor/nerfactor/networks/mlp.py:46 __call__ *
y = layer(x_)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py:1023 __call__ **
self._maybe_build(inputs)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py:2625 _maybe_build
self.build(input_shapes) # pylint:disable=not-callable
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/keras/layers/core.py:1198 build
trainable=True)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py:655 add_weight
caching_device=caching_device)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py:815 _add_variable_with_custom_getter
**kwargs_for_getter)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer_utils.py:139 make_variable
shape=variable_shape if variable_shape else None)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:260 __call__
return cls._variable_v1_call(*args, **kwargs)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:221 _variable_v1_call
shape=shape)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:67 getter
return captured_getter(captured_previous, **kwargs)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/shared_variable_creator.py:69 create_new_variable
v = next_creator(**kwargs)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:67 getter
return captured_getter(captured_previous, **kwargs)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_lib.py:2111 creator_with_resource_vars
created = self._create_variable(next_creator, **kwargs)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/mirrored_strategy.py:538 _create_variable
distribute_utils.VARIABLE_POLICY_MAPPING, **kwargs)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/distribute_utils.py:306 create_mirrored_variable
value_list = real_mirrored_creator(**kwargs)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/distribute/mirrored_strategy.py:530 _real_mirrored_creator
v = next_creator(**kwargs)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:67 getter
return captured_getter(captured_previous, **kwargs)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py:752 variable_capturing_scope
lifted_initializer_graph=lifted_initializer_graph, **kwds)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/ops/variables.py:264 __call__
return super(VariableMetaclass, cls).__call__(*args, **kwargs)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py:293 __init__
initial_value = initial_value()
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py:87 __call__
self._checkpoint_position, shape, shard_info=shard_info)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/training/tracking/base.py:122 __init__
self.wrapped_value.set_shape(shape)
/home/maojiahui/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/framework/ops.py:1240 set_shape
(self.shape, shape))
ValueError: Tensor's shape (3, 128) is not compatible with supplied shape [63, 128]
I ran this code on a 3090 Ti (TF 2.5). Looking forward to your help.
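The two shapes differ by exactly one positional encoding: with a NeRF-style embedding that keeps the raw input and adds one sine and one cosine per frequency band, a 3-D point with 10 bands becomes 3 + 3 x 2 x 10 = 63 features. So the checkpoint and the model being built appear to disagree about whether xyz is embedded before the normal MLP, and comparing the embedding settings in the shape-model config with those used for joint optimization would be a reasonable first check. The sketch assumes bands 2^0 to 2^(L-1), which is what freq_bands = 2. ** tf.linspace(0., log2_max_freq, n_freqs) gives when log2_max_freq = n_freqs - 1; `embed` and `embed_dim` are illustrative, not the repository's embedder.

```python
import math
from typing import List, Sequence


def embed_dim(in_dim: int, n_freqs: int, include_input: bool = True) -> int:
    """Output width of a sin/cos positional encoding."""
    return in_dim * (int(include_input) + 2 * n_freqs)


def embed(x: Sequence[float], n_freqs: int, include_input: bool = True) -> List[float]:
    """NeRF-style encoding with frequency bands 2**0, ..., 2**(n_freqs - 1)."""
    freqs = [2.0 ** k for k in range(n_freqs)]
    out = list(x) if include_input else []
    for f in freqs:
        out += [math.sin(f * v) for v in x]
        out += [math.cos(f * v) for v in x]
    return out
```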
Regarding nerfactor/brdf/microfacet/microfacet.py, line 57 (at commit 0831990):
According to Eq. (23) in the original BTDF paper, the geometry term should be G(v,m)G(l,m).
Here only half of it (one of the two factors) is calculated.
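For reference, the separable Smith form in Walter et al. (2007) writes the shadowing-masking term as the product of one G1 factor for the light direction and one for the view direction. The sketch below uses the GGX G1 from that paper, measures angles from the macro-normal, and drops the positivity indicator (assumed to hold for visible directions). Whether microfacet.py uses GGX or another distribution is not established here; `smith_g1_ggx` and `smith_g` are illustrative names.

```python
import math


def smith_g1_ggx(cos_theta: float, alpha: float) -> float:
    """GGX Smith masking G1 for a direction at angle theta from the macro-normal.

    Assumes 0 < cos_theta <= 1 (the chi+ indicator is taken to be 1).
    """
    tan2 = (1.0 - cos_theta ** 2) / cos_theta ** 2
    return 2.0 / (1.0 + math.sqrt(1.0 + alpha ** 2 * tan2))


def smith_g(cos_light: float, cos_view: float, alpha: float) -> float:
    """Separable shadowing-masking: G(l, v, m) = G1(l, m) * G1(v, m)."""
    return smith_g1_ggx(cos_light, alpha) * smith_g1_ggx(cos_view, alpha)
```

Keeping only one factor overestimates G at grazing angles, because the missing G1 is at most 1.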
Hi,
Great work. I am training your model on my own dataset in the real-data format. However, it always reports the following error at step II. Joint Optimization in Training, Validation, and Testing. Could you give me some insight into what configuration or data format might be wrong?
Error message
Exception has occurred: InvalidArgumentError (note: full exception trace is shown but execution is paused at: _run_module_as_main)
2 root error(s) found.
(0) Invalid argument: Not a number (NaN) or infinity (Inf) values detected in gradient. b'Albedo' : Tensor had NaN values
[[{{node cond/else/_1/StatefulPartitionedCall/gradient_tape/model/CheckNumerics_2}}]]
[[cond/else/_1/StatefulPartitionedCall/replica_1/model/assert_greater_3/Assert/AssertGuard/branch_executed/_57539/_6203]]
(1) Invalid argument: Not a number (NaN) or infinity (Inf) values detected in gradient. b'Albedo' : Tensor had NaN values
[[{{node cond/else/_1/StatefulPartitionedCall/gradient_tape/model/CheckNumerics_2}}]]
0 successful operations.
3 derived errors ignored. [Op:__inference_fn_with_cond_190304]
Function call stack:
fn_with_cond -> fn_with_cond
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
inputs, attrs, num_outputs)
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 598, in call
ctx=ctx)
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1746, in _call_flat
ctx, args, cancellation_manager=cancellation_manager))
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 1665, in _filtered_call
self.captured_inputs)
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/function.py", line 2420, in __call__
return graph_function._filtered_call(args, kwargs) # pylint: disable=protected-access
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 708, in _call
return function_lib.defun(fn_with_cond)(*canon_args, **canon_kwds)
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/tensorflow/python/eager/def_function.py", line 580, in __call__
result = self._call(*args, **kwds)
File "/home/admin/FaceReal/nerfactor/nerfactor/trainvali.py", line 181, in main
strategy, model, batch, optimizer, global_bs_train)
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 258, in _run_main
sys.exit(main(argv))
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/site-packages/absl/app.py", line 312, in run
_run_main(main, args)
File "/home/admin/FaceReal/nerfactor/nerfactor/trainvali.py", line 341, in <module>
app.run(main)
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/runpy.py", line 96, in _run_module_code
mod_name, mod_spec, pkg_name, script_name)
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/runpy.py", line 263, in run_path
pkg_name=pkg_name, script_name=fname)
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
File "/home/admin/anaconda3/envs/nerfactor/lib/python3.6/runpy.py", line 193, in _run_module_as_main (Current frame)
"__main__", mod_spec)
My dataset directory looks like this:
root
│ transforms_test.json
│ transforms_train.json
│ transforms_val.json
│
├───test_000
│ metadata.json
│ nn.png
│ rgba.png
│
├───train_000
│ albedo.png
│ metadata.json
│ rgba.png
│
├───train_001
│ metadata.json
│ rgba.png
│
├───train_002
│ metadata.json
│ rgba.png
│
├───train_003
│ metadata.json
│ rgba.png
│
├───train_004
│ metadata.json
│ rgba.png
│
├───train_005
│ metadata.json
│ rgba.png
│
├───train_006
│ metadata.json
│ rgba.png
│
├───train_007
│ metadata.json
│ rgba.png
│
├───train_008
│ metadata.json
│ rgba.png
│
├───train_009
│ metadata.json
│ rgba.png
│
└───val_000
metadata.json
rgba.png
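With only one test view, ten training views and one validation view, it is worth confirming that each transforms_*.json lists exactly as many frames as there are matching view folders. The sketch assumes the NeRF-Blender layout, where each JSON has a top-level "frames" list, and the split-to-prefix mapping (val to val_) follows the listing above; `count_mismatches` is a hypothetical helper and is not claimed to explain the NaN.

```python
import json
from pathlib import Path
from typing import Dict, Tuple


def count_mismatches(root: str) -> Dict[str, Tuple[int, int]]:
    """Return {split: (frames in JSON, view folders on disk)} for splits that disagree."""
    base = Path(root)
    mismatches: Dict[str, Tuple[int, int]] = {}
    for split in ("train", "val", "test"):
        with open(base / f"transforms_{split}.json") as f:
            n_frames = len(json.load(f)["frames"])
        n_dirs = sum(1 for p in base.glob(f"{split}_*") if p.is_dir())
        if n_frames != n_dirs:
            mismatches[split] = (n_frames, n_dirs)
    return mismatches
```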