
taichi-dev / taichi-nerfs


Implementations of NeRF variants based on Taichi + PyTorch

License: Apache License 2.0

Python 45.12% Shell 1.16% Jupyter Notebook 5.88% CMake 4.03% C++ 21.71% C 19.75% Objective-C 0.41% Objective-C++ 1.93%
nerf neural-radiance-field taichi 3d-reconstruction computer-graphics computer-vision machine-learning neural-network real-time real-time-rendering dreamfusion

taichi-nerfs's Introduction

Taichi NeRFs

A PyTorch + Taichi implementation of the instant-ngp NeRF training pipeline. For more details about the modeling, please check out this article on our blog.

Installation

  1. Install PyTorch with python -m pip install torch torchvision --extra-index-url https://download.pytorch.org/whl/cu116 (replace cu116 in the URL to match your installed CUDA Toolkit version).
  2. Install Taichi nightly with pip install -U pip && pip install -i https://pypi.taichi.graphics/simple/ taichi-nightly.
  3. Install the requirements with pip install -r requirements.txt.
  4. If you plan to train on your own video, install colmap with sudo apt install colmap or follow the instructions at https://colmap.github.io/install.html.
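The PyTorch index URL in step 1 encodes the CUDA version as a cuXY suffix. The sketch below shows that mapping. It assumes a version string such as "11.6", or the "release 11.6, V11.6.124" text that nvcc --version prints. The helper pytorch_cuda_index_url is hypothetical, and it does not check whether PyTorch actually publishes wheels for the resulting version.

```python
import re

def pytorch_cuda_index_url(cuda_version: str) -> str:
    """Map a CUDA version such as "11.6" (or nvcc's "release 11.6, V11.6.124")
    to a PyTorch wheel index URL such as https://download.pytorch.org/whl/cu116."""
    match = re.search(r"(\d+)\.(\d+)", cuda_version)
    if match is None:
        raise ValueError(f"no CUDA version found in {cuda_version!r}")
    major, minor = match.groups()
    # int() drops any leading zeros before the digits are joined.
    return f"https://download.pytorch.org/whl/cu{int(major)}{int(minor)}"

if __name__ == "__main__":
    print(pytorch_cuda_index_url("Cuda compilation tools, release 11.6, V11.6.124"))
```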

Train with preprocessed datasets

Synthetic NeRF

Download the Synthetic NeRF dataset and unzip it, keeping the folder name unchanged.

We also provide a script that trains the Lego scene from scratch and displays an interactive GUI at the end of training.

./scripts/train_nsvf_lego.sh

Performance is measured on Ubuntu 20.04 with an RTX 3090 GPU.

| Scene | avg PSNR | Training time (20 epochs) | GPU |
|---|---|---|---|
| Lego | 35.0 | 208 s | RTX 3090 |

To reach the best performance, follow these steps:

  1. Use a Linux workstation with an RTX 3090 graphics card.
  2. Follow the steps in the Installation section.
  3. Uncomment --half2_opt in the script to enable the half2 optimization, then run ./scripts/train_nsvf_lego.sh. For now, the half2 optimization is only supported on Linux with GPU architectures newer than Pascal.
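You can check the half2 constraint in step 3 programmatically. The sketch below assumes the usual mapping of Pascal GPUs to CUDA compute capability 6.x, so it treats "newer than Pascal" as a major version of 7 or higher. The helper half2_opt_supported is hypothetical and is not part of taichi-nerfs.

```python
import platform

PASCAL_MAJOR = 6  # Pascal GPUs report compute capability 6.x

def half2_opt_supported(compute_capability, system=None) -> bool:
    """True if the README's stated constraint for --half2_opt holds:
    Linux and a GPU architecture newer than Pascal (compute capability >= 7.0)."""
    system = system or platform.system()
    major, _minor = compute_capability
    return system == "Linux" and major > PASCAL_MAJOR

if __name__ == "__main__":
    print(half2_opt_supported((8, 6), "Linux"))   # e.g. an RTX 3090
    print(half2_opt_supported((6, 1), "Linux"))   # a Pascal card
```

With PyTorch installed, torch.cuda.get_device_capability() returns the (major, minor) tuple to pass in. An RTX 3090 reports (8, 6).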

360_v2 dataset

Download the 360 v2 dataset and unzip it, keeping the folder name unchanged. The default batch_size=8192 uses up to 18 GB of GPU memory on an RTX 3090; adjust batch_size to fit your hardware.

./scripts/train_360_v2_garden.sh

Train with your own video

Place your video in the data folder and pass its path to the script. A few key parameters determine whether the resulting dataset is suitable for NeRF training. For a real scene, we recommend setting scale to 16. video_fps determines how many images are extracted from the video; 150 to 200 images are typically sufficient, and for a one-minute video a video_fps of 2 is a suitable value. Running this script preprocesses your video and then trains a NeRF on it:

./scripts/train_from_video.sh -v {your_video_name} -s {scale} -f {video_fps}
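The number of extracted images is the video length multiplied by video_fps. The quick sketch below computes it. It assumes frames are sampled uniformly at video_fps, and the exact count depends on the extraction tool. The helper frames_extracted is hypothetical.

```python
import math

def frames_extracted(duration_s: float, video_fps: float) -> int:
    """Approximate number of images obtained by sampling a video at video_fps."""
    return math.floor(duration_s * video_fps)

if __name__ == "__main__":
    for duration in (30, 60, 120):
        for fps in (2, 3):
            n = frames_extracted(duration, fps)
            print(f"{duration:>4}s video @ video_fps={fps}: ~{n} images")
```

For example, a one-minute video at video_fps=2 yields roughly 120 images.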

Mobile Deployment

Taichi NGP Deployment

Using Taichi AOT, you can easily deploy a NeRF rendering pipeline on mobile devices!

We achieve real-time interactive rendering on iOS devices.

| Performance | iPad Pro (M1) | iPhone 14 Pro Max | iPhone 14 |
|---|---|---|---|
| Taichi Instant NGP | 22.4 fps | 18 fps | 13.5 fps |

Stay tuned, more cool demos are on the way! For business inquiries, please reach out to us at [email protected].

Text to 3D

taichi-nerfs serves as a new backend for the text-to-3D project stable-dreamfusion.

Frequently asked questions (FAQ)

Q: Is CUDA the only supported Taichi backend? What about the Vulkan backend?

A: For the most efficient interop with PyTorch's CUDA backend, training is mostly tested with the Taichi CUDA backend. However, switching to the Taichi Vulkan backend is straightforward if the interop is removed; check out this taichi-ngp inference demo!

Q: I got an OOM (out of memory) error on my GPU. What can I do?

A: Reduce the batch_size passed to train.py. The default of 8192 fits an RTX 3090; on smaller GPUs, reduce it accordingly. For instance, batch_size=2048 is recommended on an RTX 3060 Ti.
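The README gives two data points. First, 8192 fits an RTX 3090, where it uses up to 18 GB. Second, 2048 is recommended for an RTX 3060 Ti (8 GB). Both are consistent with scaling batch_size roughly linearly with GPU memory. The sketch below encodes that rule of thumb. It is an assumption, not an official formula, and it ignores fixed costs such as the hash table and model weights, so treat its result only as a starting point. The helper suggest_batch_size is hypothetical.

```python
def suggest_batch_size(gpu_mem_gb: float,
                       ref_batch: int = 8192,
                       ref_mem_gb: float = 18.0,
                       min_batch: int = 256) -> int:
    """Heuristic: scale the reference batch linearly with GPU memory,
    then round down to a power of two (never below min_batch)."""
    raw = int(ref_batch * gpu_mem_gb / ref_mem_gb)
    if raw < min_batch:
        return min_batch
    return 1 << (raw.bit_length() - 1)  # largest power of two <= raw

if __name__ == "__main__":
    for name, mem in [("RTX 3090", 24), ("RTX 3060 Ti", 8), ("6 GB card", 6)]:
        print(name, suggest_batch_size(mem))
```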

Acknowledgement

The PyTorch interface of the training pipeline and the colmap preprocessing draw heavily on:

taichi-nerfs's People

Contributors

ailzhang, asxcvbn, cxzhou35, erizmr, erjanmx, identxxy, jiahaoplus, jim19930609, linyou, rexwangcc, taichi-gardener



taichi-nerfs's Issues

Training on ngp datasets fails

I tried to reconstruct an instant-ngp dataset and got the problematic result shown below. I re-created transforms.json with data/colmap2nerf.py, but --dataset_name ngp still doesn't work, whereas switching to --dataset_name colmap works. Maybe the camera poses are misinterpreted by the ngp loader.

[image]

grad_scaler.scale(loss).backward

grad_scaler.scale(loss).backward()

Hi, thanks for your amazing work. However, when I run the line grad_scaler.scale(loss).backward(), it stops without any message in the debugger.
From the command line, it reports "./scripts/train_nsvf_lego.sh: line 12: 3368608 Segmentation fault".

Error when running taichi-nerfs

[Taichi] version 1.6.0, llvm 15.0.4, commit c8a31d35, linux, python 3.7.12
/opt/conda/lib/python3.7/site-packages/torch/amp/autocast_mode.py:202: UserWarning: User provided device_type of 'cuda', but CUDA is not available. Disabling
warnings.warn('User provided device_type of 'cuda', but CUDA is not available. Disabling')
Traceback (most recent call last):
  File "train.py", line 16, in <module>
    from modules.networks import TaichiNGP
  File "/kaggle/working/taichi-nerfs/modules/networks.py", line 10, in <module>
    from .hash_encoder import HashEncoder
  File "/kaggle/working/taichi-nerfs/modules/hash_encoder.py", line 7, in <module>
    from .utils import (data_type, ti2torch, ti2torch_grad, ti2torch_grad_vec,
  File "/kaggle/working/taichi-nerfs/modules/utils.py", line 91, in <module>
    def morton3D_invert_kernel(indices: ti.types.ndarray(field_dim=1),
  File "/opt/conda/lib/python3.7/site-packages/taichi/types/ndarray_type.py", line 88, in __init__
    "The field_dim argument for ndarray type is already deprecated. Please use ndim instead."
ValueError: The field_dim argument for ndarray type is already deprecated. Please use ndim instead.

What do I need to change to fix this error?
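The error message names the fix: the field_dim keyword of ti.types.ndarray has been replaced by ndim. A declaration such as ti.types.ndarray(field_dim=1) therefore becomes ti.types.ndarray(ndim=1). The minimal sketch below applies this rename to source text. migrate_field_dim and migrate_file are hypothetical helpers. The regex rewrites every field_dim= keyword it finds, so review the diff afterward.

```python
import re
from pathlib import Path

# Matches the keyword "field_dim" followed by "=", keeping any spacing.
_FIELD_DIM = re.compile(r"\bfield_dim(\s*=)")

def migrate_field_dim(source: str) -> str:
    """Rewrite field_dim=N keyword arguments to ndim=N."""
    return _FIELD_DIM.sub(r"ndim\1", source)

def migrate_file(path: Path) -> bool:
    """Rewrite one file in place; return True if anything changed."""
    text = path.read_text()
    new_text = migrate_field_dim(text)
    if new_text != text:
        path.write_text(new_text)
        return True
    return False

if __name__ == "__main__":
    print(migrate_field_dim("def k(indices: ti.types.ndarray(field_dim=1)):"))
```

Running migrate_file over the files named in the traceback (for example modules/utils.py) edits them in place, so back them up or commit first.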

RuntimeError

RuntimeError: [auto_diff.cpp:visit@57] Assertion failure: stmt->dest->is()

pytorch_lightning Strategy error

(taichi-nerfs) E:\source\py\taichi-nerfs>python train.py --root_dir ./Synthetic_NeRF/Lego --exp_name Lego --perf --num_epochs 20 --batch_size 8192 --lr 1e-2 --no_save_test --gui
[Taichi] version 1.6.0, llvm 15.0.1, commit 98574106, win, python 3.8.16
[Taichi] Starting on arch=cuda
[W 04/05/23 13:48:29.986 66204] [memory_pool.cpp:taichi::lang::MemoryPool::MemoryPool@43] Missing CUDA implementation
GridEncoding: Nmin=16 b=1.31951 F=2 T=2^19 L=16
per_level_scale:  1.3195079107728942
offset_:  5710032
total_hash_size:  11420064
Traceback (most recent call last):
  File "train.py", line 291, in <module>
    trainer = Trainer(
  File "E:\Users\wangc\miniconda3\envs\taichi-nerfs\lib\site-packages\pytorch_lightning\utilities\argparse.py", line 69, in insert_env_defaults
    return fn(self, **kwargs)
  File "E:\Users\wangc\miniconda3\envs\taichi-nerfs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 393, in __init__
    self._accelerator_connector = _AcceleratorConnector(
  File "E:\Users\wangc\miniconda3\envs\taichi-nerfs\lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py", line 140, in __init__
    self._check_config_and_set_final_flags(
  File "E:\Users\wangc\miniconda3\envs\taichi-nerfs\lib\site-packages\pytorch_lightning\trainer\connectors\accelerator_connector.py", line 206, in _check_config_and_set_final_flags
    raise ValueError(
ValueError: You selected an invalid strategy name: `strategy=None`. It must be either a string or an instance of `pytorch_lightning.strategies.Strategy`. Example choices: auto, ddp, ddp_spawn, deepspeed, ... Find a complete list of options in our documentation at https://lightning.ai
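The error says that Lightning 2.x rejects strategy=None and expects a string such as "auto". Earlier releases used None as the default. The hedged sketch below picks a compatible value from the installed version string. lightning_strategy is a hypothetical helper. Pinning pytorch_lightning below 2.0 is an alternative, since other 2.0 removals may also affect this code.

```python
def lightning_strategy(pl_version: str):
    """Return a Trainer(strategy=...) value accepted by the given Lightning version.

    Lightning >= 2.0 rejects None and expects a string such as "auto";
    1.x used None to mean "let Lightning choose".
    """
    major = int(pl_version.split(".")[0])
    return "auto" if major >= 2 else None

if __name__ == "__main__":
    # Typical use (requires pytorch_lightning):
    #   import pytorch_lightning as pl
    #   trainer = pl.Trainer(strategy=lightning_strategy(pl.__version__), ...)
    print(lightning_strategy("2.0.1"), lightning_strategy("1.9.5"))
```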

libcuda.so lib not found. (other projects where cuda is needed work well)

[Taichi] version 1.8.0, llvm 15.0.4, commit 52b24f3e, linux, python 3.8.18
[W 03/03/24 17:05:31.950 2179169] [cuda_driver.cpp:load_lib@36] libcuda.so lib not found.
[W 03/03/24 17:05:31.950 2179169] [misc.py:adaptive_arch_select@758] Arch=[<Arch.cuda: 3>] is not supported, falling back to CPU
[Taichi] Starting on arch=x64
Loading 161 train images ...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████| 161/161 [00:05<00:00, 31.34it/s]
Loading 24 test images ...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 37.24it/s]
Hash Encoder: base_res=16 max_res=4096 hash_level=16 feat_per_level=2 per_level_scale=0.36967849629863747 total_hash_size=6299960
Failed to import apex FusedAdam, use torch Adam instead.
./scripts/train_360_v2_garden.sh: line 11: 2179169 Bus error (core dumped) python3 train.py --root_dir $ROOT_DIR/garden --dataset_name colmap --exp_name garden --downsample $DOWNSAMPLE --scale 8.0 --batch_size 4096

I have tried reducing batch_size, but it didn't help.
I wonder why there is a warning "libcuda.so lib not found."
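The log shows Taichi failing to load libcuda.so and falling back to CPU (arch=x64). The bus error therefore occurs after that fallback. One plausible cause, not confirmed in this thread, is that the process cannot find the driver library under the exact name libcuda.so. For example, only libcuda.so.1 may be present, or a container may not mount the driver. The small stdlib probe below reports what the dynamic loader can find. probe_libcuda is a hypothetical helper.

```python
import ctypes
import ctypes.util

def probe_libcuda(names=("libcuda.so", "libcuda.so.1")):
    """Try to dlopen each CUDA driver library name; map name -> loaded?"""
    found = {}
    for name in names:
        try:
            ctypes.CDLL(name)
            found[name] = True
        except OSError:
            found[name] = False
    return found

if __name__ == "__main__":
    print("ctypes.util.find_library('cuda') ->", ctypes.util.find_library("cuda"))
    for name, ok in probe_libcuda().items():
        print(f"{name}: {'loadable' if ok else 'NOT loadable'}")
```

Suppose libcuda.so.1 loads but libcuda.so does not. Then providing a libcuda.so name (for example, installing the package that ships it) is worth trying, but treat that as an assumption.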

Crash after training on the Lego dataset finishes?

Thanks for your great work. I could not complete training by following the README: the program seems to crash after all epochs have finished training.

OS : ubuntu 20.04, 3060Ti 12GB
Mem: 16GB
taichi: taichi-nightly 1.5.0.post20230307

# run command
 ./scripts/train_nsvf_lego.sh

The error message is shown below:

Epoch 19: 100%|██████████| 1000/1000 [00:17<00:00, 58.52it/s, loss=0.000223]
total training time: 347.69
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 990, in _try_get_data
    data = self._data_queue.get(timeout=timeout)
  File "/usr/lib/python3.8/queue.py", line 179, in get
    self.not_empty.wait(remaining)
  File "/usr/lib/python3.8/threading.py", line 306, in wait
    gotit = waiter.acquire(True, timeout)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/_utils/signal_handling.py", line 66, in handler
    _error_if_any_worker_fails()
RuntimeError: DataLoader worker (pid 31182) is killed by signal: Killed.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "train.py", line 307, in <module>
    trainer.fit(system)
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 608, in fit
    call._call_and_handle_interrupt(
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/call.py", line 38, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 650, in _fit_impl
    self._run(model, ckpt_path=self.ckpt_path)
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1112, in _run
    results = self._run_stage()
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1191, in _run_stage
    self._run_train()
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/trainer/trainer.py", line 1214, in _run_train
    self.fit_loop.run()
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/loops/fit_loop.py", line 267, in advance
    self._outputs = self.epoch_loop.run(self._data_fetcher)
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 200, in run
    self.on_advance_end()
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 250, in on_advance_end
    self._run_validation()
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/training_epoch_loop.py", line 308, in _run_validation
    self.val_loop.run()
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/loops/dataloader/evaluation_loop.py", line 152, in advance
    dl_outputs = self.epoch_loop.run(self._data_fetcher, dl_max_batches, kwargs)
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/loops/loop.py", line 199, in run
    self.advance(*args, **kwargs)
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/loops/epoch/evaluation_epoch_loop.py", line 121, in advance
    batch = next(data_fetcher)
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/fetching.py", line 184, in __next__
    return self.fetching_function()
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/fetching.py", line 265, in fetching_function
    self._fetch_next_batch(self.dataloader_iter)
  File "/home/dlkou/.local/lib/python3.8/site-packages/pytorch_lightning/utilities/fetching.py", line 280, in _fetch_next_batch
    batch = next(iterator)
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 521, in __next__
    data = self._next_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1186, in _next_data
    idx, data = self._get_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1142, in _get_data
    success, data = self._try_get_data()
  File "/usr/local/lib/python3.8/dist-packages/torch/utils/data/dataloader.py", line 1003, in _try_get_data
    raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
RuntimeError: DataLoader worker (pid(s) 31182) exited unexpectedly

How do I find bbox.txt values when using a custom dataset?

Hi, I'm not sure what the values in bbox.txt mean. How do you come up with your own? I formatted my data exactly like the datasets in the SyntheticNeRF folder, but I am not sure how to make my own bbox.txt or what the values mean.

Thanks :)
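Our understanding of the NSVF dataset format (not confirmed in this thread) is that bbox.txt holds a single line: the minimum corner (x, y, z), the maximum corner (x, y, z), and a voxel size. The loader in datasets/nsvf.py appears to unpack xyz_min and xyz_max from it, so check there how many numbers it actually reads. The box must enclose the object in the same world coordinates as your camera poses. The minimal sketch below builds such a line from sample points, for example a sparse point cloud. bbox_line, its padding, and its default voxel size are hypothetical.

```python
from typing import Iterable, Tuple

Point = Tuple[float, float, float]

def bbox_line(points: Iterable[Point], padding: float = 0.1,
              voxel_size: float = 0.4) -> str:
    """Return 'xmin ymin zmin xmax ymax zmax voxel_size' for an
    axis-aligned box around the points, grown by `padding` on every side."""
    pts = list(points)
    if not pts:
        raise ValueError("need at least one point")
    lo = [min(p[i] for p in pts) - padding for i in range(3)]
    hi = [max(p[i] for p in pts) + padding for i in range(3)]
    return " ".join(f"{v:.4f}" for v in (*lo, *hi, voxel_size))

if __name__ == "__main__":
    sample = [(-0.6, -1.1, -0.3), (0.6, 1.1, 1.0), (0.0, 0.2, 0.4)]
    print(bbox_line(sample))  # write this line to bbox.txt
```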

Poor results with --deployment

I trained on my own dataset, similar to garden. Normal training reaches a PSNR of 27, but with --deployment the PSNR drops to 22 and the image is blurry.

error: Assertion failure: prealloc_size <= total_mem when I try to train the Lego

RTX3050 mobile, Ubuntu 22.04.2 LTS x86_64

I don't know how to solve this problem. I tried changing patch_size to 2048 and even 512 in train_nsvf_lego.sh and train.py, but it didn't help.

What can I do next?

Screenshot from 2023-04-20 00-36-17

./scripts/train_nsvf_lego.sh
[Taichi] version 1.7.0, llvm 15.0.4, commit 40ba02a3, linux, python 3.10.6
[Taichi] Starting on arch=cuda
[E 04/20/23 00:28:48.932 11717] [llvm_runtime_executor.cpp:materialize_runtime@570] Assertion failure: prealloc_size <= total_mem

Traceback (most recent call last):
  File "/home/zzz/lec/taichi-nerfs/train.py", line 279, in <module>
    taichi_init(hparams)
  File "/home/zzz/lec/taichi-nerfs/train.py", line 273, in taichi_init
    ti.init(**taichi_init_args)
  File "/home/zzz/.local/lib/python3.10/site-packages/taichi/lang/misc.py", line 455, in init
    impl.get_runtime().prog.materialize_runtime()
RuntimeError: [llvm_runtime_executor.cpp:materialize_runtime@570] Assertion failure: prealloc_size <= total_mem
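The traceback shows the assertion firing inside ti.init, while Taichi materializes its runtime and before any training batch exists. That would explain why changing the batch or patch size has no effect. The check compares a preallocated device memory pool against the GPU's total memory. ti.init accepts device_memory_GB (and device_memory_fraction) to size that pool. Whether train.py's taichi_init passes one of them is an assumption to verify in taichi_init_args. The sketch below clamps a requested pool size to the card. safe_device_memory_gb is hypothetical.

```python
def safe_device_memory_gb(total_gb: float, requested_gb: float,
                          headroom_gb: float = 0.5) -> float:
    """Clamp a requested Taichi CUDA memory pool size so it fits on the GPU,
    leaving `headroom_gb` free for the driver, PyTorch, and the display."""
    available = total_gb - headroom_gb
    if available <= 0:
        raise ValueError("GPU has too little memory for the requested headroom")
    return min(requested_gb, available)

if __name__ == "__main__":
    # Hypothetical use with Taichi installed:
    #   import taichi as ti
    #   ti.init(arch=ti.cuda, device_memory_GB=safe_device_memory_gb(4.0, 6.0))
    print(safe_device_memory_gb(4.0, 6.0))
```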

colmap unrecognised option '--output_path'

==== running: mkdir colmap_sparse
==== running: colmap mapper --database_path colmap.db --image_path "images" --output_path colmap_sparse
ERROR: Failed to parse options: unrecognised option '--output_path'.
FATAL: command failed
~/dl-learning/taichi/taichi-nerfs
What is causing this problem?
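This COLMAP build's mapper does not recognize --output_path. Our understanding, which you should verify with colmap mapper -h, is that older COLMAP releases named this option --export_path. If so, an older distribution package may be installed. The sketch below picks whichever flag the installed binary advertises. The helper names are hypothetical, and upgrading COLMAP is the more direct fix.

```python
import subprocess

def mapper_output_flag(help_text: str) -> str:
    """Pick the mapper output flag advertised in `colmap mapper -h` output."""
    if "--output_path" in help_text:
        return "--output_path"
    if "--export_path" in help_text:
        return "--export_path"
    raise RuntimeError("no output flag found in `colmap mapper -h` output")

def mapper_command(help_text: str, database: str, images: str, out_dir: str):
    """Build the mapper argument list using the flag this build supports."""
    return ["colmap", "mapper", "--database_path", database,
            "--image_path", images, mapper_output_flag(help_text), out_dir]

def colmap_mapper_help() -> str:
    """Capture the mapper help text (requires colmap on PATH)."""
    proc = subprocess.run(["colmap", "mapper", "-h"],
                          capture_output=True, text=True)
    return proc.stdout + proc.stderr

if __name__ == "__main__":
    print(mapper_command("  --export_path arg", "colmap.db", "images", "colmap_sparse"))
```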

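Still OOM with batch_size set to 1

My GPU is a 2070 with 8 GB of memory. Even with batch_size set to 1, it still runs out of GPU memory. Is the memory requirement inherently this high, or is something wrong with my settings?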

About export

Hi, can I export the result as a point cloud or mesh instead of a video?
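Nothing in this README documents a mesh or point-cloud export, so the following is only a sketch of one common approach. It assumes you can query the trained model's density at arbitrary 3D points. It samples density on a regular grid, keeps points above a threshold, and writes them as an ASCII PLY file. The toy sphere density stands in for the model, and the helper names are hypothetical. A mesh could be extracted from the same grid with a marching-cubes routine such as skimage.measure.marching_cubes.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, float, float]

def density_to_points(density: Callable[[float, float, float], float],
                      res: int, bound: float, threshold: float) -> List[Point]:
    """Sample density on a res^3 grid spanning [-bound, bound]^3;
    keep grid points whose density exceeds `threshold`."""
    if res < 2:
        raise ValueError("res must be at least 2")
    step = 2 * bound / (res - 1)
    coords = [-bound + i * step for i in range(res)]
    return [(x, y, z) for x in coords for y in coords for z in coords
            if density(x, y, z) > threshold]

def ply_text(points: List[Point]) -> str:
    """Serialize points as an ASCII PLY point cloud."""
    header = ["ply", "format ascii 1.0", f"element vertex {len(points)}",
              "property float x", "property float y", "property float z",
              "end_header"]
    body = [f"{x:.6f} {y:.6f} {z:.6f}" for x, y, z in points]
    return "\n".join(header + body) + "\n"

if __name__ == "__main__":
    # Toy density standing in for the trained model: solid sphere, radius 0.5.
    def sphere(x, y, z):
        return 10.0 if x * x + y * y + z * z < 0.25 else 0.0
    pts = density_to_points(sphere, res=32, bound=1.0, threshold=1.0)
    print(len(pts), "points;", ply_text(pts).splitlines()[2])
```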

ValueError: could not convert string to float:

All .txt files are empty (they contain only null bytes), which causes the following error when running ./scripts/train_nsvf_lego.sh

[Taichi] version 1.7.0, llvm 15.0.4, commit daeb0139, linux, python 3.10.12
[Taichi] Starting on arch=cuda
self.root_dir is ./Synthetic_NeRF/Lego
line is
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /opt/ml/volume/default/work/taichi-nerfs/train.py:312 in <module> │
│ │
│ 309 │ │ ).render() │
│ 310 │
│ 311 if __name__ == '__main__': │
│ ❱ 312 │ main() │
│ 313 │
│ │
│ /opt/ml/volume/default/work/taichi-nerfs/train.py:62 in main │
│ │
│ 59 │ │
│ 60 │ # datasets │
│ 61 │ dataset = dataset_dict[hparams.dataset_name] │
│ ❱ 62 │ train_dataset = dataset( │
│ 63 │ │ root_dir=hparams.root_dir, │
│ 64 │ │ split=hparams.split, │
│ 65 │ │ downsample=hparams.downsample, │
│ │
│ /opt/ml/volume/default/work/taichi-nerfs/datasets/nsvf.py:18 in __init__ │
│ │
│ 15 │ def __init__(self, root_dir, split='train', downsample=1.0, **kwargs): │
│ 16 │ │ super().__init__(root_dir, split, downsample) │
│ 17 │ │ │
│ ❱ 18 │ │ self.read_intrinsics() │
│ 19 │ │ │
│ 20 │ │ if kwargs.get('read_meta', True): │
│ 21 │ │ │ xyz_min, xyz_max = \ │
│ │
│ /opt/ml/volume/default/work/taichi-nerfs/datasets/nsvf.py:41 in read_intrinsics │
│ │
│ 38 │ │ │ with open(os.path.join(self.root_dir, 'intrinsics.txt')) as f: │
│ 39 │ │ │ │ line = f.readline() │
│ 40 │ │ │ │ print("line is{}".format(line)) │
│ ❱ 41 │ │ │ │ fx = fy = float(line.split()[0]) * self.downsample │
│ 42 │ │ │ if 'Synthetic' in self.root_dir: │
│ 43 │ │ │ │ w = h = int(800 * self.downsample) │
│ 44 │ │ │ else: │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: could not convert string to float:
'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00'
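The traceback shows intrinsics.txt containing nothing but NUL bytes. This suggests the dataset files were not written correctly, for example by an interrupted download or extraction, although the thread does not confirm the cause. Re-downloading the dataset is the likely remedy. The small stdlib sketch below finds such files before training. The helper names and the example path are hypothetical.

```python
from pathlib import Path
from typing import List

def is_suspicious(data: bytes) -> bool:
    """True if the bytes are empty or consist only of NUL bytes and whitespace."""
    return not data.strip(b"\x00 \t\r\n")

def suspicious_text_files(root) -> List[Path]:
    """List .txt files under root that look empty or zero-filled."""
    root = Path(root)
    if not root.is_dir():
        return []
    return [p for p in sorted(root.rglob("*.txt")) if is_suspicious(p.read_bytes())]

if __name__ == "__main__":
    for path in suspicious_text_files("./Synthetic_NeRF/Lego"):
        print("bad file:", path)
```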

NotImplementedError: Support for `validation_epoch_end` has been removed in v2.0.0.

(taichi-nerfs) E:\source\py\taichi-nerfs>python train.py --root_dir ./Synthetic_NeRF/Lego --exp_name Lego --perf --num_epochs 20 --batch_size 8192 --lr 1e-2 --no_save_test --gui
[Taichi] version 1.6.0, llvm 15.0.1, commit 98574106, win, python 3.8.16
[Taichi] Starting on arch=cuda
[W 04/05/23 13:54:46.404 57464] [memory_pool.cpp:taichi::lang::MemoryPool::MemoryPool@43] Missing CUDA implementation
GridEncoding: Nmin=16 b=1.31951 F=2 T=2^19 L=16
per_level_scale:  1.3195079107728942
offset_:  5710032
total_hash_size:  11420064
Using 16bit Automatic Mixed Precision (AMP)
GPU available: True (cuda), used: True
TPU available: False, using: 0 TPU cores
IPU available: False, using: 0 IPUs
HPU available: False, using: 0 HPUs
Traceback (most recent call last):
  File "train.py", line 307, in <module>
    trainer.fit(system)
  File "E:\Users\wangc\miniconda3\envs\taichi-nerfs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 520, in fit
    call._call_and_handle_interrupt(
  File "E:\Users\wangc\miniconda3\envs\taichi-nerfs\lib\site-packages\pytorch_lightning\trainer\call.py", line 44, in _call_and_handle_interrupt
    return trainer_fn(*args, **kwargs)
  File "E:\Users\wangc\miniconda3\envs\taichi-nerfs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 559, in _fit_impl
    self._run(model, ckpt_path=ckpt_path)
  File "E:\Users\wangc\miniconda3\envs\taichi-nerfs\lib\site-packages\pytorch_lightning\trainer\trainer.py", line 883, in _run
    _verify_loop_configurations(self)
  File "E:\Users\wangc\miniconda3\envs\taichi-nerfs\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py", line 38, in _verify_loop_configurations
    __verify_train_val_loop_configuration(trainer, model)
  File "E:\Users\wangc\miniconda3\envs\taichi-nerfs\lib\site-packages\pytorch_lightning\trainer\configuration_validator.py", line 86, in __verify_train_val_loop_configuration
    raise NotImplementedError(
NotImplementedError: Support for `validation_epoch_end` has been removed in v2.0.0. `NeRFSystem` implements this method. You can use the `on_validation_epoch_end` hook instead. To access outputs, save them in-memory as instance attributes. You can find migration examples in https://github.com/Lightning-AI/lightning/pull/16520.
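The error message describes the migration. Move the logic from validation_epoch_end into on_validation_epoch_end, and keep per-step outputs on the instance yourself. The framework-free sketch below shows that pattern. In a real LightningModule these methods would live on NeRFSystem, and you would report the mean with self.log instead of returning it. NeRFSystemSketch and compute_psnr are hypothetical. Alternatively, installing a pytorch_lightning release older than 2.0.0 avoids the migration.

```python
class NeRFSystemSketch:
    """Hypothetical stand-in showing the Lightning 2.x output-collection pattern."""

    def __init__(self):
        self.validation_step_outputs = []

    def compute_psnr(self, batch):
        # Hypothetical metric; here a "batch" is just a precomputed PSNR value.
        return float(batch)

    def validation_step(self, batch, batch_idx):
        psnr = self.compute_psnr(batch)
        self.validation_step_outputs.append(psnr)  # keep outputs in memory
        return psnr

    def on_validation_epoch_end(self):
        outputs = self.validation_step_outputs
        mean_psnr = sum(outputs) / len(outputs) if outputs else float("nan")
        self.validation_step_outputs.clear()  # reset for the next epoch
        return mean_psnr  # in Lightning: self.log("test/psnr", mean_psnr)

if __name__ == "__main__":
    system = NeRFSystemSketch()
    for i, value in enumerate([30.0, 32.0]):
        system.validation_step(value, i)
    print(system.on_validation_epoch_end())
```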

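Question about raymarching

Hello, in my own code I use the raymarching method from torch-ngp. If I want to use your method instead, can I directly replace torch-ngp's raymarching with your implementation (for both training and testing)?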

iOS demo issue

Hi, thank you for this awesome project. I have an issue with the iOS demo.
I ran the following steps:

1. Train the model:
   python3 train.py --root_dir ./data/Synthetic_NeRF/Chair --max_steps 20000 --batch_size 8192 --lr 1e-2 --deployment --deployment_model_path=.
2. Compile the AOT module:
   python3 deployment/InstantNGP/taichi_ngp/taichi_ngp.py --aot --model_path=./deployment.npy --res_w=300 --res_h=600
3. Edit line 69 of Viewcontroller.m in Xcode:
   NSString *aotFilePath = [bundleRoot stringByAppendingPathComponent:@"taichi_ngp/compiled"];
   (compiled is the folder generated in step 2)
4. Build the project in Xcode.

Then I got this error:

[I 05/22/23 14:52:53.314 2775915] [taichi_core_impl.cpp:ti_create_runtime@246] Taichi Runtime C-API version is: 1005000
2023-05-22 14:52:53.315954+0900 TaichiNGPDemo[6094:2775915] Metal GPU Frame Capture Enabled
2023-05-22 14:52:53.316143+0900 TaichiNGPDemo[6094:2775915] Metal API Validation Enabled
[E 05/22/23 14:52:53.626 2775915] [runtime.cpp:host_to_device@117] Device does not support arg type=unknown

[W 05/22/23 14:52:53.627 2775915] [taichi_core_impl.cpp:ti_set_last_error@227] C-API error: (invalid state)[runtime.cpp:host_to_device@117] Device does not support arg type=unknown
libc++abi: terminating with uncaught exception of type std::runtime_error: Memory allocation failed
[runtime.cpp:host_to_device@117] Device does not support arg type=unknown
(lldb)

Can you suggest a solution?

./scripts/train_nsvf_lego.sh: line 11: 18824 Segmentation fault

Thanks for your great work. I encountered the following error when running ./scripts/train_nsvf_lego.sh.
My GPU is an RTX 3090 and the system is Ubuntu 18.04.6 LTS.
The full log is below:
[Taichi] version 1.7.0, llvm 15.0.4, commit a992f22e, linux, python 3.9.16
[Taichi] Starting on arch=cuda
Loading 100 train images ...
100it [00:02, 38.78it/s]
Loading 200 test images ...
200it [00:05, 39.27it/s]
Hash Encoder: base_res=16 max_res=1024 hash_level=16 feat_per_level=2 per_level_scale=0.2772588722239781 total_hash_size=5710032
Failed to import apex FusedAdam, use torch Adam instead.
[W 07/06/23 20:09:27.367 19027] [type_check.cpp:type_check_store@37] [$13858] Local store may lose precision: f16 <- f32

[W 07/06/23 20:09:27.367 19027] [type_check.cpp:type_check_store@37] [$13883] Local store may lose precision: f16 <- f32

[W 07/06/23 20:09:27.367 19027] [type_check.cpp:type_check_store@37] [$13908] Local store may lose precision: f16 <- f32

elapsed_time=2.19s | step=0 | psnr=10.84 | loss=0.082505 | rays=8192 | rm_s=246.9 | vr_s=246.9 |
elapsed_time=18.99s | step=1000 | psnr=28.49 | loss=0.001417 | rays=8192 | rm_s=26.7 | vr_s=14.3 |
elapsed_time=31.39s | step=2000 | psnr=31.23 | loss=0.000753 | rays=8192 | rm_s=24.6 | vr_s=9.8 |
elapsed_time=43.47s | step=3000 | psnr=31.70 | loss=0.000675 | rays=8192 | rm_s=24.9 | vr_s=9.0 |
elapsed_time=55.69s | step=4000 | psnr=32.43 | loss=0.000572 | rays=8192 | rm_s=24.3 | vr_s=8.3 |
elapsed_time=68.28s | step=5000 | psnr=33.79 | loss=0.000418 | rays=8192 | rm_s=22.9 | vr_s=8.1 |
elapsed_time=81.02s | step=6000 | psnr=33.98 | loss=0.000400 | rays=8192 | rm_s=23.7 | vr_s=7.1 |
elapsed_time=93.21s | step=7000 | psnr=34.45 | loss=0.000359 | rays=8192 | rm_s=23.5 | vr_s=7.0 |
elapsed_time=105.40s | step=8000 | psnr=35.15 | loss=0.000305 | rays=8192 | rm_s=23.7 | vr_s=7.0 |
elapsed_time=118.02s | step=9000 | psnr=35.66 | loss=0.000272 | rays=8192 | rm_s=23.4 | vr_s=6.7 |
elapsed_time=130.00s | step=10000 | psnr=35.29 | loss=0.000296 | rays=8192 | rm_s=23.0 | vr_s=6.6 |
elapsed_time=142.20s | step=11000 | psnr=34.47 | loss=0.000357 | rays=8192 | rm_s=23.4 | vr_s=6.6 |
elapsed_time=154.58s | step=12000 | psnr=35.71 | loss=0.000269 | rays=8192 | rm_s=24.0 | vr_s=6.5 |
elapsed_time=166.82s | step=13000 | psnr=36.06 | loss=0.000248 | rays=8192 | rm_s=23.7 | vr_s=6.6 |
elapsed_time=179.23s | step=14000 | psnr=36.39 | loss=0.000230 | rays=8192 | rm_s=22.9 | vr_s=6.3 |
elapsed_time=191.38s | step=15000 | psnr=36.37 | loss=0.000231 | rays=8192 | rm_s=23.4 | vr_s=6.4 |
elapsed_time=203.60s | step=16000 | psnr=36.54 | loss=0.000222 | rays=8192 | rm_s=23.4 | vr_s=6.6 |
elapsed_time=216.32s | step=17000 | psnr=37.19 | loss=0.000191 | rays=8192 | rm_s=23.1 | vr_s=6.5 |
elapsed_time=229.25s | step=18000 | psnr=37.12 | loss=0.000194 | rays=8192 | rm_s=22.7 | vr_s=6.1 |
elapsed_time=241.81s | step=19000 | psnr=37.59 | loss=0.000174 | rays=8192 | rm_s=22.7 | vr_s=6.4 |
elapsed_time=253.84s | step=20000 | psnr=37.10 | loss=0.000195 | rays=8192 | rm_s=22.5 | vr_s=6.1 |
evaluating: 0%| | 0/200 [00:00<?, ?it/s][W 07/06/23 20:13:40.575 18824] [type_check.cpp:type_check_store@37] [$28398] Global store may lose precision: u8 <- i32
File "/data2/gwx/taichi-nerfs/modules/ray_march.py", line 254, in raymarching_test_kernel:
valid_mask[idx] = 1
^^^^^^^^^^^^^^^^^^^

evaluating: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 200/200 [00:10<00:00, 19.48it/s]
evaluation: psnr_avg=34.72092056274414 | ssim_avg=0.9757876992225647
[Taichi] Starting on arch=cuda
Loading 100 train images ...
100it [00:02, 39.17it/s]
Hash Encoder: base_res=16 max_res=1024 hash_level=16 feat_per_level=2 per_level_scale=0.2772588722239781 total_hash_size=5710032
loading ckpt from: results/model.pth
./scripts/train_nsvf_lego.sh: line 11: 18824 Segmentation fault (core dumped) python3 train.py --root_dir $DATA_DIR/Lego --exp_name Lego --batch_size 8192 --lr 1e-2 --gui

Support for other camera models

Hi, are there any plans to implement other camera models? The original instant-ngp supports the OpenCV and radial distortion models; it would be great to see them here too!
If we were to implement them ourselves, any tips on where to look? Maybe the ray direction calculation and get_rays?
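The minimal sketch below applies the OpenCV radial-tangential model (k1, k2, p1, p2) to ray generation. It normalizes the pixel with the intrinsics, then inverts the distortion by fixed-point iteration, an approach similar in spirit to OpenCV's undistortPoints. Finally it returns a unit direction in camera space. It uses the OpenCV camera convention (x right, y down, z forward) and pixel coordinates without a half-pixel offset. NeRF loaders often use other conventions, so adapt the signs to whatever get_rays expects. All function names here are hypothetical.

```python
def distort(x, y, k1, k2, p1, p2):
    """Apply OpenCV radial-tangential distortion to normalized coordinates."""
    r2 = x * x + y * y
    radial = 1 + k1 * r2 + k2 * r2 * r2
    xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
    yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
    return xd, yd

def undistort(xd, yd, k1, k2, p1, p2, iters=20):
    """Invert distort() by fixed-point iteration (fine for mild distortion)."""
    x, y = xd, yd
    for _ in range(iters):
        r2 = x * x + y * y
        radial = 1 + k1 * r2 + k2 * r2 * r2
        dx = 2 * p1 * x * y + p2 * (r2 + 2 * x * x)
        dy = p1 * (r2 + 2 * y * y) + 2 * p2 * x * y
        x = (xd - dx) / radial
        y = (yd - dy) / radial
    return x, y

def pixel_to_direction(u, v, fx, fy, cx, cy, dist):
    """Unit camera-space ray direction for pixel (u, v); dist = (k1, k2, p1, p2)."""
    x, y = undistort((u - cx) / fx, (v - cy) / fy, *dist)
    norm = (x * x + y * y + 1.0) ** 0.5
    return (x / norm, y / norm, 1.0 / norm)

if __name__ == "__main__":
    print(pixel_to_direction(100.0, 50.0, 500.0, 500.0, 400.0, 300.0,
                             (-0.1, 0.01, 0.001, -0.002)))
```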

Autodiff runtime Error

I want to change the ray_marching kernel because I need gradients with respect to xyzs.
I reimplemented it the same way as the volume rendering module, but I get a runtime error.

```python
class RayMarchingRenderer(torch.nn.Module):

    def __init__(self):
        super(RayMarchingRenderer, self).__init__()

        self._raymarching_rendering_kernel = raymarching_train_kernel
        class _module_function(torch.autograd.Function):

            @staticmethod
            def forward(
                    ctx,
                    rays_o,
                    rays_d,
                    hits_t,
                    density_bitfield,
                    cascades,
                    scale,
                    exp_step_factor,
                    grid_size,
                    max_samples
                ):
                noise = torch.rand_like(rays_o[:, 0])
                counter = torch.zeros(
                    2,
                    device=rays_o.device,
                    dtype=torch.int32
                )
                rays_a = torch.empty(
                    rays_o.shape[0], 3,
                    device=rays_o.device,
                    dtype=torch.int32,
                )
                xyzs = torch.empty(
                    rays_o.shape[0] * max_samples, 3,
                    device=rays_o.device,
                    dtype=torch_type,
                    requires_grad=True
                )
                dirs = torch.empty(
                    rays_o.shape[0] * max_samples, 3,
                    device=rays_o.device,
                    dtype=torch_type,
                    requires_grad=True
                )
                deltas = torch.empty(
                    rays_o.shape[0] * max_samples,
                    device=rays_o.device,
                    dtype=torch_type,
                )
                ts = torch.empty(
                    rays_o.shape[0] * max_samples,
                    device=rays_o.device,
                    dtype=torch_type,
                )

                raymarching_train_kernel(
                    rays_o,
                    rays_d,
                    hits_t,
                    density_bitfield,
                    noise,
                    counter,
                    rays_a,
                    xyzs,
                    dirs,
                    deltas,
                    ts,
                    cascades, grid_size, scale,
                    exp_step_factor, max_samples
                )

                # total samples for all rays
                total_samples = counter[0]
                # remove redundant output
                xyzs = xyzs[:total_samples]
                dirs = dirs[:total_samples]
                deltas = deltas[:total_samples]
                ts = ts[:total_samples]

                ctx.save_for_backward(
                    rays_o,
                    rays_d,
                    hits_t,
                    density_bitfield,
                    noise,
                    counter,
                    rays_a,
                    xyzs,
                    dirs,
                    deltas,
                    ts,
                )
                ctx.cascades=cascades
                ctx.grid_size=grid_size
                ctx.scale=scale
                ctx.exp_step_factor=exp_step_factor
                ctx.max_samples=max_samples
                return rays_a, xyzs, dirs, deltas, ts, total_samples

            @staticmethod
            def backward(
                    ctx,
                    dL_drays_a,
                    dL_dxyzs,
                    dL_ddirs,
                    dL_ddeltas,
                    dL_dts,
                    dL_dtotal_samples
                ):

                cascades=ctx.cascades
                grid_size=ctx.grid_size
                scale=ctx.scale
                exp_step_factor=ctx.exp_step_factor
                max_samples=ctx.max_samples
                (
                    rays_o,
                    rays_d,
                    hits_t,
                    density_bitfield,
                    noise,
                    counter,
                    rays_a,
                    xyzs,
                    dirs,
                    deltas,
                    ts,
                ) = ctx.saved_tensors
                # put the gradients into the tensors before calling the grad kernel
                rays_a.grad = dL_drays_a
                xyzs.grad = dL_dxyzs
                dirs.grad = dL_ddirs
                deltas.grad=dL_ddeltas
                ts.grad =dL_dts
                # total_samples.grad=dL_dtotal_samples

                self._raymarching_rendering_kernel.grad(
                    rays_o,
                    rays_d,
                    hits_t,
                    density_bitfield,
                    noise,
                    counter,
                    rays_a,
                    xyzs,
                    dirs,
                    deltas,
                    ts,
                    cascades, grid_size, scale,
                    exp_step_factor,max_samples
                )

                return rays_o.grad, rays_d.grad, None, None, None, None, None, xyzs.grad, dirs.grad, deltas.grad, ts.grad, None, None, None, None, None

        self._module_function = _module_function.apply

    def forward(
            self,
            rays_o,
            rays_d,
            hits_t,
            density_bitfield,
            cascades,
            scale,
            exp_step_factor,
            grid_size,
            max_samples
        ):
        return self._module_function(
            rays_o.contiguous(),
            rays_d.contiguous(),
            hits_t.contiguous(),
            density_bitfield,
            cascades,
            scale,
            exp_step_factor,
            grid_size,
            max_samples
        )
```

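Question: why does density_grid have multiple levels?

density_grid is set up with multiple levels that represent sizes from small to large. Is this to find the size that best fits the object?
If so, does only one level actually take effect?
Also, what happens if the object is not near the coordinate origin?
I'm not sure whether my understanding is correct; could you answer my questions?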

self.register_buffer( 'density_grid', torch.zeros(self.cascades, self.grid_size**3), )
