harry-zhi / semantic_nerf

The implementation of "In-Place Scene Labelling and Understanding with Implicit Scene Representation" [ICCV 2021].

License: Other

Python 100.00%

semantic_nerf's Issues

Generating the traj_w_c file for custom dataset

Thank you for your great contribution and for sharing this wonderful code.

I am trying to run the semantic nerf on my custom dataset where I have a bunch of images on which I wish to train this network. How do I generate the traj_w_c file as well as other required files?

Thank you in advance for your help.

Indexing error when generating rays

Thanks for the graceful codebase. When switching the depth type from z-dim to euclidean, an error occurs at this line:

semantic_nerf/SSR/models/rays.py

Line 57 in bb98f10

dirs = dirs * (1. / norm)[:, :, :, None]

The proper way might be
dirs = dirs * (1. / norm)
dirs = dirs[:, :, :, None]
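
For reference, here is a minimal NumPy sketch of the broadcasting involved. It assumes `dirs` has shape (N, H, W, 3) and that the norm is computed with the last axis kept; the actual shapes in `rays.py` are not shown in this issue, so this is an illustration and not the repository's fix. The function name `normalize_dirs` is hypothetical. PyTorch follows the same broadcasting rules.

```python
import numpy as np

def normalize_dirs(dirs):
    """Scale every direction vector in a (..., 3) array to unit length."""
    # keepdims=True leaves a trailing axis of size 1, so the division
    # broadcasts over the last axis without any extra indexing.
    norm = np.linalg.norm(dirs, axis=-1, keepdims=True)
    return dirs / norm

# Hypothetical batch: 2 images of 4x5 pixels, one direction per pixel.
dirs = np.random.default_rng(0).normal(size=(2, 4, 5, 3))
unit_dirs = normalize_dirs(dirs)
print(unit_dirs.shape)  # (2, 4, 5, 3)
```

If the norm already carries a trailing axis of size 1, indexing it again with `[:, :, :, None]` yields a 5-D array, which is one plausible source of the reported error.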

Question of randomly generated camera poses in Replica

Hi,

Thank you for the wonderful work. You said in #5 that the camera poses of Replica are randomly generated. However, the rendered images seem to follow a smooth trajectory rather than being produced from random camera poses. May I ask which algorithm was used to generate the trajectory?

Thanks!

Correlation between traj_w_c.txt and transform.json from nerfstudio

Hello, thanks for the great work,
I am processing the Replica dataset with nerfstudio and trying to relate your traj_w_c.txt to nerfstudio's transform.json, but I cannot make sense of the result. I even tried keeping the camera coordinate system in the OpenCV convention, but I didn't find any similarity between the two files. Is there any way to produce your traj_w_c.txt with nerfstudio?
Thank you!

Mesh reconstruction for semantic_nerf for Outdoor scenes

Hi, I was able to run semantic nerf on an outdoor unbounded scene by following your kind suggestions in #38 (comment).

I get good semantic renderings during training, but I am now facing a problem with mesh reconstruction (snapshot attached). I would be glad to have any suggestions or guidance on it.

The testing visualization issue

Hi! Thanks again for your great work! It is a really cool idea to integrate semantics into the NeRF model. I have run into some trouble with visualization: I can now train without problems on my computer, but when I try to visualize the mesh there is always a CUDA out of memory error. I tried reducing chunk and netchunk to 8, but it still occurs. Do you have any idea? FYI, I noticed that for visualization you use another file, test.yaml. Would it be possible to share this test.yaml?

differences between replica and replica nyu datasets

Hi,
I really appreciate your sharing this; it's amazing work.
I have two questions about the dataset.

1. What are the differences between the replica and replica nyu datasets? Are there differences in the semantic segmentation?
2. Are there instance segmentation results for the replica dataset? If not, do you know how I can get them?

Thanks again

meaning of pixel-wise denoising

Hi @Harry-Zhi,

Firstly, this is amazing work. I have a question: what do you mean by pixel-wise denoising? Is it similar to image deblurring?

Is there rendering code or trajectory-generation code?

For Replica and/or scannet.
It would be very helpful.

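How to obtain the camera intrinsics of habitat-sim

Hello! When generating the dataset with habitat-sim in the data_generation folder, what are the camera intrinsics? Or, how can I obtain the camera intrinsics? Thank you for your reply.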

IndexError: list index out of range

Hi Harry, thanks a lot for sharing your work!
I am new to NeRF and was trying to replicate your results using the pre-rendered Replica data, but I'm getting the following error:

"semantic = cv2.imread(self.semantic_list[idx], cv2.IMREAD_UNCHANGED)
IndexError: list index out of range"

The reason I could figure out was that the files in semantic_class folder do not have 900 samples starting with "semantic_class_" which was used for searching in the code (after about 100 files, the filename changes to vis_sem_).

Could you help me figure out why that is the case?
I'm using the following command to train on just room 0 sequence 1 set as PATHtoRENDERED_REPLICA_DATA:

!python3 train_SSR_main.py --config_file /content/drive/MyDrive/semantic_nerf/SSR/configs/SSR_room0_config.yaml --sparse_views --sparse_ratio 0.6
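
One way to confirm the diagnosis above is to count how many files in the folder start with each prefix. The sketch below is self-contained and runs on a temporary folder with made-up file names; `count_by_prefix` is a hypothetical helper, not part of the repository, and the real folder layout may differ.

```python
import os
import tempfile

def count_by_prefix(folder, prefixes):
    """Count files in `folder` whose names start with each given prefix."""
    counts = {p: 0 for p in prefixes}
    for name in os.listdir(folder):
        for p in prefixes:
            if name.startswith(p):
                counts[p] += 1
                break
    return counts

# Self-contained demo on a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as tmp:
    for i in range(3):
        open(os.path.join(tmp, f"semantic_class_{i}.png"), "w").close()
    for i in range(2):
        open(os.path.join(tmp, f"vis_sem_class_{i}.png"), "w").close()
    demo_counts = count_by_prefix(tmp, ["semantic_class_", "vis_sem_"])
print(demo_counts)  # {'semantic_class_': 3, 'vis_sem_': 2}
```

If the count for the prefix the loader searches for is smaller than the number of frames it indexes, an `IndexError` like the one quoted is the expected symptom, and an incomplete download or extraction would be worth ruling out.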

bad result of office scene

Hello!
Thanks for your great work!! I am very interested in your semantic-nerf.
I encountered a problem when applying another model to your pre-rendered Replica dataset. The same parameters performed well on room0 but very poorly on scenes such as office4. Have you encountered a similar problem? Are the hyperparameters basically the same for every scene?

This is novel view synthesis result of room0:

This is novel view synthesis result of office4:

Can it work with multi-GPUs?

In the config I only see a "gpu" option.
Can we use more than one GPU to speed up training?

Instance Labels in pre-rendered dataset Replica Dataset

Hi, Thanks for the amazing work!

I downloaded the pre-rendered Replica dataset from provided link but was unable to find the instance label folder for each scene.
The directory semantic_instance is missing from each scene. In the dataset this is optionally loaded here.

It would be great if the link could be updated to include the instance id folders as well.

Alternatively, if the data rendering scripts from Habitat can be open-sourced that would be really great.

Thanks!

Mesh reconstruction and Implementation of semantic_nerf for Outdoor scenes

Hi,

Thanks for sharing your work.

The code works fine for indoor scenes. I am trying to run it on images of an outdoor scene, e.g. a building, but I am unable to generate the rendered scenes. I would be glad if you could tell me which parameters I need to change to get the required output for outdoor scenes.

info_semantic.json for replica

Can you share a copy of info_semantic.json so we can directly use prerendered replica data instead of having to download the whole dataset?

Training time for a scene

I noticed in your paper that the model was trained for 200,000 iterations on a single RTX2080Ti GPU. Could you please provide an estimate of the time it took to complete this training?

A question about coloured mesh

Hello, thanks for your excellent work! Generating a semantic model the NeRF way is so cool. With the shared code, I succeeded in training on Replica data and stopped the training once the test results looked perceptually good. However, when generating coloured meshes I ran into trouble because no "mesh.ply" file can be found. Where can I get the "mesh.ply" file?

Label mapping details

Hi,
thanks for your code and efforts! I have a question about the details of the label mapping.
In the paper, you mentioned "manually map these labels to the popular NYUv2-13 definition", could you please share the details of it?
I've used the habitat-sim as you suggested and want to have exactly the same label mapping.
For example in the script of Habitat-sim:
self.labels = {
    'rug': 6,
    'wall': 12,
    'floor': 5,
    'ceiling': 3,
    'chair': 4,
    'table': 10,
    'window': 13,
    .......
}

The "Semantic View Synthesis with Sparse Labels" part seems to use more classes. Could you please share its label mapping as well?
Thanks a lot for your reply!

How to convert the provided poses in traj_w_c.txt to the transforms.json used in Nerfstudio?

I followed nerfstudio's code to convert the provided poses to the nerfstudio format as follows:
c2w = read_poses_c2w[idx].reshape((4,4))
# Convert from COLMAP's camera coordinate system to ours
c2w[0:3, 1:3] *= -1
c2w = c2w[np.array([1, 0, 2, 3]), :]
c2w[2, :] *= -1

However, it does not work for nerfstudio. Can you give some hints?
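
A possible starting point, sketched under two loud assumptions that this thread does not confirm: that each line of traj_w_c.txt holds 16 floats forming a row-major 4x4 camera-to-world matrix, and that the camera axes follow the OpenCV convention (x right, y down, z forward). Nerfstudio uses the OpenGL-style convention (x right, y up, z backward), so under those assumptions only the camera y and z axes need flipping. The helper names are hypothetical.

```python
import numpy as np

def load_poses(text):
    """Parse poses stored as 16 whitespace-separated floats per line."""
    rows = [[float(v) for v in line.split()]
            for line in text.strip().splitlines()]
    return np.array(rows, dtype=np.float64).reshape(-1, 4, 4)

def opencv_to_opengl(c2w):
    """Flip the camera y and z axes of a camera-to-world matrix."""
    out = c2w.copy()
    out[:3, 1:3] *= -1  # columns 1 and 2 are the camera's y and z axes
    return out

# Hypothetical one-pose file: identity rotation, camera at (1, 2, 3).
sample = "1 0 0 1 0 1 0 2 0 0 1 3 0 0 0 1"
poses_cv = load_poses(sample)
poses_gl = np.stack([opencv_to_opengl(p) for p in poses_cv])
print(poses_gl[0])
```

The row swap and the sign flip on the third row in the snippet above act on the world frame instead of the camera frame, so they may not be needed when the poses do not come from COLMAP; this is worth checking against a rendered view before relying on it.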

Processing a dataset with a different camera model

Hey,

Once again, congrats on the amazing work.

I am trying to train semantic-nerf on a custom dataset. I have pre-processed the dataset to match the format of the Replica dataset, and the classes are similar too, so I managed to train the model without changing the data loader (replica_datasets.py). Yet the model cannot learn the 3D representation because of a camera pose incompatibility.

Specifically, as far as I can understand from trainer.py and the set_params_replica function, the RGB images are assumed to be captured with a pinhole camera. In my case the camera model is different, so I want to modify the function so that the camera poses in the traj_w_c.txt file are processed correctly. Is it enough to change the fx, fy, etc. parameters? What if the camera model is more complex, with additional parameters such as p?

Any tips or ideas on how to implement a different camera model than pin-hole are more than welcome!

How to run the project?

Hello author, I am a NeRF beginner and would like to know whether there are pre-trained weights with which the model can be run directly to produce semantically segmented images. Also, how should I run this project?

What format is the "traj_w_c.txt" file in the pre-rendered replica dataset in?

Hi,
Thanks for sharing your code. I am looking to explore this system with other datasets and would like to know the format of the "traj_w_c.txt" file in the pre-rendered Replica dataset(s).

Cheers,
Kurt

Semantic segmentation results do not show void classes (color is black)

I appreciate your excellent work!!!
I want to build on your work, but when I ran your most recent code on Replica room_0, I found that the void class is not displayed in the output images, although the generated sem.mp4 displays the void class normally. Why is this? I recall that when I used your code a few months ago, the output images showed the void class (coloured black). Please give me some advice on how to switch back to the old code.
Thanks a lot!!!

Questions about NDC space

Hi, thanks for the great work!
It seems that semantic-nerf doesn't use NDC space. I would like to know why.

config/models for Scannet

Hello!
Can you provide more files (parameter configs, pretrained models, ...) for the ScanNet dataset used in the paper?
Thanks

Custom dataset

Hello!
Thank you a lot for your great work!
I am planning to use your method for a robotics application and I have a question: do you have any guide on how to prepare my own data for training and rendering?
As I understand it, the input includes 2D colour images, depth images, camera poses and semantic classes. Am I correct? Should these data be stored as a dataset?

Thank you in advance!

Error when evaluating

The error occurs on both Replica and ScanNet datasets.

Traceback (most recent call last):
  File "train_SSR_main.py", line 224, in <module>
    train()
  File "train_SSR_main.py", line 212, in train
    ssr_trainer.step(global_step)
  File "/opt/data3/semantic_nerf-main/SSR/training/trainer.py", line 1027, in step
    number_classes=self.num_valid_semantic_class, ignore_label=self.ignore_label)
  File "/opt/data3/semantic_nerf-main/SSR/training/training_utils.py", line 67, in calculate_segmentation_metrics
    conf_mat = confusion_matrix(true_labels, predicted_labels, labels=list(range(number_classes)))
  File "/home/zy/miniconda3/envs/nerf/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)
  File "/home/zy/miniconda3/envs/nerf/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 276, in confusion_matrix
    y_type, y_true, y_pred = _check_targets(y_true, y_pred)
  File "/home/zy/miniconda3/envs/nerf/lib/python3.7/site-packages/sklearn/metrics/_classification.py", line 81, in _check_targets
    check_consistent_length(y_true, y_pred)
  File "/home/zy/miniconda3/envs/nerf/lib/python3.7/site-packages/sklearn/utils/validation.py", line 256, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [19814400, 11767603]
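
The `ValueError` says that the two label arrays passed to `confusion_matrix` have different lengths. One plausible cause, stated here as an assumption and not a confirmed diagnosis, is that pixels with the ignore label were removed from only one of the two arrays. The pure-Python sketch below shows the same pixel being dropped from both sequences; `confusion_counts` and the `-1` ignore value are hypothetical.

```python
def confusion_counts(true_labels, pred_labels, num_classes, ignore_label):
    """Build a confusion matrix, skipping pixels whose ground truth is ignored."""
    if len(true_labels) != len(pred_labels):
        raise ValueError(
            f"length mismatch: {len(true_labels)} vs {len(pred_labels)}")
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(true_labels, pred_labels):
        if t == ignore_label:
            continue  # the same pixel is dropped from both sequences
        matrix[t][p] += 1
    return matrix

# Toy data: -1 marks ignored pixels (an assumed convention).
truth = [0, 1, -1, 1, 0, -1]
pred = [0, 1, 1, 0, 0, 0]
conf_mat = confusion_counts(truth, pred, num_classes=2, ignore_label=-1)
print(conf_mat)  # [[2, 0], [1, 1]]
```

Rows index the true class and columns the predicted class, so the diagonal holds the correctly labelled pixels.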

How to train on my own dataset? I have segmentation data and RGB images, what else do I need for training?

Mesh file

Hi, I want to test the semantic_nerf model, but it seems that a coloured mesh file has to be extracted for testing. If so, why is the mesh file necessary? Does the NeRF model need to generate a mesh model? In short, how can I run testing?

CUDA out of memory when using RTX2080-Ti GPU

Hi,

Thank you for your inspiring work and the code released!

I'm trying to run your demo on "room0" of Replica with

python3 train_SSR_main.py --config_file /SSR/configs/SSR_room0_config.yaml

Whenever validation starts, the run cannot continue and is interrupted by "CUDA out of memory". However, I use the same GPU (RTX2080-Ti with 11GB memory) as mentioned in your paper. No matter how far I decrease "chunk" and "netchunk" in the config file (even down to 1024*1), "CUDA out of memory" still occurs during validation.

I'm wondering whether you could give me some advice.

Some confusing results

Thanks for publishing your code! When I tested on Replica, I found that the black area (void label) cannot be rendered perfectly. Could you tell me the reason?

CUDA error: no kernel image is available for execution on the device

/home/xxxx/.conda/envs/semantic_nerf/lib/python3.7/site-packages/torch/nn/functional.py:3121: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/xxxx/.conda/envs/semantic_nerf/lib/python3.7/site-packages/torch/cuda/__init__.py:125: UserWarning: 
NVIDIA GeForce RTX 3070 Laptop GPU with CUDA capability sm_86 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_37 sm_50 sm_60 sm_70 sm_75.
If you want to use the NVIDIA GeForce RTX 3070 Laptop GPU GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/

  warnings.warn(incompatible_device_warn.format(device_name, capability, " ".join(arch_list), device_name))
prepare rays
prepare rays
prepare rays
Begin
  0%|                                                                                                                                                                                                  | 0/200001 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "train_SSR_main.py", line 224, in <module>
    train()
  File "train_SSR_main.py", line 212, in train
    ssr_trainer.step(global_step)
  File "/home/xxxx/Desktop/work/semantic_nerf/SSR/training/trainer.py", line 882, in step
    sampled_data = self.sample_data(global_step, self.rays, self.H, self.W, no_batching=True, mode="train")
  File "/home/xxxx/Desktop/work/semantic_nerf/SSR/training/trainer.py", line 650, in sample_data
    sampled_rays = rays[index_batch, index_hw, :]
RuntimeError: CUDA error: no kernel image is available for execution on the device

Sampling to get the training set (180 images)

Where can I find the code related to this part of the description (sampling 180 images as the training set)?
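
As an illustration only: another issue in this list mentions 900 frames per sequence, and a `sample_step` key appears in the config lookup quoted in a later traceback, so 180 training images would be consistent with taking every fifth frame. The step of 5, the test offset, and the function name `split_indices` are assumptions here, not something confirmed by the repository.

```python
def split_indices(total, step):
    """Take every `step`-th frame for training and offset frames for testing."""
    train_ids = list(range(0, total, step))
    # Assumed test split: frames halfway between training frames.
    test_ids = [i + step // 2 for i in train_ids if i + step // 2 < total]
    return train_ids, test_ids

train_ids, test_ids = split_indices(total=900, step=5)
print(len(train_ids), train_ids[:4])  # 180 [0, 5, 10, 15]
print(len(test_ids), test_ids[:4])    # 180 [2, 7, 12, 17]
```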

Camera Intrinsic

Thank you for your dataset! Where can I find the camera intrinsic matrix K?
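
For a simulated pinhole camera, K can usually be derived from the image size and the horizontal field of view instead of being read from a file. The sketch below uses the standard relation fx = (W / 2) / tan(hfov / 2); the 640x480 resolution and 90 degree field of view are placeholder values, not the dataset's confirmed settings, and `pinhole_intrinsics` is a hypothetical helper.

```python
import math

def pinhole_intrinsics(width, height, hfov_deg):
    """Return a 3x3 intrinsic matrix K for an ideal pinhole camera."""
    fx = (width / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    fy = fx  # square pixels assumed
    cx = (width - 1) / 2.0
    cy = (height - 1) / 2.0
    return [[fx, 0.0, cx],
            [0.0, fy, cy],
            [0.0, 0.0, 1.0]]

# Placeholder values: a 640x480 image with a 90 degree horizontal FOV.
K = pinhole_intrinsics(640, 480, 90.0)
print(K[0][0])
```

The principal point is placed at the centre of the pixel grid, (W - 1) / 2; some code bases use W / 2 instead, so it is worth matching whichever convention the consuming code expects.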

RuntimeError: CUDA out of memory.

I want to run this on my Windows 10 computer, whose GPU is an RTX 2060 Super (8 GB). Maybe the GPU is not good enough :), but I just want to get it to run through.

Traceback (most recent call last):
  File "train_SSR_main.py", line 225, in <module>
    train()
  File "train_SSR_main.py", line 213, in train
    ssr_trainer.step(global_step)
  File "D:\Users\admin\Documents\GitHub\semantic_nerf\SSR\training\trainer.py", line 981, in step
    rgbs, disps, deps, vis_deps, sems, vis_sems, sem_uncers, vis_sem_uncers = self.render_path(self.rays_vis, save_dir=trainsavedir)
  File "D:\Users\admin\Documents\GitHub\semantic_nerf\SSR\training\trainer.py", line 1156, in render_path
    output_dict = self.render_rays(rays[i])
  File "D:\Users\admin\Documents\GitHub\semantic_nerf\SSR\training\trainer.py", line 703, in render_rays
    all_ret = batchify_rays(fn, flat_rays, self.chunk)
  File "D:\Users\admin\Documents\GitHub\semantic_nerf\SSR\training\training_utils.py", line 10, in batchify_rays
    ret = render_fn(rays_flat[i:i + chunk])
  File "D:\Users\admin\Documents\GitHub\semantic_nerf\SSR\training\trainer.py", line 745, in volumetric_rendering
    raw_coarse = run_network(pts_coarse_sampled, viewdirs, self.ssr_net_coarse,
  File "D:\Users\admin\Documents\GitHub\semantic_nerf\SSR\models\model_utils.py", line 31, in run_network
    embedded = torch.cat([embedded, embedded_dirs], -1)
RuntimeError: CUDA out of memory. Tried to allocate 1.65 GiB (GPU 0; 8.00 GiB total capacity; 5.00 GiB already allocated; 910.21 MiB free; 5.14 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

After searching on Google, I know I need to reduce the batch_size or the chunk size, but I don't know where in the project code to edit them, or what reduced values would be appropriate.
And which one should I reduce first, batch_size or chunk size?

Thanks!!!!!!

Edit: I'm a beginner in NeRF. Apologies if it's a dumb question :)
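
The traceback shows `batchify_rays` slicing the rays with `rays_flat[i:i + chunk]`, which suggests that the chunk size caps how many rays are rendered per call. The pure-Python sketch below illustrates that pattern only; `batchify` and `fake_render` are hypothetical stand-ins, and the real function may differ in how it collects its outputs.

```python
def batchify(fn, items, chunk):
    """Apply `fn` to `items` in slices of at most `chunk` elements."""
    if chunk < 1:
        raise ValueError("chunk must be a positive integer")
    results = []
    for start in range(0, len(items), chunk):
        # Only one slice is processed at a time, so peak memory
        # scales with `chunk` rather than with len(items).
        results.extend(fn(items[start:start + chunk]))
    return results

call_sizes = []

def fake_render(rays):
    """Stand-in for a renderer; records the size of each batch it receives."""
    call_sizes.append(len(rays))
    return [r * 2 for r in rays]

rendered = batchify(fake_render, list(range(10)), chunk=4)
print(rendered, call_sizes)
```

Because rendering a full validation image goes through this chunked path, lowering the chunk-related settings is usually the first thing to try for out-of-memory errors at validation time, while the training batch size mainly affects the training step.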

How do you pre-render the Replica dataset, and how can we render our own dataset

Hi,
thanks for your code and efforts! I have two questions about pre-rendering the dataset.

1. With the NeRF dataset we can take photos and get poses directly through the library. How did you pre-render the Replica dataset that you provided?
2. How can we pre-render our own dataset for use with semantic-nerf? What should I prepare for my own dataset?

Many thanks for your reply!

About camera poses

Hi, I wonder what the camera pose format in traj_w_c.txt is. Does the "*_w_c" in the filename suggest that it is in world-to-camera format? Thank you very much.
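
Whichever convention the file uses, the other one is simply the inverse rigid transform, so a quick experiment is to invert a pose and see which version places the camera where the images suggest. In a camera-to-world matrix the last column holds the camera position in world coordinates. The NumPy sketch below uses a made-up pose; `invert_rigid` is a hypothetical helper.

```python
import numpy as np

def invert_rigid(T):
    """Invert a 4x4 rigid transform using R^T and -R^T t."""
    R, t = T[:3, :3], T[:3, 3]
    T_inv = np.eye(4)
    T_inv[:3, :3] = R.T
    T_inv[:3, 3] = -R.T @ t
    return T_inv

# Made-up pose: 90 degree rotation about z, translation (1, 2, 3).
T = np.array([[0.0, -1.0, 0.0, 1.0],
              [1.0,  0.0, 0.0, 2.0],
              [0.0,  0.0, 1.0, 3.0],
              [0.0,  0.0, 0.0, 1.0]])
T_inv = invert_rigid(T)
print(T_inv[:3, 3])
```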

How to set some semantic labels to void

Thank you for your awesome work!!!

I plan to set the semantics of some instances to void when training the model, because I don't want those instances to be segmented. How can I do that?

Thanks!!!
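
One generic approach is to remap the unwanted class ids to the void id in the label images before training, so that those pixels are treated like any other unlabelled region. The sketch below assumes void is represented by a single id (0 here, which is an assumption) and uses nested lists in place of label images; `remap_to_void` is a hypothetical helper.

```python
def remap_to_void(label_rows, void_classes, void_id=0):
    """Replace every label listed in `void_classes` with `void_id`."""
    drop = set(void_classes)
    return [[void_id if v in drop else v for v in row] for row in label_rows]

# Toy 2x4 label image; classes 3 and 7 should not be segmented.
labels = [[1, 3, 3, 2],
          [7, 2, 1, 3]]
masked = remap_to_void(labels, void_classes=[3, 7])
print(masked)  # [[1, 0, 0, 2], [0, 2, 1, 0]]
```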

Question about data format

I want to render a replica dataset with the exact same format as used in iMap.

So far I was able to generate a smooth trajectory, navigate the agent along it, and store the respective depth maps and images which look nice. However, I'm failing when trying to run a method on it, while it works perfectly fine on the replica data from iMap.

I'm a little bit confused about how exactly to test/verify that my poses are legit. The snippet I used to get them assuming that there's a legit trajectory:

  for ix, point in enumerate(tqdm(path_points)):
      if ix >= len(path_points) - 1:
          break

      tangent = path_points[ix + 1] - point
      agent_state.position = point

      tangent_orientation_matrix = mn.Matrix4.look_at(point, point + tangent, np.array([0, 1.0, 0]))
      tangent_orientation_q = mn.Quaternion.from_matrix(tangent_orientation_matrix.rotation())
      agent_state.rotation = utils.quat_from_magnum(tangent_orientation_q)
      agent.set_state(agent_state)

      pose = np.eye(4)
      pose[:3, :3] = qt.as_rotation_matrix(agent_state.rotation)
      pose[:3, 3] = agent_state.position
      poses.append(pose)
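
A basic sanity check that can be run on each generated pose is sketched below: the rotation block should be orthonormal with determinant +1, and the last row should be [0, 0, 0, 1]. `check_pose` is a hypothetical helper, and the tolerance is an arbitrary choice.

```python
import numpy as np

def check_pose(pose, tol=1e-6):
    """Return a list of problems found in a 4x4 rigid pose (empty if none)."""
    problems = []
    R = pose[:3, :3]
    if not np.allclose(R @ R.T, np.eye(3), atol=tol):
        problems.append("rotation block is not orthonormal")
    if not np.isclose(np.linalg.det(R), 1.0, atol=tol):
        problems.append("rotation determinant is not +1")
    if not np.allclose(pose[3], [0.0, 0.0, 0.0, 1.0], atol=tol):
        problems.append("last row is not [0, 0, 0, 1]")
    return problems

good = np.eye(4)
bad = np.eye(4)
bad[0, 0] = -1.0  # a reflection: orthonormal but determinant -1
print(check_pose(good), check_pose(bad))
```

Passing this check only shows that the matrices are valid rigid transforms. It does not show that the camera axis convention matches what the downstream method expects, which has to be verified separately, for example by reprojecting depth from two nearby frames into a common frame.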

Pre-rendered Replica data can't be downloaded

Thank you very much for your work and for sharing it.
However, when I download the pre-rendered Replica data, the download fails right at the end.
Is there any other way to download it?

Question about label propagation task

Thanks for publishing your code! When I ran your code for the label propagation task, I found that the semantic loss is NaN because gt_label is almost entirely 0 after a single click. You mention in #3 that "we do not apply any loss on the void regions hence the network is able to predict arbitrary classes without penalty (though in fact it tend to predict some reasonable classes based on the similarity in appearance or geometry). And the void region also does not contribute to the evaluation metrics".
So it is easy to understand why the semantic loss is NaN. However, in this case, how can the network be trained to achieve the results you demonstrate? What should I do?
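
A mean taken over zero labelled pixels is 0 / 0, which is where a NaN typically comes from when a sampled batch happens to contain only void pixels. The pure-Python sketch below shows one common guard: return zero loss for such a batch. It assumes void is id 0 and is not the repository's loss function; `masked_cross_entropy` is a hypothetical name.

```python
import math

def masked_cross_entropy(probs, labels, void_id=0):
    """Mean negative log-likelihood over pixels whose label is not void."""
    losses = [-math.log(p[y]) for p, y in zip(probs, labels) if y != void_id]
    if not losses:
        return 0.0  # no supervised pixel in this batch: contribute nothing
    return sum(losses) / len(losses)

# Each row holds per-class probabilities for one pixel (3 classes, 0 = void).
probs = [[0.2, 0.5, 0.3], [0.1, 0.1, 0.8], [0.3, 0.3, 0.4]]
all_void = masked_cross_entropy(probs, [0, 0, 0])
one_label = masked_cross_entropy(probs, [0, 2, 0])
print(all_void, one_label)
```

An alternative with the same effect is to make sure every batch includes at least some of the clicked pixels, so that the semantic term always has something to average over.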

Questions about Multi-view Semantic Fusion

Hi, I am a sincere follower of your work! Your semantic-NeRF is a very surprising work!

I wonder if you could release the Multi-view Semantic Fusion code?

KeyError when training on ScanNet

The error is caused by a missing sample_step key:

Experiment GPU is 0.
INFO - 2022-08-30 21:44:12,892 - trainer - Using gpu's: 0
----- ScanNet Dataset with NYUv2-40 Conventions-----
processing ScanNet scene: scene0010_00
Traceback (most recent call last):
  File "train_SSR_main.py", line 224, in <module>
    train()
  File "train_SSR_main.py", line 168, in train
    sample_step=config["experiment"]["sample_step"],
KeyError: 'sample_step'
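
The traceback shows the value being read as `config["experiment"]["sample_step"]`, so the likely remedy is to add a `sample_step` entry under `experiment` in the ScanNet config file. The sketch below shows a defensive lookup on an already parsed config dictionary; the value 5, the fallback, and the helper name `get_sample_step` are illustrative assumptions.

```python
def get_sample_step(config, default=None):
    """Read experiment.sample_step, failing with a clearer message if absent."""
    experiment = config.get("experiment", {})
    if "sample_step" in experiment:
        return experiment["sample_step"]
    if default is not None:
        return default
    raise KeyError("add 'sample_step' under 'experiment' in the config file")

# Hypothetical parsed configs (e.g. the result of loading the YAML file).
complete = {"experiment": {"sample_step": 5}}
incomplete = {"experiment": {}}
print(get_sample_step(complete), get_sample_step(incomplete, default=1))  # 5 1
```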

about replica's bounding box

How can I define the bounding box? For example, for llff: self.scene_bbox = torch.tensor([[-1.5, -1.67, -1.0], [1.5, 1.67, 1.0]]).
But I really don't know exactly what it should be or how large is suitable...

I want to test on another NeRF... please help me...

best wishes
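
One rough heuristic, offered as a starting point and not as the dataset's official bounds, is to take the axis-aligned box around the camera positions (the translation part of each camera-to-world pose) and pad it by a margin that covers the farthest visible geometry. The positions and margin below are made up, and `bbox_from_positions` is a hypothetical helper.

```python
def bbox_from_positions(positions, margin):
    """Axis-aligned box around 3D points, expanded by `margin` on each side."""
    mins = [min(p[k] for p in positions) - margin for k in range(3)]
    maxs = [max(p[k] for p in positions) + margin for k in range(3)]
    return [mins, maxs]

# Made-up camera positions in scene units.
cam_positions = [(0.0, 0.5, 1.0), (2.0, -0.5, 1.5), (1.0, 0.0, 0.5)]
scene_bbox = bbox_from_positions(cam_positions, margin=1.0)
print(scene_bbox)  # [[-1.0, -1.5, -0.5], [3.0, 1.5, 2.5]]
```

If depth maps are available, the maximum observed depth is a reasonable guide for choosing the margin; a box that is too tight clips geometry, while one that is too loose wastes samples.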

Generate per scene semantic img

The code is wonderful, but I have some trouble generating per-scene semantic images with this code:
o3d_mesh_canonical_clean.vertex_colors = o3d.utility.Vector3dVector(v_colors/255.0)
Could you please give me some advice?

Semantic Label Denoising

Thanks for your work!
Section 4.4 of the paper, Semantic Fusion, includes further experiments on Semantic Label Denoising and Super-Resolution. How can I get the processed data? Could you upload the code that processes the data?
Looking forward to your reply!
