minghanqin / langsplat Goto Github PK


451.0 19.0 44.0 20.91 MB

Official implementation of the paper "LangSplat: 3D Language Gaussian Splatting" [CVPR2024 Highlight]

Home Page: https://langsplat.github.io/

License: Other

Python 98.86% Shell 1.14%

3d 3d-gaussian-splatting 3d-reconstruction language

langsplat's Issues

different effects between Fig1 in the paper and the video demo on the website

Amazing results with LangSplat! I'm writing because I still have some questions about your work.
As shown on your official website, the rendered feature map is stable across views, with color representing the different features. However, the results in Fig. 1 of your paper do not seem to show the same multi-view consistency. How did you achieve this consistency in the video?

How to use eval code for 3D-OVS data

Hi,
Thanks for the great work and for releasing the code. As far as I can tell, the evaluation code currently supports only the LERF dataset, because GT labels are provided for it. How can I evaluate the results on the 3D-OVS dataset? Thanks!

display result

Thank you very much for your excellent work!
However, while following this project, I could not work out how to display results like those in your demo. Could you please tell me how to do it? Thank you!

setting up conda environment fail

First of all, congrats on your CVPR2024 Highlight!

I have been facing issues setting up the conda environment.
Do I have to do anything besides conda env create --file environment.yml?

I encountered errors similar to those in #13 but have not been able to solve them.

If there is an alternative way to set up the environment, I'd like to know how.
A step-by-step explanation of the setup would also be very helpful.
I'm currently using cudatoolkit version 11.7.

3D Semantic Mesh output

Hi,

Thanks a lot for sharing this great work. I was curious about how to generate a 3D semantic point cloud and mesh.

The point cloud saved in the output folder doesn't have the semantic colours.

Is there any code already available for this in the repository?

Preprocess Error

Hi, I encountered the following error when I ran preprocess.py. Any idea how to solve it? Thank you!

(langsplat) ➜  LangSplat git:(main) ✗ python preprocess.py --dataset_path ./data/snacks 
[ INFO ] Encountered quite large input images (>1080P), rescaling to 1080P.
 If this is not desired, please explicitly specify '--resolution/-r' as 1
Traceback (most recent call last):  
  File "preprocess.py", line 126, in create
    img_embed, seg_map = _embed_clip_sam_tiles(img.unsqueeze(0), sam_encoder)
  File "preprocess.py", line 178, in _embed_clip_sam_tiles
    seg_images, seg_map = sam_encoder(aug_imgs)
  File "preprocess.py", line 299, in sam_encoder
    masks_default, masks_s, masks_m, masks_l = mask_generator.generate(image)
ValueError: too many values to unpack (expected 4)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "preprocess.py", line 404, in <module>
    create(imgs, data_list, save_folder)
  File "preprocess.py", line 128, in create
    raise ValueError(timer)
ValueError: 1
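
The first traceback shows that mask_generator.generate(image) returned something other than the four mask lists that preprocess.py unpacks. One possible cause, not confirmed in this thread, is that the stock segment_anything package (whose automatic mask generator returns a single list of mask records) is being imported instead of the modified segment-anything-langsplat submodule. Below is a minimal sketch of a guard that makes such a mismatch explicit. The helper unpack_mask_levels and the stand-in data are hypothetical and not part of the repository.

```python
def unpack_mask_levels(result):
    """Return (default, small, medium, large) or raise a descriptive error.

    Hypothetical helper: it only inspects the shape of whatever the mask
    generator returned, so any stand-in object can be passed in.
    """
    if isinstance(result, tuple) and len(result) == 4:
        return result
    size = len(result) if hasattr(result, "__len__") else "unknown"
    raise TypeError(
        f"expected a 4-tuple of mask lists, got {type(result).__name__} "
        f"of length {size}; check which segment-anything package is imported"
    )


# Stand-in outputs: a multi-level result and a flat list of mask records.
multi_level = ([{"area": 1}], [], [{"area": 2}], [])
flat_list = [{"area": i} for i in range(10)]

default, small, medium, large = unpack_mask_levels(multi_level)

error_message = ""
try:
    unpack_mask_levels(flat_list)
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

Wrapping the call this way does not fix the environment, but it replaces the bare "ValueError: 1" with a message that points at the likely source.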

Ground-truth data for Evaluation

Hi @minghanqin , this is an interesting work and the performance is impressive! 🙂

In the paper you mention that:

"we extend the LERF dataset by annotating ground truth masks for textual queries, enabling the evaluation of the
open-vocabulary 3D semantic segmentation on the LERF dataset....."
"Therefore, we
further manually annotated additional challenging localization samples to better evaluate method performance."

Could you please share your newly annotated dataset (and if possible, the eval code too), so that it's possible to have a fair comparison with the LangSplat method?

Thank you!
Yash

Where is your modified diff_gaussian_rasterization?

From gaussian_renderer/__init__.py I see that you modified the CUDA operations to rasterize language features. Could you provide the code for your modified package? I cannot find it. Thanks for your great work!

Rendered Segmentation Map is All Black

Thanks for your excellent work. I noticed the rendered segmentation maps generated by render.py are all black, no matter whether on the sofa dataset provided with the pre-trained model or my dataset. Could you help me fix this?

How to calculate mIoU for the original LERF model?

Hi, thanks for the work.
I see the metric table in the paper, but I am puzzled about how to render the relevancy map for LERF, because LERF doesn't document how to render it. Would you share the corresponding script?

How fast is the training process of 3D field and autoencoder?

~~Great paper. Can the autoencoder be trained in real time? Or is the usage scenario of this model limited by the scene-wise language autoencoder?~~

Never mind, stupid question.

What I really want to know is: how fast is the training process from a set of images to the language 3D field?

And how many images are needed?

I am wondering if this method can be applied to real time robot navigation.

About the input format "chkpnt30000"

A quick question about where to find chkpnt30000.pth. In the original 3D Gaussian Splatting, we don't get such an output. Can you tell me what's inside this file and how we can get it?
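
The tracebacks elsewhere in this list show train.py reading args.checkpoint_iterations and args.start_checkpoint, which suggests the file is written by an RGB training run when a checkpoint iteration is requested. Here is a minimal sketch, assuming the flag is spelled --checkpoint_iterations and the file is named chkpnt<iteration>.pth inside the model directory. Both are inferred rather than stated in this thread, so check them against your copy of train.py. The paths are placeholders.

```python
from pathlib import Path


def checkpoint_path(model_dir, iteration):
    """Assumed naming scheme: <model_dir>/chkpnt<iteration>.pth."""
    return Path(model_dir) / f"chkpnt{iteration}.pth"


def rgb_training_command(dataset_path, model_dir, iteration=30000):
    """Build the argv for an RGB training run that also saves a checkpoint."""
    return [
        "python", "train.py",
        "-s", str(dataset_path),
        "-m", str(model_dir),
        "--checkpoint_iterations", str(iteration),  # assumed flag name
    ]


cmd = rgb_training_command("data/sofa", "output/sofa")
print(" ".join(cmd))
print(checkpoint_path("output/sofa", 30000))
```

The resulting path is what would then be passed to --start_checkpoint in the language-feature training step.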

How can we evaluate the lerf segmentation and localization results as you mentioned in your paper?

Thanks for your great work! Since LERF doesn't implement a custom ns-render under the nerfstudio framework, I wonder how we can evaluate its segmentation results or relevancy maps at specific viewpoints. Thanks!

Can someone help me with the following? After I set up the environment, I cannot understand how to start quickly

The problems I have encountered are:

1. As mentioned earlier, placing the pre-trained model in the output folder:
   (a) Does this apply only to the pre-trained model from Baidu Cloud, or also to preprocessed_dataset?
   (b) Where should the output folder be placed?

2. Do I need to modify the quick-start commands? For example, $CASEAME.

Thank you very much for your help.

IndexError: index 4 is out of bounds for dimension 1 with size 4

File "LangSplat/scene/cameras.py", line 73, in get_language_feature
seg = seg_map[:, y, x].squeeze(-1).long()
IndexError: index 4 is out of bounds for dimension 1 with size 4

Can you fix this bug?
There are also some bugs in the process.sh file.

The configuration of eval code for different scenes

I tested on 3D-OVS data. I can achieve similar results (IoU) on the sofa scene.
But the performance on other scenes is not as good as expected.

The language feature images look well trained, so the cause is probably the settings of the eval code, such as the threshold and the kernel size. Does that mean we need to tune these settings manually to achieve the best results?

Below is the sample of the bench scene, including language feature image, groundtruth and the predicted mask. Do you have any suggestion?

[Images not included: language feature image, ground truth, and predicted mask for the bench scene]
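
To check how sensitive a scene is to the threshold mentioned above, one option is to sweep it and compare the resulting IoU values. Here is a minimal sketch on toy data. The function names and the flat-list mask representation are illustrative only and do not mirror the repository's eval code, and the kernel size is left out.

```python
def iou(pred, gt):
    """Intersection over union of two equally sized binary masks (flat lists)."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 0.0


def sweep_thresholds(relevancy, gt, thresholds):
    """Return {threshold: IoU} for masks obtained by thresholding relevancy."""
    return {
        t: iou([score > t for score in relevancy], gt)
        for t in thresholds
    }


# Toy 1-D "image": relevancy scores and the ground-truth mask.
relevancy = [0.1, 0.3, 0.45, 0.6, 0.8, 0.9]
gt = [False, False, True, True, True, True]

scores = sweep_thresholds(relevancy, gt, [0.2, 0.4, 0.5, 0.7])
best = max(scores, key=scores.get)
print(scores, best)
```

Picking the threshold that maximizes IoU on the evaluation frames themselves amounts to tuning on the test set, so a sweep like this is better used to diagnose sensitivity than to report numbers.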

Can you upload the data on google drive or huggingface?

I cannot download files from baidu :(

How can I query the 3D object using 'text', similar to how lerf's demo demonstrates?

lerf website: https://www.lerf.io/. Thanks!

Setting include_feature=False leads to an illegal memory access

Hi,

I have tried to set include_feature=False for the original 3DGS training, but I encounter an error after the model runs several iterations:

Training progress:   2%|█▋                                                                      | 700/30000 [00:46<29:56, 16.31it/s, Loss=0.0869420]
[CUDA ERROR] in cuda_rasterizer/rasterizer_impl.cu
Line 415: an illegal memory access was encountered [14/02 23:44:58]
An error occured in backward. Writing snapshot_bw.dump for debugging. [14/02 23:44:58]
 [14/02 23:44:58]
Traceback (most recent call last):
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "train.py", line 231, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "train.py", line 104, in training
    loss.backward()
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 147, in backward
    raise ex
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 143, in backward
    grad_means2D, grad_colors_precomp, grad_language_feature_precomp, grad_opacities, grad_means3D, grad_cov3Ds_precomp, grad_sh, grad_scales, grad_rotations = _C.rasterize_gaussians_backward(*args)
RuntimeError: an illegal memory access was encountered

I can successfully train with the original 3DGS code.

Could you provide the preprocessed LERF dataset used in this paper?

Thanks for your great work!
I would like to know when the preprocessed LERF dataset will be available.

No loop matching the specified signature and casting was found for ufunc greater

I use the dataset you provided for training, but encounter the following issue at checkpoint 7000.

testing for iter 7000 [28/01 20:59:13]

[ITER 7000] Evaluating train: L1 0.01737641841173172 PSNR 30.760613250732423 [28/01 20:59:26]
Traceback (most recent call last):
  File "/data/LangSplat/train.py", line 231, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "/data/LangSplat/train.py", line 116, in training
    training_report(tb_writer, iteration, Ll1, loss, l1_loss, iter_start.elapsed_time(iter_end), testing_iterations, scene, render, (pipe, background, opt))
  File "/data/LangSplat/train.py", line 200, in training_report
    tb_writer.add_histogram("scene/opacity_histogram", scene.gaussians.get_opacity, iteration)
  File "/data/anaconda3/envs/langsplat/lib/python3.9/site-packages/torch/utils/tensorboard/writer.py", line 485, in add_histogram
    histogram(tag, values, bins, max_bins=max_bins), global_step, walltime
  File "/data/anaconda3/envs/langsplat/lib/python3.9/site-packages/torch/utils/tensorboard/summary.py", line 358, in histogram
    hist = make_histogram(values.astype(float), bins, max_bins)
  File "/data/anaconda3/envs/langsplat/lib/python3.9/site-packages/torch/utils/tensorboard/summary.py", line 386, in make_histogram
    cum_counts = np.cumsum(np.greater(counts, 0, dtype=np.int32))
TypeError: No loop matching the specified signature and casting was found for ufunc greater

It seems that this happened when plotting the histogram. Do you know how to resolve it?
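
The traceback ends inside TensorBoard's histogram helper rather than in LangSplat's own code. Similar errors have been reported when an older PyTorch is paired with a newer NumPy, so aligning those versions is one thing to try. That is an assumption here, since the thread does not list the installed versions. As a stopgap, the histogram call can be wrapped so that a logging failure does not abort training. The sketch below uses stand-in writer classes. Only add_histogram(tag, values, step) mirrors the call in the traceback, and the other names are hypothetical.

```python
def safe_add_histogram(writer, tag, values, step):
    """Log a histogram, but keep training alive if the writer raises TypeError."""
    try:
        writer.add_histogram(tag, values, step)
        return True
    except TypeError as exc:
        print(f"skipping histogram {tag!r} at step {step}: {exc}")
        return False


class BrokenWriter:
    """Stand-in that fails the same way as the traceback above."""

    def add_histogram(self, tag, values, step):
        raise TypeError("No loop matching the specified signature")


class RecordingWriter:
    """Stand-in that simply records what it was asked to log."""

    def __init__(self):
        self.calls = []

    def add_histogram(self, tag, values, step):
        self.calls.append((tag, step))


ok_writer = RecordingWriter()
logged = safe_add_histogram(ok_writer, "scene/opacity_histogram", [0.1, 0.9], 7000)
skipped = safe_add_histogram(BrokenWriter(), "scene/opacity_histogram", [0.1], 7000)
```

This only hides the symptom. The scalar metrics are still written, but the opacity histogram is lost until the underlying version mismatch is resolved.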

Poor results on teatime

Dear author,

Thanks for your excellent work.
I evaluated the released pretrained_model on teatime. The localization accuracy is only 0.1017.

Do you have any idea what the problem is?

2024-04-25 16:47:42,245 - teatime - INFO - trunc thresh: 0.4
INFO:teatime:trunc thresh: 0.4
2024-04-25 16:47:42,245 - teatime - INFO - iou chosen: 0.0245
INFO:teatime:iou chosen: 0.0245
2024-04-25 16:47:42,248 - teatime - INFO - chosen_lvl:
[array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0)]
INFO:teatime:chosen_lvl:
[array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0)]
2024-04-25 16:47:42,248 - teatime - INFO - Localization accuracy: 0.1017
INFO:teatime:Localization accuracy: 0.1017

Querying the most relevant 3D Gaussians

I couldn't find any code for inputting text and querying the most relevant 3D Gaussians in the code repository. Will it be provided later?

About the Ground Truth

I want to ask where the ground-truth data in the paper comes from. I used the trained decoder to get 'language_feature_dim3' and drew the (H, W, 3) array with 'plt.imshow', but the quality is worse than the GT image in the paper.
Could you tell me how you obtained and drew the GT image in the paper? Thank you!
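
The thread does not say how the paper's figures were drawn. One thing worth ruling out is value range: matplotlib's imshow clips floating-point RGB input to [0, 1], so features outside that range lose contrast. Here is a minimal sketch of per-channel min-max scaling on a nested-list image. The function name and toy data are illustrative, and this is not claimed to be the authors' visualization method.

```python
def normalize_channels(image):
    """Min-max scale each channel of an H x W x C nested list to [0, 1]."""
    channels = len(image[0][0])
    lows = [min(px[c] for row in image for px in row) for c in range(channels)]
    highs = [max(px[c] for row in image for px in row) for c in range(channels)]
    # A constant channel has zero span; divide by 1.0 so it maps to 0.
    spans = [(hi - lo) or 1.0 for lo, hi in zip(lows, highs)]
    return [
        [[(px[c] - lows[c]) / spans[c] for c in range(channels)] for px in row]
        for row in image
    ]


# Toy 1 x 2 feature map with values outside [0, 1].
features = [[[-2.0, 0.0, 4.0], [2.0, 1.0, 4.0]]]
rgb = normalize_channels(features)
print(rgb)
```

The scaled array can then be passed to plt.imshow without clipping.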

change language feature encoder to dim=4 - CUDA error: an illegal memory access was encountered

I wanted to explore the higher feature dimensions listed in the paper (in practice, the paper uses dim = 3 for the language features).
I am now trying dim = 4.

My steps:

1. Modify LangSplat/submodules/diff-gaussian-rasterization/cuda_rasterizer/config.h, changing NUM_CHANNELS_language_feature to 4, then rebuild and re-install.
2. Re-train the autoencoder, changing the last layer's dim from 3 to 4.
3. Generate the language_features_dim4 folder.
4. Train LangSplat: python train.py -s $dataset_path -m output/${casename} --start_checkpoint $dataset_path/$casename/chkpnt30000.pth --feature_level ${level}

Here is the error I got:

Traceback (most recent call last):
  File "train.py", line 240, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "train.py", line 99, in training
    gt_language_feature, language_feature_mask = viewpoint_cam.get_language_feature(language_feature_dir=dataset.lf_path, feature_level=dataset.feature_level)
  File "/datadrive/yingwei/LangSplat/scene/cameras.py", line 94, in get_language_feature
    return point_feature.cuda(), mask.cuda()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

After I set CUDA_LAUNCH_BLOCKING=1:

Traceback (most recent call last):
  File "train.py", line 240, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "train.py", line 93, in training
    render_pkg = render(viewpoint_cam, gaussians, pipe, background, opt)
  File "/datadrive/yingwei/LangSplat/gaussian_renderer/__init__.py", line 113, in render
    "visibility_filter" : radii > 0,
RuntimeError: CUDA error: an illegal memory access was encountered

I tried the fix from graphdeco-inria/gaussian-splatting#41 (comment), but it didn't work.

I can basically conclude that the issue is not in my CUDA or PyTorch setup, since everything runs smoothly when I set the language feature dim to 3.

Are there any other changes I need to make besides NUM_CHANNELS_language_feature?

How is query speed measured?

Hi, really nice work!

I wanted to ask: how did you compute the query speed in Tables 3 and 4 of the paper?

I've observed that the current implementation stores each level of language features on a different set of Gaussians. You therefore need 3 rasterization steps to obtain the 2D language features for a given point of view.
Are you taking this rasterization and the subsequent decoding time into account, or are you just measuring the query matching with the CLIP features?

Thank you!
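
Whichever stages the reported numbers include, timing them separately makes the comparison explicit. Here is a minimal sketch with placeholder workloads. The stage names are assumptions taken from the question above, and the lambdas stand in for the real rasterization, decoding, and matching calls.

```python
import time


def time_stages(stages, repeats=5):
    """Return the mean wall-clock seconds per stage over `repeats` runs."""
    totals = {name: 0.0 for name, _ in stages}
    for _ in range(repeats):
        for name, fn in stages:
            start = time.perf_counter()
            fn()
            totals[name] += time.perf_counter() - start
    return {name: total / repeats for name, total in totals.items()}


# Placeholder workloads; replace with the real calls being measured.
stages = [
    ("rasterize_3_levels", lambda: sum(range(30000))),
    ("decode_features", lambda: sum(range(10000))),
    ("match_text_query", lambda: sum(range(1000))),
]

timings = time_stages(stages)
stage_total = sum(timings.values())
timings["end_to_end"] = stage_total
for name, seconds in timings.items():
    print(f"{name}: {seconds * 1e3:.3f} ms")
```

For GPU work, the device has to be synchronized before each clock read (for example with torch.cuda.synchronize()), because CUDA kernels are launched asynchronously and would otherwise appear faster than they are.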

Inconsistency in Loss Calculation between Training and Evaluation of Autoencoder Model

Hello!

First of all, I'd like to extend my appreciation for the work put into this project.

I've been exploring the code related to training the autoencoder model, specifically within the "train.py" file. I came across an inconsistency in the calculation of loss between the training and evaluation phases.

During training, the loss is defined as follows: loss = l2loss + cosloss * 0.001. However, during evaluation, the loss seems to be calculated slightly differently: loss = l2_loss(outputs, data) + cos_loss(outputs, data), where the cos_loss term is not multiplied by 0.001.

I'm curious to understand whether this difference is intentional or if it might be an oversight. If intentional, I'd appreciate some insight into the rationale behind this choice.

Thanks!
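
To make the difference concrete, the two formulas quoted above can be evaluated on the same pair of vectors. This is a minimal sketch. The definition of cos_loss as one minus cosine similarity is an assumption, since the thread does not show how the repository defines it.

```python
import math


def l2_loss(output, target):
    """Mean squared error between two equal-length vectors."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(output)


def cos_loss(output, target):
    """One minus cosine similarity (assumed definition)."""
    dot = sum(o * t for o, t in zip(output, target))
    norm = math.sqrt(sum(o * o for o in output)) * math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / norm


output = [1.0, 0.0]
target = [0.0, 1.0]

# Training formula from the issue: cosine term scaled by 0.001.
train_loss = l2_loss(output, target) + cos_loss(output, target) * 0.001
# Evaluation formula from the issue: cosine term unscaled.
eval_loss = l2_loss(output, target) + cos_loss(output, target)
print(train_loss, eval_loss)
```

For these orthogonal vectors the training value is about 1.001 and the evaluation value is 2.0, so losses logged in the two phases are not directly comparable even when the model is identical.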

Preprocess Too slow

Is there any way to speed up the preprocessing? Currently, it takes almost 2 minutes per image on an A6000.

Script for Step 2 (Train the Autoencoder and get the lower-dims Feature.)

Dear Author,

Thanks for your great work!
I am wondering how to conduct step 2, could you please provide a more explicit script for this part?

Many thanks in advance.

Bests,
Runsong

Got blank heatmap when running evaluation code

Hi,

I tried to run eval.sh for evaluation, but the results are all blank when using the provided pre-trained autoencoder (downloaded from 'pretrained_model/ckpt') and language-embedded Gaussian splat (downloaded from 'pretrained_model/output').

I also tried to follow the process.sh file to train the autoencoder and the language-embedded Gaussian splat myself, but the heatmaps are still blank.

All images under LangSplat/eval_result/teatime/***/heatmap look like this:

However, the renders I got are all looking good:
LangSplat/output/teatime/teatime_1/train/ours_None/renders/00000.png

Has anyone gotten the evaluation to work, or run into the same problem?

The experiment reproduction question for 3D semantic segmentation on the 3D-OVS

Thank you so much for sharing your great work.

When I used your code and pre-trained weights to reproduce the segmentation results for the "sofa" scene, my mIoU came out at just over 70. My current method is to compute the mIoU separately for each of the three relevancy maps generated for a given "positive" query and then average them. I don't know if this way of calculating it is correct.

In addition, the segmentation map generated for the 'grey sofa' in the 'sofa' scene is completely black. Do you have any suggestions to help me improve my results?
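
Averaging over all three relevancy maps and selecting a single level per query can give very different numbers. The chosen_lvl lines in the teatime log above hint that the evaluation picks one level per query, though this thread does not confirm the selection rule. Here is a minimal sketch comparing the two on toy data, assuming the level with the highest peak relevancy is the one selected. The threshold of 0.5 and all names are illustrative.

```python
def iou(pred, gt):
    """Intersection over union of two equally sized binary masks (flat lists)."""
    inter = sum(1 for p, g in zip(pred, gt) if p and g)
    union = sum(1 for p, g in zip(pred, gt) if p or g)
    return inter / union if union else 0.0


def mask(relevancy, thresh=0.5):
    """Binary mask of pixels whose relevancy exceeds the threshold."""
    return [score > thresh for score in relevancy]


gt = [False, True, True, False]
levels = [
    [0.1, 0.9, 0.8, 0.2],  # level 0: confident and correct
    [0.6, 0.7, 0.4, 0.1],  # level 1: partially correct
    [0.2, 0.3, 0.2, 0.1],  # level 2: nothing above the threshold
]

per_level = [iou(mask(r), gt) for r in levels]
mean_over_levels = sum(per_level) / len(per_level)

# Assumed selection rule: keep the level with the highest peak relevancy.
best_level = max(range(len(levels)), key=lambda i: max(levels[i]))
best_level_iou = per_level[best_level]
print(per_level, mean_over_levels, best_level, best_level_iou)
```

In this toy case the average is pulled down by a level that produces an empty mask, which is also one way a single query can end up with a completely black segmentation map.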

A typo on Readme

Pretrained model link is not working

Hi there!

I'm trying to get started with your repository, but when I click on the "[Pre-trained Models]" link in the README it redirects me to the README.

Can I really query the 3D gaussians? Or I can only query the rendered images?

I have a lingering question that has been on my mind, and I was hoping you could help clarify it for me.

The focal point of the paper is "3D Scene Querying," but upon reading it, I find myself pondering whether it is feasible to query a set of 3D Gaussians.

To elaborate, let's consider a scenario where I have five million trained Gaussians representing an unfamiliar scene. My objective is to locate the position of a 'TV.'

Can I use the term 'TV' to determine the 3D spatial coordinates of the TV from this bunch of 3DGS and retrieve its corresponding image (i.e., determine the appropriate camera position for rendering)? How can you query a set of probabilistic distributions using a text prompt?

Alternatively, is my only option to query the rendered egocentric 2D image? If the TV is not present in the image, does that imply there is no way for me to ascertain where the TV is?

I appreciate your expertise and insights into this matter.
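
As an illustration of what querying the Gaussians directly could look like, the sketch below ranks Gaussians by cosine similarity between a per-Gaussian feature and a text embedding, then averages the centers of the best matches. This is not functionality confirmed to exist in the repository. It assumes each Gaussian's feature already lives in the same space as the text embedding, whereas the compressed low-dimensional features discussed in other issues would first have to be decoded. All names and data are toy stand-ins.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def locate(gaussians, text_embedding, top_k=2):
    """Average the centers of the top_k Gaussians most similar to the text."""
    ranked = sorted(
        gaussians,
        key=lambda g: cosine(g["feature"], text_embedding),
        reverse=True,
    )
    top = ranked[:top_k]
    return tuple(sum(g["center"][i] for g in top) / len(top) for i in range(3))


gaussians = [
    {"center": (0.0, 0.0, 0.0), "feature": (1.0, 0.0)},
    {"center": (2.0, 0.0, 0.0), "feature": (0.9, 0.1)},
    {"center": (9.0, 9.0, 9.0), "feature": (0.0, 1.0)},
]
tv_embedding = (1.0, 0.0)  # toy stand-in for an encoded "TV" prompt
position = locate(gaussians, tv_embedding)
print(position)
```

A position found this way could then be used to choose a camera pose that looks at it, which addresses the case where the object is not visible in the current view.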

incorrect filename and missing glm when building

Problem 1:

Problem 2:

missing glm package here:
https://github.com/g-truc/glm/tree/5c46b9c07008ae65cb81ab79cd677ecc1934b903

Intermittent memory errors while running preprocess.py

While running preprocess.py, I have received intermittent memory access errors.

I have been using the ramen dataset from the link provided and running python preprocess.py --dataset_path data/ramen.

At random, it gives me memory errors such as double free or corruption (!prev) Aborted (core dumped) and corrupted size vs. prev_size Aborted, and additionally a PyTorch error, patched in PyTorch 2.0.1, that occurs when running on a multithreaded CPU.

Setting the threads to 1 with torch.set_num_threads(1) fixes the pytorch issue but makes the preprocessing speed very slow.

I was wondering if there was any advice on how to fix these issues happening during the preprocess script execution.

Installation problem: File 'submodules/langsplat-rasterization' does not exist.

Hi,
First of all, thanks for your amazing work here.
I have a couple of questions about setting LangSplat up and need your guidance.

When running conda env create --file environment.yml, I encountered the error below. Is this submodule missing from the repo?

Pip subprocess error:
ERROR: Invalid requirement: 'submodules/langsplat-rasterization' (from line 2 of /mnt/c/Users/lzlal/LangSplat/condaenv.75uxspmj.requirements.txt)
Hint: It looks like a path. File 'submodules/langsplat-rasterization' does not exist.

failed

CondaEnvException: Pip failed
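
Pip reports that the path submodules/langsplat-rasterization does not exist. Two possible causes, neither confirmed in this thread, are that the submodules were not fetched (for example a clone without --recursive, which git submodule update --init --recursive would address) or that the directory name in environment.yml differs from the one on disk. Here is a minimal sketch that lists which expected submodule directories are absent or empty. The helper name is hypothetical, and the demo runs on a temporary directory standing in for a checkout.

```python
import tempfile
from pathlib import Path


def missing_submodules(repo_root, names):
    """Return the submodule directories that are absent or empty."""
    missing = []
    for name in names:
        path = Path(repo_root) / "submodules" / name
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(name)
    return missing


# Demo: one populated submodule and one empty placeholder directory.
with tempfile.TemporaryDirectory() as root:
    populated = Path(root) / "submodules" / "segment-anything-langsplat"
    populated.mkdir(parents=True)
    (populated / "setup.py").write_text("# placeholder\n")
    (Path(root) / "submodules" / "langsplat-rasterization").mkdir()
    result = missing_submodules(
        root, ["langsplat-rasterization", "segment-anything-langsplat"]
    )

print(result)
```

Running the same check against the real checkout before conda env create shows whether the failure comes from the repository contents or from the environment file.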

I noticed that segment-anything-langsplat is one of the submodules listed in environment.yml, so I assumed it would be installed once conda env create --file environment.yml completes successfully. However, you also mention that segment-anything-langsplat must be installed, which confuses me. Do you mean that there is an additional installation process for segment-anything-langsplat, or is downloading the SAM checkpoints enough?

Once the checkpoint is downloaded, where should I place the /ckpts folder? Under LangSplat/submodules/segment-anything-langsplat, /submodules/segment-anything-langsplat/segment_anything, or LangSplat?

In addition, there seems to be something wrong in 'cfg_args': it is missing the attribute 'language_features_name'.

Would you please offer some help to solve it?
