langsplat's People

Contributors

minghanqin

langsplat's Issues

Different effects between Fig. 1 in the paper and the video demo on the website

The results from LangSplat are very impressive! I'm writing because I still have a few questions about your work.
As shown on your official website, the rendered feature map is stable across different views, with the colors representing different features. However, the effect shown in Fig. 1 of your paper does not seem to achieve this multi-view consistency. How do you achieve this consistency in the video?

How to use eval code for 3D-OVS data

Hi,
Thanks for the great work, and I appreciate the released code. As for the evaluation code, I think it currently only supports the LERF dataset because it provides the GT labels. How can I evaluate results on the 3D-OVS dataset? Thanks!

display result

Thank you very much for your excellent work!
However, while following this project, I am unsure how to produce a visualization like the one in your demo. Could you please tell me how to do it? Thank you!

Setting up conda environment fails

First of all, congrats on your CVPR 2024 Highlight!

I have been having trouble setting up the conda environment.
Do I have to do something besides conda env create --file environment.yml?

I encountered errors similar to #13 but have not been able to solve them.

If there is any alternative way to set up the environment, I'd like to know how.
A step-by-step explanation of the setup would also be very helpful.
I am currently using cudatoolkit version 11.7.

3D Semantic Mesh output

Hi,

Thanks a lot for sharing this great work. I was curious how to generate a 3D semantic point cloud and mesh.

The point cloud saved/generated in the output folder doesn't have the semantic colours.

Is there any code already available in this repository for this?

Preprocess Error

Hi, I encountered the following error when I ran preprocess.py. Any idea how to solve it? Thank you!

(langsplat) ➜  LangSplat git:(main) ✗ python preprocess.py --dataset_path ./data/snacks 
[ INFO ] Encountered quite large input images (>1080P), rescaling to 1080P.
 If this is not desired, please explicitly specify '--resolution/-r' as 1
Traceback (most recent call last):  
  File "preprocess.py", line 126, in create
    img_embed, seg_map = _embed_clip_sam_tiles(img.unsqueeze(0), sam_encoder)
  File "preprocess.py", line 178, in _embed_clip_sam_tiles
    seg_images, seg_map = sam_encoder(aug_imgs)
  File "preprocess.py", line 299, in sam_encoder
    masks_default, masks_s, masks_m, masks_l = mask_generator.generate(image)
ValueError: too many values to unpack (expected 4)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "preprocess.py", line 404, in <module>
    create(imgs, data_list, save_folder)
  File "preprocess.py", line 128, in create
    raise ValueError(timer)
ValueError: 1
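
A likely cause, offered only as a guess: the stock segment_anything package's SamAutomaticMaskGenerator.generate returns a single list of mask dicts, while preprocess.py unpacks four mask sets, which the bundled segment-anything-langsplat fork presumably returns. A minimal sanity check along these lines (variable names mirror the traceback above; this is not official project code) would make the failure mode explicit:

out = mask_generator.generate(image)
if not (isinstance(out, tuple) and len(out) == 4):
    # Stock SAM returns a list of mask dicts; the fork is expected to return 4 mask sets.
    raise RuntimeError(
        "mask_generator.generate() did not return 4 mask sets; "
        "make sure segment-anything-langsplat (not stock segment_anything) is installed."
    )
masks_default, masks_s, masks_m, masks_l = out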

Ground-truth data for Evaluation

Hi @minghanqin , this is an interesting work and the performance is impressive! 🙂

In the paper you mention that:

"we extend the LERF dataset by annotating ground truth masks for textual queries, enabling the evaluation of the
open-vocabulary 3D semantic segmentation on the LERF dataset....."
"Therefore, we
further manually annotated additional challenging localization samples to better evaluate method performance."

Could you please share your newly annotated dataset (and if possible, the eval code too), so that it's possible to have a fair comparison with the LangSplat method?

Thank you!
Yash

Rendered Segmentation Map is All Black

Thanks for your excellent work. I noticed that the rendered segmentation maps generated by render.py are all black, both on the sofa dataset provided with the pre-trained model and on my own dataset. Could you help me fix this?

How to calculate mIoU for the original LERF model?

Hi, thanks for the work.
I see the metrics table in the paper, but I am puzzled about how to render the relevancy map for LERF, because LERF does not provide a way to render it. Would you share the corresponding script?

How fast is the training process of the 3D field and autoencoder?

~~Great paper. Can the autoencoder be trained in real time? Or is the usage scenario of this model limited by the scene-wise language autoencoder?~~

Never mind, stupid question.

What I really want to know is: how fast is the training process, from a set of images to the language 3D field?

And how many images are needed?

I am wondering if this method can be applied to real-time robot navigation.

About the input format "chkpnt30000"

A quick question about where to find "chkpnt30000.pth". In the original 3D Gaussian Splatting, we don't get such an output by default. Can you tell me what's inside this file or how we can get it?
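
For what it's worth, in the original 3DGS codebase a file like chkpnt30000.pth is only written when train.py is run with --checkpoint_iterations 30000. The sketch below paraphrases, from memory, how such a checkpoint is typically saved and restored there, so treat the exact calls as an assumption rather than verified LangSplat code:

import torch

def save_3dgs_checkpoint(gaussians, iteration, model_path):
    # Roughly what the original 3DGS train.py does when `iteration` is listed in
    # --checkpoint_iterations: capture() bundles the Gaussian parameters and
    # optimizer state into a single tuple.
    torch.save((gaussians.capture(), iteration), f"{model_path}/chkpnt{iteration}.pth")

def load_3dgs_checkpoint(gaussians, training_args, path):
    # Presumably what LangSplat's --start_checkpoint consumes.
    model_params, first_iter = torch.load(path)
    gaussians.restore(model_params, training_args)
    return first_iter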

The configuration of the eval code for different scenes

I tested on 3D-OVS data. It achieves similar results (IoU) on the sofa scene,
but the performance on other scenes is not as good as expected.

I noted that the language feature images are well trained, so the cause should be the settings of the eval code, such as the threshold and the kernel size. Does this mean we need to tune these settings manually to achieve the best results?

Below is a sample from the bench scene, including the language feature image, the ground truth, and the predicted mask. Do you have any suggestions?

sample
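
In case it helps the discussion, below is an illustrative sketch of the kind of post-processing where those two knobs (the truncation threshold and the kernel size) enter; the function name and defaults are mine, not the actual eval code:

import numpy as np
import cv2  # assuming OpenCV is available, as in many segmentation eval pipelines

def relevancy_to_mask(relevancy, thresh=0.4, kernel_size=5):
    # relevancy: (H, W) map normalized to [0, 1]; thresh and kernel_size are the
    # per-scene hyperparameters being discussed above.
    mask = (relevancy > thresh).astype(np.uint8)
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)   # remove small speckles
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # fill small holes
    return mask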

Setting include_feature=False leads to illegal memory access

Hi,

I have tried to set include_feature=False for the original 3DGS training, but I encounter an error after the model runs several iterations:

Training progress:   2%|█▋                                                                      | 700/30000 [00:46<29:56, 16.31it/s, Loss=0.0869420]
[CUDA ERROR] in cuda_rasterizer/rasterizer_impl.cu
Line 415: an illegal memory access was encountered [14/02 23:44:58]
An error occured in backward. Writing snapshot_bw.dump for debugging. [14/02 23:44:58]
 [14/02 23:44:58]
Traceback (most recent call last):
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
    cli.main()
  File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
    run()
  File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
    runpy.run_path(target, run_name="__main__")
  File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
    exec(code, run_globals)
  File "train.py", line 231, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "train.py", line 104, in training
    loss.backward()
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
    torch.autograd.backward(
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
    Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/torch/autograd/function.py", line 274, in apply
    return user_fn(self, *args)
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 147, in backward
    raise ex
  File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 143, in backward
    grad_means2D, grad_colors_precomp, grad_language_feature_precomp, grad_opacities, grad_means3D, grad_cov3Ds_precomp, grad_sh, grad_scales, grad_rotations = _C.rasterize_gaussians_backward(*args)
RuntimeError: an illegal memory access was encountered

I can successfully train with the original 3DGS code.

No loop matching the specified signature and casting was found for ufunc greater

I used the dataset you provided for training, but encountered the following issue at iteration 7000.

testing for iter 7000 [28/01 20:59:13]

[ITER 7000] Evaluating train: L1 0.01737641841173172 PSNR 30.760613250732423 [28/01 20:59:26]
Traceback (most recent call last):
  File "/data/LangSplat/train.py", line 231, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "/data/LangSplat/train.py", line 116, in training
    training_report(tb_writer, iteration, Ll1, loss, l1_loss, iter_start.elapsed_time(iter_end), testing_iterations, scene, render, (pipe, background, opt))
  File "/data/LangSplat/train.py", line 200, in training_report
    tb_writer.add_histogram("scene/opacity_histogram", scene.gaussians.get_opacity, iteration)
  File "/data/anaconda3/envs/langsplat/lib/python3.9/site-packages/torch/utils/tensorboard/writer.py", line 485, in add_histogram
    histogram(tag, values, bins, max_bins=max_bins), global_step, walltime
  File "/data/anaconda3/envs/langsplat/lib/python3.9/site-packages/torch/utils/tensorboard/summary.py", line 358, in histogram
    hist = make_histogram(values.astype(float), bins, max_bins)
  File "/data/anaconda3/envs/langsplat/lib/python3.9/site-packages/torch/utils/tensorboard/summary.py", line 386, in make_histogram
    cum_counts = np.cumsum(np.greater(counts, 0, dtype=np.int32))
TypeError: No loop matching the specified signature and casting was found for ufunc greater

It seems that this happened when plotting the histogram. Do you know how to resolve it?
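
One workaround I'm aware of (a guess, since it depends on the NumPy/TensorBoard versions involved): recent NumPy releases reject the dtype= argument on comparison ufuncs, which older torch.utils.tensorboard builds still pass in make_histogram. Pinning an older NumPy, upgrading PyTorch/TensorBoard, or patching the failing line along these lines should be equivalent:

import numpy as np

counts = np.asarray([0, 3, 1, 0])  # dummy data just to make the snippet self-contained

# original line in torch/utils/tensorboard/summary.py (fails on recent NumPy):
#   cum_counts = np.cumsum(np.greater(counts, 0, dtype=np.int32))
# equivalent replacement:
cum_counts = np.cumsum(np.greater(counts, 0).astype(np.int32))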

Poor results on teatime

Dear author,

Thanks for your excellent work.
I evaluated the released pretrained_model on teatime. The localization accuracy is only 0.1017.

Do you have any idea what the problem might be?

2024-04-25 16:47:42,245 - teatime - INFO - trunc thresh: 0.4
INFO:teatime:trunc thresh: 0.4
2024-04-25 16:47:42,245 - teatime - INFO - iou chosen: 0.0245
INFO:teatime:iou chosen: 0.0245
2024-04-25 16:47:42,248 - teatime - INFO - chosen_lvl:
[array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0)]
INFO:teatime:chosen_lvl:
[array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0)]
2024-04-25 16:47:42,248 - teatime - INFO - Localization accuracy: 0.1017
INFO:teatime:Localization accuracy: 0.1017

About the Ground Truth

I want to ask where the ground-truth data in the paper comes from. I used the trained decoder to get 'language_feature_dim3' and drew the array of shape (H, W, 3) using plt.imshow, but the quality is worse than the GT image in the paper.
Could you tell me how you obtain and draw the GT image in the paper? Thank you!
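
One thing I am unsure about (just a guess on my part): the decoded 3-dim features are not bounded to [0, 1], so plt.imshow clips them. For reference, this is roughly how I would min-max normalize each channel before displaying (my own helper, not project code):

import numpy as np
import matplotlib.pyplot as plt

def show_language_feature(feat_hw3):
    # feat_hw3: (H, W, 3) decoded language feature; rescale each channel to [0, 1].
    f = feat_hw3.astype(np.float32)
    lo = f.min(axis=(0, 1), keepdims=True)
    hi = f.max(axis=(0, 1), keepdims=True)
    plt.imshow((f - lo) / (hi - lo + 1e-8))
    plt.axis("off")
    plt.show()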

Changing the language feature encoder to dim=4: CUDA error: an illegal memory access was encountered

I tried to explore the higher feature dimensions listed in the paper (in practice, the paper uses language features with dim = 3).
I am now trying dim = 4.

my steps:

  1. modify the LangSplat/submodules/diff-gaussian-rasterization/cuda_rasterizer/config.h, changing NUM_CHANNELS_language_feature to 4, rebuild and re-install
  2. re-train the autoencoder - modify the last layer's dim from 3 to 4
  3. generated the language_features_dim4 folder
  4. train the langsplat: python train.py -s $dataset_path -m output/${casename} --start_checkpoint $dataset_path/$casename/chkpnt30000.pth --feature_level ${level}

Here is the error I got:

Traceback (most recent call last):
  File "train.py", line 240, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "train.py", line 99, in training
    gt_language_feature, language_feature_mask = viewpoint_cam.get_language_feature(language_feature_dir=dataset.lf_path, feature_level=dataset.feature_level)
  File "/datadrive/yingwei/LangSplat/scene/cameras.py", line 94, in get_language_feature
    return point_feature.cuda(), mask.cuda()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.

After I added CUDA_LAUNCH_BLOCKING=1:

Traceback (most recent call last):
  File "train.py", line 240, in <module>
    training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
  File "train.py", line 93, in training
    render_pkg = render(viewpoint_cam, gaussians, pipe, background, opt)
  File "/datadrive/yingwei/LangSplat/gaussian_renderer/__init__.py", line 113, in render
    "visibility_filter" : radii > 0,
RuntimeError: CUDA error: an illegal memory access was encountered

I tried this PR: graphdeco-inria/gaussian-splatting#41 (comment), but it didn't work.

I can basically conclude that the issue is not with my CUDA or PyTorch setup, since everything runs smoothly when I set the language feature dim to 3.

Are there any other changes I need to make besides NUM_CHANNELS_language_feature?
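
For context, here is a (possibly incomplete) checklist of other places I suspect still assume 3 feature channels; the constant and the buffer shape below are my guesses based on shapes mentioned elsewhere in this tracker, not a verified fix:

import torch

FEATURE_DIM = 4  # keep one constant and reuse it everywhere

# scene/gaussian_model.py allocates the per-Gaussian language feature buffer; the
# original reportedly hard-codes 3 (see the "Different feature shape?" issue below):
#   language_feature = torch.zeros((self._xyz.shape[0], 3), device="cuda")
# If that stays at 3 while NUM_CHANNELS_language_feature is 4, the rasterizer would
# read past the end of the tensor, which matches an illegal memory access.

def make_language_feature_buffer(num_gaussians):
    return torch.zeros((num_gaussians, FEATURE_DIM), device="cuda")

The precomputed language_features_dim4 files and the autoencoder decoder output layer must also genuinely be 4-dim, otherwise a similar out-of-bounds read can occur when they are loaded.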

How is query speed measured?

Hi, really nice work!

I wanted to ask you, how did you compute the query speed of tables 3 and 4 of the paper?

I've observed that the current implementation stores each level of language features in a different set of Gaussians, so three rasterization passes are required to obtain the 2D language features for a given viewpoint.
Are you taking this rasterization and the subsequent decoding time into account, or are you only measuring the query matching against the CLIP features?
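
For concreteness, this is roughly what I mean by timing the whole path versus only the matching step (placeholder functions, not LangSplat's actual API):

import time
import torch

def time_query(render_fn, decode_fn, match_fn, view, text_embed):
    # render_fn: rasterize one feature level for a viewpoint; decode_fn: map the
    # 3-dim rendered features back to CLIP space; match_fn: compute relevancy
    # against the text embedding. All three are placeholders for this sketch.
    torch.cuda.synchronize()
    t0 = time.time()
    feats = [render_fn(view, level) for level in range(3)]  # one pass per feature level
    decoded = [decode_fn(f) for f in feats]
    torch.cuda.synchronize()
    t1 = time.time()
    _ = [match_fn(d, text_embed) for d in decoded]          # query matching only
    torch.cuda.synchronize()
    t2 = time.time()
    return {"render+decode": t1 - t0, "matching": t2 - t1}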

Thank you!

Inconsistency in Loss Calculation between Training and Evaluation of Autoencoder Model

Hello!

First of all, I'd like to extend my appreciation for the work put into this project.

I've been exploring the code related to training the autoencoder model, specifically within the "train.py" file. I came across an inconsistency in the calculation of loss between the training and evaluation phases.

During training, the loss is defined as follows: loss = l2loss + cosloss * 0.001. However, during evaluation, the loss seems to be calculated slightly differently: loss = l2_loss(outputs, data) + cos_loss(outputs, data), where the cos_loss term is not multiplied by 0.001.
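
For clarity, this is the difference I mean, written out side by side (paraphrased; the names follow the file):

# training step in the autoencoder training code:
loss = l2_loss(outputs, data) + cos_loss(outputs, data) * 0.001

# evaluation step:
loss = l2_loss(outputs, data) + cos_loss(outputs, data)  # cosine term not scaled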

I'm curious to understand whether this difference is intentional or if it might be an oversight. If intentional, I'd appreciate some insight into the rationale behind this choice.

Thanks!

Preprocessing too slow

Is there any way to speed up the preprocessing? Currently, it takes almost 2 minutes per image on an A6000.

Got blank heatmap when running evaluation code

Hi,

I tried to run eval.sh for evaluation, but the results are all blank when using the provided pre-trained autoencoder (downloaded from 'pretrained_model/ckpt') and the language-embedded Gaussians (downloaded from 'pretrained_model/output').

I also followed the process.sh file to train the autoencoder and language-embedded Gaussians myself, but the heatmaps are still blank.

All images under LangSplat/eval_result/teatime/***/heatmap look like this:
apple_2

However, the renders I got all look good:
LangSplat/output/teatime/teatime_1/train/ours_None/renders/00000.png
00000

Has anyone gotten the evaluation to work, or faced the same problem as me?

Reproducing the 3D semantic segmentation experiments on 3D-OVS

Thank you so much for sharing your great work.

When I used your code and pre-trained weights to reproduce the segmentation results for the "sofa" scene, my mIoU came out to just over 70. My current approach is to compute the IoU separately for each of the three relevancy maps generated for every "positive" query and then average them. I don't know if this calculation is correct.

In addition, the segmentation map generated for the 'grey sofa' in the 'sofa' scene is completely black. Do you have any suggestions to help me improve my results?
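
For reference, this is the protocol I am currently using (my own code, so it may well differ from yours):

import numpy as np

def scene_miou(pred_masks, gt_masks):
    # pred_masks / gt_masks: lists of boolean (H, W) arrays, one pair per text query.
    # Here I average the IoU over all per-query predictions; an alternative would be
    # to keep only the best of the three relevancy levels for each query.
    ious = []
    for pred, gt in zip(pred_masks, gt_masks):
        inter = np.logical_and(pred, gt).sum()
        union = np.logical_or(pred, gt).sum()
        ious.append(inter / union if union > 0 else 1.0)
    return float(np.mean(ious))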

Pretrained model link is not working

Hi there!

I'm trying to get started with your repository, but when I click on the "[Pre-trained Models]" link in the README, it just redirects me back to the README.

Can I really query the 3D Gaussians, or can I only query the rendered images?

I have a lingering question that has been on my mind, and I was hoping you could help clarify it for me.

The focal point of the paper is "3D Scene Querying," but upon reading it, I find myself pondering whether it is feasible to query a set of 3D Gaussians.

To elaborate, let's consider a scenario where I have five million trained Gaussians representing an unfamiliar scene. My objective is to locate the position of a 'TV.'

Can I use the term 'TV' to determine the 3D spatial coordinates of the TV from this set of Gaussians and retrieve its corresponding image (i.e., determine an appropriate camera position for rendering)? How can you query a set of probabilistic distributions using a text prompt?

Alternatively, is my only option to query a rendered egocentric 2D image? If the TV is not present in the image, does that imply there is no way for me to ascertain where the TV is?
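
To make the question concrete, this is the kind of direct 3D query I have in mind (purely a conceptual sketch on my part, not the released pipeline, which as far as I can tell queries rendered 2D feature maps):

import torch
import torch.nn.functional as F

def locate_in_3d(gaussian_xyz, gaussian_lang_feat, decoder, clip_text_embed, top_k=1000):
    # gaussian_xyz: (N, 3) Gaussian centers; gaussian_lang_feat: (N, 3) low-dim language
    # features; decoder: assumed module mapping 3 -> 512 (CLIP space); clip_text_embed: (512,).
    decoded = F.normalize(decoder(gaussian_lang_feat), dim=-1)   # (N, 512)
    text = F.normalize(clip_text_embed, dim=-1)
    scores = decoded @ text                                      # cosine similarity per Gaussian
    idx = scores.topk(min(top_k, scores.numel())).indices
    return gaussian_xyz[idx]                                     # candidate 3D locations of the query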

I appreciate your expertise and insights into this matter.

Intermittent memory errors while running preprocess.py

While running preprocess.py, I have received intermittent memory access errors.

I have been using the ramen dataset from the link provided and running python preprocess.py --dataset_path data/ramen.

Randomly, it gives me memory errors like "double free or corruption (!prev) Aborted (core dumped)" and "corrupted size vs. prev_size Aborted", plus a PyTorch error (patched in PyTorch 2.0.1) when running on a multithreaded CPU.

Setting the threads to 1 with torch.set_num_threads(1) fixes the PyTorch issue but makes preprocessing very slow.

Is there any advice on how to fix these issues during the preprocess script execution?

Installation problem: File 'submodules/langsplat-rasterization' does not exist.

Hi
First of all, thanks for your amazing job here.
I have a couple of questions about setting up LangSplat and need your guidance.

  1. When running conda env create --file environment.yml, I encountered the error below. Is this submodule missing from the repo?
Pip subprocess error:
ERROR: Invalid requirement: 'submodules/langsplat-rasterization' (from line 2 of /mnt/c/Users/lzlal/LangSplat/condaenv.75uxspmj.requirements.txt)
Hint: It looks like a path. File 'submodules/langsplat-rasterization' does not exist.

failed

CondaEnvException: Pip failed
  2. I noticed that segment-anything-langsplat is one of the submodules listed in environment.yml, so I assumed it would be installed once conda env create --file environment.yml completes successfully. However, you also mention that segment-anything-langsplat must be installed, which confuses me. Do you mean there is an additional installation step for segment-anything-langsplat, or is downloading the SAM checkpoints enough?
  3. Once the checkpoints are downloaded, where should I place the /ckpts folder? Under LangSplat/submodules/segment-anything-langsplat, /submodules/segment-anything-langsplat/segment_anything, or LangSplat?

Can we have a non-Baidu pretrained model link?

I'd like to play with the pretrained models, but it's very cumbersome to figure out how to download from Baidu from the US. Could they be uploaded to Google Drive or somewhere easier? Thanks!

Different feature shape?

language_feature = torch.zeros((self._xyz.shape[0], 3), device="cuda") from here

Why are language features set to 3 dimensions? In the L1 loss computation we then have (3, H, W) predicted features and (512, H, W) precomputed GT features. Is this normal, or is it supposed to be a tunable parameter as in Table 7 of the paper?
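
My understanding (which may be wrong) is that the 512-dim CLIP features are first compressed to 3 dims by the scene-wise autoencoder, and that the L1 loss is meant to compare the rendered 3-dim features against those encoded 3-dim features rather than the raw 512-dim maps, roughly:

import torch
import torch.nn.functional as F

def language_feature_loss(rendered_feat, clip_feat, encoder):
    # rendered_feat: (3, H, W) from the rasterizer; clip_feat: (512, H, W) raw CLIP map;
    # encoder: the scene-wise autoencoder's encoder, assumed to map 512 -> 3.
    _, H, W = clip_feat.shape
    gt = encoder(clip_feat.permute(1, 2, 0).reshape(-1, 512))    # (H*W, 3)
    gt = gt.reshape(H, W, 3).permute(2, 0, 1)                    # (3, H, W)
    return F.l1_loss(rendered_feat, gt)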
