minghanqin / langsplat
Official implementation of the paper "LangSplat: 3D Language Gaussian Splatting" [CVPR2024 Highlight]
Home Page: https://langsplat.github.io/
License: Other
Very amazing results for your LangSplat! Actually, I'm writing because I still have some questions about your work.
As you showed on your official website, the rendered feature map has stable results in different views. The color represents the different features. However, the effect shown in Fig.1 in your paper does not seem to achieve this multi-view consistency. So I want to ask how you achieve this consistency in the video.
Hi,
Thanks for the great work; I appreciate the released code. As for the evaluation code, I think it currently supports only the LERF dataset, because that is the one for which GT labels are provided. How can I evaluate the results on the 3D-OVS dataset? Thanks!
Thank you very much for your excellent work!
However, after following this project, I am unsure how to produce a demo with effects similar to those in yours. Could you please tell me how to do it? Thank you!
-First of all, congrats on your CVPR2024 Highlight!-
I have been facing issues setting up the conda environment.
Do I have to do anything besides conda env create --file environment.yml?
I encountered errors similar to those in #13 but have not been able to solve them.
If there is an alternative way to set up the environment, I'd like to know how.
A step-by-step explanation of the setup would also be very helpful.
I'm currently using cudatoolkit version 11.7.
Hi,
Thanks a lot for sharing this great work. I am curious about how to generate a 3D semantic point cloud and mesh.
The point cloud saved in the output folder doesn't have the semantic colours.
Is there code for this already available in the repository?
Hi, I encountered the following error when I ran preprocess.py. Any idea how to solve it? Thank you!
(langsplat) ➜ LangSplat git:(main) ✗ python preprocess.py --dataset_path ./data/snacks
[ INFO ] Encountered quite large input images (>1080P), rescaling to 1080P.
If this is not desired, please explicitly specify '--resolution/-r' as 1
Traceback (most recent call last):
File "preprocess.py", line 126, in create
img_embed, seg_map = _embed_clip_sam_tiles(img.unsqueeze(0), sam_encoder)
File "preprocess.py", line 178, in _embed_clip_sam_tiles
seg_images, seg_map = sam_encoder(aug_imgs)
File "preprocess.py", line 299, in sam_encoder
masks_default, masks_s, masks_m, masks_l = mask_generator.generate(image)
ValueError: too many values to unpack (expected 4)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "preprocess.py", line 404, in <module>
create(imgs, data_list, save_folder)
File "preprocess.py", line 128, in create
raise ValueError(timer)
ValueError: 1
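One possible reading of this traceback, offered as an assumption rather than a confirmed diagnosis: the call at line 299 expects mask_generator.generate to return exactly four collections, and the error appears when it instead returns a single flat list with more than four entries, as a mask generator that returns one record per mask would. The sketch below reproduces that failure mode with two hypothetical stand-in functions; neither is code from the repository.

```python
def generate_flat(num_masks):
    """Hypothetical stand-in: returns ONE flat list with a record per mask."""
    return [{"id": i} for i in range(num_masks)]


def generate_four_levels(num_masks):
    """Hypothetical stand-in: returns FOUR lists (default, small, medium, large)."""
    masks = [{"id": i} for i in range(num_masks)]
    return masks, masks[:1], masks[:2], masks[:3]


def try_unpack(result):
    """Mimic the four-way unpacking seen in the traceback and report what happens."""
    try:
        masks_default, masks_s, masks_m, masks_l = result
    except ValueError as exc:
        return str(exc)
    return "ok"


if __name__ == "__main__":
    # A flat list of 10 records cannot be unpacked into 4 names.
    print(try_unpack(generate_flat(10)))
    # A 4-tuple of lists unpacks cleanly.
    print(try_unpack(generate_four_levels(10)))
```

If this is what is happening, checking which segment_anything package the environment actually imports would be a reasonable first step.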
Hi @minghanqin , this is an interesting work and the performance is impressive! 🙂
In the paper you mention that:
"we extend the LERF dataset by annotating ground truth masks for textual queries, enabling the evaluation of the
open-vocabulary 3D semantic segmentation on the LERF dataset....."
"Therefore, we
further manually annotated additional challenging localization samples to better evaluate method performance."
Could you please share your newly annotated dataset (and if possible, the eval code too), so that it's possible to have a fair comparison with the LangSplat method?
Thank you!
Yash
From gaussian_renderer/__init__.py I see that you modified the CUDA operations to rasterize language features. Could you provide the code of your modified package? I cannot find it. Thanks for your great work!
Thanks for your excellent work. I noticed the rendered segmentation maps generated by render.py are all black, no matter whether on the sofa dataset provided with the pre-trained model or my dataset. Could you help me fix this?
Hi, thanks for the work.
I see the metric table in the paper, and I am puzzled about how to render relevancy maps for LERF, because LERF doesn't provide a way to render them. Would you share the corresponding script?
~~Great paper. Can the autoencoder be trained in real time? Or is the usage scenario of this model limited by the scene-wise language autoencoder?~~
Never mind, stupid question.
What I really want to know is: how fast is the training process, from a set of images to the 3D language field?
And how many images are needed?
I am wondering whether this method can be applied to real-time robot navigation.
A quick question about where to find "chkpnt30000.pth". With the original 3D Gaussian Splatting code, we don't get such an output. Can you tell me what is inside this file and how we can obtain it?
Thanks for your great work! Since LERF doesn't implement a custom ns-render under the nerfstudio framework, I wonder how we can evaluate its segmentation results or relevancy maps at specific viewpoints. Thanks!
The problems I have encountered are:
I. As mentioned earlier, the pre-trained model is supposed to be placed in the output folder.
1. Does "pre-trained model" refer only to the pre-trained model on Baidu Cloud, or also to preprocessed_dataset?
2. Where should the output folder be placed?
II. Do I need to modify the quick-start commands, for example $CASEAME?
Thank you very much for your help.
File "LangSplat/scene/cameras.py", line 73, in get_language_feature
seg = seg_map[:, y, x].squeeze(-1).long()
IndexError: index 4 is out of bounds for dimension 1 with size 4
Can you fix this bug?
There are also some bugs in the process.sh file.
I tested on the 3D-OVS data. It achieves similar results (IoU) on the sofa scene.
But the performance on other scenes is not as good as expected.
I noticed that the language feature images are well trained, so the cause is probably the settings of the eval code, such as the threshold and the kernel size. Does this mean we need to tune these settings manually to achieve the best results?
Below is a sample from the bench scene, including the language feature image, the ground truth, and the predicted mask. Do you have any suggestions?
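To make the threshold question concrete, here is a minimal sketch of how IoU can change with the binarization threshold applied to a relevancy map. It is not the repository's eval code: the function names, the toy 3x3 map, and the candidate thresholds are all assumptions for illustration, and the smoothing kernel is left out. Note that selecting a threshold against ground truth on the evaluation scenes amounts to tuning on the test data, so any numbers obtained this way should be reported with that caveat.

```python
def binarize(relevancy, threshold):
    """Turn a 2D relevancy map into a 0/1 mask (strictly above the threshold)."""
    return [[1 if value > threshold else 0 for value in row] for row in relevancy]


def iou(pred, gt):
    """Intersection over union of two 0/1 masks of the same shape."""
    inter = union = 0
    for pred_row, gt_row in zip(pred, gt):
        for p, g in zip(pred_row, gt_row):
            inter += 1 if (p and g) else 0
            union += 1 if (p or g) else 0
    return inter / union if union else 0.0


def sweep_thresholds(relevancy, gt, thresholds):
    """Return {threshold: IoU} for every candidate threshold."""
    return {t: iou(binarize(relevancy, t), gt) for t in thresholds}


# Toy data (hypothetical): a 3x3 relevancy map and its ground-truth mask.
relevancy = [
    [0.1, 0.2, 0.7],
    [0.3, 0.8, 0.9],
    [0.1, 0.6, 0.4],
]
gt = [
    [0, 0, 1],
    [0, 1, 1],
    [0, 0, 0],
]

scores = sweep_thresholds(relevancy, gt, [0.3, 0.5, 0.65])
best = max(scores, key=scores.get)

if __name__ == "__main__":
    for threshold, score in scores.items():
        print(f"threshold={threshold}: IoU={score:.2f}")
    print("best threshold:", best)
```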
I cannot download files from baidu :(
lerf website: https://www.lerf.io/. Thanks!
Hi,
I have tried to set include_feature=False for the original 3DGS training, but I encounter an error after the model runs several iterations:
Training progress: 2%|█▋ | 700/30000 [00:46<29:56, 16.31it/s, Loss=0.0869420]
[CUDA ERROR] in cuda_rasterizer/rasterizer_impl.cu
Line 415: an illegal memory access was encountered [14/02 23:44:58]
An error occured in backward. Writing snapshot_bw.dump for debugging. [14/02 23:44:58]
[14/02 23:44:58]
Traceback (most recent call last):
File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/runpy.py", line 86, in _run_code
exec(code, run_globals)
File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/__main__.py", line 39, in <module>
cli.main()
File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 430, in main
run()
File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/adapter/../../debugpy/launcher/../../debugpy/../debugpy/server/cli.py", line 284, in run_file
runpy.run_path(target, run_name="__main__")
File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 321, in run_path
return _run_module_code(code, init_globals, run_name,
File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 135, in _run_module_code
_run_code(code, mod_globals, init_globals,
File "/home/kaizhi/.vscode-server/extensions/ms-python.debugpy-2024.0.0-linux-x64/bundled/libs/debugpy/_vendored/pydevd/_pydevd_bundle/pydevd_runpy.py", line 124, in _run_code
exec(code, run_globals)
File "train.py", line 231, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "train.py", line 104, in training
loss.backward()
File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/torch/_tensor.py", line 487, in backward
torch.autograd.backward(
File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/torch/autograd/__init__.py", line 200, in backward
Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/torch/autograd/function.py", line 274, in apply
return user_fn(self, *args)
File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 147, in backward
raise ex
File "/home/kaizhi/.conda/envs/langsplat/lib/python3.10/site-packages/diff_gaussian_rasterization/__init__.py", line 143, in backward
grad_means2D, grad_colors_precomp, grad_language_feature_precomp, grad_opacities, grad_means3D, grad_cov3Ds_precomp, grad_sh, grad_scales, grad_rotations = _C.rasterize_gaussians_backward(*args)
RuntimeError: an illegal memory access was encountered
I can successfully train with the original 3DGS code.
Thanks for your great work!
I would like to know when the preprocessed LERF dataset will be available.
I use the dataset you provided for training, but encounter the following issue at checkpoint 7000.
testing for iter 7000 [28/01 20:59:13]
[ITER 7000] Evaluating train: L1 0.01737641841173172 PSNR 30.760613250732423 [28/01 20:59:26]
Traceback (most recent call last):
File "/data/LangSplat/train.py", line 231, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "/data/LangSplat/train.py", line 116, in training
training_report(tb_writer, iteration, Ll1, loss, l1_loss, iter_start.elapsed_time(iter_end), testing_iterations, scene, render, (pipe, background, opt))
File "/data/LangSplat/train.py", line 200, in training_report
tb_writer.add_histogram("scene/opacity_histogram", scene.gaussians.get_opacity, iteration)
File "/data/anaconda3/envs/langsplat/lib/python3.9/site-packages/torch/utils/tensorboard/writer.py", line 485, in add_histogram
histogram(tag, values, bins, max_bins=max_bins), global_step, walltime
File "/data/anaconda3/envs/langsplat/lib/python3.9/site-packages/torch/utils/tensorboard/summary.py", line 358, in histogram
hist = make_histogram(values.astype(float), bins, max_bins)
File "/data/anaconda3/envs/langsplat/lib/python3.9/site-packages/torch/utils/tensorboard/summary.py", line 386, in make_histogram
cum_counts = np.cumsum(np.greater(counts, 0, dtype=np.int32))
TypeError: No loop matching the specified signature and casting was found for ufunc greater
It seems that this happened when plotting the histogram. Do you know how to resolve it?
Dear author,
Thanks for your excellent work.
I evaluated the released pretrained_model on teatime. The localization accuracy is only 0.1017.
Do you have any idea what the problem is?
2024-04-25 16:47:42,245 - teatime - INFO - trunc thresh: 0.4
INFO:teatime:trunc thresh: 0.4
2024-04-25 16:47:42,245 - teatime - INFO - iou chosen: 0.0245
INFO:teatime:iou chosen: 0.0245
2024-04-25 16:47:42,248 - teatime - INFO - chosen_lvl:
[array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0)]
INFO:teatime:chosen_lvl:
[array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0), array(0)]
2024-04-25 16:47:42,248 - teatime - INFO - Localization accuracy: 0.1017
INFO:teatime:Localization accuracy: 0.1017
I couldn't find any code for inputting text and querying the most relevant 3D Gaussians in the code repository. Will it be provided later?
I want to ask where the ground-truth data in the paper comes from. I use the trained decoder to get 'language_feature_dim3' and draw the array of shape (H, W, 3) with 'plt.imshow', but the quality is worse than that of the GT image in the paper.
Could you tell me how you obtain and draw the GT image in the paper? Thank you!
I tried to explore the other feature dimensions listed in the paper (in practice, the paper uses a language feature dimension of 3).
I am now trying dim = 4.
My steps:
python train.py -s $dataset_path -m output/${casename} --start_checkpoint $dataset_path/$casename/chkpnt30000.pth --feature_level ${level}
and here is the error I got:
Traceback (most recent call last):
File "train.py", line 240, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "train.py", line 99, in training
gt_language_feature, language_feature_mask = viewpoint_cam.get_language_feature(language_feature_dir=dataset.lf_path, feature_level=dataset.feature_level)
File "/datadrive/yingwei/LangSplat/scene/cameras.py", line 94, in get_language_feature
return point_feature.cuda(), mask.cuda()
RuntimeError: CUDA error: an illegal memory access was encountered
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
After I set CUDA_LAUNCH_BLOCKING=1:
Traceback (most recent call last):
File "train.py", line 240, in <module>
training(lp.extract(args), op.extract(args), pp.extract(args), args.test_iterations, args.save_iterations, args.checkpoint_iterations, args.start_checkpoint, args.debug_from)
File "train.py", line 93, in training
render_pkg = render(viewpoint_cam, gaussians, pipe, background, opt)
File "/datadrive/yingwei/LangSplat/gaussian_renderer/__init__.py", line 113, in render
"visibility_filter" : radii > 0,
RuntimeError: CUDA error: an illegal memory access was encountered
I tried the suggestion in graphdeco-inria/gaussian-splatting#41 (comment), but it didn't work.
I can basically conclude that the issue is not in my CUDA or PyTorch installation, since training runs smoothly when I set the language feature dim to 3.
Are there any other changes I need to make besides NUM_CHANNELS_language_feature?
Hi, really nice work!
I wanted to ask you, how did you compute the query speed of tables 3 and 4 of the paper?
I've observed that the current implementation stores each level of language features on a different set of gaussians. Therefore you require 3 rasterization steps to obtain the 2D language features for a given point of view.
Are you taking this rasterization and the subsequent decoding time into account, or are you measuring only the query matching against the CLIP features?
Thank you!
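To illustrate the distinction this question draws, here is a small sketch of per-stage timing, so that rasterization, decoding, and query matching can be reported separately or summed. The three stage functions are placeholders with dummy workloads, not the repository's code, and the stage names are assumptions. On a GPU, kernels run asynchronously, so a synchronization point would be needed before each timestamp for the numbers to be meaningful.

```python
import time


def time_stages(stages, repeats=5):
    """Return average wall-clock seconds per stage.

    `stages` is a list of (name, zero-argument callable) pairs.
    """
    totals = {name: 0.0 for name, _ in stages}
    for _ in range(repeats):
        for name, fn in stages:
            start = time.perf_counter()
            fn()
            totals[name] += time.perf_counter() - start
    return {name: total / repeats for name, total in totals.items()}


# Placeholder workloads (hypothetical): stand-ins for the real stages.
def rasterize_three_levels():
    sum(i * i for i in range(30000))


def decode_features():
    sum(i * i for i in range(10000))


def match_query():
    sum(i * i for i in range(1000))


if __name__ == "__main__":
    timings = time_stages([
        ("rasterize", rasterize_three_levels),
        ("decode", decode_features),
        ("match", match_query),
    ])
    for name, seconds in timings.items():
        print(f"{name}: {seconds * 1e3:.3f} ms")
    print(f"end to end: {sum(timings.values()) * 1e3:.3f} ms")
```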
Hello!
First of all, I'd like to extend my appreciation for the work put into this project.
I've been exploring the code related to training the autoencoder model, specifically within the "train.py" file. I came across an inconsistency in the calculation of loss between the training and evaluation phases.
During training, the loss is defined as follows: loss = l2loss + cosloss * 0.001. However, during evaluation, the loss seems to be calculated slightly differently: loss = l2_loss(outputs, data) + cos_loss(outputs, data), where the cos_loss term is not multiplied by 0.001.
I'm curious to understand whether this difference is intentional or if it might be an oversight. If intentional, I'd appreciate some insight into the rationale behind this choice.
Thanks!
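For readers following along, the two formulas quoted above can be compared on a toy input. This is a sketch only: it assumes l2_loss is a mean squared error and cos_loss is one minus the mean cosine similarity per row, which the issue does not state, and the helper names train_loss and eval_loss are invented here. Only the 0.001 weighting is taken from the text.

```python
import math


def l2_loss(outputs, targets):
    """Mean squared error over all elements (assumed definition)."""
    count = sum(len(row) for row in outputs)
    total = sum(
        (o - t) ** 2
        for out_row, tgt_row in zip(outputs, targets)
        for o, t in zip(out_row, tgt_row)
    )
    return total / count


def cos_loss(outputs, targets):
    """One minus cosine similarity, averaged over rows (assumed definition)."""
    total = 0.0
    for out_row, tgt_row in zip(outputs, targets):
        dot = sum(a * b for a, b in zip(out_row, tgt_row))
        norm = math.sqrt(sum(a * a for a in out_row)) * math.sqrt(
            sum(b * b for b in tgt_row)
        )
        total += 1.0 - dot / norm
    return total / len(outputs)


def train_loss(outputs, targets):
    # Weighting as quoted for the training phase.
    return l2_loss(outputs, targets) + cos_loss(outputs, targets) * 0.001


def eval_loss(outputs, targets):
    # Unweighted sum as quoted for the evaluation phase.
    return l2_loss(outputs, targets) + cos_loss(outputs, targets)


outputs = [[1.0, 0.0], [0.0, 1.0]]
targets = [[1.0, 0.0], [1.0, 0.0]]

if __name__ == "__main__":
    print("train:", train_loss(outputs, targets))
    print("eval: ", eval_loss(outputs, targets))
```

On this input both terms equal 0.5, so the two totals differ by almost a factor of two. If the evaluation loss is only logged, the mismatch changes the reported number but not the optimization.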
Is there any way to speed up preprocessing? Currently, it takes almost 2 minutes per image on an A6000.
Dear Author,
Thanks for your great work!
I am wondering how to conduct step 2, could you please provide a more explicit script for this part?
Many thanks in advance.
Bests,
Runsong
Hi,
I tried to run eval.sh for evaluation, but the results are all blank when using the provided pre-trained autoencoder (downloaded from 'pretrained_model/ckpt') and language-embedded Gaussian splat (downloaded from 'pretrained_model/output').
I also tried to follow process.sh to train the autoencoder and the language-embedded Gaussian splat myself, but the heatmaps are still blank.
All images under LangSplat/eval_result/teatime/***/heatmap look like this:
However, the renders I got are all looking good:
LangSplat/output/teatime/teatime_1/train/ours_None/renders/00000.png
Has anyone gotten the evaluation to work, or run into the same problem?
Thank you so much for sharing your great work.
When I used your code and pre-trained weights to reproduce the segmentation results for the "sofa" scene, my mIoU came out at just over 70. My current method is to calculate the IoU separately for each of the "three relevancy maps" generated for a given "positive" and then average them. I don't know whether this calculation is correct.
In addition, the segmentation map generated for the 'grey sofa' in the 'sofa' scene is completely black. Do you have any suggestions to help me improve my results?
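The two ways of aggregating over the three levels can give quite different numbers, which the sketch below shows on made-up values. It is an assumption, suggested only by the chosen_lvl entries in the evaluation log quoted in another issue on this page, that the reference evaluation keeps one level per query rather than averaging all three. The function names, the per-level IoUs, and the per-level relevancy scores are hypothetical.

```python
def mean_over_levels(ious_per_level):
    """Average the IoU of all levels for one query (the method described above)."""
    return sum(ious_per_level) / len(ious_per_level)


def iou_of_best_level(ious_per_level, scores_per_level):
    """Keep only the IoU of the level with the highest relevancy score."""
    best = max(range(len(scores_per_level)), key=scores_per_level.__getitem__)
    return ious_per_level[best]


def mean_iou(per_query_values):
    """mIoU: the mean of the per-query IoU values."""
    return sum(per_query_values) / len(per_query_values)


# Made-up numbers for two queries and three levels each.
queries = {
    "query_a": {"ious": [0.9, 0.6, 0.3], "scores": [0.8, 0.5, 0.4]},
    "query_b": {"ious": [0.2, 0.7, 0.5], "scores": [0.3, 0.9, 0.6]},
}

averaged = mean_iou([mean_over_levels(q["ious"]) for q in queries.values()])
selected = mean_iou(
    [iou_of_best_level(q["ious"], q["scores"]) for q in queries.values()]
)

if __name__ == "__main__":
    print(f"mIoU, averaging all levels: {averaged:.3f}")
    print(f"mIoU, best level per query: {selected:.3f}")
```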
Hi there!
I'm trying to get started with your repository, but when I click on the "[Pre-trained Models]" link in the README it redirects me to the README.
I have a lingering question that has been on my mind, and I was hoping you could help clarify it for me.
The focal point of the paper is "3D Scene Querying," but upon reading it, I find myself pondering whether it is feasible to query a set of 3D Gaussians.
To elaborate, let's consider a scenario where I have five million trained Gaussians representing an unfamiliar scene. My objective is to locate the position of a 'TV.'
Can I use the term 'TV' to determine the 3D spatial coordinates of the TV from this bunch of 3DGS and retrieve its corresponding image (i.e., determine the appropriate camera position for rendering)? How can you query a set of probabilistic distributions using a text prompt?
Alternatively, is my only option to query the rendered egocentric 2D image? If the TV is not present in the image, does that imply I have no means of ascertaining where the TV is?
I appreciate your expertise and insights into this matter.
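As a way of framing the question, one conceivable approach is to score each Gaussian directly against the text embedding and take the centroid of the best matches as a rough 3D location. The sketch below is purely hypothetical and is not something the repository is known to provide: it assumes every Gaussian carries a feature vector in the same space as the text embedding (for example after decoding a compressed feature), and the function names and toy data are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def query_gaussians(features, positions, text_embedding, top_k):
    """Return the indices of the top_k best-matching Gaussians and their centroid."""
    ranked = sorted(
        range(len(features)),
        key=lambda i: cosine(features[i], text_embedding),
        reverse=True,
    )
    top = ranked[:top_k]
    centroid = tuple(
        sum(positions[i][axis] for i in top) / len(top) for axis in range(3)
    )
    return top, centroid


# Toy scene (hypothetical): four Gaussians with 3-dim features and 3D centers.
features = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (5.0, 5.0, 5.0), (9.0, 9.0, 9.0)]
text_embedding = [1.0, 0.0, 0.0]

top, centroid = query_gaussians(features, positions, text_embedding, top_k=2)

if __name__ == "__main__":
    print("best-matching Gaussians:", top)
    print("estimated location:", centroid)
```

Given such a 3D location, a camera pose that looks at it could then be chosen for rendering; whether per-Gaussian features are discriminative enough for this is exactly the open question raised above.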
missing glm package here:
https://github.com/g-truc/glm/tree/5c46b9c07008ae65cb81ab79cd677ecc1934b903
While running preprocess.py, I have received intermittent memory access errors.
I have been using the ramen dataset from the link provided and running python preprocess.py --dataset_path data/ramen.
Randomly, it gives me memory errors such as double free or corruption (!prev) Aborted (core dumped) and corrupted size vs. prev_size Aborted, and additionally a PyTorch error, patched in PyTorch 2.0.1, that occurs when running on a multithreaded CPU.
Setting the threads to 1 with torch.set_num_threads(1) fixes the PyTorch issue but makes preprocessing very slow.
I was wondering if there was any advice on how to fix these issues happening during the preprocess script execution.
Hi
First of all, thanks for your amazing job here.
I have a couple of questions about setting up LangSplat and need your guidance.
When running conda env create --file environment.yml, I encountered the error below. Is this submodule missing from the repo?
Pip subprocess error:
ERROR: Invalid requirement: 'submodules/langsplat-rasterization' (from line 2 of /mnt/c/Users/lzlal/LangSplat/condaenv.75uxspmj.requirements.txt)
Hint: It looks like a path. File 'submodules/langsplat-rasterization' does not exist.
failed
CondaEnvException: Pip failed
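The hint in this log says the path submodules/langsplat-rasterization does not exist. One common cause, stated here as an assumption about this particular setup, is a clone made without its git submodules, in which case git submodule update --init --recursive would normally populate them. The following sketch checks for missing or empty submodule directories before the environment is created; the function name is invented, and the directory list contains only the two names mentioned in this issue.

```python
from pathlib import Path

# Directory names taken from this issue; the real list may be longer.
EXPECTED_SUBMODULES = [
    "submodules/langsplat-rasterization",
    "submodules/segment-anything-langsplat",
]


def missing_submodules(repo_root, expected=EXPECTED_SUBMODULES):
    """Return the expected submodule paths that are absent or empty under repo_root."""
    root = Path(repo_root)
    missing = []
    for rel in expected:
        path = root / rel
        # An uninitialized submodule is usually an empty directory or no directory.
        if not path.is_dir() or not any(path.iterdir()):
            missing.append(rel)
    return missing


if __name__ == "__main__":
    problems = missing_submodules(".")
    if problems:
        for rel in problems:
            print(f"missing or empty: {rel}")
    else:
        print("all expected submodule directories are populated")
```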
segment-anything-langsplat is one of the submodules listed in environment.yml, so I assume it is installed once conda env create --file environment.yml completes successfully. However, you also mention that segment-anything-langsplat must be installed, which confuses me. Do you mean that there is an additional installation step for segment-anything-langsplat, or is downloading the SAM checkpoints enough?
Where should /ckpts go? Under LangSplat/submodules/segment-anything-langsplat? Under /submodules/segment-anything-langsplat/segment_anything? Or under LangSplat?
I'd like to play with the pretrained models, but it's very cumbersome to figure out how to download from Baidu from the US. Can we upload them to Google Drive or something easier? Thanks!
language_feature = torch.zeros((self._xyz.shape[0], 3), device="cuda")
from here
Why are language features set to 3 dimensions? In the L1 loss computation then we have (3,H,W) predicted features and (512,H,W) precomputed GT features. Is this normal / supposed to be a tunable parameter as in Table 7 of the paper?
RT
Hi, authors! Thanks for your great work!
When I run the QuickStart, it seems that I need to specify the source path.
In addition, something seems to be wrong in 'cfg_args': the attribute 'language_features_name' is missing.
Could you please help me solve this?