Segment Anything 3D

We extend Segment Anything to 3D perception by transferring segmentation information from 2D images into 3D space. We expect this segmentation information to be helpful for traditional 3D perception as well as open-world perception. This project is still in progress and will be embedded into our perception codebase Pointcept. We very much welcome any issues or pull requests.

Result

Example mesh is available here.

Installation

conda create -n sam3d python=3.8 -y
conda activate sam3d
# Choose version you want here: https://pytorch.org/get-started/previous-versions/
conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
conda install plyfile -c conda-forge -y
pip install scikit-image opencv-python open3d imageio
pip install git+https://github.com/facebookresearch/segment-anything.git 

cd libs/pointops
# usual
python setup.py install
# docker & multi GPU arch
TORCH_CUDA_ARCH_LIST="ARCH LIST" python  setup.py install
# e.g. 7.5: Turing (RTX 20xx); 8.0: A100; 8.6: Ampere (RTX 30xx). More available at: https://developer.nvidia.com/cuda-gpus
TORCH_CUDA_ARCH_LIST="7.5 8.0" python  setup.py install
cd ../..
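
If the build succeeds, a quick way to confirm that the pointops extension imports correctly (a simple check, run inside the sam3d environment) is:

python -c "import pointops; print('pointops OK')"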

Data Preparation

ScanNet v2

Download the ScanNet v2 dataset.
Run preprocessing code for raw ScanNet as follows:

# RAW_SCANNET_DIR: the directory of downloaded ScanNet v2 raw dataset.
# PROCESSED_SCANNET_DIR: the directory of processed ScanNet dataset (output dir).
python scannet-preprocess/preprocess_scannet.py --dataset_root ${RAW_SCANNET_DIR} --output_root ${PROCESSED_SCANNET_DIR}
  • Prepare RGBD data (follow BPNet)
python scannet-preprocess/prepare_2d_data/prepare_2d_data.py --scannet_path data/scannetv2 --output_path data/scannetv2_images --export_label_images

Getting Started

Please try it via sam3d.py

# RGB_PATH: the path of the RGB data
# DATA_PATH: the path of the point cloud data
# SAVE_PATH: where to save the pcd results
# SAVE_2DMASK_PATH: where to save the 2D segmentation results from SAM
# SAM_CHECKPOINT_PATH: the path of the SAM checkpoint

python sam3d.py --rgb_path $RGB_PATH --data_path $DATA_PATH --save_path $SAVE_PATH --save_2dmask_path $SAVE_2DMASK_PATH --sam_checkpoint_path $SAM_CHECKPOINT_PATH 
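
The file written to SAVE_PATH is a per-point segment-id array serialized with torch.save (the plotting snippet quoted in the issues below loads it the same way). A minimal sketch for inspecting the output, using a hypothetical path and scene name:

import numpy as np
import torch

# Hypothetical output location; substitute your own SAVE_PATH and scene name.
seg = np.asarray(torch.load("sam3d_out/scene0000_00.pth"))

print("number of points:  ", seg.shape[0])
print("number of segments:", len(np.unique(seg)))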

Pipeline

Our SAM3D pipeline looks as follows:

  1. SAM Generate Masks
    Use SAM to get segmentation masks on the 2D frames, then map them into 3D space via depth information (a rough sketch of this back-projection follows the list).
  2. Merge Two Adjacent Pointclouds
    Use the "bidirectional group overlap" algorithm (modified from ContrastiveSceneContexts) to merge two adjacent pointclouds.
  3. Region Merging Method
    Merge the entire pointcloud with the region merging method.
  4. Merge Two Segmentation Results
    We apply Felzenszwalb and Huttenlocher's graph-based image segmentation algorithm to the scenes using the default parameters. Please refer to the original repository for details. Then merge the two segmentation results to get the final result (the merging code is in sam3d.py/pcd_ensemble).
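
For intuition, here is a minimal sketch of the step-1 back-projection, assuming a pinhole depth camera with intrinsics fx, fy, cx, cy and a camera-to-world pose matrix (the function and variable names are illustrative, not the ones used in sam3d.py):

import numpy as np

def mask_to_points(depth, mask, fx, fy, cx, cy, pose):
    """Lift the depth pixels covered by a 2D SAM mask into world coordinates."""
    v, u = np.nonzero(mask & (depth > 0))              # pixel coordinates inside the mask
    z = depth[v, u]                                    # depth values (metres)
    x = (u - cx) * z / fx                              # pinhole back-projection
    y = (v - cy) * z / fy
    pts_cam = np.stack([x, y, z, np.ones_like(z)], axis=1)
    return (pose @ pts_cam.T).T[:, :3]                 # camera-to-world transform

Every point lifted this way inherits the id of the 2D mask it came from, which is what the later merging steps operate on.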

Citation

If you find SAM3D useful to your research, please cite our work:

@misc{yang2023sam3d,
      title={SAM3D: Segment Anything in 3D Scenes}, 
      author={Yunhan Yang and Xiaoyang Wu and Tong He and Hengshuang Zhao and Xihui Liu},
      year={2023},
      eprint={2306.03908},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

Acknowledgements

SAM3D is inspired by the following repos: Segment Anything, Pointcept, BPNet, ContrastiveSceneContexts.


segmentanything3d's Issues

The run results did not meet expectations.

Why do I get different results when I follow the tutorial step by step to configure the environment and run the code?

As shown below, there are a lot of superpixels that are not merged together.

I have tried to reconfigure the environment and run the unmodified source code, but it still fails.

(screenshots of the over-segmented results)

Inputs required

Hey! First thank you for exposing your work :)

Sorry, I'm a little confused about what is required to run the script once everything is prepared.

  • Does RGB_PATH point to a single image that will be segmented?
  • What must DATA_PATH point to? An original .ply file, as in the examples?

Thank you

Change the dataset

Hi, if I want to change the dataset (e.g. S3DIS), how should I generate the over-segmentation result in step 4?

Can we determine which color belongs to which class through object detection module?

Thank you for your excellent work!
I was considering the following approach: What if we integrate an object detection module into 2D images, and then map the detection results onto a 3D point cloud? This way, we could determine the classes of each group within the point cloud segmentation results. Do you think this approach would be feasible for the project? Thanks!

Result Reproduction

Hey, thanks for your work.

I reproduced the results on scene0000_00 following the default settings in your repo, but I get about 817 instances in the scene, which does not seem reasonable.

I think maybe something went wrong when I prepared the dataset, so I want to check whether frame_skip was 20 when preparing the RGBD data, and whether your RGBD frames come from the .sens files. Or maybe there are other issues I haven't noticed...

Thank you!

Unable to get access to the database

Hello,
I requested access to the ScanNet dataset last week and haven't received an answer yet.
How can I run the code in this case?

Please clarify how to visualize the pointclouds?

Hi there,

Thank you for this excellent work, your code is very clear for something this new. Congrats to the team!

I apologize if this is a lot to read; I just want to be thorough with my question.

Main Question

I ran your code on scene0000_00 from the ScanNet dataset, and the resulting segmentation looks much worse than the results in your GitHub repo. In my results, the segmentation is very messy and oversegmented, which leads me to think I am doing something wrong.

See here, where I'm plotting the original colors, sam3d segmentation, instance_gt, semantic_gt20, semantic_gt200.
(screenshot: my_sam3d_masks)

Compared to your results:
(screenshot of the results from the repo)

What's the difference in your results between the SAM3D and the SAM3D Merged output? How is each one produced?

I'm just trying to figure out what I might be doing wrong.

Extra details:

Detail 1

I had to comment out this line

color_image = cv2.resize(color_image, (640, 480))

Otherwise, the code would crash due to a dimension mismatch error between the masks and the image. Why are you resizing the image? Could this be related to my results?

Detail 2

Another thing, since I cannot download the whole scannet dataset, I'm downloading only individual scenes, and that doesn't download/create any intrinsic_depth.txt file. I am creating it manually from the scans/scene0000_00/scene0000_00.txt text file, which has content like this:

axisAlignment = 0.945519 0.325568 0.000000 -5.384390 -0.325568 0.945519 0.000000 -2.871780 0.000000 0.000000 1.000000 -0.064350 0.000000 0.000000 0.000000 1.000000 
colorHeight = 968
colorToDepthExtrinsics = 0.999973 0.006791 0.002776 -0.037886 -0.006767 0.999942 -0.008366 -0.003410 -0.002833 0.008347 0.999961 -0.021924 -0.000000 0.000000 -0.000000 1.000000
colorWidth = 1296
depthHeight = 480
depthWidth = 640
fx_color = 1170.187988
fx_depth = 571.623718
fy_color = 1170.187988
fy_depth = 571.623718
mx_color = 647.750000
mx_depth = 319.500000
my_color = 483.750000
my_depth = 239.500000
numColorFrames = 5578
numDepthFrames = 5578
numIMUmeasurements = 11834
sceneType = Apartment
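
For illustration, a minimal sketch of writing intrinsic_depth.txt from those depth-intrinsic values (the 4x4 layout is my assumption of what the ScanNet-style intrinsics file looks like; double-check against a scene exported from a .sens file):

import os
import numpy as np

# Depth intrinsics copied from scans/scene0000_00/scene0000_00.txt
# (fx_depth, fy_depth, mx_depth, my_depth).
fx, fy, cx, cy = 571.623718, 571.623718, 319.500000, 239.500000

K = np.array([[fx, 0.0, cx, 0.0],
              [0.0, fy, cy, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])

os.makedirs("intrinsics", exist_ok=True)
np.savetxt(os.path.join("intrinsics", "intrinsic_depth.txt"), K, fmt="%.6f")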

Detail 3

Here is the code I am using to make my plots. I noticed you don't have any plotting code. Maybe there is a mistake in mine?

import torch
import open3d as o3d
import numpy as np

# Define the file paths
pcd_filepath = '/pcd_gt_data/train/scene0000_00.pth'  # Replace with your file path
pcd_seg_filepath = '/sam_3d_out/scene0000_00.pth'  # Replace with your file path

# Load point cloud data
pcd_data = torch.load(pcd_filepath)
seg_data = torch.load(pcd_seg_filepath)

# Get the coordinates
coordinates = pcd_data['coord'].astype('float64')  # convert to float64 for o3d

# Get the unique labels in seg_data
unique_labels = np.unique(seg_data)

# Generate random colors for each label
color_map = {label: np.random.rand(3) for label in unique_labels}

# Create an array for colors using the color_map
colors = np.array([color_map[label] for label in seg_data])

# Create a PointCloud object
pcd = o3d.geometry.PointCloud()

# Assign coordinates to the PointCloud object
pcd.points = o3d.utility.Vector3dVector(coordinates)

# Assign colors to the PointCloud object
pcd.colors = o3d.utility.Vector3dVector(colors)

# Visualize the point cloud
o3d.visualization.draw_geometries([pcd])

Thank you for your time and help in advance!

Optimal parameter values

I would like to ask: in the fourth step of the pipeline, in the command "./segmentator input.ply [kThresh=0.01] [segMinVerts=20]", what are the optimal values for the "kThresh" and "segMinVerts" parameters?

What the lib "pointops" for

Hi,
I'm kind of a rookie here. Could someone please tell me something about the "pointops" library? I searched for it on Google but found little. I guess it is a tool for point operations, and I saw "pointops_cuda" while searching, but I wonder where its source is.
Thanks a lot.
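
pointops is the CUDA point-operations library bundled under libs/pointops (it comes from the Pointcept codebase); sam3d.py uses it for nearest-neighbour queries between point sets. A rough usage sketch, mirroring the knn_query call quoted in a later issue (the exact function name and argument order can differ between pointops versions):

import torch
import pointops

# Two point sets on the GPU; the offset tensors mark batch boundaries (single batch here).
src = torch.rand(1000, 3).cuda()
query = torch.rand(200, 3).cuda()
src_offset = torch.tensor([src.shape[0]], dtype=torch.int32).cuda()
query_offset = torch.tensor([query.shape[0]], dtype=torch.int32).cuda()

# 1-nearest-neighbour of each query point within src, same call pattern as sam3d.py.
idx, dist = pointops.knn_query(1, src, src_offset, query, query_offset)
print(idx.shape)  # (200, 1) indices into src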

2D image number for segment

Thank you for your excellent work!
I wonder how many 2D images are required at minimum for segmenting a 3D point cloud? Thanks!

missing intrinsics.txt file

When I preprocessed the RGBD data according to prepare_2d_data.py, each scene only generated four folders: color, depth, label, and pose. But when I ran sam3d.py, it reported that it could not find the intrinsics file under the intrinsics folder. This error is raised on the second line of the get_pcd function in sam3d.py. How can I generate the intrinsics data for each scene?


Value to determine camera angles

Hello!
I would like to use this package to segment my own pointcloud data. However, it does not contain RGB values. Also, I don't have a pre-segmented ground truth to evaluate the outcome. My question is twofold:

  1. Is it advisable to use the number-of-returns/intensity value as a substitute for RGB? Do I need to rescale the values to fall into RGB value ranges?

  2. Is there a metric to determine whether you have accumulated enough images from different angles of your pointcloud? I was thinking of something like a pointwise contribution metric, counting how often a point has been captured in an image. Or, across multiple runs, stopping when segments become more stable across predictions?

Why does "sam3d.py" run as a ".pth" file?

Why does "sam3d.py" run as a ".pth" file? Normally, shouldn't the result of semantic segmentation be a tag file or a point cloud file?

".pth" file is the final training result? If so, how should I test it?

License file

Hello,
Amazing work! Would it be possible for you to add a license file to the repo? (preferably MIT)
Best,
Florian

Error while running libs\pointops\setup.py

When running the setup.py I am getting the following error:

Traceback (most recent call last): File "setup.py", line 9, in <module> flag for flag in opt.split() if flag != '-Wstrict-prototypes' AttributeError: 'NoneType' object has no attribute 'split'
On further investigation, it seems that "get_config_vars('OPT')" is returning 'None'. Now I can't find what 'OPT' configuration is expected. Please help me with this.

I am using Windows 11, Cuda 11.7, and Python 3.8 (as recommended)

About Data Structure

Can someone share a picture of the folder structure after the data has been prepared?

RGBD Cameras

Hi,
Thanks for developing SAM3D. I have an L515 LiDAR that generates both RGB and depth frames, as well as point clouds. How can I use your code to prepare my own input dataset? I saw a link in the README, but it is not clear to me. Could you please clarify a little more?

Thank you

Couldn't locate scannetv2 while preparing RGBD data.

While preparing the point cloud data, 3 different folders are created, but only the train folder got a .pth file. Is that OK, or am I mistaken? Please let me know what the point cloud data format should look like after preparation.

Also, the RGBD preparation fails with "File not found: data/scannetv2".

cannot import open3d

import open3d leads to Illegal instruction (core dumped) on some machines. Fix this by installing open3d-python instead of just open3d.

Code error

Hello, how can I solve this problem?
Traceback (most recent call last):
  File "sam3d.py", line 283, in <module>
    voxelize, args.th, train_scenes, val_scenes, args.save_2dmask_path)
  File "sam3d.py", line 239, in seg_pcd
    indices, dis = pointops.knn_query(1, gen_coord, offset, scene_coord, new_offset)
AttributeError: module 'pointops' has no attribute 'knn_query'

Using Customized Dataset

I want to use SAM3D on my own dataset, which contains outdoor scenes. Is that possible? If so, how can I do that? Can you help me with it?

Missing `intrinsic_depth.txt` files when processing 2D images.

When I process the 2D images with the following command, intrinsic_depth.txt is missing when generating results with sam3d.py:

python scannet-preprocess/prepare_2d_data/prepare_2d_data.py --scannet_path data/scannetv2 --output_path data/scannetv2_images --export_label_images

Question about pcd_ensemble

Hi,
Thanks for your SAM3D. I have a question about "org_path" in sam3d.py/pcd_ensemble. Could you please explain what the "org_path" parameter in this function refers to?

Thank you

can't find intrinsic_depth.txt

When running sam3d.py, it reports that intrinsics/intrinsic_depth.txt is not found.
I checked the code: def export_intrinsics(self, output_path): in SegmentAnything3D/scannet-preprocess/prepare_2d_data/SensorData.py will save intrinsics/intrinsic_depth.txt, but I cannot find where prepare_2d_data calls this function.
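
A possible workaround, sketched under the assumption that prepare_2d_data.py exposes an opt.output_path and a per-scene loop variable (names here are illustrative), is to call export_intrinsics yourself right after each .sens file is loaded; check what directory SensorData.export_intrinsics actually expects:

import os

# Hypothetical patch inside prepare_2d_data.py, after `sd = SensorData(sens_file)`:
intrinsics_dir = os.path.join(opt.output_path, scene, 'intrinsics')
os.makedirs(intrinsics_dir, exist_ok=True)
sd.export_intrinsics(intrinsics_dir)  # per SensorData.py, this should write intrinsic_depth.txt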

The code ran into a problem

Is there anything wrong with this step? Could you take a look for me? Is it the dataset? I only downloaded these four file types:
python 1.py -o scannet/ --type _vh_clean_2.ply
python 1.py -o scannet/ --type _vh_clean_2.labels.ply
python 1.py -o scannet/ --type _vh_clean_2.0.010000.segs.json
python 1.py -o scannet/ --type .aggregation.json

python scannet-preprocess/prepare_2d_data/prepare_2d_data.py --scannet_path dataout/train --output_path data/scannetv2_images --export_label_images
Namespace(export_label_images=True, frame_skip=20, label_map_file='D:\Desktop\dianyun\SegmentAnything3D-main\data\scannetv2\scannetv2-labels.combined.tsv', label_type='label-filt', output_image_height=240, output_image_width=320, output_path='data/scannetv2_images', scannet_path='dataout/train')
Traceback (most recent call last):
  File "scannet-preprocess/prepare_2d_data/prepare_2d_data.py", line 118, in <module>
    main()
  File "scannet-preprocess/prepare_2d_data/prepare_2d_data.py", line 71, in main
    label_map = util.read_label_mapping(opt.label_map_file, label_from='id', label_to='nyu40id')
  File "D:\Desktop\dianyun\SegmentAnything3D-main\scannet-preprocess\prepare_2d_data\util.py", line 42, in read_label_mapping
    if represents_int(mapping.keys()[0]):
TypeError: 'dict_keys' object is not subscriptable
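
That TypeError is the usual Python 2 to Python 3 difference: dict.keys() now returns a view that cannot be indexed. A minimal patch for the line shown in the traceback (the body of the if is assumed to be the integer-key conversion from the original ScanNet helper) would be:

# scannet-preprocess/prepare_2d_data/util.py, inside read_label_mapping():
# materialize the keys before indexing them.
if represents_int(list(mapping.keys())[0]):
    mapping = {int(k): v for k, v in mapping.items()}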

Suggestion - Integrate MobileSAM into the pipeline for lightweight and faster inference

Reference: https://github.com/ChaoningZhang/MobileSAM

Our project performs on par with the original SAM and keeps exactly the same pipeline as the original SAM except for a change to the image encoder; therefore, it is easy to integrate into any project.

MobileSAM is around 60 times smaller and around 50 times faster than the original SAM, and it is around 7 times smaller and around 5 times faster than the concurrent FastSAM. The comparison of the whole pipeline is summarized as follows:

(comparison tables from the MobileSAM repository)

Best Wishes,

Qiao
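
For reference, a rough sketch of what that swap could look like in sam3d.py, assuming MobileSAM's drop-in segment-anything API and a downloaded mobile_sam.pt checkpoint (this is not part of the SAM3D repo):

from mobile_sam import sam_model_registry, SamAutomaticMaskGenerator

# MobileSAM keeps the segment-anything interface; only the model type and weights change.
mobile_sam = sam_model_registry["vit_t"](checkpoint="mobile_sam.pt")
mobile_sam.to(device="cuda")
mask_generator = SamAutomaticMaskGenerator(mobile_sam)
masks = mask_generator.generate(color_image)  # color_image: the HxWx3 RGB frame, as in sam3d.py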

Using SAM3D for Custom Point Clouds via Photogrammetry: Seeking Guidance on Implementation

Hey everyone,

I'm diving into SAM3D as a beginner and successfully used it with a scanner dataset using the command:
python sam3d.py --rgb_path $RGB_PATH --data_path $DATA_PATH --save_path $SAVE_PATH --save_2dmask_path $SAVE_2DMASK_PATH --sam_checkpoint_path $SAM_CHECKPOINT_PATH

After running this, I'm unsure about the next steps, particularly with the .pth data, and how to apply it. Now, I'm excited to apply SAM3D to my personal point cloud obtained through photogrammetry.

Could someone guide me through adjusting parameters or any additional steps necessary to use SAM3D with my photogrammetry-generated point cloud? Any advice or assistance would be incredibly helpful!

Thanks a lot!

scannetv2

The ScanNet v2 dataset is too large. Do we need to download all of it, or just these four file types?
python 1.py -o scannet/ --type _vh_clean_2.ply
python 1.py -o scannet/ --type _vh_clean_2.labels.ply
python 1.py -o scannet/ --type _vh_clean_2.0.010000.segs.json
python 1.py -o scannet/ --type .aggregation.json

Error when trying to run setup.py, and requirements.txt

When trying to install SAM3D, I ran into an error in the second part of the installation, when running python setup.py install (I have Python 3.8.10, CUDA 11.8, and the latest PyTorch installed from the official website):

(screenshot of the error)

It gives a type error on that specific line, which I assume should be a string.

I tried installing from requirements.txt, but many of the packages require a higher Python version, while tensorflow-io-gcs-filesystem pins a version that doesn't exist for the highest Python version that otherwise satisfies the requirements (which I found to be Python 3.10).

I also noticed that the package "torch.utils.cpp_extension" is not found during installation, in case that gives more insight into the problem.

I tried with multiple CUDA and Python versions, but none of them seem to work, so I'm stuck here.

Thanks in advance if someone responds :)

The script sam3d.py ran into a problem.

I am getting this error for line 78 in the sam3d.py script:
color_image = np.reshape(color_image[mask], [-1,3])

IndexError: boolean index did not match indexed array along dimension 0; dimension is 480 but corresponding boolean dimension is 240

Help me, please.
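
A likely cause, judging from the prepare_2d_data defaults shown in an issue above (output_image_width=320, output_image_height=240), is that the exported 2D frames and masks are 320x240 while sam3d.py works at 640x480. Regenerating the 2D data at the depth resolution should remove the mismatch, assuming the flag names match those Namespace attributes:

python scannet-preprocess/prepare_2d_data/prepare_2d_data.py --scannet_path data/scannetv2 --output_path data/scannetv2_images --output_image_width 640 --output_image_height 480 --export_label_images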

Hi, I have tried many times to reproduce this work but still cannot achieve the desired result.

Do you have any other contact information? I would like to ask you for further advice. I hope you can check my code.

Thank you very much!
