
Comments (10)

Nik-V9 commented on July 29, 2024

Hi, Thanks for trying out the code!

SplaTAM requires depth input for running SLAM & reconstruction. Our dataloaders, by default, expect both an rgb and depth folder. We haven't tested the offline setting in the NeRFCapture App. We have only tested our scripts, which interface with NeRFCapture in Online mode.

I just checked capturing an offline dataset with NeRFCapture. It looks like both the RGB and depth PNGs are saved under the images folder. You would need to move the .depth.png images to a depth folder and rename them to .png, in addition to renaming the images folder. This should be a pretty simple script to write.
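Something along these lines should do it (a rough sketch; the exact capture folder layout and filenames are assumptions based on the description above):

import shutil
from pathlib import Path

# Rough sketch: reorganize an offline NeRFCapture dump into the rgb/ and depth/
# layout the dataloaders expect. Paths and naming are assumptions.
capture_dir = Path("path/to/offline_capture")
images_dir = capture_dir / "images"
depth_dir = capture_dir / "depth"
depth_dir.mkdir(exist_ok=True)

# Move <id>.depth.png -> depth/<id>.png
for f in sorted(images_dir.glob("*.depth.png")):
    shutil.move(str(f), str(depth_dir / f.name.replace(".depth.png", ".png")))

# Only the color PNGs remain in images/; rename the folder itself to rgb/
images_dir.rename(capture_dir / "rgb")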


Nik-V9 commented on July 29, 2024

I think the offline mode in the Nerfcapture app is broken as pointed out by the app's developer here jc211/NeRFCapture#10 (comment). I tried renaming the files yesterday but that does not seem to cut it: the tensor dimensions were off and the depth PNG files themselves looked wrong: they do not seem to store the full depth range. This is probably a bit different to how the online mode works.

Yes, this is correct. Looks like the offline mode is broken. So far, we have only used the online mode.

I'm actually trying to get this to work using our own iOS data collection app (not related to Nerfcapture or SplaTAM), see here for details, but I'm not sure if we got the depth conversion correct yet.

This looks like a cool app.

So to summarize: if an app/script exports RGBD data where the depth PNGs have a depth scale of 6553.5, and the camera intrinsics are correctly set in transforms.json, it should probably work?

Yes! We need depth and intrinsics in addition to RGB for SLAM. The depth scale doesn't specifically have to be 6553.5 (as long as the pixel intensity to meter scaling is known). That's what our iPhone dataloader is currently hardcoded to:

config_dict["camera_params"]["png_depth_scale"] = 6553.5 # Depth is in mm
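So if your own app exports depth in a different unit, only this value would change; for example (hypothetical, for PNGs stored in millimeters):

# Hypothetical: a capture app that writes depth PNGs in millimeters (1000 units = 1 m)
config_dict["camera_params"]["png_depth_scale"] = 1000.0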


oseiskar commented on July 29, 2024

I think the offline mode in the Nerfcapture app is broken as pointed out by the app's developer here jc211/NeRFCapture#10 (comment). I tried renaming the files yesterday but that does not seem to cut it: the tensor dimensions were off and the depth PNG files themselves looked wrong: they do not seem to store the full depth range. This is probably a bit different to how the online mode works.

I'm actually trying to get this to work using our own iOS data collection app (not related to Nerfcapture or SplaTAM), see here for details, but I'm not sure if we got the depth conversion correct yet.

If I understood this comment correctly:

  • the online mode reads (uint32 or float32?) data from the Nerfcapture app
  • then scales that by some number (1/10?) and saves to PNG
  • the scale of the original depth image (before saving to PNG) is assumed to be 65535 units = 1m
  • so the PNG depth scale here is 6553.5 units = 1m
  • other datasets are configured to use other depth scaling, more typically 1000 units = 1m, i.e., depth in millimeters

So to summarize: if an app/script exports RGBD data where the depth PNGs have a depth scale of 6553.5, and the camera intrinsics are correctly set in transforms.json, it should probably work?
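A quick numerical sanity check of that reading (my interpretation, not verified against the app's source):

# 65535 units = 1 m in the raw capture; the online script divides by 10 before saving
raw_units_per_meter = 65535.0
png_units_per_meter = raw_units_per_meter / 10.0   # = 6553.5
max_depth_in_png = 65535 / png_units_per_meter     # = 10.0 m fits in a uint16 PNG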


LemonSoda-RPG commented on July 29, 2024

Hi, Thanks for trying out the code!

SplaTAM requires depth input for running SLAM & reconstruction. Our dataloaders, by default, expect both an rgb and depth folder. We haven't tested the offline setting in the NeRFCapture App. We have only tested our scripts, which interface with NeRFCapture in Online mode.

I just checked capturing an offline dataset with NeRFCapture. It looks like both the RGB and depth PNGs are saved under the images folder. You would need to move the .depth.png images to a depth folder and rename them to .png, in addition to renaming the images folder. This should be a pretty simple script to write.

Hello, I am running the program using WSL and I don't know how to keep WSL and my phone on the same network segment, so I used NeRFCapture for offline data collection. However, after the capture completed, I found that the folder on my phone only contains color images and transforms.json, with no depth maps. My phone is an iPhone 14; is it unable to capture depth maps?


StarsTesla commented on July 29, 2024

@Nik-V9 I did check the images dir; there are only RGB images. Does the data need to be collected with a LiDAR-equipped iPhone? Maybe this could be improved by using something like MiDaS or MVSNet to estimate depth?


H-tr commented on July 29, 2024

Hi, Thanks for trying out the code!
SplaTAM requires depth input for running SLAM & reconstruction. Our dataloaders, by default, expect both an rgb and depth folder. We haven't tested the offline setting in the NeRFCapture App. We have only tested our scripts, which interface with NeRFCapture in Online mode.
I just checked capturing an offline dataset with NeRFCapture. It looks like both the RGB and depth PNGs are saved under the images folder. You would need to move the .depth.png images to a depth folder and rename them to .png, in addition to renaming the images folder. This should be a pretty simple script to write.

Hello, I am running the program using WSL and I don't know how to keep WSL and my phone on the same network segment, so I used NeRFCapture for offline data collection. However, after the capture completed, I found that the folder on my phone only contains color images and transforms.json, with no depth maps. My phone is an iPhone 14; is it unable to capture depth maps?

Hi, I also faced the same issue and found that only the Pro models support LiDAR (see attached image).


Nik-V9 commented on July 29, 2024

@Nik-V9 I did check the images dir; there are only RGB images. Does the data need to be collected with a LiDAR-equipped iPhone? Maybe this could be improved by using something like MiDaS or MVSNet to estimate depth?

Yes, you need a LiDAR-equipped iPhone for the demo.

Using a depth estimation network would make the method correct only up to scale (not metric), since monocular depth has no absolute scale. Your camera tracking performance would also be influenced by the accuracy and multi-view consistency of the depth estimation network. An RGB-only SLAM method using 3D Gaussians is currently future research and something we might consider.


pablovela5620 commented on July 29, 2024

I think the offline mode in the Nerfcapture app is broken as pointed out by the app's developer here jc211/NeRFCapture#10 (comment). I tried renaming the files yesterday but that does not seem to cut it: the tensor dimensions were off and the depth PNG files themselves looked wrong: they do not seem to store the full depth range. This is probably a bit different to how the online mode works.

Yes, this is correct. Looks like the offline mode is broken. So far, we have only used the online mode.

I'm actually trying to get this to work using our own iOS data collection app (not related to Nerfcapture or SplaTAM), see here for details, but I'm not sure if we got the depth conversion correct yet.

This looks like a cool app.

So to summarize: if an app/script exports RGBD data where the depth PNGs have a depth scale of 6553.5, and the camera intrinsics are correctly set in transforms.json, it should probably work?

Yes! We need depth and intrinsics in addition to RGB for SLAM. The depth scale doesn't specifically have to be 6553.5 (as long as the pixel intensity to meter scaling is known). That's what our iPhone dataloader is currently hardcoded to:

config_dict["camera_params"]["png_depth_scale"] = 6553.5 # Depth is in mm

Where does the 6553.5 number come from? I'm also trying to get this working; I see you use a depth_scale of 10 and this magic number of 6553.5, but I don't fully understand. What is encoded in the depth image? To get the actual metric value, would I need to divide by 6553.5 and multiply by 10?


Nik-V9 commented on July 29, 2024

Hi @pablovela5620, the 6553.5 is the scaling factor for the depth PNG image. When you load the depth image, you need to divide the pixel values by this number to get metric depth. By default, the raw iPhone depth image has a pixel intensity of 65535 corresponding to 1 meter. When we save the depth image, we divide by 10 before writing the PNG, which gives the 65535 / 10 = 6553.5 scale.
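For anyone loading these PNGs by hand, a minimal sketch of that conversion (the file path and use of PIL here are just for illustration):

import numpy as np
from PIL import Image

PNG_DEPTH_SCALE = 6553.5  # 65535 units/m divided by 10 at save time

depth_png = np.asarray(Image.open("depth/0.png"))          # 16-bit PNG, integer units
depth_m = depth_png.astype(np.float32) / PNG_DEPTH_SCALE   # metric depth in meters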


jeezrick commented on July 29, 2024

I think the offline mode in the Nerfcapture app is broken as pointed out by the app's developer here jc211/NeRFCapture#10 (comment). I tried renaming the files yesterday but that does not seem to cut it: the tensor dimensions were off and the depth PNG files themselves looked wrong: they do not seem to store the full depth range. This is probably a bit different to how the online mode works.

Yes, this is correct. Looks like the offline mode is broken. So far, we have only used the online mode.

I'm actually trying to get this to work using our own iOS data collection app (not related to Nerfcapture or SplaTAM), see here for details, but I'm not sure if we got the depth conversion correct yet.

This looks like a cool app.

So to summarize: if an app/script exports RGBD data where the depth PNGs have a depth scale of 6553.5, and the camera intrinsics are correctly set in transforms.json, it should probably work?

Yes! We need depth and intrinsics in addition to RGB for SLAM. The depth scale doesn't specifically have to be 6553.5 (as long as the pixel intensity to meter scaling is known). That's what our iPhone dataloader is currently hardcoded to:

config_dict["camera_params"]["png_depth_scale"] = 6553.5 # Depth is in mm

Where does the 6553.5 number come from? I'm also trying to get this working; I see you use a depth_scale of 10 and this magic number of 6553.5, but I don't fully understand. What is encoded in the depth image? To get the actual metric value, would I need to divide by 6553.5 and multiply by 10?

Because they save the depth array like this:

import numpy as np
from PIL import Image

def save_depth_as_png(depth, filename, png_depth_scale):
    depth = depth * png_depth_scale  # scale metric depth (meters) to PNG units
    depth = depth.astype(np.uint16)  # quantize to 16-bit integers (0..65535)
    depth = Image.fromarray(depth)
    depth.save(filename)

When saving it like this, you need to consider that the range of uint16 is 0 to 65535 (2^16 - 1). So I guess what they did is first clamp the actual depth value to [0, 10.0] m, then multiply it by 6553.5, and then convert it to uint16 without any risk of overflow (but losing some accuracy in the float-to-int conversion).
So, after loading the image, just divide it by 6553.5 and you get back the clamped depth value, like this part of the code in SplaTAM/datasets/gradslam_datasets/basedataset.py:

    def _preprocess_depth(self, depth: np.ndarray):
        r"""Preprocesses the depth image by resizing, adding channel dimension, and scaling values to meters. Optionally
        converts depth from channels last :math:`(H, W, 1)` to channels first :math:`(1, H, W)` representation.

        Args:
            depth (np.ndarray): Raw depth image

        Returns:
            np.ndarray: Preprocessed depth

        Shape:
            - depth: :math:`(H_\text{old}, W_\text{old})`
            - Output: :math:`(H, W, 1)` if `self.channels_first == False`, else :math:`(1, H, W)`.
        """
        depth = cv2.resize(
            depth.astype(float),
            (self.desired_width, self.desired_height),
            interpolation=cv2.INTER_NEAREST,
        )
        if len(depth.shape) == 2:
            depth = np.expand_dims(depth, -1)
        if self.channels_first:
            depth = datautils.channels_first(depth)
        return depth / self.png_depth_scale
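Putting the save and load halves together, a quick roundtrip check (a sketch that mirrors the two snippets above, ignoring the resize step):

import numpy as np
from PIL import Image

PNG_DEPTH_SCALE = 6553.5

# Fake metric depth already clamped to [0, 10] m
depth_m = np.random.uniform(0.0, 10.0, size=(192, 256)).astype(np.float32)

# Save as in save_depth_as_png: scale, quantize to uint16, write a 16-bit PNG
Image.fromarray((depth_m * PNG_DEPTH_SCALE).astype(np.uint16)).save("test_depth.png")

# Load as in _preprocess_depth: read, cast to float, divide by the PNG depth scale
loaded_m = np.asarray(Image.open("test_depth.png")).astype(np.float32) / PNG_DEPTH_SCALE

# Only uint16 quantization error remains (< 1/6553.5 m, i.e. well under a millimeter)
assert np.allclose(depth_m, loaded_m, atol=1.0 / PNG_DEPTH_SCALE)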

