How to load and transform depth data?

facebookresearch commented on August 24, 2024

How to load and transform depth data?


Comments (4)

zhang-ziang commented on August 24, 2024

I tried this code, it seems to work fine.

I tried the code but still encountered some problems. Here is my code and output.

import numpy as np
import torch
from PIL import Image

sensor_to_params = {
    "kv1": {
        "baseline": 0.075,
    },
    "kv1_b": {
        "baseline": 0.075,
    },
    "kv2": {
        "baseline": 0.075,
    },
    "realsense": {
        "baseline": 0.095,
    },
    "xtion": {
        "baseline": 0.095, # guessed based on length of 18cm for ASUS xtion v1
    },
}


def convert_depth_to_disparity(depth_file, intrinsics_file, sensor_type, min_depth=0.01, max_depth=50):
    """
    depth_file is a png file that contains the scene depth
    intrinsics_file is a txt file supplied in SUNRGBD with sensor information
            Can be found at the path: os.path.join(root_dir, room_name, "intrinsics.txt")
    """
    with open(intrinsics_file, 'r') as fh:
        lines = fh.readlines()
        focal_length = float(lines[0].strip().split()[0])
    baseline = sensor_to_params[sensor_type]["baseline"]
    # Load the depth PNG as float32; the division below treats values as millimeters.
    depth = np.array(Image.open(depth_file)).astype(np.float32)
    depth_in_meters = depth / 1000.
    if min_depth is not None:
        depth_in_meters = depth_in_meters.clip(min=min_depth, max=max_depth)
    disparity = baseline * focal_length / depth_in_meters
    return torch.from_numpy(disparity).float()

# ...

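from tqdm import tqdm
from imagebind.models import imagebind_model
from imagebind.models.imagebind_model import ModalityType

device = "cuda:0" if torch.cuda.is_available() else "cpu"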
model = imagebind_model.imagebind_huge(pretrained=True)
model.eval()
model.to(device)

with torch.no_grad():
    for dep_file in tqdm(depth_files):
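        sensor_type = ...
        intrinsics_file = ...  # path to the matching intrinsics.txt, elided like sensor_type
        disparity = convert_depth_to_disparity(
            dep_file, intrinsics_file, sensor_type, min_depth=0.01, max_depth=50
        ).unsqueeze_(dim=0).to(device)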
        print(disparity.shape)
        # Load data
        inputs = {
            ModalityType.DEPTH: disparity,
        }
        embeddings = model(inputs)

The print output is torch.Size([1, 530, 730]), and ImageBind throws an error:

RuntimeError: Given normalized_shape=[384], expected input with shape [*, 384], but got input of size[384, 45, 33]

Do you have any idea? Or could you share your code? Thanks a lot. :)
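The shapes in the error message are consistent with a 3-D input being read as an unbatched [C, H, W] image. The following is a minimal sketch of that failure mode, not ImageBind's actual code: `stem`, `norm`, and `embed` are hypothetical stand-ins, and the 16x16 patch size and 384-wide embedding are assumptions inferred from the numbers in the error (530 // 16 = 33, 730 // 16 = 45).

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a patch-embedding stem: 1 input channel,
# 16x16 patches, 384 output channels (assumed from the error message).
stem = nn.Conv2d(1, 384, kernel_size=16, stride=16, bias=False)
norm = nn.LayerNorm(384)


def embed(x: torch.Tensor) -> torch.Tensor:
    # Intended for [B, 1, H, W]: conv -> [B, 384, h, w] -> [B, h*w, 384].
    tokens = stem(x).flatten(2).transpose(1, 2)
    return norm(tokens)


# A [1, 530, 730] tensor is treated by Conv2d as one unbatched image,
# so the conv output is [384, 33, 45] and the transpose gives [384, 45, 33].
bad = torch.rand(1, 530, 730)
try:
    embed(bad)
    failed = False
except RuntimeError as err:
    failed = True
    print(err)

# With explicit batch and channel dimensions the last axis is 384, as LayerNorm expects.
good = torch.rand(1, 1, 224, 224)
out = embed(good)
print(out.shape)
```

In this sketch the 3-D input fails inside the LayerNorm, while the 4-D input yields one 384-dimensional token per 16x16 patch.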

I solved the problem by resizing the tensor to the shape [B, 1, 224, 224]; it seems to work well. :)
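One way the resize described above could be done is sketched below. The thread does not say how the tensor was resized, so `prepare_disparity` is a hypothetical helper, and the bilinear mode and the direct squash to a square (which ignores the aspect ratio) are assumptions.

```python
import torch
import torch.nn.functional as F


def prepare_disparity(disparity: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Resize a [H, W] disparity map to [1, 1, size, size]."""
    if disparity.ndim != 2:
        raise ValueError(f"expected a 2-D [H, W] tensor, got {tuple(disparity.shape)}")
    # Add batch and channel dimensions: [H, W] -> [1, 1, H, W].
    batched = disparity[None, None].float()
    # Bilinear interpolation is an assumption; other modes may suit depth data better.
    return F.interpolate(batched, size=(size, size), mode="bilinear", align_corners=False)


# Random stand-in for the output of convert_depth_to_disparity(...).
disparity = torch.rand(530, 730)
resized = prepare_disparity(disparity)
print(resized.shape)  # torch.Size([1, 1, 224, 224])
```

Several maps prepared this way can be stacked with torch.cat(..., dim=0) to form a [B, 1, 224, 224] batch before being passed as the ModalityType.DEPTH input.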


tfwang08 commented on August 24, 2024

I tried this code, it seems to work fine.


LinB203 commented on August 24, 2024

I tried this code, it seems to work fine.

We can use absolute depth in meters for inference with this repo.


