
3d-box-annotations-objects's Introduction

Large-scale monocular dataset with 3D Normalized Object Coordinate Space maps for effective 3D annotation

Data format

Each frame in OmniNOCS provides instance segmentations, NOCS coordinates and 3D bounding boxes in the following format:

  • Instance segmentations: An instance map image stored at <path_to_frame>_instances.png, as a 16-bit single-channel PNG image. Every pixel in a valid object's mask contains that object's instance ID, which is unique within the frame. Pixels without a valid instance ID are either background (value 0) or unknown (value 65535). Unknown regions should not be used for supervision or evaluation.

  • NOCS coordinates: A NOCS map stored at <path_to_frame>_nocs.png as a 16-bit 4-channel PNG image. The first 3 channels hold the X, Y, Z NOCS values. The last channel is a binary mask that marks whether the pixel contains a valid NOCS coordinate. Note that the NOCS annotations in some domains (particularly outdoor datasets such as KITTI) are sparse, so not every pixel inside an object's instance mask has a valid NOCS coordinate.

  • 3D bounding box: We provide the 3D bounding box for each object in a frame in the JSON metadata file (explained below).
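The sketch below shows one way to turn the two PNG annotations into per-object masks and NOCS values once they are loaded as NumPy arrays. One option is OpenCV's `cv2.imread(path, cv2.IMREAD_UNCHANGED)`, which keeps the 16-bit depth but returns 4-channel images in BGRA order, so reorder them first (e.g. `img[..., [2, 1, 0, 3]]`). The sketch assumes the X, Y, Z channels store NOCS values scaled to the full 16-bit range, which is not stated above; the function names are hypothetical.

```python
import numpy as np

BACKGROUND_ID = 0
UNKNOWN_ID = 65535


def split_instances(instance_map: np.ndarray) -> dict[int, np.ndarray]:
    """Return a boolean mask per instance ID, skipping background and unknown pixels."""
    return {
        int(i): instance_map == i
        for i in np.unique(instance_map)
        if i not in (BACKGROUND_ID, UNKNOWN_ID)
    }


def supervision_mask(instance_map: np.ndarray) -> np.ndarray:
    """Pixels usable for supervision/evaluation: everything except 'unknown'."""
    return instance_map != UNKNOWN_ID


def decode_nocs(nocs_png: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split an (H, W, 4) uint16 NOCS image (RGBA order) into XYZ values and a mask.

    Assumption: XYZ are stored scaled to [0, 65535]; here they are mapped to [0, 1].
    """
    xyz = nocs_png[..., :3].astype(np.float32) / 65535.0
    valid = nocs_png[..., 3] > 0  # 4th channel: binary validity mask
    return xyz, valid
```

Valid NOCS pixels for a single object are then `masks[k] & valid`, where `k` is its instance ID.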

Each OmniNOCS <source>-<split> combination (for example KITTI-train) has its own JSON metadata file. Each JSON file contains a list with one metadata entry per frame in that combination. For each frame, the metadata has the following structure:

'objects': [
  // A dict for each object in the frame.
  {
    'rotation': 3x3 rotation matrix giving the canonical orientation (object-to-camera rotation),
    'translation': 3x1 translation (in meters) in camera coordinates,
    'size': 3x1 3D size (in meters),
    'object_id': instance ID used in the instance segmentation map,
    'category': name of the object class as a string,
  }
  ...
]
'image_name': Path to the image for this frame in the original dataset.
'omninocs_name': Path to the NOCS and instance images for this frame in OmniNOCS.
'nocs_image_downscale': Scalar NOCS downscaling factor (Image resolution / NOCS map resolution), for cases where the NOCS image is smaller than the color image.
'intrinsics': {
  'fx': Focal length (x) in pixels.
  'fy': Focal length (y) in pixels.
  'cx': Principal point (x) in pixels.
  'cy': Principal point (y) in pixels.
}
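A minimal sketch of reading this metadata with Python's standard `json` module follows. It indexes each frame's objects by `object_id` so that boxes can be matched to instance IDs in the instance map, and rescales the intrinsics to the NOCS map resolution. The helper names are hypothetical, and the rescaling is a plain division by `nocs_image_downscale` that ignores half-pixel offsets, which is an assumption rather than a documented convention.

```python
import json


def load_metadata(path: str) -> list[dict]:
    """Load the list of per-frame metadata dicts from one <source>-<split> JSON file."""
    with open(path) as f:
        return json.load(f)


def objects_by_id(frame: dict) -> dict[int, dict]:
    """Index a frame's objects by 'object_id', the value used in its instance map."""
    return {int(obj["object_id"]): obj for obj in frame["objects"]}


def nocs_intrinsics(frame: dict) -> dict[str, float]:
    """Rescale the color-image intrinsics to the NOCS map resolution.

    'nocs_image_downscale' is image resolution / NOCS map resolution, so each
    parameter is divided by it (half-pixel offsets are ignored).
    """
    s = float(frame["nocs_image_downscale"])
    return {key: float(value) / s for key, value in frame["intrinsics"].items()}
```

For example, the box for instance ID `k` in a frame's instance map is `objects_by_id(frame)[k]`.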

Coordinate convention

We use right-handed coordinate frames for objects and cameras.

Object coordinate frame:

OmniNOCS objects have per-category canonically oriented frames: the X, Y and Z axes of all objects in a category are consistently oriented. For example, cars have the +X axis pointing forwards, the +Y axis to their left and the +Z axis pointing upwards. When an object is placed upright in the scene, its +Z axis points opposite to gravity. Examples for a few object categories are shown below:

[Figure: object coordinate frames for a few example categories]

A few classes (e.g. bottle or bowl) are rotationally symmetric about an axis, which makes their canonical orientation ambiguous. We are also aware that the orientations of a few objects may not be canonical because of occlusions or labelling errors. Please report any such cases you find so that they can be removed or corrected.
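The sketch below shows how the per-object `rotation`, `translation` and `size` fields could be combined into the 8 corners of a 3D box in camera coordinates, and how the columns of the rotation give the object's canonical axes (e.g. a car's forward +X and up +Z directions) in the camera frame. It assumes that the rotation maps object to camera coordinates as p_cam = R p_obj + t, that `size` holds full extents along the object's X, Y and Z axes, and that the box is centered on `translation`; the function names are hypothetical.

```python
import itertools

import numpy as np


def box_corners_camera(rotation, translation, size) -> np.ndarray:
    """Return the 8 corners of an object's 3D box in camera coordinates, shape (8, 3).

    Assumes 'size' holds full extents along object X, Y, Z, centered on 'translation'.
    """
    R = np.asarray(rotation, dtype=np.float64).reshape(3, 3)
    t = np.asarray(translation, dtype=np.float64).reshape(3)
    half = np.asarray(size, dtype=np.float64).reshape(3) / 2.0
    signs = np.array(list(itertools.product((-1.0, 1.0), repeat=3)))
    corners_obj = signs * half       # (8, 3) corners in the object frame
    return corners_obj @ R.T + t     # row-wise p_cam = R @ p_obj + t


def object_axes_in_camera(rotation) -> dict[str, np.ndarray]:
    """Columns of the object-to-camera rotation are the object's axes in camera coordinates."""
    R = np.asarray(rotation, dtype=np.float64).reshape(3, 3)
    return {"x": R[:, 0], "y": R[:, 1], "z": R[:, 2]}
```

For a car, `object_axes_in_camera(obj["rotation"])["x"]` would then be its heading direction and `["z"]` its up direction, both expressed in the camera frame.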

Camera coordinate frame:

Our camera convention uses the +X axis pointing right, +Y pointing down, and +Z pointing forward out of the camera (along the viewing direction). Our metadata files only contain the camera intrinsics (without extrinsics), since the object pose is already given with respect to the camera frame.
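With this convention (+X right, +Y down, +Z forward) and the `fx`, `fy`, `cx`, `cy` intrinsics from the metadata, camera-frame points can be projected to pixels with a standard pinhole model. The following is a minimal sketch under that assumption; `project_points` is a hypothetical helper, and it rejects points that are not in front of the camera rather than clipping them.

```python
import numpy as np


def project_points(points_cam, fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Project (N, 3) camera-frame points (+X right, +Y down, +Z forward) to (N, 2) pixels.

    Uses the pinhole model u = fx * X / Z + cx, v = fy * Y / Z + cy.
    """
    p = np.asarray(points_cam, dtype=np.float64).reshape(-1, 3)
    z = p[:, 2]
    if np.any(z <= 0):
        raise ValueError("all points must have positive depth (Z > 0)")
    u = fx * p[:, 0] / z + cx
    v = fy * p[:, 1] / z + cy
    return np.stack([u, v], axis=1)
```

The 8 corners of an object's box, transformed into the camera frame with p_cam = R p_obj + t, can be passed in directly to draw the box on the color image; for the NOCS map, the intrinsics would first need rescaling by `nocs_image_downscale`.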

Download

OmniNOCS provides NOCS annotations for images from other datasets. Please refer to SETUP.md for instructions to download all data and set up OmniNOCS.

Links to download OmniNOCS will be available soon!

Usage

To visualize our data and illustrate its usage, we provide a Colab notebook that downloads a small mini split of the training set and visualizes the NOCS and bounding box annotations from OmniNOCS.

License and disclaimer

Copyright 2024 DeepMind Technologies Limited

All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0

Our dataset adds annotations to images from the datasets listed below, which are made available under the following licenses:

  • KITTI: CC BY-NC-SA 3.0
  • ARKitScenes: CC BY-NC-SA 4.0
  • Virtual KITTI: CC BY-NC-SA 3.0
  • nuScenes: CC BY-NC-SA 4.0
  • Hypersim: CC BY-NC-SA 3.0
  • NOCS-Real275: MIT
  • Waymo OD: Apache 2.0 (see Waymo terms)
  • Objectron: Microsoft C-UDA
  • Cityscapes 3D: CC-BY 4.0

All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode

Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.

3d-box-annotations-objects's People

Contributors

tharun-tharun
