Large-scale monocular dataset with 3D Normalized Object Coordinate Space maps for effective 3D annotation
Each frame in OmniNOCS provides instance segmentations, NOCS coordinates and 3D bounding boxes in the following format:
- Instance segmentations: An instance map stored at `<path_to_frame>_instances.png` as a 16-bit single-channel PNG image. Each object with a valid mask has its instance ID at each pixel of its mask. Instance IDs are unique within a particular frame. Pixels without a valid object index are either background (value 0) or unknown (value 65535). Unknown regions should not be used for supervision / evaluation.
- NOCS coordinates: A NOCS map stored at `<path_to_frame>_nocs.png` as a 16-bit 4-channel PNG image. The first 3 channels hold the X, Y, Z NOCS values; the last channel is a binary mask denoting whether the pixel contains a valid NOCS coordinate. Note that the NOCS annotations in some domains (particularly outdoor datasets such as KITTI) are sparse, so not all pixels within an object's instance mask necessarily have valid NOCS coordinates.
- 3D bounding box: We provide the 3D bounding box for each object in a frame in the JSON metadata file (explained below).
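The per-pixel conventions above can be decoded with a few lines of NumPy. This is a minimal sketch operating on already-loaded arrays; the linear mapping of the stored integer range [0, 65535] to NOCS values in [0, 1] is our assumption, and the helper names are illustrative, not part of the released tooling.

```python
import numpy as np

UNKNOWN_ID = 65535  # "unknown" regions; exclude from supervision/evaluation.

def decode_instance_map(inst: np.ndarray) -> dict:
    """Split a 16-bit instance map into per-object boolean masks.

    Skips background (0) and unknown (65535) pixels.
    """
    masks = {}
    for obj_id in np.unique(inst):
        if obj_id in (0, UNKNOWN_ID):
            continue
        masks[int(obj_id)] = inst == obj_id
    return masks

def decode_nocs_map(nocs: np.ndarray):
    """Decode a 16-bit 4-channel NOCS map into float XYZ plus a validity mask.

    Assumes (not confirmed by the docs) that the stored integer range
    [0, 65535] maps linearly to NOCS values in [0, 1]; the 4th channel
    is the binary validity mask.
    """
    xyz = nocs[..., :3].astype(np.float32) / 65535.0
    valid = nocs[..., 3] > 0
    return xyz, valid

# Tiny synthetic 2x2 example: one object with ID 7, one unknown pixel.
inst = np.array([[0, 7], [7, UNKNOWN_ID]], dtype=np.uint16)
masks = decode_instance_map(inst)
print(sorted(masks))   # [7]
print(masks[7].sum())  # 2 pixels belong to object 7
```

In practice the arrays would come from reading the PNGs at 16-bit depth; only the decoding logic is shown here.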
Each OmniNOCS `<source>-<split>` combination (for example, `KITTI-train`) has its own JSON metadata file. Each JSON file contains a list of per-frame metadata, whose length equals the number of frames in that combination. For each frame, the metadata has the following structure:
```
{
  'objects': [
    // A dict for each object in the frame.
    {
      'rotation': 3x3 canonical orientation (object-to-camera transformation),
      'translation': 3x1 3D translation in meters (in camera coordinates),
      'size': 3x1 3D size in meters,
      'object_id': instance ID used in the instance segmentation map,
      'category': name of the object class as a string,
    },
    ...
  ],
  'image_name': path to the image for this frame in the original dataset,
  'omninocs_name': path to the NOCS and instance images for this frame in OmniNOCS,
  'nocs_image_downscale': scalar NOCS downscaling factor (image resolution / NOCS map resolution), for cases where the NOCS map is smaller than the color image,
  'intrinsics': {
    'fx': focal length (x) in pixels,
    'fy': focal length (y) in pixels,
    'cx': principal point (x) in pixels,
    'cy': principal point (y) in pixels,
  },
}
```
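Given one object entry, the 3D box corners in camera coordinates follow directly from `rotation`, `translation`, and `size`. A minimal sketch, assuming the box is centered at the object origin and axis-aligned in the object frame (the function name and the example entry are illustrative):

```python
import itertools
import numpy as np

def box_corners_camera(obj: dict) -> np.ndarray:
    """Return the 8 corners of an object's 3D box in camera coordinates.

    Assumes the box is centered at the object origin and axis-aligned in
    the object frame, with 'rotation' mapping object -> camera.
    """
    R = np.asarray(obj['rotation'], dtype=np.float64)     # 3x3
    t = np.asarray(obj['translation'], dtype=np.float64)  # (3,)
    s = np.asarray(obj['size'], dtype=np.float64)         # (3,)
    signs = np.array(list(itertools.product([-0.5, 0.5], repeat=3)))  # 8x3
    corners_obj = signs * s            # 8x3, object frame
    return corners_obj @ R.T + t       # 8x3, camera frame

# Hypothetical object entry with identity rotation, 5 m in front of the camera.
obj = {'rotation': np.eye(3).tolist(),
       'translation': [0.0, 0.0, 5.0],
       'size': [2.0, 1.0, 1.5],
       'object_id': 3, 'category': 'car'}
corners = box_corners_camera(obj)
print(corners.shape)  # (8, 3)
```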
We use right-handed coordinate frames for objects and cameras.
Object coordinate frame:
OmniNOCS objects have per-category canonical orientations. That is, the X, Y, and Z axes of all objects in a category are consistently oriented. For example, cars have the +X axis pointing forward, the +Y axis to their left, and the +Z axis pointing upward. When objects are placed upright in the scene, their +Z axis points opposite to gravity. Examples for a few object categories are shown below:
A few classes with rotational symmetry about an axis (e.g., bottle or bowl) have an ambiguous canonical orientation. We are also aware of a few objects whose orientations may not be canonical, due to occlusions or labeling errors. Please report any such cases you find, so that they can be removed or corrected.
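Because the object frame is canonical, NOCS values can be lifted back to 3D points. The sketch below assumes one common NOCS convention, namely that values in [0, 1] map linearly to metric object coordinates in [-size/2, +size/2] per axis (so 0.5 is the object center); verify the exact convention against the released visualization Colab before relying on it.

```python
import numpy as np

def nocs_to_camera(nocs_xyz, rotation, translation, size) -> np.ndarray:
    """Lift Nx3 NOCS values to 3D points in camera coordinates.

    Assumes (unverified) that NOCS values in [0, 1] map linearly to
    metric object coordinates in [-size/2, +size/2] per axis.
    """
    p_obj = (np.asarray(nocs_xyz) - 0.5) * np.asarray(size)  # object frame
    return p_obj @ np.asarray(rotation).T + np.asarray(translation)

# A NOCS value of 0.5 on all axes is the object center, so under this
# convention it maps to the object's translation in camera coordinates.
center = nocs_to_camera([[0.5, 0.5, 0.5]], np.eye(3),
                        [0.0, 1.0, 4.0], [2.0, 2.0, 2.0])
print(center)  # [[0. 1. 4.]]
```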
Camera coordinate frame:
Our camera convention has the +X axis pointing right, +Y pointing down, and +Z pointing forward out of the camera. Our metadata files contain only the camera intrinsics (no extrinsics), since object poses are already given with respect to the camera frame.
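Under this convention, camera-frame points project to pixels with the standard pinhole model using the `fx`, `fy`, `cx`, `cy` intrinsics from the metadata. A minimal sketch; the optional division by `nocs_image_downscale` for indexing into a lower-resolution NOCS map is our reading of that field, and the function name is illustrative:

```python
import numpy as np

def project_points(points_cam: np.ndarray, intrinsics: dict,
                   nocs_downscale: float = 1.0) -> np.ndarray:
    """Project Nx3 camera-frame points (X right, Y down, Z forward) to pixels.

    Pass nocs_downscale > 1 (assumed usage of 'nocs_image_downscale') to get
    coordinates in a NOCS map smaller than the color image; 1.0 gives
    full-resolution pixel coordinates.
    """
    X, Y, Z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    u = (intrinsics['fx'] * X / Z + intrinsics['cx']) / nocs_downscale
    v = (intrinsics['fy'] * Y / Z + intrinsics['cy']) / nocs_downscale
    return np.stack([u, v], axis=-1)

# Hypothetical intrinsics: a point on the optical axis projects to the
# principal point.
K = {'fx': 500.0, 'fy': 500.0, 'cx': 320.0, 'cy': 240.0}
pts = np.array([[0.0, 0.0, 2.0], [1.0, -0.5, 5.0]])
print(project_points(pts, K))  # [[320. 240.], [420. 190.]]
```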
OmniNOCS provides NOCS annotations for images from other datasets. Please refer to SETUP.md for instructions to download all data and set up OmniNOCS.
Links to download OmniNOCS will be available soon!
To visualize our data and illustrate its usage, we provide a Colab notebook that downloads a small mini split of the training set and visualizes the NOCS and bounding box annotations from OmniNOCS.
Copyright 2024 DeepMind Technologies Limited
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0
Our dataset adds annotations to images from the datasets listed below; these are made available under the following licenses:
| Dataset | License |
|---|---|
| KITTI | CC BY-NC-SA 3.0 DEED |
| ARKitScenes | CC BY-NC-SA 4.0 DEED |
| Virtual KITTI | CC BY-NC-SA 3.0 LEGAL CODE |
| nuScenes | CC BY-NC-SA 4.0 |
| Hypersim | CC BY-NC-SA 3.0 DEED |
| NOCS-Real275 | MIT |
| Waymo OD | Apache 2.0 (see Waymo terms) |
| Objectron | Microsoft C-UDA |
| Cityscapes 3D | CC BY 4.0 |
All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode
Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.