Large-scale monocular dataset with 3D Normalized Object Coordinate Space maps for effective 3D annotation
Each frame in OmniNOCS provides instance segmentations, NOCS coordinates and 3D bounding boxes in the following format:
- Instance segmentations: An instance map stored at `<path_to_frame>_instances.png` as a 16-bit single-channel PNG image. Each object with a valid mask has its instance ID at each pixel of its mask. Instance IDs are unique within a particular frame. Pixels without a valid object index are either background (value 0) or unknown (value 65535). Unknown regions should not be used for supervision / evaluation.
- NOCS coordinates: A NOCS map stored at `<path_to_frame>_nocs.png` as a 16-bit 4-channel PNG image. The first 3 channels hold the X, Y, Z NOCS values; the last channel is a binary mask denoting whether the pixel contains a valid NOCS coordinate. Note that the NOCS annotations in some domains (particularly outdoor datasets such as KITTI) are sparse, so not all pixels within an object's instance mask necessarily have valid NOCS coordinates.
- 3D bounding box: We provide the 3D bounding box for each object in a frame in the JSON metadata file (explained below).
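The per-pixel conventions above can be decoded with a few lines of NumPy. This is a minimal sketch operating on already-loaded arrays; the linear mapping of the stored integer range [0, 65535] to NOCS values in [0, 1] is our assumption, and the helper names are illustrative, not part of the released tooling.

```python
import numpy as np

UNKNOWN_ID = 65535  # "unknown" regions; exclude from supervision/evaluation.

def decode_instance_map(inst: np.ndarray) -> dict:
    """Split a 16-bit instance map into per-object boolean masks.

    Skips background (0) and unknown (65535) pixels.
    """
    masks = {}
    for obj_id in np.unique(inst):
        if obj_id in (0, UNKNOWN_ID):
            continue
        masks[int(obj_id)] = inst == obj_id
    return masks

def decode_nocs_map(nocs: np.ndarray):
    """Decode a 16-bit 4-channel NOCS map into float XYZ plus a validity mask.

    Assumes (not confirmed by the docs) that the stored integer range
    [0, 65535] maps linearly to NOCS values in [0, 1]; the 4th channel
    is the binary validity mask.
    """
    xyz = nocs[..., :3].astype(np.float32) / 65535.0
    valid = nocs[..., 3] > 0
    return xyz, valid

# Tiny synthetic 2x2 example: one object with ID 7, one unknown pixel.
inst = np.array([[0, 7], [7, UNKNOWN_ID]], dtype=np.uint16)
masks = decode_instance_map(inst)
print(sorted(masks))   # [7]
print(masks[7].sum())  # 2 pixels belong to object 7
```

In practice the arrays would come from reading the PNGs at 16-bit depth; only the decoding logic is shown here.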
Each OmniNOCS `<source>-<split>` combination (for example, `KITTI-train`) has its own JSON metadata file. Each JSON file contains a list of per-frame metadata, whose length equals the number of frames in that combination. For each frame, the metadata has the following structure:
```
{
  'objects': [
    // A dict for each object in the frame.
    {
      'rotation': 3x3 canonical orientation (object-to-camera transformation),
      'translation': 3x1 3D translation in meters (in camera coordinates),
      'size': 3x1 3D size in meters,
      'object_id': instance ID used in the instance segmentation map,
      'category': name of the object class as a string,
    },
    ...
  ],
  'image_name': path to the image for this frame in the original dataset,
  'omninocs_name': path to the NOCS and instance images for this frame in OmniNOCS,
  'nocs_image_downscale': scalar NOCS downscaling factor (image resolution / NOCS map resolution), for cases where the NOCS map is smaller than the color image,
  'intrinsics': {
    'fx': focal length (x) in pixels,
    'fy': focal length (y) in pixels,
    'cx': principal point (x) in pixels,
    'cy': principal point (y) in pixels,
  },
}
```
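Given one object entry, the 3D box corners in camera coordinates follow directly from `rotation`, `translation`, and `size`. A minimal sketch, assuming the box is centered at the object origin and axis-aligned in the object frame (the function name and the example entry are illustrative):

```python
import itertools
import numpy as np

def box_corners_camera(obj: dict) -> np.ndarray:
    """Return the 8 corners of an object's 3D box in camera coordinates.

    Assumes the box is centered at the object origin and axis-aligned in
    the object frame, with 'rotation' mapping object -> camera.
    """
    R = np.asarray(obj['rotation'], dtype=np.float64)     # 3x3
    t = np.asarray(obj['translation'], dtype=np.float64)  # (3,)
    s = np.asarray(obj['size'], dtype=np.float64)         # (3,)
    signs = np.array(list(itertools.product([-0.5, 0.5], repeat=3)))  # 8x3
    corners_obj = signs * s            # 8x3, object frame
    return corners_obj @ R.T + t       # 8x3, camera frame

# Hypothetical object entry with identity rotation, 5 m in front of the camera.
obj = {'rotation': np.eye(3).tolist(),
       'translation': [0.0, 0.0, 5.0],
       'size': [2.0, 1.0, 1.5],
       'object_id': 3, 'category': 'car'}
corners = box_corners_camera(obj)
print(corners.shape)  # (8, 3)
```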
We use right-handed coordinate frames for objects and cameras.
Object coordinate frame:
OmniNOCS objects have per-category canonical orientations. That is, the X, Y, and Z axes of all objects in a category are consistently oriented. For example, cars have the +X axis pointing forward, the +Y axis to their left, and the +Z axis pointing upward. When objects are placed upright in the scene, their +Z axis points opposite to gravity. Examples for a few object categories are shown below:
A few classes with rotational symmetry about an axis (e.g., bottle or bowl) have an ambiguous canonical orientation. We are also aware of a few objects whose orientations may not be canonical, due to occlusions or labeling errors. Please report any such cases you find, so that they can be removed or corrected.
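Because the object frame is canonical, NOCS values can be lifted back to 3D points. The sketch below assumes one common NOCS convention, namely that values in [0, 1] map linearly to metric object coordinates in [-size/2, +size/2] per axis (so 0.5 is the object center); verify the exact convention against the released visualization Colab before relying on it.

```python
import numpy as np

def nocs_to_camera(nocs_xyz, rotation, translation, size) -> np.ndarray:
    """Lift Nx3 NOCS values to 3D points in camera coordinates.

    Assumes (unverified) that NOCS values in [0, 1] map linearly to
    metric object coordinates in [-size/2, +size/2] per axis.
    """
    p_obj = (np.asarray(nocs_xyz) - 0.5) * np.asarray(size)  # object frame
    return p_obj @ np.asarray(rotation).T + np.asarray(translation)

# A NOCS value of 0.5 on all axes is the object center, so under this
# convention it maps to the object's translation in camera coordinates.
center = nocs_to_camera([[0.5, 0.5, 0.5]], np.eye(3),
                        [0.0, 1.0, 4.0], [2.0, 2.0, 2.0])
print(center)  # [[0. 1. 4.]]
```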
Camera coordinate frame:
Our camera convention has the +X axis pointing right, +Y pointing down, and +Z pointing forward out of the camera. Our metadata files contain only the camera intrinsics (no extrinsics), since object poses are already given with respect to the camera frame.
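Under this convention, camera-frame points project to pixels with the standard pinhole model using the `fx`, `fy`, `cx`, `cy` intrinsics from the metadata. A minimal sketch; the optional division by `nocs_image_downscale` for indexing into a lower-resolution NOCS map is our reading of that field, and the function name is illustrative:

```python
import numpy as np

def project_points(points_cam: np.ndarray, intrinsics: dict,
                   nocs_downscale: float = 1.0) -> np.ndarray:
    """Project Nx3 camera-frame points (X right, Y down, Z forward) to pixels.

    Pass nocs_downscale > 1 (assumed usage of 'nocs_image_downscale') to get
    coordinates in a NOCS map smaller than the color image; 1.0 gives
    full-resolution pixel coordinates.
    """
    X, Y, Z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    u = (intrinsics['fx'] * X / Z + intrinsics['cx']) / nocs_downscale
    v = (intrinsics['fy'] * Y / Z + intrinsics['cy']) / nocs_downscale
    return np.stack([u, v], axis=-1)

# Hypothetical intrinsics: a point on the optical axis projects to the
# principal point.
K = {'fx': 500.0, 'fy': 500.0, 'cx': 320.0, 'cy': 240.0}
pts = np.array([[0.0, 0.0, 2.0], [1.0, -0.5, 5.0]])
print(project_points(pts, K))  # [[320. 240.], [420. 190.]]
```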
OmniNOCS provides NOCS annotations for images from other datasets. Please refer to SETUP.md for instructions to download all data and set up OmniNOCS.
Links to download OmniNOCS will be available soon!
To visualize our data and illustrate its usage, we provide a Colab notebook that downloads a small mini split of the training set and visualizes the NOCS and bounding box annotations from OmniNOCS.
Copyright 2024 DeepMind Technologies Limited
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0
Our dataset adds annotations to images from the datasets listed below; these are made available under the following licenses:
| Dataset | License |
|---|---|
| KITTI | CC BY-NC-SA 3.0 DEED |
| ARKitScenes | CC BY-NC-SA 4.0 DEED |
| Virtual KITTI | CC BY-NC-SA 3.0 LEGAL CODE |
| nuScenes | CC BY-NC-SA 4.0 |
| Hypersim | CC BY-NC-SA 3.0 DEED |
| NOCS-Real275 | MIT |
| Waymo OD | Apache 2.0 (see Waymo terms) |
| Objectron | Microsoft C-UDA |
| Cityscapes 3D | CC BY 4.0 |
All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode
Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.