
autolay's Introduction

AutoLay: Benchmarking Monocular Layout Estimation

Kaustubh Mani, N. Sai Shankar, J. Krishna Murthy, and K. Madhava Krishna

For more details, and to download the full dataset, please visit https://autolay.github.io/

Abstract

In this paper, we tackle the problem of estimating the layout of a scene in bird's eye view from monocular imagery. Specifically, we target amodal layout estimation, i.e., we estimate semantic labels for parts of the scene that do not even project into the visible region of the image. While prior approaches to amodal layout estimation focused on coarse attributes of a scene (roads, sidewalks), we shift our attention to generating amodal estimates for fine-grained attributes such as lanes, crosswalks, and vehicles. To this end, we introduce AutoLay, a new dataset for amodal layout estimation in bird's eye view. AutoLay includes precise (amodal) layout annotations for 32 sequences from the KITTI dataset. In addition to fine-grained attributes such as lanes, sidewalks, and vehicles, we also provide detailed semantic annotations for 3D point clouds. To foster reproducibility and further research in this nascent area, we open-source implementations of several baselines and the current state of the art. Further, we propose VideoLayout, a real-time neural network architecture that leverages temporal information from monocular video to produce more accurate and consistent layouts. VideoLayout achieves state-of-the-art performance on AutoLay while running in real time (18 fps).

Dataset

We use 32 video sequences from the KITTI Raw dataset in AutoLay. We provide per-frame annotations in perspective view, in orthographic (bird's eye) view, and in 3D. Of the 32 annotated sequences, 17 sequences (containing 7414 images) are used for training; the other 15 sequences (comprising 4438 images) form the test set. This makes for nearly 12K annotated images, covering a distance of 9.5 km across a variety of urban scenarios (residential, urban, road). The semantic classes considered in this dataset are road, sidewalk, vehicle, crosswalk, and lane. Each lane segment is assigned a unique ID, and the lane class is further divided into ego-lane and other lane. We also include an other road class for road areas that do not fall into any of the above categories.
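For quick reference, the class set described above can be written out as a lookup table. A minimal sketch in Python; note that the numeric IDs here are placeholders, as the authoritative ID mapping ships with the dataset itself:

```python
# Semantic classes in AutoLay, as described above. The numeric IDs are
# placeholders for illustration; the authoritative mapping is defined by
# the dataset release itself.
AUTOLAY_CLASSES = {
    "road": 0,
    "sidewalk": 1,
    "vehicle": 2,
    "crosswalk": 3,
    "ego-lane": 4,    # the lane class splits into ego-lane and other lane
    "other lane": 5,
    "other road": 6,  # road areas not covered by any class above
}
```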

A sample of the dataset can be downloaded from here. To download the full dataset, please visit our webpage.

Benchmark

We provide a comprehensive benchmark of state-of-the-art layout estimation methods on AutoLay.

Results

Road Layout Estimation

Vehicle Layout Estimation

Lane Layout Estimation

autolay's People

Contributors

manila95, krrish94

autolay's Issues

dataset labels to image projection

Hello.
First, I'd like to thank you for sharing your work and the dataset.
However, the released data seems incomplete: there are instructions for obtaining the raw images that correspond to the provided labels (that part is clear), but information about the relation between the image plane and the annotation plane is missing.
The labels provided in the dataset are bird's eye view (BEV) labels, so a homography between the image plane and the label plane should exist.

Is there a homography matrix, a set of corresponding points (e.g., for use with cv2.findHomography), or anything else that defines the relation between the labels and the images?
Could you please provide the transformation from the label plane to the image plane?

There is a GIF in the README where the annotations are projected onto the images. Is there a way to reproduce this for the whole dataset?

Best regards.
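A minimal sketch of the kind of BEV-to-image mapping being asked about, assuming a planar ground and four hand-picked point correspondences; all points and file names below are placeholders, not values from the dataset:

```python
import cv2
import numpy as np

# Placeholder correspondences between the BEV label plane and the image
# plane. These four points are purely illustrative; real values would come
# from the camera calibration and the BEV grid resolution of the dataset.
bev_pts = np.float32([[0, 0], [255, 0], [255, 255], [0, 255]])
img_pts = np.float32([[560, 200], [680, 200], [1150, 370], [90, 370]])

H, _ = cv2.findHomography(bev_pts, img_pts)

bev_labels = cv2.imread("bev_label.png")  # hypothetical BEV label map (256x256)
image = cv2.imread("image.png")           # hypothetical matching camera frame

# Warp the BEV labels into the image plane and blend for visual inspection.
warped = cv2.warpPerspective(bev_labels, H, (image.shape[1], image.shape[0]))
overlay = cv2.addWeighted(image, 0.7, warped, 0.3, 0)
cv2.imwrite("overlay.png", overlay)
```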

eval.py not found

Hi, it seems "eval.py" was not uploaded. Running "python train.py" gives:

Traceback (most recent call last):
  File "train.py", line 19, in <module>
    from eval import evaluate_layout
ModuleNotFoundError: No module named 'eval'
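Until the file is uploaded, a stand-in module with the imported name lets training run. A minimal sketch follows; only the name evaluate_layout comes from the traceback, and the mean-IoU body is an assumption about what it might compute, not the authors' implementation:

```python
# eval.py -- hypothetical stand-in for the missing module. Only the name
# `evaluate_layout` comes from the traceback; the mean-IoU body below is
# an assumption, not the authors' code.
import torch

def evaluate_layout(pred, target, num_classes=2):
    """Mean IoU between predicted and ground-truth layout maps.

    pred: (B, C, H, W) logits or (B, H, W) class indices; target: (B, H, W).
    """
    if pred.dim() == 4:
        pred = pred.argmax(dim=1)  # logits -> per-pixel class indices
    ious = []
    for c in range(num_classes):
        inter = ((pred == c) & (target == c)).float().sum()
        union = ((pred == c) | (target == c)).float().sum()
        if union > 0:  # skip classes absent from both prediction and target
            ious.append((inter / union).item())
    return sum(ious) / max(len(ious), 1)
```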

Problems about the dataset

Hi, thanks a lot for sharing your interesting and great work. We downloaded the dataset from https://autolay.github.io/index.html and ran into some problems:

  1. We cannot find the labels for the 'Argoverse split'.
  2. We cannot find the labels for the 'KITTI split'. What is its relationship to the 'KITTI raw split', and how do we download the RGB images for the 'KITTI split'?
  3. We cannot find the sidewalk category in the 'KITTI raw split'.
  4. How many categories are there in the lane labels, and what are the category IDs of ego-lane and other-lane, respectively?
  5. We did not find 'eval.py' in the GitHub repository.

I'm looking forward to your reply.

Code and dataset availability

Nice work, and congratulations on your paper's acceptance at the IROS conference. Is there a timeline for when the code and the AutoLay dataset will be released?
Thanks

release of the entire dataset

Hi, thanks a lot for giving an explicit definition of amodal layout estimation for autonomous driving. It's an interesting and practical task, and I'd like to learn more about it. When will you release the entire dataset? That would be very helpful.

how many decoders are needed

If I understand correctly, the layout for each class needs its own decoder, right? So each class becomes a binary classification problem?
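That reading matches a common design in layout estimation. Below is a minimal PyTorch sketch of one decoder head per class, each emitting a single-channel binary occupancy map; the layer sizes are illustrative and not the architecture from the paper:

```python
import torch
import torch.nn as nn

class PerClassDecoders(nn.Module):
    """One decoder head per semantic class. Each head predicts a single
    logit channel, so each class becomes a binary (occupied vs. free)
    segmentation problem, typically trained with BCEWithLogitsLoss.
    Layer sizes are illustrative, not the paper's architecture."""

    def __init__(self, in_channels=128, num_classes=3):
        super().__init__()
        self.heads = nn.ModuleList([
            nn.Sequential(
                nn.ConvTranspose2d(in_channels, 64, 4, stride=2, padding=1),
                nn.ReLU(inplace=True),
                nn.Conv2d(64, 1, 3, padding=1),  # single binary-logit channel
            )
            for _ in range(num_classes)
        ])

    def forward(self, feats):
        # Returns one upsampled (B, 1, 2H, 2W) logit map per class.
        return [head(feats) for head in self.heads]
```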
