SDD-Utils

A couple of utilities for working with the Stanford Drone Dataset.

1. Create Pascal VOC style annotations

This utility converts the videos and annotations in the Stanford Drone Dataset into Pascal VOC style images and annotations.

The utility performs three tasks in sequence:

  • Split each video into frames
  • Create a Pascal VOC style annotation for each frame in the video
  • Create a train-validation-test split for the frames

This comes in handy when testing existing object detection algorithms on the Stanford Drone Dataset. Most object detection implementations ship code to train on Pascal VOC, so a network can be trained on SDD with minimal code changes.
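The first step, splitting a video into frames, can be sketched by driving ffmpeg from Python. This is a minimal sketch, not the script's actual implementation: the helper name, the output layout, and the ffmpeg flags are all assumptions.

```python
import subprocess

def frame_extraction_command(video_path, output_dir):
    # Hypothetical helper: builds the ffmpeg call that dumps every frame
    # of the video as numbered JPEGs, e.g. output_dir/1.jpg, 2.jpg, ...
    return [
        "ffmpeg", "-i", video_path,   # input video
        "-qscale:v", "2",             # high JPEG quality
        output_dir + "/%d.jpg",       # numbered output frames
    ]

cmd = frame_extraction_command("videos/bookstore/video0/video.mov",
                               "sdd/JPEGImages")
print(" ".join(cmd))
# Run it (requires ffmpeg on PATH): subprocess.check_call(cmd)
```

Dumping frames once up front lets the annotation and split steps work on plain image files.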

Prerequisites

  • ffmpeg library
  • python 2.7

Usage

python annotate.py 

The script assumes that the Stanford Drone Dataset is unzipped in the script's current working directory, inside a folder named StanfordDroneDataset. A soft link also works. The folder name is configurable in the script.

Declarative definition

The videos to be processed and the number of images in train-validation-test split can be defined declaratively in the script.

The videos_to_be_processed dictionary decides which videos are processed and what each contributes to the train-validation-test set.

Key points:

  • Keys in this dictionary should match the 'scenes' in the Stanford Drone Dataset.
  • The value for each key is also a dictionary. The number of items in this child dictionary can be at most the number of videos in that 'scene'.
  • Each item in the child dictionary has the form {video_number: (train_fraction, validation_fraction, test_fraction)}

Example:

videos_to_be_processed = {'bookstore': {0: (.5, .2, .3), 1: (.2, .1, .1)},
                              'coupa': {0: (.5, .2, .3)}}

Here, the first two videos of the bookstore scene will be split into frames, with train-validation-test splits of (.5, .2, .3) and (.2, .1, .1) respectively. The first video of the coupa scene is then processed in the same way.
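A fraction tuple like (.5, .2, .3) can be turned into an actual split by shuffling the frame indices and slicing. This is a sketch only; annotate.py may sample the sets differently.

```python
import random

def split_frames(frame_ids, fractions, seed=None):
    """Partition frame ids into train/val/test sets according to a
    (train, val, test) fraction tuple. Illustrative helper, not the
    script's actual function."""
    rng = random.Random(seed)
    ids = list(frame_ids)
    rng.shuffle(ids)                      # resample on every run
    n = len(ids)
    n_train = int(n * fractions[0])
    n_val = int(n * fractions[1])
    n_test = int(n * fractions[2])
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:n_train + n_val + n_test]
    return train, val, test

train, val, test = split_frames(range(100), (.5, .2, .3), seed=0)
print(len(train), len(val), len(test))  # 50 20 30
```

Because the fractions need not sum to 1 (see the (.2, .1, .1) example above), slicing by count rather than splitting the whole list lets part of a video go unused.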

Output

The Pascal VOC style annotations are created in the location specified by the destination_folder_name variable in the script. By default, a folder named sdd is created inside the dataset folder.

Output Folder Structure:

cs17mtech01001@ubuntu:/media/sdj/Open_Datasets/StanfordDroneDataset/sdd$ ls
Annotations  ImageSets  JPEGImages  pickle_store
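The Annotations folder holds one Pascal VOC style XML file per frame. Building such a file can be sketched with the standard library; the element names follow the common VOC layout, while the image size, box values, and class label below are made up for illustration.

```python
import xml.etree.ElementTree as ET

def voc_annotation(filename, width, height, boxes):
    """Build a minimal Pascal VOC annotation document.
    `boxes` is a list of (label, xmin, ymin, xmax, ymax) tuples.
    Illustrative helper, not annotate.py's actual code."""
    root = ET.Element("annotation")
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    ET.SubElement(size, "width").text = str(width)
    ET.SubElement(size, "height").text = str(height)
    ET.SubElement(size, "depth").text = "3"
    for label, xmin, ymin, xmax, ymax in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = label
        bb = ET.SubElement(obj, "bndbox")
        for tag, val in zip(("xmin", "ymin", "xmax", "ymax"),
                            (xmin, ymin, xmax, ymax)):
            ET.SubElement(bb, tag).text = str(val)
    return ET.tostring(root)

doc = voc_annotation("1.jpg", 1400, 1904,
                     [("Pedestrian", 10, 20, 50, 90)])
print(doc)
```

One such file per frame in Annotations, paired with the frame in JPEGImages and the split lists in ImageSets, matches what VOC-trained detectors expect.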

FAQ

  1. What will happen when I run the script multiple times?

    Ans: If the videos specified in the videos_to_be_processed dictionary have already been split into frames, they are not split again. The train-validation-test split, however, is resampled on each run.
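The skip-if-done behaviour can be approximated by checking whether a video's frame directory already has content. This is a hypothetical helper, not annotate.py's actual function.

```python
import os
import tempfile

def needs_extraction(frames_dir):
    """Return True when the frames directory is missing or empty,
    i.e. the video still has to be split into frames."""
    return not (os.path.isdir(frames_dir) and os.listdir(frames_dir))

d = tempfile.mkdtemp()
print(needs_extraction(d))                    # True: empty directory
open(os.path.join(d, "1.jpg"), "w").close()
print(needs_extraction(d))                    # False: frames already present
```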

  2. Suppose I need training and validation data from the first two videos of the 'bookstore' scene and test data from the third video of the 'deathCircle' scene; how would the videos_to_be_processed dictionary look?

videos_to_be_processed = {'bookstore': {0: (.5, .5, 0), 1: (.5, .5, 0)},
                        'deathCircle': {2: (0, 0, 1)}}

  3. How many scenes are there in Stanford Drone Dataset? How many videos are there in each scene?
SDD contains the following 'scenes' and corresponding videos:
    'bookstore'   scene contains videos: (0, 1, 2, 3, 4, 5, 6)
    'coupa'       scene contains videos: (0, 1, 2, 3)
    'deathCircle' scene contains videos: (0, 1, 2, 3, 4)
    'gates'       scene contains videos: (0, 1, 2, 3, 4, 5, 6, 7, 8)
    'hyang'       scene contains videos: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14)
    'little'      scene contains videos: (0, 1, 2, 3)
    'nexus'       scene contains videos: (0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11)
    'quad'        scene contains videos: (0, 1, 2, 3)
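The inventory above can be encoded as a lookup table to sanity-check a videos_to_be_processed definition before running the script. This is a hypothetical helper; annotate.py is not shown to perform such a check.

```python
# Scene -> number of videos, transcribed from the list above.
SDD_VIDEO_COUNTS = {
    'bookstore': 7, 'coupa': 4, 'deathCircle': 5, 'gates': 9,
    'hyang': 15, 'little': 4, 'nexus': 12, 'quad': 4,
}

def validate(videos_to_be_processed):
    """Raise AssertionError on unknown scene names or out-of-range
    video indices (illustrative helper only)."""
    for scene, videos in videos_to_be_processed.items():
        assert scene in SDD_VIDEO_COUNTS, "unknown scene: %s" % scene
        for v in videos:
            assert 0 <= v < SDD_VIDEO_COUNTS[scene], \
                "scene %s has no video %d" % (scene, v)

validate({'bookstore': {0: (.5, .2, .3)}, 'coupa': {0: (.5, .2, .3)}})
```

Catching a typo such as 'bookstore' misspelled, or a video index past the end of a scene, is cheaper before frame extraction starts than after.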

Citation

A. Robicquet, A. Sadeghian, A. Alahi, S. Savarese, "Learning Social Etiquette: Human Trajectory Prediction in Crowded Scenes," European Conference on Computer Vision (ECCV), 2016.

The dataset is available here.
