Git Product home page Git Product logo

actions-for-actions's Introduction

What Actions are Needed for Understanding Human Actions in Videos?

Diagnostic tools and additional visualizations from "What Actions are Needed for Understanding Human Actions in Videos?" ICCV 2017

Contributor: Gunnar A. Sigurdsson

Please see the tool/, oracles/, and error_analysis/ subdirectories for the visualization tools and attributes for Charades! Let me know if you run into any problems.

Diagnostic Tool for Charades

To develop better algorithms for understanding activities, it will help to understand how and why they work. Looking solely at a single performance number obscures away from various details about the algorithms. To facilitate this process we are releasing a set of open-source Python scripts to generate detailed visualizations of one or more algorithms. The tool operates on the same file format required for the official Charades evaluation scripts, and therefore only needs a single submission file from each algorithm for analysis and comparison. The tool contains annotations for many attributes, and any subset of those can be selected for visualization. This allows to quickly generate diagnostic plots of one or more algorithms that fit into 1/4 of a page. These plots follow the same structure as the experiments section of the paper. A list of attributes and abbreviations is presented at the end of this section. We note that this analysis could be extended to any dataset, by following our methodology outlined in the paper to compute and collect the same attributes. To highlight the usefulness of the tool we consider use cases below.

Single Algorithm Diagnostics for Classification and Localization

First we visualize how individual baselines would look under the diagnostic. Output from the diagnostics tool is presented for multiple baselines on Charades. The same can be done for any algorithm that can do both classification and localization.

Diagnostic Output on Charades includes classification and localization. 0,1,2,3 denotes the degree of each continuous attribute and N,Y denotes if a binary attributes is present (Yes) or absent (No). Dashed lines indicate the overall performance of each algorithm. Blue line is classification and Brown line is localization performance.

If you want your algorith to appear here, you are welcome to send me an email with links to your classification and/or localization submission files on the Charades test set!

Two-Stream (Simonyan & Zisserman 2014)

Two-Stream

IDT (Wang & Schmid 2013)

Two-Stream

CNN+LSTM (Ng et al. 2015)

Two-Stream

ActionVLAD (Girdhar et al. 2017)

Two-Stream

Asynchromous Temporal Fields (Sigurdsson. 2017) (Link)

Two-Stream

Multiple Algorithm Comparison

To understand the difference between multiple algorithms, or different versions of the same algorithm, the tool can also be used to visualize many different algorithms. A condensed summary of the plots in the paper are presented in below for classification on Charades.

Algorithm comparison

List of Attributes and Their Abbreviations

Below we list the abbreviations used in the plots and their meanings. The continuous attributes are grouped into varying degrees of the attribute where 1 is the attribute is low and 4 the attribute is high.

  • #Samples: The number of occurrences of an action across the training videos.
  • #Frames: The number of frames where the activity is taking place.
  • Novel: Smallest edit distance between the sequence of activities in a test video and any video in the training set, and normalize by subtracting the average distance for all videos with that number of activities.
  • NewActor: The subject/environment in the video was not seen in the training set.
  • Extent: The average temporal extent of activities in each category in the training set.
  • Extent-v: The average temporal extent of activities in each video.
  • Seq: Is the category commonly part of a sequence or does it happen in isolation, such as "holding a cup" compared to "running".
  • Short: Is the activity a brief activity such as "putting down cup".
  • Passive: Does the activity happen in the background such as "sitting on a couch".
  • AvgMotion: Average motion in each instance of the category on average.
  • AvgMotion-v: Average motion in a given video by looking at the amount of optical flow in the video.
  • PersonSize: The average size of a person in a video as measured by the highest-scoring per-frame Faster-RCNN bounding box detection.
  • People: Whether there are more than one person in the video.
  • PoseVar: The average Procrustes distance between any two poses in the category.
  • PoseRatio: The ratio between the average pose variability within an instance and the pose variability for the category in general.
  • Obj: If the category involves an object such as “drinking from a cup” compared to "running"
  • #Obj: How many categories share an object with the given category
  • #Verbs: How many categories share an verb with the given category
  • Interact: If the activity involves physically interacting with the environment.
  • Tool: The activity involves the use of a tool for interacting with the environment.
  • #Actions: The number of activities that occur in a video.

actions-for-actions's People

Contributors

achalddave avatar gsig avatar

Watchers

 avatar  avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.