
predict-car-speed's Introduction

Learn and Predict Vehicle Speed From A Video

[Figure: highway]

Problem Definition

Given: a 17-minute training video (20 fps) with one label per frame giving the vehicle's speed.
Goal: for a 9-minute test video (20 fps), infer the vehicle's speed for each frame.

Dataset Exploration

data/train.mp4 is a video of driving containing 20400 frames. Video is shot at 20 fps.
data/train.txt contains the speed of the car at each frame, one speed on each line.

data/test.mp4 is a different driving video containing 10798 frames. Video is shot at 20 fps.
Your deliverable is test.txt
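
The frame counts and durations quoted above are consistent with 20 fps. A minimal sketch of that check, plus a parser for the one-speed-per-line label format; the helper names `duration_mmss` and `parse_speeds` are hypothetical and not taken from the repository:

```python
FPS = 20  # stated in the dataset description


def duration_mmss(num_frames, fps=FPS):
    """Return the clip length as M:SS, rounded to the nearest second."""
    total_seconds = round(num_frames / fps)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes}:{seconds:02d}"


def parse_speeds(text):
    """Parse label text with one speed per line, skipping blank lines."""
    return [float(line) for line in text.splitlines() if line.strip()]


print(duration_mmss(20400))  # train.mp4 -> 17:00
print(duration_mmss(10798))  # test.mp4  -> 9:00
```

In practice the text passed to `parse_speeds` would come from data/train.txt, and the resulting list should have one entry per frame of data/train.mp4.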

Train Dataset

  • 0:00 - 12:30: highway (12 min 30 sec)
  • 12:31 - 17:00: street (4 min 30 sec)

[Figure: plot_train_speed]
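
The segment boundaries can be turned into frame indices, for example to evaluate highway and street driving separately. This is only a sketch: `timestamp_to_frame` is a hypothetical helper, and placing the boundary exactly at 12:30 is an assumption, since the source lists the street segment as starting at 12:31.

```python
FPS = 20  # stated in the dataset description


def timestamp_to_frame(mmss, fps=FPS):
    """Convert an 'M:SS' timestamp to the index of the first frame at that time."""
    minutes, seconds = mmss.split(":")
    return (int(minutes) * 60 + int(seconds)) * fps


# ASSUMPTION: the highway/street boundary sits exactly at 12:30.
highway_end = timestamp_to_frame("12:30")
total_frames = timestamp_to_frame("17:00")

highway_frames = range(0, highway_end)            # 12 min 30 sec of highway
street_frames = range(highway_end, total_frames)  # 4 min 30 sec of street

print(len(highway_frames), len(street_frames))
```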

AlexLSTM (2D CNN + LSTM)

[Figure: clstm]

[!] Model Summary:
AlexLSTM (
  (conv): Sequential (
    (0): Sequential (
      (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
      (1): ReLU (inplace)
      (2): MaxPool2d (size=(3, 3), stride=(2, 2), dilation=(1, 1))
      (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
      (4): ReLU (inplace)
      (5): MaxPool2d (size=(3, 3), stride=(2, 2), dilation=(1, 1))
      (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (7): ReLU (inplace)
      (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (9): ReLU (inplace)
      (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (11): ReLU (inplace)
      (12): MaxPool2d (size=(3, 3), stride=(2, 2), dilation=(1, 1))
    )
  )
  (lstm): LSTM(12288, 1600, num_layers=3, dropout=0.3)
  (fc): Sequential (
    (0): Linear (1600 -> 512)
    (1): ReLU ()
    (2): Dropout (p = 0.2)
    (3): Linear (512 -> 64)
    (4): ReLU ()
    (5): Dropout (p = 0.2)
    (6): Linear (64 -> 1)
  )
)
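
To get a feel for where the model's capacity sits, the parameter counts can be worked out from the summary alone. The sketch below assumes the standard PyTorch parameter layout (a bias on every conv and linear layer, and four tensors per LSTM layer: `weight_ih`, `weight_hh`, `bias_ih`, `bias_hh`); the helper names are hypothetical.

```python
def conv2d_params(c_in, c_out, kernel):
    """Weights plus one bias per output channel for a square kernel."""
    return c_in * c_out * kernel * kernel + c_out


def linear_params(n_in, n_out):
    """Weights plus one bias per output unit."""
    return n_in * n_out + n_out


def lstm_params(input_size, hidden_size, num_layers):
    """Assumes the PyTorch nn.LSTM layout: 4 gates, two bias vectors per layer."""
    total = 0
    for layer in range(num_layers):
        n_in = input_size if layer == 0 else hidden_size
        total += 4 * hidden_size * (n_in + hidden_size)  # weight_ih + weight_hh
        total += 2 * 4 * hidden_size                     # bias_ih + bias_hh
    return total


# Layer sizes copied from the model summary above.
conv_total = sum(conv2d_params(*cfg) for cfg in [
    (3, 64, 11), (64, 192, 5), (192, 384, 3), (384, 256, 3), (256, 256, 3),
])
lstm_total = lstm_params(12288, 1600, 3)
fc_total = sum(linear_params(*cfg) for cfg in [(1600, 512), (512, 64), (64, 1)])

print(conv_total, lstm_total, fc_total, conv_total + lstm_total + fc_total)
```

Under these assumptions the LSTM accounts for roughly 130 million of about 133 million parameters, mostly because of the 12288-wide input to its first layer.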

The Dimension Between Each Layer

CNN Layer:
-> (batch_size, 20, hh, ww, 3) 
-> AlexNet -> (batch_size, 20, hh', ww', filter_size) 
-> flatten -> (batch_size, 20, 12288)
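
The flattened size of 12288 can be checked by walking the conv and pooling layers from the model summary. The source does not state the input resolution, so the 224 x 288 frame size below is an assumption chosen only because it reproduces 12288; other sizes would too. Pooling padding of 0 is also assumed, since the summary lists none.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output length along one axis for a conv/pool layer (floor mode, dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


# (kernel, stride, padding) for each spatial layer; ReLU layers keep the shape.
LAYERS = [
    (11, 4, 2),  # Conv2d(3, 64)
    (3, 2, 0),   # MaxPool2d (padding 0 assumed)
    (5, 1, 2),   # Conv2d(64, 192)
    (3, 2, 0),   # MaxPool2d
    (3, 1, 1),   # Conv2d(192, 384)
    (3, 1, 1),   # Conv2d(384, 256)
    (3, 1, 1),   # Conv2d(256, 256)
    (3, 2, 0),   # MaxPool2d
]


def feature_map(hh, ww):
    """Return (hh', ww') after all conv and pooling layers."""
    for kernel, stride, padding in LAYERS:
        hh = out_size(hh, kernel, stride, padding)
        ww = out_size(ww, kernel, stride, padding)
    return hh, ww


hh, ww = 224, 288  # ASSUMPTION: not given in the source
h_out, w_out = feature_map(hh, ww)
flat = 256 * h_out * w_out  # 256 channels after the last conv layer
print(h_out, w_out, flat)
```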

LSTM Layer: 
# 20 images per sequence
# take the last 19 outputs (LSTM[1:]) from the top LSTM layer,
# because the 1st image has no preceding frame and so no context for predicting its speed
-> (batch_size, 20, 12288) 
-> LSTM -> (batch_size, 19, 1600)  

Fully Connected Layer:
-> (batch_size, 19, 1600) 
-> fc_layer_1 -> (batch_size, 19, 512) 
-> fc_layer_2 -> (batch_size, 19, 64) 
-> fc_layer_3 -> (batch_size, 19, 1) 
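
Because each 20-frame sequence yields 19 predictions, the training labels have to be aligned the same way. The sketch below shows one possible scheme; it is not taken from the repository. The helper names are hypothetical, and the stride of 19 (windows overlapping by one frame, so that every frame except frame 0 can receive a prediction) is an assumption.

```python
SEQ_LEN = 20  # frames per sequence, from the shapes above


def make_windows(num_frames, seq_len=SEQ_LEN, stride=None):
    """Return (start, end) frame-index pairs, end exclusive.

    ASSUMPTION: the default stride is seq_len - 1, so consecutive windows
    share one frame. Trailing frames that do not fill a whole window are
    not covered by this sketch.
    """
    if stride is None:
        stride = seq_len - 1
    return [(s, s + seq_len) for s in range(0, num_frames - seq_len + 1, stride)]


def window_targets(speeds, start, end):
    """Labels for one window: drop the first frame, which has no context."""
    return speeds[start + 1:end]


windows = make_windows(20400)     # frame count of data/train.mp4
speeds = list(range(20400))       # placeholder labels for illustration only
targets = window_targets(speeds, *windows[0])
print(len(windows), windows[:2], len(targets))
```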

Result

https://github.com/kingxueyuf/predict-car-speed/blob/master/test.txt
