

Predicting 3D Bounding Boxes for Non-temporal Monocular Images

3D Bounding Box Prediction From Monocular Image

Utilising a two-stage, multi-modal loss model together with rigid body transformations to regress 3D bounding boxes, orientations and coordinates from monocular images.

File Structure

.
├── Kitti                           # Kitti Data Folder
        ├── camera_cal              # Store global calibration matrix
        ├── results                 # Training and Validation results
        ├── training                # Location of training data 
                ├── image_2         # Downloaded from KITTI's website 
                ├── calib           # Downloaded from KITTI's website
                └── label_2         # Downloaded from KITTI's website
        └── validation              # Location for validation/inference images
                ├── image_2         # Images you want to perform inference/validation on
                ├── calib           # Corresponding calibration files
                └── label_2         # Corresponding labels
├── lib                             # python functions and scripts
├── weights                         # store trained weights, yolo weights
├── yolo                            # yolo network scripts
├── Eval.py                         # script to evaluate quantitatively and qualitatively
├── Train.py                        # training script 
├── Run.py                          # run the model and see the visual examples
└── Run_with_2D.py                  # run the 2 stage model with yolonet and see the visual examples 

python Run_with_2D.py

This script uses a YOLO network to detect pedestrians and cars in the images. Each 2D detection is cropped out and fed into the model, which predicts a 3D bounding box for it.

The script plots the resulting 3D bounding boxes onto the image and displays it using cv2.imshow. Press the space bar to move on to the next image.
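To make the plotting step more concrete, here is a minimal sketch of how the eight corners of a 3D box could be built and projected into the image. It assumes the commonly used KITTI camera convention (x right, y down, z forward, yaw about the y axis, box location at the bottom centre) and a 3x4 projection matrix. The function names box_corners_3d and project_point and the matrix values are illustrative, not taken from this repository.

```python
import math


def box_corners_3d(dims, location, yaw):
    """Return the 8 corners of a 3D box in camera coordinates.

    dims is (height, width, length), location is the bottom-centre (x, y, z),
    and yaw is the rotation about the camera y axis in radians.
    """
    h, w, l = dims
    cx, cy, cz = location
    cos_y, sin_y = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (l / 2, -l / 2):
        for dy in (0.0, -h):  # y points down, so the top face sits at -h
            for dz in (w / 2, -w / 2):
                # Rotate the offset about the y axis, then translate.
                x = cos_y * dx + sin_y * dz + cx
                z = -sin_y * dx + cos_y * dz + cz
                corners.append((x, cy + dy, z))
    return corners


def project_point(P, point):
    """Project a camera-frame 3D point to pixel coordinates with a 3x4 matrix P."""
    homogeneous = (point[0], point[1], point[2], 1.0)
    u, v, s = (sum(p * c for p, c in zip(row, homogeneous)) for row in P)
    return u / s, v / s


# Made-up projection matrix for illustration; real values come from the calibration files.
P = [[700.0, 0.0, 600.0, 0.0],
     [0.0, 700.0, 180.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
corners = box_corners_3d(dims=(1.5, 1.6, 4.0), location=(0.0, 1.5, 10.0), yaw=0.0)
pixels = [project_point(P, corner) for corner in corners]
```

The resulting pixel coordinates can then be joined with line segments to draw the box on the image.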

Images to be evaluated should be placed in Kitti/validation/image_2

FLAGS:

--val-img-path Path to the images you wish to evaluate. Default is Kitti/validation/image_2/

--calb-path Path to the folder containing the camera calibration obtained from KITTI. Default is Kitti/camera_cal/

--show-2D Shows yolonet's 2D bounding box detections in a separate image alongside the regressed 3D boxes
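As a rough illustration of how these flags could be declared, the sketch below uses argparse with the defaults listed above. Treating --show-2D as a boolean switch is an assumption, and build_parser is a hypothetical helper name; the actual script may differ.

```python
import argparse


def build_parser():
    """Declare the three flags described above (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Run 2D detection followed by 3D bounding box regression")
    parser.add_argument("--val-img-path", default="Kitti/validation/image_2/",
                        help="Folder of images to evaluate")
    parser.add_argument("--calb-path", default="Kitti/camera_cal/",
                        help="Folder containing the KITTI camera calibration")
    # Assumed to be an on/off switch with no value.
    parser.add_argument("--show-2D", action="store_true",
                        help="Also show the 2D detections in a separate image")
    return parser


# argparse turns dashes into underscores, so --show-2D becomes args.show_2D.
args = build_parser().parse_args(["--show-2D"])
print(args.val_img_path, args.calb_path, args.show_2D)
```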

python Run.py

This script uses the 2D bounding boxes from the ground truth labels to crop out the regions of interest and feeds them into the model, which predicts a 3D bounding box for each 2D box.
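For reference, a KITTI object label line holds 15 whitespace-separated fields: type, truncated, occluded, alpha, the 2D box (left, top, right, bottom), the dimensions (height, width, length), the location (x, y, z) and rotation_y. The sketch below parses one line and derives integer crop bounds from the 2D box. The names KittiObject, parse_label_line and crop_bounds are hypothetical, and the example line uses made-up values.

```python
import math
from typing import NamedTuple, Tuple


class KittiObject(NamedTuple):
    category: str
    box_2d: Tuple[float, ...]    # left, top, right, bottom in pixels
    dims: Tuple[float, ...]      # height, width, length in metres
    location: Tuple[float, ...]  # x, y, z in camera coordinates
    rotation_y: float


def parse_label_line(line):
    """Parse one line of a KITTI label file into a KittiObject."""
    fields = line.split()
    values = [float(v) for v in fields[1:15]]
    return KittiObject(
        category=fields[0],
        box_2d=tuple(values[3:7]),
        dims=tuple(values[7:10]),
        location=tuple(values[10:13]),
        rotation_y=values[13],
    )


def crop_bounds(box_2d, img_width, img_height):
    """Round the 2D box outwards to whole pixels and clamp it to the image."""
    left, top, right, bottom = box_2d
    x0 = max(0, math.floor(left))
    y0 = max(0, math.floor(top))
    x1 = min(img_width, math.ceil(right))
    y1 = min(img_height, math.ceil(bottom))
    return x0, y0, x1, y1


# Made-up label line for illustration only.
example = "Car 0.00 0 -1.50 100.50 150.25 300.75 250.00 1.50 1.60 4.00 -2.00 1.70 15.00 -1.60"
obj = parse_label_line(example)
x0, y0, x1, y1 = crop_bounds(obj.box_2d, img_width=1242, img_height=375)
# With the image loaded as a NumPy array, the crop would be image[y0:y1, x0:x1].
```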

The script plots the predicted 3D bounding boxes onto the image and displays them alongside the ground truth image and its 3D bounding boxes. This helps with qualitative assessment.

Images to be evaluated should be placed in Kitti/validation/image_2

FLAGS:

--hide-imgs Hides the display of the ground truth and bounding boxes

python Train.py --exp-no 5

This script trains the model using cropped images of cars and pedestrians obtained from the labels. From each 2D crop the model produces three outputs for the 3D bounding box: its orientation, its dimensions and a confidence.
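One common way to combine an orientation output with a confidence output is a bin-based scheme: the angle range is split into bins, the confidence selects a bin, and a (cos, sin) pair gives the offset from that bin's centre. Whether this repository uses exactly this layout is an assumption; the sketch below only illustrates the idea, with evenly spaced bin centres and a hypothetical function name decode_orientation.

```python
import math


def decode_orientation(bin_confidences, bin_residuals):
    """Pick the most confident bin and add its (cos, sin) residual to the bin centre.

    bin_confidences: one score per bin.
    bin_residuals: one (cos, sin) pair per bin.
    Returns an angle in radians wrapped into [-pi, pi).
    """
    num_bins = len(bin_confidences)
    best = max(range(num_bins), key=lambda i: bin_confidences[i])
    # Assumed layout: bin centres evenly spaced over [0, 2*pi).
    centre = best * 2 * math.pi / num_bins
    cos_r, sin_r = bin_residuals[best]
    angle = centre + math.atan2(sin_r, cos_r)
    return (angle + math.pi) % (2 * math.pi) - math.pi


# Two bins; the first is more confident and carries a 0.3 rad residual.
angle = decode_orientation(
    [0.8, 0.2],
    [(math.cos(0.3), math.sin(0.3)), (1.0, 0.0)],
)
```

Using atan2 on the (cos, sin) pair means the residual does not have to be normalised to unit length before decoding.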

The script requires you to specify an experiment number, so that it can create all the relevant folders to track metrics and weights in an ordered fashion.

For each epoch, the weights are saved to the weights folder, with the experiment number as the subfolder.

For each epoch, the training metrics (orientation loss, dimension loss, theta loss, overall iteration loss and overall epoch loss) are saved to help with debugging and to monitor convergence.

ARGUMENTS:

--epochs type = int, default = 2. Number of training epochs

--exp-no type = int. Experiment number, used to save all the metrics and weights related to this experiment

E.g. python Train.py --exp-no 8
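The sketch below shows one way these two arguments and the per-experiment weights subfolder could fit together. The helper names parse_train_args and weights_path, the exp_ folder prefix and the epoch file name are all hypothetical; only the argument names, types and the default of 2 epochs come from the description above.

```python
import argparse
from pathlib import Path


def parse_train_args(argv=None):
    """Parse the training arguments; --exp-no is mandatory, as described above."""
    parser = argparse.ArgumentParser(description="Train the 3D bounding box model")
    parser.add_argument("--epochs", type=int, default=2,
                        help="Number of training epochs")
    parser.add_argument("--exp-no", type=int, required=True,
                        help="Experiment number used to group metrics and weights")
    return parser.parse_args(argv)


def weights_path(root, exp_no, epoch):
    """Create the experiment subfolder if needed and return this epoch's file path."""
    exp_dir = Path(root) / f"exp_{exp_no}"   # hypothetical folder naming
    exp_dir.mkdir(parents=True, exist_ok=True)
    return exp_dir / f"epoch_{epoch}.pkl"    # hypothetical file naming
```

For example, parse_train_args(["--exp-no", "8"]) yields epochs equal to 2 and exp_no equal to 8, and calling weights_path("weights", 8, 0) would create weights/exp_8 if it does not exist yet.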

To download the YOLO weights

Run the download script from the weights folder:

cd weights/
./get_weights.sh

Contributors

sk-aravind
