Introduction

This is an unofficial implementation of VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection in TensorFlow. A large part of this project is based on the work here. Thanks to @jeasinema. This work is a modified version with bug fixes and better experimental settings, aimed at matching the results reported in the paper (still ongoing).

Dependencies

  • python3.5+
  • TensorFlow (tested on 1.4.1)
  • opencv
  • shapely
  • numba
  • easydict
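
Before building, it can help to confirm that the dependencies above are importable. The following is a minimal sketch, not part of this repository; the import names (for example `cv2` for opencv) are assumptions, since the list above gives package names rather than module names.

```python
import importlib.util
import sys

# Assumed import names for the packages listed above; a package name and its
# module name can differ (opencv is normally imported as cv2).
REQUIRED_MODULES = ["tensorflow", "cv2", "shapely", "numba", "easydict"]


def missing_modules(names):
    """Return the names from `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_is_supported(version_info=sys.version_info):
    """Return True when the interpreter is Python 3.5 or newer."""
    return tuple(version_info[:2]) >= (3, 5)


if __name__ == "__main__":
    if not python_is_supported():
        print("Python 3.5+ is required")
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules: " + ", ".join(missing))
    else:
        print("All listed dependencies were found")
```

The check only locates the modules; it does not verify versions such as the TensorFlow 1.4.1 release this project was tested on.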

Installation

  1. Clone this repository.
  2. Compile the Cython module:
$ python3 setup.py build_ext --inplace
  3. Compile the evaluation code:
$ cd kitti_eval
$ g++ -o evaluate_object_3d_offline evaluate_object_3d_offline.cpp
  4. Grant execute permission to the evaluation script (skip the cd if you are still in kitti_eval from the previous step):
$ cd kitti_eval
$ chmod +x launch_test.sh

Data Preparation

  1. Download the 3D KITTI detection dataset from here. Data to download include:

    • Velodyne point clouds (29 GB): input data to VoxelNet
    • Training labels of object data set (5 MB): input label to VoxelNet
    • Camera calibration matrices of object data set (16 MB): for visualization of predictions
    • Left color images of object data set (12 GB): for visualization of predictions
  2. In this project, we use cropped point cloud data for training and validation: points that fall outside the camera image are removed. Update the directories in data/crop.py and run data/crop.py to generate the cropped data. Note that the cropped point clouds overwrite the raw point cloud data.

  3. Split the training set into a training set and a validation set according to the protocol here. Then rearrange the folders into the following structure:

└── DATA_DIR
       ├── training   <-- training data
       |   ├── image_2
       |   ├── label_2
       |   └── velodyne
       └── validation  <-- evaluation data
           ├── image_2
           ├── label_2
           └── velodyne
  4. Update the dataset directory in config.py and kitti_eval/launch_test.sh.
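
The cropping in step 2 keeps only the lidar points that project inside the camera image. The sketch below illustrates that idea with the standard KITTI projection chain; it is not the code in data/crop.py. The function name `crop_to_image` and its arguments are hypothetical, and the calibration matrices are passed in explicitly rather than read from the dataset.

```python
import numpy as np


def crop_to_image(points, P2, R0_rect, Tr_velo_to_cam, width, height):
    """Keep only the lidar points that project inside the image.

    points:         (N, 4) array of x, y, z, reflectance in lidar coordinates
    P2:             (3, 4) camera projection matrix
    R0_rect:        (3, 3) rectifying rotation
    Tr_velo_to_cam: (3, 4) lidar-to-camera transform
    width, height:  image size in pixels
    """
    n = points.shape[0]
    # Homogeneous lidar coordinates, shape (N, 4).
    xyz1 = np.hstack([points[:, :3], np.ones((n, 1))])
    # Rectified camera coordinates, shape (3, N).
    cam = R0_rect @ (Tr_velo_to_cam @ xyz1.T)
    # Project onto the image plane, shape (3, N).
    uvw = P2 @ np.vstack([cam, np.ones((1, n))])
    depth = uvw[2]
    in_front = depth > 0
    # Replace non-positive depths with 1 so the division below is safe;
    # those points are rejected by the in_front mask anyway.
    safe_depth = np.where(in_front, depth, 1.0)
    u = uvw[0] / safe_depth
    v = uvw[1] / safe_depth
    keep = in_front & (u >= 0) & (u < width) & (v >= 0) & (v < height)
    return points[keep]
```

A KITTI Velodyne scan can be loaded with `np.fromfile(path, dtype=np.float32).reshape(-1, 4)` before being passed in as `points`. Image sizes vary slightly between KITTI frames, so `width` and `height` should come from the matching image rather than a fixed constant.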

Train

  1. Specify the GPUs to use in config.py
  2. Run train.py with the desired hyper-parameters to start training:
$ python3 train.py --alpha 1 --beta 10

Note that the hyper-parameter settings given in the paper did not produce high-quality results with this implementation, so a different setting is specified here.

Training on two Nvidia 1080 Ti GPUs takes around 3 days (160 epochs, as reported in the paper). During training, statistics are recorded in log/default and can be monitored with TensorBoard; models are saved in save_model/default. Intermediate validation results are dumped into predictions/XXX/data, where XXX is the epoch number, and the corresponding metrics are calculated and saved in predictions/XXX/log. If the --vis flag is set to True, visualizations of the intermediate results are dumped into predictions/XXX/vis.

  3. When training is done, running parse_log.py generates the learning curve:
$ python3 parse_log.py predictions
  4. A pre-trained model for the car class is provided in save_model/pre_trained_car.
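
For context on the --alpha and --beta flags passed to train.py: in the VoxelNet paper, α and β weight the classification loss on positive and negative anchors respectively. The sketch below assumes the flags map onto those two weights, which this README does not state explicitly. It covers only the classification part of the loss; the regression term is omitted, and `weighted_cls_loss` is a hypothetical helper, not a function from this repository.

```python
import math


def weighted_cls_loss(pos_probs, neg_probs, alpha=1.0, beta=10.0, eps=1e-6):
    """Weighted binary cross-entropy over positive and negative anchors.

    pos_probs: predicted object probabilities at positive anchors (target 1)
    neg_probs: predicted object probabilities at negative anchors (target 0)
    alpha, beta: weights of the positive and negative terms (assumed to
                 correspond to the --alpha and --beta flags)
    """
    # Mean cross-entropy over positive anchors; max(..., 1) guards against
    # an empty list.
    pos_term = sum(-math.log(p + eps) for p in pos_probs) / max(len(pos_probs), 1)
    # Mean cross-entropy over negative anchors.
    neg_term = sum(-math.log(1.0 - p + eps) for p in neg_probs) / max(len(neg_probs), 1)
    return alpha * pos_term + beta * neg_term
```

With `alpha=1` and `beta=10`, a confident false positive on a negative anchor costs ten times as much as an equally confident miss on a positive anchor, which pushes the network to suppress background detections.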

Evaluate

  1. Run test.py -n default to produce final predictions on the validation set once training is done. Setting the -n flag to pre_trained_car tests the pre-trained model instead (only a car model is provided for now).
$ python3 test.py -n default

Results will be dumped into predictions/data. To also dump visualizations, set the --vis flag to True; they will be saved into predictions/vis.

  2. Run the following command to measure the quantitative performance of the predictions:
$ ./kitti_eval/evaluate_object_3d_offline [DATA_DIR]/validation/label_2 ./predictions
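
To inspect the dumped predictions by hand, the result files can be parsed directly. The sketch below assumes the files under predictions/data follow the standard KITTI result format (16 whitespace-separated fields per line, ending with a score), which is what the KITTI evaluation code expects; the `Detection` tuple and both helper names are hypothetical.

```python
from collections import namedtuple

# Subset of the KITTI result fields that is usually of interest.
Detection = namedtuple(
    "Detection", "cls bbox dimensions location rotation_y score"
)


def parse_detection(line):
    """Parse one line of a KITTI-format result file.

    Field order: type, truncated, occluded, alpha, bbox (left, top, right,
    bottom), dimensions (height, width, length), location (x, y, z),
    rotation_y, score.
    """
    fields = line.split()
    if len(fields) != 16:
        raise ValueError("expected 16 fields, got %d" % len(fields))
    return Detection(
        cls=fields[0],
        bbox=tuple(float(v) for v in fields[4:8]),
        dimensions=tuple(float(v) for v in fields[8:11]),
        location=tuple(float(v) for v in fields[11:14]),
        rotation_y=float(fields[14]),
        score=float(fields[15]),
    )


def load_detections(path, min_score=0.0):
    """Read a result file and keep detections scoring at least min_score."""
    with open(path) as handle:
        detections = [parse_detection(line) for line in handle if line.strip()]
    return [d for d in detections if d.score >= min_score]
```

For example, `load_detections(path_to_result_file, min_score=0.5)` returns only the detections the model is reasonably confident about.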

Performances

The current implementation and training scheme are able to produce results in the tables below.

Bird's eye view detection performance: AP on KITTI validation set

Car          Easy    Moderate  Hard
Reported     89.60   84.81     78.57
Reproduced   85.41   83.16     77.10

3D detection performance: AP on KITTI validation set

Car          Easy    Moderate  Hard
Reported     81.97   65.46     62.85
Reproduced   53.43   48.78     48.06

TODO

  • improve the performances
  • reproduce results for Pedestrian and Cyclist
  • fix the deadlock problem in multi-thread processing in training
  • fix the infinite loop problem in test.py
  • replace averaged calibration matrices with correct ones

Contributors

  • qianguih
