kujason / avod
Code for 3D object detection for autonomous driving
License: MIT License
The KITTI test set does not provide labels. How did you get the AP numbers in your ablation experiments?
I was wondering if anyone has tried avod with other datasets. Also, if anyone can give instructions for setting up avod with a new dataset, that would be great. Thank you.
Does anyone know how to solve these errors when I compile the protos from the avod folder? Thanks.
avod/protos/kitti_utils.proto:24:5: Expected "required", "optional", or "repeated".
avod/protos/kitti_utils.proto:24:25: Missing field number.
avod/protos/kitti_dataset.proto: Import "avod/protos/kitti_utils.proto" was not found or had errors.
avod/protos/kitti_dataset.proto:39:14: "KittiUtilsConfig" is not defined.
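These errors are the classic symptom of compiling proto3 files with a protoc 2.x binary, which only understands proto2 field syntax. A hedged sketch (the `protoc_ok` helper is my own; it assumes `protoc --version` prints a "libprotoc X.Y.Z" string) for checking whether the installed compiler is new enough:

```shell
# protoc_ok reports whether a `protoc --version` string (e.g.
# "libprotoc 3.6.1") is new enough to parse proto3 files such as
# avod/protos/kitti_utils.proto. protoc 2.x fails with the
# 'Expected "required", "optional", or "repeated"' error above.
protoc_ok() {
  case "$1" in
    libprotoc\ 3.*|libprotoc\ [4-9].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example usage (assumes protoc is on the PATH):
#   protoc_ok "$(protoc --version)" || echo "install protobuf >= 3.0"
```

If the check fails, installing a protoc 3.x release and re-running avod/protos/run_protoc.sh is the usual fix.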
Hey, my graphics card only has 2 GB of VRAM. Is there any way I can change the batch size so that training works? I always get an error saying TensorFlow ran out of memory. I've changed some settings in the configs but can't seem to get it to work. Alternatively, could someone please upload a pretrained model? I just want to test a few things.
Thanks
Read disk (image, calib, ground, pc) time:
Min: 0.01494
Max: 0.03513
Mean: 0.02009
Median: 0.02044
Load sample time:
Min: 0.05957
Max: 0.18219
Mean: 0.08599
Median: 0.07644
Fill anchor time:
Min: 0.0688
Max: 0.16987
Mean: 0.0908
Median: 0.07938
The preprocessing time profiled above is much larger than the 0.02 s reported. I don't think my CPU is particularly weak; do you have any suggestions? For example, why is the fill anchor time so expensive?
Hello,
Thank you for releasing the code to your paper!
"Background anchors are determined by calculating the 2D IoU in BEV between the anchors
and the ground truth bounding boxes. For the car class, anchors with IoU less than 0.3 are considered background anchors, while ones with IoU greater than 0.5"
I am trying to figure out how you overcame the problem of calculating IoU for non-axis-aligned rectangles when determining negative and positive anchors. The calculation uses two box_list objects.
Could you please point me to where the box_list for the ground-truth labels is generated, or briefly explain what these box_lists contain?
Is it an IoU calculation between axis-aligned bounding boxes around the ground-truth box and the anchor prediction boxes?
Regards,
Johannes
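For reference, the BEV overlap used for anchor labelling can be sketched as a plain axis-aligned 2D IoU. This is only an illustrative sketch (the `iou_2d` name is my own), not AVOD's actual box_list code:

```python
def iou_2d(box_a, box_b):
    """Axis-aligned 2D IoU between [x1, y1, x2, y2] boxes, as a sketch
    of the BEV overlap used to label anchors as background (< 0.3) or
    positive (> 0.5)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For rotated ground-truth boxes, one common approximation is to take the axis-aligned bounding box of the rotated rectangle before computing this IoU.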
"car_detection_3D AP: 82.047119 67.536583 66.807381" is my best performance, at iteration 39000 (the top checkpoint ranked by the provided script). There is still a gap on the moderate difficulty. I left the config at the default "avod_cars_examples.config"; is there anything I am missing?
gen_mini_batches.py generates anchor_info for both the training and validation data using the ground-truth labels. However, how can we generate anchor_info for the test data, which has no ground-truth labels? It would be great if you could clarify that procedure.
Thanks!
Hi, thanks for sharing your code. I just finished the training and wonder how to do inference on our own data. I am looking into the evaluator.py, but I would be grateful if you could give me some hints. Thanks~
Width 4
Height 1
-5.445015e-02 -9.985028e-01 5.238001e-03 1.661760e+00
What do the four numbers stand for?
Thanks!
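For what it's worth, in the KITTI planes files those four numbers are usually interpreted as the ground-plane coefficients a, b, c, d of a*x + b*y + c*z + d = 0 in camera coordinates (here the normal is roughly (0, -1, 0), i.e. the road, and d ≈ 1.66 m matches the camera height). A small sketch of using them (the helper name is my own):

```python
import math

def height_above_ground(point_cam, plane):
    """Signed distance from a camera-frame point (x, y, z) to the
    ground plane a*x + b*y + c*z + d = 0 read from a planes/*.txt
    file; positive means above the road when the normal points up."""
    a, b, c, d = plane
    x, y, z = point_cam
    return (a * x + b * y + c * z + d) / math.sqrt(a * a + b * b + c * c)
```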
Hi,
Thanks for your great codes.
I ran your code with the pyramid_cars_with_aug_example configuration and finally got an AP of about (84.0, 74.0, 67.7) on the val split, which is comparable to the results in your paper.
But when I run the best model on the test split, I only get 56.4 AP on the moderate difficulty from the official test server.
Do you know what the problem might be?
Thanks very much.
Hi,
How much GPU memory is needed to train with your code?
During training, the number of anchors differs between images due to the distribution of the lidar points, so does the GPU memory usage change between batches?
Actually, when I train, I find the GPU memory cost is always about 4 GB, which makes me curious.
Thanks very much.
Can someone explain to me in detail the difference between the three fusion types (early, late, deep)?
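Roughly, as described in the paper: early fusion combines the two feature crops once before the fully connected stack, late fusion runs two independent towers and fuses only their outputs, and deep fusion fuses repeatedly between layers. A minimal numpy sketch of the three patterns (mean fusion with a toy `fc` layer; not the actual AVOD code):

```python
import numpy as np

def fc(x, w):
    """Stand-in for one fully connected layer with ReLU."""
    return np.maximum(x @ w, 0.0)

def early_fusion(bev, img, weights):
    x = 0.5 * (bev + img)                 # fuse once, up front
    for w in weights:
        x = fc(x, w)
    return x

def late_fusion(bev, img, w_bev, w_img):
    for wb, wi in zip(w_bev, w_img):      # two independent towers
        bev, img = fc(bev, wb), fc(img, wi)
    return 0.5 * (bev + img)              # fuse only at the end

def deep_fusion(bev, img, w_bev, w_img):
    for wb, wi in zip(w_bev, w_img):      # fuse before every layer
        x = 0.5 * (bev + img)
        bev, img = fc(x, wb), fc(x, wi)
    return 0.5 * (bev + img)
```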
Hi, I successfully trained avod to 120000 iterations, but when I ran the evaluator and inference scripts, they both stopped when processing sample 002908, and the error was:
libpng error: Read Error
Traceback (most recent call last):
File "avod/experiments/run_evaluation.py", line 130, in
tf.app.run()
File "/home/prp/anaconda2/envs/py35tf13/lib/python3.5/site-packages/tensorflow/python/platform/app.py", line 48, in run
_sys.exit(main(_sys.argv[:1] + flags_passthrough))
File "avod/experiments/run_evaluation.py", line 126, in main
evaluate(model_config, eval_config, dataset_config)
File "avod/experiments/run_evaluation.py", line 83, in evaluate
model_evaluator.repeated_checkpoint_run()
File "/home/prp/chrisli/myavod/avod/avod/core/evaluator.py", line 460, in repeated_checkpoint_run
self.run_checkpoint_once(checkpoint_to_restore)
File "/home/prp/chrisli/myavod/avod/avod/core/evaluator.py", line 199, in run_checkpoint_once
feed_dict = self.model.create_feed_dict()
File "/home/prp/chrisli/myavod/avod/avod/core/models/avod_model.py", line 655, in create_feed_dict
feed_dict = self._rpn_model.create_feed_dict()
File "/home/prp/chrisli/myavod/avod/avod/core/models/rpn_model.py", line 643, in create_feed_dict
shuffle=False)
File "/home/prp/chrisli/myavod/avod/avod/datasets/kitti/kitti_dataset.py", line 424, in next_batch
samples_in_batch.extend(self.load_samples(np.arange(start, end)))
File "/home/prp/chrisli/myavod/avod/avod/datasets/kitti/kitti_dataset.py", line 277, in load_samples
rgb_image = cv_bgr_image[..., :: -1]
TypeError: 'NoneType' object is not subscriptable
The system is Ubuntu 16.04 with Python 3.5 and TensorFlow 1.3.0.
Thanks for your help.
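A likely cause: cv2.imread returns None (instead of raising) when an image file is missing or corrupt, which matches the libpng read error on sample 002908; re-downloading that image usually helps. A guard sketch (the `load_rgb` wrapper is hypothetical, with the reader passed in to keep it self-contained):

```python
def load_rgb(path, read_bgr):
    """read_bgr stands in for cv2.imread, which signals failure by
    returning None rather than raising an exception."""
    bgr = read_bgr(path)
    if bgr is None:
        raise IOError("Could not decode image (corrupt or missing): %s" % path)
    return bgr[..., ::-1]  # BGR -> RGB
```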
Where can I find the avod-ssd version of the code?
thanks
During testing, how can we determine the best model to use?
For example, to reproduce the results on the KITTI leaderboard, how do you choose the model? Do you just use the last model after 120,000 iterations are finished?
Excuse me for interrupting. I'm trying to use your code. When I start training, it cannot find the "planes" files, although I have downloaded the full dataset from "http://www.cvlibs.net/datasets/kitti/eval_object.php?obj_benchmark=bev".
Could you please tell me where I can download the "planes" files? Thank you so much.
Hi
The project requires a planes directory as model input. I'm a little curious how it was generated; it doesn't seem to be described in the paper. Could you please suggest some related resources? Thanks!
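I don't know the authors' exact procedure, but a common way to produce such planes files is to fit a plane to road points in camera coordinates. A least-squares sketch under that assumption (the function name is my own; a robust version would use RANSAC to reject non-road points):

```python
import numpy as np

def fit_ground_plane(points_cam):
    """Least-squares fit of y = a*x + c*z + d to road points in camera
    coordinates, returned as KITTI-style [a, b, c, d] coefficients of
    a*x + b*y + c*z + d = 0 with b fixed to -1 before normalization."""
    x, y, z = points_cam[:, 0], points_cam[:, 1], points_cam[:, 2]
    A = np.column_stack([x, z, np.ones_like(x)])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    a, c, d = coeffs
    plane = np.array([a, -1.0, c, d])
    return plane / np.linalg.norm(plane[:3])  # unit normal
```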
@kujason
Hi, thank you for sharing the code. I followed all the instructions and training for the Car class now begins. Without modifying any code and using the default config, the training loss does not look right. It fluctuates, and some of the losses look like this. Do you have any idea why this is happening?
Step 500, Total Loss 2.645, Time Elapsed 6.096 s
Step 550, Total Loss 3.583, Time Elapsed 5.431 s
Step 560, Total Loss 11.722, Time Elapsed 5.333 s
Step 570, Total Loss 3.723, Time Elapsed 5.495 s
Step 580, Total Loss 1.895, Time Elapsed 5.473 s
Step 590, Total Loss 15.548, Time Elapsed 5.860 s
Step 600, Total Loss 2.417, Time Elapsed 5.647 s
Hi,
Thank you for making your code available.
I followed the instructions on the front page but encountered a syntax error while invoking the gen_mini_batches.py script, as shown below.
Would you mind letting me know what I might be doing wrong?
tuan@mypc:~/avod$ python scripts/preprocessing/gen_mini_batches.py
Traceback (most recent call last):
File "scripts/preprocessing/gen_mini_batches.py", line 6, in <module>
from avod.builders.dataset_builder import DatasetBuilder
File "avod/avod/builders/dataset_builder.py", line 169
new_cfg=None) -> KittiDataset:
^
SyntaxError: invalid syntax
Code retrieved on 5/11/2018.
Python version
Python 2.7.12 (default, Dec 4 2017, 14:50:18)
Kitti data structure, as instructed:
Download the data and place it in your home folder at ~/Kitti/object
tuan@mypc:~/Kitti/object$ tree -L 2
.
├── training
│ ├── calib -> /opt/dataset/KITTI_3D/calib/training/calib
│ ├── image_2 -> /opt/dataset/KITTI_3D/image_2/training/image_2
│ ├── planes
│ └── velodyne -> /opt/dataset/KITTI_3D/velodyne/training/velodyne
├── train.txt
├── trainval.txt
└── val.txt
Ubuntu 16.04 LTS
GPU: nVidia 1080 Ti
Regards,
Tuan
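For the record, the traceback points at the `-> KittiDataset:` return annotation, which is Python 3 syntax; the session above shows Python 2.7.12, so running the scripts with python3 should fix the SyntaxError. A tiny illustration (the `build` function is made up):

```python
import sys

# Return annotations like "def f() -> T:" do not exist in Python 2,
# which is why dataset_builder.py fails to even import there.
assert sys.version_info[0] >= 3, "run avod's scripts with python3"

def build(name: str) -> str:  # syntax Python 2 cannot parse
    return "dataset:" + name
```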
Hi
Firstly, many thanks for your code release.
I successfully trained the avod-cars network to 120,000 iterations; however, when I run the evaluation command:
python avod/experiments/run_evaluation.py --pipeline_config=avod/configs/pyramid_cars_with_aug_example.config --device='0' --data_split='val'
the result at the 120000-iteration checkpoint is as follows:
120000
done.
car_detection AP: 22.552376 24.332737 25.962851
car_detection_BEV AP: 22.371897 23.603966 25.545847
car_heading_BEV AP: 22.340508 23.495865 25.304886
car_detection_3D AP: 21.757717 19.721174 20.458136
car_heading_3D AP: 21.725494 19.660765 20.347580
which is much lower than the results in the paper (basically above 70%). I also ran the evaluation on the 110000-iteration checkpoint, which gives (26.233753 23.378477 27.482744) on car_detection_3D AP.
First of all, congratulations on your good paper and code release.
According to your paper, many problems that existed in 2D box detection have been addressed. But I am curious: why is the 2D box detection rate not ranked near the top on KITTI, compared to methods that do not use point clouds?
I am curious whether there is any inherent limit to this lidar-based technique for 2D detection.
Hi Team,
Thanks to your instructions, I am able to run your code to train, evaluate and run inference on the Kitti dataset.
After running the 2D demo generation script demos/show_predictions_2d.py, I see lots of green bounding boxes, as in the image below. Would you mind letting me know the color-coding convention you are using? What is the difference between yellow and red?
And, importantly, how can I disable the green boxes?
In addition, I am wondering if you also have a script to generate a demo for the 3D point cloud. Any hint would be greatly appreciated.
Thank you,
To visualize the results, I ran demos/show_predictions_2d.py, but I only get two identical images without 3D bounding boxes.
My system is:
ubuntu 16.04
Tensorflow 1.6
CUDA 9
Cudnn 7
Who can help me? Thanks in advance!
Firstly, thanks for sharing your work and code; it has definitely helped me a lot. After carefully comparing your work with MV3D, I have a few questions about the comparison:
How did you get the 0.7-3D-IoU AP on the validation set (i.e. 83.87% 72.35% 64.56% in Table I) for MV3D? I did not find these results in the original MV3D paper.
MV3D also uses 2x or 4x deconv operations to upsample the last feature map to handle very small objects, though it does not reach full resolution as you do. So, apart from the different upsampling methods, could you point out the major differences between the two works?
Hello, thank you for sharing the code for this great work.
Why did you choose VGG16 as the encoder instead of ResNet?
Are you using batch normalization?
Another question: how can we calculate the distance to obstacles from the lidar point representation?
Thank you
First of all, thank you for sharing your brilliant work.
If I understand correctly, setting "overwrite_checkpoints: False" in the config means that if you stop training at any checkpoint and later train with more iterations, it will resume from the last checkpoint you saved.
Here are my questions,
I can run all of the instructions; however, I'm not sure how to generate results on the test set for a KITTI benchmark submission.
Could you tell me how to do it?
Thank you for sharing your code.
When I ran 'python3 gen_mini_batches.py', I got the following problem. Does anyone know the reason?
I am trying to reproduce the results in the paper (the ones in Table 1), but I can't. I keep getting values up to 8% lower than those in the paper. Which configuration files (for preprocessing and training) and which thresholds in the evaluation script should I use to reproduce the values in the paper? Or are there modifications to the existing configurations needed to do that?
Your help would be much appreciated, I'm working on my masters thesis :)
Hi, thank you for sharing your code; we are doing related research and it's very helpful for us!
We train the network using pyramid_cars_with_aug_example.config with fusion type "deep". After validation, the best car_detection_3D result appears at around the 52500-iteration checkpoint:
car_detection_3D : [52500, 120000, 70000, 100000, 102500]
In 52500 checkpoint, the result is:
car_detection_3D AP: 84.555695 74.843224 68.156281
Moreover, the result oscillates at later checkpoints, even decreasing to (car_detection_3D AP: 77.260490 68.040474 67.316147) at the 117500 checkpoint, as you can see in the attached figure.
Since this is our first time training such a large network, my question is: is it normal for the best performance to occur at the 52500 checkpoint? Can we say that the result has already converged? Thanks, and looking forward to your reply!
(py35) yanchao@yanchao:~/avod$ python scripts/preprocessing/gen_mini_batches.py
Clustering labels 1 / 3712
Traceback (most recent call last):
File "scripts/preprocessing/gen_mini_batches.py", line 199, in
main()
File "scripts/preprocessing/gen_mini_batches.py", line 120, in main
car_dataset_config_path)
File "/home/yanchao/MyProjects/avod/avod/builders/dataset_builder.py", line 154, in load_dataset_from_config
use_defaults=False)
File "/home/yanchao/MyProjects/avod/avod/builders/dataset_builder.py", line 191, in build_kitti_dataset
return KittiDataset(cfg_copy)
File "/home/yanchao/MyProjects/avod/avod/datasets/kitti/kitti_dataset.py", line 131, in init
self.kitti_utils = KittiUtils(self)
File "/home/yanchao/MyProjects/avod/avod/datasets/kitti/kitti_utils.py", line 59, in init
self.label_cluster_utils.get_clusters()
File "/home/yanchao/MyProjects/avod/avod/core/label_cluster_utils.py", line 194, in get_clusters
img_idx)
File "/home/yanchao/MyProjects/avod/wavedata/wavedata/tools/obj_detection/obj_utils.py", line 125, in read_labels
obj.truncation = float(p[1])
ValueError: could not convert string to float: "b'0.00'"
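This error is usually a Python 2/3 bytes issue: the label field comes back as bytes (e.g. from np.loadtxt with a bytes dtype), and formatting bytes as a string yields the literal "b'0.00'", which float() rejects. A small sketch of the usual fix (`to_float` is my own helper name): decode before converting.

```python
def to_float(field):
    """Convert a label field that may arrive as bytes or str to float;
    str(b'0.00') would give the literal "b'0.00'" and break float()."""
    if isinstance(field, bytes):
        field = field.decode("utf-8")
    return float(field)
```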
Hi, thanks for sharing. I have trained and validated your network on the KITTI object train/val split, and it works great.
However, when I test the network on the KITTI raw dataset, it gives the results below: only objects on the right side are detected throughout the whole sequence.
Any clues as to the possible reason?
If I run this code remotely over SSH, do I need to change anything in the code or configuration?
Hi!
Thank you for sharing your excellent work. In the paper there is a row in Table III where you use BEV-only features (RPN BEV Only). I am interested in using this network; is there a configuration file available for it?
Also, would it be possible for you to share the trained models?
The preprocessing time is like:
Feed dict time:
Min: 0.12029
Max: 0.24133
Mean: 0.14371
Median: 0.14227
This is much larger than the 0.02 s reported. Any ideas?
Which version of "protoc" are you using in this project?
I got the error below:
avod/protos/kitti_utils.proto:24:5: Expected "required", "optional", or "repeated".
avod/protos/kitti_utils.proto:24:25: Missing field number.
avod/protos/kitti_dataset.proto: Import "avod/protos/kitti_utils.proto" was not found or had errors.
avod/protos/kitti_dataset.proto:39:14: "KittiUtilsConfig" is not defined.
This command fails with an "unknown publickey" error from the wavedata submodule:
git clone https://github.com/kujason/avod --recurse-submodules
But the submodule can be cloned correctly inside the freshly cloned avod repo with:
cd avod
git clone https://github.com/kujason/wavedata
Maybe some setting is incorrect somewhere?
Thanks for sharing your code!
I have followed the procedure in the README and trained the model to step 53000, so I ran an experiment using the evaluator; below is the output:
Step 53000: 450 / 3769, Inference on sample 001021
Step 53000: Eval RPN Loss: objectness 0.149, regression 0.095, total 0.244
Step 53000: Eval AVOD Loss: classification 0.038, regression 1.809, total 2.091
Step 53000: Eval AVOD Loss: localization 1.310, orientation 0.499
Step 53000: RPN Objectness Accuracy: 0.95703125
Step 53000: AVOD Classification Accuracy: 0.9880478087649402
Step 53000: Total time 0.577916145324707 s
Step 53000: 451 / 3769, Inference on sample 001022
Step 53000: Eval RPN Loss: objectness 0.026, regression 0.094, total 0.119
Step 53000: Eval AVOD Loss: classification 0.019, regression 0.942, total 1.080
Step 53000: Eval AVOD Loss: localization 0.897, orientation 0.045
Step 53000: RPN Objectness Accuracy: 0.9921875
Step 53000: AVOD Classification Accuracy: 0.9970443349753695
Step 53000: Total time 0.24765753746032715 s
Step 53000: 452 / 3769, Inference on sample 001025
Step 53000: Eval RPN Loss: objectness 0.172, regression 0.175, total 0.347
Step 53000: Eval AVOD Loss: classification 0.044, regression 3.892, total 4.282
Step 53000: Eval AVOD Loss: localization 3.537, orientation 0.354
Step 53000: RPN Objectness Accuracy: 0.970703125
Step 53000: AVOD Classification Accuracy: 0.9950884086444007
Step 53000: Total time 0.2989237308502197 s
Step 53000: 453 / 3769, Inference on sample 001026
I think the inference time (around 0.3 s) is slow compared with the 100 ms claimed in the paper. Any suggestions? I'm using a 1080 Ti GPU.
After training and running the evaluation, I can't find the results file:
FileNotFoundError: [Errno 2] No such file or directory: 'results/avod_cars_example_results_0.1.txt'
so I am wondering what the problem is. Your help would be much appreciated because I am working on my master's thesis.
How is the runtime on KITTI calculated? Is it the minimum inference time at each checkpoint?
Hi, kitti_utils.get_point_cloud returns the lidar points projected into camera coordinates. Why not just return the raw lidar data? Thanks~
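Presumably the projection is done once up front because the ground planes, anchors and boxes all live in camera coordinates. A sketch of the transform itself, using the 3x4 Tr_velo_to_cam matrix from a KITTI calib file (the full pipeline also applies R0_rect, which is omitted here for brevity):

```python
import numpy as np

def lidar_to_cam(points_velo, tr_velo_to_cam):
    """Map Nx3 lidar points into camera coordinates with the 3x4
    Tr_velo_to_cam matrix from a KITTI calib file."""
    n = points_velo.shape[0]
    homo = np.hstack([points_velo, np.ones((n, 1))])  # Nx4 homogeneous
    return homo @ tr_velo_to_cam.T                    # Nx3 camera frame
```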
I am training this model on a slow GPU. Will I get the same network performance if I cut training to 9000 or 8000 iterations with a step of 50? Is there any harm in that?
Training plus evaluation takes almost 54 hours for me.
Another question: why is there no max element-wise fusion option for the RPN model, and how do the "late" and "deep" fusion types perform compared to "early" fusion?
Thank you
I encounter the error "avod/protos/kitti_utils.proto:24:5: Expected "required", "optional", or "repeated"" when I execute "sh avod/protos/run_protoc.sh". I'm not familiar with protoc. Is something wrong with my protoc version? (It is 2.5.0.)