Software implementation for detecting impurities in tobacco, with two different methods (color histogram + SVM or XGBoost).
Author: Banghua Zhu, [email protected]
Liren Chen, [email protected]
Jinhui Song, [email protected]
ImpurityFinder is an image-processing solution for detecting impurities in tobacco. It is one of the projects for the course Statistical Signal Processing at Tsinghua University.
Note that to run our algorithm, you only need to download the test folder. The train folder may be large because it contains the segmented images.
Easy Start:
One can go directly into the test folder and run process_svm.sh or process_xgboost.sh after all the dependencies are installed.
Here we provide two packages of ImpurityFinder. In the train folder, we provide all the source code for training. In the test folder, we provide our trained classifier, which takes a tobacco image as input and gives a processed image as output.
- MATLAB R2015b (May also run on other versions, but only MATLAB R2015b is tested.)
- Python 2.7
- OpenCV lib for Python (Homepage: http://opencv.org/ Tutorial for installation: http://docs.opencv.org/master/df/d65/tutorial_table_of_content_introduction.html)
The following Python packages are required:
- numpy
- scipy
- sklearn
- skimage
- matplotlib
- xgboost
The first five packages are usually easy to install. For xgboost, please see https://www.ibm.com/developerworks/community/blogs/jfp/entry/Installing_XGBoost_For_Anaconda_on_Windows?lang=zh (the source code is at https://github.com/dmlc/xgboost).
In this part, we use Segment.m to segment the image into patches of size 250×250, label each patch 1 if it lies in a bounding box of certain colors and 0 otherwise, and save all the segmented pieces into the patch/out/ (or test) folder.
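The tiling and labeling step can be illustrated with a small Python sketch. This is not a port of Segment.m: the function names are hypothetical, boxes are assumed to be given as (top, left, bottom, right) pixel coordinates, a patch is assumed to count as "in" a box when it overlaps it, and partial tiles at the image border are assumed to be dropped.

```python
PATCH = 250  # patch side length in pixels, as stated above


def patch_grid(height, width, patch=PATCH):
    """Yield (top, left) corners of non-overlapping patch x patch tiles.

    Assumption: partial tiles at the right and bottom edges are dropped.
    """
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            yield top, left


def overlaps(top, left, box, patch=PATCH):
    """Return True if the tile at (top, left) intersects the box.

    box is (top, left, bottom, right); this layout is an assumption.
    """
    b_top, b_left, b_bottom, b_right = box
    return (top < b_bottom and top + patch > b_top and
            left < b_right and left + patch > b_left)


def label_patches(height, width, boxes, patch=PATCH):
    """Return [((top, left), label)], with label 1 for tiles touching any box."""
    return [((t, l), int(any(overlaps(t, l, b, patch) for b in boxes)))
            for t, l in patch_grid(height, width, patch)]
```

For a 500×750 image with one box in the top-left corner, `label_patches(500, 750, [(0, 0, 100, 100)])` yields six tiles, of which only the first is labeled 1.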
Then either xgboosttrain.py or svmtrain.m is run to train a classifier and save the trained model into the model folder. rocplot.py is used to plot the ROC curve.
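As a rough illustration of the color histogram features mentioned at the top of this README, the sketch below computes a per-channel histogram for one patch with numpy. The function name, the number of bins, and the normalization are assumptions made for the example; the actual feature extraction in xgboosttrain.py and svmtrain.m may differ.

```python
import numpy as np


def color_histogram(patch, bins=8):
    """Concatenate per-channel histograms of an H x W x C uint8 patch.

    Each channel's histogram is normalized to sum to 1.
    The bin count of 8 is an illustrative choice, not the project's setting.
    """
    n_pixels = float(patch.shape[0] * patch.shape[1])
    feats = []
    for c in range(patch.shape[2]):
        hist, _ = np.histogram(patch[:, :, c], bins=bins, range=(0, 256))
        feats.append(hist / n_pixels)
    return np.concatenate(feats)


# Example: an all-black 250 x 250 RGB patch gives a 24-dimensional vector.
features = color_histogram(np.zeros((250, 250, 3), dtype=np.uint8))
```

Stacking one such vector per patch gives a feature matrix that, together with the 0/1 labels, could be passed to a classifier such as xgboost's `XGBClassifier`.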
P.S. We have also tried GAN-based deep learning methods, and we provide the code in the folder train\deep. However, we did not report this work because its AUC performance is not as good as that of XGBoost and SVM.
In this part, we use the trained model from the Training part to test on certain images. Taking ..\image\20161121-04.bmp as an example (we didn't put the images into the folders because they are so large), the procedure can be run from the command line as follows (note that on Windows, rm -r segmented should be replaced with del /F /S /Q segmented):
matlab -nosplash -nojvm -nodesktop -r "img2segment('..\image\20161121-04.bmp')"
python xgboostclassify.py # This can be replaced with: matlab -nosplash -nojvm -nodesktop -r svmtest
matlab -nosplash -nojvm -nodesktop -r "segment2img('..\image\20161121-04.bmp')"
rm -r segmented
Note that this sequence of commands handles only one image at a time; remember to delete the segmented folder before processing a second image. A .sh file for the Linux shell and a .bat file for the Windows command line are provided. One only needs to change the variable 'filename' in the script to run ImpurityFinder on a different image.
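To process several images in a row, the same sequence could be driven from a short Python script. The sketch below is not part of the package: `build_commands` and `process_images` are hypothetical helpers, and it assumes that each MATLAB call returns control to the shell when it finishes and that the script is started from the test folder.

```python
import shutil
import subprocess


def build_commands(image_path, use_svm=False):
    """Return the three commands (as argument lists) for one image."""
    matlab = ["matlab", "-nosplash", "-nojvm", "-nodesktop", "-r"]
    if use_svm:
        classify = matlab + ["svmtest"]
    else:
        classify = ["python", "xgboostclassify.py"]
    return [
        matlab + ["img2segment('%s')" % image_path],
        classify,
        matlab + ["segment2img('%s')" % image_path],
    ]


def process_images(image_paths, use_svm=False, run=subprocess.check_call):
    """Run the pipeline on each image, clearing 'segmented' in between."""
    for path in image_paths:
        for cmd in build_commands(path, use_svm):
            run(cmd)
        # Delete the segmented folder before the next image, as required above.
        shutil.rmtree("segmented", ignore_errors=True)
```

Calling `process_images(["..\\image\\20161121-04.bmp"])` would then run the XGBoost variant on that image; passing `use_svm=True` switches the middle step to the MATLAB SVM test.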
Most of the analysis of the results can be found in the PDF report. In the train folder, we save the false positive rates and true positive rates as falsepos.npy and truepos.npy. One can use rocplot.py to read them and plot the corresponding ROC curve for our classifier.
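Besides plotting, the saved rates can be summarized as a single AUC value. The following is a minimal sketch using the trapezoidal rule; `auc_trapezoid` is a hypothetical helper, not a function from rocplot.py, and it assumes the two files hold one-dimensional arrays of equal length.

```python
def auc_trapezoid(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule.

    fpr and tpr are equal-length sequences of false and true positive rates;
    the points are sorted by false positive rate before integrating.
    """
    points = sorted(zip(fpr, tpr))
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

With numpy installed, the inputs could be loaded as `numpy.load('falsepos.npy')` and `numpy.load('truepos.npy')` and passed directly to `auc_trapezoid`.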