ImpurityFinder

A software implementation for detecting impurities in tobacco, with two different methods (color histogram + SVM or XGBoost).

Authors: Banghua Zhu, [email protected]

Liren Chen, [email protected]

Jinhui Song, [email protected]

Introduction

ImpurityFinder is an image-processing solution for detecting impurities in tobacco. It is one of the projects for the Statistical Signal Processing course at Tsinghua University.

Note that to run the algorithm you only need to download the test folder. The train folder may be large because of the segmented images.

Easy Start:

After all the dependencies are installed, go directly into the test folder and run process_svm.sh or process_xgboost.sh.

Pipeline and Details

ImpurityFinder comes in two packages. The train folder contains all the source code for training. The test folder contains our trained classifier, which takes a tobacco image as input and gives a processed image as output.

Dependencies

The following Python packages are required:

  • numpy
  • scipy
  • sklearn
  • skimage
  • matplotlib
  • xgboost

The first five packages are usually easy to install. For xgboost, see https://www.ibm.com/developerworks/community/blogs/jfp/entry/Installing_XGBoost_For_Anaconda_on_Windows?lang=zh; the source code is at https://github.com/dmlc/xgboost.
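
Before running anything, it can help to check which of the listed packages are importable. The snippet below is a minimal sketch using only the standard library; the helper name `missing_packages` is hypothetical and not part of the repository.

```python
from importlib.util import find_spec

# Import names taken from the dependency list above.
REQUIRED = ["numpy", "scipy", "sklearn", "skimage", "matplotlib", "xgboost"]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be located on the current Python path."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

Note that sklearn and skimage are import names; on PyPI the corresponding distributions are called scikit-learn and scikit-image.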

Training part

In this part, we use Segment.m to segment the image into patches of size 250×250, label each patch 1 if it is in a bounding box of certain colors and 0 otherwise, and save all the segmented pieces into the patch/out/ (or test) folder.
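
The segmentation and labeling step is done in MATLAB by Segment.m. The Python sketch below only illustrates the idea and is not a port of that script. It makes several assumptions: patches are non-overlapping, partial patches at the image border are dropped, boxes are given as (top, left, bottom, right) with exclusive bottom/right, and a patch counts as "in" a bounding box when the two overlap. All function names are hypothetical.

```python
PATCH = 250  # patch side length in pixels, as stated above


def patch_boxes(height, width, size=PATCH):
    """Yield (top, left, bottom, right) for each full, non-overlapping patch."""
    for top in range(0, height - size + 1, size):
        for left in range(0, width - size + 1, size):
            yield (top, left, top + size, left + size)


def overlaps(a, b):
    """True if two (top, left, bottom, right) boxes share at least one pixel."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def label_patches(height, width, impurity_boxes, size=PATCH):
    """Pair each patch with label 1 if it overlaps any impurity box, else 0."""
    return [
        (box, int(any(overlaps(box, imp) for imp in impurity_boxes)))
        for box in patch_boxes(height, width, size)
    ]
```

For example, `label_patches(500, 750, [(260, 10, 300, 40)])` produces six patches, of which only the one covering rows 250 to 499 and columns 0 to 249 is labeled 1.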

Then either xgboosttrain.py or svmtrain.m is run to train a model and save it into the model folder. rocplot.py is used to plot the ROC curve.

P.S. We also tried GAN-based deep learning methods, and the code is in the folder train\deep. However, we did not report this work because its AUC is not as good as that of XGBoost and SVM.
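
The project summary names color histograms as the feature fed to the classifiers, but it does not state the bin count or color space. The sketch below therefore assumes 8-bit RGB pixels and 8 bins per channel; `color_histogram` is a hypothetical helper written for illustration, not the feature code used by xgboosttrain.py or svmtrain.m.

```python
def color_histogram(pixels, bins=8):
    """Concatenate per-channel histograms of 8-bit RGB pixels.

    `pixels` is an iterable of (r, g, b) tuples with values in 0..255.
    Each channel's histogram is normalized so that it sums to 1.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("patch has no pixels")
    width = 256 // bins  # number of intensity levels per bin
    hist = [0.0] * (3 * bins)
    for pixel in pixels:
        for channel, value in enumerate(pixel):
            # Clamp so the top intensities fall in the last bin
            # even when `bins` does not divide 256 evenly.
            hist[channel * bins + min(value // width, bins - 1)] += 1.0
    return [count / len(pixels) for count in hist]
```

One such vector per patch, stacked row by row together with the 0/1 labels, would form the training matrix passed to either classifier.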

Testing part

In this part, we use the trained model from the Training part to test on certain images. Taking ..\image\20161121-04.bmp as an example (we did not put the images into the folders because they are so large), the procedure can be run from the command line as follows (note that on Windows, rm -r segmented should be replaced with del /F /S /Q segmented):

matlab -nosplash -nojvm -nodesktop -r "img2segment('..\image\20161121-04.bmp')"
python xgboostclassify.py # This can be replaced with matlab -nosplash -nojvm -nodesktop -r svmtest
matlab -nosplash -nojvm -nodesktop -r "segment2img('..\image\20161121-04.bmp')"
rm -r segmented

Note that this set of commands can process only one image at a time; remember to delete the segmented folder before processing a second image. A .sh file for the Linux shell and a .bat file for the Windows command line are provided. To run ImpurityFinder on a different image, one only needs to change the variable 'filename' in that file.
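
To process several images in a row, the same commands can be wrapped in a small driver that clears the segmented folder after each image. This is a sketch under assumptions, not one of the provided scripts: the function names are hypothetical, it uses the XGBoost classifier, and it appends `; exit` to each MATLAB statement on the assumption that MATLAB would otherwise stay open after the function returns.

```python
import shutil
import subprocess

MATLAB = ["matlab", "-nosplash", "-nojvm", "-nodesktop", "-r"]


def build_commands(filename):
    """Return the three pipeline commands for one image, in execution order."""
    return [
        # "; exit" is an assumption: it closes MATLAB once the call finishes.
        MATLAB + [f"img2segment('{filename}'); exit"],
        ["python", "xgboostclassify.py"],
        MATLAB + [f"segment2img('{filename}'); exit"],
    ]


def process_images(filenames, run=subprocess.run):
    """Run the pipeline on each image, then delete the segmented folder."""
    for filename in filenames:
        for command in build_commands(filename):
            run(command, check=True)
        # Portable stand-in for `rm -r segmented` / `del /F /S /Q segmented`.
        shutil.rmtree("segmented", ignore_errors=True)
```

Passing a different `run` callable, for instance one that only prints each command, gives a dry run without starting MATLAB.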

Results:

Analysis of Results

Most of the analysis of the results is in the PDF report. In the train folder, we save the false positive rate and true positive rate as falsepos.npy and truepos.npy. One can use rocplot.py to read them and plot the corresponding ROC curve for our classifier.
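
For readers unfamiliar with how false positive and true positive rates relate to the ROC curve and its AUC, the following standard-library sketch computes them from 0/1 labels and classifier scores. It illustrates the general technique and is not the code in rocplot.py; the function names are hypothetical.

```python
def roc_points(labels, scores):
    """Return (fpr, tpr) lists obtained by sweeping a threshold over the scores.

    `labels` are 0/1 integers; a higher score means "more likely impurity".
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("need at least one positive and one negative label")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    fpr, tpr = [0.0], [0.0]
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        tp += label
        fp += 1 - label
        # Emit a point only after the last sample sharing this score (ties).
        if i + 1 == len(ranked) or ranked[i + 1][0] != score:
            fpr.append(fp / negatives)
            tpr.append(tp / positives)
    return fpr, tpr


def auc(fpr, tpr):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum(
        (fpr[i] - fpr[i - 1]) * (tpr[i] + tpr[i - 1]) / 2.0
        for i in range(1, len(fpr))
    )
```

If falsepos.npy and truepos.npy hold matching one-dimensional arrays ordered by increasing false positive rate, which is an assumption about their layout, loading them with `numpy.load` and passing them to `auc` would give the area under the saved curve.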
