Git Product home page Git Product logo

decaptcha's Introduction

decaptcha

A CAPTCHA Breaker using k-Nearest Neighbor Classifiers, Support Vector Machines, and Neural Networks.

Dependencies

Installation

The following steps assume that you have all the dependencies. First, load in the Chars74k data set labels into memory. This could take some time because there are approximately 62,000 array elements to be loaded:

list_English_Fnt;

If this operation fails, it's likely because you're in the wrong folder. When you downloaded Chars74k, you should have also downloaded a MATLAB compatible label set, which unzips itself to the folder ListsTXT. Either add that folder to your MATLAB path, or navigate to that directory.

Running the command places a struct variable called list in your workspace, which we'll make extensive use of. Next, create a character training set from the Chars74k images:

fontChars = genFontChars(list);

fontChars is a cell array of binary image data extracted. The indices of the cell array correspond to the indices in list.ALLlabels. From here, you can use fontChars to train a classifier. Pick your poison:

% k-Nearest Neighbor
knnRecognizer = genKnnRecognizer(fontChars, list.ALLlabels);

% Multi-class SVM
svmRecognizer = genSVMRecognizer(fontChars, list.ALLlabels);

% Pattern Recognition Neural Network
nnRecognizer = genNeuralRecognizer(fontChars, list.ALLlabels, list.NUMclasses);

Note that these operations all take a non-negligible amount of time and memory. It's recommended that you perform all training operations in parallel on a workstation or cluster rather than on a consumer machine. Once it's done, you can save out the recognizers and do as you please with them.

Cracking CAPTCHAs

Disclaimer: Use this for good, for research, and for realizing that CAPTCHAs don't stop well-written algorithms from abusing systems that you create. Don't use this to do things which cause warning flags to be written to real-world server logs, don't be evil, you get the idea.

Let's assume you have an image called captcha.jpg. This image contains a CAPTCHA image to be cracked:

% k-Nearest Neighbor
result = decaptcha('./captcha.jpg', true, 'knn', knnRecognizer);

% Multi-class SVM
result = decaptcha('./captcha.jpg', true, 'svm', svmRecognizer);

% Pattern Recognition Neural Network
result = decaptcha('./captcha.jpg', true, 'nn', nnRecognizer);

result is a string containing the recognized CAPTCHA. The boolean flag passed in as the second argument determines whether noise removal should be performed after thresholding the image. In many cases this is true, but some odd categories of CAPTCHA which already have salient, cleanly segmented characters deteriorate instead. Experiment, because after all you're not really using this to crack CAPTCHAs in the wild right?

Benchmarking Performance

This assumes that you have a directory of files containing CAPTCHA test images in JPG format, located at the directory ./test.

% k-Nearest Neighbor
result = benchmarkKnn(knnRecognizer. './test');

% Multi-class SVM
result = benchmarkSVM(svmRecognizer. './test');

% Pattern Recognition Neural Network
result = benchmarkNN(nnRecognizer. './test');

result is a vector of modified Levenshtein distances between the correct answer for the CAPTCHA and the prediction. A value of 1 indicates that the match was exact. Values between 1 and 0 indicate that some or all of the characters were wrong or missing. Values that are negative indicate that, not only was everything grossly wrong, the prediction contained more characters than the answer. That's likely because the image segmentation algorithm failed to do a good job. We're guilty, but not sorry :P.

decaptcha's People

Contributors

kenlimmj avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.