
jimut123 / captcha


CAPTCHA project for Machine Learning Course https://arxiv.org/pdf/2006.11373.pdf

Home Page: https://arxiv.org/abs/2006.11373

captcha deep deep-learning deep-learning-algorithms convolutonalneuralnetwork python37 keras tensorflow2 state-of-the-art-models sota-technique


CAPTCHA

Abstract

It is becoming increasingly difficult for human beings to go about their day-to-day lives without passing a reverse Turing test, in which a computer tests whether the user is human. Almost every website and service provider today checks whether it is being crawled by automated bots that could extract valuable information from the site. Meanwhile, bots are becoming more intelligent: they use Deep Learning techniques to decipher these tests, gain unwanted automated access to data, and create a nuisance by posting spam. Humans spend a considerable amount of time almost every day trying to decipher CAPTCHAs. The aim of this investigation is to check whether text CAPTCHAs, a commonly used subset of CAPTCHAs, are a reliable way for services to verify that their customers are human. We mainly focused on the preprocessing step, which converts every CAPTCHA to binary intensity and removes as much of the confusion as possible, and we developed various models to correctly label as many CAPTCHAs as possible. We also suggest some ways to improve human verification so that CAPTCHAs stay easy for humans to solve but become difficult for bots.
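The preprocessing step converts each CAPTCHA to binary intensity. The exact rule used in the notebooks is not given here, so the following is only a minimal NumPy sketch under stated assumptions: a luminance-weighted grayscale conversion followed by one global threshold (the image mean by default). The function names to_grayscale and binarize are illustrative, not the project's own.

```python
from typing import Optional

import numpy as np


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB array to H x W luminance (ITU-R BT.601 weights)."""
    rgb = np.asarray(rgb, dtype=np.float32)
    return 0.299 * rgb[..., 0] + 0.587 * rgb[..., 1] + 0.114 * rgb[..., 2]


def binarize(gray: np.ndarray, threshold: Optional[float] = None) -> np.ndarray:
    """Map a grayscale image (values 0-255) to a 0/1 array.

    Dark pixels (assumed to be the text) become 1 and light pixels
    (assumed to be the background) become 0. Without an explicit
    threshold, the image mean is used as a simple global threshold,
    which is an assumption, not necessarily the project's rule.
    """
    gray = np.asarray(gray, dtype=np.float32)
    if threshold is None:
        threshold = float(gray.mean())
    return (gray < threshold).astype(np.uint8)
```

A full pipeline under these assumptions would call binarize(to_grayscale(image)). The mean threshold is the simplest choice; adaptive or Otsu-style thresholds are common alternatives when the background intensity varies across the image.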

Note:

We have built many models to solve some of the difficult open-source CAPTCHAs that are available on the internet. We obtained over 99.5% accuracy on most of the models, which converge in about 5 epochs. The generators folder contains some of the modified code we used to generate the data fed into the models. The pyfiles folder contains all of the models and their corresponding Python code.
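The training data is produced by generator code. As a minimal sketch of that idea, and not the code in the generators folder, the Python generator below draws random fixed-length strings, renders them with a hypothetical render function (any function that turns a string into a grayscale image array), and yields image batches with multilabel targets: one one-hot block per character position, concatenated. The alphabet CHARSET, the length, and all names are assumptions.

```python
import random
import string
from typing import Callable, Iterator, Optional, Tuple

import numpy as np

# Assumed alphabet; the real generators may use a different character set.
CHARSET = string.ascii_uppercase + string.digits


def encode(text: str, charset: str = CHARSET) -> np.ndarray:
    """One-hot encode each character and concatenate the per-position vectors."""
    vec = np.zeros(len(text) * len(charset), dtype=np.float32)
    for pos, ch in enumerate(text):
        vec[pos * len(charset) + charset.index(ch)] = 1.0
    return vec


def captcha_batches(
    render: Callable[[str], np.ndarray],
    batch_size: int = 32,
    length: int = 4,
    charset: str = CHARSET,
    seed: Optional[int] = None,
) -> Iterator[Tuple[np.ndarray, np.ndarray]]:
    """Yield (images, labels) batches forever.

    render is a hypothetical function that draws a string as a grayscale
    image with values 0-255; plug in whatever CAPTCHA renderer you use.
    """
    rng = random.Random(seed)
    while True:
        texts = ["".join(rng.choice(charset) for _ in range(length))
                 for _ in range(batch_size)]
        # Scale pixel values to [0, 1] before feeding a network.
        images = np.stack([render(t) for t in texts]).astype(np.float32) / 255.0
        labels = np.stack([encode(t, charset) for t in texts])
        yield images, labels
```

Because the generator is infinite, a Keras model consuming it with model.fit would also need steps_per_epoch set, and its output layer would need length × len(CHARSET) units under this label layout.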

Results

CAPTCHA name | Algorithm used | Accuracy obtained
--- | --- | ---
JAM | kNN | 99.53%
CNN_c4l_16x16_550 | CNN (modified CIFAR-10 model) | 99.91%
captcha-1L | Own CNN model, multilabel classification | 99.67%
captcha_4_letter | LSTM model, multilabel classification | 99.87%
captcha_v2 | Own CNN, multilabel classification | 90.102%
circle_captcha | AlexNet, multilabel classification | 99.99%
faded | AlexNet, multilabel classification | 99.44%
fish_eye | AlexNet, multilabel classification | 99.46%
mini_captcha | AlexNet, multilabel classification | 97.25%
multicolor | AlexNet, multilabel classification | 95.69%
railway_captcha | Own CNN model | 99.94%
sphinx | AlexNet, multilabel classification | 99.62%
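Several of the models above use multilabel classification. Assuming the network's output vector is laid out as one block of scores per character position (an assumption; the notebooks may order it differently), a prediction can be decoded by taking the argmax within each block. The sketch below, with the hypothetical alphabet CHARSET, also computes both per-character and whole-CAPTCHA accuracy, since the table does not state which of the two its figures measure.

```python
import string

import numpy as np

# Assumed alphabet; must match whatever the model was trained on.
CHARSET = string.ascii_uppercase + string.digits


def decode(scores: np.ndarray, length: int, charset: str = CHARSET) -> str:
    """Pick the highest-scoring character in each position block."""
    blocks = np.asarray(scores).reshape(length, len(charset))
    return "".join(charset[i] for i in blocks.argmax(axis=1))


def accuracies(pred_texts, true_texts):
    """Return (per-character accuracy, whole-CAPTCHA accuracy)."""
    correct_chars = sum(
        p == t
        for pt, tt in zip(pred_texts, true_texts)
        for p, t in zip(pt, tt)
    )
    total_chars = sum(len(t) for t in true_texts)
    correct_whole = sum(pt == tt for pt, tt in zip(pred_texts, true_texts))
    return correct_chars / total_chars, correct_whole / len(true_texts)
```

For example, accuracies(["AB12", "XYZ9"], ["AB12", "XYZ0"]) gives 0.875 per character but only 0.5 per CAPTCHA, so the two measures can differ substantially for the same predictions.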

Documentation

[Thesis - Deceiving computers in Reverse Turing Test through Deep Learning (Research paper)] | [Slides]

Advisor

Acknowledgements

  • nlACh [Help regarding data uploading]
  • 41x3n [Help regarding data uploading]

Frequently Asked Questions

  • Are these the only notebooks?

  • Do we need to download the data?

    • No, the data is downloaded automatically; just run the notebook in Google Colaboratory to get the job done.
  • Training time is taking too long?

    • Yes, some of the CAPTCHAs take a long time to train (over 10 hours for just 10 epochs, even on GPUs). If you run this on your own machine, multiple GPUs help.
  • Found a bug or a version issue?

    • PRs are welcome: fork the repository and send a pull request!

Contribution

Please feel free to raise issues and fix any existing ones. Further details can be found in our code of conduct.

While making a PR, please make sure you:

  • Always start your PR description with "Fixes #issue_number", if you're fixing an issue.
  • Briefly mention the purpose of the PR, along with the tools/libraries you have used, ideally with their versions.
  • Briefly mention the logic you used to implement the changes/upgrades.
  • Provide in-code review comments on GitHub to highlight specific lines of code if necessary.
  • Provide screenshots if necessary.
  • Update the README if required.

BibTeX and citations

@article{DBLP:journals/corr/abs-2006-11373,
  author    = {Jimut Bahan Pal},
  title     = {Deceiving computers in Reverse Turing Test through Deep Learning},
  journal   = {CoRR},
  volume    = {abs/2006.11373},
  year      = {2020},
  url       = {https://arxiv.org/abs/2006.11373},
  archivePrefix = {arXiv},
  eprint    = {2006.11373},
  timestamp = {Tue, 23 Jun 2020 17:57:22 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2006-11373.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}

Contributors

jimut123
