Confusables

A simple Python 3 class for matching strings whose letters merely look like those of the original string. unicode.org provides a list of "confusable" characters.
This class uses that data to turn a string into a regular expression pattern that covers all of these confusable variations.
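
As a rough illustration of how the unicode.org data could be read, the sketch below parses a few sample lines written in the layout of `confusables.txt` (`source ; target ; type # comment`, with code points in hex). The function name `parse_confusables` and the sample lines are assumptions made for this example, not part of this class, and multi-character targets are skipped for brevity.

```python
from collections import defaultdict

# Illustrative lines in the confusables.txt layout, not the full file.
SAMPLE = """\
# source ; target ; type # comment
FF28 ;	0048 ;	MA	# fullwidth H -> H
212E ;	0065 ;	MA	# estimated symbol -> e
2C67 ;	0048 0329 ;	MA	# H with descender -> H + combining mark
"""

def parse_confusables(lines):
    """Map each single-character target to the set of characters that resemble it."""
    table = defaultdict(set)
    for line in lines:
        line = line.split("#", 1)[0].strip()      # drop comments and blank lines
        fields = line.split(";")
        if len(fields) < 2:                       # not a data line
            continue
        source = chr(int(fields[0].strip(), 16))
        target = "".join(chr(int(cp, 16)) for cp in fields[1].split())
        if len(target) == 1:                      # this sketch ignores multi-character targets
            table[target].add(source)
    return dict(table)

table = parse_confusables(SAMPLE.splitlines())
print(table)
```

When reading the real file, opening it with `encoding="utf-8-sig"` avoids tripping over the byte order mark on the first line.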

E.g. "𝓗℮𝐥1೦" matches "Hello"

"Hello" gets turned into the following regex of character classes:

[H\H\ℋ\ℌ\ℍ\𝐇\𝐻\𝑯\𝓗\𝕳\𝖧\𝗛\𝘏\𝙃\𝙷\Η\𝚮\𝛨\𝜢\𝝜\𝞖\Ⲏ\Н\Ꮋ\ᕼ\ꓧ\𐋏\Ⱨ\Ң\Ħ\Ӊ\Ӈ]
[e\℮\e\ℯ\ⅇ\𝐞\𝑒\𝒆\𝓮\𝔢\𝕖\𝖊\𝖾\𝗲\𝘦\𝙚\𝚎\ꬲ\е\ҽ\ɇ\ҿ]
[l\‎\|\∣\⏽\│1\‎\۱\𐌠\‎\𝟏\𝟙\𝟣\𝟭\𝟷I\I\Ⅰ\ℐ\ℑ\𝐈\𝐼\𝑰\𝓘\𝕀\𝕴\𝖨\𝗜\𝘐\𝙄\𝙸\Ɩ\l\ⅼ\ℓ\𝐥\𝑙\𝒍\𝓁\𝓵\𝔩\𝕝\𝖑\𝗅\𝗹\𝘭\𝙡\𝚕\ǀ\Ι\𝚰\𝛪\𝜤\𝝞\𝞘\Ⲓ\І\Ӏ\‎\‎\‎\‎\‎\‎\‎\‎\ⵏ\ᛁ\ꓲ\𖼨\𐊊\𐌉\‎\‎\ł\ɭ\Ɨ\ƚ\ɫ\‎\‎\‎\‎\ŀ\Ŀ\ᒷ\🄂\⒈\‎\⒓\㏫\㋋\㍤\⒔\㏬\㍥\⒕\㏭\㍦\⒖\㏮\㍧\⒗\㏯\㍨\⒘\㏰\㍩\⒙\㏱\㍪\⒚\㏲\㍫\lj\IJ\‖\∥\Ⅱ\ǁ\‎\𐆙\⒒\Ⅲ\𐆘\㏪\㋊\㍣\Ю\⒑\㏩\㋉\㍢\ʪ\₶\Ⅳ\Ⅸ\ɮ\ʫ\㏠\㋀\㍙]
[l\‎\|\∣\⏽\│1\‎\۱\𐌠\‎\𝟏\𝟙\𝟣\𝟭\𝟷I\I\Ⅰ\ℐ\ℑ\𝐈\𝐼\𝑰\𝓘\𝕀\𝕴\𝖨\𝗜\𝘐\𝙄\𝙸\Ɩ\l\ⅼ\ℓ\𝐥\𝑙\𝒍\𝓁\𝓵\𝔩\𝕝\𝖑\𝗅\𝗹\𝘭\𝙡\𝚕\ǀ\Ι\𝚰\𝛪\𝜤\𝝞\𝞘\Ⲓ\І\Ӏ\‎\‎\‎\‎\‎\‎\‎\‎\ⵏ\ᛁ\ꓲ\𖼨\𐊊\𐌉\‎\‎\ł\ɭ\Ɨ\ƚ\ɫ\‎\‎\‎\‎\ŀ\Ŀ\ᒷ\🄂\⒈\‎\⒓\㏫\㋋\㍤\⒔\㏬\㍥\⒕\㏭\㍦\⒖\㏮\㍧\⒗\㏯\㍨\⒘\㏰\㍩\⒙\㏱\㍪\⒚\㏲\㍫\lj\IJ\‖\∥\Ⅱ\ǁ\‎\𐆙\⒒\Ⅲ\𐆘\㏪\㋊\㍣\Ю\⒑\㏩\㋉\㍢\ʪ\₶\Ⅳ\Ⅸ\ɮ\ʫ\㏠\㋀\㍙]
[o\ం\ಂ\ം\ං\०\੦\૦\௦\౦\೦\൦\๐\໐\၀\‎\۵\o\ℴ\𝐨\𝑜\𝒐\𝓸\𝔬\𝕠\𝖔\𝗈\𝗼\𝘰\𝙤\𝚘\ᴏ\ᴑ\ꬽ\ο\𝛐\𝜊\𝝄\𝝾\𝞸\σ\𝛔\𝜎\𝝈\𝞂\𝞼\ⲟ\о\ჿ\օ\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\‎\ഠ\ဝ\𐓪\𑣈\𑣗\𐐬\‎\ø\ꬾ\ɵ\ꝋ\ө\ѳ\ꮎ\ꮻ\ꭴ\‎\ơ\œ\ɶ\∞\ꝏ\ꚙ\ൟ\တ]

Note: Some characters above may not render in your browser correctly.
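
The string-to-pattern step can be sketched roughly as follows. This is a minimal illustration, not this class's actual code: the `LOOKALIKES` table is a small hand-picked subset of the character classes listed above, and the names `LOOKALIKES` and `confusable_pattern` are made up for the example.

```python
import re

# Hypothetical subset of look-alikes, taken from the character classes above.
LOOKALIKES = {
    "H": "\u210B\u210D\u0397\u041D\U0001D4D7",  # script H, double-struck H, Greek Eta, Cyrillic En, bold script H
    "e": "\u212E\u212F\u0435",                  # estimated symbol, script e, Cyrillic ie
    "l": "1I|\u2113\U0001D425",                 # digit one, capital I, vertical bar, script l, bold l
    "o": "\u03BF\u043E\u0CE6",                  # Greek omicron, Cyrillic o, Kannada digit zero
}

def confusable_pattern(text):
    """Build one character class per input character."""
    parts = []
    for ch in text:
        alternatives = ch + LOOKALIKES.get(ch, "")
        # re.escape keeps characters such as "|" or "]" literal inside the class
        parts.append("[" + "".join(re.escape(c) for c in alternatives) + "]")
    return "".join(parts)

pattern = re.compile(confusable_pattern("Hello"))

# The look-alike string from the example above, written with escapes.
disguised = "\U0001D4D7\u212E\U0001D4251\u0CE6"
print(bool(pattern.fullmatch(disguised)))
print(bool(pattern.fullmatch("Hello")))
```

Characters without an entry in the table fall back to a class containing only themselves, so plain text still matches exactly.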

It is probably best to combine this with stripping accents from the text to be searched. Several approaches are explained here: https://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string
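
One common approach, shown here as a minimal sketch using only the standard library, is to decompose the text and drop the combining marks. The helper name `strip_accents` is an assumption for this example and is not part of this class.

```python
import unicodedata

def strip_accents(text):
    """Remove combining marks (accents) after canonical decomposition."""
    decomposed = unicodedata.normalize("NFD", text)
    # combining() returns 0 for base characters and non-zero for combining marks
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

print(strip_accents("H\u00e9ll\u00f2"))
```

Letters that have no canonical decomposition, such as "ł", pass through unchanged, so this step complements the confusables pattern rather than replacing it.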

Inspiration: https://stackoverflow.com/questions/9491890/is-there-a-list-of-characters-that-look-similar-to-english-letters/48555901#48555901
