Unidecoder

This library provides methods to transliterate Unicode characters to an ASCII approximation.

The functionality in this library was originally written by Russell Norris for his Stringex library. This gem extracts the Unicode transliteration functionality from Stringex into a separate library and adds some functionality of its own.

The Unidecoder component of Stringex is itself a port of Sean M. Burke's Unidecode Perl module.

Installation

gem install unidecoder

Usage

"olá, mundo!".to_ascii                 #=> "ola, mundo!"
"你好".to_ascii                        #=> "Ni Hao "
"Jürgen Müller".to_ascii               #=> "Jurgen Muller"
"Jürgen Müller".to_ascii("ü" => "ue")  #=> "Juergen Mueller"
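
To illustrate the idea behind the per-call override in the last line, here is a minimal sketch in plain Ruby. It is not Unidecoder's implementation and does not use its data tables: `TOY_TABLE` and `toy_to_ascii` are hypothetical names, and the "?" fallback for unknown characters is an assumption made for this example.

```ruby
# Toy sketch only: NOT Unidecoder's implementation or data tables.
# TOY_TABLE and toy_to_ascii are hypothetical names for illustration.
TOY_TABLE = {
  "ü" => "u",
  "á" => "a",
  "é" => "e"
}.freeze

# Transliterates character by character. Entries in `overrides`
# take precedence over the default table for this call only.
def toy_to_ascii(string, overrides = {})
  table = TOY_TABLE.merge(overrides)
  string.each_char.map do |char|
    if char.ascii_only?
      char                    # ASCII passes through untouched
    else
      table.fetch(char, "?")  # assumption: unknown characters become "?"
    end
  end.join
end

puts toy_to_ascii("Jürgen Müller")               # Jurgen Muller
puts toy_to_ascii("Jürgen Müller", "ü" => "ue")  # Juergen Mueller
```

Merging the overrides into a copy of the default table keeps the default mapping unchanged between calls, which is one simple way to get the behavior shown in the usage example.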

Extra stuff

If you also install either the Unicode or Active Support gem, Unidecoder will perform Unicode normalization before attempting to transliterate strings to ASCII.
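
The sketch below shows why normalizing first can matter for a table-driven transliterator. It uses Ruby's built-in `String#unicode_normalize` rather than either gem named above, and the choice of NFC is an assumption: this README does not say which normalization form Unidecoder applies. `LOOKUP` and `lookup_ascii` are hypothetical stand-ins, not part of the library.

```ruby
# Sketch using Ruby's built-in String#unicode_normalize (Ruby 2.2+).
# LOOKUP is a hypothetical one-entry table, not Unidecoder's data.
LOOKUP = { "\u00FC" => "u" }.freeze  # precomposed "ü" (U+00FC)

# Looks up each code point; anything missing from the table becomes "?".
def lookup_ascii(string)
  string.each_char.map { |c| c.ascii_only? ? c : LOOKUP.fetch(c, "?") }.join
end

decomposed = "Mu\u0308ller"                      # "u" + combining diaeresis (U+0308)
composed   = decomposed.unicode_normalize(:nfc)  # folds the pair into U+00FC

puts decomposed.length         # 7 code points
puts composed.length           # 6 code points
puts lookup_ascii(decomposed)  # Mu?ller (the combining mark has no table entry)
puts lookup_ascii(composed)    # Muller
```

Both strings render identically on screen, but only the composed form matches the table entry, so without normalization the same visible text can transliterate differently.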

Warnings

While this is a neat trick, in practice many transliterations end up being fairly useless. For example, all Chinese characters are transliterated according to their Mandarin pronunciation. Since Japanese is also written with Chinese characters but pronounces them differently from Mandarin, this library's transliteration of Japanese is useless.

Some languages, like Russian, would most correctly transliterate some letters based on context, rather than a 1-1 mapping with ASCII. This library does not do that.
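
As a rough illustration of the difference, the sketch below contrasts a context-free mapping with one context rule in the style of the BGN/PCGN romanization, where "е" is written "ye" at the start of a word and after vowels, "й", "ъ" and "ь". The table covers only a handful of lowercase letters, and `TOY_CYRILLIC`, `one_to_one` and `context_aware` are hypothetical names. None of this is part of Unidecoder.

```ruby
# Toy sketch: a tiny, hypothetical table covering only a few lowercase letters.
TOY_CYRILLIC = {
  "а" => "a", "е" => "e", "л" => "l",
  "н" => "n", "о" => "o", "п" => "p"
}.freeze

# Letters after which "е" is written "ye" under a BGN/PCGN-style rule.
YE_AFTER = %w[а е ё и о у ы э ю я й ъ ь].freeze

# Context-free: every character maps the same way wherever it appears.
def one_to_one(word)
  word.each_char.map { |c| TOY_CYRILLIC.fetch(c, c) }.join
end

# Context-aware: "е" depends on the preceding character.
def context_aware(word)
  chars = word.chars
  chars.each_with_index.map do |c, i|
    prev = i.zero? ? nil : chars[i - 1]
    if c == "е" && (prev.nil? || YE_AFTER.include?(prev))
      "ye"
    else
      TOY_CYRILLIC.fetch(c, c)
    end
  end.join
end

puts one_to_one("елена")     # elena
puts context_aware("елена")  # yelena
puts context_aware("поел")   # poyel
```

A one-to-one table cannot express this rule because the output for a character depends on its neighbor, which is the limitation described above.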

Other languages, like Hebrew and Arabic, don't write vowels, but assume them from context, so the ASCII representation of these languages given by this library will look fairly ugly to native speakers.

Basically, your mileage may vary. I don't speak every language used by this library, so there are certain to be limitations and errors. Your feedback is most appreciated!

Contributors

josephhalter, lifo, norman, rf-
