Comments (3)

CharlesTerrell commented on June 10, 2024

From looking at the table, the strange mapping is not exclusive to German.

Spanish: Ñ/ñ should be ny, not just n. Very obvious. Discarding the tilde is usually understandable in context, but could lead to misunderstandings (or at least eye-rolling).

Esperanto: Ĉ/Ĵ/Ĝ/Ĥ/Ŝ/Ŭ can be converted with one of two systems. Either follow the bare letter with x (Ĝ->Gx, ŭ->ux), or follow it with h, except for ŭ, which becomes a plain u (Ĝ->Gh, ŭ->u). Both systems have good and bad points. But simply using the bare Roman letter is almost always wrong, and usually changes the meaning in hilarious ways.
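To make the two Esperanto systems concrete, here is a minimal Python sketch. The helper name `esperanto_to_ascii` and its `system` parameter are made up for illustration and are not part of anyascii; it only shows what a language-specific step could look like.

```python
import unicodedata

# Hypothetical helper for illustration only; not part of anyascii.
# Maps each accented Esperanto letter to its bare Roman base letter.
ESPERANTO_BASE = {"ĉ": "c", "ĝ": "g", "ĥ": "h", "ĵ": "j", "ŝ": "s", "ŭ": "u"}


def esperanto_to_ascii(text: str, system: str = "x") -> str:
    """Rewrite Esperanto accented letters using the x-system or the h-system."""
    if system not in ("x", "h"):
        raise ValueError("system must be 'x' or 'h'")
    # Compose combining marks first so that 'g' + U+0302 is seen as 'ĝ'.
    text = unicodedata.normalize("NFC", text)
    out = []
    for ch in text:
        base = ESPERANTO_BASE.get(ch.lower())
        if base is None:
            out.append(ch)  # not an Esperanto accented letter, keep as-is
            continue
        # In the h-system, ŭ becomes a plain u with no suffix.
        suffix = "" if (system == "h" and base == "u") else system
        if ch.isupper():
            base = base.upper()
        out.append(base + suffix)
    return "".join(out)


print(esperanto_to_ascii("Ĝ ŭ", system="x"))
print(esperanto_to_ascii("Ĝ ŭ", system="h"))
```

Anything the helper does not recognise is passed through unchanged, so the general-purpose transliteration could still be applied to the result afterwards.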

The table might need to be modified for specific use cases, which somewhat limits its universality.

from anyascii.

hunterwb commented on June 10, 2024

The diaeresis/umlaut mark is used by dozens of languages and encoded identically, so a language-neutral approach must be used. Adding the e is too disruptive for all text that is not German. If you know your input is German, you can make this replacement yourself before calling anyascii.

See here
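A rough sketch of the German pre-replacement mentioned above, in Python. The name `german_pre_replace` is hypothetical and the mapping covers only the umlaut vowels discussed in this thread; it is an illustration under those assumptions, not something anyascii provides.

```python
import unicodedata

# Hypothetical pre-processing step for text known to be German.
# Only the umlaut vowels discussed in this thread are covered.
GERMAN_UMLAUTS = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
})


def german_pre_replace(text: str) -> str:
    """Expand German umlauts to vowel + e before general transliteration."""
    # Compose first so 'u' + U+0308 (combining diaeresis) is handled as 'ü'.
    return unicodedata.normalize("NFC", text).translate(GERMAN_UMLAUTS)


print(german_pre_replace("Müller"))
# The result would then be handed to the general-purpose transliteration.
```

Note that this simple mapping turns an all-caps word such as ÜBER into UeBER, so input in capitals would need extra handling.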

jfinkhaeuser commented on June 10, 2024

Thank you for providing this context.

Umlaut and diaeresis are different things, closer to opposites really. I get your point about the encoding; there are ways to encode the two differently in Unicode, but I suppose most inputs won't do that. In such a case, you can't really know what the right thing to do is. When they are encoded differently, I hope that the replacement takes that into account - or is that too much to hope for?

I don't agree with the conclusion in the linked article, though. The missing "e" is very confusing to Germans - more so to Germans who are not used to going back and forth between German and English. I would go so far as to suggest that the article reads very clearly like the opinion of a non-expert (which the author all but admits he is).

Still, this has brought to light the problem you're facing.

Maybe the right conclusion is that the one-line description "Converts Unicode characters to their best ASCII representation" is wrong. I think there's great value in what the library does - this description, though, assumes a very particular interpretation of the word "best" that I'm certain not everybody shares.

How about changing it to "Converts Unicode characters to a simple, readable ASCII representation without considering context or language"? It seems to avoid creating false expectations much better.
