As a native German with an Umlaut in my name, I'm really surprised by the choice to, a

German replacement is weird about anyascii HOT 3 OPEN

jfinkhaeuser commented on June 10, 2024 4

German replacement is weird

from anyascii.

Comments (3)

CharlesTerrell commented on June 10, 2024 1

From looking at the table, the strange mapping is not exclusive to German.

Spanish: Ñ/ñ should be ny, not just n. Very obvious. Discarding the tilde is usually understandable in context, but could lead to misunderstandings (or at least eye-rolling).

Esperanto: Ĉ/Ĵ/Ĝ/Ĥ/Ŝ/Ŭ can be converted with one of two systems. Either replace the letter with its bare form followed by x (Ĝ->Gx, ŭ->ux), or follow with h except for the ŭ (Ĝ->Gh, ŭ->u). Both systems have good and bad points. But simply using the bare Roman letter is almost always wrong, and usually changes the meaning in hilarious ways.

The table might need to be modified for specific use cases, which somewhat limits its universality.
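
To make the two Esperanto conventions described above concrete, here is a minimal sketch in Python. The names `X_SYSTEM`, `H_SYSTEM`, and `esperanto_ascii` are hypothetical and not part of anyascii; the tables just encode the x-system and h-system as outlined in this comment.

```python
import unicodedata

# Hypothetical tables for illustration; not part of anyascii.
# x-system: bare letter followed by "x".
X_SYSTEM = {"ĉ": "cx", "ĝ": "gx", "ĥ": "hx", "ĵ": "jx", "ŝ": "sx", "ŭ": "ux"}
# h-system: bare letter followed by "h", except ŭ, which becomes plain "u".
H_SYSTEM = {"ĉ": "ch", "ĝ": "gh", "ĥ": "hh", "ĵ": "jh", "ŝ": "sh", "ŭ": "u"}


def _with_capitals(table):
    """Add upper-case keys, capitalising only the base letter (Ĝ -> Gx)."""
    full = dict(table)
    for letter, replacement in table.items():
        full[letter.upper()] = replacement[0].upper() + replacement[1:]
    return str.maketrans(full)


_TABLES = {"x": _with_capitals(X_SYSTEM), "h": _with_capitals(H_SYSTEM)}


def esperanto_ascii(text, system="x"):
    """Replace Esperanto accented letters using the chosen convention."""
    # NFC first, so a base letter plus combining accent becomes one code point.
    return unicodedata.normalize("NFC", text).translate(_TABLES[system])
```

A caller who knows the input is Esperanto could run this first and hand the result to a general-purpose transliterator for whatever non-ASCII characters remain.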


hunterwb commented on June 10, 2024

Diaeresis/umlaut is used by dozens of languages and encoded identically, so a language-neutral approach must be used. Adding the e is too disruptive for all text that is not German. If you know your input is German, you may make this replacement yourself before calling anyascii.

See here
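
As an illustration of the pre-replacement step suggested above, a minimal sketch in Python might look like the following. The helper name `expand_german` and its mapping table are assumptions made for this example, not something anyascii provides.

```python
import unicodedata

# Hypothetical mapping for illustration; not part of anyascii.
GERMAN_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})


def expand_german(text):
    """Expand German umlauts and eszett; leave every other character alone."""
    # NFC first, so "u" + U+0308 (combining diaeresis) becomes the single "ü".
    return unicodedata.normalize("NFC", text).translate(GERMAN_MAP)


# Assumed usage with the Python package, for input known to be German:
#   from anyascii import anyascii
#   ascii_text = anyascii(expand_german(text))
```

This only makes sense when the whole input is known to be German, since it would also turn a diaeresis on a, o, or u in other languages into an added e. It is also deliberately simple about case: an all-caps word such as "ÜBER" comes out as "UeBER".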


jfinkhaeuser commented on June 10, 2024

Thank you for providing this context.

Umlaut and diaeresis are different things, closer to opposites really. I get your point about the encoding; there are ways to encode the two differently in Unicode, but I suppose most inputs won't do that. In such a case, you can't really know what the right thing to do is. When they're encoded differently, I hope that the replacement takes that into account - or is that too much to hope for?

I don't agree with the conclusion in the linked article, though. The missing "e" is very confusing to Germans - more so to Germans who are not used to going back and forth between German and English. I would go so far as to suggest that this reads very clearly like a non-expert opinion (which the author all but admits it is).

Still, this has brought to light the problem you're facing.

Maybe the right conclusion is that the one-line description "Converts Unicode characters to their best ASCII representation" is wrong. I think there's great value in what the library does - this description, though, assumes a very particular interpretation of the word "best" that I'm certain not everybody shares.

How about changing it to "Converts Unicode characters to a simple, readable ASCII representation without considering context or language"? That seems much better at avoiding false expectations.

