Pinyin (Pin Yin "spell sound") is a transliteration to handle Romanization for Chinese

How to handle transliteration in some languages? about abstracttext HOT 8 CLOSED

google commented on April 26, 2024

How to handle transliteration in some languages?

from abstracttext.

Comments (8)

thadguidry commented on April 26, 2024

@vrandezo Thanks Denny. I also agree with you: since Pinyin is an input format, not an output, it's not reversible. (I just wanted to make sure there wasn't something I was missing conceptually from the AbstractText effort regarding transliteration handling. Thanks for explaining!)

Regarding the Chinese example... talk, sleep, and tax are all pronounced the same in Chinese and rely on sentence context to tell them apart. Water and who have different pronunciations. All 5 are typed as "shui" in input systems, where the user is usually given a popup to choose which Chinese lexeme they mean from the Pinyin input.

Closing this issue now, since we have the use case and our agreed probable handling for it as a reference.

vrandezo commented on April 26, 2024

Thanks for opening the issue (and yay, Issue #1!!)

And I have to admit, I am not sure I understand the issue. This is because I really don't understand how Chinese works, so my answer might be entirely beside the point.

So I will rephrase how I understand the question and then answer that question. Please don't let me get away with it if I entirely missed the point.

Serbian Wikipedia, for example, uses two scripts (so do a few others, such as Uzbek, Tatar, etc.). And the question is how would Abstract Wikipedia support both of those scripts?

In Serbian, the situation is particularly simple: the Latin transliteration can be generated easily from a Cyrillic input. So it is possible to simply generate a Cyrillic output and then, at the very end, run a transliteration function over the resulting string that converts it to Latin.
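A minimal sketch of such a final transliteration pass, assuming the standard one-to-one Serbian Cyrillic to Latin letter mapping. The names `CYR_TO_LAT` and `to_latin` are made up for illustration and are not part of AbstractText. Uppercase handling is simplified: a capital digraph letter is always rendered in title case (`Nj`), which is not right for all-caps words.

```python
# Serbian Cyrillic -> Latin, lowercase letters only; uppercase is derived below.
CYR_TO_LAT = {
    "а": "a", "б": "b", "в": "v", "г": "g", "д": "d", "ђ": "đ", "е": "e",
    "ж": "ž", "з": "z", "и": "i", "ј": "j", "к": "k", "л": "l", "љ": "lj",
    "м": "m", "н": "n", "њ": "nj", "о": "o", "п": "p", "р": "r", "с": "s",
    "т": "t", "ћ": "ć", "у": "u", "ф": "f", "х": "h", "ц": "c", "ч": "č",
    "џ": "dž", "ш": "š",
}


def to_latin(text: str) -> str:
    """Transliterate a rendered Cyrillic string to Latin, one character at a time."""
    out = []
    for ch in text:
        lower = ch.lower()
        if lower in CYR_TO_LAT:
            lat = CYR_TO_LAT[lower]
            # Simplification: capital letters become title case ("Њ" -> "Nj").
            out.append(lat.capitalize() if ch.isupper() else lat)
        else:
            # Digits, punctuation, and spaces pass through unchanged.
            out.append(ch)
    return "".join(out)


print(to_latin("Њујорк"))  # Njujork
print(to_latin("вода"))    # voda
```

Because each Cyrillic letter maps to exactly one Latin letter or digraph, this direction needs no context and can run as the last step over any rendered string.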

This does not always work: for example, the reverse wouldn't be as trivial, because Њ transliterates to nj, but the two letters n and j transliterate to н and ј respectively. In that case we would need to retain the information whether these are the two letters n and j which happen to be next to each other or whether it is the digraph nj.

This can be done either by creating a slightly abstract output that retains this information with a special token and then using a final pass over the result that removes these tokens and replaces them with the concrete letters, or by rewriting the functions so that they take the script as a parameter and push this knowledge deeper into the function stack.
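The first option (a special token plus a final pass) could look roughly like the sketch below. Everything here is an assumption for illustration: the `|` marker, the function names, and the choice to keep the abstract output in lowercase Latin letters. The marker means "the letters on either side are separate letters, not a digraph".

```python
SEP = "|"  # hypothetical marker placed between letters that must NOT merge

# Latin digraphs that correspond to a single Cyrillic letter.
DIGRAPHS = {"lj": "љ", "nj": "њ", "dž": "џ"}

# Single Latin letters (lowercase only, to keep the sketch short).
SINGLES = {
    "a": "а", "b": "б", "v": "в", "g": "г", "d": "д", "đ": "ђ", "e": "е",
    "ž": "ж", "z": "з", "i": "и", "j": "ј", "k": "к", "l": "л", "m": "м",
    "n": "н", "o": "о", "p": "п", "r": "р", "s": "с", "t": "т", "ć": "ћ",
    "u": "у", "f": "ф", "h": "х", "c": "ц", "č": "ч", "š": "ш",
}


def render_latin(abstract: str) -> str:
    """Final pass for Latin output: just drop the markers."""
    return abstract.replace(SEP, "")


def render_cyrillic(abstract: str) -> str:
    """Final pass for Cyrillic output: merge digraphs unless a marker splits them."""
    out = []
    i = 0
    while i < len(abstract):
        pair = abstract[i:i + 2]
        if pair in DIGRAPHS:
            out.append(DIGRAPHS[pair])
            i += 2
        elif abstract[i] == SEP:
            i += 1  # the marker has done its job; emit nothing
        else:
            out.append(SINGLES.get(abstract[i], abstract[i]))
            i += 1
    return "".join(out)


print(render_cyrillic("konj"))        # коњ (digraph nj)
print(render_cyrillic("in|jekcija"))  # инјекција (separate n and j)
print(render_latin("in|jekcija"))     # injekcija
```

The second option would instead pass something like a `script` argument down to the functions that choose the letters, so no marker and no final pass are needed; which of the two is simpler depends on the language.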

Either of the solutions would be possible, and the respective language community can decide which one makes more sense for their particular language (in fact, this could go a long way toward resolving the differences between standard Croatian and Serbian).

So, I hope that this answer somehow applies to your question. If it doesn't, please let me know and give me a bit more background. Thank you!

thadguidry commented on April 26, 2024

Yes, it answers it partially.
I completely understand that functions could read information from lots of places.
The question is WHERE is the information stored (best).

So my only question is about Wikidata Lexemes themselves storing that information of transliteration maps, and how best to store it so that Abstract Text functions can read it properly.

Where in the Lexeme ecosystem would the transliteration mapping be applied that functions could read from? Would it be on the ZH entities? or the EN entities? or both? or somewhere else?
Would P1721 "pinyin transliteration" always be used as a qualifier within the translation statement? Ex: https://www.wikidata.org/wiki/Lexeme:L3302

Or would P1721 "pinyin transliteration" be used as a direct statement on the ZH entity (which mimics how input systems work)? Ex: https://www.wikidata.org/wiki/Lexeme:L8219

vrandezo commented on April 26, 2024

As I said, I really am not sufficiently knowledgeable about Chinese.

If I understand it correctly, and the transliteration is always the same for a given Chinese lexeme, and does not differ based on Sense or Form, then I would think that it makes more sense as a statement on the Chinese lexeme (as in your last screenshot).

If it is on the translation of the English lexeme for water, it looks like a denormalization. That date should not be a qualifier on that translation, as in your first screenshot; that doesn't look right to me. This would lead to a lot of duplication.
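To make the normalized option concrete, here is a small sketch. The records are deliberately simplified and hypothetical: this is not the real Wikidata JSON, and the IDs `zh-water`, `en-water`, and `de-water` are invented. Only the property number P1721 comes from this thread.

```python
# Simplified, hypothetical lexeme records (NOT the real Wikidata data model).
# The Pinyin value is stored once, as a statement on the Chinese lexeme.
LEXEMES = {
    "zh-water": {"lemma": "水", "statements": {"P1721": "shui"}},
    "en-water": {"lemma": "water", "translations": ["zh-water"]},
    "de-water": {"lemma": "Wasser", "translations": ["zh-water"]},
}


def pinyin_of_translations(lexeme_id: str) -> list:
    """Follow each translation link and read P1721 from the target lexeme."""
    targets = LEXEMES[lexeme_id].get("translations", [])
    return [LEXEMES[t]["statements"]["P1721"] for t in targets]


print(pinyin_of_translations("en-water"))  # ['shui']
print(pinyin_of_translations("de-water"))  # ['shui']
```

With the qualifier approach, the same "shui" value would have to be repeated on every translation statement that points at the Chinese lexeme, which is the duplication described above.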

thadguidry commented on April 26, 2024

date?

thadguidry commented on April 26, 2024

Here's a transliteration map from OpenVanilla.org
Hopefully this clarifies the question for you to offer good advice... and then we can close this issue out after you respond.

shui 水  - water
shui 说  - talk
shui 谁  - who
shui 睡  - sleep
shui 税  - tax

vrandezo commented on April 26, 2024

"date" - mistyped, I meant datum, snak, or piece of information.

vrandezo commented on April 26, 2024

Regarding the example you showed:

It looks there as if every Chinese character has only a single Pinyin transliteration, but the result of that is not reversible, i.e. the same string in Latin script is ambiguous when translated back to Chinese characters.

That would indicate that it would make sense to have the render function for Chinese create Chinese characters, and if a transliteration into pinyin is desired, a function can run on top of that.
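A rough sketch of that layering, under stated assumptions: the table below simply copies the five entries of the map quoted in this thread (toneless, exactly as given there), and the names `PINYIN`, `to_pinyin`, and `REVERSE` are invented for illustration.

```python
from collections import defaultdict

# Toy table copied from the map quoted in this thread (tones ignored).
PINYIN = {"水": "shui", "说": "shui", "谁": "shui", "睡": "shui", "税": "shui"}


def to_pinyin(hanzi: str) -> str:
    """Run on top of rendered Chinese text; one lookup per character.

    Raises KeyError for characters outside the toy table.
    """
    return " ".join(PINYIN[ch] for ch in hanzi)


# Inverting the table shows why the reverse direction is ambiguous:
# one Latin string maps back to several characters.
REVERSE = defaultdict(list)
for ch, latin in PINYIN.items():
    REVERSE[latin].append(ch)

print(to_pinyin("水"))       # shui
print(len(REVERSE["shui"]))  # 5
```

The forward function is a plain lookup, while the reverse lookup returns five candidates for "shui", which is the same choice an input system leaves to the user.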

So I still think that my last comment holds: it looks like the pinyin form should be on the lexeme representing the Chinese character, not on the statement offering a translation coming from the English (or any other) noun.

Also, I think that this discussion would probably make more sense on Wikidata itself. I wouldn't want the modelling of Wikidata to be affected by a possible future implementation of a project proposal. That seems premature :)

Feel free to close this if this satisfies your question.
