
Animenosekai commented on July 22, 2024

Thanks for reaching out to us!

I could indeed reproduce your problem.

I just checked the code: we do not seem to remove the strings intentionally. This might be done by one of the translators, outside of translatepy.

Because we don't expect all the translators to support HTML translation, we split the HTML into its components, translate each of them separately, and reassemble everything at the end.

As a side effect, each component is treated independently, so any cleaning (stripping spaces, for example) is applied to every component.

```
<p>I am a student and <strong>you are a teacher</strong>, incredible</p>
~~~^^^^^^^^^^^^^^^^^^^~~~~~~~~^^^^^^^^^^^^^^^^^~~~~~~~~~^^^^^^^^^^^^~~~~
           1                          2                       3
```

These are 3 separate components, and each of them is translated separately.
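To make the component separation described above concrete, here is a minimal sketch using Python's standard `html.parser` module. It is not translatepy's actual implementation: `ComponentSplitter`, `translate_components`, and the `translate` callback are hypothetical names. Passing a fake "translator" that only strips surrounding whitespace is enough to reproduce the lost spaces:

```python
from html.parser import HTMLParser


class ComponentSplitter(HTMLParser):
    """Hypothetical helper: split HTML into ("tag", raw) and ("text", data) parts."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Keep the start tag exactly as written, attributes included
        self.parts.append(("tag", self.get_starttag_text()))

    def handle_startendtag(self, tag, attrs):
        # Self-closing tags such as <br/> are kept as a single raw part
        self.parts.append(("tag", self.get_starttag_text()))

    def handle_endtag(self, tag):
        self.parts.append(("tag", f"</{tag}>"))

    def handle_data(self, data):
        # Each run of text between two tags becomes one component
        self.parts.append(("text", data))


def translate_components(html, translate):
    """Translate each text component on its own, then reassemble the HTML.

    Simplifications: comments and doctypes are dropped, and entities
    (e.g. &amp;) are decoded but not re-escaped.
    """
    splitter = ComponentSplitter()
    splitter.feed(html)
    splitter.close()
    return "".join(
        translate(raw) if kind == "text" else raw
        for kind, raw in splitter.parts
    )


sample = "<p>I am a student and <strong>you are a teacher</strong>, incredible</p>"
# A fake "translator" that only strips surrounding whitespace,
# mimicking the per-component cleaning
print(translate_components(sample, str.strip))
# <p>I am a student and<strong>you are a teacher</strong>, incredible</p>
```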

Now, the problem is that we don't know what kind of cleaning is done by the translators, and different components might even be translated by different translators.

For languages with a different structure, the translator might add or remove specific symbols that carry meaning in the target language.

The order of the symbols within a phrase might also need to change.

Now, suppose we add a basic check before translating to decide whether we need to re-add spaces after the translation:

```python
...
# Re-add the trailing space if the source had one and the translator stripped it
if tail_space_before_translation and not result.endswith(" "):
    result += " "
...
```
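The check above can be fleshed out into a small runnable sketch. `preserve_edge_spaces` is a hypothetical helper, not part of translatepy, and `translate` stands for any callable that translates a single component. Like the snippet, it only looks at plain spaces, here at both ends of the component:

```python
def preserve_edge_spaces(text, translate):
    """Hypothetical helper: translate one component, then restore the
    leading/trailing space that the translator may have stripped."""
    head_space_before_translation = text.startswith(" ")
    tail_space_before_translation = text.endswith(" ")
    result = translate(text)
    # Restore the spaces unconditionally, whatever the target language
    if head_space_before_translation and not result.startswith(" "):
        result = " " + result
    if tail_space_before_translation and not result.endswith(" "):
        result += " "
    return result


# A fake "translator" that strips surrounding whitespace
print(repr(preserve_edge_spaces("I am a student and ", str.strip)))
# 'I am a student and '
```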

This might work for translations between Latin-based languages, but the translator might have deleted the spaces for a reason:

(I will use my native languages for simplicity)

<p>Je suis un étudiant <strong>et vous êtes un professeur</strong></p>

This sentence ("I am a student and you are a teacher") should be translated into Japanese as

<p>僕は生徒で<strong>あなたは先生です</strong></p>

Notice that the space was removed, because Japanese usually doesn't use many spaces.

translatepy shows the same behavior:

```python
>>> from translatepy import Translate
>>> t = Translate()
>>> r = t.translate_html("<p>Je suis un étudiant <strong>et vous êtes un professeur</strong></p>", "Japanese")
>>> r
'<p>私は学生です<strong>そして、あなたは先生です</strong></p>'
```

(The translation itself is odd because of the component separation, but that's another topic.)

I would need to come up with a better algorithm to translate HTML content without losing the context (both linguistically and structurally), but I suspect that would require complex NLP.

If you have any ideas, they are welcome.

If you have any questions or issues, feel free to ask!

Oh, and sorry for being a bit inactive lately; school work is much busier than what I previously had...


Animenosekai commented on July 22, 2024

Closing this for now, since it hasn't had any activity in a while.

I continued part of this discussion in #93, if you are interested.

Feel free to reply if you want to reopen it!
