Comments (2)
Thanks for reaching out!
I could indeed reproduce your problem.
I just checked the code, and we do not seem to remove the strings intentionally. This might be done by one of the translators outside translatepy.
Because we don't expect all the translators to support HTML translation, we need to separate each component of the HTML to translate them apart and reassemble everything at the end.
This has the side effect that each component is treated on its own, so any cleaning (stripping the spaces, for example) is applied to every component individually.
<p>I am a student and <strong>you are a teacher</strong>, incredible</p>
~~~^^^^^^^^^^^^^^^^^^^~~~~~~~~^^^^^^^^^^^^^^^^^~~~~~~~~~^^^^^^^^^^^^~~~~
            1                         2                      3
These are 3 separate components, each of which will be translated on its own.
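To make the side effect concrete, here is a minimal sketch of that kind of component splitting. It uses Python's standard `html.parser` and is not translatepy's actual implementation; `TextComponentCollector` and `split_text_components` are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class TextComponentCollector(HTMLParser):
    """Collects the text nodes of an HTML fragment, in document order."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.components = []

    def handle_data(self, data):
        # Called for each run of text found between two tags.
        self.components.append(data)


def split_text_components(html):
    """Hypothetical helper: return the text components of `html`."""
    collector = TextComponentCollector()
    collector.feed(html)
    collector.close()
    return collector.components


components = split_text_components(
    "<p>I am a student and <strong>you are a teacher</strong>, incredible</p>"
)
print(components)
# ['I am a student and ', 'you are a teacher', ', incredible']

# If every component is stripped independently, the space before <strong> is lost.
print("".join(component.strip() for component in components))
# I am a student andyou are a teacher, incredible
```

The first component ends with a space that only makes sense in the context of the whole sentence, which is why per-component cleaning can silently drop it.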
Now, the problem is that we don't know what kind of cleaning is done by the translators, and it might even be different translators translating the different components.
For some differently structured languages, the translator might add or remove specific symbols that have a meaning in the resulting language.
The order of symbols in a single phrase might also need to be different.
Now, suppose we introduce a basic check before translating to decide whether spaces need to be re-added after the translation:
...
if tail_space_before_translation and not result.endswith(" "):
    result += " "
...
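As a self-contained sketch of the same idea, the check could be written as a small helper that copies the edge whitespace of the original component onto the translated one. `restore_edge_whitespace` is a hypothetical name, not part of translatepy, and the French string below is only a stand-in for a translator's output.

```python
def restore_edge_whitespace(original, translated):
    """Re-attach the leading/trailing whitespace of `original` to `translated`.

    Hypothetical helper for illustration only.
    """
    if not original.strip():
        # The component is whitespace only: keep it untouched.
        return original
    leading = original[: len(original) - len(original.lstrip())]
    trailing = original[len(original.rstrip()):]
    return leading + translated.strip() + trailing


# The translator returned the text without the trailing space of the source.
print(repr(restore_edge_whitespace("I am a student and ", "Je suis un étudiant et")))
# 'Je suis un étudiant et '
```

This blindly mirrors the source whitespace, so it has no way of knowing whether the translator dropped a space on purpose.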
It might work for translations between Latin-based languages, but the translator might have deleted the spaces for a reason:
(I'll use my native languages for simplicity)
<p>Je suis un étudiant <strong>et vous êtes un professeur</strong></p>
should be translated into Japanese as
<p>僕は生徒で<strong>あなたは先生です</strong></p>
Notice that we removed the space, because we usually don't use many spaces in Japanese.
We see that this behavior is also found when translating with translatepy:
>>> from translatepy import Translate
>>> t = Translate()
>>> r = t.translate_html("<p>Je suis un étudiant <strong>et vous êtes un professeur</strong></p>", "Japanese")
>>> r
'<p>私は学生です<strong>そして、あなたは先生です</strong></p>'
(which is a weird translation because of the component separation, but that's another topic)
I would need to come up with a better algorithm to translate HTML content without losing the context (both language-wise and HTML-wise), but I guess that would require complex NLP.
If you have any ideas, I would welcome them.
If you have any questions or issues, feel free to ask!
Oh, and sorry for being a bit inactive lately, but school work is way busier compared to what I previously had...
Closing this for now, since it's been a while since this got any activity.
I partly continued this discussion in #93 if you are interested.
Feel free to reply if you want to reopen it!