Comments (10)
@reddere, use GoogleTranslateV2 and wrap all your "static" links and hashtags in a span tag with the notranslate class:
<span class="notranslate">TAGS OR LINKS THERE</span>
For more information visit: https://cloud.google.com/translate/troubleshooting
In [5]: from translatepy.translators.google import GoogleTranslateV2
In [6]: dl = GoogleTranslateV2()
In [9]: dl.translate('Kado Thorne es un Vampiro y viajó en el tiempo desde el año 2020 cuando se presentó a la skin Oro.\n\n<span class="notranslate">#Fortnite</span> <span class="notranslate">#FortniteLastResort</span> <span class="notranslate">https://t.co/m1cE9sSrNb</span>', 'it')
Out[9]: TranslationResult(service=Translator(Google), source='Kado Thorne es un Vampiro y viajó en el tiempo desde el año 2020 cuando se presentó a la skin Oro.\n\n<span class="notranslate">#Fortnite</span> <span class="notranslate">#FortniteLastResort</span> <span class="notranslate">https://t.co/m1cE9sSrNb</span>', source_lang=Language(Spanish), dest_lang=Language(Italian), translation='Kado Thorne è un vampiro e ha viaggiato indietro nel tempo a partire dall\'anno 2020 quando gli è stata presentata la skin Oro.\n\n<span class="notranslate">#Fortnite</span> <span class="notranslate">#FortniteLastResort</span> <span class="notranslate">https://t.co/m1cE9sSrNb</span>')
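Wrapping every hashtag and link by hand gets tedious, so here is a minimal sketch of how it could be automated. The names `PROTECTED`, `protect` and `unprotect` are hypothetical helpers, not part of translatepy, and the pattern is only an assumption about what counts as a hashtag or a link. Only GoogleTranslateV2 was shown to honor the spans in the session above, so other translators may behave differently.

```python
import re

# Hypothetical pattern: URLs are tried first so that a "#fragment" inside a URL
# is not wrapped a second time as a hashtag.
PROTECTED = re.compile(r"https?://\S+|#\w+")
SPAN = re.compile(r'<span class="notranslate">(.*?)</span>', re.DOTALL)


def protect(text: str) -> str:
    """Wrap every URL and hashtag in a notranslate span."""
    return PROTECTED.sub(
        lambda m: f'<span class="notranslate">{m.group(0)}</span>', text
    )


def unprotect(text: str) -> str:
    """Remove the notranslate spans, keeping what was inside them."""
    return SPAN.sub(lambda m: m.group(1), text)


text = "Kado Thorne es un Vampiro.\n\n#Fortnite #FortniteLastResort https://t.co/m1cE9sSrNb"
wrapped = protect(text)      # pass this to the translator
restored = unprotect(wrapped)  # call this on the translated text
```

The idea would be to call `protect` before translating and `unprotect` on the translated text, so the caller never sees the markup.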
from translate.
Do you have an example to reproduce this?
Absolutely @Animenosekai! Here is a text I got from a tweet. Notice how the letters of both hashtags and of the tweet link get altered. In the second hashtag, a letter even gets added out of nowhere.
from translatepy.translators.google import GoogleTranslate
text = 'Kado Thorne es un Vampiro y viajó en el tiempo desde el año 2020 cuando se presentó a la skin Oro.\n\n#Fortnite #FortniteLastResort https://t.co/m1cE9sSrNb'
translator = GoogleTranslate()
italian_text = translator.translate(text, 'Italian')
print(italian_text)
Result:
Kado Thorne è un vampiro e ha viaggiato nel tempo dal 2020 quando apparve nell'oro della pelle.\n\n#FORTNITE #FORTNITLelasTResort https://t.co/M1ce9SSRNB
Even though the normal text got translated fine, the hashtags and the link got altered:
- Hashtag n.1 went from #Fortnite to #FORTNITE (letter case altered)
- Hashtag n.2 went from #FortniteLastResort to #FORTNITLelasTResort (letter case altered, the letter E is missing, and "Last" somehow got distorted into "Lelas", which doesn't mean anything in Italian)
- The link went from https://t.co/m1cE9sSrNb to https://t.co/M1ce9SSRNB. This alteration breaks the link entirely.
Any ideas on how to fix this?
Parsing with a regex, maybe?
What do you mean? Are there params I can pass to the GoogleTranslate() instance that would let me hide parts of the passed text using a regex?
Nope, not for now, but should I?
Here is the major problem that comes with this, and with HTML translation, though:
TLDR: It might work for Latin-based languages, but different languages have different structures, and the order of words might need to change from one language to another. (This is also one of the reasons why, when we translate, we don't translate each word individually and then put the pieces back together.)
Yeah, I mean, implementing what I said would actually make it way better. The issue you mentioned is related to the topic, and it should be easy to fix by adding a space after dots or commas in the final result when one is missing. But implementing a regex, or any other way to hide certain parts of the text, would be awesome, as they frequently get altered.
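As a rough illustration of the "add a space after dots or commas" idea, here is a minimal sketch. `add_missing_spaces` is a hypothetical helper, not something translatepy provides. It assumes the target language puts a space after punctuation, and it skips URLs, because a naive replacement would turn https://t.co into "https://t. co".

```python
import re

URL = re.compile(r"https?://\S+")
# A dot or comma directly followed by a letter (digits are excluded so "3.5" survives).
MISSING_SPACE = re.compile(r"([.,])(?=[^\W\d_])")


def add_missing_spaces(text: str) -> str:
    """Insert a space after '.' or ',' when a letter follows, leaving URLs alone."""
    parts = []
    last = 0
    for match in URL.finditer(text):
        # Fix the plain text before the URL, then copy the URL verbatim
        parts.append(MISSING_SPACE.sub(r"\1 ", text[last:match.start()]))
        parts.append(match.group(0))
        last = match.end()
    parts.append(MISSING_SPACE.sub(r"\1 ", text[last:]))
    return "".join(parts)


fixed = add_missing_spaces("Ciao.Come stai,bene? https://t.co/abc 3.5")
```

Abbreviations such as "e.g." would still be split by this, so it is only a starting point.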
Yes, this issue might be easier to handle than normal translations, as links don't carry any meaning of their own and don't need to be translated.
But here is the problem:
First, it is not possible to translate the pieces separately, because that might not result in the best translation (words take on different meanings as a whole than individually). Also, as said before, there is no telling whether the position of the link should change, so we can't just pin the position of the link and put it back after the translation:
(French) Je voudrais changer le lien https://google.com parce qu'il me semble y avoir trouvé une erreur
(Japanese) https://google.comのリンクに問題があると思うから変えたいです
Notice how the position of the link changes.
Now, if we let the translator translate everything and it ends up having issues with the links, we might want to find the link in the translated text and replace it with the previous one.
Something like this would be imaginable:
def link_correction(translated_text: str, links: list[str]) -> str:
    """A simple link correction function to keep the same links as before translation"""
    processing_text = translated_text.lower()
    for link in links:
        index = processing_text.find(link.lower())  # try to find the link in the translated text
        if index == -1:
            continue  # the link was not found (altered beyond a case change), leave the text as is
        # replace the altered link with the one from before translation
        translated_text = translated_text[:index] + link + translated_text[index + len(link):]
    return translated_text
Note
This is an oversimplification of what could be done
Now, as you mentioned previously:
Link went from https://t.co/m1cE9sSrNb to https://t.co/M1ce9SSRNB. This alteration breaks entirely the link.
So if two links are identical once lowercased, they might both be replaced by the same link.
Now, what should I do?
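One possible way around this collision, sketched under a strong assumption: links that only differ by case keep their relative order in the translated text. `restore_links` and the `URL` pattern are hypothetical and not part of translatepy. Each link found in the translation consumes the next original link with the same lowercased form.

```python
import re
from collections import defaultdict, deque

# Assumed definition of a link: a scheme followed by non-whitespace characters.
URL = re.compile(r"https?://\S+", re.IGNORECASE)


def restore_links(translated_text: str, links: list[str]) -> str:
    """Restore the original casing of links, one original link per occurrence."""
    # Queue the original links by lowercased form, in order of appearance
    pending = defaultdict(deque)
    for link in links:
        pending[link.lower()].append(link)

    def fix(match: re.Match) -> str:
        candidates = pending.get(match.group(0).lower())
        if candidates:
            return candidates.popleft()  # consume the next matching original
        return match.group(0)  # unknown link, leave it untouched

    return URL.sub(fix, translated_text)


fixed = restore_links(
    "Vedi https://t.co/ABC e https://t.co/abc",
    ["https://t.co/aBc", "https://t.co/AbC"],
)
```

If the translator reorders the two links, they would be swapped, and a link followed directly by punctuation would not match at all, so this only narrows the problem.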
- Should I implement something which takes a regular expression, splits the original text with it, translates each part individually and puts the pieces back together at the end, leaving the matched parts untouched, even though it comes with the first issue mentioned?
- Should I implement the oversimplified algorithm written above?
- Also, should I implement the thing that adds back spaces after dots, even though it would only work for languages that use spaces after dots (Latin-based ones, for example) and might break the others?
- Also, what if, for some reason, the user wants to translate the links?
Note
Even though I'm only talking about links here, the same thing applies to hashtags, except that hashtags are even harder to correct after the translation, as they might carry some meaning and might need to be translated.
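For reference, the first option in the list above could look roughly like the following. It is a minimal sketch: `translate_around` is a hypothetical name, and `translate` is whatever callable the caller provides (here `str.upper` stands in for a real translator so the example runs offline). It has exactly the drawback described earlier, since every piece is translated without its context.

```python
import re
from typing import Callable


def translate_around(text: str, pattern: str, translate: Callable[[str], str]) -> str:
    """Translate everything except the parts matched by `pattern`."""
    # The outer capturing group makes re.split keep the matched parts.
    # `pattern` itself must only use non-capturing groups, or the
    # odd/even bookkeeping below breaks.
    parts = re.split(f"({pattern})", text)
    result = []
    for i, part in enumerate(parts):
        if i % 2 == 1 or not part.strip():
            result.append(part)  # regex matches and blank pieces stay untouched
        else:
            result.append(translate(part))
    return "".join(result)


# str.upper is only a stand-in for a real translation call
out = translate_around(
    "hola #Fortnite mira https://t.co/m1cE9sSrNb",
    r"https?://\S+|#\w+",
    str.upper,
)
```

A real translator may also strip the leading and trailing whitespace of each piece, which would have to be put back by hand.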
Thank you so much @ZhymabekRoman @Animenosekai. I haven't tested the workaround yet, but I kept my old GoogleTranslator until just 2 days ago, when I tried the ReversoTranslator, which seems to me to work even better than GoogleTranslator. It seems to work decently in Italian, both lexically and in its choice of words.
Somehow, though, I found an issue with that one as well: it throws an error when words like única are in the source text. I think it's better to open a separate issue for that one: #96
I was talking with Venom on Discord about possible workarounds and support for notranslate or other HTML parsing ways of not translating certain parts of a given input. I might consider this soon.