diaoul / babelfish Goto Github PK

BabelFish is a Python library to work with countries and languages

License: BSD 3-Clause "New" or "Revised" License

Python 99.74% Shell 0.26%

babelfish's People

nvbn toilal bcse ratoaq2 dimotsai scbwin hugovk sreekanth370 labrys miigotu mgorny chapmanjacobd ahm-forks

babelfish's Issues

Language.fromname('Pashto') fails with LanguageReverseError

Language.fromname('Pashto') fails. 'Pashto' is a valid alternate spelling (per ISO 639-2) of 'Pushto' which works correctly.

Steps to recreate:

from babelfish import Language
Language.fromname('Pashto')

Tested on Win 10 / Python 3.5 x64
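
One way to accept alternate spellings is to index every name variant instead of a single canonical name. The following is a minimal sketch, assuming the semicolon-separated English names published in the ISO 639-2 code list; the table excerpt and the helpers `build_name_index` and `alpha3_from_name` are hypothetical and not part of babelfish.

```python
# Hypothetical excerpt of ISO 639-2 English names; variants are separated by ";".
ISO_639_2_NAMES = {
    "pus": "Pushto; Pashto",
    "div": "Divehi; Dhivehi; Maldivian",
}


def build_name_index(names):
    """Map every lower-cased name variant to its alpha-3 code."""
    index = {}
    for alpha3, raw in names.items():
        for variant in raw.split(";"):
            index[variant.strip().lower()] = alpha3
    return index


NAME_INDEX = build_name_index(ISO_639_2_NAMES)


def alpha3_from_name(name):
    """Return the alpha-3 code for a name, or raise KeyError if unknown."""
    return NAME_INDEX[name.strip().lower()]
```

With such an index, both 'Pushto' and 'Pashto' would resolve to the same code.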

ImportError: No module named alpha2 when calling babelfish.Language.fromietf('EN')

Using your subliminal backend, I construct the languages (with babelfish) ahead of time. But I had one report from a user on Python 2.7.3 who gets an import error (for alpha2) once execution reaches babelfish's sub-libraries.

I don't expect you to support my script at all. I was just curious how to interpret the last part of the exception being thrown. Maybe it's a bug in babelfish, maybe it isn't. Have you seen something like this? Perhaps there's an obvious issue I'm overlooking, or a syntax error on my part?

This syntax works for me, so I can't reproduce it:

import babelfish
babelfish.Language.fromietf('EN')

Here is the output passed along to me in its entirety:

./Subliminal.py -S /media/data/Movies/ -l EN -f
2015-10-09 09:26:55,195 - 7241 - INFO - Found 43 matched file(s).
2015-10-09 09:26:55,196 - 7241 - INFO - Using advanced search mode
2015-10-09 09:26:55,201 - 7241 - ERROR - Fatal Exception:
  Traceback (most recent call last):
    File "./Subliminal/nzbget/ScriptBase.py", line 2398, in run
    exit_code = main_function(*args, **kwargs)
    File "./Subliminal.py", line 1478, in main
    use_nzbheaders=False,
    File "./Subliminal.py", line 955, in subliminal_fetch
    lang = set( babelfish.Language.fromietf(l) for l in lang )
    File "./Subliminal.py", line 955, in <genexpr>
    lang = set( babelfish.Language.fromietf(l) for l in lang )
    File "./Subliminal/babelfish/language.py", line 124, in fromietf
    language = cls.fromalpha2(language_subtag)
    File "./Subliminal/babelfish/language.py", line 110, in fromcode
    return cls(*language_converters[converter].reverse(code))
    File "./Subliminal/babelfish/converters/__init__.py", line 250, in __getitem__
    plugin = ep.load(require=False)
    File "./Subliminal/pkg_resources.py", line 1948, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  ImportError: No module named alpha2

Thoughts?

DeprecationWarning with latest setuptools

Hi @Diaoul,

I've fixed an issue in guessit (guessit-io/guessit#183) and found out that babelfish is also affected. With newer setuptools versions, load(require=False) displays a deprecation warning at babelfish/__init__.py.

I'll make a PR to fix this deprecation warning the same way it's done in guessit.
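
A fix along these lines could prefer the entry point's resolve() method when it exists and fall back to load(require=False) otherwise. This is only a sketch: the helper name `load_entry_point` and the two stand-in classes are hypothetical, and the actual PR may look different.

```python
def load_entry_point(ep):
    """Load the object an entry point refers to without requirement checks."""
    if hasattr(ep, "resolve"):
        # Newer setuptools: resolve() imports the target and skips require().
        return ep.resolve()
    # Older setuptools: passing require=False is the way to skip the checks.
    return ep.load(require=False)


class NewStyleEntryPoint:
    """Stand-in for an entry point that offers resolve() (illustration only)."""

    def resolve(self):
        return "converter"


class OldStyleEntryPoint:
    """Stand-in for an entry point that only offers load() (illustration only)."""

    def load(self, require=True):
        return "converter (require=%s)" % require
```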

Add new converters which use country demonyms to convert languages

I'm looking for a way to better identify languages in media tracks (mainly audio and subtitle tracks). Usually default tags from media tracks are not precise. Very rarely you get an audio or subtitle track with the correct IETF tag for pt-BR or es-MX or other languages. 99% of the time they are just marked as pt or es and it's very common to have 2 or more tracks with the same language code:

        {
            "codec": "SubRip/SRT",
            "id": 19,
            "properties": {
                "codec_id": "S_TEXT/UTF8",
                "codec_private_length": 0,
                "default_track": false,
                "enabled_track": true,
                "encoding": "UTF-8",
                "forced_track": false,
                "language": "por",
                "language_ietf": "pt",
                "number": 20,
                "text_subtitles": true,
                "track_name": "Português",
                "uid": 1602227994484803173
            },
            "type": "subtitles"
        },
        {
            "codec": "SubRip/SRT",
            "id": 20,
            "properties": {
                "codec_id": "S_TEXT/UTF8",
                "codec_private_length": 0,
                "default_track": false,
                "enabled_track": true,
                "encoding": "UTF-8",
                "forced_track": false,
                "language": "por",
                "language_ietf": "pt",
                "number": 21,
                "text_subtitles": true,
                "track_name": "Português (Brasil)",
                "uid": 17784914655403220205
            },
            "type": "subtitles"
        },

Solving this most likely needs an approach like guessit's. While analysing a large dataset of audio and subtitle tracks, I found that some of them use the official language name in English together with the country demonym:

Brazilian Portuguese
British English
American English
French Canadian

I know babelfish is a very concise library that does one thing and does it well. To solve this issue I'll need to create extensions (language and country converters) that are outside babelfish's scope.

But this little piece related to country demonyms seems like a nice feature to include in babelfish. Maybe something like this:

>>> import babelfish
>>> babelfish.Country.fromname('France')
<Country [FR]>
>>> babelfish.Country.fromdemonym('French')
<Country [FR]>

>>> import babelfish
>>> babelfish.Language.fromname('Portuguese')
<Language [pt]>
>>> babelfish.Language.fromname('Brazilian Portuguese')
<Language [pt-BR]>
>>> babelfish.Language.fromname('Swiss German')
<Language [de-CH]>

I believe babelfish could have at least the demonyms in English and use that to parse the language.
I could try to contribute with this part if you think it makes sense to be part of babelfish.

Some references:
https://en.wikipedia.org/wiki/List_of_adjectival_and_demonymic_forms_for_countries_and_nations
https://github.com/porimol/countryinfo#demonym
https://gist.github.com/consti/e2c7ddc64f0aa044a8b4fcd28dba0700
https://github.com/mledoze/countries/blob/master/countries.json
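
To make the idea concrete, here is a rough sketch of how a label such as 'Brazilian Portuguese' could be split into a language name and a country code. The tables are deliberately tiny and hypothetical, and `split_language_and_country` is an illustrative helper, not a babelfish API. Demonyms that are also language names (such as 'French') are left out of the demonym table here to avoid ambiguity.

```python
# Hypothetical, deliberately tiny tables; real data would come from a curated list.
DEMONYMS = {
    "brazilian": "BR",
    "british": "GB",
    "american": "US",
    "canadian": "CA",
    "swiss": "CH",
}
LANGUAGE_NAMES = {"portuguese", "english", "french", "german"}


def split_language_and_country(name):
    """Return (language_name, country_code or None) for an English label."""
    language = None
    country = None
    for word in name.split():
        key = word.lower()
        if key in LANGUAGE_NAMES and language is None:
            language = word
        elif key in DEMONYMS and country is None:
            country = DEMONYMS[key]
        else:
            raise ValueError("unrecognized word %r in %r" % (word, name))
    if language is None:
        raise ValueError("no language name found in %r" % name)
    return language, country
```

The word order does not matter in this sketch, so 'French Canadian' and 'Brazilian Portuguese' are handled the same way.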

newbie question

I can't get modules from babelfish to do anything. From Python:

from nltk.misc import babelfish #this part works, did pip install bablefish

print babelfish.translate('cookbook', 'english', 'spanish')

AttributeError Traceback (most recent call last)
in ()
----> 1 print babelfish.translate('cookbook', 'english', 'spanish')

AttributeError: 'module' object has no attribute 'translate'

import inspect
inspect.getmembers(babelfish, predicate=inspect.ismethod)
[] # no modules loaded from babelfish.
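
An empty list from this check does not by itself show that a module is empty: inspect.ismethod only matches bound methods, while callables defined at module level are plain functions. The sketch below uses a throwaway module (the names `demo` and `translate` are made up for the illustration) to show the difference.

```python
import inspect
import types

# Build a throwaway module so the example does not depend on any installed package.
demo = types.ModuleType("demo")


def translate(text):
    return text.upper()


demo.translate = translate

# Module-level callables are plain functions, not bound methods.
methods = inspect.getmembers(demo, predicate=inspect.ismethod)
functions = inspect.getmembers(demo, predicate=inspect.isfunction)
```

Here `methods` stays empty while `functions` contains the `translate` entry, so inspect.isfunction (or simply dir()) is the more informative check for a module.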

Installation problem

Hi Diaoul, I'm trying out the babelfish code for a future import into sb.
I tried the code from the docs and ran into this problem:

language = babelfish.Language('por', 'BR')
Evaluating language then returns:

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    language
  File "C:\Python27\lib\site-packages\babelfish\language.py", line 150, in __repr__
    return '<Language [%s]>' % self
  File "C:\Python27\lib\site-packages\babelfish\language.py", line 154, in __str__
    s = self.alpha2
  File "C:\Python27\lib\site-packages\babelfish\language.py", line 132, in __getattr__
    raise AttributeError(name)
AttributeError: alpha2

I installed babelfish without using setup, by copying the babelfish directory into my Python site-packages directory. I know this is not the best way to install, but this way I can test whether the library works without problems in sickbeard.

thanks.

Mr_Orange.

Partial IETF Support

Partial IETF representation of a Language should be possible: fr-FR, en-US, de
It is currently used in __str__ with alpha3

Once #3 is implemented, the script subtag can be represented as well: be-Cyrl, fr-Latn
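
A partial representation could simply join whichever subtags are available, in the order language, script, region. The helper `format_ietf` below is a hypothetical sketch of that idea and not the library's implementation.

```python
def format_ietf(language, country=None, script=None):
    """Join the available subtags in the order language[-Script][-REGION]."""
    subtags = [language.lower()]
    if script:
        # Script subtags are conventionally written in title case, e.g. "Cyrl".
        subtags.append(script.title())
    if country:
        # Region subtags are conventionally written in upper case, e.g. "FR".
        subtags.append(country.upper())
    return "-".join(subtags)
```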

PyPi release outdated

With Python 3.10 on the horizon, and #29 merged to fix importing from collections instead of collections.abc, PyPI could use a new release that includes this fix.

babelfish.exceptions.ReverseError: u'fra'

(sub)hadim boromir ~ $ subliminal /media/hadim/MediaHadi2/movies/Un.Prophete.2009/ -l fr -v
INFO:subliminal.video:Scanning directory u'/media/hadim/MediaHadi2/movies/Un.Prophete.2009/'
INFO:subliminal.video:Scanning video u'Un.Prophete.2009.mkv' in u'/media/hadim/MediaHadi2/movies/Un.Prophete.2009'
INFO:enzyme.mkv:Reading Segment element
INFO:enzyme.parsers.ebml.core:MasterElement EBML ignored
INFO:enzyme.parsers.ebml.core:Maximum level 0 reached for children of MasterElement Segment
INFO:enzyme.mkv:Reading SeekHead element
INFO:enzyme.mkv:Processing element Info from SeekHead at position 4410
INFO:enzyme.mkv:Processing element Tracks from SeekHead at position 4485
Traceback (most recent call last):
  File "/home/hadim/local/virtualenvs/sub/bin/subliminal", line 9, in <module>
    load_entry_point('subliminal==0.7.1', 'console_scripts', 'subliminal')()
  File "build/bdist.linux-x86_64/egg/subliminal/cli.py", line 70, in subliminal
  File "build/bdist.linux-x86_64/egg/subliminal/video.py", line 279, in scan_videos
  File "build/bdist.linux-x86_64/egg/subliminal/video.py", line 219, in scan_video
  File "build/bdist.linux-x86_64/egg/subliminal/video.py", line 219, in <setcomp>
  File "/home/hadim/local/virtualenvs/sub/local/lib/python2.7/site-packages/babelfish/language.py", line 51, in fromcode
    return cls(*CONVERTERS[converter].reverse(code))
  File "/home/hadim/local/virtualenvs/sub/local/lib/python2.7/site-packages/babelfish/converters/alpha3b.py", line 34, in reverse
    raise ReverseError(alpha3b)
babelfish.exceptions.ReverseError: u'fra'

Shouldn't you catch this kind of error systematically? I guess MKV files can carry many different language codes. An unrecognized code should only produce an error message, not abort the scan.
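
On the calling side, a tolerant scan could normalize each track code, log the ones it cannot resolve, and carry on. The sketch below is standalone and assumes small hypothetical lookup tables; 'fra' is the ISO 639-2 terminology code for French, whereas the bibliographic code is 'fre', which may be why a bibliographic-only lookup rejects it.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical excerpt: bibliographic (alpha3b) code -> terminology code.
ALPHA3B_TO_ALPHA3 = {"fre": "fra", "ger": "deu", "dut": "nld"}
KNOWN_ALPHA3 = {"fra", "deu", "nld", "eng"}


def normalize_track_language(code):
    """Return a terminology alpha-3 code, or None if the tag is unknown."""
    code = code.lower()
    if code in ALPHA3B_TO_ALPHA3:
        return ALPHA3B_TO_ALPHA3[code]
    if code in KNOWN_ALPHA3:
        return code
    logger.warning("Ignoring unknown track language %r", code)
    return None


def track_languages(codes):
    """Collect the languages that could be resolved, skipping the rest."""
    resolved = (normalize_track_language(code) for code in codes)
    return {language for language in resolved if language is not None}
```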

[Suggestion] Maybe OpenSubtitlesConverter should move to subliminal?

pkg_resources is deprecated

Friendly reminder that pkg_resources is deprecated in favor of importlib.resources.

https://setuptools.pypa.io/en/latest/pkg_resources.html
https://docs.python.org/3/library/importlib.resources.html

Another newbie question

I just installed SCALe on a Ubuntu 17.10 VM. I think. I say that because when I do:
cd $SCALE_HOME/scale.app
bundle exec thin start --port 8080
on the VM and went to http://127.0.0.1 on the same VM (on firefox) I get a prompt that says:
----->http://127.0.0.1 is requesting your username and password. The site says: "Application"

I can't tell if this is coming from SCALe or something else. If it is from SCALe, what username and password should I give this prompt.

Language.fromname('Divehi') fails with LanguageReverseError

Language.fromname('Divehi') fails. 'Divehi' is a valid alternate spelling (per ISO 639-2) of 'Dhivehi' which works correctly.

Steps to recreate:

from babelfish import Language
Language.fromname('Divehi')

Tested on Win 10 / Python 3.5 x64

Support for scripts (ISO 15924)

Useful to differentiate the writing of the same language

Improve test coverage

https://coveralls.io/r/Diaoul/babelfish

Python 3.4 and inspect

Calling some inspect functions with Python 3.4 on babelfish objects crashes (maximum recursion depth exceeded). See guessit-io/guessit#109.

>>> from babelfish.language import Language
>>> from inspect import ismethoddescriptor
>>> lang = Language("fra")
>>> ismethoddescriptor(lang)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python\x64\Python34\Lib\inspect.py", line 113, in ismethoddescriptor
    return hasattr(tp, "__get__") and not hasattr(tp, "__set__")
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
[....]
RuntimeError: maximum recursion depth exceeded while calling a Python object
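
A common guard against this kind of failure is to reject special (double-underscore) names at the top of __getattr__, so that introspection helpers get a plain AttributeError instead of triggering further lookups. The class below is a simplified, hypothetical stand-in and does not reproduce babelfish's actual code.

```python
import inspect

# Hypothetical converter table used only for this illustration.
CONVERTERS = {"alpha2": {"fra": "fr"}}


class Language:
    def __init__(self, alpha3):
        self.alpha3 = alpha3

    def __getattr__(self, name):
        # Only reached when normal lookup fails. Refuse special names right
        # away so introspection helpers get a plain AttributeError.
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        try:
            return CONVERTERS[name][self.alpha3]
        except KeyError:
            raise AttributeError(name)


lang = Language("fra")
```

With this guard in place, `inspect.ismethoddescriptor(lang)` returns normally while `lang.alpha2` still goes through the converter table.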

[0.6.0] No tests in pypi tarball

As per subject. Can they be included?

Import error and weirdness

There are some tricks with babelfish currently.
I'm not sure it is the right thing to do to load the entry points during import.

Workflow:

import babelfish loads entry point subliminal.converter.addic7ed:Addic7edConverter
Loading entry point subliminal.converter.addic7ed:Addic7edConverter triggers import subliminal
import subliminal triggers import babelfish if there is this statement in subliminal/__init__.py

Boom.

Should we use lazy loading?
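
Lazy loading could look roughly like the registry below, which stores 'module:attribute' strings and imports the target only on first access. The class name `LazyConverterRegistry` is hypothetical, and a standard-library target stands in for a real converter class.

```python
import importlib


class LazyConverterRegistry:
    """Resolve 'module:attribute' targets on first access, not at import time."""

    def __init__(self, targets):
        self._targets = dict(targets)  # name -> "module:attribute"
        self._loaded = {}

    def __getitem__(self, name):
        if name not in self._loaded:
            module_name, _, attribute = self._targets[name].partition(":")
            module = importlib.import_module(module_name)
            self._loaded[name] = getattr(module, attribute)
        return self._loaded[name]

    def is_loaded(self, name):
        return name in self._loaded


# A standard-library target stands in for a real converter class.
registry = LazyConverterRegistry({"ordered": "collections:OrderedDict"})
```

Because nothing is imported until a converter is requested, importing the registry's own package no longer pulls in the packages that provide the converters.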

Add dut to nld as alternative

In ISO 639-2 dut is used as alternative to nld for Dutch language.
Can that be added to babelfish?

DeprecationWarning: Using or importing the ABCs from 'collections' instead of from 'collections.abc' is deprecated

As of Python 3.8 this will no longer work, according to the warning text. (The aliases were ultimately removed in Python 3.10.)
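
A version-tolerant import can try collections.abc first and fall back to collections only where the former does not exist. The small mapping class is a made-up example that merely shows the imported base class in use.

```python
try:
    # Python 3.3+: the abstract base classes live in collections.abc.
    from collections.abc import MutableMapping
except ImportError:
    # Python 2 fallback.
    from collections import MutableMapping


class LowerCaseKeys(MutableMapping):
    """Tiny mapping used only to show that the import works."""

    def __init__(self):
        self._data = {}

    def __getitem__(self, key):
        return self._data[key.lower()]

    def __setitem__(self, key, value):
        self._data[key.lower()] = value

    def __delitem__(self, key):
        del self._data[key.lower()]

    def __iter__(self):
        return iter(self._data)

    def __len__(self):
        return len(self._data)
```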

Python 3.4 and inspect

Calling some inspect functions with Python 3.4 on babelfish objects crashes with a stack overflow. See guessit-io/guessit#109.

>>> from babelfish.language import Language
>>> from inspect import ismethoddescriptor
>>> lang = Language("fra")
>>> ismethoddescriptor(lang)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python\x64\Python34\Lib\inspect.py", line 113, in ismethoddescriptor
    return hasattr(tp, "__get__") and not hasattr(tp, "__set__")
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)

Syntax error with Python 2.6

I have a syntax error with Python 2.6

  File "/home/travis/virtualenv/python2.6/lib/python2.6/site-packages/babelfish/converters/alpha3b.py", line 14
    SYMBOLS = {iso_language.alpha3: iso_language.alpha3b for iso_language in LANGUAGE_MATRIX if iso_language.alpha3b}
                                                           ^
SyntaxError: invalid syntax

See https://travis-ci.org/wackou/guessit/jobs/15229553
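
Dict comprehensions were added in Python 2.7, so the line above cannot be parsed by 2.6. An equivalent that 2.6 accepts is dict() over a generator of pairs, sketched here with a stand-in for the language table (the namedtuple and its three rows are made up for the illustration).

```python
from collections import namedtuple

# Stand-in for the library's language table (illustration only).
IsoLanguage = namedtuple("IsoLanguage", ["alpha3", "alpha3b"])
LANGUAGE_MATRIX = [
    IsoLanguage("fra", "fre"),
    IsoLanguage("deu", "ger"),
    IsoLanguage("eng", ""),
]

# dict() over a generator of pairs parses on Python 2.6, unlike {k: v for ...}.
SYMBOLS = dict(
    (iso_language.alpha3, iso_language.alpha3b)
    for iso_language in LANGUAGE_MATRIX
    if iso_language.alpha3b
)
```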

Fix resource_stream and with statement

Because resource_stream may return a StringIO that does not support the with statement we need to use close explicitly.

Seems like: http://ridingpython.blogspot.fr/2011/08/stream-from-pkgresourcesresourcestream.html
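
As an alternative to calling close() by hand, contextlib.closing wraps any object that has a close() method so it can be used in a with statement. The sketch below uses a made-up `LegacyStream` class in place of whatever resource_stream returns.

```python
from contextlib import closing


class LegacyStream:
    """Stand-in for a stream that has close() but no __enter__/__exit__."""

    def __init__(self, lines):
        self._lines = list(lines)
        self.closed = False

    def __iter__(self):
        return iter(self._lines)

    def close(self):
        self.closed = True


def read_rows(stream):
    """Read tab-separated rows and always close the stream afterwards."""
    with closing(stream) as handle:
        return [line.rstrip("\n").split("\t") for line in handle]


stream = LegacyStream(["fra\tfre\n", "deu\tger\n"])
rows = read_rows(stream)
```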

Python3 Converter Exception

I have an error in guessit with Python3 (3.3.3 x86 on windows). It runs without any problem on Python 2.7.

For: Movies/Persepolis (2007)/[XCT] Persepolis [H264+Aac-128(Fr-Eng)+ST(Fr-Eng)+Ind].mkv
Traceback (most recent call last):
  File "D:\devel\workspace\babelfish\babelfish\converters\__init__.py", line 156, in convert
    return self.to_symbol[alpha3]
KeyError: 'xct'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Python\x86\Python33\Lib\runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "D:\Python\x86\Python33\Lib\runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File ".\guessit\__main__.py", line 160, in <module>
    main()
  File ".\guessit\__main__.py", line 154, in main
    advanced=options.advanced)
  File ".\guessit\__main__.py", line 37, in detect_filename
    print('GuessIt found:', guess_file_info(filename, filetype, info).nice_string(advanced))
  File ".\guessit\__init__.py", line 134, in guess_file_info
    result.append(_guess_filename(filename, filetype))
  File ".\guessit\__init__.py", line 105, in _guess_filename
    mtree = IterativeMatcher(filename, filetype=filetype)
  File ".\guessit\matcher.py", line 120, in __init__
    self._apply_transfo(transformer)
  File ".\guessit\matcher.py", line 132, in _apply_transfo
    transformer.process(self.match_tree, *all_args, **all_kwargs)
  File ".\guessit\transfo\guess_language.py", line 118, in process
    SingleNodeGuesser(self.guess_language, None, self.log, *args, **kwargs).process(mtree)
  File ".\guessit\transfo\__init__.py", line 151, in process
    find_and_split_node(node, strategy, self.skip_nodes, self.logger)
  File ".\guessit\transfo\__init__.py", line 80, in find_and_split_node
    matcher_result = matcher(*all_args)
  File ".\guessit\transfo\guess_language.py", line 37, in guess_language
    guess = search_language(string)
  File ".\guessit\language.py", line 366, in search_language
    if language != 'mul' and not hasattr(language, 'alpha2'):
  File ".\guessit\language.py", line 221, in alpha2
    return self.lang.alpha2
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 130, in __getattr__
    return get_language_converter(name).convert(alpha3, country, script)
  File "D:\devel\workspace\babelfish\babelfish\converters\__init__.py", line 158, in convert
    raise LanguageConvertError(alpha3, country, script)
babelfish.exceptions.LanguageConvertError: xct
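
A possibly relevant difference between the two interpreters: on Python 2, hasattr() swallows any exception raised during attribute lookup, while on Python 3 it only swallows AttributeError. The sketch below uses hypothetical stand-ins (`ConvertError`, `Lang`, `has_code`) to show the effect and one way for calling code to cope with it.

```python
class ConvertError(Exception):
    """Stand-in for a conversion error that is not an AttributeError."""


class Lang:
    def __init__(self, alpha3):
        self.alpha3 = alpha3

    def __getattr__(self, name):
        # Hypothetical converter lookup that fails for unknown codes.
        if name == "alpha2":
            table = {"fra": "fr"}
            if self.alpha3 not in table:
                raise ConvertError(self.alpha3)
            return table[self.alpha3]
        raise AttributeError(name)


def has_code(language, name):
    """Like hasattr(), but also treats a conversion failure as 'missing'."""
    try:
        getattr(language, name)
    except (AttributeError, ConvertError):
        return False
    return True
```

Another option, on the library side, would be to make the conversion error a subclass of AttributeError so that hasattr() behaves the same on both interpreters.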

IETF region support

Is there any plan for babelfish to support IETF regions?

https://en.wikipedia.org/wiki/IETF_language_tag

An optional region subtag based on a two-letter country code from ISO 3166-1 alpha-2 (usually written in upper case), or a three-digit code from UN M.49 for geographical regions;

The main point here is to be able to handle es-419 (Spanish (Latin America)), which is a very common way to tag media tracks (in movies or series) for Latin American Spanish.
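
For illustration, a parser that accepts either kind of region subtag could look like the sketch below. It covers only the simple language[-Script][-REGION] shape, and `parse_ietf` is a hypothetical helper rather than an existing babelfish function.

```python
import re

# language[-Script][-REGION]; the region is two letters or three digits (UN M.49).
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def parse_ietf(tag):
    """Split a simple IETF tag into (language, script, region)."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError("unsupported IETF tag: %r" % tag)
    script = match.group("script")
    region = match.group("region")
    return (
        match.group("language").lower(),
        script.title() if script else None,
        region.upper() if region else None,
    )
```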

Immutability issues

Currently babelfish implements __hash__ to override the default behavior, which makes it possible to use objects as dictionary keys and set members.
While this is a good thing, the way it is implemented makes it prone to some weird errors, as explained in this Lyft blog post:

>>> import babelfish
>>> fr = babelfish.Language("fra")
>>> fr_fr = babelfish.Language("fra", "FR")
>>> s = set([fr])
>>> fr in s
True
>>> fr_fr in s
False

All that is great, but if we modify the objects, things get weird because Python expects the result of __hash__ not to change:

>>> fr.country = babelfish.Country("FR")
>>> fr
<Language [fr-FR]>
>>> fr in s
False
>>> list(s)[0]
<Language [fr-FR]>
>>> fr_fr in s
False

I want true immutability for babelfish objects, either by building them on tuples (and derivatives) or at least by approximating it, for example with the frozen option of dataclasses.
This will surely be a breaking change.
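
As a rough idea of the frozen-dataclass variant, the sketch below defines a hypothetical `FrozenLanguage` value object (not babelfish's actual class). Attribute assignment raises an error, and "changing" the country returns a new object, so hashes already stored in sets or dictionaries stay valid.

```python
from dataclasses import FrozenInstanceError, dataclass
from typing import Optional


@dataclass(frozen=True)
class FrozenLanguage:
    """Hypothetical immutable value object; not babelfish's actual class."""

    alpha3: str
    country: Optional[str] = None

    def with_country(self, country):
        # "Modification" returns a new object, so stored hashes stay valid.
        return FrozenLanguage(self.alpha3, country)


fr = FrozenLanguage("fra")
languages = {fr}
fr_fr = fr.with_country("FR")
```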

Language.fromname('Greek') fails with LanguageReverseError

Language.fromname('Greek') fails. While 'Greek' by itself is not listed specifically, without qualifiers it would generally be assumed to refer to modern Greek. The current way to get the language from the name would be using the string 'Modern Greek (1453-)' which is fairly cumbersome and counterintuitive.

Steps to recreate:

from babelfish import Language
Language.fromname('Greek')

Tested on Win 10 / Python 3.5 x64
