babelfish's People

Contributors

celestianx, diaoul, dimotsai, hugovk, mgorny, ratoaq2, toilal, wackou

babelfish's Issues

ImportError: No module named alpha2 when calling babelfish.Language.fromietf('EN')

Using your subliminal backend, I construct the languages (with babelfish) ahead of time. But I had one report from a user on Python 2.7.3 who gets an import error (for alpha2) by the time execution reaches babelfish's sub-modules.

I don't expect you to support my script at all... I was just curious how to interpret the last part of the exception being thrown. Maybe it's a bug in babelfish? Maybe it isn't? Have you seen something like that? Perhaps there is an obvious issue I'm overlooking or a syntax error on my part?

This syntax works for me, so I can't reproduce it:

import babelfish
babelfish.Language.fromietf('EN')

Here is the output passed along to me in its entirety:

./Subliminal.py -S /media/data/Movies/ -l EN -f
2015-10-09 09:26:55,195 - 7241 - INFO - Found 43 matched file(s).
2015-10-09 09:26:55,196 - 7241 - INFO - Using advanced search mode
2015-10-09 09:26:55,201 - 7241 - ERROR - Fatal Exception:
  Traceback (most recent call last):
    File "./Subliminal/nzbget/ScriptBase.py", line 2398, in run
    exit_code = main_function(*args, **kwargs)
    File "./Subliminal.py", line 1478, in main
    use_nzbheaders=False,
    File "./Subliminal.py", line 955, in subliminal_fetch
    lang = set( babelfish.Language.fromietf(l) for l in lang )
    File "./Subliminal.py", line 955, in <genexpr>
    lang = set( babelfish.Language.fromietf(l) for l in lang )
    File "./Subliminal/babelfish/language.py", line 124, in fromietf
    language = cls.fromalpha2(language_subtag)
    File "./Subliminal/babelfish/language.py", line 110, in fromcode
    return cls(*language_converters[converter].reverse(code))
    File "./Subliminal/babelfish/converters/__init__.py", line 250, in __getitem__
    plugin = ep.load(require=False)
    File "./Subliminal/pkg_resources.py", line 1948, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  ImportError: No module named alpha2

Thoughts?

Add new converters that use country demonyms to convert languages

I'm looking for a way to better identify languages in media tracks (mainly audio and subtitle tracks). Usually default tags from media tracks are not precise. Very rarely you get an audio or subtitle track with the correct IETF tag for pt-BR or es-MX or other languages. 99% of the time they are just marked as pt or es and it's very common to have 2 or more tracks with the same language code:

        {
            "codec": "SubRip/SRT",
            "id": 19,
            "properties": {
                "codec_id": "S_TEXT/UTF8",
                "codec_private_length": 0,
                "default_track": false,
                "enabled_track": true,
                "encoding": "UTF-8",
                "forced_track": false,
                "language": "por",
                "language_ietf": "pt",
                "number": 20,
                "text_subtitles": true,
                "track_name": "Português",
                "uid": 1602227994484803173
            },
            "type": "subtitles"
        },
        {
            "codec": "SubRip/SRT",
            "id": 20,
            "properties": {
                "codec_id": "S_TEXT/UTF8",
                "codec_private_length": 0,
                "default_track": false,
                "enabled_track": true,
                "encoding": "UTF-8",
                "forced_track": false,
                "language": "por",
                "language_ietf": "pt",
                "number": 21,
                "text_subtitles": true,
                "track_name": "Português (Brasil)",
                "uid": 17784914655403220205
            },
            "type": "subtitles"
        },

To solve this, an approach like guessit's is most likely needed. While analysing a large dataset of audio and subtitle tracks, I found that part of them use the official language name in English together with the country demonym:

Brazilian Portuguese
British English
American English
French Canadian

I know babelfish is a very concise library that does one thing and does it well. To solve this issue I'll need to create extensions (language and country converters) that are outside babelfish's scope.

But this little piece related to country demonyms seems a nice feature to be included in babelfish. Maybe something like this:

>>> import babelfish
>>> babelfish.Country.fromname('France')
<Country [FR]>
>>> babelfish.Country.fromdemonym('French')
<Country [FR]>
>>> import babelfish
>>> babelfish.Language.fromname('Portuguese')
<Language [pt]>
>>> babelfish.Language.fromname('Brazilian Portuguese')
<Language [pt-BR]>
>>> babelfish.Language.fromname('Swiss German')
<Language [de-CH]>

I believe babelfish could have at least the demonyms in English and use that to parse the language.
I could try to contribute with this part if you think it makes sense to be part of babelfish.
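As a rough illustration of the idea, here is a minimal standalone sketch that does not use babelfish at all. The two lookup tables and the function name guess_ietf_tag are hypothetical and cover only a few English names; a real converter would need a complete demonym dataset such as the ones referenced in this issue.

```python
# Illustrative table only: a few English demonyms mapped to
# ISO 3166-1 alpha-2 country codes.
DEMONYM_TO_COUNTRY = {
    "brazilian": "BR",
    "british": "GB",
    "american": "US",
    "canadian": "CA",
    "swiss": "CH",
    "mexican": "MX",
}

# Illustrative table only: a few English language names mapped to
# ISO 639-1 codes.
NAME_TO_LANGUAGE = {
    "portuguese": "pt",
    "english": "en",
    "french": "fr",
    "german": "de",
    "spanish": "es",
}


def guess_ietf_tag(track_name):
    """Return a tag such as 'pt-BR', or None if no known language name is found."""
    language = None
    country = None
    for word in track_name.lower().split():
        if language is None and word in NAME_TO_LANGUAGE:
            language = NAME_TO_LANGUAGE[word]
        elif country is None and word in DEMONYM_TO_COUNTRY:
            country = DEMONYM_TO_COUNTRY[word]
    if language is None:
        return None
    return language if country is None else f"{language}-{country}"


print(guess_ietf_tag("Brazilian Portuguese"))  # pt-BR
print(guess_ietf_tag("French Canadian"))       # fr-CA
print(guess_ietf_tag("Portuguese"))            # pt
```

Note that some words, such as "French", are both a language name and a demonym. This sketch treats them as language names only, which is a simplification a real implementation would have to revisit.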

Some references:
https://en.wikipedia.org/wiki/List_of_adjectival_and_demonymic_forms_for_countries_and_nations
https://github.com/porimol/countryinfo#demonym
https://gist.github.com/consti/e2c7ddc64f0aa044a8b4fcd28dba0700
https://github.com/mledoze/countries/blob/master/countries.json

newbie question

I can't get modules from babelfish to do anything. From Python, see:

from nltk.misc import babelfish #this part works, did pip install bablefish

print babelfish.translate('cookbook', 'english', 'spanish')


AttributeError Traceback (most recent call last)
in ()
----> 1 print babelfish.translate('cookbook', 'english', 'spanish')

AttributeError: 'module' object has no attribute 'translate'

import inspect
inspect.getmembers(babelfish, predicate=inspect.ismethod)
[] # no modules loaded from babelfish.
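A note on the last check: an empty list from inspect.getmembers with predicate=inspect.ismethod does not by itself show that nothing was loaded. inspect.ismethod only matches bound methods, while functions defined at module level are matched by inspect.isfunction. The sketch below uses the standard library's json module purely as a stand-in to show the difference.

```python
import inspect
import json  # stand-in module; any pure-Python module behaves the same way

# Names of module members that are bound methods.
methods = [name for name, _ in inspect.getmembers(json, inspect.ismethod)]

# Names of module members that are plain Python functions.
functions = [name for name, _ in inspect.getmembers(json, inspect.isfunction)]

print("dumps" in methods)    # False: json.dumps is not a bound method
print("dumps" in functions)  # True: json.dumps is a module-level function
```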

Installation problem

Hi Diaoul, I'm trying to play with the babelfish code for a future import into sb.
I tried the code from the docs and I have this problem:

language = babelfish.Language('por', 'BR')
Evaluating language returns:

Traceback (most recent call last):
  File "<pyshell#7>", line 1, in <module>
    language
  File "C:\Python27\lib\site-packages\babelfish\language.py", line 150, in __repr__
    return '<Language [%s]>' % self
  File "C:\Python27\lib\site-packages\babelfish\language.py", line 154, in __str__
    s = self.alpha2
  File "C:\Python27\lib\site-packages\babelfish\language.py", line 132, in __getattr__
    raise AttributeError(name)
AttributeError: alpha2

I've installed babelfish without using setup, by copying the babelfish directory into my Python site-packages directory. I know this is not the best way to install, but this way I can test whether the library works without problems in sickbeard.

thanks.

Mr_Orange.

Partial IETF Support

Partial IETF representation of a Language should be possible: fr-FR, en-US, de
It is currently used in __str__ with alpha3

Once #3 is implemented, the script subtag can be represented as well: be-Cyrl, fr-Latn

PyPI release outdated

With Python 3.10 on the horizon, and #29 implemented to fix the issue of importing from collections instead of from collections.abc, PyPI could use a new release that includes this fix.

babelfish.exceptions.ReverseError: u'fra'

(sub)hadim boromir ~ $ subliminal /media/hadim/MediaHadi2/movies/Un.Prophete.2009/ -l fr -v
INFO:subliminal.video:Scanning directory u'/media/hadim/MediaHadi2/movies/Un.Prophete.2009/'
INFO:subliminal.video:Scanning video u'Un.Prophete.2009.mkv' in u'/media/hadim/MediaHadi2/movies/Un.Prophete.2009'
INFO:enzyme.mkv:Reading Segment element
INFO:enzyme.parsers.ebml.core:MasterElement EBML ignored
INFO:enzyme.parsers.ebml.core:Maximum level 0 reached for children of MasterElement Segment
INFO:enzyme.mkv:Reading SeekHead element
INFO:enzyme.mkv:Processing element Info from SeekHead at position 4410
INFO:enzyme.mkv:Processing element Tracks from SeekHead at position 4485
Traceback (most recent call last):
  File "/home/hadim/local/virtualenvs/sub/bin/subliminal", line 9, in <module>
    load_entry_point('subliminal==0.7.1', 'console_scripts', 'subliminal')()
  File "build/bdist.linux-x86_64/egg/subliminal/cli.py", line 70, in subliminal
  File "build/bdist.linux-x86_64/egg/subliminal/video.py", line 279, in scan_videos
  File "build/bdist.linux-x86_64/egg/subliminal/video.py", line 219, in scan_video
  File "build/bdist.linux-x86_64/egg/subliminal/video.py", line 219, in <setcomp>
  File "/home/hadim/local/virtualenvs/sub/local/lib/python2.7/site-packages/babelfish/language.py", line 51, in fromcode
    return cls(*CONVERTERS[converter].reverse(code))
  File "/home/hadim/local/virtualenvs/sub/local/lib/python2.7/site-packages/babelfish/converters/alpha3b.py", line 34, in reverse
    raise ReverseError(alpha3b)
babelfish.exceptions.ReverseError: u'fra'

Shouldn't you catch this kind of error systematically? I guess MKV files can carry a lot of different language codes... You should only display an error message instead of crashing.
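One way to read this request is sketched here without babelfish or subliminal. ISO 639-2 defines both a bibliographic and a terminologic three-letter code for some languages (French is "fre" and "fra" respectively), so a track tagged "fra" can fail a lookup that only knows "fre". The tables and function names in the sketch are hypothetical and deliberately tiny; the point is only that unknown codes are logged and skipped rather than raised.

```python
import logging

logger = logging.getLogger(__name__)

# Hypothetical, partial tables for illustration only.
# Bibliographic (B) codes mapped to their terminologic (T) equivalents.
ALPHA3B_TO_ALPHA3T = {"fre": "fra", "ger": "deu", "dut": "nld"}
KNOWN_ALPHA3T = {"fra", "deu", "nld", "eng", "por", "spa"}


def normalize_track_language(code):
    """Return a terminologic alpha-3 code, or None when the code is unknown."""
    code = code.strip().lower()
    code = ALPHA3B_TO_ALPHA3T.get(code, code)
    if code in KNOWN_ALPHA3T:
        return code
    logger.warning("Ignoring unrecognized track language code %r", code)
    return None


def track_languages(codes):
    """Collect recognized languages, skipping bad codes instead of raising."""
    normalized = (normalize_track_language(code) for code in codes)
    return {code for code in normalized if code is not None}


print(sorted(track_languages(["fra", "fre", "ENG", "zz9"])))  # ['eng', 'fra']
```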

Another newbie question

I just installed SCALe on an Ubuntu 17.10 VM. I think. I say that because when I run:
cd $SCALE_HOME/scale.app
bundle exec thin start --port 8080
on the VM and go to http://127.0.0.1 on the same VM (in Firefox), I get a prompt that says:
----->http://127.0.0.1 is requesting your username and password. The site says: "Application"

I can't tell if this is coming from SCALe or something else. If it is from SCALe, what username and password should I give this prompt?

Python 3.4 and inspect

Calling some inspect functions with Python 3.4 on babelfish objects crashes (maximum recursion depth exceeded). See guessit-io/guessit#109.

>>> from babelfish.language import Language
>>> from inspect import ismethoddescriptor
>>> lang = Language("fra")
>>> ismethoddescriptor(lang)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "D:\Python\x64\Python34\Lib\inspect.py", line 113, in ismethoddescriptor
    return hasattr(tp, "__get__") and not hasattr(tp, "__set__")
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 55, in __getattr__
    return getattr(cls, name)
[....]
RuntimeError: maximum recursion depth exceeded while calling a Python object
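The traceback shows a __getattr__ that keeps re-entering itself when inspect probes for "__get__" and "__set__". A common defensive pattern for dynamic __getattr__ implementations, shown here on a hypothetical standalone class and not necessarily the fix adopted in babelfish, is to reject special (dunder) names up front and to raise AttributeError for anything unknown, so that hasattr() gets a clean answer.

```python
from inspect import ismethoddescriptor


class DynamicLanguage:
    """Hypothetical stand-in for a class with converter-backed attributes."""

    # Illustrative converter table: attribute name -> {alpha3: value}.
    _CONVERTERS = {"alpha2": {"fra": "fr", "eng": "en"}}

    def __init__(self, alpha3):
        self.alpha3 = alpha3

    def __getattr__(self, name):
        # Only called when normal lookup fails. Never try to resolve
        # special names dynamically: protocols such as descriptors and
        # pickling probe for them with hasattr().
        if name.startswith("__") and name.endswith("__"):
            raise AttributeError(name)
        try:
            table = self._CONVERTERS[name]
        except KeyError:
            raise AttributeError(name) from None
        return table[self.alpha3]


lang = DynamicLanguage("fra")
print(lang.alpha2)                # fr
print(hasattr(lang, "__get__"))   # False
print(ismethoddescriptor(lang))   # False
```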

Import error and weirdness

There are some tricky things with babelfish currently.
I'm not sure loading the entry points during import is the right thing to do.

Workflow:

  • import babelfish loads entry point subliminal.converter.addic7ed:Addic7edConverter
  • Loading entry point subliminal.converter.addic7ed:Addic7edConverter triggers import subliminal
  • import subliminal triggers import babelfish if there is this statement in subliminal/__init__.py

Boom.

Should we use lazy loading?
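To make the lazy-loading question concrete, here is a minimal sketch of a registry that defers each load until the converter is first requested. LazyRegistry and load_alpha2 are hypothetical names; the zero-argument loader callables stand in for whatever would actually import the plugin (for example an entry point's load method).

```python
class LazyRegistry:
    """Mapping-like registry that builds each converter on first access."""

    def __init__(self, loaders):
        # loaders maps a converter name to a zero-argument callable that
        # imports and returns the converter.
        self._loaders = dict(loaders)
        self._loaded = {}

    def __contains__(self, name):
        return name in self._loaders

    def __getitem__(self, name):
        if name not in self._loaded:
            # Raises KeyError for unknown names, like a dict would.
            self._loaded[name] = self._loaders[name]()
        return self._loaded[name]


calls = []


def load_alpha2():
    # Records each call so the example can show that loading happens once.
    calls.append("alpha2")
    return {"fra": "fr"}


converters = LazyRegistry({"alpha2": load_alpha2})
print(calls)                       # [] : nothing is loaded at registration time
print(converters["alpha2"]["fra"])  # fr
print(converters["alpha2"]["fra"])  # fr
print(calls)                       # ['alpha2'] : the loader ran exactly once
```

With this shape, importing the package that owns the registry does not import any plugin module, so the circular import described in the workflow is not triggered at import time.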

Python3 Converter Exception

I have an error in guessit with Python3 (3.3.3 x86 on windows). It runs without any problem on Python 2.7.

For: Movies/Persepolis (2007)/[XCT] Persepolis [H264+Aac-128(Fr-Eng)+ST(Fr-Eng)+Ind].mkv
Traceback (most recent call last):
  File "D:\devel\workspace\babelfish\babelfish\converters\__init__.py", line 156, in convert
    return self.to_symbol[alpha3]
KeyError: 'xct'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Python\x86\Python33\Lib\runpy.py", line 160, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "D:\Python\x86\Python33\Lib\runpy.py", line 73, in _run_code
    exec(code, run_globals)
  File ".\guessit\__main__.py", line 160, in <module>
    main()
  File ".\guessit\__main__.py", line 154, in main
    advanced=options.advanced)
  File ".\guessit\__main__.py", line 37, in detect_filename
    print('GuessIt found:', guess_file_info(filename, filetype, info).nice_string(advanced))
  File ".\guessit\__init__.py", line 134, in guess_file_info
    result.append(_guess_filename(filename, filetype))
  File ".\guessit\__init__.py", line 105, in _guess_filename
    mtree = IterativeMatcher(filename, filetype=filetype)
  File ".\guessit\matcher.py", line 120, in __init__
    self._apply_transfo(transformer)
  File ".\guessit\matcher.py", line 132, in _apply_transfo
    transformer.process(self.match_tree, *all_args, **all_kwargs)
  File ".\guessit\transfo\guess_language.py", line 118, in process
    SingleNodeGuesser(self.guess_language, None, self.log, *args, **kwargs).process(mtree)
  File ".\guessit\transfo\__init__.py", line 151, in process
    find_and_split_node(node, strategy, self.skip_nodes, self.logger)
  File ".\guessit\transfo\__init__.py", line 80, in find_and_split_node
    matcher_result = matcher(*all_args)
  File ".\guessit\transfo\guess_language.py", line 37, in guess_language
    guess = search_language(string)
  File ".\guessit\language.py", line 366, in search_language
    if language != 'mul' and not hasattr(language, 'alpha2'):
  File ".\guessit\language.py", line 221, in alpha2
    return self.lang.alpha2
  File "D:\devel\workspace\babelfish\babelfish\language.py", line 130, in __getattr__
    return get_language_converter(name).convert(alpha3, country, script)
  File "D:\devel\workspace\babelfish\babelfish\converters\__init__.py", line 158, in convert
    raise LanguageConvertError(alpha3, country, script)
babelfish.exceptions.LanguageConvertError: xct

IETF region support

Is there any plan for babelfish to support IETF regions?

https://en.wikipedia.org/wiki/IETF_language_tag

An optional region subtag based on a two-letter country code from ISO 3166-1 alpha-2 (usually written in upper case), or a three-digit code from UN M.49 for geographical regions;

The main point here is to be able to handle es-419 (Spanish (Latin America)), which is a very common way to classify media tracks (in movies or series) for the Spanish spoken in Latin America.
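For reference, a small sketch of how a tag with either kind of region subtag could be split. This is a simplified reading of the IETF syntax (language, optional four-letter script, optional region that is two letters or three digits) and ignores variants, extensions and private-use subtags. The name split_tag is hypothetical and the code does not touch babelfish.

```python
import re

# Simplified pattern: language[-Script][-REGION], where the region is either
# an ISO 3166-1 alpha-2 code or a three-digit UN M.49 code.
_TAG = re.compile(
    r"^(?P<language>[A-Za-z]{2,3})"
    r"(?:-(?P<script>[A-Za-z]{4}))?"
    r"(?:-(?P<region>[A-Za-z]{2}|[0-9]{3}))?$"
)


def split_tag(tag):
    """Return (language, script, region); missing subtags are None."""
    match = _TAG.match(tag)
    if match is None:
        raise ValueError(f"unsupported language tag: {tag!r}")
    language = match.group("language").lower()
    script = match.group("script")
    region = match.group("region")
    return (
        language,
        script.title() if script else None,
        region.upper() if region else None,
    )


print(split_tag("es-419"))   # ('es', None, '419')
print(split_tag("pt-br"))    # ('pt', None, 'BR')
print(split_tag("be-Cyrl"))  # ('be', 'Cyrl', None)
```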

Immutability issues

Currently babelfish implements __hash__ to override the default behavior, which makes it possible to use its objects as dictionary keys and set members, among other useful features.
While this is a good thing, the way it is implemented makes it prone to some weird errors, as explained in this Lyft blog post:

>>> import babelfish
>>> fr = babelfish.Language("fra")
>>> fr_fr = babelfish.Language("fra", "FR")
>>> s = set([fr])
>>> fr in s
True
>>> fr_fr in s
False

All that is great, but if we modify the objects, things get weird because Python expects the result of hash not to change:

>>> fr.country = babelfish.Country("FR")
>>> fr
<Language [fr-FR]>
>>> fr in s
False
>>> list(s)[0]
<Language [fr-FR]>
>>> fr_fr in s
False

I want babelfish objects to be truly immutable, either by building them on tuples (and derivatives) or at least by faking it, for example with the frozen option of dataclasses.
This will surely be a breaking change.
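A minimal sketch of the frozen dataclass option, using a hypothetical FrozenLanguage class rather than babelfish's own Language. With frozen=True (and the default eq=True) the dataclass generates a __hash__ from its fields and rejects attribute assignment, so a member of a set can no longer change its hash after insertion.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class FrozenLanguage:
    """Hypothetical immutable language value: alpha-3 code plus optional country."""

    alpha3: str
    country: Optional[str] = None


fr = FrozenLanguage("fra")
languages = {fr}

# Mutation is rejected, so the hash of a set member stays stable.
try:
    fr.country = "FR"
except FrozenInstanceError:
    print("cannot modify a frozen instance")

# A changed value is a new object; the original is untouched.
fr_fr = replace(fr, country="FR")
print(fr in languages)     # True
print(fr_fr in languages)  # False
```

The cost is the one noted in the issue: code that currently assigns to attributes would have to build a new object instead.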

Language.fromname('Greek') fails with LanguageReverseError

Language.fromname('Greek') fails. While 'Greek' by itself is not listed specifically, without qualifiers it would generally be assumed to refer to modern Greek. The current way to get the language from the name would be using the string 'Modern Greek (1453-)' which is fairly cumbersome and counterintuitive.

Steps to recreate:

from babelfish import Language
Language.fromname('Greek')

Tested on Win 10 / Python 3.5 x64
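Until something like this is handled by the library, one possible caller-side workaround is a small alias table applied before the lookup. The table and the function name canonical_language_name are hypothetical; the only mapping included is the one described in this issue.

```python
# Hypothetical alias table: common short names mapped to the full name
# that the lookup accepts, as reported in this issue.
NAME_ALIASES = {
    "greek": "Modern Greek (1453-)",
}


def canonical_language_name(name):
    """Return the aliased full name if one is known, else the name unchanged."""
    return NAME_ALIASES.get(name.strip().lower(), name)


print(canonical_language_name("Greek"))   # Modern Greek (1453-)
print(canonical_language_name("French"))  # French

# The result would then be passed to the name lookup, for example:
#   Language.fromname(canonical_language_name("Greek"))
```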
