
trans's Introduction

The trans module

This module transliterates national characters into similar-sounding Latin characters. At the moment, the Czech, Greek, Latvian, Polish, Turkish, Russian, Ukrainian, Kazakh and Farsi alphabets are supported (this covers 99% of needs).

Python 3:

>>> from trans import trans
>>> trans('Привет, Мир!')
'Privet, Mir!'

Python 2:

>>> import trans
>>> u'Привет, Мир!'.encode('trans')
u'Privet, Mir!'
>>> trans.trans(u'Привет, Мир!')
u'Privet, Mir!'
>>> 'Hello World!'.encode('trans')
Traceback (most recent call last):
    ...
TypeError: trans codec support only unicode string, <type 'str'> given.
>>> s = u'''\
...    -- Раскудрить твою через коромысло в бога душу мать
...             триста тысяч раз едрену вошь тебе в крыло
...             и кактус в глотку! -- взревел разъяренный Никодим.
...    -- Аминь, -- робко добавил из склепа папа Пий.
...                 (c) Г. Л. Олди, "Сказки дедушки вампира".'''
>>>
>>> print s.encode('trans')
   -- Raskudrit tvoyu cherez koromyslo v boga dushu mat
            trista tysyach raz edrenu vosh tebe v krylo
            i kaktus v glotku! -- vzrevel razyarennyy Nikodim.
   -- Amin, -- robko dobavil iz sklepa papa Piy.
                (c) G. L. Oldi, "Skazki dedushki vampira".

Use the "slug" table to keep only Latin characters, digits and underscores:

>>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans')
1 2 3 4 5
6 7 8 9 0
>>> print u'1 2 3 4 5 \n6 7 8 9 0'.encode('trans/slug')
1_2_3_4_5__6_7_8_9_0
>>> s.encode('trans/slug')[-42:-1]
u'_c__G__L__Oldi___Skazki_dedushki_vampira_'

The table id is deprecated and has been renamed to slug. The old name is still available, but not recommended.

>>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my')
Traceback (most recent call last):
    ...
ValueError: Table "my" not found in tables!
>>> trans.tables['my'] = {u'1': u'A', u'2': u'B'}
>>> u'1 2 3 4 5 6 7 8 9 0'.encode('trans/my')
u'A_B________________'
>>>

A table can consist of two parts: a map of diphthongs and a map of characters. Diphthongs are processed first, by simple substring replacement. Then each character of the resulting string is replaced according to the map of characters. If a character is absent from the map of characters, the key None is checked. If the key None is not present either, the default character u'_' is used.

>>> diphthongs = {u'11': u'AA', u'22': u'BB'}
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-',
...               u'A': u'A', u'B': u'B'}  # See below...
>>> trans.tables['test'] = (diphthongs, characters)
>>> u'11abc22cbaCC'.encode('trans/test')
u'AAzyxBBxyz--'

Characters produced by diphthong replacement are also processed by the map of characters:

>>> diphthongs = {u'11': u'AA', u'22': u'BB'}
>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'}
>>> trans.tables['test'] = (diphthongs, characters)
>>> u'11abc22cbaCC'.encode('trans/test')
u'--zyx--xyz--'

These two tables are equivalent:

>>> characters = {u'a': u'z', u'b': u'y', u'c': u'x', None: u'-'}
>>> trans.tables['t1'] = characters
>>> trans.tables['t2'] = ({}, characters)
>>> u'11abc22cbaCC'.encode('trans/t1') == u'11abc22cbaCC'.encode('trans/t2')
True
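
The two-stage lookup described above can be sketched in plain Python 3. This is only an illustration of the documented behaviour, not the module's actual implementation; the function name apply_table is hypothetical.

```python
def apply_table(text, table):
    """Apply a (diphthongs, characters) table, or a bare character map, to text.

    Hypothetical helper that mirrors the rules described in this README.
    """
    # A bare dict is treated as a table with an empty diphthong map.
    if isinstance(table, dict):
        diphthongs, characters = {}, table
    else:
        diphthongs, characters = table

    # Stage 1: plain substring replacement of every diphthong.
    for source, target in diphthongs.items():
        text = text.replace(source, target)

    # Stage 2: per-character lookup; fall back to the None key, then to '_'.
    default = characters.get(None, '_')
    return ''.join(characters.get(ch, default) for ch in text)


diphthongs = {'11': 'AA', '22': 'BB'}
characters = {'a': 'z', 'b': 'y', 'c': 'x', None: '-'}
print(apply_table('11abc22cbaCC', (diphthongs, characters)))
```

Because the character map runs over the output of the diphthong stage, 'AA' and 'BB' fall through to the None key here unless 'A' and 'B' are mapped explicitly, as in the examples above.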

2.1 2016-09-19

  • Add Farsi alphabet (thx rodgar-nvkz)
  • Use pytest
  • Some code style refactoring

2.0 2013-04-01

  • Python 3 support
  • Class Trans for creating separate table spaces

1.5 2012-09-12

  • Add support for the Kazakh alphabet.

1.4 2011-11-29

  • Change license to BSD.

1.3 2010-05-18

  • Table "id" renamed to "slug". Old name also available.
  • Some speed optimizations (thx to AndyLegkiy <andy.legkiy at gmail.com>).

1.2 2010-01-10

  • First public release.
  • Translate documentation to English.

trans's People

Contributors

crucl0, endeveit, rodgar-nvkz, zzzsochi


trans's Issues

SyntaxError in "trans.py" in python3

$ pip install trans
Collecting trans
Using cached trans-2.0.1.tar.bz2
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 20, in <module>
  File "/tmp/pip-build-hb_r_f/trans/setup.py", line 5, in <module>
    import trans
  File "trans.py", line 24
    """
    ^
SyntaxError: invalid syntax

----------------------------------------

Command "python setup.py egg_info" failed with error code 1 in /tmp/pip-build-hb_r_f/trans

error -> u'

Conflict over the letter И

Hi, Alexander!

Nice module, thank you, but one question came up regarding transliteration from Ukrainian.

The thing is that the letter "И" should be transliterated as "I" from Russian, but as "Y" and nothing else from Ukrainian:
the letter И occurs very often in Ukrainian and sounds like the Russian Ы.

First, the Russian dictionary can no longer be used the way it is now; something like this could work instead:

ukrainian_dict = {
u'Є': u'E', u'І': u'I', u'Ї': u'I', u'Ґ': u'G', u'И': u'Y',
u'є': u'e', u'і': u'i', u'ї': u'i', u'ґ': u'g', u'и': u'y'
}
ukrainian = russian
ukrainian[1].update(ukrainian_dict)

Second, depending on whether the text is transliterated from Russian or from Ukrainian, the order could, for example, be swapped:
for t in [latin, russian, ukrainian]:
or
for t in [latin, ukrainian, russian ]:

But given that the source language is not specified anywhere, offhand I don't know how best to implement this.
And all because of one letter ;), although I suspect something else may come up.

Thanks and good luck

aiia
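
Note that in the proposal above, `ukrainian = russian` only creates a second name for the same table, so the update would change the Russian mapping as well. A minimal Python 3 sketch of the same idea with a copy made first; the `russian` table here is a hypothetical, heavily trimmed stand-in and not the module's real data.

```python
import copy

# Hypothetical stand-in for the module's Russian table:
# a (diphthongs, characters) pair with only a few entries.
russian = ({}, {'И': 'I', 'и': 'i', 'Г': 'G', 'г': 'g'})

ukrainian_overrides = {
    'Є': 'E', 'І': 'I', 'Ї': 'I', 'Ґ': 'G', 'И': 'Y',
    'є': 'e', 'і': 'i', 'ї': 'i', 'ґ': 'g', 'и': 'y',
}

# Deep-copy first so that updating the Ukrainian table leaves `russian` intact.
ukrainian = copy.deepcopy(russian)
ukrainian[1].update(ukrainian_overrides)

print(russian[1]['И'], ukrainian[1]['И'])
```

This keeps the two mappings independent, but it does not answer the question raised in the issue of how the source language would be selected.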

Standards compliance

Several standards have been devised for transliterating the Cyrillic alphabet into Latin (http://ru.wikipedia.org/wiki/%D0%A2%D1%80%D0%B0%D0%BD%D1%81%D0%BB%D0%B8%D1%82%D0%B5%D1%80%D0%B0%D1%86%D0%B8%D1%8F_%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%BE%D0%B3%D0%BE_%D0%B0%D0%BB%D1%84%D0%B0%D0%B2%D0%B8%D1%82%D0%B0_%D0%BB%D0%B0%D1%82%D0%B8%D0%BD%D0%B8%D1%86%D0%B5%D0%B9). Alexander, which standards exactly does your module conform to? I like the ISO-9 standard (table B) best, because it allows words to be transliterated in both directions without loss (http://ru.wikipedia.org/wiki/ISO_9)

Polish: Ó transliterates to small letter o

Hi,

In the Polish transliteration the letter "Ó" is transliterated to a small letter "o".

This is the case with version 2.0 and also at least with 1.4.2.

Regards,
Johannes
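
A case slip like the one reported above could be caught with a small check over a character map. This is a sketch in Python 3; the helper name case_mismatches and the sample table are hypothetical and not taken from the module.

```python
def case_mismatches(characters):
    """Return entries whose uppercase source maps to a non-uppercase target."""
    return {
        src: dst
        for src, dst in characters.items()
        # Skip the None default key and empty replacements.
        if src is not None and src.isupper() and dst and not dst[0].isupper()
    }


# Hypothetical fragment of a Polish table containing the reported slip.
polish_sample = {'Ó': 'o', 'ó': 'o', 'Ł': 'L', 'ł': 'l'}
print(case_mismatches(polish_sample))
```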

CyrTranslit

input a .txt and execute it through a command "CyrTranslit -i text.txt"

License clarification?

Hello,

I've seen it mentioned in the changelog that this is "BSD" licensed, but there are 4 BSD licenses, plus the ISCL, which is equivalent to the BSD 2-Clause license. I'm trying to label it correctly in FreeBSD ports.

Thanks!

setup.py wants non-existent README.rst

$ virtualenv v2
New python executable in v2/bin/python
Installing setuptools............done.
Installing pip...............done.

$ v2/bin/pip install trans
Downloading/unpacking trans
Downloading trans-1.4.1.tar.gz
Running setup.py egg_info for package trans
Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "/home/temoto/project/py-avia/v2/build/trans/setup.py", line 11, in <module>
    long_description = open('README.rst', 'rb').read()
IOError: [Errno 2] No such file or directory: 'README.rst'
Complete output from command python setup.py egg_info:
Traceback (most recent call last):
  File "<string>", line 14, in <module>
  File "/home/temoto/project/py-avia/v2/build/trans/setup.py", line 11, in <module>
    long_description = open('README.rst', 'rb').read()
IOError: [Errno 2] No such file or directory: 'README.rst'


Command python setup.py egg_info failed with error code 1
Storing complete log in /home/temoto/.pip/pip.log
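
One common way to avoid this kind of failure is to let setup.py tolerate a missing README. A minimal Python 3 sketch; the helper name read_long_description is hypothetical and this is not the package's actual setup.py.

```python
import os


def read_long_description(path='README.rst'):
    """Return the README contents, or an empty string if the file is absent."""
    # Source archives sometimes ship without the README; do not fail on that.
    if not os.path.exists(path):
        return ''
    with open(path, encoding='utf-8') as readme:
        return readme.read()


long_description = read_long_description()
```

The other usual fix is to make sure the README is included in the source distribution in the first place.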
