
iuliia-py's Introduction

Iuliia

Transliterate Cyrillic → Latin in every possible way


Transliteration means representing Cyrillic data (mainly names and geographic locations) with Latin letters. It is used for international passports, visas, green cards, driving licenses, mail and goods delivery, etc.

Iuliia makes transliteration as easy as:

>>> import iuliia
>>> iuliia.translate("Юлия Щеглова", schema=iuliia.WIKIPEDIA)
'Yuliya Shcheglova'

Why use Iuliia

  • 20 transliteration schemas (rule sets), plus alternative variants of several of them, including all main international and Russian standards.
  • Correctly implements not only the base mapping, but all the special rules for letter combinations and word endings (AFAIK, Iuliia is the only library that does so).
  • Simple API and zero third-party dependencies.
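To illustrate what "special rules for letter combinations and word endings" means, here is a minimal sketch of a rule-based word transliterator: a base letter mapping, a letter-combination rule that looks at the previous letter, and a word-ending rule applied to the last two letters as a unit. The tables `MAPPING`, `PREV_MAPPING` and `ENDING_MAPPING` are a hypothetical mini-schema invented for this example, not copied from any real Iuliia schema, and the function only handles lowercase words.

```python
# Hypothetical mini-schema for illustration only; not a real Iuliia schema.
MAPPING = {
    "а": "a", "е": "e", "и": "i", "й": "y", "к": "k",
    "л": "l", "м": "m", "н": "n", "ь": "",
}
# Letter-combination rules keyed by previous letter + current letter;
# a bare "е" key means "е" at the start of a word (no previous letter).
PREV_MAPPING = {"е": "ye", "ье": "ye"}
# Word-ending rules: the last two letters are replaced as a unit.
ENDING_MAPPING = {"ий": "y"}


def translate_word(word: str) -> str:
    """Transliterate one lowercase word with the tables above."""
    stem, ending = word, ""
    if len(word) > 2 and word[-2:] in ENDING_MAPPING:
        stem, ending = word[:-2], ENDING_MAPPING[word[-2:]]
    out = []
    for i, letter in enumerate(stem):
        prev = stem[i - 1] if i > 0 else ""
        # Combination rules take priority over the one-letter mapping;
        # letters with no rule at all pass through unchanged.
        out.append(PREV_MAPPING.get(prev + letter) or MAPPING.get(letter, letter))
    return "".join(out) + ending


print(translate_word("маленький"))  # malenky
print(translate_word("ель"))  # yel
```

With this mini-schema, mapping "маленький" letter by letter would give "malenkiy"; treating "ий" as an ending is what produces "malenky". A rule that looks at the next letter instead of the previous one would be handled the same way.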

For schema details and other information, see iuliia.ru (in Russian).

Issues and limitations

Installation

pip install iuliia

Usage

List all supported schemas:

>>> import iuliia
>>> for name, schema in iuliia.Schemas.items():
...     print("{0:<20}{1}".format(name, schema.description))
...
ala_lc              ALA-LC transliteration schema.
ala_lc_alt          ALA-LC transliteration schema.
bgn_pcgn            BGN/PCGN transliteration schema
bgn_pcgn_alt        BGN/PCGN transliteration schema
bs_2979             British Standard 2979:1958 transliteration schema
bs_2979_alt         British Standard 2979:1958 transliteration schema
gost_16876          GOST 16876-71 (aka GOST 1983) transliteration schema
gost_16876_alt      GOST 16876-71 (aka GOST 1983) transliteration schema
gost_52290          GOST R 52290-2004 transliteration schema
gost_52535          GOST R 52535.1-2006 transliteration schema
gost_7034           GOST R 7.0.34-2014 transliteration schema
gost_779            GOST 7.79-2000 (aka ISO 9:1995) transliteration schema
gost_779_alt        GOST 7.79-2000 (aka ISO 9:1995) transliteration schema
icao_doc_9303       ICAO DOC 9303 transliteration schema
iso_9_1954          ISO/R 9:1954 transliteration schema
iso_9_1968          ISO/R 9:1968 transliteration schema
iso_9_1968_alt      ISO/R 9:1968 transliteration schema
mosmetro            Moscow Metro map transliteration schema
mvd_310             MVD 310-1997 transliteration schema
mvd_310_fr          MVD 310-1997 transliteration schema
mvd_782             MVD 782-2000 transliteration schema
scientific          Scientific transliteration schema
telegram            Telegram transliteration schema
ungegn_1987         UNGEGN 1987 V/18 transliteration schema
wikipedia           Wikipedia transliteration schema
yandex_maps         Yandex.Maps transliteration schema
yandex_money        Yandex.Money transliteration schema

Transliterate using specified schema:

>>> source = "Юлия Щеглова"
>>> iuliia.translate(source, schema=iuliia.ICAO_DOC_9303)
'Iuliia Shcheglova'

Or pick a schema by name:

>>> schema = iuliia.Schemas.get("wikipedia")
>>> iuliia.translate(source, schema)
'Yuliya Shcheglova'

Command line:

$ iuliia icao_doc_9303 "Юлия Щеглова"
Iuliia Shcheglova

Development setup

$ python3 -m venv env
$ . env/bin/activate
$ make deps schemas
$ tox

Development tasks:

$ make help
Usage: make [task]

task                 help
------               ----
changelog            Generate changelog
coverage             Run tests with coverage
deps                 Install dependencies
lint                 Lint and static-check code
pull                 Pull code and schemas
push                 Push commits and tags
schemas              Update schemas
test                 Run tests
help                 Show help message

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Make sure to add or update tests as appropriate.

Use Black for code formatting and Conventional Commits for commit messages.

License

MIT

iuliia-py's People

Contributors

nalgeon


iuliia-py's Issues

split() requires a non-empty pattern match

Greetings.
I'm running:

#!/usr/bin/python3

#import tensorflow as tf
#print(tf.__version__)
import iuliia

source = "Юлия Щеглова"
iuliia.translate(source, schema=iuliia.MOSMETRO)

Result:

Traceback (most recent call last):
  File "./test.py", line 8, in <module>
    iuliia.translate(source, schema=iuliia.MOSMETRO)
  File "/usr/local/lib/python3.6/site-packages/iuliia/engine.py", line 17, in translate
    translated = (_translate_word(word, schema) for word in _split_sentence(source))
  File "/usr/local/lib/python3.6/site-packages/iuliia/engine.py", line 22, in _split_sentence
    return (word for word in SPLITTER.split(source) if word)
ValueError: split() requires a non-empty pattern match.

Python 3.6.8
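The failure comes from `re.split()` itself rather than from the schema: before Python 3.7, `re.split()` rejected patterns that can only match an empty string (a bare word boundary `\b`, for example) with exactly this `ValueError`, and Python 3.7 added support for splitting on such empty matches. The report does not show how `SPLITTER` is defined, so assuming it is a pattern of that kind, one version-independent workaround is to collect the word and non-word runs with `findall()` instead of splitting. `split_sentence` below is a hypothetical stand-in for the library's `_split_sentence`, not its actual code.

```python
import re
from typing import List

# One maximal run of word characters, or one maximal run of anything else.
# findall() never has to split on an empty match, so this works on 3.6 too.
WORD_OR_GAP = re.compile(r"\w+|\W+")


def split_sentence(source: str) -> List[str]:
    """Split text into alternating word and non-word chunks."""
    return WORD_OR_GAP.findall(source)


print(split_sentence("Юлия Щеглова"))  # ['Юлия', ' ', 'Щеглова']
```

On Python 3.7 and later this yields the same chunks as `[w for w in re.split(r"\b", text) if w]`, because word boundaries are exactly the points where a word run meets a non-word run.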

major changes

I sketched a small alternative implementation. Take a look; maybe something will be useful.
Python >= 3.8

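import json
import re
from functools import partial
from itertools import chain


def factory(path):
    """Build a transliteration function from an iuliia JSON schema file."""

    def translate_word(m):
        original = m.group(0)
        word = w = original.lower()
        # Translate a known two-letter ending as a whole and strip it from the stem.
        if ending := len(word) > 2 and ending_mapping(word[-2:]):
            w = w[:-2]
        # Walk the stem with one letter of lookbehind (a) and lookahead (c);
        # '' stands for the word boundary on either side.
        it, buf = chain(w, ('',)), []
        a, b = '', next(it)
        for c in it:
            buf.append(prev_mapping(a + b) or next_mapping(b + c) or mapping(b, b))
            a, b = b, c
        if ending:
            buf.append(ending)
        w = ''.join(buf)
        # Restore the letter case of the source word.
        if word == original:
            return w
        return w.capitalize() if len(original) == 1 or original[-1].islower() else w.upper()

    with open(path, encoding='utf-8') as fi:
        data = json.load(fi)
    mapping = data['mapping'].get
    # Optional rule sets may be missing or null in a schema file.
    prev_mapping = (data.get('prev_mapping') or {}).get
    next_mapping = (data.get('next_mapping') or {}).get
    ending_mapping = (data.get('ending_mapping') or {}).get
    # Apply translate_word to every run of word characters in the input text.
    return partial(re.compile(r'\w+').sub, translate_word)


f = factory('mosmetro.json')
print(f('Юлия, съешь ещё этих мягких французских булок из Йошкар-Олы, да выпей алтайского чаю'))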

Import error

Importing the package raises AttributeError: ala_lc.

>>> import iuliia
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/tmp/gittmp_gleb/tmp.yZ1EPjt5eG_105510_/iuliia/__init__.py", line 13, in <module>
    ALA_LC = Schemas.ala_lc.value  # type: ignore
  File "/usr/lib/python3.8/enum.py", line 341, in __getattr__
    raise AttributeError(name) from None
AttributeError: ala_lc
$ python --version
Python 3.8.3
$ uname -a
Linux GlebPC 5.7.2-arch1-1 #1 SMP PREEMPT Wed, 10 Jun 2020 20:36:24 +0000 x86_64 GNU/Linux
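The traceback shows `Schemas.ala_lc` failing inside `enum.py`, so the `Schemas` enum had no `ala_lc` member when `__init__.py` ran. The `# type: ignore` on that line suggests (an inference, not something the report confirms) that the members are created dynamically, for example from schema files found at import time; if those files were absent from the checkout being imported, every `Schemas.<name>` lookup would fail this way. The Development setup section fetches schemas with `make deps schemas`, so a checkout where that step has not run is one possible cause. The sketch below only reproduces the mechanism with the standard-library functional `Enum` API; `loaded_schemas` is a hypothetical dict, not Iuliia's actual loading code.

```python
from enum import Enum

# Hypothetical stand-in for whatever schema files were found at import time;
# here the "ala_lc" schema is missing.
loaded_schemas = {"wikipedia": "Wikipedia transliteration schema"}

# Functional Enum API: members exist only for the names present at creation.
Schemas = Enum("Schemas", loaded_schemas)

print(Schemas.wikipedia.value)  # Wikipedia transliteration schema

try:
    ALA_LC = Schemas.ala_lc.value
except AttributeError as exc:
    # Same exception type as in the report; the message text may differ
    # between Python versions.
    print("AttributeError:", exc)
```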

wrong transliteration

I tried the WIKIPEDIA and MOSMETRO schemas and got this result:
маленький >> malenkiĭ
бесплатный >> besplatnyĭ

According to the documentation, though, the ending should be "y", not "ĭ".
