turkish-deasciifier: Turkish deasciifier

This is a Python library and command line utility for Turkish diacritics restoration (also known as diacritics reconstruction, or deasciification). It takes a Turkish string written with ASCII characters only (that is, without proper diacritics) and replaces the relevant characters with their corresponding Turkish letters.

The web-based, online version of this system is available at:

http://turkceyap.appspot.com/

Keep in mind that diacritics restoration (deasciification) for Turkish doesn't work 100% of the time; it is an active research topic! Still, this library is good enough for many practical purposes, and it has served many people and projects over the last 10 years.
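To see why restoration is ambiguous, note that the ASCII letters c, g, i, o, s and u each have a Turkish counterpart, so a word with n such letters has 2^n possible spellings and the right one depends on context. The sketch below only enumerates those candidates; it is not part of this library and not how the library decides, and `candidate_spellings` is a hypothetical helper name used for illustration.

```python
from itertools import product

# ASCII letter -> the Turkish letter it may stand for (illustrative mapping).
TURKISH_COUNTERPART = {
    "c": "ç", "g": "ğ", "i": "ı", "o": "ö", "s": "ş", "u": "ü",
    "C": "Ç", "G": "Ğ", "I": "İ", "O": "Ö", "S": "Ş", "U": "Ü",
}


def candidate_spellings(ascii_word):
    """Return every spelling obtained by optionally restoring each ambiguous letter."""
    options = [
        (ch, TURKISH_COUNTERPART[ch]) if ch in TURKISH_COUNTERPART else (ch,)
        for ch in ascii_word
    ]
    return ["".join(choice) for choice in product(*options)]


candidates = candidate_spellings("cagri")
print(len(candidates))        # 8, because c, g and i are each ambiguous
print("çağrı" in candidates)  # True
```

A deasciifier has to pick one candidate per word, which is why occasional mistakes are unavoidable.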

This system is based on the turkish-mode for GNU Emacs by Prof. Deniz Yüret.

Table of Contents

  1. Installation
  2. Example Python Library Usage
  3. Example CLI (Command Line Interface) Usage
  4. Other Programming Languages and Systems
  5. Advanced Research

Installation

Python 3

For now, the recommended way to install is to use pip and install directly from the project's GitHub repository:

pip install git+https://github.com/emres/turkish-deasciifier.git

Python 2

Keep in mind that switching to Python 3 is strongly recommended! If you insist on using Python 2.x, you can install using the following command:

pip install Turkish-Deasciifier

Example Python Library Usage

Python 3

from turkish.deasciifier import Deasciifier

my_ascii_turkish_txt = "Opusmegi cagristiran catirtilar."
deasciifier = Deasciifier(my_ascii_turkish_txt)
my_deasciified_turkish_txt = deasciifier.convert_to_turkish()
print(my_deasciified_turkish_txt)
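Since a Deasciifier object is built from a single string, handling several lines means creating one object per line. The following is a minimal sketch of that pattern; `deasciify_lines` is a hypothetical helper, not part of the library, and it takes the conversion function as a parameter so that it runs even without the library installed.

```python
def deasciify_lines(lines, convert):
    """Apply `convert` to each non-blank line and keep blank lines unchanged."""
    return [convert(line) if line.strip() else line for line in lines]


# With the library installed, `convert` could wrap the calls shown above:
#     from turkish.deasciifier import Deasciifier
#     convert = lambda text: Deasciifier(text).convert_to_turkish()
#     deasciify_lines(["Opusmegi cagristiran catirtilar.", ""], convert)
```

Passing the converter in also makes it easy to substitute a stub in tests.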

Python 2

Keep in mind that switching to Python 3 is strongly recommended! If you insist on using Python 2.x, you can use the library in the following manner:

from turkish.deasciifier import Deasciifier

my_ascii_turkish_txt = "Opusmegi cagristiran catirtilar."
deasciifier = Deasciifier(my_ascii_turkish_txt.decode("utf-8"))
my_deasciified_turkish_txt = deasciifier.convert_to_turkish()
print my_deasciified_turkish_txt.encode("utf-8")

Example CLI (Command Line Interface) Usage

Python 3

Example tested in a Bash shell:

$ echo "Opusmegi cagristiran catirtilar." | turkish-deasciify
$ cat somefile.txt | turkish-deasciify
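The same pipe can be driven from a Python script. This is a sketch under two assumptions: the `turkish-deasciify` command is on the PATH, and it exchanges UTF-8 text over standard input and output (the UTF-8 part is not stated above). `run_deasciifier_cli` is a hypothetical wrapper name.

```python
import subprocess


def run_deasciifier_cli(text, command=("turkish-deasciify",)):
    """Send `text` to the command's stdin and return what it writes to stdout."""
    result = subprocess.run(
        list(command),
        input=text,
        capture_output=True,
        encoding="utf-8",  # assumption: the CLI reads and writes UTF-8
        check=True,        # raise CalledProcessError on a non-zero exit code
    )
    return result.stdout


# Example call, equivalent to the echo pipeline above:
#     print(run_deasciifier_cli("Opusmegi cagristiran catirtilar.\n"))
```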

Python 2

Keep in mind that switching to Python 3 is strongly recommended!

Example tested in a Bash shell:

$ echo "Opusmegi cagristiran catirtilar." | turkish-deasciify-python2
$ cat somefile.txt | turkish-deasciify-python2

Other Programming Languages and Systems

Advanced Research

For recent advanced scientific research articles, please see the following:

Contributors

emres, roktas, faraday
