Git Product home page Git Product logo

libtashkeel's Introduction

Libtashkeel

Libtashkeelis a cross-platform library for diacritic restoration of Arabic text.

Libtashkeel is written in Rust, and provides both a standalone linkable library and a command line tool.

The library uses models trained mainly on MSA data, from Hareef.

Getting Libtashkeel

You need to build the project yourself, see the Building section for a step-by-step guide.

Usage

Using the library

To use Libtashkeel from your C/C++ project, just include libtashkeel.h and you are good to go.

The API consists of a single entry point for diacritizing a **utf-8 ** encoded string. Please take a look at ffi_usage_example.py for sample usage.

From Python

Python bindings are also provided.

After building the wheels (see the Building section), install the wheel using pip:

pip install ./target/wheels/pylibtashkeel*.whl

and then:

>>> from pylibtashkeel import tashkeel
>>> tashkeel("إن روعة اللغة العربية لا تتبدى إلا لعشاقها")
'إِنَّ رَوْعَةَ اللُّغَةِ الْعَرَبِيَّةِ لَا تَتَبَدَّى إِلَّا لِعُشَّاقِهَا'

Command line tool

Libtashkeel provides a standalone executable called tashkeel for diacritizing text from the command line.

$ tashkeel --help
Arabic-text diacritic restoration using neural networks

Usage: tashkeel [OPTIONS]

Options:
  -f, --input-file <INPUT_FILE>    Input file (default `stdin`)
  -o, --output-file <OUTPUT_FILE>  Output file (default `stdout`)
  -i, --interactive                Use interactive mode (useful for testing)
  -t, --taskeen                    Use sukoon for case-ending diacritic if the model is uncertain
  -p, --prob <PROB>                Taskeen threshold probability [default: 0.95]
  -x, --onnx <ONNX_MODEL>          ONNX model (default: use bundled model if available)
  -h, --help                       Print help
  -V, --version                    Print version

Building

Libtashkeel is written in Rust, you need to install Rust first

To build the linkable library libtashkeel, and the command line tool tashkeel, run the following command from the root of the repository:

$ cargo build --release

Then, the built library and executable is found under target directory.

To build Python bindings as a wheel, you need to install maturin

Run the following to build the wheel:

$ cd pylibtashkeel
$ python3 -m venv .venv
$ source .venv/bin/activate
$ pip install maturin
$ maturin build --release --strip -i .venv/bin/python

Then, the built wheel is found under target/wheels directory.

Licence

Copyright (c) Musharraf Omer. This project is licenced under the terms of The MIT License

libtashkeel's People

Contributors

kentoseth avatar mush42 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.