python-arabic-reshaper's Introduction

Python Arabic Reshaper

Reconstruct Arabic sentences to be used in applications that don't support Arabic script.

Works with Python 3.x

Description

Arabic script is very special with two essential features:

  1. It is written from right to left.
  2. The characters change shape according to their surrounding characters.
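The second feature is visible in Unicode itself, which assigns separate code points to the contextual forms of each Arabic letter. The sketch below uses only the standard-library unicodedata module (not this library) to list the four presentation forms of the letter Meem; the variable names are illustrative.

```python
import unicodedata

# Presentation forms of ARABIC LETTER MEEM (U+0645), taken from the
# "Arabic Presentation Forms-B" block of Unicode.
meem_forms = ['\uFEE1', '\uFEE2', '\uFEE3', '\uFEE4']

# Look up the official Unicode name of each form.
form_names = [unicodedata.name(ch) for ch in meem_forms]

for ch, name in zip(meem_forms, form_names):
    print('U+%04X' % ord(ch), name)
```

A reshaper picks one of these forms for every letter, depending on whether its neighbours connect to it.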

So when you try to print text written in Arabic script in an application – or a library – that doesn’t support Arabic you’re pretty likely to end up with something that looks like this:

Arabic text written from left to right with no reshaping

We have two problems here. First, the characters are in their isolated form, which means that every character is rendered regardless of its surroundings. Second, the text is written from left to right.

To solve the latter issue all we have to do is to use the Unicode bidirectional algorithm, which is implemented purely in Python in python-bidi. If you use it you’ll end up with something that looks like this:

Arabic text written from right to left with no reshaping

The only issue left to solve is to reshape those characters and replace them with their correct shapes according to their surroundings. Using this library helps with the reshaping so we can get the proper result like this:

Arabic text written from right to left with reshaping

Installation

pip install --upgrade arabic-reshaper

If you're using Anaconda you can use

conda install -c mpcabd arabic-reshaper

Usage

import arabic_reshaper

text_to_be_reshaped = 'اللغة العربية رائعة'
reshaped_text = arabic_reshaper.reshape(text_to_be_reshaped)

Example using PIL Image

PIL Image does not support reshaping out of the box, so to draw Arabic text on an Image instance you need to reshape the text first.

For this example to work you need to run pip install --upgrade arabic-reshaper python-bidi pillow

import arabic_reshaper

text_to_be_reshaped = 'اللغة العربية رائعة'
reshaped_text = arabic_reshaper.reshape(text_to_be_reshaped)

# At this stage the text is reshaped, all letters are in their correct form
# based on their surroundings, but if you are going to print the text in a
# left-to-right context, which usually happens in libraries/apps that do not
# support Arabic and/or right-to-left text rendering, then you need to use
# get_display from python-bidi.
# Note that this is optional and depends on your usage of the reshaped text.

from bidi.algorithm import get_display
bidi_text = get_display(reshaped_text)

# At this stage the text in bidi_text can be easily rendered in any library
# that doesn't support Arabic and/or right-to-left, so use it as you'd use
# any other string. For example if you're using PIL.ImageDraw.text to draw
# text over an image you'd just use it like this...

from PIL import Image, ImageDraw, ImageFont

# We load Arial since it's a well known font that supports Arabic Unicode
font = ImageFont.truetype('Arial', 40)

image = Image.new('RGBA', (800, 600), (255,255,255,0))
image_draw = ImageDraw.Draw(image)
image_draw.text((10,10), bidi_text, fill=(255,255,255,128), font=font)

# Now the text is rendered properly on the image, you can save it to a file or just call `show` to see it working
image.show()

# For more details on PIL.Image and PIL.ImageDraw check the documentation
# See http://pillow.readthedocs.io/en/5.1.x/reference/ImageDraw.html?#PIL.ImageDraw.PIL.ImageDraw.Draw.text

Settings

You can configure the reshaper to your preferences. It defines the following settings:

  • language (Default: 'Arabic'): Ignored, the reshaper works with Arabic, Farsi, and Urdu, and most probably all other languages that use Arabic script.
  • support_ligatures (Default: True): When this is set to False, the reshaper will not replace any ligatures, otherwise it will replace enabled ligatures.
  • delete_harakat (Default: True): When this is set to False the reshaper will not delete the harakat from the text, so they will show up in the reshaped text. Keep this option enabled if you are going to pass the reshaped text to bidi.algorithm.get_display, because it reverses the text and you would end up with harakat applied to the next letter instead of the previous one.
  • delete_tatweel (Default: False): When this is set to True the reshaper will delete the Tatweel character (U+0640) from the text before reshaping. This can be useful when you want to support ligatures and don't care about Tatweel getting deleted.
  • shift_harakat_position (Default: False): Whether to shift the Harakat (Tashkeel) one position so they appear correctly when the string is reversed. This might solve the Tashkeel problem in some systems, although it is not needed for PIL.Image, for example.
  • support_zwj (Default: True): Whether to support ZWJ (U+200D) or not.
  • use_unshaped_instead_of_isolated (Default: False): Use the unshaped form instead of the isolated form. Useful with fonts that are missing the isolated form of letters.

Besides the settings above, you can enable/disable supported ligatures. For a full list of supported ligatures and their default status check the file default-config.ini.

There are multiple ways to configure the reshaper; choose the one that suits your deployment:

Via ArabicReshaper instance configuration

Instead of directly using arabic_reshaper.reshape function, define an instance of arabic_reshaper.ArabicReshaper, and pass your config dictionary to its constructor's configuration parameter like this:

from arabic_reshaper import ArabicReshaper
configuration = {
    'delete_harakat': False,
    'support_ligatures': True,
    'RIAL SIGN': True,  # Replace ر ي ا ل with ﷼
}
reshaper = ArabicReshaper(configuration=configuration)
text_to_be_reshaped = 'سعر المنتج ١٥٠ ر' + 'يال'  # had to split the string for display
reshaped_text = reshaper.reshape(text_to_be_reshaped)

Via ArabicReshaper instance configuration_file

You can separate the configuration from your code by copying the file default-config.ini, changing its settings, and saving it somewhere in your project. Then tell the reshaper to use your new config file by passing its path to the constructor's configuration_file parameter like this:

from arabic_reshaper import ArabicReshaper
reshaper = ArabicReshaper(configuration_file='/path/to/your/config.ini')
text_to_be_reshaped = 'سعر المنتج ١٥٠ ر' + 'يال'  # had to split the string for display
reshaped_text = reshaper.reshape(text_to_be_reshaped)

In your config.ini you can have something like this:

[ArabicReshaper]
delete_harakat = no
support_ligatures = yes
RIAL SIGN = yes
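If you prefer to generate such a file from code instead of editing a copy by hand, the standard-library configparser module can write it. This is only a sketch: the temporary output path is hypothetical, and setting optionxform to keep the option names exactly as written is a precaution assumed here, not something the library is known to require.

```python
import configparser
import os
import tempfile

parser = configparser.ConfigParser()
# Keep option names such as 'RIAL SIGN' exactly as written
# (configparser lowercases them by default).
parser.optionxform = str
parser['ArabicReshaper'] = {
    'delete_harakat': 'no',
    'support_ligatures': 'yes',
    'RIAL SIGN': 'yes',
}

# Hypothetical output location; use a path inside your own project.
config_path = os.path.join(tempfile.mkdtemp(), 'config.ini')
with open(config_path, 'w', encoding='utf-8') as config_file:
    parser.write(config_file)

# Read the file back to confirm it parses as expected.
check = configparser.ConfigParser()
check.optionxform = str
check.read(config_path, encoding='utf-8')
print(dict(check['ArabicReshaper']))
```

The resulting path can then be passed as configuration_file, as shown above.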

Via PYTHON_ARABIC_RESHAPER_CONFIGURATION_FILE environment variable

Instead of having to rewrite your old code to configure it like above, you can define an environment variable with the name PYTHON_ARABIC_RESHAPER_CONFIGURATION_FILE and in its value put the full path to the configuration file. This way the reshape function will pick it automatically, and you won't have to change your old code.
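The variable can also be set from Python itself, for example in a start-up script. The sketch below assumes the library reads the variable when its default reshaper is created, presumably at import time, so the variable is set before the first import; use_reshaper_config is a hypothetical helper name, and 'config.ini' is a placeholder path.

```python
import os

def use_reshaper_config(config_path):
    """Store the full path of config_path in the environment variable.

    Hypothetical helper; call it before the first `import arabic_reshaper`.
    """
    full_path = os.path.abspath(config_path)
    os.environ['PYTHON_ARABIC_RESHAPER_CONFIGURATION_FILE'] = full_path
    return full_path

full_path = use_reshaper_config('config.ini')

# Existing code can stay unchanged after this point, for example:
#     import arabic_reshaper
#     reshaped_text = arabic_reshaper.reshape(text_to_be_reshaped)
```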

Settings based on a TrueType® font

If you intend to render the text in a TrueType® font, you can tell the library to generate its configuration by reading the font file to figure out what's supported in the font and what's not.

To use this feature you need to install the library with an extra option (not necessary when you install it with conda):

pip install --upgrade arabic-reshaper[with-fonttools]

Then you can use the reshaper like this:

import arabic_reshaper

reshaper = arabic_reshaper.ArabicReshaper(
    arabic_reshaper.config_for_true_type_font(
        '/path/to/true-type-font.ttf',
        arabic_reshaper.ENABLE_ALL_LIGATURES
    )
)

This will parse the font file, figure out which ligatures it supports, and enable them. It also checks whether the font has isolated forms or whether use_unshaped_instead_of_isolated should be enabled.

The second parameter to config_for_true_type_font can be one of

  • ENABLE_NO_LIGATURES
  • ENABLE_SENTENCES_LIGATURES
  • ENABLE_WORDS_LIGATURES
  • ENABLE_LETTERS_LIGATURES
  • ENABLE_ALL_LIGATURES (default)

which controls what ligatures to look for, depending on your usage. See default-config.ini for the list of available ligatures.

Tashkeel/Harakat issue

Harakat or Tashkeel might not show up properly in their correct place, depending on the application or the library that is doing the rendering for you, so you might want to enable the shift_harakat_position option if you face this problem.

License

This work is licensed under MIT License.

Demo

Online Arabic Reshaper: http://pydj.mpcabd.xyz/arabic-reshaper/

Download

https://github.com/mpcabd/python-arabic-reshaper/tarball/master

Version History

3.0.0

  • Stop supporting Python 2.7
  • Remove dependency on future. See #88.

2.1.4

  • Fix unparseable version bound for fonttools under Python 2

2.1.3

  • Remove dependency on __version__.py and default-config.ini files, as they were causing problems for people who package their apps using pyinstaller or buildozer.

2.1.1

  • Fix a warning. See #57. Thanks @fbernhart

2.1.0

  • Added support for settings based on a TrueType® font

2.0.14

  • New option use_unshaped_instead_of_isolated to get around some fonts missing the isolated form for letters.

2.0.13

BROKEN please make sure not to use this version.

2.0.12

  • Updated letters and ligatures
  • New option shift_harakat_position to try to get around the Tashkeel problem

2.0.11

  • Proper support for ZWJ

2.0.10

  • Fix Shadda ligatures

2.0.9

  • Added support for ZWJ (Zero-width Joiner) (U+200D)

2.0.8

  • Added delete_tatweel
  • Added more test cases

2.0.7

  • Fix tests for Python 2.7

2.0.6

  • Fixed a bug with Harakat breaking the reshaping
  • Wrote two small unit tests, more to come
  • Moved letters and ligatures to separate files for readability/maintainability
  • Moved package to its own folder for readability/maintainability

2.0.5

Fix error message formatting

2.0.4

Fix error message formatting

2.0.3

Use Exception instead of Error.

2.0.2

Use pkg_resources.resource_filename instead of depending on __file__ to access default-config.ini.

2.0.1

Include default-config.ini in setup.py

2.0.0

  • Totally rewrote the code;
  • Faster and better performance;
  • Added the ability to configure and customise the reshaper.

1.0.1

  • New glyphs for Farsi;
  • Added setup.py;
  • Bugfixes.

1.0

Contact

Abdullah Diab (mpcabd) Email: [email protected] Blog: http://mpcabd.xyz

For more info visit my blog post here

python-arabic-reshaper's People

Contributors

assem-ch, fbernhart, hoechenberger, hyperfekt, joekohlsdorf, kifcaliph, m-macaskill, mohamadmansourx, mpcabd, ravarage, rkcf, steveveepee, techmix

python-arabic-reshaper's Issues

arabic_reshaper not working on Android

arabic_reshaper works fine on Windows and Linux, but when I run it as an APK file on Android, the app crashes.
My error is:

python  :  Traceback (most recent call last):
python  :    File "/media/mehdi/2436ef15-26cf-4be8-9eed-6befb73eddd8/mehdi/Documents/BuildozerTest/.buildozer/android/app/main.py", line 18, in <module>
python  :    File "/media/mehdi/2436ef15-26cf-4be8-9eed-6befb73eddd8/mehdi/Documents/BuildozerTest/.buildozer/android/platform/build-armeabi-v7a/build/python-installs/FinalAPP/arabic_reshaper/__init__.py", line 12, in <module>
python  :  FileNotFoundError: [Errno 2] No such file or directory: '/data/user/0/org.test.finalapp/files/app/_python_bundle/site-packages/arabic_reshaper/__version__.py'
python  : Python for android ended.

Line 18 is where arabic_reshaper is imported.
I would greatly appreciate your help.

[FR] Support ANSI color codes

I am using this package to be able to use my terminal in the rare cases where some RTL text is involved. I currently have this script:

#!/usr/bin/env python3

import arabic_reshaper
import sys
from bidi.algorithm import get_display

text_to_be_reshaped = sys.stdin.read()
reshaped_text = arabic_reshaper.reshape(text_to_be_reshaped)
bidi_text = get_display(reshaped_text)
print(bidi_text, end='')

And it works great:
image

But it would be even more awesome if it could support ANSI color codes:
image
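One possible way to approach this request is to split the input on ANSI color (SGR) escape sequences and transform only the text between them. The sketch below is not part of the library; transform_preserving_ansi is a hypothetical helper, and the regular expression only covers color sequences of the form ESC [ ... m. In the script above, the transform would be something like lambda s: get_display(arabic_reshaper.reshape(s)).

```python
import re

# Matches SGR (color/style) sequences such as '\x1b[31m' or '\x1b[0m'.
ANSI_SGR = re.compile(r'(\x1b\[[0-9;]*m)')

def transform_preserving_ansi(text, transform):
    """Apply transform to the text between escape sequences only."""
    parts = ANSI_SGR.split(text)
    # Because the pattern has one capturing group, the odd indexes of
    # `parts` hold the escape sequences and the even indexes hold text.
    return ''.join(
        part if index % 2 else transform(part)
        for index, part in enumerate(parts)
    )

# str.upper stands in for the reshape + get_display pipeline here.
colored = transform_preserving_ansi('\x1b[31mabc\x1b[0m def', str.upper)
print(repr(colored))
```

Note that each colored run is transformed on its own, so a right-to-left phrase that spans several colors would not be reordered as a whole.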

import error

Thanks for the great package. I am using python 32 with Windows 7 and successfully installed arabic-reshaper using the python setup.py install command, but I fail to import it.
Using import arabic-reshaper in my code raises a syntax error, while import arabic_reshaper raises a module-not-found error. pip freeze shows the package installed under the name arabic_reshaper. Note how Python converts the dash into an underscore.
As a workaround I replaced all occurrences of arabic-reshaper with arabicreshaper in the .py file and compiled again. Now import arabicreshaper works.
I searched the net about using a dash in a Python package name and found that many people are facing this problem.

Do you have solution for import arabic-reshaper error?
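For context, the name with a dash is the distribution name used by pip, while the name with an underscore is the module name used by import; a dash can never appear in an import statement because it is not valid in a Python identifier. A minimal standard-library sketch for checking both points from the interpreter that runs your code (the variable names are illustrative):

```python
import importlib.util

distribution_name = 'arabic-reshaper'  # the name pip installs
module_name = 'arabic_reshaper'        # the name Python imports

# Only valid identifiers can be used in an import statement.
dash_is_importable = distribution_name.isidentifier()
underscore_is_importable = module_name.isidentifier()

# find_spec returns None when this interpreter cannot see the module,
# for example when it was installed for a different Python.
spec = importlib.util.find_spec(module_name)
if spec is None:
    print(module_name, 'is not installed for this interpreter')
else:
    print(module_name, 'found at', spec.origin)
```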

How I can undo reshaped text?

Hi

I have used this library for many purposes, but now I need to undo reshaped text back to its original form. I searched and couldn't find any snippet for this. Does this library have any function or class for undoing reshaped Arabic text?

Thank you for this great library
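A possible approach that does not rely on the library: the reshaped letters are Unicode presentation forms, and each of them has a compatibility decomposition back to its base letter, so NFKC normalization maps them back. This is a sketch under assumptions: unshape is a hypothetical helper, it cannot restore harakat or Tatweel that were deleted during reshaping, it does not undo any reordering done by get_display, and NFKC also normalizes other compatibility characters in the text.

```python
import unicodedata

def unshape(reshaped_text):
    """Map presentation forms back to base letters via NFKC."""
    return unicodedata.normalize('NFKC', reshaped_text)

# Meem initial, Hah medial, Meem medial, Dal final: the word "محمد" as a
# reshaper would typically emit it, before any bidi reordering.
reshaped = '\uFEE3\uFEA4\uFEE4\uFEAA'
original = unshape(reshaped)
print(original)
```

Lam-Alef ligatures are split back into their two letters in the same way.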

Missing Font

@mpcabd Assalam Alaikum, thanks for a great tool. I was planning on converting this tool to Ruby, but I have noticed that when I use the KFGQPC Uthmanic Script HAFS font the letters are not displayed.
Is there a way to fix this?

# -*- coding: utf-8 -*-

from __future__ import unicode_literals

import sys
from arabic_reshaper import ArabicReshaper
from bidi.algorithm import get_display
from PIL import Image, ImageFont, ImageDraw, ImageEnhance

# text_to_be_reshaped = 'اللغة العربية رائعة'
text_to_be_reshaped = "مَٰلِكِ يَوۡمِ ٱلدِّينِ"


configuration = {
    'delete_harakat': False,
    'delete_tatweel': False,
    'support_ligatures': True,
    'support_zwj': False,
    'shift_harakat_position': False,
    'use_unshaped_instead_of_isolated': False
}
reshaper = ArabicReshaper(configuration=configuration)

reshaped_text = reshaper.reshape(text_to_be_reshaped)

bidi_text = get_display(reshaped_text)

image = Image.new('RGBA', (500,500), "black")
image_draw = ImageDraw.Draw(image)
image_draw.text((10,10), bidi_text, fill=(255,255,255,128), font=ImageFont.truetype("/Users/user/Github/Personal/gq-tool/UthmanicHafs1 Ver09.otf", 50))
image.save("ImageDraw.png")

output:

imagedraw

Font: http://fonts.qurancomplex.gov.sa/?page_id=42

ConfigParser 2 vs 3 import issue

Hi Abdullah,

python-arabic-reshaper is now a valuable dependency in the PsychoPy project https://github.com/psychopy/psychopy, many thanks for your work.

Some (but oddly, not all) of our Python 2 users are reporting an import issue though:

File "/Applications/PsychoPy3.app/Contents/Resources/lib/python2.7/arabic_reshaper/arabic_reshaper.py", line 31, in <module>
from configparser import ConfigParser
ImportError: No module named configparser

I can't replicate this myself but the import of this particular library does seem to be used as a classic example for maintaining compatibility between Python 2 and 3 code. Some suggestions include first attempting to import the Python 3 module name and if that fails, fall back on the Python 2 name (http://python3porting.com/noconv.html):

try:
    import configparser
except ImportError:
    import ConfigParser as configparser

but a more specific example gives this (https://stackoverflow.com/questions/32507715/standard-solution-for-supporting-python-2-and-python-3):

from six.moves import configparser
import six

if six.PY2:
  ConfigParser = configparser.SafeConfigParser
else:
  ConfigParser = configparser.ConfigParser

which might be more useful here maybe.

Any chance of incorporating something like this into arabic_reshaper? I'm not keen on submitting a pull request myself, as not being able to replicate the name conflict issue, I wouldn't be able to test it properly.

Problem with pyinstaller

Peace be upon you, brother.
I ran into a problem while converting my project into an executable file.
After a while I found out that the problem comes from importing arabic-reshaper.
Is there anything I need to do to solve the problem?

Reshaping not as it should be, ltr direction not rtl

Hey
I have installed arabic_reshaper. I am writing Kurdish and Arabic texts, and neither is reshaped well: the characters change shape but the direction does not.

import arabic_reshaper
arabic = "اللغة العربية"
chakkraw = arabic_reshaper.reshape(arabic)
ckb_kurdish = "چۆنی"
chakyw = arabic_reshaper.reshape(ckb_kurdish)
print(chakkraw)
print(chakyw)

It shows like this:

reshape
Am I missing anything?

"Arabic typesetting" font doesn't draw letters properly [as shown].

The font "Arabic typesetting" does not work with the arabic reshaper library as expected. It shows [a question mark] in place of many letters, which ruins the final result. Here is an example of drawing random Arabic words like "أمتحان رياضيات".
If I use the option "use_unshaped_instead_of_isolated", it corrects some letters and damages others. I wonder if there is a way to get around this issue, because this is a really famous Arabic font!
https://ibb.co/m6Y1VRh

[Error2]:No such file or directory arabic_reshaper/__version__.py

arabic_reshaper works fine on Linux but not on Android.
It shows an error that the __version__.py file was not found.
I traced the problem until I found the issue in
__init__.py, in the line
exec(open(os.path.join(os.path.dirname(__file__), '__version__.py')).read())

I solved it by replacing that line with
__version__ = '2.1.0'

And now it is working fine, so I would like to share my solution with you.
Thank you
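For readers who only need the version number in their own code, one alternative that avoids opening files next to the package is the standard-library importlib.metadata module (Python 3.8+). This is a sketch, not the library's method: installed_version is a hypothetical helper, and packaged apps may not ship the distribution metadata either, hence the fallback value.

```python
from importlib import metadata

def installed_version(distribution_name, default='unknown'):
    """Return the installed version of a distribution, or a default."""
    try:
        return metadata.version(distribution_name)
    except metadata.PackageNotFoundError:
        # No metadata found, e.g. not installed or stripped by a packager.
        return default

reshaper_version = installed_version('arabic-reshaper')
print(reshaper_version)
```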

A new feature that should be added.

Some fonts have a unique way of merging two consecutive letters into one glyph. Here is an example:
the letters "نجـ" are ordinarily rendered as:
image
but many fonts automatically merge certain pairs of consecutive letters, so
the letters "نجـ" can also be rendered as:
image
There are many such letter combinations that merge into one glyph with a unique look,
for example:
image

font type not applied to image

Hi
I used the reshaper library as below:

reshaper_ = arabic_reshaper.ArabicReshaper(
    arabic_reshaper.config_for_true_type_font(
        './path/to/font.ttf',
        arabic_reshaper.ENABLE_ALL_LIGATURES
    )
)
axis.set_title(reshaper_.reshape(title)[::-1])

but the font is not applied to the image title.

Two letters break this library

For example, say you want to draw "ام محمد": the word "محمد" works well, but the word "ام" comes out as boxes. I'm using the reshaper to put text into images with PIL (Image, ImageDraw, ImageFont) and get_display from bidi.algorithm, and I'm also using Tajawal-Bold.ttf.

Urdu language support

Urdu is written in an extended form of the Arabic script. Can you add support for the Urdu language?

Not working with Python 2.7

Salamu Alikom Abdullah,

When trying to use the library with Python 2.7 I got the following error:

File "arabic_reshaper.py", line 26, in <module>
from builtins import range
ImportError: No module named builtins

I fixed it by changing the line
from builtins import range
to
from __builtin__ import range

Then I got the following error:
from configparser import ConfigParser
ImportError: No module named configparser

Again, I changed it to just: import ConfigParser

That caused an error on the ConfigParser() line, so I changed it to ConfigParser.ConfigParser(), and then got other errors!
Note that when I run it with Python 3, it works with no problem.

PyInstaller and Python-arabic-reshaper (Python 2.7)

I am trying to use PyInstaller to put an exe wrapper around my code that imports Python-arabic-reshaper, but I got this error,

Traceback (most recent call last):
  File "<string>", line 20, in <module>
  File "C:\Users\Ownerc\Downloads\PyInstaller-2.1\PyInstaller-2.1\PyInstaller\loader\pyi_importers.py", line 270, in load_module
  File "C:\Python27\build\hhr_video_maker - Copy\out00-PYZ.pyz\arabic_reshaper", line 1, in <module>
  File "C:\Users\Ownerc\Downloads\PyInstaller-2.1\PyInstaller-2.1\PyInstaller\loader\pyi_importers.py", line 270, in load_module
  File "C:\Python27\build\hhr_video_maker - Copy\out00-PYZ.pyz\arabic_reshaper.arabic_reshaper", line 1377, in <module>
  File "C:\Python27\build\hhr_video_maker - Copy\out00-PYZ.pyz\arabic_reshaper.arabic_reshaper", line 1248, in __init__
ValueError: Invalid configuration: A section with the name ArabicReshaper was not found

The config file I was using is default-config.ini, and I also had the configparser package installed.

Thank you,

Just an update...
We made a very simple Python 2.7 script that uses the Python-arabic-reshaper library. The script runs successfully but fails when packaged into an exe using Pyinstaller.

The Pyinstaller documentation (Pyinstaller is the tool that packages the Python script and libraries into an exe) suggests using the archive viewer to see which library files may have been left out of the exe package.

Below is the archive viewer output that shows all the assets included in the pyinstaller "exe" package. Can you identify any missing files from the exe package that the Python-arabic-reshaper needs in order to run?

E:\TAMU>python "E:\Anaconda2\Lib\site-packages\PyInstaller\utils\cliutils\archive_viewer.py" "e:\TAMU\dist\bob_test_1.exe"
 pos, length, uncompressed, iscompressed, type, name
[(0, 170, 235, 1, 'm', u'struct'),
 (170, 1153, 2704, 1, 'm', u'pyimod01_os_path'),
 (1323, 4222, 11804, 1, 'm', u'pyimod02_archive'),
 (5545, 6034, 18956, 1, 'm', u'pyimod03_importers'),
 (11579, 1589, 4450, 1, 's', u'pyiboot01_bootstrap'),
 (13168, 347, 504, 1, 's', u'bob_test_1'),
 (13515, 48403, 89416, 1, 'b', u'VCRUNTIME140.dll'),
 (61918, 39529, 87552, 1, 'b', u'_bz2.pyd'),
 (101447, 624405, 1443840, 1, 'b', u'_hashlib.pyd'),
 (725852, 76667, 146432, 1, 'b', u'_lzma.pyd'),
 (802519, 28814, 66048, 1, 'b', u'_socket.pyd'),
 (831333, 888894, 2045440, 1, 'b', u'_ssl.pyd'),
 (1720227, 10439, 19136, 1, 'b', u'api-ms-win-core-console-l1-1-0.dll'),
 (1730666, 10253, 18624, 1, 'b', u'api-ms-win-core-datetime-l1-1-0.dll'),
 (1740919, 10265, 18624, 1, 'b', u'api-ms-win-core-debug-l1-1-0.dll'),
 (1751184, 10322, 18624, 1, 'b', u'api-ms-win-core-errorhandling-l1-1-0.dll'),
 (1761506, 11406, 22208, 1, 'b', u'api-ms-win-core-file-l1-1-0.dll'),
 (1772912, 10289, 18624, 1, 'b', u'api-ms-win-core-file-l1-2-0.dll'),
 (1783201, 10419, 18624, 1, 'b', u'api-ms-win-core-file-l2-1-0.dll'),
 (1793620, 10290, 18624, 1, 'b', u'api-ms-win-core-handle-l1-1-0.dll'),
 (1803910, 10469, 19136, 1, 'b', u'api-ms-win-core-heap-l1-1-0.dll'),
 (1814379, 10302, 18624, 1, 'b', u'api-ms-win-core-interlocked-l1-1-0.dll'),
 (1824681, 10532, 19136, 1, 'b', u'api-ms-win-core-libraryloader-l1-1-0.dll'),
 (1835213, 11178, 21184, 1, 'b', u'api-ms-win-core-localization-l1-2-0.dll'),
 (1846391, 10461, 19136, 1, 'b', u'api-ms-win-core-memory-l1-1-0.dll'),
 (1856852, 10395, 18624, 1, 'b', u'api-ms-win-core-namedpipe-l1-1-0.dll'),
 (1867247,
  10555,
  19648,
  1,
  'b',
  u'api-ms-win-core-processenvironment-l1-1-0.dll'),
 (1877802, 11078, 20672, 1, 'b', u'api-ms-win-core-processthreads-l1-1-0.dll'),
 (1888880, 10498, 19136, 1, 'b', u'api-ms-win-core-processthreads-l1-1-1.dll'),
 (1899378, 10215, 18112, 1, 'b', u'api-ms-win-core-profile-l1-1-0.dll'),
 (1909593, 10486, 19136, 1, 'b', u'api-ms-win-core-rtlsupport-l1-1-0.dll'),
 (1920079, 10347, 18624, 1, 'b', u'api-ms-win-core-string-l1-1-0.dll'),
 (1930426, 10870, 20672, 1, 'b', u'api-ms-win-core-synch-l1-1-0.dll'),
 (1941296, 10524, 19136, 1, 'b', u'api-ms-win-core-synch-l1-2-0.dll'),
 (1951820, 10598, 19648, 1, 'b', u'api-ms-win-core-sysinfo-l1-1-0.dll'),
 (1962418, 10376, 18624, 1, 'b', u'api-ms-win-core-timezone-l1-1-0.dll'),
 (1972794, 10274, 18624, 1, 'b', u'api-ms-win-core-util-l1-1-0.dll'),
 (1983068, 10607, 19648, 1, 'b', u'api-ms-win-crt-conio-l1-1-0.dll'),
 (1993675, 11729, 22720, 1, 'b', u'api-ms-win-crt-convert-l1-1-0.dll'),
 (2005404, 10429, 19136, 1, 'b', u'api-ms-win-crt-environment-l1-1-0.dll'),
 (2015833, 11063, 20672, 1, 'b', u'api-ms-win-crt-filesystem-l1-1-0.dll'),
 (2026896, 10584, 19648, 1, 'b', u'api-ms-win-crt-heap-l1-1-0.dll'),
 (2037480, 10540, 19136, 1, 'b', u'api-ms-win-crt-locale-l1-1-0.dll'),
 (2048020, 13628, 27840, 1, 'b', u'api-ms-win-crt-math-l1-1-0.dll'),
 (2061648, 10654, 19648, 1, 'b', u'api-ms-win-crt-process-l1-1-0.dll'),
 (2072302, 11901, 23232, 1, 'b', u'api-ms-win-crt-runtime-l1-1-0.dll'),
 (2084203, 12357, 24768, 1, 'b', u'api-ms-win-crt-stdio-l1-1-0.dll'),
 (2096560, 12530, 24768, 1, 'b', u'api-ms-win-crt-string-l1-1-0.dll'),
 (2109090, 11174, 21184, 1, 'b', u'api-ms-win-crt-time-l1-1-0.dll'),
 (2120264, 10601, 19136, 1, 'b', u'api-ms-win-crt-utility-l1-1-0.dll'),
 (2130865, 485, 1035, 1, 'b', u'bob_test_1.exe.manifest'),
 (2131350, 74629, 189952, 1, 'b', u'pyexpat.pyd'),
 (2205979, 1637554, 3938304, 1, 'b', u'python35.dll'),
 (3843533, 9127, 19968, 1, 'b', u'select.pyd'),
 (3852660, 446584, 982720, 1, 'b', u'ucrtbase.dll'),
 (4299244, 341035, 865792, 1, 'b', u'unicodedata.pyd'),
 (4640279,
  0,
  0,
  0,
  'o',
  u'pyi-windows-manifest-filename bob_test_1.exe.manifest'),
 (4640279, 197523, 761033, 1, 'x', u'base_library.zip'),
 (4837802, 1198430, 1198430, 0, 'z', u'out00-PYZ.pyz')]
?

Here is the simple Python script that I used to test...

# -*- coding: utf-8 -*-

import sys
from arabic_reshaper import ArabicReshaper

config1={
    'delete_harakat':False,
    'support_ligatures':True,
    'RIAL SIGN':True,
}

reshaper=ArabicReshaper(configuration=config1)
text=u"عربى"
reshaped_text=reshaper.reshape(text)

print(sys.stdout.encoding)
print(reshaped_text.encode('utf-8'))

front page example is not reproducible

Hello,

Thanks for this package.

The example on the front page is not reproducible, even after:

pip3 install --upgrade arabic-reshaper bidi
pip3 install python-bidi

Please provide a minimal working example for us to better use this program.

Best,

Figure out what ligatures to use and in what order in certain fonts

The image explains what I mean:
example
highlighted:
image
The issue also occurs with certain other letters that connect to each other this way.

The code snippet I used:


from PIL import Image , ImageDraw , ImageFont  
from bidi.algorithm import get_display
import arabic_reshaper
import os


txt = "سيمبا"
fnt_path = os.path.join(os.getcwd(), "ar.ttf")  # Arabic typesetting font.
fnt = ImageFont.truetype(fnt_path , 200)

reshaper = arabic_reshaper.ArabicReshaper(
    arabic_reshaper.config_for_true_type_font(
        fnt_path,
        arabic_reshaper.ENABLE_ALL_LIGATURES
    )
)

reshaped_text = reshaper.reshape(txt)
bidi_text = get_display(reshaped_text)
image = Image.new(mode = "RGBA" , size = (300, 200) , color="white")
draw = ImageDraw.Draw(image)
draw.text( (0, 0), text=bidi_text, fill="black", font = fnt)
image.save("example.png")
os.startfile("example.png")

python-arabic-reshaper does not recognize all UTF-8 Arabic characters?

I'm trying to use python-arabic-reshaper to help make the Python library wordcloud (#amueller/word_cloud#70) work with Arabic text, but am having trouble.

Here's where I use arabic reshaper. Each key is UTF-8, and may use Latin, Arabic or other language characters. They display correctly outside of this wordcloud.

Situation 1

    print 'Reshaping words...'
    reshaped_words = {}
    for key in words.keys():
        decoded = key.decode('utf-8')
        reshaped = arabic_reshaper.reshape(decoded)
        reshaped_words[get_display(reshaped)] = words[key]

When I run the above (and create a wordcloud out of the reshaped_words frequencies), I get:
wcfeg3

Situation 2

    print 'Reshaping words...'
    reshaped_words = {}
    for key in words.keys():
        reshaped_words[get_display(key)] = words[key]

When I run the above, I get:

wcfeg4

You'll see in the first situation, the script is correct but there are missing characters (the question boxes). In the second situation, the script is incorrect (letters are all disjointed) but none of the characters are missing.

It seems like arabic reshaper is having trouble recognizing and encoding certain Arabic (or other) characters.

Any ideas? I am not sure what I'm doing wrong. I don't speak Arabic, so I have trouble debugging or recognizing what's wrong here - I just know that it is wrong!

'ARABIC LETTER MEEM FINAL FORM' (U+FEE2) (and others) not part of CP864

I know this isn't related to this library per se, but I wonder if maybe you have any information about this.

I am trying to add Arabic script support to python-escpos. So I tried converting the letters to their presentational forms, and sent them to a POS printer supporting CP864 (among others). This doesn't quite work - the codepage doesn't include this particular character. Others that are missing: for example 'ARABIC LETTER TEH MEDIAL FORM' (U+FE98).

I was under the impression that CP864 was designed to have separate glyphs for the different forms. I guess not for every form, though.

How would this commonly be handled? (How was this handled in the olden times?) Would I use some custom logic to fall back on an unshaped form if necessary?
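One way the fallback idea could be sketched with the standard library alone: try to encode each character in the target codec, and if that fails, retry with its NFKC-normalized (unshaped) form before giving up. encode_with_fallback is a hypothetical helper, and whether the printer renders the unshaped fallback acceptably is an assumption to verify on the device.

```python
import unicodedata

def encode_with_fallback(text, codec, placeholder=b'?'):
    """Encode text char by char, falling back to the NFKC form."""
    encoded = bytearray()
    for ch in text:
        # First try the character itself, then its unshaped equivalent.
        for candidate in (ch, unicodedata.normalize('NFKC', ch)):
            try:
                encoded += candidate.encode(codec)
                break
            except UnicodeEncodeError:
                continue
        else:
            # Neither form exists in the code page.
            encoded += placeholder
    return bytes(encoded)

# 'cp864' is the name of Python's built-in codec for this code page.
printable = encode_with_fallback('\uFEE2', 'cp864')
print(printable)
```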

Lam-Alef incorrect glyphs

The LAM_ALEF_GLYPHS table includes the Chinese characters \u3BA6 and \u3BA7. This appears to be due to a bug in the original Java version at https://github.com/agawish/Better-Arabic-Reshaper/blob/master/src/org/amr/arabic/ArabicReshaper.java#L60

In Python, the table should be:

LAM_ALEF_GLYPHS = [
        [u'\u0622', u'\uFEF6', u'\uFEF5'],
        [u'\u0623', u'\uFEF8', u'\uFEF7'],
        [u'\u0627', u'\uFEFC', u'\uFEFB'],
        [u'\u0625', u'\uFEFA', u'\uFEF9']
]

In the java, I believe the table should be

public static char[][] LAM_ALEF_GLPHIES=
        {{1570,65270,65269},
         {1571,65272,65271},
         {1575, 65276,65275},
         {1573, 65274,65273}
       };
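The proposed table can be checked against the Unicode character database with the standard library. In the sketch below, check_lam_alef_table is a hypothetical helper: it confirms that every glyph is named as a Lam-Alef ligature and that it decomposes (NFKC) to Lam followed by the Alef variant in the first column.

```python
import unicodedata

LAM_ALEF_GLYPHS = [
    ['\u0622', '\uFEF6', '\uFEF5'],
    ['\u0623', '\uFEF8', '\uFEF7'],
    ['\u0627', '\uFEFC', '\uFEFB'],
    ['\u0625', '\uFEFA', '\uFEF9'],
]

def check_lam_alef_table(table):
    """Return True if every glyph is a Lam-Alef ligature for its row."""
    for alef, final_form, isolated_form in table:
        for glyph in (final_form, isolated_form):
            name = unicodedata.name(glyph)
            if not name.startswith('ARABIC LIGATURE LAM WITH ALEF'):
                return False
            # The ligature must decompose to Lam (U+0644) plus this Alef.
            if unicodedata.normalize('NFKC', glyph) != '\u0644' + alef:
                return False
    return True

table_ok = check_lam_alef_table(LAM_ALEF_GLYPHS)
stray_name = unicodedata.name('\u3BA6')  # one of the reported characters
print(table_ok, stray_name)
```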

Lines are in reverse order when a paragraph is rendered in a block with multiple lines after using bidi

I'm having an issue with writing data fetched from Google Sheets over an image. The code works perfectly, but the data is written in LFT, and base_dir doesn't change it. The text starts from top to bottom.

confixed = arabic_reshaper.reshape(confessionFetch)
finalconf = get_display(confixed, base_dir='R')

I'm using the Arial font; I also tried Cairo and Tajweel, but still have the same issue.
I also use Nider to print the fetched data over the image.

Amazing work though, it really helped me out, but I'd appreciate your input on my issue. Thank you.
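A likely cause, stated as an assumption: when a whole paragraph is converted to visual order first and the drawing code wraps it afterwards, the first visual line ends up holding the end of the logical text. A possible workaround is to wrap the logical text first and convert each line separately. In the sketch below, wrap_then_display is a hypothetical helper and display stands for the conversion step, for example lambda s: get_display(arabic_reshaper.reshape(s)).

```python
import textwrap

def wrap_then_display(text, width, display):
    """Wrap logical text into lines, then convert each line on its own."""
    lines = textwrap.wrap(text, width)
    return '\n'.join(display(line) for line in lines)

# String reversal stands in for the reshape + get_display step here.
block = wrap_then_display('aa bb cc', 5, lambda s: s[::-1])
print(block)
```

textwrap counts characters rather than pixels, so for drawing on images the width would need to be tuned to the font, or replaced by a measurement-based wrap.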

Custom harakat position

Hi, is it possible to set a custom harakat position? When I use the ZekrQuran font the harakat are positioned really high. Alternatively, is it possible to hide the Arabic letters, so that I can draw the text twice, once for the harakat and once for the letters, each with a custom text position?

imagedraw

# -*- coding: utf-8 -*-

from __future__ import unicode_literals

import unittest
import sys
from arabic_reshaper import ArabicReshaper
from bidi.algorithm import get_display
from PIL import Image, ImageFont, ImageDraw, ImageEnhance

# text_to_be_reshaped = 'اللغة العربية رائعة'
text_to_be_reshaped = u"إِيَّاكَ نَعۡبُدُ وَإِيَّاكَ نَسۡتَعِينُ"


configuration = {
    'delete_harakat': False,
    'delete_tatweel': False,
    'support_ligatures': True,
    'support_zwj': True,
    'shift_harakat_position': False,
    'use_unshaped_instead_of_isolated': False
}
reshaper = ArabicReshaper(configuration=configuration)

reshaped_text = reshaper.reshape(text_to_be_reshaped)
font = ImageFont.truetype("/Users/aal29/Github/Personal/gq-tool/ZekrQuran.ttf", 50)

bidi_text = get_display(reshaped_text)

image = Image.new('RGBA', (500,500), "black")
image_draw = ImageDraw.Draw(image)
image_draw.text((50,50), bidi_text, fill=(255,255,255,128), font=font)
# image_draw.text((50,75), bidi_text, fill=(255,255,255,128), font=font)
image.save("ImageDraw.png")

How can we handle such case?

original text : علامات المحبة

1

My code is below:

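from django.conf import settings  # assumed: a Django project, since settings.MEDIA_ROOT is used below
from PIL import ImageFont, ImageDraw, Image
import arabic_reshaper
from bidi.algorithm import get_display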

def create_text_image(base_image_path=settings.MEDIA_ROOT + '/creative/a1.jpg',
                      output_image_path=settings.MEDIA_ROOT + '/images/output.jpg', text_start_height=820,
                      font_size=40, text_color="#ffffff", text_arabic='', text_english=''):
    # Load & Draw Image
    image = Image.open(base_image_path)
    draw = ImageDraw.Draw(image)
    print('Hello')
    # use a truetype font
    reshaped_text = arabic_reshaper.reshape(text_arabic)
    bidi_text = get_display(reshaped_text)
    font = ImageFont.truetype(settings.MEDIA_ROOT + 'fonts/regular.ttf', font_size)

    # prepare text
    iw, ih = image.size
    w_a, h_a = font.getsize(bidi_text)
    w_e, h_e = font.getsize(text_english)
    textX_a = int((iw - w_a) / 2)
    textX_e = int((iw - w_e) / 2)

    draw.text((textX_a, text_start_height), bidi_text, font=font, fill=text_color)
    draw.text((textX_e, text_start_height+h_a+10), text_english, font=font, fill=text_color)

    image.save(output_image_path)

Thanks in advance. Pillow rtl is not working, so I am stuck :(

problem with text size after reshaping.

With many fonts, many words appear to have the wrong text size after reshaping.
When I try to get the text size for most words, I get a wrong value, as shown:
image
The bounding box shows the text size returned by "[pillow object] draw.textsize(text, font)". There is extra space on the right, which makes it hard to use the reshaped result in any work!

Extending for Urdu

Is an extension for Urdu available?

If not, it would be great if you extended this neat library for Urdu. Thanks!

Issue with Arabic Typesetting font

Hi there,

I'm facing an issue with this script: some letters appear as symbols.

I tried changing the font and it works, but with the "Arabic Typesetting" font there are always symbols!

image
image
I have attached the font for further investigation

ArabicTypesetting .zip

Thanks,

dropping diacritics?

Hi,

I am trying to use this package to render images of Arabic text with pygame.
I'm observing that the reshaper package appears to drop diacritics. I'd particularly like to keep them if possible, and I think this is unexpected given your original blog post.

Use:

import arabic_reshaper
from bidi.algorithm import get_display

text_to_be_reshaped = 'أَنا كَنَدِيَّةٍ ، وَأَنا أَصْغَرِ إِخْوانِي السَبْعَةِ '
reshaped_text = arabic_reshaper.reshape(text_to_be_reshaped)
bidi_text = get_display(reshaped_text)

Printing before and after (using NotoSansArabic, which supports diacritics etc):
Screen Shot 2021-05-17 at 12 46 27 PM

Similarly in the demo linked from your blog post:
Screen Shot 2021-05-17 at 12 44 44 PM

Thank you for any pointers!
--Liz
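To confirm that marks are lost in the reshaping step and not in rendering, the combining marks can be counted before and after. This is a small diagnostic sketch using only the standard library; `count_combining_marks` and `marks_were_dropped` are hypothetical helper names.

```python
import unicodedata


def count_combining_marks(text):
    """Count characters with a non-zero canonical combining class."""
    return sum(1 for ch in text if unicodedata.combining(ch))


def marks_were_dropped(original, processed):
    """True if the processed text has fewer combining marks than the input."""
    return count_combining_marks(processed) < count_combining_marks(original)


with_fatha = '\u0623\u064E\u0646\u0627'   # alef-hamza, fatha, noon, alef
without_fatha = '\u0623\u0646\u0627'      # same word with the fatha removed

print(count_combining_marks(with_fatha))
print(marks_were_dropped(with_fatha, without_fatha))
```

If the count drops after `reshape`, the `delete_harakat` setting that appears in the configuration dictionary of another report on this page is the first thing to check. As far as I recall it is enabled by default, but treat that as an assumption and verify it against the installed version.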

problem with english code

Hi, thanks for your great work. I'm using arabic reshaper and it works perfectly on all of my text,
but when I try to convert this text (I forgot to mention that I use bidi too):
[0.1s]
[2.6s]
ماشین شماره 11کی، درگیر یه تیراندازی شدم. [2.5s]
وضعیت کد 4 هست ولی مظنون از پا دراومده. [3.9s]
تو خیابون 522 هیل جنوبی به یه آمبولانس نیاز هست،کافه برادران بوس.

It moves [?.?s] from the end of every line to the start of the line.
[0.1s]
[2.6s]
[2.5s] .ﻡﺪﺷ ﯼﺯﺍﺪﻧﺍﺮﯿﺗ ﻪﯾ ﺮﯿﮔﺭﺩ ،ﯽﮐ11 ﻩﺭﺎﻤﺷ ﻦﯿﺷﺎﻣ
[3.9s] .ﻩﺪﻣﻭﺍﺭﺩ ﺎﭘ ﺯﺍ ﻥﻮﻨﻈﻣ ﯽﻟﻭ ﺖﺴﻫ 4 ﺪﮐ ﺖﯿﻌﺿﻭ

What can I do about this?
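In visual order, whatever comes last in a right-to-left line ends up at the left edge of the string, so the tag moving to the start is what the bidi step is expected to produce. If the goal is to keep the timing tag at the end of the output string anyway, one workaround is to split it off, convert only the sentence, and re-append it. This is a sketch under that assumption; `TAG_RE` and `to_visual_keep_tag` are hypothetical names, and `to_visual` stands in for the reshaping and bidi step.

```python
import re

# A trailing tag such as [2.5s], with optional surrounding whitespace.
TAG_RE = re.compile(r'\s*(\[\d+(?:\.\d+)?s\])\s*$')


def to_visual_keep_tag(line, to_visual):
    """Convert the sentence but leave a trailing [N.Ns] tag where it was."""
    match = TAG_RE.search(line)
    if not match:
        return to_visual(line)
    body = line[:match.start()]
    tag = match.group(1)
    if not body:
        return tag  # the line holds only a tag
    return to_visual(body) + ' ' + tag


# Stand-in for the real conversion: plain string reversal.
def fake_visual(text):
    return text[::-1]


print(to_visual_keep_tag('abc def. [2.5s]', fake_visual))
print(to_visual_keep_tag('[0.1s]', fake_visual))
```

With the real libraries, `to_visual` could be `lambda s: get_display(arabic_reshaper.reshape(s))`.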

Import error with python 2.7.14 32bits

Hi,

I have installed arabic-reshaper (version 28.8.0) using pip with a python 2.7.14 32bits installation.

However, I got the following error on import:

import arabic_reshaper

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\ArcGIS10.6\lib\site-packages\arabic_reshaper\__init__.py", line 3, in <module>
    from .arabic_reshaper import reshape, default_reshaper, ArabicReshaper
  File "c:\Python27\ArcGIS10.6\lib\site-packages\arabic_reshaper\arabic_reshaper.py", line 242, in <module>
    default_reshaper = ArabicReshaper()
  File "c:\Python27\ArcGIS10.6\lib\site-packages\arabic_reshaper\arabic_reshaper.py", line 64, in __init__
    self.configuration = auto_config(configuration, configuration_file)
  File "c:\Python27\ArcGIS10.6\lib\site-packages\arabic_reshaper\reshaper_config.py", line 80, in auto_config
    if 'ArabicReshaper' not in configuration_parser:
TypeError: argument of type 'instance' is not iterable
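The failing line uses the `in` operator on a config parser. On Python 2 the parser is an old-style class instance without `__contains__`, which matches the "argument of type 'instance' is not iterable" message. A version-neutral check is `has_section`, sketched below; this is an illustration, not necessarily how the library resolved it, and `has_reshaper_section` is a hypothetical helper.

```python
try:
    from configparser import ConfigParser  # Python 3
except ImportError:
    from ConfigParser import ConfigParser  # Python 2


def has_reshaper_section(parser, section='ArabicReshaper'):
    """Check for a section without relying on the `in` operator."""
    return parser.has_section(section)


parser = ConfigParser()
print(has_reshaper_section(parser))

parser.add_section('ArabicReshaper')
print(has_reshaper_section(parser))
```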

About HARAKAT_RE

Hi,
Thanks for your great project.
In arabic_reshaper.py, lines 31 and 32 both start with "08d4". Is that correct?
And the last range in HARAKAT_RE, "\u08e3-\u08ff", may be a mistake too?
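One way to check ranges like these is to ask the Unicode character database which code points in a span are not marks. The sketch below uses only `unicodedata`; `non_marks_in_range` is a hypothetical helper, and the result depends on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

MARK_CATEGORIES = ('Mn', 'Mc', 'Me')


def non_marks_in_range(first, last):
    """Code points in [first, last] that are not combining marks."""
    return [
        cp for cp in range(first, last + 1)
        if unicodedata.category(chr(cp)) not in MARK_CATEGORIES
    ]


for cp in non_marks_in_range(0x08D4, 0x08FF):
    print('U+%04X' % cp, unicodedata.category(chr(cp)))
```

On the Unicode data I am aware of, U+08E2 (ARABIC DISPUTED END OF AYAH) is a format character and not a combining mark, which would explain a range that starts again at \u08e3.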

Letter ڵ is not reshaped

Thank you again for this great project.

I see that the letter ڵ is, for some reason, not reshaped when using Kurdish. Do I need to change the configuration to support this letter?

Thanks
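A possible explanation, offered as an assumption and not confirmed by the maintainers: reshaping by substituting presentation-form code points only works for letters that have such code points in Unicode. The sketch below lists the single-letter presentation forms that the Unicode database records for a given letter; `presentation_forms` is a hypothetical helper.

```python
import unicodedata

# Arabic Presentation Forms-A and Forms-B.
PRESENTATION_RANGES = list(range(0xFB50, 0xFE00)) + list(range(0xFE70, 0xFF00))


def presentation_forms(letter):
    """Map form tag ('<initial>', ...) to the code point for this letter."""
    target = '%04X' % ord(letter)
    forms = {}
    for cp in PRESENTATION_RANGES:
        parts = unicodedata.decomposition(chr(cp)).split()
        # Single-letter forms decompose to exactly ['<tag>', 'XXXX'].
        if len(parts) == 2 and parts[1] == target:
            forms[parts[0]] = chr(cp)
    return forms


print(sorted(presentation_forms('\u0644')))  # ARABIC LETTER LAM
print(presentation_forms('\u06B5'))          # ARABIC LETTER LAM WITH SMALL V
```

If the second call comes back empty, joining for this letter would have to come from the font's own shaping tables and not from code point substitution.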

Wrong order when English is used inside the text

I want to print a Persian phrase (right to left) in a Python console application. It's okay if all chars are in Persian. However, if it's mixed with English (including the ending dot (.)), it shows it incorrectly.

Examples:

این خوب است # this is okay
این خوب است. # the dot must be on the left-most side not right (the problem exists even in this editor)

این متن شامل English است.

The last one must be printed as:

.است English این متن شامل

To type the above, I typed it in the wrong order to show the right order!

===========
I used your package with the same problem.
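In the Unicode bidirectional algorithm, neutral characters such as a trailing full stop follow the paragraph direction, so a paragraph treated as left-to-right leaves the dot on the right. A minimal sketch of the usual first-strong-character heuristic for choosing that direction follows; `base_direction` is a hypothetical helper, and falling back to 'L' when no strong character is found is an assumption.

```python
import unicodedata


def base_direction(text):
    """Return 'R' or 'L' based on the first strongly directional character."""
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class in ('R', 'AL'):
            return 'R'
        if bidi_class == 'L':
            return 'L'
    return 'L'  # assumed default when the text has no strong characters


print(base_direction('\u0627\u06CC\u0646 English.'))
print(base_direction('English \u0627\u0633\u062A.'))
```

The result can be passed on explicitly, for example as the `base_dir='R'` argument to `get_display` that another report on this page uses.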

ResourceWarning: unclosed file in __version__.py

Hi,

Today I updated xhtml2pdf, and since the new version depends on arabic-reshaper I had to install it too.

When starting my app with the Python option -Wd I get the following warning:

C:\python38\lib\site-packages\arabic_reshaper\__init__.py:12: ResourceWarning: unclosed file <_io.TextIOWrapper name='C:\\python38\\lib\\site-packages\\arabic_reshaper\\__version__.py' mode='r' encoding='cp1252'>
  exec(open(os.path.join(os.path.dirname(__file__), '__version__.py')).read())
ResourceWarning: Enable tracemalloc to get the object allocation traceback

Could you provide a quick fix for this?

Thanks :-)
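The warning points at `exec(open(...).read())`, where the file object is never closed explicitly. A minimal sketch of the same idea with a context manager follows; `read_version` is a hypothetical helper, the explicit UTF-8 encoding is an assumption, and this is not necessarily how the maintainers fixed it.

```python
def read_version(path):
    """Execute a __version__.py-style file and return its __version__ value."""
    namespace = {}
    # The with-block closes the file, so no ResourceWarning is raised.
    with open(path, encoding='utf-8') as version_file:
        exec(version_file.read(), namespace)
    return namespace['__version__']
```

Passing an explicit namespace dictionary also keeps the executed names out of the caller's globals.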

arabic_reshaper/__version__.py FileNotFoundError

Asalamu Alaikum.

Thanks for the wonderful package. I am trying to use your package in a small APK to show details of Quran verses when a surah number is given. When I run it in Python, it works fine. When I built the APK (buildozer) and ran the app on a mobile device, it crashed with the error below. I got the error from the ADB logs.

05-12 17:07:57.625 20960 21128 I python : [INFO ] [Logger ] Record log in /data/user/0/org.pynew.pynew/files/app/.kivy/logs/kivy_20-05-12_0.txt
05-12 17:07:57.625 20960 21128 I python : [INFO ] [Kivy ] v1.11.1
05-12 17:07:57.625 20960 21128 I python : [INFO ] [Kivy ] Installed at "/data/user/0/org.pynew.pynew/files/app/_python_bundle/site-packages/kivy/__init__.pyc"
05-12 17:07:57.625 20960 21128 I python : [INFO ] [Python ] v3.8.1 (default, May 12 2020, 02:14:46)
05-12 17:07:57.625 20960 21128 I python : [Clang 8.0.2 (https://android.googlesource.com/toolchain/clang 40173bab62ec7462
05-12 17:07:57.625 20960 21128 I python : [INFO ] [Python ] Interpreter at ""
05-12 17:07:58.226 20960 21128 I python : [INFO ] [Factory ] 184 symbols loaded
05-12 17:07:58.471 20960 21128 I python : [INFO ] [Image ] Providers: img_tex, img_dds, img_sdl2, img_gif (img_pil, img_ffpyplayer ignored)
05-12 17:07:58.506 20960 21128 I python : [INFO ] [Text ] Provider: sdl2
05-12 17:07:58.857 20960 21128 I python : Traceback (most recent call last):
05-12 17:07:58.857 20960 21128 I python : File "/home/kivy/pynew/.buildozer/android/app/main.py", line 11, in <module>
05-12 17:07:58.858 20960 21128 I python : File "/home/kivy/pynew/.buildozer/android/platform/build-armeabi-v7a/build/python-installs/pynew/arabic_reshaper/__init__.py", line 6, in <module>
05-12 17:07:58.859 20960 21128 I python : FileNotFoundError: [Errno 2] No such file or directory: '/data/user/0/org.pynew.pynew/files/app/_python_bundle/site-packages/arabic_reshaper/__version__.py'
05-12 17:07:58.859 20960 21128 I python : Python for android ended.

Can you please help to fix this issue?

Package Not working when code is deployed using docker

I'm building an application using this package.
The objective of the app is to write Arabic text on images. My app works fine on my local machine but once deployed via docker, it breaks. Characters appear isolated/disjointed.

Code running inside Docker image
image

Correct Image (Working Locally)
image

requirements.txt

arabic-reshaper==2.1.0
certifi==2020.6.20
cffi==1.14.0
click==7.1.2
cryptography==2.9.2
Flask==1.1.2
Flask-Cors==3.0.8
fonttools==4.12.1
fpdf==1.7.2
future==0.18.2
gunicorn==20.0.4
img2pdf==0.3.6
itsdangerous==1.1.0
Jinja2==2.11.2
olefile==0.46
Pillow==7.2.0
pycodestyle==2.6.0
pycparser==2.20
pyOpenSSL==19.1.0
python-bidi==0.4.2
six==1.15.0
Werkzeug==1.0.1

Dockerfile

FROM nethacker/ubuntu-18-04-python-3:latest

RUN apt-get update -y
RUN apt-get upgrade -y

RUN apt-get install -y libtiff5-dev libjpeg62-turbo-dev libopenjp2-7-dev zlib1g-dev \
    libfreetype6-dev liblcms2-dev libwebp-dev tcl8.6-dev tk8.6-dev python3-tk \
    libharfbuzz-dev libfribidi-dev libxcb1-dev



COPY ./src/requirements.txt /root

RUN pip install -r /root/requirements.txt && useradd -m ubuntu


ENV HOME=/home/ubuntu

USER ubuntu


COPY ./src /home/ubuntu/

COPY ./src/fonts/* /usr/share/fonts/

RUN fc-cache -f -v

RUN pip install --upgrade arabic-reshaper[with-fonttools]



WORKDIR /home/ubuntu

EXPOSE 5000

CMD ["gunicorn", "-c", "gunicorn_config.py", "wsgi:app"]
