Skip to content

Reconstruct Arabic sentences to be used in applications that don't support Arabic

Notifications You must be signed in to change notification settings

EssaAlshammri/python-arabic-reshaper

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

23 Commits
 
 
 
 
 
 
 
 

Repository files navigation

Python Arabic Reshaper

Reconstruct Arabic sentences to be used in applications that don't support Arabic

Based on Better Arabic Reshaper, ported to Python, tweaked a little bit.

Arabic is a very special script language with two essential features:

  1. It is written from right to left.
  2. The characters change shape according to their surrounding characters.

So when you try to print Arabic text in an application – or a library – that doesn’t support Arabic you’re pretty likely to end up with something that looks like this:

Arabic text written from left to right with no reshaping

We have two problems here, first, the characters are in the isolated form, which means that every character is rendered regardless of its surroundings, and second is that the text is written from left to right.

To solve the latter issue all we have to do is to use the Unicode bidirectional algorithm, which is implemented purely in Python in python-bidi. If you use it you’ll end up with something that looks like this:

Arabic text written from right to left with no reshaping

The only issue left to solve is to reshape those characters and replace them with their correct shapes according to their surroundings. Using this library helps with the reshaping so we can get the proper result like this:

Arabic text written from right to left with reshaping

Usage

import arabic_reshaper
from bidi.algorithm import get_display
 
#...
reshaped_text = arabic_reshaper.reshape(u'اللغة العربية رائعة')
bidi_text = get_display(reshaped_text)
pass_arabic_text_to_render(bidi_text)
#...

The pass_arabic_text_to_render function here is an imaginary function, it is just here to say that the variable bidi_text is the variable that you would need to use in your code afterwards, for example to print it in PDF, or to write it in an Image, etc.

For more info visit my blog post here

Known Issue

Harakat or Tashkeel are not supported, and I think that they can't be supported as their unicode characters are non-spacing marks (i.e. they don't take space, they are rendered in the same space of the character before them), which means that when used in a reshaper, they will be rendered on the next character as the text is reversed.

License

This work is licensed under GNU General Public License v3.

Demo

Online Arabic Reshaper: http://pydj.igeex.biz/arabic-reshaper/

Download

https://github.com/mpcabd/python-arabic-reshaper/tarball/master

Contact

Abdullah Diab (mpcabd) Email: [email protected] Blog: http://mpcabd.xyz

About

Reconstruct Arabic sentences to be used in applications that don't support Arabic

Resources

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Python 100.0%