A collection of tools for working with the Thai language in Python

hermanschaaf/pythai

PyThai

Basic Python functions for working with the Thai language. For example:

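import pythai

# Segment a sentence into space-separated words
pythai.split(u"การที่ได้ต้องแสดงว่างานดี")
# => u"การ ที่ ได้ ต้อง แสดง ว่า งาน ดี"

# Count the words without building the segmented string
pythai.word_count(u"การที่ได้ต้องแสดงว่างานดี")
# => 8

# Check whether a string contains any Thai text
pythai.contains_thai(u"hello")
# => False

pythai.contains_thai(u"helloการที่ไ")
# => True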

It is meant to be fast and efficient enough to handle large documents.
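
For large documents, one option is to pass the text to the library one line at a time rather than as a single string. The sketch below assumes that words never span a line break and takes the counting function as a parameter. `count_words_by_line` is a hypothetical helper, not part of pythai, and the whitespace counter in the demo is only a stand-in for `pythai.word_count`.

```python
import io


def count_words_by_line(lines, counter):
    """Sum counter(line) over the non-empty lines of an iterable of strings."""
    total = 0
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines entirely
            total += counter(line)
    return total


def whitespace_counter(text):
    """Stand-in counter for the demo (assumption: replace with pythai.word_count)."""
    return len(text.split())


# Any iterable of lines works, including an open file object.
sample = io.StringIO(u"one two three\n\nfour five\n")
print(count_words_by_line(sample, whitespace_counter))  # prints 5
```

Because the helper only iterates, it never holds more than one line in memory at a time.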

Includes

Currently the library supports these functions:

  • Word segmentation (split)
  • Word count (word_count), which is faster than counting the result of split
  • Thai detection (contains_thai): whether a string contains Thai characters
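
To make the last two functions more concrete, here is a pure-Python sketch of what they compute. It does not call pythai or libthai and is not how the library is implemented. `count_segmented` and `has_thai_char` are hypothetical names, and the check assumes that "contains Thai" means at least one code point in the Unicode Thai block (U+0E00 to U+0E7F).

```python
# Bounds of the Unicode Thai block (assumption: this is what counts as "Thai").
THAI_BLOCK_START = 0x0E00
THAI_BLOCK_END = 0x0E7F


def count_segmented(segmented):
    """Count words in a space-separated string such as the output of split."""
    return len(segmented.split())


def has_thai_char(text):
    """Return True if any character falls inside the Unicode Thai block."""
    return any(THAI_BLOCK_START <= ord(ch) <= THAI_BLOCK_END for ch in text)


# The segmented string is the split output shown in the example above.
print(count_segmented(u"การ ที่ ได้ ต้อง แสดง ว่า งาน ดี"))  # prints 8
print(has_thai_char(u"hello"))  # prints False
print(has_thai_char(u"helloการที่ไ"))  # prints True
```

`count_segmented` is the slower route that the list mentions: it needs the fully segmented string before it can count anything.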

Installation

PyThai requires libthai-dev. On Debian-based systems, install it with apt-get:

sudo apt-get install libthai-dev

Then install pythai with pip:

pip install pythai==0.1.3

More

Special thanks to Vee Satayamas for the original Python bindings for the libthai C library.

This library was written for use at Gengo. It is free and open source under the GNU Lesser General Public License (LGPL). Contributions are welcome!
