
PyThaiNLP Corpus Downloader

Wannaphong Phatthiyaphaibun edited this page Jan 10, 2021 · 1 revision


code name: pythainlp-data

Rationale

PyThaiNLP is developing very quickly, and its data warehouse changes with every new release. Therefore, a new system for updating and storing this data, called pythainlp-data, was needed to replace the old one.

Development

  • We used a TinyDB database for the local catalog (on the user's machine).
  • We used a JSON file for the store catalog. The available corpus names are listed in this file: pythainlp.github.io/pythainlp-corpus/db.json
  • We used GitHub releases to store each corpus/model.
  • By default, a downloaded corpus or model is saved in $HOME/pythainlp-data/ (e.g. /Users/bact/pythainlp-data/wiki_lm_lstm.pth).
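
The default save location described in the list above can be illustrated with a minimal sketch. It uses only the Python standard library and does not call PyThaiNLP itself; the function names `default_data_dir`, `local_corpus_path`, and `is_downloaded` are hypothetical helpers introduced here for illustration, not part of the PyThaiNLP API.

```python
from pathlib import Path
from typing import Optional


def default_data_dir(home: Optional[str] = None) -> Path:
    """Return <home>/pythainlp-data, the default location named above.

    `home` is a hypothetical override for illustration; when omitted,
    the current user's home directory is used.
    """
    base = Path(home) if home is not None else Path.home()
    return base / "pythainlp-data"


def local_corpus_path(filename: str, home: Optional[str] = None) -> Path:
    """Build the path where a downloaded corpus/model file would be saved."""
    return default_data_dir(home) / filename


def is_downloaded(filename: str, home: Optional[str] = None) -> bool:
    """Check whether the file already exists in the local data directory."""
    return local_corpus_path(filename, home).is_file()


if __name__ == "__main__":
    # Mirrors the example path given in the list above.
    print(local_corpus_path("wiki_lm_lstm.pth", home="/Users/bact").as_posix())
    print(is_downloaded("wiki_lm_lstm.pth"))
```

A check like `is_downloaded` is one plausible way a downloader could avoid fetching a file twice; how the actual downloader decides this (for example via its TinyDB catalog) is not specified here.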

Usage

You can view the available corpora at pythainlp.github.io/pythainlp-corpus/
