Emacs macOS Tokenizer, tokenizing CJK words with macOS's built-in NLP tokenizer.

emt.el

Introduction

EMT stands for Emacs macOS Tokenizer.

This package uses macOS’s built-in NLP tokenizer to tokenize and operate on CJK words in Emacs.

Installation

Requirements

  • macOS 10.15 or later
  • Emacs 26.1 or later, built with dynamic module support (use --with-modules during compilation)

Build dynamic module

Pre-built (recommended)

If you enable emt-mode and the module cannot be found, emt asks whether to download it automatically from GitHub. Alternatively, you can download the pre-built module manually from the releases section and place the dylib file at emt-lib-path (by default modules/libEMT.dylib within your personal configuration folder, normally ~/.emacs.d/modules/libEMT.dylib).

The current version of the dynamic module is v2.0.0; make sure you have updated to the latest module.

Manual build

  • Install Xcode.
  • Build the module using emt-compile-module, which compiles and copies the module to emt-lib-path.

If you encounter the following error:

No such module “PackageDescription”

run the following command and try again:

sudo xcode-select --switch /Applications/Xcode.app/Contents/Developer

Install package

Install with straight and use-package:

(use-package emt
  :straight (:host github :repo "roife/emt"
                   :files ("*.el" "module/*" "module"))
  :hook (after-init . emt-mode))

Customization

emt-use-cache

If non-nil, cache the results of tokenization. Default is t.

emt-cache-lru-size

The size of the LRU cache. Default is 50.

emt-lib-path

The path to the dynamic library file for emt. Default is ~/.emacs.d/modules/libEMT.dylib.
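
As a rough illustration of the options above, the following sketch sets all three variables in an init file. The cache size of 100 is an arbitrary example value, not a recommendation, and setting the variables before emt-mode is enabled is an assumption about when they are read.

```elisp
;; Minimal sketch of customizing emt; the values are illustrative only.
;; Assumption: these are set before `emt-mode' is enabled.
(setq emt-use-cache t)          ; documented default: cache tokenization results
(setq emt-cache-lru-size 100)   ; example value; the documented default is 50
(setq emt-lib-path              ; same location as the documented default
      (expand-file-name "modules/libEMT.dylib" user-emacs-directory))
```

With use-package, the same settings could instead go in a :custom section of the declaration.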

Usage

keymap: emt-mode-map

It remaps forward-word, backward-word, kill-word and backward-kill-word to emt’s versions.
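
The list above does not mention mark-word. If you also want it remapped, a sketch along the following lines may work. It assumes that the package provides the feature emt, that emt-mode-map is an ordinary keymap, and that emt-mark-word is an interactive command.

```elisp
;; Sketch: additionally remap `mark-word' while `emt-mode' is active.
;; Assumptions: the feature is named `emt', and `emt-mark-word' is a command.
(with-eval-after-load 'emt
  (define-key emt-mode-map [remap mark-word] #'emt-mark-word))
```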

Minor mode

emt-mode calls emt-ensure, which loads the dynamic module, and sets up emt-mode-map.

Functions

emt-word-at-point-or-forward

Return the word at point. If point is at a word boundary, return the word after it.

emt-word-at-point-or-backward

Return the word at point. If point is at a word boundary, return the word before it.

emt-compile-module

Compile and copy the module to emt-lib-path.

It takes an optional argument PATH, the path to the dynamic library. By default, PATH is emt-lib-path.

emt-download-module

Download the dynamic module from https://github.com/roife/emt/releases/download/<VERSION>/libEMT.dylib.

If PATH is non-nil, download the module to PATH.

emt-ensure

Load the dynamic module.

emt-split

Split a string into words.

Return a list of word bounds, each a cons of the beginning position and the ending position of a word.

emt-forward-word

CJK-compatible version of forward-word.

emt-backward-word

CJK-compatible version of backward-word.

emt-kill-word

CJK-compatible version of kill-word.

emt-backward-kill-word

CJK-compatible version of backward-kill-word.

emt-mark-word

CJK-compatible version of mark-word.

Acknowledgements

This package is inspired by jieba.el, a Chinese tokenizer for Emacs that uses jieba.

The dynamic module uses emacs-swift-module, which provides an interface for writing Emacs dynamic modules in Swift.
