unfoldingWord/string-punctuation-tokenizer

string-punctuation-tokenizer

Small library that provides functions to tokenize a string into an array of words with or without punctuation

Setup

npm install string-punctuation-tokenizer

Usage

const stringTokenizer = require('string-punctuation-tokenizer'); // then call stringTokenizer.tokenize(...)

Or, using ES6 modules:

import {tokenize} from 'string-punctuation-tokenizer';

Tokenize with punctuation

import {tokenize} from 'string-punctuation-tokenizer';
const words = tokenize({text: 'Hello world, my name is Manny!', includePunctuation: true});
// words = ["Hello", "world", ",", "my", "name", "is", "Manny", "!"]
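
When punctuation is included, it is often useful to separate the word tokens from the punctuation tokens afterwards. The following is a minimal sketch that works on the documented output array above, so it runs without the package installed. The `isPunctuation` helper is hypothetical and not part of the library; it assumes a token is punctuation when it contains no letters or digits.

```javascript
// Token array copied from the documented output of tokenize() with includePunctuation: true.
const tokens = ['Hello', 'world', ',', 'my', 'name', 'is', 'Manny', '!'];

// Hypothetical helper (not part of the library):
// treat a token as punctuation when it contains no Unicode letter or digit.
const isPunctuation = (token) => !/[\p{L}\p{N}]/u.test(token);

// Split the tokens into two arrays, preserving their original order.
const wordsOnly = tokens.filter((token) => !isPunctuation(token));
const punctuationOnly = tokens.filter(isPunctuation);

console.log(wordsOnly);
console.log(punctuationOnly);
```

For this input, `wordsOnly` matches the array the library documents for tokenizing without punctuation.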

Tokenize without punctuation

import {tokenize} from 'string-punctuation-tokenizer';
const words = tokenize({text: 'Hello world, my name is Manny!'}); // includePunctuation is omitted
// words = ["Hello", "world", "my", "name", "is", "Manny"]
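
To make the difference between the two modes concrete, here is a rough, self-contained sketch of how this kind of tokenization could be done with a regular expression. It is not the library's implementation: `sketchTokenize` is a hypothetical stand-in that only mirrors the documented `text` and `includePunctuation` options, and it assumes that words are runs of letters and digits and that every other non-whitespace character is a single punctuation token. The real `tokenize` may treat apostrophes, numbers, and other scripts differently.

```javascript
// Hypothetical stand-in for illustration only; NOT the library's implementation.
// Assumption: a word is a run of Unicode letters/digits, and any other
// non-whitespace character is its own punctuation token.
function sketchTokenize({ text, includePunctuation = false }) {
  const pattern = includePunctuation
    ? /[\p{L}\p{N}]+|[^\s\p{L}\p{N}]/gu // words, or single punctuation characters
    : /[\p{L}\p{N}]+/gu; // words only
  // String.prototype.match returns null when nothing matches, so fall back to [].
  return text.match(pattern) || [];
}

const sample = 'Hello world, my name is Manny!';
const withPunctuation = sketchTokenize({ text: sample, includePunctuation: true });
const withoutPunctuation = sketchTokenize({ text: sample });

console.log(withPunctuation);
console.log(withoutPunctuation);
```

On the sample sentence, the sketch yields the same two arrays shown in the examples above. For real use, prefer the library's `tokenize`.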

Documentation

See the detailed documentation and a live WYSIWYG playground here: https://string-punctuation-tokenizer.netlify.app/#/Tokenize