node.js port ? #94
Comments
Hi @chopinml Thanks for the kind words. I'm not aware of any JS port; you may want to look at the Ruby version's README for that. As you must have seen, pySBD is heavily inspired by pragmatic_segmenter, so all the regex rules are implemented in Python as well, although they could behave differently because the regex engines of Python and Ruby differ. In addition to the existing functionality of pragmatic_segmenter, I've added a few more features that are useful to the NLP community, such as getting the character offsets of sentences. You may not find such features in other implementations. Further pySBD development happens on a rolling basis: as new issues/PRs are submitted, I review them and merge them into subsequent releases. Such issues/PRs may or may not relate to functionality that pySBD lacks but other implementations have. Hope the above answers help! Thanks
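As an illustration of the character-offset feature mentioned above, here is a minimal sketch. It assumes pySBD is installed and uses its `char_span=True` option, which has to be combined with `clean=False`. The helper name `sentence_offsets` and its optional `segmenter` parameter are hypothetical and exist only for this example.

```python
def sentence_offsets(text, segmenter=None):
    """Return (sentence, start, end) tuples for each sentence in `text`.

    `segmenter` is a hypothetical convenience parameter: pass an existing
    pysbd.Segmenter to reuse it across calls, or leave it as None to
    build a default English one.
    """
    if segmenter is None:
        # Imported lazily so the helper can be defined without pySBD installed.
        import pysbd

        # char_span=True makes segment() return spans carrying offsets;
        # pySBD requires clean=False in this mode.
        segmenter = pysbd.Segmenter(language="en", clean=False, char_span=True)
    return [(span.sent, span.start, span.end) for span in segmenter.segment(text)]
```

Calling `sentence_offsets("My name is Jonas E. Smith. Please turn to p. 55.")` should give one tuple per sentence, with `start` and `end` indexing into the original string.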
I haven't seen any JS port. I first found the Ruby repo, then found yours and the C# port from that README section 😃 I guess the NLP community relies heavily on Python, so node.js libraries are not very common for NLP tasks, but I'm actually trying to build a bilingual corpus from the web. So I realized I would need a sentence aligner and went browsing GitHub. Do you have any experience with node.js? Is it faster than Python for web crawling and text operations?
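@chopinml you can always spawn Python from Node.js and read the result from its standard output. This is my Python wrapper:

# ftfy-wrapper.py
import sys
import ftfy

# The text to repair arrives as the first command-line argument.
text = sys.argv[1]
clean = ftfy.fix_text(text)
print(clean)
sys.stdout.flush()

This is my NodeJS code:

import { spawn } from 'node:child_process';
import path from 'node:path';
import { fileURLToPath } from 'node:url';

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);
const arg_ftfy = path.join(__dirname, './ftfy-wrapper.py');
// Activating a venv in a separate child process does not affect later spawns,
// so call the venv's own interpreter directly (Windows layout; use bin/python elsewhere).
const pythonBin = 'python-ftfy/.venv/Scripts/python';

export function mojibakeFixer(arg_text) {
  return new Promise((resolve, reject) => {
    // Extend the parent environment instead of replacing it, so PATH and friends survive.
    const ftfyApp = spawn(pythonBin, [arg_ftfy, arg_text], {
      env: { ...process.env, PYTHONIOENCODING: 'utf8' },
    });
    let out = '';
    let err = '';
    ftfyApp.stdout.setEncoding('utf8');
    ftfyApp.stderr.setEncoding('utf8');
    ftfyApp.stdout.on('data', (data) => { out += data; });
    ftfyApp.stderr.on('data', (data) => { err += data; });
    // Fires when the process cannot be started at all.
    ftfyApp.on('error', reject);
    // Settle only after the process has exited, so multi-chunk output is not truncated.
    ftfyApp.on('close', (code) => {
      if (code === 0) resolve(out);
      else reject(new Error(err || `child process exited with code ${code}`));
    });
  });
}

You can make pySBD runnable from Node.js with a simple wrapper script like mine. I hope this helps.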
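The same approach could be applied to pySBD. The sketch below mirrors ftfy-wrapper.py above and is only an assumption about how such a wrapper might look: the file name `pysbd-wrapper.py` and the helper `split_to_json` are hypothetical, and pySBD is assumed to be installed in the interpreter that runs the script. The sentences are printed as a JSON array so the Node.js side can parse them with `JSON.parse`.

```python
# pysbd-wrapper.py (hypothetical file name, mirroring ftfy-wrapper.py)
import json
import sys


def split_to_json(text, segment):
    """Segment `text` with the callable `segment` and return a JSON array string."""
    # ensure_ascii=False keeps non-ASCII characters readable in the output.
    return json.dumps(list(segment(text)), ensure_ascii=False)


# Run only when a text argument is supplied, as in ftfy-wrapper.py.
if __name__ == "__main__" and len(sys.argv) > 1:
    # Imported here so split_to_json can be reused without pySBD installed.
    import pysbd

    segmenter = pysbd.Segmenter(language="en", clean=False)
    print(split_to_json(sys.argv[1], segmenter.segment))
    sys.stdout.flush()
```

Passing the text as a command-line argument keeps the script parallel to the ftfy wrapper, but very long inputs can exceed the operating system's argument length limit. Reading from `sys.stdin` instead would be one way around that.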
Hello @nipunsadvilkar,
Thank you for your efforts to port the Ruby library to Python.
Do you see any benefit in porting it to JavaScript (node.js) as well? And I wonder three things
Congrats on your effort!