Labelit

Labelit is a simple tool for fast anonymization or pseudonymization and labeling of text data. This is particularly relevant if you plan to work with text corpora (e.g. Twitter posts) that contain many personal references. Labelit lets you remove personal information while preserving the informational and semantic content. In addition, you can label text corpora with self-selected classes, a necessary preprocessing step for many kinds of quantitative analysis.
Just create a project, import your data, create categories for named entities and start replacing sensitive information with your predefined words.
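
As an illustration of the replacement idea described above, here is a minimal sketch in plain Node.js. It is not Labelit's actual code: the `wordlist` entries, the bracketed placeholder style and the `pseudonymize` helper are all assumptions made for this example.

```javascript
// Hypothetical wordlist: sensitive term -> predefined replacement word.
// The entries and the bracketed placeholder style are illustrative assumptions.
const wordlist = new Map([
  ['Jane Doe', '[PERSON_1]'],
  ['Berlin', '[CITY_1]'],
]);

// Escape characters that have a special meaning in regular expressions.
function escapeRegExp(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Replace every whole-word, case-sensitive occurrence of a listed term.
function pseudonymize(text, list) {
  // Longest terms first, so "Jane Doe" wins over a shorter entry such as "Jane".
  const terms = [...list.keys()].sort((a, b) => b.length - a.length);
  if (terms.length === 0) return text;
  const pattern = new RegExp(`\\b(?:${terms.map(escapeRegExp).join('|')})\\b`, 'g');
  return text.replace(pattern, (match) => list.get(match));
}

console.log(pseudonymize('Jane Doe moved to Berlin last year.', wordlist));
// prints "[PERSON_1] moved to [CITY_1] last year."
```

Because the same term always maps to the same replacement, the text stays readable and references stay consistent, which is what distinguishes pseudonymization from simply deleting the names.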

Warning: The software is still in early development and currently only works in a local environment. Do not use it on a remote server; there is no file upload yet. Labelit does not work in Safari or IE (no support planned).

Features

  • Create projects with custom categories, classifications and wordlists
  • Import a CSV/JSON file or a folder with raw text files
  • Texts and replaced words are stored AES-256 encrypted in the database
  • Easy-to-use text editor: single-click selection, keyboard shortcuts and duplicate detection
  • All texts can be checked against previously found sensitive words (project-based wordlist)
  • Lightweight single-page application
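
To give a rough idea of what AES-256 encryption at rest can look like in Node.js, here is a minimal sketch using the built-in `crypto` module. It is an illustration only: the cipher mode (GCM), the scrypt key derivation and the helper names `deriveKey`, `encrypt` and `decrypt` are assumptions and may differ from what Labelit actually does.

```javascript
const crypto = require('crypto');

// Derive a 32-byte (256-bit) key from a password. The salt must be stored
// alongside the data so the same key can be derived again later.
function deriveKey(password, salt) {
  return crypto.scryptSync(password, salt, 32);
}

// Encrypt a UTF-8 string with AES-256-GCM (assumed mode for this sketch).
function encrypt(plaintext, key) {
  const iv = crypto.randomBytes(12); // fresh 96-bit nonce for every message
  const cipher = crypto.createCipheriv('aes-256-gcm', key, iv);
  const ciphertext = Buffer.concat([cipher.update(plaintext, 'utf8'), cipher.final()]);
  return { iv, ciphertext, tag: cipher.getAuthTag() };
}

// Decrypt a record produced by encrypt(); throws if the data was tampered with.
function decrypt({ iv, ciphertext, tag }, key) {
  const decipher = crypto.createDecipheriv('aes-256-gcm', key, iv);
  decipher.setAuthTag(tag);
  return Buffer.concat([decipher.update(ciphertext), decipher.final()]).toString('utf8');
}

const salt = crypto.randomBytes(16);
const key = deriveKey('example-password', salt); // placeholder password
const record = encrypt('Jane Doe lives in Berlin.', key);
console.log(decrypt(record, key)); // prints "Jane Doe lives in Berlin."
```

In a setup like this, only the `iv`, `ciphertext` and `tag` fields would be written to the database, so a copy of the database alone does not reveal the texts.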

Installation

Requirements

  • Node.js 12.13.0+
  • MongoDB 4.2+

Make sure that Node.js is installed and MongoDB is running, then run:

git clone https://github.com/Mirobit/Labelit.git
cd Labelit
npm install --production

Now open the .env.example file and follow the instructions.

To start the app, simply run:

npm start

Open http://localhost:8000/ in your browser.

Use admin as username and the password from the .env file to sign in.

How To

See GUIDE.md and the examples folder for how the input data needs to be structured. See MongoDB.md to find out how to change the MongoDB database location.

ToDo

  • User system
  • Be able to host server remotely and upload text data
  • Schema validation
  • Enable classification of texts (for ML)
  • Better error handling
  • Subfolder support
  • Import single file (CSV)
  • Testing with ava