Gensim provides a platform for training word embeddings on your own data. Follow the steps below.
- Create a virtual environment.
virtualenv --python=<your python path> <name_env>
- Activate the environment.
source <name_env>/bin/activate
- Install the requirements.
pip install -r requirements.txt
- Make sure you have created a .env file that defines PROJECT_PATH, MODEL_NAME, LANGUAGE (the language of the text you are training on), and the path to your data file.
Your data should be a text file of clean, preprocessed text, one sentence per line.
- Now train the model by running the run file.
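
For orientation, here is a minimal sketch of what training on a one-sentence-per-line file can look like. It is not the code in the run file. It assumes a Word2Vec model and gensim 4.x (where the dimension parameter is `vector_size`; gensim 3.x calls it `size`). The function names `read_sentences` and `train` and the hyperparameter values are illustrative assumptions.

```python
def read_sentences(data_path):
    """Yield one whitespace-tokenized sentence per non-empty line."""
    with open(data_path, encoding="utf-8") as handle:
        for line in handle:
            tokens = line.split()
            if tokens:
                yield tokens


def train(data_path, model_path):
    """Train a Word2Vec model on the corpus and save it (hypothetical helper)."""
    # Imported here so the corpus reader above stays usable without gensim.
    from gensim.models import Word2Vec

    sentences = list(read_sentences(data_path))
    # Assumed hyperparameters; tune them for your corpus.
    model = Word2Vec(
        sentences=sentences,
        vector_size=100,  # gensim 4.x name; gensim 3.x uses `size`
        window=5,
        min_count=5,      # words seen fewer than 5 times are dropped
        workers=4,
    )
    model.save(str(model_path))
    return model
```

On a very small corpus, lower `min_count`, otherwise the vocabulary may end up empty.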
## Loading Gensim Model ##
After the run file finishes, your model is saved, and you can load the Gensim model with the module provided inside that file.
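
If you prefer to load the model directly, the sketch below shows one way to do it. It assumes the model is a Word2Vec model that was saved with `model.save()`; `load_model` is a hypothetical helper name, not the module shipped in this repository.

```python
from pathlib import Path


def load_model(model_path):
    """Load a saved gensim Word2Vec model (assumes it was saved with model.save())."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"No model file found at {path}")
    # Imported after the path check so a bad path fails fast.
    from gensim.models import Word2Vec

    return Word2Vec.load(str(path))
```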
## Fetching Word Vectors ##
After loading the Gensim model, you can fetch the vector of a single word.
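
A minimal sketch of the lookup, assuming the loaded model exposes its vectors as `model.wv` (as gensim Word2Vec models do). `get_vector` is a hypothetical helper that returns `None` for out-of-vocabulary words instead of raising `KeyError`.

```python
def get_vector(keyed_vectors, word):
    """Return the vector for `word`, or None if the word is not in the vocabulary."""
    if word in keyed_vectors:
        return keyed_vectors[word]
    return None
```

Typical usage would be `get_vector(model.wv, "king")`, where `"king"` stands in for any word from your training data.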
## Getting Similar Words ##
Refer to the module return_similar.
For more features, see the following tutorial.
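
As a rough illustration of what such a lookup involves (the actual return_similar module may differ), gensim's `most_similar` method on `model.wv` returns a list of `(word, cosine_similarity)` pairs. The wrapper name `similar_words` below is an assumption.

```python
def similar_words(keyed_vectors, word, topn=5):
    """Return up to `topn` (word, similarity) pairs, or [] for unknown words."""
    if word not in keyed_vectors:
        return []
    return keyed_vectors.most_similar(positive=[word], topn=topn)
```

Typical usage would be `similar_words(model.wv, "king", topn=10)`.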
## Visualization of Gensim Word Vector Modules in TensorFlow ##
To visualize the trained embedding in TensorBoard, run the file. Make sure you have a valid model path.
After the model checkpoint of the embedding and the corresponding metadata file containing the words have been created, start TensorBoard:
tensorboard --logdir=tensorboard/
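
The metadata file mentioned above is a plain text file with one word per line, in the same order as the rows of the embedding matrix. A single-column metadata file has no header row. The sketch below covers only that part and leaves checkpoint creation to the repository's script; `write_metadata` and the `metadata.tsv` file name are assumptions.

```python
from pathlib import Path


def write_metadata(words, log_dir="tensorboard", filename="metadata.tsv"):
    """Write one word per line so TensorBoard can label each embedding row."""
    out_dir = Path(log_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    path = out_dir / filename
    with open(path, "w", encoding="utf-8") as handle:
        for word in words:
            handle.write(f"{word}\n")
    return path
```

With gensim 4.x the vocabulary in row order is available as `model.wv.index_to_key` (gensim 3.x names it `index2word`), so a call could look like `write_metadata(model.wv.index_to_key)`.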