Code snippets for training word embeddings with gensim and visualizing the result in TensorBoard.

sarmilaupadhyaya/WordEmbeddingTrainingVisualization

Training Word Embedding using Gensim

Gensim provides a platform for training word embeddings on your own data. Follow the steps below.

- Create a virtual environment.

    virtualenv --python=<your python path> <name_env>

- Activate the environment.

    source <name_env>/bin/activate

- Install the requirements.

    pip install -r requirements.txt


- Create a .env file that defines PROJECT_PATH, MODEL_NAME, LANGUAGE (the language of the text you are training on), and the path to your data file.

Your data should be a text file of clean, preprocessed text, one sentence per line.

- Now train the model by running run.py.
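
The configuration, data, and training steps above can be sketched roughly as follows. This is not the repository's `run.py`, only a minimal illustration under stated assumptions: the key name `DATA_PATH`, the helper names, the model location (directly under `PROJECT_PATH`), and the hyperparameters are all hypothetical, and the `vector_size` argument assumes gensim 4.x.

```python
import os


def read_env(path):
    """Parse KEY=VALUE lines from a .env file, skipping blanks and comments."""
    settings = {}
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            # Drop surrounding whitespace and optional quotes around the value.
            settings[key.strip()] = value.strip().strip("'\"")
    return settings


class SentenceStream:
    """Re-iterable stream of token lists, one per non-empty line of the file."""

    def __init__(self, path):
        self.path = path

    def __iter__(self):
        with open(self.path, encoding="utf-8") as handle:
            for line in handle:
                tokens = line.split()  # data is assumed to be preprocessed already
                if tokens:
                    yield tokens


def train(settings):
    """Train a Word2Vec model on the data file and save it to disk."""
    # Imported lazily so the helpers above can be used without gensim installed.
    from gensim.models import Word2Vec

    # DATA_PATH is a hypothetical key name for "path of your data file".
    sentences = SentenceStream(settings["DATA_PATH"])
    # Illustrative hyperparameters, not the repository's settings.
    model = Word2Vec(sentences=sentences, vector_size=100, window=5,
                     min_count=1, workers=4)
    # Assumed layout: the model is saved directly under PROJECT_PATH.
    model.save(os.path.join(settings["PROJECT_PATH"], settings["MODEL_NAME"]))
    return model
```

`SentenceStream` is a class rather than a generator function because Word2Vec iterates over the corpus more than once (vocabulary pass plus training epochs), so the input has to be restartable.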




## Loading Gensim Model ##
After you run run.py, the trained model is saved to disk. You can then load it with the loading module provided in that same file.
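
As a rough sketch of what loading looks like with gensim directly (the helper names are hypothetical, and it is an assumption that the model was saved with `Word2Vec.save` directly under `PROJECT_PATH`):

```python
import os


def model_path(project_path, model_name):
    """Build the model file path (assumed layout: PROJECT_PATH/MODEL_NAME)."""
    return os.path.join(project_path, model_name)


def load_model(path):
    """Load a model that was previously saved with Word2Vec.save."""
    # Imported lazily so this module can be imported without gensim installed.
    from gensim.models import Word2Vec

    return Word2Vec.load(path)
```

For example, `load_model(model_path(os.environ["PROJECT_PATH"], os.environ["MODEL_NAME"]))` would load the model if those variables have been exported into the environment.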

## Fetching Word Vectors ##
After loading the gensim model, you can fetch the vector of a single word as follows.
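
A minimal sketch of a vector lookup, assuming `model` is a loaded gensim `Word2Vec` model (the helper name `get_vector` is hypothetical, not part of the repository):

```python
def get_vector(model, word):
    """Return the embedding for `word`, or None if it is out of vocabulary."""
    keyed_vectors = model.wv  # the KeyedVectors object holding the embeddings
    if word not in keyed_vectors:
        # Indexing an unknown word would raise KeyError, so check first.
        return None
    return keyed_vectors[word]
```

Returning `None` for unknown words is a design choice for this sketch; letting the `KeyError` propagate is equally reasonable when a missing word should be treated as a bug.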

## Getting Similar Words ##

Refer to the return_similar module.
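
For orientation, a similarity query in gensim itself might look like the sketch below. It is not the repository's `return_similar` module; the helper name `similar_words` and its defaults are hypothetical, and `model` is assumed to be a loaded `Word2Vec` model.

```python
def similar_words(model, word, topn=5):
    """Return up to `topn` (word, cosine similarity) pairs closest to `word`."""
    keyed_vectors = model.wv
    if word not in keyed_vectors:
        # Unknown words have no vector, so there is nothing to compare against.
        return []
    return keyed_vectors.most_similar(word, topn=topn)
```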

For more features, follow this tutorial:
[Gensim](https://radimrehurek.com/gensim/auto_examples/tutorials/run_word2vec.html#sphx-glr-auto-examples-tutorials-run-word2vec-py)


## Visualization of Gensim Word Vector Modules in Tensorflow ##

To visualize the trained embedding in TensorBoard, run the get_tensorboard_representation.py file. Make sure the model path is valid.

Once the model checkpoint of the embedding and the corresponding metadata file containing the words have been created, start TensorBoard to visualize them:

tensorboard --logdir=tensorboard/
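
For context, the checkpoint and metadata file described above could be produced along the following lines. This is a sketch, not the repository's `get_tensorboard_representation.py`: the function names are hypothetical, TensorFlow 2.x and gensim 4.x are assumed, and the `tensorboard` log directory simply mirrors the command above.

```python
import os


def write_metadata(words, log_dir, filename="metadata.tsv"):
    """Write one word per line, in the same order as the embedding rows.

    A single-column metadata file must not contain a header row.
    """
    os.makedirs(log_dir, exist_ok=True)
    path = os.path.join(log_dir, filename)
    with open(path, "w", encoding="utf-8") as handle:
        for word in words:
            handle.write(f"{word}\n")
    return path


def export_for_projector(model, log_dir="tensorboard"):
    """Save the vectors as a checkpoint plus a projector config in log_dir."""
    # Imported lazily: these are heavy dependencies needed only for the export.
    import tensorflow as tf
    from tensorboard.plugins import projector

    # gensim 4.x attribute name; gensim 3.x used `index2word` instead.
    write_metadata(model.wv.index_to_key, log_dir)

    # Store the embedding matrix in a checkpoint that the projector can read.
    weights = tf.Variable(model.wv.vectors)
    checkpoint = tf.train.Checkpoint(embedding=weights)
    checkpoint.save(os.path.join(log_dir, "embedding.ckpt"))

    config = projector.ProjectorConfig()
    embedding = config.embeddings.add()
    # Name under which tf.train.Checkpoint stores the `embedding` attribute.
    embedding.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
    # Path is relative to log_dir.
    embedding.metadata_path = "metadata.tsv"
    projector.visualize_embeddings(log_dir, config)
```

The word order in the metadata file has to match the row order of the embedding matrix, which is why both are taken from the same `model.wv` object.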



