word2vec-google-news-300
Released by menshikh-iv on 09 Nov 08:50
Pre-trained vectors trained on part of the Google News dataset (about 100 billion words). The model contains 300-dimensional vectors for 3 million words and phrases. The phrases were obtained using a simple data-driven approach described in "Distributed Representations of Words and Phrases and their Compositionality".
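The "simple data-driven approach" from the paper scores each adjacent word pair as `(count(ab) - δ) / (count(a) * count(b))` and joins pairs that score above a threshold into a single token such as `New_York`. A minimal pure-Python sketch (the `delta` and `threshold` values here are illustrative, not the ones used to build this model):

```python
from collections import Counter

def find_phrases(sentences, delta=1, threshold=0.005):
    """Score adjacent word pairs with the formula from Mikolov et al. (2013):
    score(a, b) = (count(ab) - delta) / (count(a) * count(b)).
    Pairs scoring above `threshold` qualify as phrases."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(p for s in sentences for p in zip(s, s[1:]))
    return {
        (a, b)
        for (a, b), n in bigrams.items()
        if (n - delta) / (unigrams[a] * unigrams[b]) > threshold
    }

# Toy corpus: "new york" recurs as a unit, "new car" does not.
sentences = [
    ["new", "york", "is", "big"],
    ["i", "love", "new", "york"],
    ["new", "york", "city"],
    ["a", "new", "car"],
]
print(find_phrases(sentences))  # {('new', 'york')}
```

The `delta` discount prevents pairs seen only once from being promoted; in the 3-million-token vocabulary here, surviving phrases appear as single underscore-joined entries.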
| Feature | Description |
|---|---|
| File size | 1.6 GB |
| Number of vectors | 3,000,000 |
| Dimension | 300 |
Read more:
- https://code.google.com/archive/p/word2vec/
- Efficient Estimation of Word Representations in Vector Space
- Distributed Representations of Words and Phrases and their Compositionality
- Linguistic Regularities in Continuous Space Word Representations
Example
```python
import gensim.downloader as api

model = api.load("word2vec-google-news-300")
model.most_similar(positive=["king", "woman"], negative=["man"])
# Output:
# [('queen', 0.7118192911148071),
#  ('monarch', 0.6189674139022827),
#  ('princess', 0.5902431011199951),
#  ('crown_prince', 0.5499460697174072),
#  ('prince', 0.5377321243286133),
#  ('kings', 0.5236844420433044),
#  ('Queen_Consort', 0.5235945582389832),
#  ('queens', 0.518113374710083),
#  ('sultan', 0.5098593235015869),
#  ('monarchy', 0.5087411999702454)]
```