word2vec-google-news-300
Released by menshikh-iv on 09 Nov 08:50
Pre-trained vectors trained on part of the Google News dataset (about 100 billion words). The model contains 300-dimensional vectors for 3 million words and phrases. The phrases were obtained using a simple data-driven approach described in "Distributed Representations of Words and Phrases and their Compositionality".
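The "simple data-driven approach" from the paper scores each adjacent word pair as `(count(ab) - δ) / (count(a) * count(b))` and joins pairs that score above a threshold into a single token such as `New_York`. A minimal pure-Python sketch (the `delta` and `threshold` values here are illustrative, not the ones used to build this model):

```python
from collections import Counter

def find_phrases(sentences, delta=1, threshold=0.005):
    """Score adjacent word pairs with the formula from Mikolov et al. (2013):
    score(a, b) = (count(ab) - delta) / (count(a) * count(b)).
    Pairs scoring above `threshold` qualify as phrases."""
    unigrams = Counter(w for s in sentences for w in s)
    bigrams = Counter(p for s in sentences for p in zip(s, s[1:]))
    return {
        (a, b)
        for (a, b), n in bigrams.items()
        if (n - delta) / (unigrams[a] * unigrams[b]) > threshold
    }

# Toy corpus: "new york" recurs as a unit, "new car" does not.
sentences = [
    ["new", "york", "is", "big"],
    ["i", "love", "new", "york"],
    ["new", "york", "city"],
    ["a", "new", "car"],
]
print(find_phrases(sentences))  # {('new', 'york')}
```

The `delta` discount prevents pairs seen only once from being promoted; in the 3-million-token vocabulary here, surviving phrases appear as single underscore-joined entries.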
| Feature | Description |
|---|---|
| File size | 1.6 GB |
| Number of vectors | 3,000,000 |
| Dimension | 300 |
Read more:
- https://code.google.com/archive/p/word2vec/
- Efficient Estimation of Word Representations in Vector Space
- Distributed Representations of Words and Phrases and their Compositionality
- Linguistic Regularities in Continuous Space Word Representations
Example
```python
import gensim.downloader as api

model = api.load("word2vec-google-news-300")
model.most_similar(positive=["king", "woman"], negative=["man"])
# Output:
# [('queen', 0.7118192911148071),
#  ('monarch', 0.6189674139022827),
#  ('princess', 0.5902431011199951),
#  ('crown_prince', 0.5499460697174072),
#  ('prince', 0.5377321243286133),
#  ('kings', 0.5236844420433044),
#  ('Queen_Consort', 0.5235945582389832),
#  ('queens', 0.518113374710083),
#  ('sultan', 0.5098593235015869),
#  ('monarchy', 0.5087411999702454)]
```