Generating Fig. 1 also produces Fig. 8 and Table 2.
python src/
🚨 There was a typo in the definition of
Typo (in the paper):
python src/
python src/
python src/
python src/
python src/
python src/
python src/
python src/
Calculate statistical measures of word embeddings:
python src/
Generate the word-to-token count dictionary:
python src/
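As a rough illustration of what a word-to-token count dictionary might contain, the sketch below maps each distinct word to the number of subword tokens it splits into. This is an assumption about the dictionary's meaning, not the repository's actual script; `build_word2token_count` and `toy_tokenize` are hypothetical names, and the toy tokenizer stands in for a real subword tokenizer such as the one for `bert-base-uncased`.

```python
from typing import Callable, Dict, Iterable, List


def build_word2token_count(
    words: Iterable[str],
    tokenize: Callable[[str], List[str]],
) -> Dict[str, int]:
    """Map each distinct word to the number of tokens the tokenizer yields for it."""
    return {word: len(tokenize(word)) for word in set(words)}


def toy_tokenize(word: str) -> List[str]:
    # Hypothetical stand-in for a real subword tokenizer:
    # splits a word into chunks of at most 4 characters.
    return [word[i:i + 4] for i in range(0, len(word), 4)]


# Duplicates in the input collapse to a single dictionary entry.
word2token_count = build_word2token_count(["cat", "embedding", "cat"], toy_tokenize)
print(word2token_count)
```

Passing the tokenizer as a callable keeps the sketch independent of any particular library; a real tokenizer's tokenize function could be supplied in place of `toy_tokenize`.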
Place the downloaded data in the following structure:
└── word_stats
    └── bookcorpus_train_lt64_pct001_seed0
        ├── bert-base-uncased.pkl
        └── bookcorpus_bert-base-uncased_word2token_count.pkl
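Before plotting, it may help to confirm that the downloaded files are where the tree above expects them. The following is a minimal sketch, not part of the repository: `missing_files` is a hypothetical helper, and the root directory that contains `word_stats` is an assumption, since the listing does not show its parent.

```python
from pathlib import Path
from typing import List

# Relative paths taken from the directory listing above.
EXPECTED_FILES = [
    "word_stats/bookcorpus_train_lt64_pct001_seed0/bert-base-uncased.pkl",
    "word_stats/bookcorpus_train_lt64_pct001_seed0/"
    "bookcorpus_bert-base-uncased_word2token_count.pkl",
]


def missing_files(root: Path) -> List[Path]:
    """Return the expected files that do not exist under `root`."""
    return [root / rel for rel in EXPECTED_FILES if not (root / rel).is_file()]


# Assumption: `root` is whichever directory holds `word_stats`; "." is a placeholder.
for path in missing_files(Path(".")):
    print(f"missing: {path}")
```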
Plot Fig. 22:
python src/
🚨 A bug was fixed: the dots are now plotted in the order 1, 2, 3, 4+, so the figure differs slightly from the one in the paper.
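To make the plotting order concrete, here is a small sketch of grouping words into the buckets 1, 2, 3, and 4+ and iterating over them in that fixed order. It assumes the labels refer to the number of tokens per word, which the text does not state explicitly; `BUCKET_ORDER`, `bucket_label`, and `group_by_bucket` are hypothetical names, not the repository's code.

```python
from typing import Dict, List

# Fixed drawing order: groups are visited in this sequence,
# so later groups would be drawn on top of earlier ones.
BUCKET_ORDER = ["1", "2", "3", "4+"]


def bucket_label(token_count: int) -> str:
    """Return the bucket label for a token count (assumed to be >= 1)."""
    if token_count < 1:
        raise ValueError("token_count must be at least 1")
    return str(token_count) if token_count < 4 else "4+"


def group_by_bucket(word2token_count: Dict[str, int]) -> Dict[str, List[str]]:
    """Group words by bucket; keys are inserted in BUCKET_ORDER."""
    groups: Dict[str, List[str]] = {label: [] for label in BUCKET_ORDER}
    for word, count in sorted(word2token_count.items()):
        groups[bucket_label(count)].append(word)
    return groups


groups = group_by_bucket({"cat": 1, "embedding": 3, "tokenization": 5, "dog": 1})
for label in BUCKET_ORDER:
    print(label, groups[label])
```

Iterating over an explicit order list, rather than over whatever order the data happens to arrive in, is one way to keep the layering of plotted points deterministic.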