
Norm-and-Variance

Norm of Mean Contextualized Embeddings Determines their Variance
Hiroaki Yamagiwa, Hidetoshi Shimodaira
COLING 2025

fig3.png

Setup

This repository is intended to be run in a Docker environment. If you do not use Docker, install the packages listed in requirements.txt instead.
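
For the non-Docker route, a minimal setup with a virtual environment might look like the following (assuming Python 3 and the requirements.txt at the repository root):

$ python -m venv .venv
$ source .venv/bin/activate
$ pip install -r requirements.txt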

Docker build

Create a Docker image as follows:

$ bash script/docker/build.sh

Environment variable

Set the DOCKER_HOME environment variable to specify the path of the directory to be mounted as the home directory inside the Docker container.

export DOCKER_HOME="path/to/your/docker_home"

Docker run

Run the Docker container by passing the GPU ID as an argument:

$ bash script/docker/run.sh 0

Code

Saving statistical measures of $X_t$

Using Preprocessed Data from the Experiments

Place the downloaded data in the following structure:

output/
├── datasets
│   └── bookcorpus_train_lt64_pct001_seed0.pkl
└── token_stats
    └── bookcorpus_train_lt64_pct001_seed0
        ├── bert-base-uncased.pkl
        ├── bert-large-uncased.pkl
        ├── gpt2-medium.pkl
        ├── gpt2.pkl
        ├── roberta-base.pkl
        └── roberta-large.pkl
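
To check that the files are in place, you can simply list the expected directories, e.g.:

$ ls output/datasets
$ ls output/token_stats/bookcorpus_train_lt64_pct001_seed0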

For Reproducibility

To regenerate the statistical measures, run:

python src/save_token_stats.py --model_name model_name

Supported model_name values are bert-base-uncased, bert-large-uncased, roberta-base, roberta-large, gpt2, and gpt2-medium.
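
To regenerate the statistics for every supported model in one go, a simple shell loop over the names listed above should suffice (it just repeats the command shown above once per model):

for model_name in bert-base-uncased bert-large-uncased roberta-base roberta-large gpt2 gpt2-medium; do
    python src/save_token_stats.py --model_name "$model_name"
done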

PCA Plot in Fig. 1

python src/Fig1_make_pca_scatterplot.py
fig1.png

This script also generates Fig. 8 and Table 2. See README.Appendix.md for more details.

Trade-off between $M(X_t)$ and $V(X_t)$ in Fig. 2

python src/Fig2_make_VXt_on_MXt_scatterplot.py
fig2.png

🚨 Note: The color bar range in the published figure was incorrect. Although the color bar for BERT was displayed, the ranges were not unified across the models. This issue has been fixed here, and its impact is minimal.

C.V. of $Q(X_t)$, regression slopes of $V(X_t)$ on $M(X_t)$, and the corresponding $R^2$ in Fig. 4

python src/Fig4_make_QXtCV_MXtVXtSlope_MXtVXtR2_plot.py
fig4.png

Bar Graphs for $M(X)/Q(X)$, $V_W(X)/Q(X)$, $V_B(X)/Q(X)$ in Fig. 5

python src/Fig5_make_MXVwXVbX_per_QX_bargraph.py
fig5.png

Plot of $V_W(X)/V(X)$ in Fig. 6

python src/Fig6_make_VwX_per_VX_plot.py
fig6.png

Scatter plots of $Q(X_t)$, $M(X_t)$, and $V(X_t)$ against $\log_{10} n_t$ in Fig. 7

python src/Fig7_make_BERTbase_QXt_MXt_VXt_scatterplot.py
fig7.png

References

The code for generating embeddings was inspired by:

Wannasuphoprasit et al. Solving Cosine Similarity Underestimation between High Frequency Words by $\ell_2$ Norm Discounting. ACL 2023 Findings.

We sincerely thank the authors for sharing their LivNLP/cosine-discounting codebase.

Appendix

See README.Appendix.md for the experiments in the Appendix.

Note

  • Since the URLs of the published datasets may change, please cite the GitHub repository URL rather than direct dataset URLs in papers and other references.
  • This directory was created by Hiroaki Yamagiwa.
