This repository contains the supporting code for Semantic Similarity Covariance Matrix Shrinkage, published in Findings of EMNLP 2023. It implements the methods for shrinking covariance matrices using a cosine similarity target.
This project requires Python 3.8 or greater. Clone the repository and install the module:
python3.8 -m pip install .
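The library takes as input a cosine similarity matrix, which can be generated from normalized embeddings. The similarity matrix must have one row and one column per random variable, so assume one embedding per variable: a set of p embeddings of dimension k, stored in a PyTorch [p,k] tensor called embeddings. The similarity matrix can then be built using: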
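import torch
# L2-normalize each embedding (row) so that dot products become cosine similarities
normalized_embeddings = torch.nn.functional.normalize(embeddings, dim=1)
# [p,p] matrix of pairwise cosine similarities
similarity_matrix = normalized_embeddings @ normalized_embeddings.t()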
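As a quick sanity check, the snippet below builds a similarity matrix from random vectors standing in for real embeddings (the sizes and the names n_variables and embedding_dim are hypothetical, chosen only for illustration) and prints its shape. A valid input should be square, symmetric, have ones on the diagonal and entries in [-1, 1].

```python
import torch

# Hypothetical setup: 4 random variables (e.g., stocks), each described by an
# 8-dimensional embedding. Random values stand in for real embeddings here.
torch.manual_seed(0)
n_variables, embedding_dim = 4, 8
embeddings = torch.randn(n_variables, embedding_dim)

# L2-normalize each row so that dot products are cosine similarities.
normalized_embeddings = torch.nn.functional.normalize(embeddings, dim=1)
similarity_matrix = normalized_embeddings @ normalized_embeddings.t()

# One row/column per random variable.
print(similarity_matrix.shape)  # torch.Size([4, 4])
```

Because each row has unit norm, the diagonal entries are (up to rounding) exactly 1, and the matrix is a Gram matrix, hence symmetric positive semi-definite.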
Assuming the random variable observations (e.g., stock price returns) are available as an [N,p] tensor called returns, the shrunk covariance matrix can be computed directly using:
from semantic_shrinkage import SemanticShrinkage
shrunk_covariance_matrix = SemanticShrinkage.from_returns(
returns, similarity_matrix
).get_shrunk_covariance()
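To give an intuition of what shrinkage toward a similarity-based target does, here is a generic linear-shrinkage sketch. It is not the library's implementation: the target construction (keep the sample variances and use the cosine similarities as correlations), the fixed shrinkage intensity of 0.3 and the helper names similarity_target and shrink are all assumptions made for illustration. The library presumably determines the intensity itself.

```python
import torch

def similarity_target(sample_cov: torch.Tensor, similarity: torch.Tensor) -> torch.Tensor:
    # ASSUMPTION: target = D S D, where D holds the sample standard deviations
    # and S is the cosine similarity matrix used in place of correlations.
    std = torch.sqrt(torch.diagonal(sample_cov))
    return similarity * torch.outer(std, std)

def shrink(sample_cov: torch.Tensor, target: torch.Tensor, intensity: float) -> torch.Tensor:
    # Convex combination of target and sample covariance.
    # `intensity` is a hypothetical fixed value in [0, 1].
    if not 0.0 <= intensity <= 1.0:
        raise ValueError("intensity must lie in [0, 1]")
    return intensity * target + (1.0 - intensity) * sample_cov

# Synthetic data: N observations of p random variables, one embedding per variable.
torch.manual_seed(0)
n_obs, n_variables, embedding_dim = 250, 4, 8
returns = 0.01 * torch.randn(n_obs, n_variables, dtype=torch.float64)
embeddings = torch.randn(n_variables, embedding_dim, dtype=torch.float64)
normalized = torch.nn.functional.normalize(embeddings, dim=1)
similarity_matrix = normalized @ normalized.t()

# torch.cov expects variables as rows and observations as columns.
sample_cov = torch.cov(returns.t())
target = similarity_target(sample_cov, similarity_matrix)
shrunk = shrink(sample_cov, target, intensity=0.3)
print(shrunk)
```

Since both the sample covariance and the target are positive semi-definite, any convex combination of them is too. With this particular target, the diagonal (the variances) is left unchanged and only the covariances are pulled toward values implied by the semantic similarities.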
We ❤️ contributions.
Have you had a good experience with this project? Why not share some love and contribute code, or just let us know about any issues you had with it?
We welcome issue reports in this repository's issue tracker; be sure to choose the proper issue template for your issue, so that we can be sure you're providing the necessary information.
Before sending a Pull Request, please make sure you read our Contribution Guidelines.
Please read the LICENSE file.
This project has adopted a Code of Conduct. If you have any concerns about the Code, or behavior which you have experienced in the project, please contact us at [email protected].
Please refer to the project Security Policy.