BERT for Patents yields a 1024-element array, but embedding_v1 is 64 elements #49
How should I generate an embedding equivalent to embedding_v1? BERT for Patents generates a 1024-element embedding, but embedding_v1 is a 64-element embedding.

Comments
The model used to generate embedding_v1 has not been released, and we also haven't released pre-embedded patents with the BERT model in BigQuery. You could experiment with learning a mapping from BERT to embedding_v1 with a linear layer; they should match up well because they're both based on text. embedding_v1 is a set-of-words unigram model.
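A minimal sketch of that suggestion, assuming you already have paired vectors for the same set of patents; the arrays below are random placeholders standing in for real BERT for Patents embeddings and embedding_v1 rows pulled from BigQuery:

```python
import numpy as np

# Placeholder paired data for the same patents (hypothetical):
# bert_vecs: (n_patents, 1024) BERT for Patents embeddings
# v1_vecs:   (n_patents, 64)   embedding_v1 vectors from BigQuery
rng = np.random.default_rng(0)
bert_vecs = rng.normal(size=(1000, 1024)).astype(np.float32)
v1_vecs = rng.normal(size=(1000, 64)).astype(np.float32)

# Fit a linear layer W of shape (1024, 64) by least squares:
# minimize ||bert_vecs @ W - v1_vecs||^2
W, _, _, _ = np.linalg.lstsq(bert_vecs, v1_vecs, rcond=None)

# Project a new BERT embedding into the embedding_v1 space.
new_bert = rng.normal(size=(1, 1024)).astype(np.float32)
approx_v1 = new_bert @ W  # shape (1, 64)
```

A single linear map trained this way won't reproduce embedding_v1 exactly, but it yields compatible 64-element vectors for nearest-neighbor-style comparisons.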
Can you give some insight into how you dealt with the limited window size for BERT?
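One common workaround for BERT's 512-token limit (not necessarily what the maintainers did) is to split a long document into overlapping windows, embed each window, and mean-pool the results. In the sketch below, embed_window is a placeholder for the actual model call, not a function from this repo:

```python
import numpy as np

MAX_TOKENS = 512  # BERT's context limit
STRIDE = 256      # step between consecutive window starts (50% overlap)

def embed_long_text(tokens, embed_window):
    """Mean-pool embeddings of overlapping token windows so a document
    longer than MAX_TOKENS still yields one fixed-size vector.
    embed_window(list_of_tokens) -> np.ndarray is a placeholder."""
    starts = range(0, max(len(tokens) - STRIDE, 1), STRIDE)
    windows = [tokens[i:i + MAX_TOKENS] for i in starts]
    vecs = np.stack([embed_window(w) for w in windows])
    return vecs.mean(axis=0)
```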
Thanks for that quick response.
This repo is great. Thank you! Any plans to release the model that generated embedding_v1 or the BERT pre-embedded patents?