Hi,

First, I would like to thank you for this library :-) I'm really enjoying it.

I tried to tokenize a sequence of around 4K tokens and then feed it to a RoBERTa-based model (CodeBERT), which raised an error because the input exceeds the model's maximum length. I went through the API docs and skimmed the source code, and it appears that truncation is not supported. Note that when I manually truncated the sequence, I was able to feed it to the RoBERTa encoder. For reference, here is roughly the code that I was using:
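(A minimal reconstruction rather than the original snippet: the `microsoft/codebert-base` checkpoint, the use of the `AutoTokenizer`/`AutoEncoder` classes, and the 512-token RoBERTa limit are all assumptions.)

```python
import torch

from curated_transformers.layers import AttentionMask
from curated_transformers.models import AutoEncoder
from curated_transformers.tokenizers import AutoTokenizer

# Checkpoint is an assumption; the report mentions CodeBERT, a RoBERTa-based model.
tokenizer = AutoTokenizer.from_hf_hub(name="microsoft/codebert-base")
encoder = AutoEncoder.from_hf_hub(
    name="microsoft/codebert-base",
    device=torch.device("cpu"),
)

long_input = "def f(x):\n    return x\n" * 1000  # tokenizes to roughly 4K pieces

pieces = tokenizer([long_input])   # PiecesWithIds for a batch of one sequence
ids = torch.tensor(pieces.ids)     # shape (1, seq_len), with seq_len >> 512
mask = AttentionMask(torch.ones_like(ids, dtype=torch.bool))

# Fails: seq_len exceeds the model's maximum length (512 for RoBERTa).
# The manual workaround is to slice first, e.g. ids[:, :512].
output = encoder(ids, mask)
```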
MootezSaaD changed the title from "Truncation sequences that are beyond the model's maximum length" to "Truncation of sequences that are beyond the model's maximum length" on Jan 14, 2024.
Thanks for the report! As you surmised, we don't currently support the truncation of inputs, but that error message can definitely be improved. We'll look into it, but please feel free to contribute a PR if you'd like to sort it out yourself 😃
Just wanted to add that we do support longer sequences when using Curated Transformers through spaCy. We should probably provide something similar in Curated Transformers itself that could be used as an extension.
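For context, the spaCy integration handles long inputs with strided spans: the sequence is split into overlapping windows that each fit the model's maximum length, the windows are encoded separately, and the outputs are stitched back together. A generic sketch of the windowing step (illustrative only, not the library's actual API):

```python
from typing import List

def strided_windows(ids: List[int], max_len: int = 512, stride: int = 256) -> List[List[int]]:
    """Split a long piece-ID sequence into overlapping windows that each
    fit within the encoder's maximum input length."""
    windows = []
    for start in range(0, len(ids), stride):
        windows.append(ids[start : start + max_len])
        if start + max_len >= len(ids):
            break  # the last window already covers the tail of the sequence
    return windows

# A ~4K-token sequence becomes a handful of overlapping 512-token windows.
assert all(len(w) <= 512 for w in strided_windows(list(range(4000))))
```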