Update _load_sbert_model Parameters and Fix Tokenize Padding #122

pascalhuerten · 2024-07-17T11:39:25Z

Summary

This pull request addresses two significant updates to the InstructorEmbedding functionality to improve compatibility and performance when using sentence-transformers version 3.0.1.

Changes Introduced:

Refactor: Added Missing Parameters to _load_sbert_model for Enhanced Compatibility
- Parameters added:
  - local_files_only=False
  - model_kwargs=None
  - tokenizer_kwargs=None
  - config_kwargs=None
- These parameters were missing and are now included to ensure compatibility with sentence-transformers 3.0.1.
Refactor: Updated tokenize Method's Padding Parameter Back to True
- Changed the padding parameter from "max_length" back to True.
- This modification addresses a significant performance bottleneck observed in encoding operations, particularly impacting softmax and linear layers.
- The performance profile comparison underscores the benefit of this change.

Performance Comparison:

With padding=True:

ncalls  tottime  percall  cumtime  percall  filename:lineno(function)
145     0.558    0.004    0.558    0.004    {built-in method torch._C._nn.linear}
24      0.083    0.003    0.440    0.018    .venv\lib\site-packages\transformers\models\t5\modeling_t5.py:279(forward)
24      0.041    0.002    0.041    0.002    {method 'softmax' of 'torch._C.TensorBase' objects}

With padding="max_length":

24      3.274    0.136    3.274    0.136    {method 'softmax' of 'torch._C.TensorBase' objects}
145     3.253    0.022    3.253    0.022    {built-in method torch._C._nn.linear}
24      0.625    0.026    5.479    0.228    .venv\lib\site-packages\transformers\models\t5\modeling_t5.py:445(forward)

Additional Note

There might have been a valid reason to choose padding="max_length" for better performance during parallelization. However, I did not test for this case and therefore cannot speak for it. If so, there should at least be a parameter to choose the padding strategy.

Testing:

The updated methods were tested using sentence-transformers 3.0.1 on a CPU with a batch size of 1 on a single sentence to validate the functional and performance improvements.

Please review the changes and let me know if any further adjustments are necessary.

Thank you for considering this request. I look forward to your feedback.

…d dependency to correspond to newer feature, update target name

…to fix error: Instructor._load_sbert_model() got an unexpected keyword argument 'local_files_only'

padding="max_length" lead to 10x slower encoding speed on cpu. Especially linear and softmax function were impacted by this. Dynamic padding solved this performance bottleneck.

BBC-Esq and others added 7 commits April 13, 2024 16:42

fix class names ASAP!!!

6288c96

modify to fix not being able to use local model

30a598d

refactor: correct library name and logic for local files only, update…

188ee78

…d dependency to correspond to newer feature, update target name

refactor: returning the _ to the name

56c7449

chore: increased version

1c0286e

refactor: update _load_sbert_model method with additional parameters …

229b653

…to fix error: Instructor._load_sbert_model() got an unexpected keyword argument 'local_files_only'

refactor: update tokenize padding parameter back to True.

4721e73

padding="max_length" lead to 10x slower encoding speed on cpu. Especially linear and softmax function were impacted by this. Dynamic padding solved this performance bottleneck.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Update _load_sbert_model Parameters and Fix Tokenize Padding #122

Update _load_sbert_model Parameters and Fix Tokenize Padding #122

pascalhuerten commented Jul 17, 2024

Update _load_sbert_model Parameters and Fix Tokenize Padding #122

Are you sure you want to change the base?

Update _load_sbert_model Parameters and Fix Tokenize Padding #122

Conversation

pascalhuerten commented Jul 17, 2024

Summary

Changes Introduced:

Performance Comparison:

Additional Note

Testing: