[Bug/Model Request]: Is slower than sentence transformer for all-minilm-l6-v2 #292
Comments
For reference, our benchmark of fastembed is here: https://colab.research.google.com/github/qdrant/fastembed/blob/main/experiments/Throughput_Across_Models.ipynb. I would have to try your version to tell for sure what the difference is, but at first glance you are encoding one sentence at a time, while our benchmarks are in batches.
I am also computing batch-wise (batch size = 512):
Complete Python benchmarking code:
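The snippet itself did not survive in this copy of the thread; a minimal sketch of what such a batch-wise comparison could look like (corpus, model name, and batch handling are assumptions, not the reporter's actual code):

```python
# Hypothetical reconstruction of a batch-wise benchmark; all inputs are placeholders.
import time

from fastembed import TextEmbedding
from sentence_transformers import SentenceTransformer

docs = ["some example sentence to embed"] * 10_000  # placeholder corpus
BATCH = 512

# sentence-transformers, fed in external batches of 512
st_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
start = time.perf_counter()
for i in range(0, len(docs), BATCH):
    st_model.encode(docs[i:i + BATCH])
print(f"sentence-transformers: {time.perf_counter() - start:.2f}s")

# fastembed, fed the same external batches; embed() is lazy, so force evaluation
fe_model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
start = time.perf_counter()
for i in range(0, len(docs), BATCH):
    list(fe_model.embed(docs[i:i + BATCH]))
print(f"fastembed: {time.perf_counter() - start:.2f}s")
```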
Output:
Thanks for sharing, we will look into it!
Refactored the testing script a bit; here are my results: https://colab.research.google.com/drive/1SroKOUZ0iYN1vo2mRXdhIQeVyy0RWQTG?usp=sharing. It uses internal batching instead of an external loop, since both libraries actually provide an interface capable of creating batches internally. Additionally, I tried a different scenario of inferencing individual queries, a data-parallel approach, and running on a machine with more CPUs (default Colab has 2 CPUs, but the higher tier has 8).
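To illustrate the internal-batching point (a sketch, not the notebook's actual script; the model name and batch size are assumptions):

```python
# Let each library batch internally instead of slicing the corpus in an external loop.
from fastembed import TextEmbedding
from sentence_transformers import SentenceTransformer

docs = ["some example sentence to embed"] * 10_000  # placeholder corpus

# sentence-transformers batches internally via the batch_size argument of encode()
st_model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")
st_vectors = st_model.encode(docs, batch_size=512)

# fastembed also batches internally; embed() returns a lazy generator
fe_model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
fe_vectors = list(fe_model.embed(docs, batch_size=512))
```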
My use case involves constantly consuming messages from a stream in a (configurable) batch size, computing embeddings, doing some computation, and writing the results to a DB. Therefore your approach is not a fit for my use case. Seems like fastembed is not so fast after all.
@0110G I tried to convert the fastembed version into streaming with Python generators. Please let me know if this option is closer to your use case.
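A minimal sketch of that generator-based streaming idea, assuming embed() consumes an iterable lazily (the message source and DB sink below are hypothetical stand-ins):

```python
# Stream messages through fastembed with Python generators; source and sink are hypothetical.
from fastembed import TextEmbedding

def message_stream():
    # Stand-in for a stream consumer; poll_next_message() is hypothetical and
    # is assumed to return None when the stream ends.
    while True:
        msg = poll_next_message()
        if msg is None:
            break
        yield msg

model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

# embed() takes an iterable and yields embeddings lazily, so messages are
# processed as they arrive rather than materialised up front.
for vector in model.embed(message_stream(), batch_size=64):
    write_to_db(vector)  # hypothetical sink
```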
This works, but I am not getting results similar to what you showed on Colab. Sentence Transformers is still faster for me.
Hi @0110G. Actually, I've encountered several cases where the ONNX model was slower on macOS; the issue might be in onnxruntime.
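If that is the suspicion, a quick way to inspect the local onnxruntime setup (a diagnostic sketch not taken from the thread; the thread counts are placeholders to tune per machine):

```python
# Inspect the onnxruntime build and available execution providers.
import onnxruntime as ort

print(ort.__version__)
print(ort.get_available_providers())  # e.g. ['CoreMLExecutionProvider', 'CPUExecutionProvider']

# CPU thread settings can also affect throughput; values here are placeholders.
opts = ort.SessionOptions()
opts.intra_op_num_threads = 8
opts.inter_op_num_threads = 1
```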
I was running Colab on a higher-tier machine with 8 CPUs; that might be the reason.
What happened?
On benchmarking synchronous computation times for generating embeddings with fastembed vs. the original sentence-transformers implementation of all-MiniLM-L6-v2, fastembed is noticeably slower. I am using fastembed 0.3.3.
Why is this so slow compared to the original implementation? What can I do to improve performance?
What Python version are you on? e.g. python --version
3.9.16
Version
0.2.7 (Latest)
What os are you seeing the problem on?
MacOS
Relevant stack traces and/or logs
No response