
[Bug/Model Request]: Is slower than sentence transformer for all-minilm-l6-v2 #292

Open

0110G opened this issue Jul 9, 2024 · 10 comments
0110G commented Jul 9, 2024

What happened?

I benchmarked synchronous embedding-generation throughput:

  1. Using sentence transformers: ~1300 msgs per sec
    # (sentences, iter_count and batch_size are defined in the full script below)
    import random
    import time

    from sentence_transformers import SentenceTransformer
    model_standard = SentenceTransformer("all-MiniLM-L6-v2")

    start_time = time.time()
    for i in range(iter_count):
        model_standard.encode(random.sample(sentences, 1)[0])
    time_standard = time.time() - start_time
    print("Standard requires: {}s".format(time_standard))
    print("{} processed per sec".format(batch_size * iter_count / time_standard))

VS

  2. Using FastEmbed (synchronously): ~800 msgs per sec

    from fastembed import TextEmbedding
    fast_model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

    start_time = time.time()
    for i in range(iter_count):
        list(fast_model.embed(random.sample(sentences, 1)[0]))
    time_fast = time.time() - start_time
    print("Fast requires: {}s".format(time_fast))
    print("{} processed per sec".format(batch_size * iter_count / time_fast))

I am using fastembed 0.3.3

pip show fastembed
Name: fastembed
Version: 0.3.3
Summary: Fast, light, accurate library built for retrieval embedding generation
Home-page: https://github.com/qdrant/fastembed
Author: Qdrant Team
Author-email: [email protected]
License: Apache License
Location: /Users/<>/PycharmProjects/Voyager/venv/lib/python3.9/site-packages
Requires: tqdm, PyStemmer, numpy, mmh3, onnxruntime, pillow, onnx, loguru, tokenizers, huggingface-hub, snowballstemmer, requests
Required-by: 

Why is this so much slower than the original implementation? What can I do to improve performance?

What Python version are you on? e.g. python --version

3.9.16

Version

0.2.7 (Latest)

What os are you seeing the problem on?

MacOS

Relevant stack traces and/or logs

No response

generall (Member) commented Jul 9, 2024

For reference, our benchmark of fastembed is here - https://colab.research.google.com/github/qdrant/fastembed/blob/main/experiments/Throughput_Across_Models.ipynb

I would have to try your version to tell for sure what the difference is, but at first glance you are encoding one sentence at a time, while our benchmarks run in batches.
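To illustrate the distinction, here is a minimal sketch (not the benchmark from the Colab above; the document list and batch size are made up): encoding one document per call pays the per-call overhead for every document, while a single call over the whole list lets the library batch internally.

    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    docs = ["some arbitrary sentence"] * 512

    # one document per call: the per-call overhead is paid 512 times
    one_by_one = [model.encode(d) for d in docs]

    # one call for the whole list: the library splits it into batches internally
    batched = model.encode(docs, batch_size=256)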

0110G (Author) commented Jul 9, 2024

I am also computing batch-wise (batch size = 512):

sentences = [["Some arbitrary sentence 1"]*512, ["Some arbitrary sentence 2"]*512] 

0110G (Author) commented Jul 9, 2024

Complete Python benchmarking code:

import random
import time

from sentence_transformers import SentenceTransformer
from fastembed import TextEmbedding



if __name__ == '__main__':
    iter_count = 50
    batch_size = 512
    sentences = [["biblestudytools kjv romans 6"]*512, ["MS Dhoni is one of the best wicket keeper in the world"]*512] #Standard requires: 39.150851249694824s
    
    # Load both models up-front
    model_standard = SentenceTransformer("all-MiniLM-L6-v2")
    fast_model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

    start_time = time.time()
    for i in range(iter_count):
        model_standard.encode(random.sample(sentences, 1)[0])
    time_standard = time.time() - start_time
    print("Standard requires: {}s".format(time_standard))
    print("{} processed per sec".format(batch_size*iter_count/time_standard))

    start_time = time.time()
    for i in range(iter_count):
        list(fast_model.embed(random.sample(sentences, 1)[0]))
    time_fast = time.time() - start_time
    print("Fast requires: {}s".format(time_fast))
    print("{} processed per sec".format(batch_size*iter_count/time_fast))

Output:


Standard requires: 21.204905033111572s
1207.267844870112 processed per sec
Fast requires: 25.721112966537476s
995.2913014808091 processed per sec

generall (Member) commented Jul 9, 2024

Thanks for sharing, we will look into it!

generall (Member) commented Jul 9, 2024

@0110G

I refactored the testing script a bit; here are my results: https://colab.research.google.com/drive/1SroKOUZ0iYN1vo2mRXdhIQeVyy0RWQTG?usp=sharing

It uses internal batching instead of an external loop, as both libraries provide interfaces capable of creating batches internally (see the sketch below).
If your use case requires different batching, it apparently might not work as well with fastembed.

Additionally, I tried a different scenario of inferencing individual queries, a data-parallel approach, and running on a machine with more CPUs (the default Colab has 2 CPUs, but the higher tier has 8).
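For concreteness, a minimal sketch of what internal batching means here (document count and batch size are illustrative, not the Colab's settings): the full list is passed in a single call, and each library splits it into batches itself.

    from sentence_transformers import SentenceTransformer
    from fastembed import TextEmbedding

    docs = ["some arbitrary sentence"] * (512 * 50)

    model_standard = SentenceTransformer("all-MiniLM-L6-v2")
    fast_model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

    # sentence-transformers: encode() takes the whole list and batches internally
    st_vectors = model_standard.encode(docs, batch_size=512)

    # fastembed: embed() is lazy and batches internally; list() forces full evaluation
    fe_vectors = list(fast_model.embed(docs, batch_size=512))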

0110G (Author) commented Jul 10, 2024

My use case involves constantly consuming messages from a stream in a (configurable) batch size, computing embeddings, doing some computation, and writing the results to a DB (roughly the loop sketched below). Therefore your approach does not fit my use case.

Seems like fastembed is not so fast after all.
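For context, a rough sketch of the pipeline described above; read_batch and write_to_db are hypothetical placeholders for the stream consumer and DB client, not real APIs. Note that this calls embed() once per batch, which is exactly the per-call overhead identified in the next comment.

    from fastembed import TextEmbedding

    fast_model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")
    BATCH_SIZE = 512  # configurable

    def read_batch(size):
        """Placeholder: pull up to `size` messages from the stream."""
        ...

    def write_to_db(records):
        """Placeholder: persist (message, embedding) pairs."""
        ...

    while True:
        messages = read_batch(BATCH_SIZE)
        if not messages:
            break
        # embed() is called once per batch here, paying its startup cost each time
        embeddings = list(fast_model.embed(messages, batch_size=BATCH_SIZE))
        write_to_db(list(zip(messages, embeddings)))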

generall (Member) commented

@0110G
I think I understood the problem: when you call the embed function in fastembed, it spawns workers each time, which creates overhead.

I tried to convert the fastembed version into streaming with Python generators, so the embed function is only called once: https://colab.research.google.com/drive/1X03qTpBVNGDYs82CztfpqF2JOq_-75hK?usp=sharing

Please let me know if this option is closer to your use case.
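A minimal sketch of the streaming pattern described in that notebook, with a stand-in generator in place of the real message source: embed() is called exactly once, consumes the generator lazily, and batches internally, so workers are not re-created per batch.

    from fastembed import TextEmbedding

    fast_model = TextEmbedding(model_name="sentence-transformers/all-MiniLM-L6-v2")

    def message_stream():
        """Stand-in for the real consumer: yields messages as they arrive."""
        for i in range(100_000):  # in practice, loop over the stream/queue instead
            yield "some arbitrary message {}".format(i)

    # a single embed() call; the returned iterator yields one embedding per message
    for embedding in fast_model.embed(message_stream(), batch_size=512):
        ...  # post-process / write to the DB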

0110G (Author) commented Jul 10, 2024

This works, but I am not getting results similar to what you showed on Colab. Sentence Transformers is still faster for me.
I find it absurd: how can an ONNX model be slower than the original implementation?

joein (Member) commented Jul 10, 2024

Hi @0110G

Actually, I've encountered several cases where the ONNX model was slower on macOS; the issue might be in onnxruntime.
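If it helps to narrow this down, a quick read-only way to inspect the local onnxruntime build that fastembed ends up using; the provider names in the comment are just examples of what may appear on macOS.

    import onnxruntime as ort

    print(ort.__version__)                # onnxruntime build used by fastembed
    print(ort.get_device())               # "CPU" or "GPU"
    print(ort.get_available_providers())  # e.g. ["CoreMLExecutionProvider", "CPUExecutionProvider"]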

generall (Member) commented

I was running Colab on a higher-tier machine with 8 CPUs; that might be the reason.
