-
Hi @bpben, thanks for reporting this! We'll look into it and update here.
-
Alright, first of all: your config looks correct. I tried to replicate this, but on my machine each call finishes far faster than the ~3 minutes you're seeing. How did you verify that the model is loaded onto the GPU? When you run the pipeline, does GPU utilization change at all? Also, can you provide a `pip freeze` of your environment?
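A minimal sketch of one way to check GPU placement from within the same Python process; it uses only standard `torch.cuda` calls, and the ~14 GB figure is just the arithmetic for a 7B-parameter model in fp16, not something from this thread:

```python
import torch

# Confirm CUDA is visible and see which device the pipeline would use.
assert torch.cuda.is_available(), "CUDA is not visible to PyTorch"
print(torch.cuda.get_device_name(0))

# After the model has loaded: ~14 GB allocated is roughly what a 7B-parameter
# model in fp16 should occupy; a near-zero number means the weights never
# left the CPU.
print(f"allocated: {torch.cuda.memory_allocated(0) / 1e9:.1f} GB")
print(f"reserved:  {torch.cuda.memory_reserved(0) / 1e9:.1f} GB")
```

Running `watch -n 1 nvidia-smi` in a second terminal during the `nlp(...)` call gives the same picture from outside the process.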
-
Thanks, I just redid the environment and I'm now getting 16 seconds, which is manageable, though not ideal. The GPU isn't busy; the utilization doesn't even budge. Not sure if that's expected. `pip freeze` output is below. By the way, is the expected result some categorization?
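On that last question: with a `spacy.TextCat.v2` task, the prediction is written to `doc.cats` as a label-to-score mapping. A minimal sketch, assuming the config from this thread is saved as `config.cfg` (`assemble` is spacy-llm's documented config loader):

```python
from spacy_llm.util import assemble

# Build the pipeline from the config shown in this thread
# (assumed saved as config.cfg).
nlp = assemble("config.cfg")

doc = nlp("You look gorgeous!")
# The TextCat task stores label -> score mappings,
# e.g. {"COMPLIMENT": 1.0, "INSULT": 0.0}.
print(doc.cats)
```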
-
Hi all,
I'm running the example from the docs and it appears to load the model (Falcon-7B-instruct) onto the GPU, but when I run `nlp("You look gorgeous!")`, it takes ~3 minutes. A call to the transformers library using the model for classification or CausalLM only takes a few seconds. Am I doing something wrong? My cfg is slightly modified from the example:
```ini
[system]
gpu_allocator = "pytorch"

[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.TextCat.v2"
labels = ["COMPLIMENT", "INSULT"]

[components.llm.model]
@llm_models = "spacy.Falcon.v1"
name = "falcon-7b-instruct"
config_init = {"device": "cuda:0"}
```
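For comparison, a sketch of the raw-transformers baseline mentioned above; the repo id `tiiuae/falcon-7b-instruct`, the fp16 dtype, and the generation parameters are assumptions for illustration, and timing only the `generate` call keeps model loading out of the measurement:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed HF repo id for the model named in the config above.
model_id = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# fp16 so the 7B model fits in typical GPU memory; older transformers
# versions also needed trust_remote_code=True for Falcon.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda:0")

inputs = tokenizer("You look gorgeous!", return_tensors="pt").to("cuda:0")

# Time only the generation step, not model loading.
start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=20)
print(f"generation took {time.perf_counter() - start:.1f}s")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```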