-
Hi @bpben, thanks for reporting this! We'll look into it and update here.
-
Alright, first of all: your config looks correct. I tried to replicate this, but on my machine each call finishes far faster than the ~3 minutes you're seeing. How did you verify that the model is loaded onto the GPU? When you run the pipeline, does GPU utilization change at all? Also, can you provide a `pip freeze` of your environment?
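A minimal sketch of one way to check GPU placement from within the same Python process; it uses only standard `torch.cuda` calls, and the ~14 GB figure is just the arithmetic for a 7B-parameter model in fp16, not something from this thread:

```python
import torch

# Confirm CUDA is visible and see which device the pipeline would use.
assert torch.cuda.is_available(), "CUDA is not visible to PyTorch"
print(torch.cuda.get_device_name(0))

# After the model has loaded: ~14 GB allocated is roughly what a 7B-parameter
# model in fp16 should occupy; a near-zero number means the weights never
# left the CPU.
print(f"allocated: {torch.cuda.memory_allocated(0) / 1e9:.1f} GB")
print(f"reserved:  {torch.cuda.memory_reserved(0) / 1e9:.1f} GB")
```

Running `watch -n 1 nvidia-smi` in a second terminal during the `nlp(...)` call gives the same picture from outside the process.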
-
Thanks, I just redid the environment and I'm now getting 16 seconds, which is manageable, though not ideal. The GPU isn't busy; the utilization doesn't even budge. Not sure if that's expected. `pip freeze` output is below. By the way, is the expected result some categorization?
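On that last question: with a `spacy.TextCat.v2` task, the prediction is written to `doc.cats` as a label-to-score mapping. A minimal sketch, assuming the config from this thread is saved as `config.cfg` (`assemble` is spacy-llm's documented config loader):

```python
from spacy_llm.util import assemble

# Build the pipeline from the config shown in this thread
# (assumed saved as config.cfg).
nlp = assemble("config.cfg")

doc = nlp("You look gorgeous!")
# The TextCat task stores label -> score mappings,
# e.g. {"COMPLIMENT": 1.0, "INSULT": 0.0}.
print(doc.cats)
```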
-
Hi all,
I'm running the example from the docs and it appears to load the model (Falcon-7B-instruct) onto the GPU, but when I run `nlp("You look gorgeous!")`, it takes ~3 minutes. A call to the transformers library using the model for classification or CausalLM only takes a few seconds. Am I doing something wrong? My cfg is slightly modified from the example:
```ini
[system]
gpu_allocator = "pytorch"

[nlp]
lang = "en"
pipeline = ["llm"]

[components]

[components.llm]
factory = "llm"

[components.llm.task]
@llm_tasks = "spacy.TextCat.v2"
labels = ["COMPLIMENT", "INSULT"]

[components.llm.model]
@llm_models = "spacy.Falcon.v1"
name = "falcon-7b-instruct"
config_init = {"device": "cuda:0"}
```
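For comparison, a sketch of the raw-transformers baseline mentioned above; the repo id `tiiuae/falcon-7b-instruct`, the fp16 dtype, and the generation parameters are assumptions for illustration, and timing only the `generate` call keeps model loading out of the measurement:

```python
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Assumed HF repo id for the model named in the config above.
model_id = "tiiuae/falcon-7b-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_id)
# fp16 so the 7B model fits in typical GPU memory; older transformers
# versions also needed trust_remote_code=True for Falcon.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16
).to("cuda:0")

inputs = tokenizer("You look gorgeous!", return_tensors="pt").to("cuda:0")

# Time only the generation step, not model loading.
start = time.perf_counter()
outputs = model.generate(**inputs, max_new_tokens=20)
print(f"generation took {time.perf_counter() - start:.1f}s")
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```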