
Eval bug: llama-server stopped working after PR #11285 got merged #11335

Open
tim-janik opened this issue Jan 21, 2025 · 7 comments

@tim-janik

Name and Version

llama-server f30f099

Operating systems

Linux

GGML backends

CUDA

Hardware

RTX 4090, CUDA

Models

E.g. Code Qwen 2.5 7B-Chat (Q8)

Problem description & steps to reproduce

llama-server stopped generating any tokens for me, regardless of model, starting with commit f30f099 from #11285.
Simply reverting that commit, e.g. on top of today's master (6171c9d), fixes the issue for me.
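
Roughly how I tested the revert (just a sketch; build flags depend on your setup, I build with CUDA):

```sh
# on top of master (6171c9d), revert the suspect commit and rebuild llama-server
git revert f30f099
cmake -B build -DGGML_CUDA=ON
cmake --build build --target llama-server -j
```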

To reproduce: go to http://localhost:8080, enter a question, and hit return; nothing happens.

First Bad Commit

f30f099

Relevant log output

main: server is listening on http://0.0.0.0:8080 - starting the main loop
srv  update_slots: all slots are idle
request: GET / 127.0.0.1 200
request: GET /favicon.ico  400
request: POST /v1/chat/completions  400
@ngxson ngxson self-assigned this Jan 21, 2025
@ngxson
Collaborator

ngxson commented Jan 21, 2025

Can you send the request via curl to see what the response is? (Or check the response in the browser's devtools.)

From what I see in your log, it responds with status code 400, meaning ERROR_TYPE_INVALID_REQUEST. This could potentially be due to upstream changes in the httplib library rather than a bug in llama-server.
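
For example, something along these lines (a sketch; adjust the prompt as you like):

```sh
# sketch: minimal chat completion request; -i also prints the status line and headers
curl -i http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "hello"}]}'
```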

@ngxson
Collaborator

ngxson commented Jan 21, 2025

I'm not able to reproduce the bug on either my laptop (MacBook M3) or a server (Linux, NVIDIA T4).

@ngxson
Collaborator

ngxson commented Jan 21, 2025

I did more tests but still can't reproduce the issue. Please provide more info about your setup:

  • Which browser are you using?
  • What happens when the request is sent via curl, wget, or Postman?
  • What shows up if you access some other endpoints, for example /api/models?

Also, from your log, request: GET /favicon.ico 400 means that even non-API paths return a 400 error, which is wrong. This further strengthens my suspicion of a bug in httplib.
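
Something like this would print just the status codes for the paths above (a sketch):

```sh
# sketch: print only the HTTP status code returned for a few paths
for path in / /favicon.ico /api/models; do
  printf '%-16s ' "$path"
  curl -s -o /dev/null -w '%{http_code}\n' "http://localhost:8080$path"
done
```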

@tim-janik
Author

> I did more tests but still can't reproduce the issue. Please provide more info about your setup: […]

Thanks for the hint. I reverted f30f099 and only applied the httplib 0.18.5 change.
Turns out I can only trigger this issue if the URL contains a %0A, like this:

http://localhost:8080/?something=%0A

That explains why only I saw it: I probably still had some q= or m= arg left over from playing with #11150 (by the way, I'd appreciate answers to the two questions I posed in that PR if you can spare the time).

This means the httplib change breaks the multi-line use cases of #11150. Do you have any idea why the newer httplib would break on URL-encoded newlines?
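
For the record, this is enough to trigger it on my machine (a sketch, comparing the build before and after the httplib 0.18.5 bump):

```sh
# sketch: request whose query string contains a URL-encoded newline (%0A)
curl -i 'http://localhost:8080/?something=%0A'
# behaves fine with the old httplib; misbehaves for me after the 0.18.5 bump
```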

@ngxson
Collaborator

ngxson commented Jan 22, 2025

OK, sorry, I forgot about #11150. I'll have a look later because it's not a priority right now.

I'm also tagging the author of httplib, @yhirose; maybe you have an idea why it returns 400 even for endpoints that have no handler? (For example, /favicon.ico, which should have returned 404 instead of 400.)

@ngxson
Collaborator

ngxson commented Jan 23, 2025

OK, I think I pinpointed the problem. It only happens on OPTIONS requests, which corrupt memory somewhere. I repeatedly sent OPTIONS requests and got this in the log:

request:    400
request:    400
request: OPTIONS /tokenize 127.0.0.1 200
request: ndled   400
request: root   400
request: root   400
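
Roughly how I reproduced it (a sketch):

```sh
# sketch: hammer the server with OPTIONS requests and watch the log output get garbled
for i in $(seq 1 50); do
  curl -s -o /dev/null -X OPTIONS http://localhost:8080/tokenize
done
```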

@ngxson
Collaborator

ngxson commented Jan 24, 2025

Related to yhirose/cpp-httplib#2028
