Connection closures running piko as StatefulSet #147
Comments
Hey @yquansah, thanks for raising this. I can't think of anything off the top of my head, but I'll play around with this later today to try to reproduce.
I'd be surprised if it was caused by gossip, which only gossips information about which upstreams are connected, but won't (or at least shouldn't) close connections.
@andydunstall Yeah, nice. I am also going to play around with it later this afternoon. Thanks!
Hey @yquansah, thanks for the above example, I'm able to reproduce.

So when you run with multiple nodes, if you connect to node N1 but the upstream is connected to N2, then N1 forwards all traffic to N2. Since Piko forward connects to the server using WebSockets (to wrap the underlying TCP connection and work with an HTTP load balancer), if N1 receives the WebSocket connection that it needs to forward to N2, it falls back to using the HTTP proxy to forward the WebSocket to N2. However, by default the HTTP proxy has a timeout of 30 seconds.
@andydunstall Really great information here, I appreciate the sleuthing, and it definitely makes sense. We'll go with the proxy timeout workaround for now, as you suggested, as a quick fix.
Hey, we gave this a whirl and are getting this error when the container starts:

Here is the YAML for the deployment:

Perhaps I'm doing something wrong here?
Ah sorry, I forgot it's required, I'll add a patch now. A quick workaround is to just set a very large timeout. (Sorry I haven't had much time to fix the underlying issue; I'll try to get to it soon.)
I've merged #151 to allow a timeout of 0, so if you re-build main it should be ok.
@andydunstall Quick one. Thank you!
Awesome! Could I trouble you for a tagged image @andydunstall? Right now we are pulling
Yep sure, will tag.
@condaatje That's done:
awesome thanks! |
By the way, if you'd like to join our alpha (launching today) we'd be honored to have you, especially because we've absolutely loved working with piko so far: https://x.com/hyperbolic_labs/status/1823779096650015026
Thanks! I don't have anything I could use a GPU for, but the product looks great! |
I've merged #155, which only applies

So I'll close this for now. Let me know if there's anything else I can help with!
Appreciated! And this is great. Could I trouble you for a
@condaatje Sure, tagged.
thanks! |
It seems as though when I run `piko` as a `StatefulSet` using `tcp`, I am running into random connection closures (from the agent?). These are the logs I see from the `agent`:

One interesting thing I noticed is that the connection closures seem to happen at a regular interval from the time a connection is opened (30s). Interestingly enough, this does not happen when I run just 1 replica of the server, only when I run more than 1 using the gossip protocol.
Here is a repro config, running the workload on Kubernetes (minikube cluster)...

I will try and dig into the `gossip` protocol tomorrow, but just wanted to raise this issue in case there were any quick hints from your end @andydunstall.

Note: The image names would need to be changed from the above config; the images there were just from my local Docker build.