Creating a session in renku v2 in a resource pool with a quota that is full will not result in any error #662

Open
olevski opened this issue Feb 21, 2025 · 2 comments
Labels
bug Something isn't working

Comments

@olevski
Member

olevski commented Feb 21, 2025

The quota-exceeded error message is reported on the StatefulSet (in the StatefulSet's events), and Amalthea does not check these events for errors. This should be fixed in Amalthea.

Amalthea should pick up this error and surface it.

Currently the session just stays in the "NotReady" / "Starting" status forever.
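A minimal sketch of the kind of check requested here, assuming the StatefulSet's events are already available as plain dicts shaped like Kubernetes Event objects (`type`, `reason`, `message`) and ordered oldest to newest. The function name `quota_error_from_events` is hypothetical and not part of Amalthea; the idea is only that a `Warning` / `FailedCreate` event mentioning `exceeded quota` yields a message the operator could put into the session status, instead of leaving it in "Starting".

```python
from typing import Iterable, Mapping, Optional


def quota_error_from_events(events: Iterable[Mapping[str, str]]) -> Optional[str]:
    """Return the message of the newest quota-related FailedCreate warning, or None.

    Hypothetical helper: assumes each event is a dict with the Kubernetes Event
    fields `type`, `reason` and `message`, ordered from oldest to newest.
    """
    latest: Optional[str] = None
    for event in events:
        if (
            event.get("type") == "Warning"
            and event.get("reason") == "FailedCreate"
            and "exceeded quota" in event.get("message", "")
        ):
            latest = event["message"]  # keep overwriting so the newest match wins
    return latest


# Illustrative events (made up, shaped like the ones in this issue).
events = [
    {"type": "Normal", "reason": "SuccessfulCreate", "message": "create Pod example-0 successful"},
    {
        "type": "Warning",
        "reason": "FailedCreate",
        "message": 'pods "example-0" is forbidden: exceeded quota: q1, requested: requests.nvidia.com/gpu=1',
    },
]
print(quota_error_from_events(events))
```

If the helper returns a message, the session could be moved to a failed state carrying that text; how Amalthea actually maps this onto its status fields is not specified in this issue.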

@olevski olevski added the bug Something isn't working label Feb 21, 2025
@olevski
Member Author

olevski commented Feb 21, 2025

Here is an example of the events that show up on the StatefulSet:

Events:
  Type     Reason        Age                 From                    Message
  ----     ------        ----                ----                    -------
  Warning  FailedCreate  17m (x17 over 23m)  statefulset-controller  create Pod elisabet-cap-18c263bb3dbf-0 in StatefulSet elisabet-cap-18c263bb3dbf failed error: pods "elisabet-cap-18c263bb3dbf-0" is forbidden: exceeded quota: 8586da3c-af5a-4e2c-9e82-f09b622f2f78, requested: requests.memory=30752Mi,requests.nvidia.com/gpu=1, used: requests.memory=748288Mi,requests.nvidia.com/gpu=24, limited: requests.memory=800G,requests.nvidia.com/gpu=24
  Warning  FailedCreate  95s (x2 over 12m)   statefulset-controller  create Pod elisabet-cap-18c263bb3dbf-0 in StatefulSet elisabet-cap-18c263bb3dbf failed error: pods "elisabet-cap-18c263bb3dbf-0" is forbidden: exceeded quota: 8586da3c-af5a-4e2c-9e82-f09b622f2f78, requested: requests.memory=30752Mi,requests.nvidia.com/gpu=1, used: requests.memory=738048Mi,requests.nvidia.com/gpu=24, limited: requests.memory=800G,requests.nvidia.com/gpu=24
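To turn such a message into something more useful than a raw string, one could parse its `requested: ..., used: ..., limited: ...` tail. The sketch below assumes the message always has that exact shape and handles only a subset of Kubernetes quantity suffixes; all names (`parse_quota_error`, `exceeded_resources`, `to_number`) are hypothetical. Applied to the first event, it shows that both GPUs (24 used + 1 requested, limit 24) and memory are over the quota: `800G` is a decimal quantity (800 × 10^9 bytes, roughly 762939Mi), which is less than 748288Mi + 30752Mi = 779040Mi.

```python
import re
from typing import Dict, List

# Matches the "requested: ..., used: ..., limited: ..." tail of a quota error message.
QUOTA_RE = re.compile(
    r"requested: (?P<requested>\S+), used: (?P<used>\S+), limited: (?P<limited>\S+)"
)

# Subset of Kubernetes quantity suffixes (binary and decimal); "m" (milli) etc. are not handled.
SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40,
            "k": 10**3, "M": 10**6, "G": 10**9, "T": 10**12}


def to_number(quantity: str) -> int:
    """Convert an integer quantity such as '30752Mi', '800G' or '24' to a plain int."""
    # Try two-character suffixes first so that 'Mi' is not mistaken for 'M'.
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * SUFFIXES[suffix]
    return int(quantity)


def parse_quota_error(message: str) -> Dict[str, Dict[str, str]]:
    """Split a quota error into {'requested': {...}, 'used': {...}, 'limited': {...}}."""
    match = QUOTA_RE.search(message)
    if match is None:
        return {}
    return {
        section: dict(item.split("=", 1) for item in resources.split(","))
        for section, resources in match.groupdict().items()
    }


def exceeded_resources(message: str) -> List[str]:
    """Return the resources for which used + requested is above the quota limit."""
    quota = parse_quota_error(message)
    if not quota:
        return []
    return [
        name
        for name, amount in quota["requested"].items()
        if name in quota["limited"]
        and to_number(quota["used"].get(name, "0")) + to_number(amount)
        > to_number(quota["limited"][name])
    ]


msg = (
    'pods "elisabet-cap-18c263bb3dbf-0" is forbidden: exceeded quota: '
    "8586da3c-af5a-4e2c-9e82-f09b622f2f78, requested: requests.memory=30752Mi,"
    "requests.nvidia.com/gpu=1, used: requests.memory=748288Mi,"
    "requests.nvidia.com/gpu=24, limited: requests.memory=800G,requests.nvidia.com/gpu=24"
)
print(exceeded_resources(msg))  # ['requests.memory', 'requests.nvidia.com/gpu']
```

A list like this could be shown to the user as, for example, "not enough GPUs and memory left in this resource pool", which is more actionable than the raw event text.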

@Panaetius Panaetius self-assigned this Mar 5, 2025
@Panaetius
Member

I tried a few approaches, mainly checking whether Events could be reconciled on their own.

SwissDataScienceCenter/amalthea#895 has a working solution that lists the events for a StatefulSet, but it is tabled for now because its performance implications are worrying.
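One possible way to bound the cost of reading events, offered as a sketch and not as what amalthea#895 does, is to ask the API server only for the relevant events of a single StatefulSet through a field selector, instead of listing or watching every event in the namespace. Kubernetes Events support field selectors on `involvedObject.kind`, `involvedObject.name`, `reason` and `type`; the hypothetical helper below only builds that selector string. Whether this is enough to address the performance concerns has not been measured.

```python
def statefulset_event_selector(name: str, reason: str = "FailedCreate") -> str:
    """Build a field selector for Warning events with `reason` on one StatefulSet.

    Hypothetical helper. StatefulSet names are DNS subdomains (lowercase
    alphanumerics, '-' and '.'), so ',' and '=' never need escaping here.
    """
    parts = {
        "involvedObject.kind": "StatefulSet",
        "involvedObject.name": name,
        "reason": reason,
        "type": "Warning",
    }
    return ",".join(f"{key}={value}" for key, value in parts.items())


print(statefulset_event_selector("elisabet-cap-18c263bb3dbf"))
# involvedObject.kind=StatefulSet,involvedObject.name=elisabet-cap-18c263bb3dbf,reason=FailedCreate,type=Warning
```

The resulting string could be passed to `kubectl get events --field-selector ...` or to the `field_selector` argument of the Python client's `CoreV1Api.list_namespaced_event`, so that only the events of the session's own StatefulSet are fetched.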

@Panaetius Panaetius removed their assignment Mar 7, 2025
Projects
None yet
Development

No branches or pull requests

2 participants