-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Crashloop Detection #200
Labels
Comments
lucas-koehler
added a commit
that referenced
this issue
Aug 1, 2023
- Extend the resource clients for app definitions, sessions, and workspaces to be able to handle the status subresource. - Add basic status for our CRDs to track handling state of the operator - Prevent crashloops by never handling resources that are in error or handling state. The latter indicates an unexcepted crash during handling (e.g. NPE). - NOTE: The previous step was only dony for LAZY handlers. EAGER handlers did not change - Exemplary sub steps for volume claim an attachment for workspace resources - Increase resource versions for all CRs - Move API, KIND, and CRD_NAME constants from Spec to resource classes (e.g. from SessionSpec to Session) because they belong to the resource itself. This became apparent when adding the Status to the resources Note: Status handling can probably be refactored to be a bit cleaner and make use of better code-reuse. Part of #200
lucas-koehler
added a commit
that referenced
this issue
Aug 1, 2023
There are some issues with setting app definitions status that need to be fixed first
lucas-koehler
added a commit
to eclipse-theia/theia-cloud-helm
that referenced
this issue
Aug 1, 2023
- Add status to all crds. - Extend operator role to allow editing status for our crds Part of eclipse-theia/theia-cloud#200
This was referenced Aug 1, 2023
lucas-koehler
added a commit
that referenced
this issue
Aug 2, 2023
- Extend the resource clients for app definitions, sessions, and workspaces to be able to handle the status subresource. - Add basic status for our CRDs to track handling state of the operator - Prevent crashloops by never handling resources that are in error or handling state. The latter indicates an unexcepted crash during handling (e.g. NPE). - NOTE: The previous step was only dony for LAZY handlers. EAGER handlers did not change - Exemplary sub steps for volume claim an attachment for workspace resources - Increase resource versions for all CRs - Move API, KIND, and CRD_NAME constants from Spec to resource classes (e.g. from SessionSpec to Session) because they belong to the resource itself. This became apparent when adding the Status to the resources Note: Status handling can probably be refactored to be a bit cleaner and make use of better code-reuse. Part of #200
lucas-koehler
added a commit
that referenced
this issue
Aug 2, 2023
There are some issues with setting app definitions status that need to be fixed first
lucas-koehler
added a commit
that referenced
this issue
Aug 2, 2023
- AppDefinition resource version fix after rebase - Catch all exceptions when adding session, workspace resources and add error state - Do not handle resources in HANDLED state again. - Smaller fixes
lucas-koehler
added a commit
to eclipse-theia/theia-cloud-helm
that referenced
this issue
Aug 2, 2023
- Add status to all crds. - Extend operator role to allow editing status for our crds Part of eclipse-theia/theia-cloud#200
lucas-koehler
added a commit
to eclipse-theia/theia-cloud-helm
that referenced
this issue
Aug 2, 2023
- Add status to all crds. - Extend operator role to allow editing status for our crds Part of eclipse-theia/theia-cloud#200
lucas-koehler
added a commit
to eclipse-theia/theia-cloud-helm
that referenced
this issue
Aug 2, 2023
- Add status to all crds. - Extend operator role to allow editing status for our crds - Increase chart versions and use new app version Part of eclipse-theia/theia-cloud#200
lucas-koehler
added a commit
that referenced
this issue
Aug 2, 2023
NOTE: Status handling for app definitions is commented out as there are still some issues - Catch all exceptions when adding session, workspace, appdefinition resources and add error state - Extend the resource clients for app definitions, sessions, and workspaces to be able to handle the status subresource. - Add basic status for our CRDs to track handling state of the operator - Prevent crashloops by never handling resources that are in error or handling state. The latter indicates an unexcepted crash during handling (e.g. NPE). - NOTE: The previous step was only dony for LAZY handlers. EAGER handlers did not change - Exemplary sub steps for volume claim an attachment for workspace resources - Increase resource versions for all CRs - Move API, KIND, and CRD_NAME constants from Spec to resource classes (e.g. from SessionSpec to Session) because they belong to the resource itself. This became apparent when adding the Status to the resources Part of #200
lucas-koehler
added a commit
that referenced
this issue
Aug 3, 2023
Enable checking and storing of AppDefinitions' handling in the operator
jfaltermeier
pushed a commit
that referenced
this issue
Aug 3, 2023
Enable checking and storing of AppDefinitions' handling in the operator
lucas-koehler
added a commit
that referenced
this issue
Aug 29, 2023
When handling of a resource is unexpectedly interrupted without throwing an error, it currently remains in HANDLING state and is never handled again. This could happen if the operator is shut down mid handling, e.g. because of resource constraints. This commit sets the resource to ERROR state and adds a message that handling was unexpectedly interrupted before. Contributed on behalf of STMicroelectronics Signed-off-by: Lucas Koehler <[email protected]>
jfaltermeier
pushed a commit
that referenced
this issue
Aug 30, 2023
When handling of a resource is unexpectedly interrupted without throwing an error, it currently remains in HANDLING state and is never handled again. This could happen if the operator is shut down mid handling, e.g. because of resource constraints. This commit sets the resource to ERROR state and adds a message that handling was unexpectedly interrupted before. Contributed on behalf of STMicroelectronics Signed-off-by: Lucas Koehler <[email protected]>
jfaltermeier
pushed a commit
that referenced
this issue
Aug 30, 2023
NOTE: Status handling for app definitions is commented out as there are still some issues - Catch all exceptions when adding session, workspace, appdefinition resources and add error state - Extend the resource clients for app definitions, sessions, and workspaces to be able to handle the status subresource. - Add basic status for our CRDs to track handling state of the operator - Prevent crashloops by never handling resources that are in error or handling state. The latter indicates an unexcepted crash during handling (e.g. NPE). - NOTE: The previous step was only dony for LAZY handlers. EAGER handlers did not change - Exemplary sub steps for volume claim an attachment for workspace resources - Increase resource versions for all CRs - Move API, KIND, and CRD_NAME constants from Spec to resource classes (e.g. from SessionSpec to Session) because they belong to the resource itself. This became apparent when adding the Status to the resources Part of #200
jfaltermeier
pushed a commit
that referenced
this issue
Aug 30, 2023
Enable checking and storing of AppDefinitions' handling in the operator
jfaltermeier
pushed a commit
that referenced
this issue
Aug 30, 2023
When handling of a resource is unexpectedly interrupted without throwing an error, it currently remains in HANDLING state and is never handled again. This could happen if the operator is shut down mid handling, e.g. because of resource constraints. This commit sets the resource to ERROR state and adds a message that handling was unexpectedly interrupted before. Contributed on behalf of STMicroelectronics Signed-off-by: Lucas Koehler <[email protected]>
jfaltermeier
pushed a commit
that referenced
this issue
Aug 30, 2023
NOTE: Status handling for app definitions is commented out as there are still some issues - Catch all exceptions when adding session, workspace, appdefinition resources and add error state - Extend the resource clients for app definitions, sessions, and workspaces to be able to handle the status subresource. - Add basic status for our CRDs to track handling state of the operator - Prevent crashloops by never handling resources that are in error or handling state. The latter indicates an unexcepted crash during handling (e.g. NPE). - NOTE: The previous step was only dony for LAZY handlers. EAGER handlers did not change - Exemplary sub steps for volume claim an attachment for workspace resources - Increase resource versions for all CRs - Move API, KIND, and CRD_NAME constants from Spec to resource classes (e.g. from SessionSpec to Session) because they belong to the resource itself. This became apparent when adding the Status to the resources Part of #200
jfaltermeier
pushed a commit
that referenced
this issue
Aug 30, 2023
Enable checking and storing of AppDefinitions' handling in the operator
jfaltermeier
pushed a commit
that referenced
this issue
Aug 30, 2023
When handling of a resource is unexpectedly interrupted without throwing an error, it currently remains in HANDLING state and is never handled again. This could happen if the operator is shut down mid handling, e.g. because of resource constraints. This commit sets the resource to ERROR state and adds a message that handling was unexpectedly interrupted before. Contributed on behalf of STMicroelectronics Signed-off-by: Lucas Koehler <[email protected]>
jfaltermeier
pushed a commit
that referenced
this issue
Aug 30, 2023
NOTE: Status handling for app definitions is commented out as there are still some issues - Catch all exceptions when adding session, workspace, appdefinition resources and add error state - Extend the resource clients for app definitions, sessions, and workspaces to be able to handle the status subresource. - Add basic status for our CRDs to track handling state of the operator - Prevent crashloops by never handling resources that are in error or handling state. The latter indicates an unexcepted crash during handling (e.g. NPE). - NOTE: The previous step was only dony for LAZY handlers. EAGER handlers did not change - Exemplary sub steps for volume claim an attachment for workspace resources - Increase resource versions for all CRs - Move API, KIND, and CRD_NAME constants from Spec to resource classes (e.g. from SessionSpec to Session) because they belong to the resource itself. This became apparent when adding the Status to the resources Part of #200
jfaltermeier
pushed a commit
that referenced
this issue
Aug 30, 2023
Enable checking and storing of AppDefinitions' handling in the operator
jfaltermeier
pushed a commit
that referenced
this issue
Aug 30, 2023
When handling of a resource is unexpectedly interrupted without throwing an error, it currently remains in HANDLING state and is never handled again. This could happen if the operator is shut down mid handling, e.g. because of resource constraints. This commit sets the resource to ERROR state and adds a message that handling was unexpectedly interrupted before. Contributed on behalf of STMicroelectronics Signed-off-by: Lucas Koehler <[email protected]>
jfaltermeier
pushed a commit
to eclipse-theia/theia-cloud-helm
that referenced
this issue
Aug 30, 2023
- Add status to all crds. - Extend operator role to allow editing status for our crds - Increase chart versions and use new app version Part of eclipse-theia/theia-cloud#200
jfaltermeier
pushed a commit
to eclipse-theia/theia-cloud-helm
that referenced
this issue
Aug 30, 2023
- Add status to all crds. - Extend operator role to allow editing status for our crds - Increase chart versions and use new app version Part of eclipse-theia/theia-cloud#200
This issue is stale because it has been open for 180 days with no activity. |
Keep open |
This issue is stale because it has been open for 180 days with no activity. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Is your feature request related to a problem? Please describe.
The operator should be able to detect crash loops and skip over problematic custom resources.
Reproducer Session
theia-cloud branch osweek23-1
theia-cloud-helm branch osweek23-1
terraform test-configuration 2-01_try-now
kubectl apply -f osweek23/empty-session.yaml
Reproducer Workspace
theia-cloud branch osweek23-1
theia-cloud-helm branch osweek23-1
terraform test-configuration 2-01_try-now
kubectl apply -f osweek23/empty-workspace.yaml
Reproducer AppDefinition
theia-cloud branch osweek23-2
theia-cloud-helm branch osweek23-1
terraform test-configuration 2-01_try-now
kubectl apply -f osweek23/coffee-session.yaml
Describe the solution you'd like
We may use additional fields in the custom resources metadata to save the current step the operator way making and number of tries for this step.
If the operator sees that there were already multiple tries to read the container image from an app definition, it may assume that this will fail and skip over it.
Describe alternatives you've considered
Cluster provider
No response
Additional information
No response
The text was updated successfully, but these errors were encountered: