Skip to content

OSC/job-pod-reaper

Repository files navigation

CI Status GitHub release GitHub All Releases Go Report Card codecov

job-pod-reaper

Kubernetes service that can reap pods that have run past their lifetime.

This reaping is intended to be run against pods that act like short lived jobs. Additional resources with the same job label as the expired pod will also be reaped.

Current list of resources that can be reaped:

  • Pod
  • Service
  • ConfigMap
  • Secret

Metrics about the count of reaped resources, duration of last reaping, and error counts can be queried using Prometheus /metrics endpoint exposed as a Service on port 8080.

Kubernetes support

Currently this code is built and tested against Kubernetes 1.29.x.

The Kubernetes APIs used by this project do not tend to change between Kubernetes releases so it is likely this code will work on all 1.x releases of Kubernetes.

Install

Install with Helm

Only Helm 3 is supported.

helm repo add job-pod-reaper https://osc.github.io/job-pod-reaper
helm install job-pod-reaper job-pod-reaper/job-pod-reaper -n job-pod-reaper --create-namespace

For Open OnDemand the following adjustments can be made to get a working install using Helm:

helm install job-pod-reaper job-pod-reaper/job-pod-reaper \
-n job-pod-reaper --create-namespace \
--set config.reapNamespaces=false \
--set config.namespaceLabels='app.kubernetes.io/name=open-ondemand' \
--set config.objectLabels='app.kubernetes.io/managed-by=open-ondemand'

See Cluster Role Bindings for information on necessary RoleBinding needed to allow job-pod-reaper to reap OnDemand pods if not reaping all namespaces.

Install with YAML

First install the necessary Namespace and RBAC resources:

kubectl apply -f https://github.com/OSC/job-pod-reaper/releases/latest/download/namespace-rbac.yaml

For Open OnDemand a deployment can be installed using Open OnDemand specific deployment:

kubectl apply -f https://github.com/OSC/job-pod-reaper/releases/latest/download/ondemand-deployment.yaml

A more generic deployment:

kubectl apply -f https://github.com/OSC/job-pod-reaper/releases/latest/download/deployment.yaml

Cluster Role Bindings

If you wish to authorize the job-pod-reaper to reap only specific namespaces, those namespaces will need to have the following RoleBinding added (replace $NAMESPACE with namespace name). Use this RoleBinding on namespaces listed with --reap-namespaces or if those namespaces match labels defined with --namespace-labels

apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: $NAMESPACE-job-pod-reaper-rolebinding
  namespace: $NAMESPACE
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: job-pod-reaper
subjects:
- kind: ServiceAccount
  name: job-pod-reaper
  namespace: job-pod-reaper

If you wish to authorize job-pod-reader for all namespaces the following ClusterRoleBinding is required. This would be needed if --namespace-labels is not defined and you set --reap-namespaces=all.

apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: job-pod-reaper
  namespace: job-pod-reaper
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: job-pod-reaper
subjects:
- kind: ServiceAccount
  name: job-pod-reaper
  namespace: job-pod-reaper

Configuration

To give a lifetime to your pods, add the following annotation:

pod.kubernetes.io/lifetime: $DURATION

DURATION has to be a valid golang duration string.

Example: pod.kubernetes.io/lifetime: 24h

The above annotation will cause the pod to be reaped (killed) once it reaches the age of 1d (24h)

Changing what is reaped

By default pods in any namespace with pod.kubernetes.io/lifetime annotation that have job label are reaped if their lifetime has expired. Any Services, ConfigMaps or Secrets with matching job label in the same namespace as the expired pod will also be reaped.

If you wish to scope the namespaces searched change either --namespace-labels flag to limit namespaces searched by label, or list the namespaces with --reap-namespaces (comma separated). See Cluster Role Bindings on the necessary RBAC changes based on the scope of what namespaces to search.

If you wish to only reap pods with a given label, set --object-labels. This also affects which possible orphaned job objects will be reaped.

If you wish to reap pods only and don't set the job label set --job-label=none.

Deployment Details

The job-pod-reaper is intended to be deployed inside a Kubernetes cluster. It can also be run outside the cluster via cron.

The following flags and environment variables can modify the behavior of the job-pod-reaper:

Flag Environment Variable Description
--run-once RUN_ONCE=true Set to only execute reap code once and exit, ie used when run via cron
--reap-max=30 REAP_MAX=30 The maximum number of jobs to reap during each loop
--reap-interval=60s REAP_INTERVAL=60s Duration between each reaping execution when run in loop
--reap-namespaces=all REAP_NAMESPACES=all Comma separated list of namespaces to reap, ignored if use --namespace-labels
--namespace-labels NAMESPACE_LABELS The labels to use when filtering namespaces to search, overrides --reap-namespaces
--object-labels OBJECT_LABELS Comma separated list of labels to filter which pods and orphaned objects to reap
--job-label=job JOB_LABEL=job The label associated to objects that represent a job to reap, set to none to not require job label
--kubeconfig KUBECONFIG The path to Kubernetes config, required when run outside Kubernetes
--listen-address LISTEN_ADDRESS=:8080 Address to listen for HTTP requests
--no-process-metrics PROCESS_METRICS=false Disable metrics about the running processes such as CPU, memory and Go stats
--log-level=info LOG_LEVEL=info The logging level One of: [debug, info, warn, error]
--log-format=logfmt LOG_FORMAT=logfmt The logging format, either logfmt or json