Optional removal of node taint on successful IP assignment #146

Merged (8 commits) on May 13, 2024
41 changes: 40 additions & 1 deletion README.md
@@ -48,7 +48,7 @@ To enable IPv6 support, set the `ipv6` flag (or set `IPV6` environment variable)

### Kubernetes Service Account

- KubeIP requires a Kubernetes service account with the following permissions:
+ KubeIP requires a Kubernetes service account with at least the following permissions:

```yaml
apiVersion: v1
@@ -129,6 +129,44 @@ spec:
value: "true"
```

### Node Taints

KubeIP can be configured to remove a taint key from its node once the static IP address has been successfully assigned. Workloads without a matching toleration are then kept off the node until it has received its static IP address. This is useful, for example, when a workload must call resources protected by IP whitelisting, because it prevents a race between KubeIP and the workload on newly provisioned nodes.

To enable this feature, set the `taint-key` configuration parameter (see [How to run KubeIP](#how-to-run-kubeip)) to the taint key that should be removed. Then add a toleration to the KubeIP DaemonSet so that KubeIP itself can be scheduled on the tainted nodes. For example, given that new nodes are created with a taint key of `kubeip.com/not-ready`:

```diff
kind: DaemonSet
spec:
  template:
    spec:
      serviceAccountName: kubeip-service-account
+     tolerations:
+       - key: kubeip.com/not-ready
+         operator: Exists
+         effect: NoSchedule
      containers:
        - name: kubeip
          image: doitintl/kubeip-agent
          env:
+           - name: TAINT_KEY
+             value: kubeip.com/not-ready
```

The parameter has no default value; if it is not set, KubeIP does not attempt to remove any taints. If the configured taint key is not present on the node, KubeIP logs a warning and continues normally. If the taint key is present but removing it fails, KubeIP releases the IP address back into the pool before restarting and trying again.

Using this feature requires KubeIP to have permission to patch nodes, so the `ClusterRole` resource rules need to be updated. **Note that if this configuration option is not set, KubeIP will not attempt to patch any nodes, and the change to the rules is not necessary.**

Keep in mind that this gives KubeIP permission to patch any node in your cluster, so make sure that this aligns with your security requirements before enabling the feature.

```diff
rules:
  - apiGroups: [ "" ]
    resources: [ "nodes" ]
-   verbs: [ "get" ]
+   verbs: [ "get", "patch" ]
```

### AWS

Make sure that KubeIP DaemonSet is deployed on nodes that have a public IP (node running in public subnet) and uses a Kubernetes service
@@ -231,6 +269,7 @@ OPTIONS:
   --project value           name of the GCP project or the AWS account ID (not needed if running in node) [$PROJECT]
   --region value            name of the GCP region or the AWS region (not needed if running in node) [$REGION]
   --release-on-exit         release the static public IP address on exit (default: true) [$RELEASE_ON_EXIT]
   --taint-key value         specify a taint key to remove from the node once the static public IP address is assigned [$TAINT_KEY]
   --retry-attempts value    number of attempts to assign the static public IP address (default: 10) [$RETRY_ATTEMPTS]
   --retry-interval value    when the agent fails to assign the static public IP address, it will retry after this interval (default: 5m0s) [$RETRY_INTERVAL]
   --lease-duration value    duration of the kubernetes lease (default: 5) [$LEASE_DURATION]
4 changes: 4 additions & 0 deletions chart/templates/clusterrole.yaml
@@ -8,7 +8,11 @@ metadata:
rules:
  - apiGroups: [ "" ]
    resources: [ "nodes" ]
    {{- if .Values.rbac.allowNodesPatchPermission }}
    verbs: [ "get", "patch" ]
    {{- else }}
    verbs: [ "get" ]
    {{- end }}
  - apiGroups: [ "coordination.k8s.io" ]
    resources: [ "leases" ]
    verbs: [ "create", "delete", "get" ]
2 changes: 2 additions & 0 deletions chart/templates/daemonset.yaml
@@ -42,6 +42,8 @@ spec:
                  fieldPath: spec.nodeName
            - name: FILTER
              value: {{ .Values.daemonSet.env.FILTER | quote }}
            - name: TAINT_KEY
              value: {{ .Values.daemonSet.env.TAINT_KEY | quote }}
            - name: LOG_LEVEL
              value: {{ .Values.daemonSet.env.LOG_LEVEL | quote }}
            - name: LOG_JSON
2 changes: 2 additions & 0 deletions chart/values.yaml
@@ -25,6 +25,7 @@ serviceAccount:
# Role-Based Access Control (RBAC) configuration.
rbac:
  create: true
  allowNodesPatchPermission: false

# DaemonSet configuration.
daemonSet:
@@ -35,6 +36,7 @@ daemonSet:
    kubeip: use
  env:
    FILTER: labels.kubeip=reserved;labels.environment=demo
    TAINT_KEY: ""
    LOG_LEVEL: debug
    LOG_JSON: true
  resources:
26 changes: 26 additions & 0 deletions cmd/main.go
@@ -174,6 +174,26 @@ func run(c context.Context, log *logrus.Entry, cfg *config.Config) error {
		return errors.Wrap(err, "assigning static public IP address")
	}

	if cfg.TaintKey != "" {
		logger := log.WithField("taint-key", cfg.TaintKey)
		tainter := nd.NewTainter(clientset)

		didRemoveTaint, err := tainter.RemoveTaintKey(ctx, n, cfg.TaintKey)
		if err != nil {
			logger.Error("removing taint key failed, releasing static public IP address")
			if releaseErr := releaseIP(assigner, n); releaseErr != nil { //nolint:contextcheck
				log.WithError(releaseErr).Error("releasing static public IP address after taint key removal failed")
			}
			return errors.Wrap(err, "removing node taint key")
		}

		if didRemoveTaint {
			logger.Info("taint key removed successfully")
		} else {
			logger.Warning("taint key not present on node, skipped removal")
		}
	}

	// pause the agent to prevent it from exiting immediately after assigning the static public IP address
	// wait for the context to be done: SIGTERM, SIGINT
	<-ctx.Done()
@@ -303,6 +323,12 @@ func main() {
				Category: "Configuration",
				Value: true,
			},
			&cli.StringFlag{
				Name: "taint-key",
				Usage: "specify a taint key to remove from the node once the static public IP address is assigned",
				EnvVars: []string{"TAINT_KEY"},
				Category: "Configuration",
			},
			&cli.StringFlag{
				Name: "log-level",
				Usage: "set log level (debug, info(*), warning, error, fatal, panic)",
3 changes: 3 additions & 0 deletions internal/config/config.go
@@ -33,6 +33,8 @@ type Config struct {
	LeaseDuration int `json:"lease-duration"`
	// LeaseNamespace is the namespace of the kubernetes lease
	LeaseNamespace string `json:"lease-namespace"`
	// TaintKey is the taint key to remove from the node once the IP address is assigned
	TaintKey string `json:"taint-key"`
}

func NewConfig(c *cli.Context) *Config {
@@ -50,5 +52,6 @@ func NewConfig(c *cli.Context) *Config {
	cfg.ReleaseOnExit = c.Bool("release-on-exit")
	cfg.LeaseDuration = c.Int("lease-duration")
	cfg.LeaseNamespace = c.String("lease-namespace")
	cfg.TaintKey = c.String("taint-key")
	return &cfg
}
73 changes: 73 additions & 0 deletions internal/node/tainter.go
@@ -0,0 +1,73 @@
package node

import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/doitintl/kubeip/internal/types"
	"github.com/pkg/errors"
	v1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	typesv1 "k8s.io/apimachinery/pkg/types"
	"k8s.io/client-go/kubernetes"
)

type Tainter interface {
	RemoveTaintKey(ctx context.Context, node *types.Node, taintKey string) (bool, error)
}

type tainter struct {
	client kubernetes.Interface
}

func deleteTaintsByKey(taints []v1.Taint, taintKey string) ([]v1.Taint, bool) {
	newTaints := []v1.Taint{}
	didDelete := false

	for i := range taints {
		if taintKey == taints[i].Key {
			didDelete = true
			continue
		}
		newTaints = append(newTaints, taints[i])
	}

	return newTaints, didDelete
}

func NewTainter(client kubernetes.Interface) Tainter {
	return &tainter{
		client: client,
	}
}

func (t *tainter) RemoveTaintKey(ctx context.Context, node *types.Node, taintKey string) (bool, error) {
	// get node object from API server
	n, err := t.client.CoreV1().Nodes().Get(ctx, node.Name, metav1.GetOptions{})
	if err != nil {
		return false, errors.Wrap(err, "failed to get kubernetes node")
	}

	// Remove taint from the node representation
	newTaints, didDelete := deleteTaintsByKey(n.Spec.Taints, taintKey)
	if !didDelete {
		return false, nil
	}

	// Marshal the remaining taints of the node into json format for patching.
	// The remaining taints may be empty, and that will result in an empty json array "[]"
	newTaintsMarshaled, err := json.Marshal(newTaints)
	if err != nil {
		return false, errors.Wrap(err, "failed to marshal new taints")
	}

	// Patch the node with only the remaining taints
	patch := fmt.Sprintf(`{"spec":{"taints":%v}}`, string(newTaintsMarshaled))
	_, err = t.client.CoreV1().Nodes().Patch(ctx, node.Name, typesv1.MergePatchType, []byte(patch), metav1.PatchOptions{})
	if err != nil {
		return false, errors.Wrap(err, "failed to patch node taints")
	}

	return true, nil
}
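
To see what the merge patch body looks like without a cluster, the filtering and marshalling steps above can be reproduced with the standard library alone. This is a sketch under assumptions: the `taint` struct is a simplified stand-in for `v1.Taint` (only key, value and effect), and `removeByKey` and `buildTaintPatch` are hypothetical helper names rather than functions from this PR.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// taint is a simplified, hypothetical stand-in for v1.Taint.
type taint struct {
	Key    string `json:"key"`
	Value  string `json:"value,omitempty"`
	Effect string `json:"effect"`
}

// removeByKey drops every taint with a matching key and reports whether any was dropped.
func removeByKey(taints []taint, key string) ([]taint, bool) {
	kept := []taint{} // non-nil, so it marshals to [] rather than null
	removed := false
	for _, t := range taints {
		if t.Key == key {
			removed = true
			continue
		}
		kept = append(kept, t)
	}
	return kept, removed
}

// buildTaintPatch returns a JSON merge patch body that replaces spec.taints.
func buildTaintPatch(remaining []taint) (string, error) {
	b, err := json.Marshal(remaining)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(`{"spec":{"taints":%s}}`, b), nil
}

func main() {
	taints := []taint{
		{Key: "kubeip.com/not-ready", Effect: "NoSchedule"},
		{Key: "example.com/dedicated", Value: "db", Effect: "NoSchedule"},
	}
	remaining, removed := removeByKey(taints, "kubeip.com/not-ready")
	patch, err := buildTaintPatch(remaining)
	if err != nil {
		panic(err)
	}
	fmt.Println(removed, patch)
}
```

Starting from an empty, non-nil slice matters here: a nil slice would marshal to `null` instead of `[]`, which changes what the merge patch asks the API server to do with the field.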