Add Cluster API deployment method for TAS #108

Open: wants to merge 23 commits into master (showing changes from 7 of 23 commits).

Commits:
- `b5846ec` Add namespace to TAS Service Account (criscola, Sep 23, 2022)
- `c890226` Add Cluster API deployment method (criscola, Sep 23, 2022)
- `4d7d6df` Adding code_of_conduct and contributing readme file (madalazar, Oct 18, 2022)
- `d6f904b` Merge branch 'master' into feature/cluster-api (criscola, Nov 18, 2022)
- `050de7f` Merge branch 'master' into feature/cluster-api (criscola, Jan 17, 2023)
- `19026c4` Add Docker CAPI deployment specific guide (criscola, Jan 17, 2023)
- `f44598a` Add ClusterResourceSets for CAPD deployment (criscola, Jan 17, 2023)
- `6be7648` Move CRS to 'shared' folder. (criscola, Jan 27, 2023)
- `fd030d4` Update link to Health Metric Example. (criscola, Jan 27, 2023)
- `3badb05` Rename your-manifests.yaml to capi-quickstart.yaml (criscola, Jan 27, 2023)
- `57ff014` Fix numbering in markdown. (criscola, Jan 27, 2023)
- `1bb8999` Add yaml newlines. (criscola, Jan 27, 2023)
- `e41d190` Add testing/development notice in all markdowns. (criscola, Jan 27, 2023)
- `fb752e1` Move generic/docker provider links to top. (criscola, Jan 27, 2023)
- `eaf3e7c` Add Docker and Kind versions. (criscola, Jan 27, 2023)
- `2fe40df` Add small comment after clusterctl generate. (criscola, Jan 27, 2023)
- `1365e59` Add necessary feature flags. (criscola, Jan 27, 2023)
- `d3cd12c` Update paths of commands referencing the Helm chart. (criscola, Jan 27, 2023)
- `0891f21` Add yq commands to wrangle with the various resources with the comman… (criscola, Jan 27, 2023)
- `2d08d1e` Reformat docs. (criscola, Jan 27, 2023)
- `299571e` Add a few more links to files/folders. (criscola, Jan 27, 2023)
- `ed3d300` Add note on how to initialize Kind cluster in Docker provider. (criscola, Jan 27, 2023)
- `8f98dd6` More adjustments. (criscola, Jan 27, 2023)
17 changes: 17 additions & 0 deletions telemetry-aware-scheduling/deploy/cluster-api/README.md
@@ -0,0 +1,17 @@
# Cluster API deployment

## Introduction

Cluster API is a Kubernetes sub-project focused on providing declarative APIs and tooling to simplify provisioning, upgrading, and operating multiple Kubernetes clusters. [Learn more](https://cluster-api.sigs.k8s.io/introduction.html).

This folder contains an automated and declarative way of deploying the Telemetry Aware Scheduler using Cluster API. We will make use of the [ClusterResourceSet feature](https://cluster-api.sigs.k8s.io/tasks/experimental-features/cluster-resource-set.html) to automatically apply a set of resources. Note you must enable its feature gate before running `clusterctl init` (with `export EXP_CLUSTER_RESOURCE_SET=true`).
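
For example, with the Docker provider the initialization might look like the following sketch (substitute the infrastructure provider you actually use):

```bash
# Enable the ClusterResourceSet feature gate, then initialize the management cluster
export EXP_CLUSTER_RESOURCE_SET=true
clusterctl init --infrastructure docker
```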

## Guides

- [Cluster API deployment - Docker provider (for local testing/development only)](docker/capi-docker.md)
- [Cluster API deployment - Generic provider](generic/capi.md)

## Testing

You can test if the scheduler actually works by following this guide:
[Health Metric Example](https://github.com/intel/platform-aware-scheduling/blob/25a646ece15aaf4c549d8152c4ffbbfc61f8a009/telemetry-aware-scheduling/docs/health-metric-example.md)
160 changes: 160 additions & 0 deletions telemetry-aware-scheduling/deploy/cluster-api/docker/capi-docker.md
@@ -0,0 +1,160 @@
# Cluster API deployment - Docker provider (for local testing/development only)

## Requirements

- A management cluster provisioned in your infrastructure of choice, and the related tooling.
  See the [Cluster API Quickstart](https://cluster-api.sigs.k8s.io/user/quick-start.html).
- Kubernetes v1.22 or greater (tested on Kubernetes v1.25).
- Docker

## Provision clusters with TAS installed using Cluster API

We will provision a KinD cluster with TAS installed using Cluster API. **This guide is meant for local testing/development only; it is not meant for production usage.**
> **Contributor:** I would rephrase "This guide is meant for local testing/development only" to "This guide is meant for local testing/development only, this is not meant for production usage." I would make it bold and add it at the beginning of both this wiki and the capi.md file.
>
> **criscola (Author):** Added it to all 3 markdowns (capi, capi-docker, README).

For the deployment using a generic provider, please refer to [Cluster API deployment - Generic provider](capi.md).

1. Run the following to set up a KinD cluster for CAPD:

```bash
cat > kind-cluster-with-extramounts.yaml <<EOF
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
  extraMounts:
  - hostPath: /var/run/docker.sock
    containerPath: /var/run/docker.sock
EOF
```
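
For completeness, creating the cluster from this config is a one-liner (assuming kind and Docker are installed):

```bash
# Create the management cluster from the config above (the cluster name defaults to "kind")
kind create cluster --config kind-cluster-with-extramounts.yaml
```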

2. Enable the `CLUSTER_TOPOLOGY` and `EXP_CLUSTER_RESOURCE_SET` feature gates (the latter is required for the ClusterResourceSets used later, as noted in the README):

```bash
export CLUSTER_TOPOLOGY=true
export EXP_CLUSTER_RESOURCE_SET=true
```

3. Initialize the management cluster:

```bash
clusterctl init --infrastructure docker
```

4. Run the following to generate the default cluster manifests:

```bash
clusterctl generate cluster capi-quickstart --flavor development \
--kubernetes-version v1.25.0 \
--control-plane-machine-count=3 \
--worker-machine-count=3 \
> capi-quickstart.yaml
```

Be aware that you will need to install a CNI such as Calico before the cluster will be usable. You may automate this
step in the same way as we will see with TAS resources using ClusterResourceSets, or do it manually as sketched below.
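
For reference, a manual CNI installation on the workload cluster might look like the sketch below, run once you have retrieved the workload kubeconfig in the final step; the Calico manifest URL and version are assumptions, so check the Calico docs for a current release:

```bash
# Install Calico on the workload cluster (manifest URL/version are assumptions)
kubectl --kubeconfig=./capi-quickstart.kubeconfig apply -f \
  https://raw.githubusercontent.com/projectcalico/calico/v3.24.1/manifests/calico.yaml
```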

5. Merge the contents of the resources provided in `cluster-patch.yaml`, `kubeadmcontrolplanetemplate-patch.yaml` and `clusterclass-patch.yaml` with
`capi-quickstart.yaml`.

The new config will:
- Configure TLS certificates for the extender
- Change the `dnsPolicy` of the scheduler to `ClusterFirstWithHostNet`
- Place `KubeSchedulerConfiguration` into control plane nodes and pass the corresponding CLI flag to the scheduler.
- Change the behavior of the pre-existing patch application of `/spec/template/spec/kubeadmConfigSpec/files` in `ClusterClass`
such that our new patch is not ignored/overwritten. For more background on this, see [this issue](https://github.com/kubernetes-sigs/cluster-api/pull/7630).

You will also need to add a label to the `Cluster` resource of your new cluster so that ClusterResourceSets can target
it (see `cluster-patch.yaml`): simply add the label `scheduler: tas` to the `Cluster` resource present in `capi-quickstart.yaml`, for example with the yq sketch below.
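
One way to add the label without hand-editing is with yq (v4 syntax); a sketch, assuming `capi-quickstart.yaml` contains the `Cluster` document:

```bash
# Add the scheduler=tas label to the Cluster document of the multi-document file
yq -i '(select(.kind == "Cluster") | .metadata.labels.scheduler) = "tas"' capi-quickstart.yaml
```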

6. You will need to prepare the Helm charts of the various components and join the TAS manifests together for convenience:

First, under `telemetry-aware-scheduling/deploy/charts`, tweak the charts if needed (e.g.
additional metric-scraping configurations), then render them:

```bash
helm template ../charts/prometheus_node_exporter_helm_chart/ > prometheus-node-exporter.yaml
helm template ../charts/prometheus_helm_chart/ > prometheus.yaml
helm template ../charts/prometheus_custom_metrics_helm_chart > prometheus-custom-metrics.yaml
```

You also need to add Namespace resources, otherwise applying the rendered manifests will fail. Prepend the following to `prometheus.yaml`:

```yaml
kind: Namespace
apiVersion: v1
metadata:
  name: monitoring
  labels:
    name: monitoring
```

Prepend the following to `prometheus-custom-metrics.yaml`:

```yaml
kind: Namespace
apiVersion: v1
metadata:
  name: custom-metrics
  labels:
    name: custom-metrics
```
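
If you prefer to do the prepending from the shell, a minimal sketch (the `monitoring-ns.yaml` filename is hypothetical; save the Namespace block above under any name):

```bash
# Prepend the Namespace manifest; the --- separator keeps the YAML stream valid
cat monitoring-ns.yaml > tmp.yaml
echo '---' >> tmp.yaml
cat prometheus.yaml >> tmp.yaml
mv tmp.yaml prometheus.yaml
```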

The custom metrics adapter and the TAS deployment require TLS to be configured with a certificate and key.
Information on how to generate correctly signed certs in Kubernetes can be found [here](https://github.com/kubernetes-sigs/apiserver-builder-alpha/blob/master/docs/concepts/auth.md).
The files `serving-ca.crt` and `serving-ca.key` should be in the current working directory, generated for example as in the sketch below.
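
For local testing you can generate a matching self-signed pair; this is only a sketch (the CN is an assumption matching the extender Service name), so for properly signed certs follow the guide linked above:

```bash
# Self-signed cert/key for local testing only
openssl req -x509 -newkey rsa:4096 -sha256 -nodes -days 365 \
  -keyout serving-ca.key -out serving-ca.crt \
  -subj "/CN=tas-service.default.svc.cluster.local"
```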

Run the following:

```bash
kubectl -n custom-metrics create secret tls cm-adapter-serving-certs --cert=serving-ca.crt --key=serving-ca.key -oyaml --dry-run=client > custom-metrics-tls-secret.yaml
kubectl -n default create secret tls extender-secret --cert=serving-ca.crt --key=serving-ca.key -oyaml --dry-run=client > tas-tls-secret.yaml
```

**Attention: don't commit the TLS certificate and private key to any Git repo, as this is considered bad security practice! Make sure to wipe them from your workstation after applying the corresponding Secrets to your cluster.**

You also need the TAS manifests (Deployment, Policy CRD and RBAC accounts) and the extender's "configmapgetter"
ClusterRole. We will join the TAS manifests together so that we have a single ConfigMap, for convenience:

```bash
yq '.' ../tas-*.yaml > tas.yaml
```
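
Optionally, sanity-check what ended up in the combined file:

```bash
# List the resource kinds gathered into tas.yaml
grep '^kind:' tas.yaml
```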

7. Create and apply the ConfigMaps:

```bash
kubectl create configmap custom-metrics-tls-secret-configmap --from-file=./custom-metrics-tls-secret.yaml -o yaml --dry-run=client > custom-metrics-tls-secret-configmap.yaml
kubectl create configmap custom-metrics-configmap --from-file=./prometheus-custom-metrics.yaml -o yaml --dry-run=client > custom-metrics-configmap.yaml
kubectl create configmap prometheus-configmap --from-file=./prometheus.yaml -o yaml --dry-run=client > prometheus-configmap.yaml
kubectl create configmap prometheus-node-exporter-configmap --from-file=./prometheus-node-exporter.yaml -o yaml --dry-run=client > prometheus-node-exporter-configmap.yaml
kubectl create configmap tas-configmap --from-file=./tas.yaml -o yaml --dry-run=client > tas-configmap.yaml
kubectl create configmap tas-tls-secret-configmap --from-file=./tas-tls-secret.yaml -o yaml --dry-run=client > tas-tls-secret-configmap.yaml
kubectl create configmap extender-configmap --from-file=../extender-configuration/configmap-getter.yaml -o yaml --dry-run=client > extender-configmap.yaml
```

Then apply them to the management cluster:

```bash
# kubectl apply -f does not expand glob patterns itself, so loop over the generated files
for f in *-configmap.yaml; do kubectl apply -f "$f"; done
```
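
You can confirm that everything landed in the management cluster:

```bash
# The ConfigMaps created above should all be listed
kubectl get configmaps
```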

8. Apply the ClusterResourceSets

The ClusterResourceSet resources are provided in `clusterresourcesets.yaml`.
Apply them to the management cluster with `kubectl apply -f clusterresourcesets.yaml`.
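
A quick check that the resource sets were registered:

```bash
# The ClusterResourceSets defined in clusterresourcesets.yaml should be listed
kubectl get clusterresourcesets
```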
> **Contributor:** Does this refer to docker/clusterresourcesets.yaml? And for generic, does it also refer to its own clusterresourcesets.yaml?
>
> **criscola (Author, Jan 26, 2023):** The CRS resources are actually the same. Do you think it would make more sense to have only one file for both vs. duplicating it? If yes, any idea where to put the CRS file in the folder tree?
> Edit: I created a shared folder and referenced the common resources in the generic and docker guides. Should be more maintainable; if they go out of sync we can always move them out of the shared folder.
>
> **madalazar (Contributor, Jan 31, 2023):** I think having the shared folder and just referencing the files is a good idea. Do we still need the clusterresourcesets.yaml file in both (docker, generic) folders?
>
> **criscola (Author):** Nope, I think we can just have it in the shared folder; it's also linked in both guides so users should find it easily.

9. Apply the cluster manifests

Finally, apply your manifests with `kubectl apply -f capi-quickstart.yaml`.
The Telemetry Aware Scheduler will then be running on your new cluster. You can connect to the workload cluster by
exporting its kubeconfig:

```bash
clusterctl get kubeconfig capi-quickstart > capi-quickstart.kubeconfig
```

Then, specifically for the Docker provider (CAPD), point the kubeconfig to the correct address of the HAProxy load-balancer container:

```bash
sed -i -e "s/server:.*/server: https:\/\/$(docker port capi-quickstart-lb 6443/tcp | sed "s/0.0.0.0/127.0.0.1/")/g" ./capi-quickstart.kubeconfig
```
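
You should now be able to reach the workload cluster:

```bash
# The workload cluster's nodes should be listed (Ready once a CNI is installed)
kubectl --kubeconfig=./capi-quickstart.kubeconfig get nodes
```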

You can test if the scheduler actually works by following this guide:
[Health Metric Example](https://github.com/intel/platform-aware-scheduling/blob/25a646ece15aaf4c549d8152c4ffbbfc61f8a009/telemetry-aware-scheduling/docs/health-metric-example.md)
cluster-patch.yaml
@@ -0,0 +1,5 @@
apiVersion: cluster.x-k8s.io/v1beta1
kind: Cluster
metadata:
  labels:
    scheduler: tas
clusterclass-patch.yaml
@@ -0,0 +1,9 @@
apiVersion: cluster.x-k8s.io/v1beta1
kind: ClusterClass
spec:
  patches:
  - definitions:
    - jsonPatches:
      - op: add
        # Note: we must add a dash (-) after "files", as shown below; otherwise the patch application in KubeadmControlPlaneTemplate will fail!
        path: /spec/template/spec/kubeadmConfigSpec/files/-
clusterresourcesets.yaml
@@ -0,0 +1,83 @@
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: prometheus
spec:
  clusterSelector:
    matchLabels:
      scheduler: tas
  resources:
  - kind: ConfigMap
    name: prometheus-configmap
---
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: prometheus-node-exporter
spec:
  clusterSelector:
    matchLabels:
      scheduler: tas
  resources:
  - kind: ConfigMap
    name: prometheus-node-exporter-configmap
---
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: custom-metrics
spec:
  clusterSelector:
    matchLabels:
      scheduler: tas
  resources:
  - kind: ConfigMap
    name: custom-metrics-configmap
---
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: custom-metrics-tls-secret
spec:
  clusterSelector:
    matchLabels:
      scheduler: tas
  resources:
  - kind: ConfigMap
    name: custom-metrics-tls-secret-configmap
---
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: tas
spec:
  clusterSelector:
    matchLabels:
      scheduler: tas
  resources:
  - kind: ConfigMap
    name: tas-configmap
---
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: tas-tls-secret
spec:
  clusterSelector:
    matchLabels:
      scheduler: tas
  resources:
  - kind: ConfigMap
    name: tas-tls-secret-configmap
---
apiVersion: addons.cluster.x-k8s.io/v1alpha3
kind: ClusterResourceSet
metadata:
  name: extender
spec:
  clusterSelector:
    matchLabels:
      scheduler: tas
  resources:
  - kind: ConfigMap
    name: extender-configmap
kubeadmcontrolplanetemplate-patch.yaml
@@ -0,0 +1,56 @@
apiVersion: controlplane.cluster.x-k8s.io/v1beta1
kind: KubeadmControlPlaneTemplate
spec:
  template:
    spec:
      kubeadmConfigSpec:
        clusterConfiguration:
          scheduler:
            extraArgs:
              config: "/etc/kubernetes/schedulerconfig/scheduler-componentconfig.yaml"
            extraVolumes:
            - hostPath: "/etc/kubernetes/schedulerconfig"
              mountPath: "/etc/kubernetes/schedulerconfig"
              name: schedulerconfig
            - hostPath: "/etc/kubernetes/pki/ca.key"
              mountPath: "/host/certs/client.key"
              name: cacert
            - hostPath: "/etc/kubernetes/pki/ca.crt"
              mountPath: "/host/certs/client.crt"
              name: clientcert
        initConfiguration:
          patches:
            directory: /etc/tas/patches
        joinConfiguration:
          patches:
            directory: /etc/tas/patches
        files:
        - path: /etc/kubernetes/schedulerconfig/scheduler-componentconfig.yaml
          content: |
            apiVersion: kubescheduler.config.k8s.io/v1
            kind: KubeSchedulerConfiguration
            clientConnection:
              kubeconfig: /etc/kubernetes/scheduler.conf
            extenders:
            - urlPrefix: "https://tas-service.default.svc.cluster.local:9001"
              prioritizeVerb: "scheduler/prioritize"
              filterVerb: "scheduler/filter"
              weight: 1
              enableHTTPS: true
              managedResources:
              - name: "telemetry/scheduling"
                ignoredByScheduler: true
              ignorable: true
              tlsConfig:
                insecure: false
                certFile: "/host/certs/client.crt"
                keyFile: "/host/certs/client.key"
        - path: /etc/tas/patches/kube-scheduler+json.json
          content: |-
            [
              {
                "op": "add",
                "path": "/spec/dnsPolicy",
                "value": "ClusterFirstWithHostNet"
              }
            ]