Merge pull request #292 from autumn0207/add_docs_for_crane_scheduler
add docs for crane-scheduler
Showing 9 changed files with 271 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
# Dynamic-scheduler: a load-aware scheduler plugin | ||
|
||
## Introduction | ||
The native Kubernetes scheduler schedules pods only by resource requests, which can easily leave nodes unevenly loaded: | ||
- On some nodes, the actual load is close to the resource requests, so stability problems are very likely. | ||
- On others, the actual load is far below the resource requests, so a large share of resources is wasted. | ||
|
||
To solve these problems, Dynamic scheduler builds a simple but efficient model based on actual node utilization data, and filters out nodes with high load to balance the cluster. | ||
## Design Details | ||
### Architecture | ||
<img src="./../images/dynamic-scheduler-plugin.png" align="center" width="600" height="350"/> | ||
|
||
|
||
As shown above, Dynamic scheduler relies on `Prometheus` and `Node-exporter` to collect and aggregate metrics data, and it consists of two components: | ||
- `Node-annotator` periodically pulls data from Prometheus and writes it, together with a timestamp, to the node in the form of annotations. | ||
>**Note:** `Node-annotator` is currently a module of `Crane-scheduler-controller`. | ||
- `Dynamic plugin` reads the load data directly from the node's annotations, then filters and scores candidate nodes based on a simple algorithm. | ||
|
||
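
The hand-off between the two components can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the annotation layout `"<value>,<ISO 8601 timestamp>"`, the helper name `read_load`, and the sample data are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def read_load(annotations, metric, max_age, now):
    """Return the metric value, or None if it is missing, malformed, or stale."""
    raw = annotations.get(metric)
    if raw is None:
        return None
    try:
        # Assumed layout: "<value>,<ISO 8601 timestamp>" (hypothetical format).
        value_text, stamp_text = raw.split(",", 1)
        value = float(value_text)
        stamp = datetime.fromisoformat(stamp_text)
    except ValueError:
        return None
    if now - stamp > max_age:
        return None  # too old to trust for a scheduling decision
    return value


now = datetime(2022, 5, 1, 12, 0, tzinfo=timezone.utc)
node_annotations = {
    "cpu_usage_avg_5m": "0.42,2022-05-01T11:58:00+00:00",  # 2 minutes old
    "mem_usage_avg_5m": "0.61,2022-05-01T09:00:00+00:00",  # 3 hours old
}
fresh = read_load(node_annotations, "cpu_usage_avg_5m", timedelta(minutes=5), now)
stale = read_load(node_annotations, "mem_usage_avg_5m", timedelta(minutes=5), now)
```

Storing the timestamp next to the value lets the reader decide whether the data is still recent enough to use, without querying Prometheus during scheduling.
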
### Scheduler Policy | ||
Dynamic provides a default [scheduler policy](../deploy/manifests/policy.yaml) and supports user-defined policies. The default policy relies on the following metrics: | ||
- `cpu_usage_avg_5m` | ||
- `cpu_usage_max_avg_1h` | ||
- `cpu_usage_max_avg_1d` | ||
- `mem_usage_avg_5m` | ||
- `mem_usage_max_avg_1h` | ||
- `mem_usage_max_avg_1d` | ||
|
||
At the scheduling `Filter` stage, a node is filtered out if its actual usage exceeds the threshold of any of the above metrics. At the `Score` stage, the final score is the weighted sum of these metrics' values. | ||
|
||
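
A minimal sketch of the two stages described above. It rests on several assumptions that the text does not state: usage values are fractions in [0, 1], a lower usage should yield a higher score (so the sketch weights `1 - usage`), and the result is normalized to a 0-100 range. The function names and sample numbers are hypothetical.

```python
def passes_filter(usage, thresholds):
    """A node passes only if no metric exceeds its threshold."""
    return all(usage[name] <= limit for name, limit in thresholds.items())


def score(usage, weights):
    """Weighted sum over the metrics, normalized to 0-100 (assumed scaling)."""
    total_weight = sum(weights.values())
    weighted = sum((1.0 - usage[name]) * w for name, w in weights.items())
    return 100.0 * weighted / total_weight


# Illustrative thresholds and weights for the six default metrics.
thresholds = {
    "cpu_usage_avg_5m": 0.65, "cpu_usage_max_avg_1h": 0.75,
    "mem_usage_avg_5m": 0.65, "mem_usage_max_avg_1h": 0.75,
}
weights = {
    "cpu_usage_avg_5m": 0.2, "cpu_usage_max_avg_1h": 0.3, "cpu_usage_max_avg_1d": 0.5,
    "mem_usage_avg_5m": 0.2, "mem_usage_max_avg_1h": 0.3, "mem_usage_max_avg_1d": 0.5,
}

idle_node = {
    "cpu_usage_avg_5m": 0.30, "cpu_usage_max_avg_1h": 0.50, "cpu_usage_max_avg_1d": 0.60,
    "mem_usage_avg_5m": 0.40, "mem_usage_max_avg_1h": 0.45, "mem_usage_max_avg_1d": 0.50,
}
busy_node = dict(idle_node, cpu_usage_avg_5m=0.70)  # above the 0.65 threshold

idle_ok = passes_filter(idle_node, thresholds)
busy_ok = passes_filter(busy_node, thresholds)
idle_score = score(idle_node, weights)
```

Here `busy_node` is rejected at the `Filter` stage because one metric exceeds its threshold, while `idle_node` goes on to be scored.
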
### Hot Value | ||
In a production cluster, scheduling hotspots may occur frequently because a node's load does not rise immediately after a pod is created. Therefore, we define an extra metric named `Hot Value`, which represents how frequently pods have recently been scheduled to the node. The final priority of the node is the final score minus the `Hot Value`. | ||
|
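
One way to picture the `Hot Value` is as a count of recent scheduling events per time window. The sketch below assumes that each rule `(time_range, count)` contributes one point for every `count` pods scheduled to the node within `time_range`; that rule, the helper names, and the sample data are illustrative assumptions rather than the plugin's confirmed behavior.

```python
from datetime import datetime, timedelta


def hot_value(schedule_times, now, rules):
    """Sum, over all rules, of (pods scheduled within time_range) // count."""
    total = 0
    for time_range, count in rules:
        recent = sum(1 for t in schedule_times if timedelta(0) <= now - t <= time_range)
        total += recent // count
    return total


def final_priority(score, hot):
    """Final priority is the score minus the Hot Value, as described above."""
    return score - hot


now = datetime(2022, 5, 1, 12, 0, 0)
# Times at which pods were recently bound to this node (hypothetical data).
schedule_times = [
    now - timedelta(seconds=s) for s in (20, 40, 50, 120, 180, 240, 600)
]
rules = [(timedelta(minutes=5), 5), (timedelta(minutes=1), 2)]

hot = hot_value(schedule_times, now, rules)
priority = final_priority(51.25, hot)
```

Six pods fall within the last 5 minutes and three within the last minute, so the node is penalized under both rules even though its measured load may not have risen yet.
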
213 changes: 213 additions & 0 deletions
docs/tutorials/scheduling-pods-based-on-actual-node-load.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,213 @@ | ||
# Crane-scheduler | ||
|
||
## Overview | ||
Crane-scheduler is a collection of scheduler plugins based on [scheduler framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/), including: | ||
|
||
- [Dynamic scheduler: a load-aware scheduler plugin](./dynamic-scheduler-plugin.md) | ||
|
||
## Get Started | ||
|
||
### 1. Install Prometheus | ||
Make sure your Kubernetes cluster has Prometheus installed. If not, please refer to [Install Prometheus](https://github.com/gocrane/fadvisor/blob/main/README.md#prerequests). | ||
|
||
### 2. Configure Prometheus Rules | ||
1) Configure Prometheus rules to get the expected aggregated data: | ||
```yaml | ||
apiVersion: monitoring.coreos.com/v1 | ||
kind: PrometheusRule | ||
metadata: | ||
  name: example-record | ||
spec: | ||
  groups: | ||
  - name: cpu_mem_usage_active | ||
    interval: 30s | ||
    rules: | ||
    - record: cpu_usage_active | ||
      expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[30s])) * 100) | ||
    - record: mem_usage_active | ||
      expr: 100*(1-node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes) | ||
  - name: cpu-usage-5m | ||
    interval: 5m | ||
    rules: | ||
    - record: cpu_usage_max_avg_1h | ||
      expr: max_over_time(cpu_usage_avg_5m[1h]) | ||
    - record: cpu_usage_max_avg_1d | ||
      expr: max_over_time(cpu_usage_avg_5m[1d]) | ||
  - name: cpu-usage-1m | ||
    interval: 1m | ||
    rules: | ||
    - record: cpu_usage_avg_5m | ||
      expr: avg_over_time(cpu_usage_active[5m]) | ||
  - name: mem-usage-5m | ||
    interval: 5m | ||
    rules: | ||
    - record: mem_usage_max_avg_1h | ||
      expr: max_over_time(mem_usage_avg_5m[1h]) | ||
    - record: mem_usage_max_avg_1d | ||
      expr: max_over_time(mem_usage_avg_5m[1d]) | ||
  - name: mem-usage-1m | ||
    interval: 1m | ||
    rules: | ||
    - record: mem_usage_avg_5m | ||
      expr: avg_over_time(mem_usage_active[5m]) | ||
``` | ||
>**⚠️Troubleshooting:** The sampling interval of Prometheus must be less than 30 seconds; otherwise, the above rules (such as `cpu_usage_active`) may not take effect. | ||
2) Update the Prometheus service discovery configuration to ensure that node_exporters/telegraf use the node name as the instance name: | ||
```yaml | ||
- job_name: kubernetes-node-exporter | ||
  tls_config: | ||
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt | ||
    insecure_skip_verify: true | ||
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token | ||
  scheme: https | ||
  kubernetes_sd_configs: | ||
  ... | ||
  # Host name | ||
  - source_labels: [__meta_kubernetes_node_name] | ||
    target_label: instance | ||
  ... | ||
``` | ||
>**Note:** This step can be skipped if the node name itself is the host IP. | ||
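
The recording rules in step 1 are layered: `cpu_usage_avg_5m` averages the raw `cpu_usage_active` series over 5 minutes, and `cpu_usage_max_avg_1h` takes the maximum of those averages over 1 hour. The sketch below mimics that layering on synthetic samples. It is a simplified model, not Prometheus itself: window boundaries are treated as left-open, and the sample data and helper names are invented for illustration.

```python
def window_values(samples, window, at):
    """Values of (timestamp, value) samples with at - window < timestamp <= at."""
    return [v for t, v in samples if at - window < t <= at]


def avg_over_time(samples, window, at):
    values = window_values(samples, window, at)
    return sum(values) / len(values) if values else None


def max_over_time(samples, window, at):
    values = window_values(samples, window, at)
    return max(values) if values else None


# Raw series: one sample every 30 s; 20% CPU until t=300 s, then 80%.
cpu_usage_active = [(t, 20 if t <= 300 else 80) for t in range(0, 601, 30)]

# cpu_usage_avg_5m, evaluated every 60 s once a full 5-minute window exists.
cpu_usage_avg_5m = [
    (t, avg_over_time(cpu_usage_active, 300, t)) for t in range(300, 601, 60)
]

# cpu_usage_max_avg_1h at t=600 s: the highest 5-minute average seen so far.
cpu_usage_max_avg_1h = max_over_time(cpu_usage_avg_5m, 3600, 600)
```

Because the 1-hour and 1-day metrics track the peak of the 5-minute averages, a node that was busy earlier keeps a high value for those metrics even after its current load drops.
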
### 3. Install Crane-scheduler | ||
There are two options: | ||
1) Install Crane-scheduler as a second scheduler: | ||
```bash | ||
helm repo add crane https://gocrane.github.io/helm-charts | ||
helm install scheduler -n crane-system --create-namespace --set global.prometheusAddr="REPLACE_ME_WITH_PROMETHEUS_ADDR" crane/scheduler | ||
``` | ||
2) Replace the native Kube-scheduler with Crane-scheduler: | ||
1) Back up `/etc/kubernetes/manifests/kube-scheduler.yaml`: | ||
```bash | ||
cp /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/ | ||
``` | ||
2) Modify the kube-scheduler config file (`scheduler-config.yaml`) to enable the Dynamic scheduler plugin and configure the plugin args: | ||
```yaml | ||
apiVersion: kubescheduler.config.k8s.io/v1beta2 | ||
kind: KubeSchedulerConfiguration | ||
... | ||
profiles: | ||
- schedulerName: default-scheduler | ||
  plugins: | ||
    filter: | ||
      enabled: | ||
      - name: Dynamic | ||
    score: | ||
      enabled: | ||
      - name: Dynamic | ||
        weight: 3 | ||
  pluginConfig: | ||
  - name: Dynamic | ||
    args: | ||
      policyConfigPath: /etc/kubernetes/policy.yaml | ||
... | ||
``` | ||
3) Create `/etc/kubernetes/policy.yaml` to serve as the scheduler policy of the Dynamic plugin: | ||
```yaml | ||
apiVersion: scheduler.policy.crane.io/v1alpha1 | ||
kind: DynamicSchedulerPolicy | ||
spec: | ||
  syncPolicy: | ||
    ##cpu usage | ||
    - name: cpu_usage_avg_5m | ||
      period: 3m | ||
    - name: cpu_usage_max_avg_1h | ||
      period: 15m | ||
    - name: cpu_usage_max_avg_1d | ||
      period: 3h | ||
    ##memory usage | ||
    - name: mem_usage_avg_5m | ||
      period: 3m | ||
    - name: mem_usage_max_avg_1h | ||
      period: 15m | ||
    - name: mem_usage_max_avg_1d | ||
      period: 3h | ||
  predicate: | ||
    ##cpu usage | ||
    - name: cpu_usage_avg_5m | ||
      maxLimitPecent: 0.65 | ||
    - name: cpu_usage_max_avg_1h | ||
      maxLimitPecent: 0.75 | ||
    ##memory usage | ||
    - name: mem_usage_avg_5m | ||
      maxLimitPecent: 0.65 | ||
    - name: mem_usage_max_avg_1h | ||
      maxLimitPecent: 0.75 | ||
  priority: | ||
    ##cpu usage | ||
    - name: cpu_usage_avg_5m | ||
      weight: 0.2 | ||
    - name: cpu_usage_max_avg_1h | ||
      weight: 0.3 | ||
    - name: cpu_usage_max_avg_1d | ||
      weight: 0.5 | ||
    ##memory usage | ||
    - name: mem_usage_avg_5m | ||
      weight: 0.2 | ||
    - name: mem_usage_max_avg_1h | ||
      weight: 0.3 | ||
    - name: mem_usage_max_avg_1d | ||
      weight: 0.5 | ||
  hotValue: | ||
    - timeRange: 5m | ||
      count: 5 | ||
    - timeRange: 1m | ||
      count: 2 | ||
``` | ||
4) Modify `kube-scheduler.yaml` and replace the kube-scheduler image with Crane-scheduler: | ||
```yaml | ||
... | ||
image: docker.io/gocrane/crane-scheduler:0.0.23 | ||
... | ||
``` | ||
5) Install [crane-scheduler-controller](deploy/controller/deployment.yaml): | ||
```bash | ||
kubectl apply -f ./deploy/controller/rbac.yaml && kubectl apply -f ./deploy/controller/deployment.yaml | ||
``` | ||
|
||
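
When writing a custom policy, it can help to check that every metric referenced under `predicate` or `priority` also has a `syncPolicy` entry, since a metric that is never synced has no data to filter or score on. The following is a hedged sketch of such a check: the helpers `parse_period` and `unsynced_metrics` are hypothetical, the policy is written as a Python dict that mirrors part of the `spec` in `policy.yaml`, and only the `s`, `m`, and `h` duration suffixes are handled.

```python
import re
from datetime import timedelta

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours"}


def parse_period(text):
    """Turn a duration such as '3m' or '3h' into a timedelta."""
    match = re.fullmatch(r"(\d+)([smh])", text)
    if match is None:
        raise ValueError(f"unsupported period: {text!r}")
    return timedelta(**{_UNITS[match.group(2)]: int(match.group(1))})


def unsynced_metrics(policy):
    """Metrics used for filtering or scoring that have no syncPolicy entry."""
    synced = {entry["name"] for entry in policy["syncPolicy"]}
    used = {
        entry["name"]
        for section in ("predicate", "priority")
        for entry in policy[section]
    }
    return sorted(used - synced)


# A reduced copy of the policy spec (CPU metrics only), as a plain dict.
policy = {
    "syncPolicy": [
        {"name": "cpu_usage_avg_5m", "period": "3m"},
        {"name": "cpu_usage_max_avg_1h", "period": "15m"},
        {"name": "cpu_usage_max_avg_1d", "period": "3h"},
    ],
    "predicate": [
        {"name": "cpu_usage_avg_5m", "maxLimitPecent": 0.65},
        {"name": "cpu_usage_max_avg_1h", "maxLimitPecent": 0.75},
    ],
    "priority": [
        {"name": "cpu_usage_avg_5m", "weight": 0.2},
        {"name": "cpu_usage_max_avg_1h", "weight": 0.3},
        {"name": "cpu_usage_max_avg_1d", "weight": 0.5},
    ],
}

missing = unsynced_metrics(policy)
periods = {e["name"]: parse_period(e["period"]) for e in policy["syncPolicy"]}
```

An empty `missing` list means every metric used for filtering or scoring is also being synced.
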
### 4. Schedule Pods With Crane-scheduler | ||
Test Crane-scheduler with the following example: | ||
```yaml | ||
apiVersion: apps/v1 | ||
kind: Deployment | ||
metadata: | ||
  name: cpu-stress | ||
spec: | ||
  selector: | ||
    matchLabels: | ||
      app: cpu-stress | ||
  replicas: 1 | ||
  template: | ||
    metadata: | ||
      labels: | ||
        app: cpu-stress | ||
    spec: | ||
      schedulerName: crane-scheduler | ||
      hostNetwork: true | ||
      tolerations: | ||
      - key: node.kubernetes.io/network-unavailable | ||
        operator: Exists | ||
        effect: NoSchedule | ||
      containers: | ||
      - name: stress | ||
        image: docker.io/gocrane/stress:latest | ||
        command: ["stress", "-c", "1"] | ||
        resources: | ||
          requests: | ||
            memory: "1Gi" | ||
            cpu: "1" | ||
          limits: | ||
            memory: "1Gi" | ||
            cpu: "1" | ||
``` | ||
>**Note:** Change `crane-scheduler` to `default-scheduler` if `crane-scheduler` is used as default. | ||
|
||
If the test pod is scheduled successfully, the following event appears: | ||
```bash | ||
Type Reason Age From Message | ||
---- ------ ---- ---- ------- | ||
Normal Scheduled 28s crane-scheduler Successfully assigned default/cpu-stress-7669499b57-zmrgb to vm-162-247-ubuntu | ||
``` |