Merge pull request #292 from autumn0207/add_docs_for_crane_scheduler
add docs for crane-scheduler
qmhu authored Apr 29, 2022
2 parents 85108d0 + 06caad7 commit cb35820
Showing 9 changed files with 271 additions and 0 deletions.
1 change: 1 addition & 0 deletions README.md
Original file line number Diff line number Diff line change
@@ -23,6 +23,7 @@ The goal of Crane is to provide a one-stop-shop project to help Kubernetes users
- Effective Pod Autoscaling (Effective Horizontal & Vertical Pod Autoscaling)
- Cost Optimization
- **Enhanced QoS** based on Pod PriorityClass
- **Load-aware Scheduling**

<img alt="Crane Overview" height="550" src="docs/images/crane-overview.png" width="800"/>

Binary file added docs/images/dynamic-scheduler-plugin.png
8 changes: 8 additions & 0 deletions docs/index.md
@@ -10,6 +10,8 @@ The goal of Crane is to provide a one-stop-shop project to help Kubernetes users
- Effective Pod Autoscaling (Effective Horizontal & Vertical Pod Autoscaling)
- Cost Optimization
- **Enhanced QoS** based on Pod PriorityClass
- **Load-aware Scheduling**


![Crane Overview](images/crane-overview.png)

@@ -47,6 +49,11 @@ Avoidance Actions:

Please see [this document](tutorials/using-qos-ensurance.md) to learn more.

## Load-aware Scheduling
The native Kubernetes scheduler schedules pods only by resource requests, which can easily lead to uneven load across nodes. In contrast, Crane-scheduler reads the actual load of Kubernetes nodes from Prometheus and uses it to schedule more efficiently.

Please see [this document](tutorials/scheduling-pods-based-on-actual-node-load.md) to learn more.

## Repositories

Crane is composed of the following components:
@@ -62,3 +69,4 @@ Crane is composed of the following components:
- [crane-agent](https://github.com/gocrane/crane/tree/main/cmd/crane-agent) - Ensures the SLO of critical workloads based on anomaly detection.
- [gocrane/api](https://github.com/gocrane/api) - This repository defines component-level APIs for the Crane platform.
- [gocrane/fadvisor](https://github.com/gocrane/fadvisor) - Financial advisor which collects resource prices from cloud APIs.
- [gocrane/crane-scheduler](https://github.com/gocrane/crane-scheduler) - A Kubernetes scheduler which schedules pods based on actual node load.
6 changes: 6 additions & 0 deletions docs/index.zh.md
@@ -10,6 +10,7 @@ The goal of Crane is to provide a one-stop-shop project to help Kubernetes users
- Effective Pod Autoscaling (Effective Horizontal & Vertical Pod Autoscaling)
- Cost Optimization
- **Enhanced QoS** based on Pod PriorityClass
- **Load-aware Scheduling**

![Crane Overview](images/crane-overview.png)

@@ -47,6 +48,10 @@ Avoidance Actions:

Please see [this document](tutorials/using-qos-ensurance.md) to learn more.

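## Load-aware Scheduling
The native `Kubernetes` scheduler can schedule workloads only by resource `Request`, which easily leads to uneven load across the cluster. In contrast, `Crane-scheduler` reads the actual node load directly from `Prometheus` and can therefore schedule more effectively.

Please see [this document](tutorials/scheduling-pods-based-on-actual-node-load.md) to learn more.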
## Repositories

Crane is composed of the following components:
@@ -62,4 +67,5 @@ Crane is composed of the following components:
- [crane-agent](https://github.com/gocrane/crane/tree/main/cmd/crane-agent) - Ensures the SLO of critical workloads based on anomaly detection.
- [gocrane/api](https://github.com/gocrane/api) - This repository defines component-level APIs for the Crane platform.
- [gocrane/fadvisor](https://github.com/gocrane/fadvisor) - Financial advisor which collects resource prices from cloud APIs.
- [gocrane/crane-scheduler](https://github.com/gocrane/crane-scheduler) - A `Kubernetes` scheduler that schedules workloads based on actual load.

5 changes: 5 additions & 0 deletions docs/installation.md
@@ -40,6 +40,11 @@ helm install crane -n crane-system --create-namespace crane/crane
helm install fadvisor -n crane-system --create-namespace crane/fadvisor
```

### Deploying Crane-scheduler (optional)
```bash
helm install scheduler -n crane-system --create-namespace crane/scheduler
```

### Verify Installation

Check deployments are all available by running:
5 changes: 5 additions & 0 deletions docs/installation.zh.md
@@ -49,6 +49,11 @@ helm install crane -n crane-system --create-namespace crane/crane
helm install fadvisor -n crane-system --create-namespace crane/fadvisor
```

### Install Crane-scheduler (optional)
```console
helm install scheduler -n crane-system --create-namespace crane/scheduler
```

## Verify Installation

Check that the installed Deployments are healthy by running:
32 changes: 32 additions & 0 deletions docs/tutorials/dynamic-scheduler-plugin.md
@@ -0,0 +1,32 @@
# Dynamic-scheduler: a load-aware scheduler plugin

## Introduction
The native Kubernetes scheduler schedules pods only by resource requests, which can easily lead to uneven load across nodes:
- On some nodes, the actual load is close to the requested resources, which makes stability problems very likely.
- On others, the actual load is much smaller than the requested resources, which wastes a large amount of resources.

To solve these problems, Dynamic scheduler builds a simple but efficient model based on actual node utilization data, and filters out nodes with high load to balance the cluster.
## Design Details
### Architecture
<img src="./../images/dynamic-scheduler-plugin.png" align="center" width="600" height="350"/>


As shown above, Dynamic scheduler relies on `Prometheus` and `Node-exporter` to collect and aggregate metrics data, and it consists of two components:
- `Node-annotator` periodically pulls data from Prometheus and writes it, together with a timestamp, to the node as annotations.
>**Note:** `Node-annotator` is currently a module of `Crane-scheduler-controller`.
- `Dynamic plugin` reads the load data directly from the node's annotations, then filters and scores candidate nodes with a simple algorithm.
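
To make the hand-off between the two components more concrete, here is a minimal Python sketch of how a plugin could read a timestamped load annotation. The annotation value format (`"<value>,<UTC timestamp>"`), the function names, and the freshness rule are assumptions made for illustration; they are not confirmed to match what Crane-scheduler actually writes.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional, Tuple

# Hypothetical timestamp layout for the annotation value; an assumption.
STAMP_FORMAT = "%Y-%m-%dT%H:%M:%SZ"


def parse_annotation(raw: str) -> Tuple[float, datetime]:
    """Split an assumed "<value>,<timestamp>" annotation into its two parts."""
    value_text, stamp_text = raw.split(",", 1)
    stamp = datetime.strptime(stamp_text.strip(), STAMP_FORMAT)
    return float(value_text), stamp.replace(tzinfo=timezone.utc)


def usage_if_fresh(raw: str, now: datetime, max_age: timedelta) -> Optional[float]:
    """Return the usage value, or None if the annotation is older than max_age."""
    value, stamp = parse_annotation(raw)
    if now - stamp > max_age:
        return None  # stale data: the caller decides how to treat the node
    return value
```

Storing the timestamp next to the value lets the reader detect stale data without querying Prometheus during scheduling.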

### Scheduler Policy
Dynamic provides a default [scheduler policy](../deploy/manifests/policy.yaml) and supports user-defined policies. The default policy relies on the following metrics:
- `cpu_usage_avg_5m`
- `cpu_usage_max_avg_1h`
- `cpu_usage_max_avg_1d`
- `mem_usage_avg_5m`
- `mem_usage_max_avg_1h`
- `mem_usage_max_avg_1d`

In the `Filter` stage, a node is filtered out if its actual usage exceeds the threshold for any of the metrics above. In the `Score` stage, the final score is the weighted sum of these metrics' values.
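
The two stages can be sketched in Python as follows. The thresholds and weights are copied from the example `policy.yaml` in this commit. Everything else is an assumption for illustration: the function names, treating a missing metric as zero usage, and scoring on the unused fraction `1 - usage` (normalized to 0..100) so that idle nodes rank higher. The text above only states that the score is a weighted sum.

```python
from typing import Dict

# Thresholds (predicate) and weights (priority) from the example policy.
PREDICATE: Dict[str, float] = {
    "cpu_usage_avg_5m": 0.65,
    "cpu_usage_max_avg_1h": 0.75,
    "mem_usage_avg_5m": 0.65,
    "mem_usage_max_avg_1h": 0.75,
}
PRIORITY: Dict[str, float] = {
    "cpu_usage_avg_5m": 0.2,
    "cpu_usage_max_avg_1h": 0.3,
    "cpu_usage_max_avg_1d": 0.5,
    "mem_usage_avg_5m": 0.2,
    "mem_usage_max_avg_1h": 0.3,
    "mem_usage_max_avg_1d": 0.5,
}


def passes_filter(usage: Dict[str, float]) -> bool:
    """Filter stage: reject the node if any metric exceeds its threshold."""
    return all(usage.get(name, 0.0) <= limit for name, limit in PREDICATE.items())


def score(usage: Dict[str, float]) -> float:
    """Score stage: weighted sum over metrics, scaled to 0..100.

    Assumption: each metric contributes (1 - usage) * weight.
    """
    total_weight = sum(PRIORITY.values())
    weighted = sum((1.0 - usage.get(name, 0.0)) * w for name, w in PRIORITY.items())
    return 100.0 * weighted / total_weight
```

Usage values here are fractions between 0 and 1, matching the scale of `maxLimitPecent` in the policy.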

### Hot Value
In a production cluster, scheduling hotspots can occur frequently because a node's load does not rise immediately after a pod is created. We therefore define an extra metric named `Hot Value`, which represents how often pods have recently been scheduled to the node. The final priority of the node is its final score minus the `Hot Value`.
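
A minimal Python sketch of this adjustment follows. The `(timeRange, count)` pairs come from the `hotValue` section of the example policy in this commit. How those pairs combine into a single number is not specified in this document, so the formula below (recent bindings in each window divided by `count`, summed over windows) and the function names are assumptions for illustration only.

```python
from typing import Iterable, Sequence, Tuple

# (timeRange in seconds, count) pairs from the example policy's hotValue section.
HOT_VALUE_POLICY: Sequence[Tuple[float, int]] = ((300.0, 5), (60.0, 2))


def hot_value(binding_times: Iterable[float], now: float) -> float:
    """Assumed formula: sum over windows of (recent bindings / count)."""
    times = list(binding_times)
    total = 0.0
    for time_range, count in HOT_VALUE_POLICY:
        recent = sum(1 for t in times if 0.0 <= now - t <= time_range)
        total += recent / count
    return total


def final_priority(node_score: float, hot: float) -> float:
    """Final priority is the score minus the Hot Value, as described above."""
    return node_score - hot
```

With three pods bound 10, 30, and 200 seconds ago, this sketch gives `3/5 + 2/2 = 1.6`, so a node that just received several pods drops in priority before its measured load catches up.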

213 changes: 213 additions & 0 deletions docs/tutorials/scheduling-pods-based-on-actual-node-load.md
@@ -0,0 +1,213 @@
# Crane-scheduler

## Overview
Crane-scheduler is a collection of scheduler plugins based on the [scheduler framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/), including:

- [Dynamic scheduler: a load-aware scheduler plugin](./dynamic-scheduler-plugin.md)

## Get Started

### 1. Install Prometheus
Make sure your Kubernetes cluster has Prometheus installed. If not, please refer to [Install Prometheus](https://github.com/gocrane/fadvisor/blob/main/README.md#prerequests).

### 2. Configure Prometheus Rules
1) Configure Prometheus recording rules to produce the expected aggregated data:
```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-record
spec:
  groups:
  - name: cpu_mem_usage_active
    interval: 30s
    rules:
    - record: cpu_usage_active
      expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[30s])) * 100)
    - record: mem_usage_active
      expr: 100*(1-node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes)
  - name: cpu-usage-5m
    interval: 5m
    rules:
    - record: cpu_usage_max_avg_1h
      expr: max_over_time(cpu_usage_avg_5m[1h])
    - record: cpu_usage_max_avg_1d
      expr: max_over_time(cpu_usage_avg_5m[1d])
  - name: cpu-usage-1m
    interval: 1m
    rules:
    - record: cpu_usage_avg_5m
      expr: avg_over_time(cpu_usage_active[5m])
  - name: mem-usage-5m
    interval: 5m
    rules:
    - record: mem_usage_max_avg_1h
      expr: max_over_time(mem_usage_avg_5m[1h])
    - record: mem_usage_max_avg_1d
      expr: max_over_time(mem_usage_avg_5m[1d])
  - name: mem-usage-1m
    interval: 1m
    rules:
    - record: mem_usage_avg_5m
      expr: avg_over_time(mem_usage_active[5m])
```
>**⚠️Troubleshooting:** The sampling interval of Prometheus must be less than 30 seconds; otherwise rules such as `cpu_usage_active` may not take effect, because `irate` needs at least two samples inside its 30-second range.
2) Update the Prometheus service discovery configuration so that node_exporter/telegraf targets use the node name as the instance name:
```yaml
- job_name: kubernetes-node-exporter
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
    insecure_skip_verify: true
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  scheme: https
  kubernetes_sd_configs:
  ...
  # Host name
  - source_labels: [__meta_kubernetes_node_name]
    target_label: instance
  ...
```
>**Note:** This step can be skipped if the node name itself is the host IP.
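
The recording rules in step 1 are chained: `cpu_usage_avg_5m` smooths the instantaneous usage with `avg_over_time`, and `cpu_usage_max_avg_1h` then keeps the peak of those averages with `max_over_time`. The Python sketch below imitates that chain on a plain list of samples. It assumes evenly spaced samples and windows expressed as a number of samples, which simplifies Prometheus's real time-based range semantics; the function names simply mirror the PromQL ones and are not a Prometheus API.

```python
from typing import List, Sequence


def avg_over_time(samples: Sequence[float], window: int) -> List[float]:
    """For each position, average the last `window` samples (fewer at the start)."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def max_over_time(samples: Sequence[float], window: int) -> List[float]:
    """For each position, take the maximum of the last `window` samples."""
    out = []
    for i in range(len(samples)):
        chunk = samples[max(0, i - window + 1): i + 1]
        out.append(max(chunk))
    return out


# One sample per minute: a short CPU spike to 90% in an otherwise idle node.
cpu_usage_active = [10, 10, 10, 10, 90, 10, 10, 10, 10, 10]
cpu_usage_avg_5m = avg_over_time(cpu_usage_active, 5)
cpu_usage_max_avg = max_over_time(cpu_usage_avg_5m, 10)
```

In this example the latest 5-minute average has already fallen back to 10, while the maximum of the averages still records 26, which is why the longer-window metrics let the scheduler remember recent peaks.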
### 3. Install Crane-scheduler
There are two options:
1) Install Crane-scheduler as a second scheduler:
```bash
helm repo add crane https://gocrane.github.io/helm-charts
helm install scheduler -n crane-system --create-namespace --set global.prometheusAddr="REPLACE_ME_WITH_PROMETHEUS_ADDR" crane/scheduler
```
2) Replace native Kube-scheduler with Crane-scheduler:
1) Back up `/etc/kubernetes/manifests/kube-scheduler.yaml`:
```bash
cp /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/
```
2) Modify the kube-scheduler configuration file (`scheduler-config.yaml`) to enable the Dynamic scheduler plugin and configure the plugin args:
```yaml
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
...
profiles:
- schedulerName: default-scheduler
  plugins:
    filter:
      enabled:
      - name: Dynamic
    score:
      enabled:
      - name: Dynamic
        weight: 3
  pluginConfig:
  - name: Dynamic
    args:
      policyConfigPath: /etc/kubernetes/policy.yaml
...
```
3) Create `/etc/kubernetes/policy.yaml`, which serves as the scheduler policy of the Dynamic plugin:
```yaml
apiVersion: scheduler.policy.crane.io/v1alpha1
kind: DynamicSchedulerPolicy
spec:
  syncPolicy:
    ##cpu usage
    - name: cpu_usage_avg_5m
      period: 3m
    - name: cpu_usage_max_avg_1h
      period: 15m
    - name: cpu_usage_max_avg_1d
      period: 3h
    ##memory usage
    - name: mem_usage_avg_5m
      period: 3m
    - name: mem_usage_max_avg_1h
      period: 15m
    - name: mem_usage_max_avg_1d
      period: 3h
  predicate:
    ##cpu usage
    - name: cpu_usage_avg_5m
      maxLimitPecent: 0.65
    - name: cpu_usage_max_avg_1h
      maxLimitPecent: 0.75
    ##memory usage
    - name: mem_usage_avg_5m
      maxLimitPecent: 0.65
    - name: mem_usage_max_avg_1h
      maxLimitPecent: 0.75
  priority:
    ##cpu usage
    - name: cpu_usage_avg_5m
      weight: 0.2
    - name: cpu_usage_max_avg_1h
      weight: 0.3
    - name: cpu_usage_max_avg_1d
      weight: 0.5
    ##memory usage
    - name: mem_usage_avg_5m
      weight: 0.2
    - name: mem_usage_max_avg_1h
      weight: 0.3
    - name: mem_usage_max_avg_1d
      weight: 0.5
  hotValue:
    - timeRange: 5m
      count: 5
    - timeRange: 1m
      count: 2
```
4) Modify `kube-scheduler.yaml` and replace kube-scheduler image with Crane-scheduler:
```yaml
...
image: docker.io/gocrane/crane-scheduler:0.0.23
...
```
5) Install [crane-scheduler-controller](deploy/controller/deployment.yaml):
```bash
kubectl apply -f ./deploy/controller/rbac.yaml && kubectl apply -f ./deploy/controller/deployment.yaml
```

### 4. Schedule Pods With Crane-scheduler
Test Crane-scheduler with the following example:
```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cpu-stress
spec:
  selector:
    matchLabels:
      app: cpu-stress
  replicas: 1
  template:
    metadata:
      labels:
        app: cpu-stress
    spec:
      schedulerName: crane-scheduler
      hostNetwork: true
      tolerations:
      - key: node.kubernetes.io/network-unavailable
        operator: Exists
        effect: NoSchedule
      containers:
      - name: stress
        image: docker.io/gocrane/stress:latest
        command: ["stress", "-c", "1"]
        resources:
          requests:
            memory: "1Gi"
            cpu: "1"
          limits:
            memory: "1Gi"
            cpu: "1"
```
>**Note:** Change `crane-scheduler` to `default-scheduler` if Crane-scheduler has replaced the default scheduler.

If the test pod is scheduled successfully, the following event appears:
```bash
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 28s crane-scheduler Successfully assigned default/cpu-stress-7669499b57-zmrgb to vm-162-247-ubuntu
```
1 change: 1 addition & 0 deletions mkdocs.yml
@@ -69,6 +69,7 @@ nav:
- Analytics and Recommendation: tutorials/analytics-and-recommendation.md
- Qos Ensurance: tutorials/using-qos-ensurance.md
- Time Series Prediction: tutorials/using-time-series-prediction.md
- Load-aware Scheduling: tutorials/scheduling-pods-based-on-actual-node-load.md
- Proposals:
- Advanced CpuSet Manager: proposals/20220228-advanced-cpuset-manger.md
- Contributing: CONTRIBUTING.md