Merge pull request #292 from autumn0207/add_docs_for_crane_scheduler

add docs for crane-scheduler
gocrane · Apr 29, 2022 · cb35820
2 parents 85108d0 + 06caad7

Showing 9 changed files with 271 additions and 0 deletions.
diff --git a/README.md b/README.md
@@ -23,6 +23,7 @@ The goal of Crane is to provide a one-stop-shop project to help Kubernetes users
   - Effective Pod Autoscaling (Effective Horizontal & Vertical Pod Autoscaling)
   - Cost Optimization
 - **Enhanced QoS** based on Pod PriorityClass
+- **Load-aware Scheduling** 
 
 <img alt="Crane Overview" height="550" src="docs/images/crane-overview.png" width="800"/>
 

diff --git a/docs/images/dynamic-scheduler-plugin.png b/docs/images/dynamic-scheduler-plugin.png
diff --git a/docs/index.md b/docs/index.md
@@ -10,6 +10,8 @@ The goal of Crane is to provide a one-stop-shop project to help Kubernetes users
     - Effective Pod Autoscaling (Effective Horizontal & Vertical Pod Autoscaling)
     - Cost Optimization
 - **Enhanced QoS** based on Pod PriorityClass
+- **Load-aware Scheduling** 
+
 
 ![Crane Overview](images/crane-overview.png)
 
@@ -47,6 +49,11 @@ Avoidance Actions:
 
 Please see [this document](tutorials/using-qos-ensurance.md) to learn more.
 
+## Load-aware Scheduling
+The native Kubernetes scheduler can only schedule pods based on resource requests, which can easily lead to uneven load across nodes. In contrast, Crane-scheduler retrieves the actual load of Kubernetes nodes from Prometheus and schedules pods more efficiently.
+
+Please see [this document](tutorials/scheduling-pods-based-on-actual-node-load.md) to learn more.
+
 ## Repositories
 
 Crane is composed of the following components:
@@ -62,3 +69,4 @@ Crane is composed of the following components:
 - [crane-agent](https://github.com/gocrane/crane/tree/main/cmd/crane-agent) - Ensures the SLOs of critical workloads based on anomaly detection.
 - [gocrane/api](https://github.com/gocrane/api) - This repository defines component-level APIs for the Crane platform.
 - [gocrane/fadvisor](https://github.com/gocrane/fadvisor) - Financial advisor which collects resource prices from cloud APIs.
+- [gocrane/crane-scheduler](https://github.com/gocrane/crane-scheduler) - A Kubernetes scheduler that schedules pods based on actual node load.
diff --git a/docs/index.zh.md b/docs/index.zh.md
@@ -10,6 +10,7 @@ The goal of Crane is to provide a one-stop-shop project to help Kubernetes users
   - Effective Pod Autoscaling (Effective Horizontal & Vertical Pod Autoscaling)
   - Cost Optimization
 - **Enhanced QoS** based on Pod PriorityClass
+- **Load-aware Scheduling** 
 
 ![Crane Overview](images/crane-overview.png)
 
@@ -47,6 +48,10 @@ Avoidance Actions:
 
 Please see [this document](tutorials/using-qos-ensurance.md) to learn more.
 
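+## Load-aware Scheduling
+The native `Kubernetes` scheduler can only schedule workloads based on resource `Request` values, which easily leads to unbalanced load across the cluster. In contrast, `Crane-scheduler` reads the actual load of each node directly from `Prometheus` and can therefore schedule more effectively.
+
+Please see [this document](tutorials/scheduling-pods-based-on-actual-node-load.md) to learn more.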
 ## Repositories
 
 Crane is composed of the following components:
@@ -62,4 +67,5 @@ Crane is composed of the following components:
 - [crane-agent](https://github.com/gocrane/crane/tree/main/cmd/crane-agent) - Ensures the SLOs of critical workloads based on anomaly detection.
 - [gocrane/api](https://github.com/gocrane/api) - This repository defines component-level APIs for the Crane platform.
 - [gocrane/fadvisor](https://github.com/gocrane/fadvisor) - Financial advisor which collects resource prices from cloud APIs.
+- [gocrane/crane-scheduler](https://github.com/gocrane/crane-scheduler) - A `Kubernetes` scheduler that schedules workloads based on actual load.
 
diff --git a/docs/installation.md b/docs/installation.md
@@ -40,6 +40,11 @@ helm install crane -n crane-system --create-namespace crane/crane
 helm install fadvisor -n crane-system --create-namespace crane/fadvisor
 ```
 
+### Deploying Crane-scheduler (optional)
+```bash
+helm install scheduler -n crane-system --create-namespace crane/scheduler
+```
+
 ### Verify Installation
 
 Check deployments are all available by running:

diff --git a/docs/installation.zh.md b/docs/installation.zh.md
@@ -49,6 +49,11 @@ helm install crane -n crane-system --create-namespace crane/crane
 helm install fadvisor -n crane-system --create-namespace crane/fadvisor
 ```
 
+### Install Crane-scheduler (optional)
+```console
+helm install scheduler -n crane-system --create-namespace crane/scheduler
+```
+
 ## Verify Installation
 
 Check that the installed Deployments are healthy by running:

diff --git a/docs/tutorials/dynamic-scheduler-plugin.md b/docs/tutorials/dynamic-scheduler-plugin.md
@@ -0,0 +1,32 @@
+# Dynamic-scheduler: a load-aware scheduler plugin 
+
+## Introduction
+The native Kubernetes scheduler can only schedule pods based on resource requests, which can easily lead to uneven load across nodes:
+- On some nodes, the actual load is close to the resource requests, which makes stability problems very likely.
+- On others, the actual load is far below the resource requests, which wastes a large amount of resources.
+
+To solve these problems, Dynamic scheduler builds a simple but efficient model from actual node utilization data, and filters out heavily loaded nodes to balance the cluster.
+## Design Details
+### Architecture
+<img src="./../images/dynamic-scheduler-plugin.png" align="center" width="600" height="350"/>
+
+
+As shown above, Dynamic scheduler relies on `Prometheus` and `Node-exporter` to collect and aggregate metrics data, and it consists of two components:
+- `Node-annotator` periodically pulls data from Prometheus and writes it, together with a timestamp, to the node as annotations.
+>**Note:** `Node-annotator` is currently a module of `Crane-scheduler-controller`.
+- `Dynamic plugin` reads the load data directly from the node's annotations, then filters and scores candidate nodes with a simple algorithm.
+
+### Scheduler Policy
+Dynamic provides a default [scheduler policy](../deploy/manifests/policy.yaml) and supports user-defined policies. The default policy relies on the following metrics:
+- `cpu_usage_avg_5m` 
+- `cpu_usage_max_avg_1h`
+- `cpu_usage_max_avg_1d`
+- `mem_usage_avg_5m`
+- `mem_usage_max_avg_1h`
+- `mem_usage_max_avg_1d`
+
+At the scheduling `Filter` stage, a node is filtered out if its actual usage exceeds the threshold of any of the above metrics. At the `Score` stage, the final score is the weighted sum of these metrics' values.
+
+### Hot Value
+In a production cluster, scheduling hotspots may occur frequently because the load of a node does not rise immediately after a pod is created. Therefore, we define an extra metric named `Hot Value`, which represents how frequently pods have recently been scheduled to the node. The final priority of a node is its final score minus the `Hot Value`.
+
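The `Filter` and `Score` stages described in `dynamic-scheduler-plugin.md` can be sketched in Go as follows. This is a minimal illustration, not the plugin's actual code. The `MetricRule` type and the `Filter` and `Score` functions are hypothetical names, usage values are assumed to be fractions between 0 and 1, and scoring by free capacity `(1 - usage)` normalized by the sum of weights is an assumption about how the weighted sum could be turned into a score.

```go
package main

import "fmt"

// MetricRule pairs a metric name with a number: a usage threshold in a
// predicate rule, or a weight in a priority rule. (Hypothetical type.)
type MetricRule struct {
	Name  string
	Value float64
}

// Filter returns false as soon as any metric exceeds its threshold.
// Metrics absent from usage are skipped, which is an assumption of this sketch.
func Filter(predicates []MetricRule, usage map[string]float64) bool {
	for _, p := range predicates {
		if u, ok := usage[p.Name]; ok && u > p.Value {
			return false
		}
	}
	return true
}

// Score returns the weighted average of free capacity (1 - usage),
// scaled to 0..100, so a less loaded node gets a higher score.
func Score(priorities []MetricRule, usage map[string]float64) float64 {
	var sum, weights float64
	for _, p := range priorities {
		u, ok := usage[p.Name]
		if !ok {
			continue
		}
		sum += (1 - u) * p.Value
		weights += p.Value
	}
	if weights == 0 {
		return 0
	}
	return 100 * sum / weights
}

func main() {
	// Thresholds and weights copied from the CPU part of the example policy.
	predicates := []MetricRule{{"cpu_usage_avg_5m", 0.65}, {"cpu_usage_max_avg_1h", 0.75}}
	priorities := []MetricRule{{"cpu_usage_avg_5m", 0.2}, {"cpu_usage_max_avg_1h", 0.3}, {"cpu_usage_max_avg_1d", 0.5}}

	// Hypothetical node loads.
	idle := map[string]float64{"cpu_usage_avg_5m": 0.10, "cpu_usage_max_avg_1h": 0.20, "cpu_usage_max_avg_1d": 0.30}
	busy := map[string]float64{"cpu_usage_avg_5m": 0.70, "cpu_usage_max_avg_1h": 0.72, "cpu_usage_max_avg_1d": 0.80}

	fmt.Println("idle passes filter:", Filter(predicates, idle))
	fmt.Println("busy passes filter:", Filter(predicates, busy))
	fmt.Printf("idle score: %.1f\n", Score(priorities, idle))
}
```

In this sketch the busy node is rejected because its 5-minute CPU average (0.70) is above the 0.65 threshold, even though its 1-hour maximum is still below 0.75.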
diff --git a/docs/tutorials/scheduling-pods-based-on-actual-node-load.md b/docs/tutorials/scheduling-pods-based-on-actual-node-load.md
@@ -0,0 +1,213 @@
+# Crane-scheduler
+
+## Overview
+Crane-scheduler is a collection of scheduler plugins built on the Kubernetes [scheduling framework](https://kubernetes.io/docs/concepts/scheduling-eviction/scheduling-framework/), including:
+
+- [Dynamic scheduler: a load-aware scheduler plugin](./dynamic-scheduler-plugin.md)
+
+## Get Started
+
+### 1. Install Prometheus
+Make sure Prometheus is installed in your Kubernetes cluster. If not, please refer to [Install Prometheus](https://github.com/gocrane/fadvisor/blob/main/README.md#prerequests).
+
+### 2. Configure Prometheus Rules
+1) Configure Prometheus recording rules to produce the expected aggregated data:
+```yaml
+apiVersion: monitoring.coreos.com/v1
+kind: PrometheusRule
+metadata:
+    name: example-record
+spec:
+    groups:
+    - name: cpu_mem_usage_active
+      interval: 30s
+      rules:
+      - record: cpu_usage_active
+        expr: 100 - (avg by (instance) (irate(node_cpu_seconds_total{mode="idle"}[30s])) * 100)
+      - record: mem_usage_active
+        expr: 100*(1-node_memory_MemAvailable_bytes/node_memory_MemTotal_bytes)
+    - name: cpu-usage-5m
+      interval: 5m
+      rules:
+      - record: cpu_usage_max_avg_1h
+        expr: max_over_time(cpu_usage_avg_5m[1h])
+      - record: cpu_usage_max_avg_1d
+        expr: max_over_time(cpu_usage_avg_5m[1d])
+    - name: cpu-usage-1m
+      interval: 1m
+      rules:
+      - record: cpu_usage_avg_5m
+        expr: avg_over_time(cpu_usage_active[5m])
+    - name: mem-usage-5m
+      interval: 5m
+      rules:
+      - record: mem_usage_max_avg_1h
+        expr: max_over_time(mem_usage_avg_5m[1h])
+      - record: mem_usage_max_avg_1d
+        expr: max_over_time(mem_usage_avg_5m[1d])
+    - name: mem-usage-1m
+      interval: 1m
+      rules:
+      - record: mem_usage_avg_5m
+        expr: avg_over_time(mem_usage_active[5m])
+```
+>**⚠️Troubleshooting:** The sampling interval of Prometheus must be less than 30 seconds, otherwise the above rules (such as `cpu_usage_active`) may not take effect.
+2) Update the Prometheus service discovery configuration so that node_exporter/telegraf targets use the node name as the instance name:
+```yaml
+    - job_name: kubernetes-node-exporter
+      tls_config:
+        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
+        insecure_skip_verify: true
+      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
+      scheme: https
+      kubernetes_sd_configs:
+      ...
+      # Host name
+      - source_labels: [__meta_kubernetes_node_name]
+        target_label: instance
+      ...
+```
+>**Note:** This step can be skipped if the node name itself is the host IP.
+
+### 3. Install Crane-scheduler
+There are two options:
+1) Install Crane-scheduler as a second scheduler:
+   ```bash
+   helm repo add crane https://gocrane.github.io/helm-charts
+   helm install scheduler -n crane-system --create-namespace --set global.prometheusAddr="REPLACE_ME_WITH_PROMETHEUS_ADDR" crane/scheduler
+   ```
+2) Replace native Kube-scheduler with Crane-scheduler:
+   1) Back up `/etc/kubernetes/manifests/kube-scheduler.yaml`:
+   ```bash
+   cp /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/
+   ```
+   2) Modify the kube-scheduler configuration file (`scheduler-config.yaml`) to enable the Dynamic scheduler plugin and configure the plugin args:
+   ```yaml
+   apiVersion: kubescheduler.config.k8s.io/v1beta2
+   kind: KubeSchedulerConfiguration
+   ...
+   profiles:
+   - schedulerName: default-scheduler
+     plugins:
+       filter:
+         enabled:
+         - name: Dynamic
+       score:
+         enabled:
+         - name: Dynamic
+           weight: 3
+     pluginConfig:
+     - name: Dynamic
+       args:
+         policyConfigPath: /etc/kubernetes/policy.yaml
+   ...
+   ```
+   3) Create `/etc/kubernetes/policy.yaml`, which serves as the scheduler policy of the Dynamic plugin:
+   ```yaml
+    apiVersion: scheduler.policy.crane.io/v1alpha1
+    kind: DynamicSchedulerPolicy
+    spec:
+      syncPolicy:
+        ##cpu usage
+        - name: cpu_usage_avg_5m
+          period: 3m
+        - name: cpu_usage_max_avg_1h
+          period: 15m
+        - name: cpu_usage_max_avg_1d
+          period: 3h
+        ##memory usage
+        - name: mem_usage_avg_5m
+          period: 3m
+        - name: mem_usage_max_avg_1h
+          period: 15m
+        - name: mem_usage_max_avg_1d
+          period: 3h
+
+      predicate:
+        ##cpu usage
+        - name: cpu_usage_avg_5m
+          maxLimitPecent: 0.65
+        - name: cpu_usage_max_avg_1h
+          maxLimitPecent: 0.75
+        ##memory usage
+        - name: mem_usage_avg_5m
+          maxLimitPecent: 0.65
+        - name: mem_usage_max_avg_1h
+          maxLimitPecent: 0.75
+
+      priority:
+        ##cpu usage
+        - name: cpu_usage_avg_5m
+          weight: 0.2
+        - name: cpu_usage_max_avg_1h
+          weight: 0.3
+        - name: cpu_usage_max_avg_1d
+          weight: 0.5
+        ##memory usage
+        - name: mem_usage_avg_5m
+          weight: 0.2
+        - name: mem_usage_max_avg_1h
+          weight: 0.3
+        - name: mem_usage_max_avg_1d
+          weight: 0.5
+
+      hotValue:
+        - timeRange: 5m
+          count: 5
+        - timeRange: 1m
+          count: 2
+   ```
+   4) Modify `kube-scheduler.yaml` and replace the kube-scheduler image with Crane-scheduler:
+   ```yaml
+   ...
+    image: docker.io/gocrane/crane-scheduler:0.0.23
+   ...
+   ```
+   5) Install [crane-scheduler-controller](deploy/controller/deployment.yaml):
+    ```bash
+    kubectl apply -f ./deploy/controller/rbac.yaml && kubectl apply -f ./deploy/controller/deployment.yaml
+    ```
+
+### 4. Schedule Pods With Crane-scheduler
+Test Crane-scheduler with the following example:
+```yaml
+apiVersion: apps/v1
+kind: Deployment
+metadata:
+  name: cpu-stress
+spec:
+  selector:
+    matchLabels:
+      app: cpu-stress
+  replicas: 1
+  template:
+    metadata:
+      labels:
+        app: cpu-stress
+    spec:
+      schedulerName: crane-scheduler
+      hostNetwork: true
+      tolerations:
+      - key: node.kubernetes.io/network-unavailable
+        operator: Exists
+        effect: NoSchedule
+      containers:
+      - name: stress
+        image: docker.io/gocrane/stress:latest
+        command: ["stress", "-c", "1"]
+        resources:
+          requests:
+            memory: "1Gi"
+            cpu: "1"
+          limits:
+            memory: "1Gi"
+            cpu: "1"
+```
+>**Note:** Change `schedulerName` from `crane-scheduler` to `default-scheduler` if Crane-scheduler is used as the default scheduler.
+
+The following event appears if the test pod is scheduled successfully:
+```bash
+Type    Reason     Age   From             Message
+----    ------     ----  ----             -------
+Normal  Scheduled  28s   crane-scheduler  Successfully assigned default/cpu-stress-7669499b57-zmrgb to vm-162-247-ubuntu
+```
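The `hotValue` section of the example `policy.yaml` lists `timeRange`/`count` pairs, and the plugin document states that the final priority is the score minus the `Hot Value`. One possible reading is sketched in Go here. The `HotValueRule` type, the `HotValue` and `demo` functions, and the formula itself (pods bound within `timeRange`, integer-divided by `count`, summed over all rules) are assumptions made for illustration and are not taken from the Crane-scheduler source.

```go
package main

import (
	"fmt"
	"time"
)

// HotValueRule mirrors one hotValue entry of the policy (timeRange, count).
type HotValueRule struct {
	TimeRange time.Duration
	Count     int
}

// HotValue adds, for every rule, the number of pods bound to the node within
// TimeRange divided by Count (integer division). The formula is an assumption
// of this sketch.
func HotValue(rules []HotValueRule, bindings []time.Time, now time.Time) int {
	hot := 0
	for _, r := range rules {
		if r.Count <= 0 {
			continue // ignore malformed rules
		}
		recent := 0
		for _, t := range bindings {
			if age := now.Sub(t); age >= 0 && age <= r.TimeRange {
				recent++
			}
		}
		hot += recent / r.Count
	}
	return hot
}

// demo evaluates the two rules from the example policy against six
// hypothetical binding times and a hypothetical score of 77.
func demo() (hot, priority int) {
	rules := []HotValueRule{
		{TimeRange: 5 * time.Minute, Count: 5},
		{TimeRange: time.Minute, Count: 2},
	}
	now := time.Now()
	bindings := []time.Time{
		now.Add(-10 * time.Second),
		now.Add(-30 * time.Second),
		now.Add(-50 * time.Second),
		now.Add(-2 * time.Minute),
		now.Add(-4 * time.Minute),
		now.Add(-10 * time.Minute),
	}
	hot = HotValue(rules, bindings, now)
	return hot, 77 - hot
}

func main() {
	hot, priority := demo()
	fmt.Println("hot value:", hot, "priority:", priority)
}
```

With these inputs, five bindings fall inside the 5-minute window (5/5 = 1) and three inside the 1-minute window (3/2 = 1), so the hot value is 2 and the priority drops from 77 to 75.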
diff --git a/mkdocs.yml b/mkdocs.yml
@@ -69,6 +69,7 @@ nav:
       - Analytics and Recommendation: tutorials/analytics-and-recommendation.md
       - Qos Ensurance: tutorials/using-qos-ensurance.md
       - Time Series Prediction: tutorials/using-time-series-prediction.md
+      - Load-aware Scheduling: tutorials/scheduling-pods-based-on-actual-node-load.md
   - Proposals:
       - Advanced CpuSet Manager: proposals/20220228-advanced-cpuset-manger.md
   - Contributing: CONTRIBUTING.md