[YUNIKORN-2749] Place all observability related doc in same directory #465

Merged 2 commits on Aug 26, 2024
2 changes: 1 addition & 1 deletion docs/get_started/get_started.md
@@ -101,4 +101,4 @@ YuniKorn UI provides a centralised view for cluster resource capacity, utilizati

Besides, YuniKorn also exposes its scheduling metrics via Prometheus.

-If you want to monitor the YuniKorn core services by using Prometheus and Grafana, you can find our step-by-step tutorial [here](../user_guide/prometheus.md).
+If you want to monitor the YuniKorn core services by using Prometheus and Grafana, you can find our step-by-step tutorial [here](../user_guide/observability/prometheus.md).
@@ -37,7 +37,7 @@ helm repo add grafana https://grafana.github.io/helm-charts
helm repo update
helm upgrade --install grafana grafana/grafana -n meta --create-namespace
```
-![grafana](../assets/grafana.png)
+![grafana](../../assets/grafana.png)

## Install Loki
1. Create the `binary.yaml` file
@@ -120,7 +120,7 @@ bloomGateway:
```
helm upgrade --install loki grafana/loki -n meta -f binary.yaml
```
-![loki](../assets/loki.png)
+![loki](../../assets/loki.png)


:::info[Troubleshoot]
@@ -142,7 +142,7 @@ config:
```
helm upgrade --install promtail grafana/promtail -f promtail.yaml
```
-![promtail](../assets/promtail.png)
+![promtail](../../assets/promtail.png)

## Grafana settings to connect to Loki
### 1. Access the Grafana Web UI
@@ -155,15 +155,15 @@ After running port forwarding, you can access Grafana's web interface by [localh
In Grafana, add a Loki data source with the URL and HTTP headers below so that Grafana can fetch logs.

#### Set the URL field to `http://loki-gateway`
-![setting_1](../assets/grafana_loki_setting_1.png)
+![setting_1](../../assets/grafana_loki_setting_1.png)

#### To fetch logs that Promtail ships under the tenant ID `user`, set the HTTP header `X-Scope-OrgId` to `user`.
-![setting_2](../assets/grafana_loki_setting_2.png)
+![setting_2](../../assets/grafana_loki_setting_2.png)

## Loki log result
1. Set tracking target
-![track_target](../assets/loki_track_tg.png)
+![track_target](../../assets/loki_track_tg.png)
2. bar chart
-![bar_chart](../assets/loki_log_1.png)
+![bar_chart](../../assets/loki_log_1.png)
3. INFO log
-![logs trace](../assets/loki_log_2.png)
+![logs trace](../../assets/loki_log_2.png)
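Why every image path in this file gains one `../` can be checked with Node's `path` module. The sketch below is illustrative only: the file locations `docs/user_guide/loki.md` and `docs/user_guide/observability/loki.md` are assumptions inferred from the sidebar ids in this PR, `docs/assets/` is assumed from the old `../assets/` links, and `resolveLink` and `relativeLink` are hypothetical helper names.

```javascript
// POSIX flavour so the output uses forward slashes on every platform.
const path = require('path').posix;

// Resolve a relative link as seen from the Markdown file that contains it.
function resolveLink(fromFile, link) {
  return path.join(path.dirname(fromFile), link);
}

// Compute the relative link needed to reach `target` from `fromFile`.
function relativeLink(fromFile, target) {
  return path.relative(path.dirname(fromFile), target);
}

// Assumed locations before and after the move (not shown in the diff itself).
const oldDoc = 'docs/user_guide/loki.md';
const newDoc = 'docs/user_guide/observability/loki.md';

console.log(resolveLink(oldDoc, '../assets/grafana.png'));    // docs/assets/grafana.png
console.log(resolveLink(newDoc, '../assets/grafana.png'));    // docs/user_guide/assets/grafana.png (stale)
console.log(resolveLink(newDoc, '../../assets/grafana.png')); // docs/assets/grafana.png
console.log(relativeLink(newDoc, 'docs/assets/grafana.png')); // ../../assets/grafana.png
```

Under the same assumptions, the link in `docs/get_started/get_started.md` changes for the opposite reason: the linking file stays put while its target moves into `observability/`.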
@@ -26,7 +26,7 @@ YuniKorn exposes its scheduling metrics via Prometheus. Thus, we need to set up

We will provide two methods for building Prometheus: either running it locally or using Helm to deploy it in your cluster. Additionally, in the Helm version, we will explain how to integrate it with Grafana and provide generic Grafana Dashboards for monitoring Yunikorn's metrics and observing the changes over time.

-If you don't know what metric can be used, you can use [REST API](../api/scheduler.md#metrics).
+If you don't know what metric can be used, you can use [REST API](../../api/scheduler.md#metrics).

## Run Prometheus locally

@@ -75,13 +75,13 @@ kubectl port-forward svc/yunikorn-service 9080:9080 -n yunikorn
./prometheus --config.file=prometheus.yml
```

-![prometheus-cmd](../assets/prometheus-cmd.png)
+![prometheus-cmd](../../assets/prometheus-cmd.png)

### 5. Access the Prometheus UI

You should be able to browse to a status page at [localhost:9090](http://localhost:9090/). Give it a couple of seconds to collect data about itself from its own HTTP metrics endpoint.

-![prometheus-web-ui](../assets/prometheus-web-ui.png)
+![prometheus-web-ui](../../assets/prometheus-web-ui.png)

You can also verify that Prometheus is serving metrics by navigating to its metrics endpoint: [localhost:9090/metrics](http://localhost:9090/metrics)

@@ -146,7 +146,7 @@ After running port-forward, you can enter [localhost:9090](http://localhost:9090

Search for the `yunikorn` keyword on the Targets page to find the monitoring target and check its status.

-![prometheus-web-ui-target-yunikorn](../assets/prometheus-web-ui-target-yunikorn.png)
+![prometheus-web-ui-target-yunikorn](../../assets/prometheus-web-ui-target-yunikorn.png)

## Access Grafana Dashboard

@@ -158,24 +158,24 @@ kubectl port-forward -n prometheus svc/prometheus-grafana 7070:80

After running port-forward, open [localhost:7070](http://localhost:7070) to access Grafana. On the login page, enter the account `admin` and the password `prom-operator`.

-![grafana-login-page](../assets/grafana_login_page.png)
+![grafana-login-page](../../assets/grafana_login_page.png)

### Download JSON files for Yunikorn Dashboard

A dashboard consists of multiple panels that are organized and arranged in rows. Each panel has the ability to interact with data from any Grafana data source that has been configured. For more detailed information, please refer to the [Grafana Dashboards](https://grafana.com/docs/grafana/latest/dashboards).

We provide a sample dashboard JSON file. To access it, you can navigate to the `/deployments/grafana-dashboard` directory in the Yunikorn-k8shim repository.

-You can refer to the [REST API](../api/scheduler.md#metrics) to build your own custom Dashboard.
+You can refer to the [REST API](../../api/scheduler.md#metrics) to build your own custom Dashboard.

### Import the JSON files in the Dashboard

Once you access the Dashboard page, you can proceed to import the provided JSON file.

-![import_dashboard_01](../assets/import_dashboard_01.png)
+![import_dashboard_01](../../assets/import_dashboard_01.png)

-![import_dashboard_02](../assets/import_dashboard_02.png)
+![import_dashboard_02](../../assets/import_dashboard_02.png)

Once the import is complete, you will be able to locate Yunikorn's Dashboard on the page. From there, you can regularly monitor the status of Yunikorn.

-![grafana_dashboard](../assets/grafana_dashboard.png)
+![grafana_dashboard](../../assets/grafana_dashboard.png)
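Every change in the two moved Markdown files follows the same pattern: a relative target that starts with `../` gets one more `../`. A minimal sketch of that rewrite is below. It is not how the PR author made the change, and `deepenRelativeLinks` is a hypothetical helper. It assumes inline Markdown links and images only and leaves absolute URLs alone.

```javascript
// Add one "../" to every inline Markdown link or image whose target starts
// with "../", as needed when the containing file moves one directory deeper.
function deepenRelativeLinks(markdown) {
  // Group 1: optional "!", the [label], and the opening "(".
  // Group 2: a target beginning with "../" up to the closing ")".
  return markdown.replace(/(!?\[[^\]]*\]\()(\.\.\/[^)]*\))/g, '$1../$2');
}

console.log(deepenRelativeLinks('![grafana](../assets/grafana.png)'));
// ![grafana](../../assets/grafana.png)

console.log(deepenRelativeLinks('[REST API](../api/scheduler.md#metrics)'));
// [REST API](../../api/scheduler.md#metrics)

console.log(deepenRelativeLinks('[localhost:9090](http://localhost:9090/)'));
// [localhost:9090](http://localhost:9090/)
```

Links that point at siblings in the same directory (for example `./other.md`) would need separate handling. None appear in the hunks shown here.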
29 changes: 18 additions & 11 deletions sidebars.js
@@ -34,8 +34,6 @@ module.exports = {
             'user_guide/resource_quota_management',
             'user_guide/gang_scheduling',
             'user_guide/labels_and_annotations_in_yunikorn',
-            'user_guide/prometheus',
-            'user_guide/loki',
             'user_guide/use_cases',
             'user_guide/event_system',
             {
@@ -53,6 +51,23 @@ module.exports = {
                     'user_guide/workloads/run_ray_service',
                 ],
             },
+            {
+                type: 'category',
+                label: 'Observability',
+                items: [
+                    'user_guide/observability/prometheus',
+                    'user_guide/observability/loki',
+                    {
+                        type: 'category',
+                        label: 'Metrics for Prometheus',
+                        items: [
+                            'metrics/scheduler',
+                            'metrics/runtime',
+                            'metrics/queue',
+                        ]
+                    },
+                ],
+            },
             {
                 type: 'category',
                 label: 'REST APIs',
@@ -62,15 +77,7 @@ module.exports = {
                     'api/system'
                 ]
             },
-            {
-                type: 'category',
-                label: 'Metrics for Prometheus',
-                items: [
-                    'metrics/scheduler',
-                    'metrics/runtime',
-                    'metrics/queue',
-                ]
-            },
+
             'user_guide/troubleshooting'
         ],
         'Developer Guide': [
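The `sidebars.js` change nests the existing "Metrics for Prometheus" category inside the new "Observability" category. The sketch below copies that fragment from the diff and walks it to list the doc ids it exposes. `collectDocIds` is a hypothetical helper written for this illustration, not part of Docusaurus, and it assumes items are either doc-id strings or category objects with an `items` array.

```javascript
// Fragment mirroring the 'Observability' category added in this diff.
const observability = {
  type: 'category',
  label: 'Observability',
  items: [
    'user_guide/observability/prometheus',
    'user_guide/observability/loki',
    {
      type: 'category',
      label: 'Metrics for Prometheus',
      items: ['metrics/scheduler', 'metrics/runtime', 'metrics/queue'],
    },
  ],
};

// Hypothetical helper: walk nested categories and return every doc id.
function collectDocIds(item) {
  if (typeof item === 'string') return [item];
  return (item.items || []).flatMap((child) => collectDocIds(child));
}

console.log(collectDocIds(observability));
```

A check like this can help confirm that each id still maps to a file after the move, since the two moved docs now carry the `user_guide/observability/` prefix while the `metrics/*` ids are unchanged.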