Commit

docs: cover metrics export feature
- Add metrics export configuration documentation.
- Mention metrics export optional feature in quickstart guide.
- Mention metrics export feature in overview page.
- Mention possible Prometheus integration in architecture page.
rezib committed Oct 28, 2024
1 parent e14edbd commit ba21b34
Showing 11 changed files with 623 additions and 466 deletions.
7 changes: 6 additions & 1 deletion CHANGELOG.md
@@ -25,7 +25,12 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - show-conf: Introduce `slurm-web-show-conf` utility to dump current
   configuration settings of gateway and agent components with their origin,
   which can either be configuration definition file or site override (#349).
-- docs: Add manpage for `slurm-web-show-conf` command.
+- docs:
+  - Add manpage for `slurm-web-show-conf` command.
+  - Add metrics export configuration documentation.
+  - Mention metrics export optional feature in quickstart guide.
+  - Mention metrics export feature in overview page.
+  - Mention possible Prometheus integration in architecture page.

### Changed
- docs: Update configuration reference documentation.
1 change: 1 addition & 0 deletions docs/modules/conf/nav.adoc
@@ -6,3 +6,4 @@
** xref:conf/gateway.adoc[Gateway]
** xref:conf/agent.adoc[Agent]
* xref:policy.adoc[]
* xref:metrics.adoc[]
121 changes: 121 additions & 0 deletions docs/modules/conf/pages/metrics.adoc
@@ -0,0 +1,121 @@
= Metrics Export

The Slurm-web agent can export metrics in the standard OpenMetrics format on its
`/metrics` endpoint. This endpoint is designed to be scraped by Prometheus (or a
compatible tool) in order to store metrics in a timeseries database and draw
diagrams of historical data.

This page explains how to enable this feature, secure it by
<<restrict,restricting access>> to specific hosts and
<<prometheus,configure Prometheus>> to scrape these metrics. It also provides a
<<reference,reference list of all available metrics>>.

== Configuration

The metrics export feature is disabled by default. It can be enabled with the
following lines in [.path]#`/etc/slurm-web/agent.ini`#:

[source,ini]
----
[metrics]
enabled=yes
----

.More details
****
* xref:conf/agent.adoc#_metrics[Agent configuration metrics section reference documentation].
****

[#restrict]
== Host Restriction

For security reasons, the Slurm-web agent restricts access to the `/metrics`
endpoint to localhost only. When Prometheus is running on external hosts, you
must define the `restrict` parameter in [.path]#`/etc/slurm-web/agent.ini`# to
explicitly allow other networks. For example:

[source,ini]
----
[metrics]
enabled=yes
restrict=
  192.168.1.0/24
  10.0.0.251/32
----

In this example, all IP addresses in the range `192.168.1.[0-255]` and the
single address `10.0.0.251` are permitted to request metrics.
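
To check ahead of time whether a given Prometheus host falls within the
permitted networks, the CIDR matching can be reproduced with Python's standard
`ipaddress` module. This is only a minimal sketch and not the agent's actual
implementation: the `is_allowed` helper is hypothetical, and the network list is
copied from the example above.

```python
import ipaddress

# Networks copied from the `restrict` example above.
RESTRICT = ["192.168.1.0/24", "10.0.0.251/32"]


def is_allowed(client_ip, networks=RESTRICT):
    """Return True if client_ip belongs to at least one of the networks.

    Hypothetical helper for illustration only; it is not part of Slurm-web.
    """
    address = ipaddress.ip_address(client_ip)
    return any(address in ipaddress.ip_network(net) for net in networks)


if __name__ == "__main__":
    # Try a few candidate Prometheus host addresses.
    for ip in ("192.168.1.42", "10.0.0.251", "10.0.0.250"):
        print(ip, is_allowed(ip))
```

A `/32` prefix matches exactly one address, which is why `10.0.0.250` is
rejected by this sketch while `10.0.0.251` is accepted.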

.More details
****
* xref:conf/agent.adoc#_metrics[Agent configuration reference documentation for metrics section].
****

[#prometheus]
== Prometheus Integration

Prometheus must be configured to request the `/metrics` endpoint of the
Slurm-web agent. Edit [.path]#`/etc/prometheus/prometheus.yml`# to add one of
the following configuration snippets, depending on your setup:

* Slurm-web agent running as a native service (i.e. with
`slurm-web-agent.service`):

[source,yaml]
----
scrape_configs:
  - job_name: slurm
    scrape_interval: 30s
    static_configs:
      - targets: ['localhost:5012']
----

* Slurm-web agent running on a xref:wsgi/index.adoc[production HTTP server]:

[source,yaml]
----
scrape_configs:
  - job_name: slurm
    scrape_interval: 30s
    metrics_path: /agent/metrics
    static_configs:
      - targets: ['localhost:80']
----

NOTE: You may need to adjust the target hostname, typically when Prometheus is
running on a remote host, and the destination port (for example 443 for HTTPS).

.Reference
****
* https://prometheus.io/docs/prometheus/latest/configuration/configuration/[Prometheus Official Configuration Documentation].
****

[#reference]
== Available Metrics

This table describes all metrics exported by Slurm-web:

[cols="1l,3a"]
|===
|Metric|Description

|slurm_nodes[state]
|Number of compute nodes in a given state. Supported states are: _idle_,
_mixed_, _allocated_, _down_, _drain_ and _unknown_.

|slurm_nodes_total
|Total number of compute nodes managed by Slurm.

|slurm_cores[state]
|Number of cores of compute nodes in a given state. Supported states are:
_idle_, _mixed_, _allocated_, _down_, _drain_ and _unknown_.

|slurm_cores_total
|Total number of cores on compute nodes managed by Slurm.

|slurm_jobs[state]
|Number of jobs in a given state in Slurm controller queue. Supported states
are: _running_, _completed_, _completing_, _cancelled_, _pending_ and _unknown_.

|slurm_jobs_total
|Total number of jobs in Slurm controller queue.
|===
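
To give an idea of how these metrics can be consumed outside Prometheus, the
following Python sketch parses a few lines of exposition text into a
dictionary. It rests on assumptions: the `[state]` notation in the table is
taken to mean a `state` label (as in `slurm_nodes{state="idle"}`), the sample
text and its values are made up for illustration, and `parse_samples` is a
hypothetical helper that only handles this simple shape (no timestamps, no
escaped characters in labels).

```python
# Made-up sample in the assumed exposition layout; values are illustrative only.
SAMPLE = """\
# TYPE slurm_nodes gauge
slurm_nodes{state="idle"} 3.0
slurm_nodes{state="allocated"} 5.0
slurm_nodes_total 8.0
# EOF
"""


def parse_samples(text):
    """Parse simple exposition lines into {(name, state): value}.

    Hypothetical helper: `state` is None for metrics without a state label.
    """
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment/metadata lines such as "# TYPE" or "# EOF".
        if not line or line.startswith("#"):
            continue
        metric, value = line.rsplit(" ", 1)
        name, _, labels = metric.partition("{")
        state = None
        if labels:
            # `labels` now looks like: state="idle"}
            for pair in labels.rstrip("}").split(","):
                key, _, val = pair.partition("=")
                if key.strip() == "state":
                    state = val.strip().strip('"')
        samples[(name, state)] = float(value)
    return samples


if __name__ == "__main__":
    for (name, state), value in parse_samples(SAMPLE).items():
        print(name, state, value)
```

For anything beyond a quick check, a maintained OpenMetrics parser is a safer
choice than this hand-written one.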
22 changes: 22 additions & 0 deletions docs/modules/install/pages/quickstart.adoc
@@ -637,6 +637,28 @@ xref:misc:troubleshooting.adoc#wsgi[troubleshooting guide] for help.
* xref:conf:wsgi/index.adoc[Production HTTP server setup guide].
****

== Metrics (optional)

Slurm-web can optionally
xref:overview:overview.adoc#metrics[export Slurm metrics] in
https://openmetrics.io/[OpenMetrics] format and integrate with
https://prometheus.io/[Prometheus]. This feature can be used to store metrics in
a timeseries database and draw diagrams of historical data.

This feature is disabled by default. It can be enabled with the following lines
in [.path]#`/etc/slurm-web/agent.ini`#:

[source,ini]
----
[metrics]
enabled=yes
----

.More details
****
* xref:conf:metrics.adoc[Metrics export configuration documentation].
****

== Multi-clusters

Slurm-web is designed to support
Binary file modified docs/modules/overview/images/arch/slurm-web_integration.png