Dynolog provides system telemetry at Meta as well as in open-source environments. We propose adding metric logging using Prometheus, an industry-standard framework for logging and exporting metrics. This could also be leveraged by the Meta AI Research SuperCluster and other clusters built on open-source infrastructure.
Prometheus
Prometheus is an open-source tool for collecting and publishing metrics. It can be used to monitor metrics remotely and graph them, and it integrates with Grafana for visualization.
A core concept in Prometheus is its data model. It consists of labels, which are attributes of the entity a metric describes (e.g. {nodename, gpu id}), and metrics, which are numerical values representing points in a time series.
Each monitored box or node exposes the latest values of its metrics and labels, and the Prometheus server typically uses a pull model, periodically scraping (pulling) those values (visualized in the diagram above).
Implementation
We can leverage the prometheus-cpp library (https://github.com/jupp0r/prometheus-cpp/), which is straightforward to use.