Adding Prometheus metrics framework (#460)
* Test Prometheus client with sample metrics

* Updated documentation and additional labels for model metrics

* Updated inference unit tests to check for metrics; Added openapi docs

* Addressed review comments

* Added POSTMAN regression tests for metrics

* Update connecter types

* Bumping serving-sdk dependency

* Dropping metrics plugins support for now

* Fixed erroneous test cases

* added flag to enable or disable prometheous metric api

* Adding missing file

* doc changes and fixed typo

* Update configuration.md

made section for new params related to metric

* Update metrics_api.md

* Update MetricAggregator.java

Fixed constant name

Co-authored-by: dhaniram-kshirsagar <[email protected]>
Co-authored-by: dhaniram kshirsagar <[email protected]>
3 people committed Aug 10, 2020
1 parent 4a67a43 commit b89d1ca
Showing 41 changed files with 907 additions and 144 deletions.
10 changes: 9 additions & 1 deletion docs/configuration.md
@@ -81,6 +81,7 @@ See [Enable SSL](#enable-ssl) to configure HTTPS.

* `inference_address`: Inference API binding address. Default: http://127.0.0.1:8080
* `management_address`: management API binding address. Default: http://127.0.0.1:8081
* `metrics_address`: metrics API binding address. Default: http://127.0.0.1:8082
* To run predictions on models on a public IP address, specify the IP address as `0.0.0.0`.
To run predictions on models on a specific IP address, specify the IP address and port.

@@ -98,7 +99,7 @@ inference_address=https://172.16.1.10:8080

### Enable SSL

To enable HTTPs, you can change `inference_address` or `management_address` protocol from http to https. For example: `inference_address=https://127.0.0.1`.
To enable HTTPS, you can change the `inference_address`, `management_address`, or `metrics_address` protocol from http to https. For example: `inference_address=https://127.0.0.1`.
The default is port 443, but you can make TorchServe listen on whatever port you set to accept https requests.
For example, to receive https traffic on port 8443, you would use: `inference_address=https://127.0.0.1:8443`.

@@ -126,6 +127,7 @@ Configure the following properties in config.properties:
```bash
inference_address=https://127.0.0.1:8443
management_address=https://127.0.0.1:8444
metrics_address=https://127.0.0.1:8445
keystore=keystore.p12
keystore_pass=changeit
keystore_type=PKCS12
@@ -142,6 +144,7 @@ Configure the following properties in config.properties:
```properties
inference_address=https://127.0.0.1:8443
management_address=https://127.0.0.1:8444
metrics_address=https://127.0.0.1:8445
private_key_file=mykey.key
certificate_file=mycert.pem
```
@@ -193,6 +196,11 @@ By default, TorchServe uses all available GPUs for inference. Use `number_of_gpu`

* `number_of_gpu`: Maximum number of GPUs that TorchServe can use for inference. Default: all available GPUs in system.

### Enable metrics API
* `enable_metrics_api`: Enable or disable the metrics API; valid values are `true` and `false`. Default: `true` (enabled)
* `metrics_format`: Format used for metric reports. At present, `prometheus` is the only supported value and the default.
This option is used in conjunction with the `enable_metrics_api` option above; see the sample fragment below.
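
A minimal, illustrative `config.properties` fragment combining these options (both values shown are the defaults, so the fragment is optional):

```properties
# Metrics API settings (values shown are the defaults)
enable_metrics_api=true
metrics_format=prometheus
```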

### Other properties

Most of the following properties are designed for performance tuning. Adjusting these numbers will impact scalability and throughput.
66 changes: 66 additions & 0 deletions docs/metrics_api.md
@@ -0,0 +1,66 @@
# Metrics API

The Metrics API listens on port 8082 and is only accessible from localhost by default. To change the default setting, see [TorchServe Configuration](configuration.md). The default metrics endpoint returns Prometheus-formatted metrics. You can query metrics with curl requests or point a [Prometheus server](#prometheus-server) at the endpoint and use [Grafana](#grafana) for dashboards.

By default, the Metrics API is enabled; it can be disabled by setting `enable_metrics_api=false` in TorchServe's config.properties file.
For details, refer to the [TorchServe configuration](configuration.md) docs.

```console
curl http://127.0.0.1:8082/metrics

# HELP ts_inference_latency_microseconds Cumulative inference duration in microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noopversioned",model_version="1.11",} 1990.348
ts_inference_latency_microseconds{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noop",model_version="default",} 2032.411
# HELP ts_inference_requests_total Total number of inference requests.
# TYPE ts_inference_requests_total counter
ts_inference_requests_total{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noopversioned",model_version="1.11",} 1.0
ts_inference_requests_total{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noop",model_version="default",} 1.0
# HELP ts_queue_latency_microseconds Cumulative queue duration in microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noopversioned",model_version="1.11",} 364.884
ts_queue_latency_microseconds{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noop",model_version="default",} 82.349
```

```console
curl "http://127.0.0.1:8082/metrics?name[]=ts_inference_latency_microseconds&name[]=ts_queue_latency_microseconds" --globoff

# HELP ts_inference_latency_microseconds Cumulative inference duration in microseconds
# TYPE ts_inference_latency_microseconds counter
ts_inference_latency_microseconds{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noopversioned",model_version="1.11",} 1990.348
ts_inference_latency_microseconds{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noop",model_version="default",} 2032.411
# HELP ts_queue_latency_microseconds Cumulative queue duration in microseconds
# TYPE ts_queue_latency_microseconds counter
ts_queue_latency_microseconds{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noopversioned",model_version="1.11",} 364.884
ts_queue_latency_microseconds{uuid="d5f84dfb-fae8-4f92-b217-2f385ca7470b",model_name="noop",model_version="default",} 82.349
```

#### Prometheus server

To view these metrics on a Prometheus server, download and install Prometheus using the instructions [here](https://prometheus.io/download/#prometheus). Then create a minimal `prometheus.yml` config file like the one below and run `./prometheus --config.file=prometheus.yml`.

```yaml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'torchserve'
    static_configs:
      - targets: ['localhost:8082'] # TorchServe metrics endpoint
```
Navigate to http://localhost:9090/ in a browser to execute queries and create graphs.
<img width="1231" alt="PrometheusServer" src="https://user-images.githubusercontent.com/880376/86984450-806fc680-c143-11ea-9ae2-f2ef42f24f4c.png">
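
Because `ts_inference_latency_microseconds` and `ts_inference_requests_total` are both counters with matching labels, an average-latency graph can be derived from a ratio of rates. For example, an illustrative query such as `rate(ts_inference_latency_microseconds[5m]) / rate(ts_inference_requests_total[5m])` (not part of the docs added in this commit) plots the mean inference latency per model over five-minute windows.
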
#### Grafana
Once the TorchServe and Prometheus servers are running, you can further [set up](https://prometheus.io/docs/visualization/grafana/) Grafana, point it at the Prometheus server, and navigate to http://localhost:3000/ to create dashboards and graphs.
On systemd-based systems, you can start Grafana with the command below:
`sudo systemctl daemon-reload && sudo systemctl enable grafana-server && sudo systemctl start grafana-server`

<img width="1220" alt="Screen Shot 2020-07-08 at 5 51 57 PM" src="https://user-images.githubusercontent.com/880376/86984550-c4fb6200-c143-11ea-9434-09d4d43dd6d4.png">
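
If you prefer to configure the data source through Grafana's provisioning mechanism rather than the UI, a minimal sketch of a datasource file (assuming Grafana's standard provisioning layout, e.g. `/etc/grafana/provisioning/datasources/prometheus.yaml`) could look like:

```yaml
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://localhost:9090
    isDefault: true
```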
1 change: 1 addition & 0 deletions docs/rest_api.md
@@ -7,6 +7,7 @@ When TorchServe starts, it starts two web services:

* [Inference API](inference_api.md)
* [Management API](management_api.md)
* [Metrics API](metrics_api.md)

By default, TorchServe listens on port 8080 for the Inference API and 8081 for the Management API.
Both APIs are accessible only from localhost by default. To enable access from a remote host, see [TorchServe Configuration](configuration.md).
5 changes: 3 additions & 2 deletions frontend/gradle.properties
@@ -1,9 +1,10 @@
org.gradle.daemon=true
org.gradle.jvmargs=-Xmx1024M
commons_cli_version=1.3.1
gson_version=2.8.5
prometheus_version=0.9.0
netty_version=4.1.50.Final
slf4j_api_version=1.7.25
slf4j_log4j12_version=1.7.25
gson_version=2.8.5
commons_cli_version=1.3.1
testng_version=7.1.0
torchserve_sdk_version=0.0.3
2 changes: 2 additions & 0 deletions frontend/server/build.gradle
@@ -1,5 +1,7 @@
dependencies {
implementation "io.netty:netty-all:${netty_version}"
implementation "io.prometheus:simpleclient:${prometheus_version}"
implementation "io.prometheus:simpleclient_servlet:${prometheus_version}"
implementation project(":modelarchive")
implementation "commons-cli:commons-cli:${commons_cli_version}"
implementation "org.pytorch:torchserve-plugins-sdk:${torchserve_sdk_version}"
19 changes: 16 additions & 3 deletions frontend/server/src/main/java/org/pytorch/serve/ModelServer.java
@@ -307,8 +307,9 @@ public List<ChannelFuture> start()

initModelStore();

Connector inferenceConnector = configManager.getListener(false);
Connector managementConnector = configManager.getListener(true);
Connector inferenceConnector = configManager.getListener(ConnectorType.INFERENCE_CONNECTOR);
Connector managementConnector =
configManager.getListener(ConnectorType.MANAGEMENT_CONNECTOR);

inferenceConnector.clean();
managementConnector.clean();
@@ -334,7 +335,19 @@ public List<ChannelFuture> start()
} else {
futures.add(
initializeServer(
inferenceConnector, serverGroup, workerGroup, ConnectorType.BOTH));
inferenceConnector, serverGroup, workerGroup, ConnectorType.ALL));
}

if (configManager.isMetricApiEnable()) {
EventLoopGroup metricsGroup = serverGroups.getMetricsGroup();
Connector metricsConnector = configManager.getListener(ConnectorType.METRICS_CONNECTOR);
metricsConnector.clean();
futures.add(
initializeServer(
metricsConnector,
serverGroup,
metricsGroup,
ConnectorType.METRICS_CONNECTOR));
}

SnapshotManager.getInstance().saveStartupSnapshot();
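
The hunk above replaces the boolean arguments to `configManager.getListener(...)` and the old `ConnectorType.BOTH` constant with named connector types, including the new `METRICS_CONNECTOR`. A plausible sketch of the updated enum, whose definition is not among the files shown here:

```java
package org.pytorch.serve.util;

// Assumed shape of ConnectorType after this change; the actual definition
// is not included in this excerpt.
public enum ConnectorType {
    ALL,                  // single listener serving inference, management (and metrics) together
    INFERENCE_CONNECTOR,  // Inference API listener
    MANAGEMENT_CONNECTOR, // Management API listener
    METRICS_CONNECTOR     // new Prometheus Metrics API listener
}
```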
@@ -12,6 +12,7 @@
import org.pytorch.serve.http.InferenceRequestHandler;
import org.pytorch.serve.http.InvalidRequestHandler;
import org.pytorch.serve.http.ManagementRequestHandler;
import org.pytorch.serve.http.PrometheusMetricsRequestHandler;
import org.pytorch.serve.servingsdk.impl.PluginsManager;
import org.pytorch.serve.util.ConfigManager;
import org.pytorch.serve.util.ConnectorType;
@@ -53,20 +54,26 @@ public void initChannel(Channel ch) {
pipeline.addLast("aggregator", new HttpObjectAggregator(maxRequestSize));

HttpRequestHandlerChain httpRequestHandlerChain = apiDescriptionRequestHandler;
if (ConnectorType.BOTH.equals(connectorType)
if (ConnectorType.ALL.equals(connectorType)
|| ConnectorType.INFERENCE_CONNECTOR.equals(connectorType)) {
httpRequestHandlerChain =
httpRequestHandlerChain.setNextHandler(
new InferenceRequestHandler(
PluginsManager.getInstance().getInferenceEndpoints()));
}
if (ConnectorType.BOTH.equals(connectorType)
if (ConnectorType.ALL.equals(connectorType)
|| ConnectorType.MANAGEMENT_CONNECTOR.equals(connectorType)) {
httpRequestHandlerChain =
httpRequestHandlerChain.setNextHandler(
new ManagementRequestHandler(
PluginsManager.getInstance().getManagementEndpoints()));
}
if (ConfigManager.getInstance().isMetricApiEnable()
&& ConnectorType.ALL.equals(connectorType)
|| ConnectorType.METRICS_CONNECTOR.equals(connectorType)) {
httpRequestHandlerChain =
httpRequestHandlerChain.setNextHandler(new PrometheusMetricsRequestHandler());
}
httpRequestHandlerChain.setNextHandler(invalidRequestHandler);
pipeline.addLast("handler", new HttpRequestHandler(apiDescriptionRequestHandler));
}
@@ -14,6 +14,7 @@
import org.pytorch.serve.archive.ModelException;
import org.pytorch.serve.archive.ModelNotFoundException;
import org.pytorch.serve.archive.ModelVersionNotFoundException;
import org.pytorch.serve.metrics.api.MetricAggregator;
import org.pytorch.serve.openapi.OpenApiUtils;
import org.pytorch.serve.servingsdk.ModelServerEndpoint;
import org.pytorch.serve.util.NettyUtils;
@@ -110,7 +111,6 @@ private void handlePredictions(
if (segments.length == 4) {
modelVersion = segments[3];
}

predict(ctx, req, null, segments[2], modelVersion);
}

@@ -177,6 +177,7 @@ private void predict(
return;
}

MetricAggregator.handleInferenceMetric(modelName, modelVersion);
Job job = new Job(ctx, modelName, modelVersion, WorkerCommands.PREDICT, input);
if (!ModelManager.getInstance().addJob(job)) {
String responseMessage =
@@ -0,0 +1,69 @@
package org.pytorch.serve.http;

import io.netty.buffer.ByteBuf;
import io.netty.buffer.ByteBufOutputStream;
import io.netty.buffer.Unpooled;
import io.netty.channel.ChannelHandlerContext;
import io.netty.handler.codec.http.DefaultFullHttpResponse;
import io.netty.handler.codec.http.FullHttpRequest;
import io.netty.handler.codec.http.FullHttpResponse;
import io.netty.handler.codec.http.HttpHeaderNames;
import io.netty.handler.codec.http.HttpResponseStatus;
import io.netty.handler.codec.http.HttpVersion;
import io.netty.handler.codec.http.QueryStringDecoder;
import io.prometheus.client.CollectorRegistry;
import io.prometheus.client.exporter.common.TextFormat;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import org.pytorch.serve.archive.ModelException;
import org.pytorch.serve.util.NettyUtils;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

public class PrometheusMetricsRequestHandler extends HttpRequestHandlerChain {

private static final Logger logger =
LoggerFactory.getLogger(PrometheusMetricsRequestHandler.class);

/** Creates a new {@code MetricsRequestHandler} instance. */
public PrometheusMetricsRequestHandler() {
// TODO: Add plugins manager support
}

@Override
protected void handleRequest(
ChannelHandlerContext ctx,
FullHttpRequest req,
QueryStringDecoder decoder,
String[] segments)
throws ModelException {
if (segments.length >= 2 && "metrics".equals(segments[1])) {
ByteBuf resBuf = Unpooled.directBuffer();
List<String> params =
decoder.parameters().getOrDefault("name[]", Collections.emptyList());
FullHttpResponse resp;
try (OutputStream outputStream = new ByteBufOutputStream(resBuf);
Writer writer = new OutputStreamWriter(outputStream)) {
TextFormat.write004(
writer,
CollectorRegistry.defaultRegistry.filteredMetricFamilySamples(
new HashSet<>(params)));
resp =
new DefaultFullHttpResponse(
HttpVersion.HTTP_1_1, HttpResponseStatus.OK, resBuf);
} catch (IOException e) {
logger.error("Exception encountered while reporting metrics");
throw new ModelException(e.getMessage(), e);
}
resp.headers().set(HttpHeaderNames.CONTENT_TYPE, TextFormat.CONTENT_TYPE_004);
NettyUtils.sendHttpResponse(ctx, resp, true);
} else {
chain.handleRequest(ctx, req, decoder, segments);
}
}
}
@@ -0,0 +1,28 @@
package org.pytorch.serve.metrics.api;

import org.pytorch.serve.metrics.format.prometheous.PrometheusMetricManager;
import org.pytorch.serve.util.ConfigManager;

public final class MetricAggregator {

private MetricAggregator() {}

public static void handleInferenceMetric(final String modelName, final String modelVersion) {
ConfigManager configMgr = ConfigManager.getInstance();
if (configMgr.isMetricApiEnable()
&& configMgr.getMetricsFormat().equals(ConfigManager.METRIC_FORMAT_PROMETHEUS)) {
PrometheusMetricManager.getInstance().incInferCount(modelName, modelVersion);
}
}

public static void handleInferenceMetric(
final String modelName, final String modelVersion, long timeInQueue, long inferTime) {
ConfigManager configMgr = ConfigManager.getInstance();
if (configMgr.isMetricApiEnable()
&& configMgr.getMetricsFormat().equals(ConfigManager.METRIC_FORMAT_PROMETHEUS)) {
PrometheusMetricManager metrics = PrometheusMetricManager.getInstance();
metrics.incInferLatency(inferTime, modelName, modelVersion);
metrics.incQueueLatency(timeInQueue, modelName, modelVersion);
}
}
}
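
The two-argument overload above is the one invoked from `InferenceRequestHandler.predict()` earlier in this diff; the worker-side call that reports queue and inference latency is not among the files shown. A hypothetical snippet exercising both overloads (model name, version, and timing values are illustrative only):

```java
// Count one inference request for model "noop", version "default".
MetricAggregator.handleInferenceMetric("noop", "default");

// Report illustrative queue and inference latencies for the same model.
long timeInQueue = 82L;   // hypothetical time spent queued
long inferTime = 2032L;   // hypothetical time spent in inference
MetricAggregator.handleInferenceMetric("noop", "default", timeInQueue, inferTime);
```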