Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add grpc metrics #414

Merged
merged 1 commit into from
Jun 30, 2023
Merged

Add grpc metrics #414

merged 1 commit into from
Jun 30, 2023

Conversation

antgubarev
Copy link

@antgubarev antgubarev commented Apr 28, 2023

fix the issue #648

Copy link
Collaborator

@srenatus srenatus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your contribution.

The code changes looks good to me -- but adding just any sort of test would be good.

internal/internal.go Outdated Show resolved Hide resolved
internal/internal.go Outdated Show resolved Hide resolved
@antgubarev
Copy link
Author

antgubarev commented Apr 28, 2023

Thanks for your contribution.

The code changes looks good to me -- but adding just any sort of test would be good.

@srenatus Done

}
if output.Status.Code != int32(code.Code_OK) {
t.Fatal("Expected request to be allowed but got:", output)
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if the test scaffolding is capable enough -- but could we check that the desired prometheus metric is actually exported, and has a non-zero value after this incoming request?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antgubarev
Copy link
Author

antgubarev commented May 2, 2023

@srenatus @ashutosh-narkar Could you please review PR and my last fixes

Copy link
Member

@ashutosh-narkar ashutosh-narkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for working on this @antgubarev. Few comments inline. Also these metrics should be part of the status update and decision log. It would be great if we have a test to verify that.

GRPCMaxRecvMsgSize int `json:"grpc-max-recv-msg-size"`
GRPCMaxSendMsgSize int `json:"grpc-max-send-msg-size"`
SkipRequestBodyParse bool `json:"skip-request-body-parse"`
EnablePrometheusMetrics bool `json:"enable-prometheus-metrics"`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Can we call this something like EnablePerformanceMetrics

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure. Done

@@ -237,6 +240,15 @@ func (p *envoyExtAuthzGrpcServer) listen() {
addr = "grpc://" + addr
}

if p.cfg.EnablePrometheusMetrics {
summaryAuthzDuration := prometheus.NewSummary(prometheus.SummaryOpts{
Name: "grpc_authz_request_duration_seconds",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you please help me understand the reason to use NewSummary v/s something like SummaryVec or HistogramVec? The latter would be useful if we want aggregates afaik. Similar to this.

Nit: Can we change grpc_authz_request_duration_seconds to grpc_request_duration_seconds

Copy link
Author

@antgubarev antgubarev May 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have only one value. Vec is for multiple values. It's a single reason. I will change to SummaryVec if you want.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have multiple instances of the plugin and we need to aggregate the results, will we be able to do this?

Also is there any impact from the change in default behavior of Summary as indicated in this comment? It would be helpful if we have more tests in any case.

// A Summary captures individual observations from an event or sample stream and
// summarizes them in a manner similar to traditional summary statistics: 1. sum
// of observations, 2. observation count, 3. rank estimations.
//
// A typical use-case is the observation of request latencies. By default, a
// Summary provides the median, the 90th and the 99th percentile of the latency
// as rank estimations. However, the default behavior will change in the
// upcoming v1.0.0 of the library. There will be no rank estimations at all by
// default. For a sane transition, it is recommended to set the desired rank
// estimations explicitly.
//
// Note that the rank estimations cannot be aggregated in a meaningful way with
// the Prometheus query language (i.e. you cannot average or add them). If you
// need aggregatable quantiles (e.g. you want the 99th percentile latency of all
// queries served across all instances of a service), consider the Histogram
// metric type. See the Prometheus documentation for more details.
//
// To create Summary instances, use NewSummary.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, you're right. I fixed it. Thanks!

@ashutosh-narkar
Copy link
Member

@antgubarev please let us know if you need any help with this. This would be a good one to get in.

@antgubarev
Copy link
Author

@antgubarev please let us know if you need any help with this. This would be a good one to get in.

Thanks! I don't need any help but I need free time ) I'm going to finish it in the next few days.

@antgubarev
Copy link
Author

antgubarev commented May 25, 2023

@antgubarev please let us know if you need any help with this. This would be a good one to get in.

@ashutosh-narkar I can't find a way to test status plugin :( . Could you help me?

@antgubarev
Copy link
Author

antgubarev commented May 31, 2023

@ashutosh-narkar @srenatus Could you give your opinion about this PR

@ashutosh-narkar
Copy link
Member

Thanks for working on this @antgubarev. I'll take a look next week.

grpc-max-recv-msg-size: 40194304 # default: 1024 * 1024 * 4
grpc-max-send-msg-size: 2147483647 # default: max Int
skip-request-body-parse: false # default: false
enable-performance-metrics: false # default: false. Adds `grpc_request_duration_seconds` prometheus histogram metric
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@antgubarev here we are stating that by enabling this we'll get a metric called grpc_request_duration_seconds which is a histogram metric. This looks good. But below we've defined grpc_request_duration_seconds as a prometheus.Summary which does not seem right. I would have imagined something like a prometheus.HistogramVec similar to what OPA does. This would be a more useful metric than a summary as users typically want to understand the latency distribution of their requests. WDYT?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ashutosh-narkar We've already used prometheus.HistogramVec not Summary :)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@ashutosh-narkar ashutosh-narkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for making the changes @antgubarev. Can you please look into the test failure. Also these metrics should be part of the status update and decision log. It would be great if we have a test to verify that.

@antgubarev
Copy link
Author

antgubarev commented Jun 17, 2023

Thanks for making the changes @antgubarev. Can you please look into the test failure. Also these metrics should be part of the status update and decision log. It would be great if we have a test to verify that.

@ashutosh-narkar I'm sorry it's my fault. I accidentally reverted more commits than I needed. I fixed it!

@ashutosh-narkar
Copy link
Member

@antgubarev thanks so much for working on this. The changes look good. Can you please squash your commits and rebase? Thanks.

Signed-off-by: Anton Gubarev <[email protected]>
@antgubarev
Copy link
Author

@ashutosh-narkar Done. Thanks for your review.

Copy link
Member

@ashutosh-narkar ashutosh-narkar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the contribution @antgubarev!

@ashutosh-narkar ashutosh-narkar merged commit 761f94d into open-policy-agent:main Jun 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants