[Prometheus] Add unit tests for parsing metric families #36669

Merged
merged 10 commits into from
Oct 3, 2023

Conversation

@constanca-m constanca-m commented Sep 25, 2023

What does this PR do?

The naming we previously used for metrics, especially for OpenMetrics, had a few problems. Because no unit tests checked how we parsed the metrics, we did not notice them before. This PR therefore does two things:

  • It solidifies the way we name our metric families (see next section)
  • It adds unit tests to make sure everything works as expected

The specification for OpenMetrics can be found here: https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md

Naming: metric family name and metrics names

For the OpenMetrics parser (see suffixes here):

  • counter:
    • If the name of the metric sample does not end in _total or _created, the sample is ignored. These are the only valid suffixes.
    • If the name of the metric sample has one of these suffixes, we do one of two things:
      • If the name without the suffix already exists in the families dictionary (which tells us we received metadata for it), the metric family name is the name without the suffix.
      • Otherwise, the metric family name is the full metric name, suffix included.
  • gauge: there are no rules here, no suffixes are necessary
  • histogram: we check for the acceptable suffixes
  • gaugehistogram: same as histogram
  • summary: same logic as for histogram
  • info: we apply the same reasoning as we did for counter
  • stateset: no rules for the suffixes
  • unknown: no rules for the suffixes
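
As a rough illustration of the counter rule above, here is a minimal sketch. It is not the code in this PR: the function counterFamilyName, the constant suffixCreated, and the knownFamilies map (standing in for the families dictionary) are names assumed for the example.

```go
package main

import (
	"fmt"
	"strings"
)

const (
	suffixTotal   = "_total"
	suffixCreated = "_created" // assumed name, chosen to mirror suffixTotal
)

// counterFamilyName returns the family name for an OpenMetrics counter sample,
// and false when the sample should be ignored. knownFamilies is a hypothetical
// stand-in for the families dictionary filled from metadata lines.
func counterFamilyName(sample string, knownFamilies map[string]bool) (string, bool) {
	var base string
	switch {
	case strings.HasSuffix(sample, suffixTotal):
		base = strings.TrimSuffix(sample, suffixTotal)
	case strings.HasSuffix(sample, suffixCreated):
		base = strings.TrimSuffix(sample, suffixCreated)
	default:
		// Neither suffix is present: the sample is ignored.
		return "", false
	}
	if knownFamilies[base] {
		// Metadata was seen for the base name, so the family drops the suffix.
		return base, true
	}
	// No metadata for the base name: the family keeps the full sample name.
	return sample, true
}

func main() {
	known := map[string]bool{"cc_seconds": true}
	for _, s := range []string{"cc_seconds_total", "cc_seconds_created", "orphan_total", "no_suffix"} {
		name, ok := counterFamilyName(s, known)
		fmt.Printf("%-20s family=%q kept=%v\n", s, name, ok)
	}
}
```

In this sketch, cc_seconds_total and cc_seconds_created both resolve to the family cc_seconds, orphan_total keeps its full name because no metadata was seen for orphan, and no_suffix is dropped.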

For Prometheus Parser:

  • counter: we do not remove the suffix, unlike with OpenMetrics
  • gauge
  • histogram: same logic as the above
  • summary: same logic as the above

Please check the textparse_test file to see clear examples of this.

Related issues

Relates to #36537.

@constanca-m constanca-m requested a review from a team September 25, 2023 11:06
@constanca-m constanca-m requested a review from a team as a code owner September 25, 2023 11:06
@constanca-m constanca-m self-assigned this Sep 25, 2023
@botelastic botelastic bot added the needs_team Indicates that the issue/PR needs a Team:* label label Sep 25, 2023

mergify bot commented Sep 25, 2023

This pull request does not have a backport label.
If this is a bug or security fix, could you label this PR @constanca-m? 🙏.
For such, you'll need to label your PR with:

  • The upcoming major version of the Elastic Stack
  • The upcoming minor version of the Elastic Stack (if you're not pushing a breaking change)

To fixup this pull request, you need to add the backport labels for the needed
branches, such as:

  • backport-v8./d.0 is the label to automatically backport to the 8./d branch. /d is the digit

@@ -351,7 +352,7 @@ func isBucket(name string) bool {
return strings.HasSuffix(name, suffixBucket)
}

-func summaryMetricName(name string, s float64, qv string, lbls string, t *int64, summariesByName map[string]map[string]*OpenMetric) (string, *OpenMetric) {
+func summaryMetricName(name string, s float64, qv string, lbls string, summariesByName map[string]map[string]*OpenMetric) (string, *OpenMetric) {
Contributor Author

The parameter t was not being used, which is why it was removed.

@@ -653,8 +658,6 @@ func ParseMetricFamilies(b []byte, contentType string, ts time.Time) ([]*MetricF
metricFamiliesByName[lookupMetricName] = fam
}

-fam.Name = &metricName
Contributor Author

This line needed to be removed, because it was causing errors.

Example: if we had this metric:

# HELP cc_seconds A counter
# TYPE cc_seconds counter
# UNIT cc_seconds seconds
cc_seconds_total 1.0
cc_seconds_created 123.456

The metric family would first be renamed to cc_seconds_total and then to cc_seconds_created, when in reality its name is cc_seconds (without the suffix). The metric.Metric field is then a list of metrics in which both samples are included.
This test case is in the test file.
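
To make the renaming problem concrete, here is a small sketch. The family struct and the collect function are hypothetical stand-ins for illustration only, not the types used in this PR; the renamePerSample flag reproduces the effect of the removed line.

```go
package main

import "fmt"

// family is a hypothetical, stripped-down stand-in for a parsed metric family.
type family struct {
	Name    string
	Samples []string
}

// collect attaches every sample to the family declared by the metadata lines.
// With renamePerSample set, it overwrites the family name with each sample
// name in turn, which is the effect the removed line had.
func collect(declared string, samples []string, renamePerSample bool) family {
	fam := family{Name: declared}
	for _, s := range samples {
		if renamePerSample {
			fam.Name = s
		}
		fam.Samples = append(fam.Samples, s)
	}
	return fam
}

func main() {
	samples := []string{"cc_seconds_total", "cc_seconds_created"}
	fmt.Println(collect("cc_seconds", samples, true).Name)  // last sample name wins
	fmt.Println(collect("cc_seconds", samples, false).Name) // declared name is kept
}
```

With the flag set, the family ends up named after whichever sample was seen last; without it, the family keeps the name from the metadata and still holds both samples.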

@constanca-m constanca-m added the Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team label Sep 25, 2023
@botelastic botelastic bot removed the needs_team Indicates that the issue/PR needs a Team:* label label Sep 25, 2023

elasticmachine commented Sep 25, 2023

💚 Build Succeeded

Build stats

  • Start Time: 2023-10-03T08:14:51.488+0000

  • Duration: 51 min 35 sec

Test stats 🧪

Test Results
Failed 0
Passed 4411
Skipped 902
Total 5313

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages and run the E2E tests.

  • /beats-tester : Run the installation tests with beats-tester.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

Contributor

@belimawr belimawr left a comment

I'm far from being an expert in this bit of the codebase, but LGTM

@constanca-m
Contributor Author

I will close this PR for now while we resolve the naming conflicts.

@constanca-m constanca-m reopened this Sep 29, 2023

// Remove the two possible suffixes, _created and _total
if isTotal(metricName) {
lookupMetricName = strings.TrimSuffix(metricName, suffixTotal)
} else {
Contributor

Suggested change
-} else {
+} else if isCreated(metricName) {

Contributor Author

Because of the condition above (if contentType == OpenMetricsType && !isTotal(metricName) && !isCreated(metricName)), we already know that when we reach the else branch, the suffix is _created. @gizas
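
The argument can be checked with a short sketch. It assumes that isTotal and isCreated are plain suffix checks, like isBucket, and lookupName is a hypothetical function that only mirrors the shape of the code under review.

```go
package main

import (
	"fmt"
	"strings"
)

// Assumed implementations: simple suffix checks.
func isTotal(name string) bool   { return strings.HasSuffix(name, "_total") }
func isCreated(name string) bool { return strings.HasSuffix(name, "_created") }

// lookupName is a hypothetical helper: the guard runs first, then the suffix
// is trimmed. It returns false when the guard rejects the sample.
func lookupName(name string) (string, bool) {
	if !isTotal(name) && !isCreated(name) {
		return "", false // rejected before the trimming branch is reached
	}
	if isTotal(name) {
		return strings.TrimSuffix(name, "_total"), true
	}
	// Only names ending in _created can get here, so a plain else is enough.
	return strings.TrimSuffix(name, "_created"), true
}

func main() {
	for _, n := range []string{"cc_seconds_total", "cc_seconds_created", "cc_seconds"} {
		base, ok := lookupName(n)
		fmt.Printf("%s -> %q (kept=%v)\n", n, base, ok)
	}
}
```

An explicit else if isCreated(metricName) would behave the same way here; it only restates what the guard already guarantees.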

@constanca-m constanca-m requested a review from a team as a code owner September 29, 2023 13:05
@@ -97,8 +97,8 @@ metrics_one_count_total{name="john",surname="williams"} 2
metrics_one_count_total{name="jahn",surname="baldwin",age="30"} 3
`

-openMetricsCounterKeyLabelWithNaNInf = `# TYPE metrics_one_count_errors counter
-metrics_one_count_errors{name="jane",surname="foster"} 1
+openMetricsCounterKeyLabelWithNaNInf = `# TYPE metrics_one_count_errors_total counter
Contributor Author

According to this, counter types MUST have the suffix _total. I had to fix this to pass the test.

@constanca-m
Contributor Author

The changes to the OpenMetrics sample files are the result of running go test -data. This was causing the tests to fail; I don't know why it was only picked up in this PR.

# TYPE metric_info info
metric_info 2
# TYPE metric_without_suffix info
metric_without_suffix 3
Contributor

This is dropped right?

Contributor Author

Yes

require.ElementsMatch(t, expected, result)
}

func TestHistogramPrometheus(t *testing.T) {
Contributor

Dummy question: Why do we need both this test and TestHistogramOpenMetrics?
When will this be called?

Contributor Author

For OpenMetrics, we use a different parser than the one we use for Prometheus metrics. Which parser is used is decided by the content type we pass to ParseMetricFamilies. So we are just checking the expected results for the two possible parsers: the PromParser and the OpenMetricsParser.
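
A minimal sketch of that selection, under stated assumptions: the media type string assigned to openMetricsType and the function parserFor are assumptions made for this example, and the real ParseMetricFamilies may make the decision differently.

```go
package main

import (
	"fmt"
	"mime"
)

// Assumed value for the OpenMetrics media type; the constant in the codebase
// may be defined differently.
const openMetricsType = "application/openmetrics-text"

// parserFor is a hypothetical helper that reports which parser a given
// Content-Type header would select. Anything that is not OpenMetrics,
// including an unparsable header, falls back to the Prometheus parser.
func parserFor(contentType string) string {
	mediaType, _, err := mime.ParseMediaType(contentType)
	if err == nil && mediaType == openMetricsType {
		return "OpenMetricsParser"
	}
	return "PromParser"
}

func main() {
	for _, ct := range []string{
		"application/openmetrics-text; version=1.0.0; charset=utf-8",
		"text/plain; version=0.0.4",
		"",
	} {
		fmt.Printf("%q -> %s\n", ct, parserFor(ct))
	}
}
```

Because the same payload can take either path depending on the header, each metric type gets one test per parser.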

Contributor

@gizas gizas left a comment

Constanca, make sure that the link https://github.com/OpenObservability/OpenMetrics/blob/main/specification/OpenMetrics.md is added to the description of this PR, so that it is handy for future reference.

Member

@ChrsMark ChrsMark left a comment

Changes lgtm.

However, since this significantly changes the core functionality, I would feel more confident if we had a manual testing phase in place as a double check.

@constanca-m constanca-m merged commit 25624ab into elastic:main Oct 3, 2023
26 checks passed
@constanca-m constanca-m deleted the add-prometheus-tests branch October 3, 2023 09:30
Scholar-Li pushed a commit to Scholar-Li/beats that referenced this pull request Feb 5, 2024
* Add tests for parsing metrics.

* Remove _info suffix.

* Add unit tests

* Add unit tests

* Add unit tests

* Add license header

* Fix goimports.

* Fix openmetrics tests.
Labels
Team:Cloudnative-Monitoring Label for the Cloud Native Monitoring team :Testing
5 participants