Prometheus metrics about amount of security issues #425

wuestkamp · 2021-03-11T18:20:41Z

Awesome project!
Is there a way to get the summaries from the CRDs, like this one:

  Summary:
    Critical Count:  3
    High Count:      7
    Low Count:       2
    Medium Count:    14
    None Count:      0
    Unknown Count:   0

into Prometheus? I guess I could write a custom app that reads the CRD reports and converts them into Prometheus metrics. Or is there maybe already a general project like that?
I ask because the operator metrics on 8080/metrics don't include info like that.
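
For illustration, a minimal sketch of what such a custom app would do with one summary, assuming the counts have already been read from the report. The metric name `starboard_vulnerabilities` and the label names are hypothetical, not something Starboard exposes:

```python
# Hypothetical sketch: render one report summary in the Prometheus text format.
# The metric name and label names below are illustrative assumptions.

def escape(value):
    """Escape a label value as the Prometheus text format requires."""
    return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_summary(namespace, name, summary, metric="starboard_vulnerabilities"):
    """Return one gauge sample per severity found in `summary`."""
    lines = [f"# TYPE {metric} gauge"]
    for severity in sorted(summary):
        labels = (
            f'namespace="{escape(namespace)}",'
            f'name="{escape(name)}",'
            f'severity="{escape(severity)}"'
        )
        lines.append(f"{metric}{{{labels}}} {summary[severity]}")
    return "\n".join(lines) + "\n"


# Counts taken from the summary shown above; namespace and name are placeholders.
summary = {"critical": 3, "high": 7, "low": 2, "medium": 14, "none": 0, "unknown": 0}
print(render_summary("default", "my-report", summary), end="")
```

Serving that text over HTTP on a path Prometheus scrapes would be the remaining piece.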

danielpacak · 2021-03-11T20:23:56Z

Great question, @wuestkamp! We do not provide such metrics yet, but we'd like to do so in Starboard. As you realized, Starboard already exposes the /metrics endpoint, but those metrics relate to the Go runtime and work queue (shared informer) processing.

Code has to be written to expose the scan results that we keep in CRDs and/or shared informer caches in Prometheus format.

There are tools that can expose any CRD as Prometheus metrics on a regular schedule, which is another option to consider. We also saw a project that runs just Trivy (unlike Starboard, which supports other types of scanners) and exposes vulnerability summaries as Prometheus metrics.

However, with Starboard we want to take a holistic approach to K8s security, and I believe that exposing Prometheus metrics is great functionality to build in.

2 replies

skuethe Apr 28, 2022

Are there any new thoughts on implementing this? I do know of the giantswarm/starboard-exporter solution, but having this functionality built in would definitely have its benefits.

NissesSenap May 3, 2022

@skuethe this was mentioned under the trivy-operator discussion #1173.

NissesSenap · 2021-09-28T16:23:41Z

First of all, I think Starboard is a great project, but for me and my company this is one of the key features that is missing. Today we manage about 20 k8s clusters, and the number will grow. We use metrics to visualize any issues that we might have in our clusters, and I want that to include CVEs.

I want to be able to query Prometheus for a CVE and see which pod has the issue.
I want to be able to query Prometheus with severity="CRITICAL" for a namespace and easily give that report to my developers.

Taking inspiration from https://github.com/kaidotdev/kube-trivy-exporter, the project provides:

trivy_vulnerabilities{image="gcr.io/spinnaker-marketplace/echo:2.5.1-20190612034009",installedVersion="0.168-1",pkgName="libelf1",severity="MEDIUM",vulnerabilityId="CVE-2018-16403"} 1

Since we already have the ReplicaSet information, I think we should also add it to the metric to easily pinpoint where we have the issue. I guess we can use the job label for that.

trivy_vulnerabilities{image="gcr.io/spinnaker-marketplace/echo:2.5.1-20190612034009",installedVersion="0.168-1",pkgName="libelf1",severity="MEDIUM",vulnerabilityId="CVE-2018-16403",job="replica-spinnarker1"} 1

I have done a test implementation of this that emits metrics when the job creates a new report. That works fine, but it only covers new reports, and all the metrics would be gone if the operator restarts.
To me, the best way forward seems to be to create a new controller that actually watches the VulnerabilityReport CRs. Then we should be able to update the metrics on creation or deletion of a report without too much trouble.
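
A rough sketch of the bookkeeping such a controller would need, written as plain Python rather than an actual Kubernetes controller. The class and method names are hypothetical. The point is that the series are derived from the reports themselves, so they can be rebuilt from a full listing after a restart:

```python
# Hypothetical sketch: keep metric series in sync with report create/delete events.
# This is only the in-memory bookkeeping, not a real controller or Starboard code.

class ReportMetrics:
    """Series keyed by (namespace, report name, severity)."""

    def __init__(self):
        self.series = {}

    def on_upsert(self, namespace, name, summary):
        """Handle a created or updated report."""
        # Drop old series first so severities missing from the new summary vanish.
        self.on_delete(namespace, name)
        for severity, count in summary.items():
            self.series[(namespace, name, severity)] = count

    def on_delete(self, namespace, name):
        """Handle a deleted report by removing all of its series."""
        for key in [k for k in self.series if k[:2] == (namespace, name)]:
            del self.series[key]

    def resync(self, reports):
        """Rebuild everything from a full listing, e.g. after an operator restart."""
        self.series.clear()
        for namespace, name, summary in reports:
            self.on_upsert(namespace, name, summary)


metrics = ReportMetrics()
metrics.on_upsert("default", "replicaset-echo", {"CRITICAL": 3, "HIGH": 7})
metrics.on_delete("default", "replicaset-echo")
print(metrics.series)  # {}
```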

This will also help us when solving #537. Instead of creating a cronjob to delete the existing VulnerabilityReport CRs once a night, we could add a TTL config to Starboard and use the same controller created for metrics to delete existing reports, thus triggering new ones.
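
The TTL check itself could be as small as the sketch below. It assumes the controller compares the report's metadata.creationTimestamp (an RFC 3339 UTC string in Kubernetes) with a configured duration. The function names and the 24-hour value are only examples:

```python
# Hypothetical sketch of a TTL check for reports; names and values are examples.
from datetime import datetime, timedelta, timezone


def parse_timestamp(value):
    """Parse a Kubernetes-style UTC timestamp such as 2021-09-28T16:23:41Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def is_expired(created_at, ttl, now=None):
    """Return True when a report created at `created_at` is at least `ttl` old."""
    now = now or datetime.now(timezone.utc)
    return now - created_at >= ttl


created = parse_timestamp("2021-09-28T16:23:41Z")
print(is_expired(created, timedelta(hours=24), now=created + timedelta(hours=25)))  # True
```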

What do you think, @danielpacak @wuestkamp?

0 replies

NissesSenap · 2021-09-28T16:36:34Z

This is partly related to #563, but covers metrics for all reports that get generated. Personally, I feel the need mostly for vulnerabilities, and I think it should be possible, and probably even preferable, to implement this in separate PRs. Still, it's a good thing to keep in mind when deciding on the naming convention for the metrics that Starboard would expose.

0 replies

danielpacak · 2021-10-06T19:43:00Z

I really like the idea of having a separate controller to discover VulnerabilityReports and manage Prometheus metrics. For the schema and PromQL we can get started with the proposal based on your experience managing clusters in production. Just bear in mind that Trivy is a plugin and you could have other scanners so we should use more generic names for exported metrics / labels.

This new controller should be disabled by default, with an option to turn it on, similar to the OPERATOR_CIS_KUBERNETES_BENCHMARK_ENABLED env variable used to enable or disable infrastructure scanning.

0 replies

fredgate · 2021-10-19T16:22:49Z

Starboard is awesome, but if each analysis stays in its report object in the cluster, it is not very practical.
A controller that aggregates data for each report kind and exposes it as metrics seems like a very good idea. Here are my two cents to move the discussion forward.
For example, for the vulnerabilityreports, we could have these metrics:

image_vulnerabilities_count { repository="myimage", tag="1.2.5", severity="critical" } 2
image_vulnerabilities_count { repository="myimage", tag="1.2.5", severity="high" } 1
image_vulnerabilities_count { repository="myimage", tag="1.2.5", severity="medium" } 4
image_vulnerabilities_count { repository="myimage", tag="1.2.5", severity="low" } 23
image_vulnerabilities_count { repository="myimage", tag="1.2.5", severity="unknown" } 2

I am not sure that metrics should expose information about each detected vulnerability, like this:

image_vulnerabilities { repository="myimage", tag="1.2.5", vulnerability="CVE-2021-37750" } 1

Not only is it unclear what we should set as the value of the metric, but this could also result in a large number of values for the vulnerability label. And for good performance with Prometheus, keeping cardinality low is key.
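
To make the cardinality difference concrete, here is a small sketch that counts distinct label sets for both layouts. The findings are placeholders and the CVE identifiers are made up:

```python
# Hypothetical sketch: compare series counts for summary vs. per-CVE metrics.

def count_series(findings):
    """Return (summary_series, per_cve_series) for (repo, tag, cve, severity) tuples."""
    summary = {(repo, tag, severity) for repo, tag, _cve, severity in findings}
    per_cve = {(repo, tag, cve) for repo, tag, cve, _severity in findings}
    return len(summary), len(per_cve)


# Placeholder findings; the CVE identifiers are invented for illustration.
findings = [
    ("myimage", "1.2.5", "CVE-A", "high"),
    ("myimage", "1.2.5", "CVE-B", "high"),
    ("myimage", "1.2.5", "CVE-C", "low"),
    ("myimage", "1.2.6", "CVE-A", "high"),
]
print(count_series(findings))  # (3, 4)
```

The summary layout is bounded by the number of images times the number of severity levels, while the per-CVE layout grows with every new vulnerability that is found.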

0 replies

mycodeself · 2021-11-12T17:13:45Z

Hi guys, I've been testing Starboard on our clusters for a while now and I really think you are doing a great job!

I'm really interested in this feature and in contributing to the project, so I have made a small POC with a controller that exposes the summary of the vulnerabilities found as Prometheus metrics.

POC code: https://github.com/mycodeself/starboard-exporter/blob/main/controllers/vulnerabilityreport_controller.go

I've tested it on our cluster with around 1500 vulnerabilityreports, and the exposed metrics are as follows:

vulnerabilityreport_vulnerabilities_count{name="daemonset-falco-exporter-falco-exporter",namespace="falco",repository="falcosecurity/falco-exporter",severity="CRITICAL",tag="0.5.0"} 3
vulnerabilityreport_vulnerabilities_count{name="daemonset-falco-exporter-falco-exporter",namespace="falco",repository="falcosecurity/falco-exporter",severity="HIGH",tag="0.5.0"} 9
vulnerabilityreport_vulnerabilities_count{name="daemonset-falco-exporter-falco-exporter",namespace="falco",repository="falcosecurity/falco-exporter",severity="LOW",tag="0.5.0"} 2
vulnerabilityreport_vulnerabilities_count{name="daemonset-falco-exporter-falco-exporter",namespace="falco",repository="falcosecurity/falco-exporter",severity="MEDIUM",tag="0.5.0"} 4
vulnerabilityreport_vulnerabilities_count{name="daemonset-falco-exporter-falco-exporter",namespace="falco",repository="falcosecurity/falco-exporter",severity="UNKNOWN",tag="0.5.0"} 0

As @fredgate says, I think that exposing metrics with the details of each vulnerability may cause problems due to high cardinality.

I would like you to take a look at the code and tell me if it is more or less the idea you had in mind. I'm open to any feedback and to making the necessary changes. Would you accept a PR with a similar implementation?

1 reply

stone-z Nov 17, 2021

Looks like we had the same thought @mycodeself 😄

We would also like to get Prometheus metrics from Starboard, so we have put together an exporter for that purpose.

Ours currently exposes both the summary and individual CVE metrics (the high cardinality option) because we will likely need to expose that information, but we're open to making that configurable if others would like to use it differently. If the cardinality ends up being a problem we will revisit.

Edit: Exported labels (and thus cardinality) are now configurable ✅
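
One way to picture configurable labels is the sketch below: samples are summed after dropping every label that is not in the configured list. This is only an illustration with hypothetical names, not the implementation of the exporter mentioned above:

```python
# Hypothetical sketch: reduce cardinality by keeping only a chosen set of labels.
from collections import Counter


def aggregate(samples, keep):
    """Sum sample values after dropping every label not listed in `keep`."""
    totals = Counter()
    for labels, value in samples:
        key = tuple((name, labels[name]) for name in keep if name in labels)
        totals[key] += value
    return dict(totals)


# Placeholder samples; the vulnerability identifiers are invented.
samples = [
    ({"namespace": "falco", "severity": "HIGH", "vulnerability": "CVE-A"}, 1),
    ({"namespace": "falco", "severity": "HIGH", "vulnerability": "CVE-B"}, 1),
    ({"namespace": "falco", "severity": "LOW", "vulnerability": "CVE-C"}, 1),
]
for key, value in aggregate(samples, ["namespace", "severity"]).items():
    print(dict(key), value)
```

Keeping only namespace and severity collapses the three per-CVE samples into two series, while adding vulnerability back to the list restores the detailed view.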

fredgate · 2021-11-25T14:01:46Z

Very interesting projects 👍
@mycodeself, could you provide a Docker image and a Helm chart?

0 replies

Madhuri2801 · 2022-08-01T08:18:55Z

Hi all, I need to export Trivy scan reports from a Kubernetes cluster to Grafana and visualize them using Prometheus as a data source. I have installed kube-trivy-exporter, but it is not picking up the cluster IP/ports after applying the manifests.

Can anyone recommend other options for my requirement?

0 replies
