Aliases, and how they are supposed to be used #888

nscuro · 2022-12-05T11:15:17Z

Hey OSV team, thanks for your great work!

We're currently looking at how we can correlate vulnerabilities that describe the same thing.

As per specification, OSV has the aliases field for this:

> The aliases field gives a list of IDs of the same vulnerability in other databases, in the form of the id field.

At least in my interpretation, aliasing is a bidirectional relationship that also applies transitively.
If X aliases Y and Z, Y should also alias X, and Y should also alias Z. If they all describe the same thing, that should be a valid assumption.

However, in reality, we see that many vulnerability databases (ab-)use the OSV schema to publish advisories. In my understanding, a vulnerability describes one defect, and that one defect only, whereas an advisory can refer to multiple vulnerabilities (as in "we patched all these vulnerabilities in version 1.2.3 of our package"). This appears to be common for at least the Go, Rust, and (especially) Debian ecosystems in the OSV database. There are most likely more, but these have been the most obvious candidates to us.

For example, GO-2022-0586 presumably aliases four CVEs and four GHSAs:

These are four different vulnerabilities, with different CWEs, descriptions and severities. CVEs and GHSAs actually alias each other in pairs of two (GHSA-28r2-q6m8-9hpx aliases CVE-2022-30323, but not CVE-2022-26945 etc.):

In cases of advisories like this, the "aliases" are neither bidirectional (GHSA-28r2-q6m8-9hpx isn't really the same as GO-2022-0586), nor are they fully transitive (CVE-2022-26945 is not the same as CVE-2022-30323). If one were to attempt to find all aliases for GHSA-28r2-q6m8-9hpx here, traversing this graph would yield wrong results.
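To make the problem concrete, here is a minimal sketch of such a traversal. It assumes aliases are read as undirected, transitive edges, and it uses only the IDs named in this issue, so the alias lists are abbreviated rather than the full records. The helper names `build_graph` and `transitive_aliases` are hypothetical.

```python
from collections import deque

# Abbreviated alias lists: only the IDs named in this issue, not the full records.
ALIASES = {
    "GO-2022-0586": ["CVE-2022-26945", "CVE-2022-30323", "GHSA-28r2-q6m8-9hpx"],
    "GHSA-28r2-q6m8-9hpx": ["CVE-2022-30323"],
}


def build_graph(aliases):
    """Treat every alias as an undirected edge (the bidirectional reading)."""
    graph = {}
    for vuln_id, others in aliases.items():
        for other in others:
            graph.setdefault(vuln_id, set()).add(other)
            graph.setdefault(other, set()).add(vuln_id)
    return graph


def transitive_aliases(graph, start):
    """Collect every ID reachable from `start` (the transitive reading)."""
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbour in graph.get(queue.popleft(), ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return seen - {start}


graph = build_graph(ALIASES)
reachable = transitive_aliases(graph, "GHSA-28r2-q6m8-9hpx")
print(sorted(reachable))
# ['CVE-2022-26945', 'CVE-2022-30323', 'GO-2022-0586']
```

CVE-2022-26945 shows up as an "alias" of GHSA-28r2-q6m8-9hpx only because both hang off the GO-2022-0586 entry, which is exactly the wrong result described above.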

The Debian ecosystem especially has many of these scenarios, where one DLA can refer to loads of CVEs:

I have the feeling that OSV entries of type "advisory" (maybe such a distinction would be good to have?) should instead use the related field. Although I imagine this will be hard to enforce, and even harder to apply in an automated fashion.

Am I understanding aliasing in OSV correctly? Is this a data quality issue with the databases that use the OSV schema? Is there anything we can do about it?


oliverchang · 2022-12-07T05:28:40Z

Hey @nscuro !

Thanks for the very detailed issue!

Can you explain a bit more about the exact use case you're trying to achieve here with the OSV data? Are you trying to build your own graph representation?

For the OSV schema, we actively avoided trying to make an explicit distinction between a group of vulnerabilities (advisory) vs a single vulnerability to keep things simple. In terms of the end result we want to enable, it's the same -- the ability to identify which package versions are affected and which versions to update to.

How we envision a vulnerability scanner working with our data would be this:

1. Extract a list of packages and versions to query. Say this is just Package "Foo" at Version "1.0.0".
2. Query OSV and get the list of vulnerability entries that say "Foo" at "1.0.0" is vulnerable.
3. Use "aliases"/"related" to group them together for presentation, e.g. in a bug filed.
4. Suggest a fix/resolution such that all the entries in a single group agree.

Under this workflow, it seems to make sense to group all of the related vulnerabilities together, so users have the full context on what all the vulnerability sources say, and updates/remediation steps account for all relevant entries in the same group. The fact that some of these are "advisories" should not matter -- having them be split up would have the same effect.
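A rough sketch of the grouping and fix-suggestion steps of this workflow, under several assumptions: the entries and IDs are made up, each entry carries a single flat `fixed` version (real OSV records express this through `affected[].ranges[].events`), versions are plain dotted integers, and the function names are hypothetical rather than part of any OSV tooling.

```python
def group_entries(entries):
    """Group entries that are connected through `aliases` or `related`."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        parent[find(a)] = find(b)

    for entry in entries:
        find(entry["id"])  # register entries that link to nothing
        for other in entry.get("aliases", []) + entry.get("related", []):
            union(entry["id"], other)

    groups = {}
    for entry in entries:
        groups.setdefault(find(entry["id"]), []).append(entry)
    return list(groups.values())


def parse_version(version):
    """Assumes dotted integer versions such as '1.0.2'."""
    return tuple(int(part) for part in version.split("."))


def suggested_fix(group):
    """The highest `fixed` version in a group satisfies every entry in it."""
    return max((entry["fixed"] for entry in group), key=parse_version)


# Hypothetical query results for package "Foo" at version "1.0.0".
entries = [
    {"id": "EX-1", "aliases": ["CVE-0000-0001"], "fixed": "1.0.1"},
    {"id": "EX-2", "aliases": ["CVE-0000-0001"], "fixed": "1.0.2"},
    {"id": "EX-3", "related": [], "fixed": "1.1.0"},
]

groups = group_entries(entries)
for group in groups:
    print([entry["id"] for entry in group], "->", suggested_fix(group))
```

Here EX-1 and EX-2 end up in one group because they share an alias, and the suggested fix for that group is the later of their two fixed versions.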

If this is an issue of semantics and representation, we can certainly ask our data sources to use related instead in the cases where an entry they're exporting consists of multiple other vulnerabilities from a different source, and this is likely more correct. I think this would be a relatively easy ask for our current sources to adopt.

nscuro · 2022-12-08T11:03:56Z

> Can you explain a bit more about the exact use case you're trying to achieve here with the OSV data?

Our use case is not primarily about recommending a fixed version to an end user ("updating to version X will resolve all these issues"), it's more about tracking risk, and making it transparent. So knowing which vulnerabilities are the same and which are not does matter to us.

We also have a VEX-like use case, where users (or machines) evaluate whether a project is actually affected by a vulnerability, and record their decision. Obviously we want to avoid redundant work being done, a decision should not have to be recorded for GHSA-28r2-q6m8-9hpx and CVE-2022-30323 separately, as they describe the same thing. On the other hand, we don't want the same decision being applied to different vulnerabilities (CVE-2022-30323 vs. CVE-2022-26945), because the exposure, attack vector, impact etc. may differ.

Approaching this use case the other way around, if a vendor provided a VEX document stating that their product is not affected by CVE-2022-30323, this should also be applicable to actual aliases like GHSA-28r2-q6m8-9hpx, but not CVE-2022-26945.
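As an illustration of the behaviour we are after, here is a sketch that propagates a recorded decision only across "true" aliases. It relies on a heuristic that is purely an assumption for this example and not something OSV defines: an entry that aliases more than one ID from the same database (for instance several CVEs) is treated as an advisory and ignored when building alias groups. The alias lists are abbreviated to the IDs named in this issue, and the function names and the `not_affected` status string are illustrative.

```python
from collections import Counter


def id_prefix(vuln_id):
    """Database prefix of an ID, e.g. 'CVE' or 'GHSA'."""
    return vuln_id.split("-", 1)[0]


def looks_like_advisory(aliases):
    """Heuristic (an assumption): an entry aliasing several IDs from the same
    database cannot be the same vulnerability as all of them."""
    counts = Counter(id_prefix(alias) for alias in aliases)
    return any(count > 1 for count in counts.values())


def strict_alias_groups(records):
    """Build alias groups while skipping advisory-like entries."""
    groups = []
    for vuln_id, aliases in records.items():
        if looks_like_advisory(aliases):
            continue
        members = {vuln_id, *aliases}
        for group in [g for g in groups if g & members]:
            members |= group
            groups.remove(group)
        groups.append(members)
    return groups


def propagate(decisions, groups):
    """Copy a decision to the other IDs in a group if the group is unambiguous."""
    result = dict(decisions)
    for group in groups:
        known = {decisions[i] for i in group if i in decisions}
        if len(known) == 1:
            status = known.pop()
            for vuln_id in group:
                result.setdefault(vuln_id, status)
    return result


# Abbreviated alias lists, limited to the IDs named in this issue.
records = {
    "GO-2022-0586": ["CVE-2022-26945", "CVE-2022-30323", "GHSA-28r2-q6m8-9hpx"],
    "GHSA-28r2-q6m8-9hpx": ["CVE-2022-30323"],
}

decisions = {"CVE-2022-30323": "not_affected"}
result = propagate(decisions, strict_alias_groups(records))
print(result)
```

With this input, the decision for CVE-2022-30323 carries over to GHSA-28r2-q6m8-9hpx, while CVE-2022-26945 stays undecided because the only entry linking it is advisory-like.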

> If this is an issue of semantics and representation, we can certainly ask our data sources to use related instead in the cases where an entry they're exporting consists of multiple other vulnerabilities from a different source, and this is likely more correct.

That'd be great!

oliverchang · 2022-12-09T02:01:30Z

> > Can you explain a bit more about the exact use case you're trying to achieve here with the OSV data?

> Our use case is not primarily about recommending a fixed version to an end user ("updating to version X will resolve all these issues"), it's more about tracking risk, and making it transparent. So knowing which vulnerabilities are the same and which are not does matter to us.

> We also have a VEX-like use case, where users (or machines) evaluate whether a project is actually affected by a vulnerability, and record their decision. Obviously we want to avoid redundant work being done, a decision should not have to be recorded for GHSA-28r2-q6m8-9hpx and CVE-2022-30323 separately, as they describe the same thing. On the other hand, we don't want the same decision being applied to different vulnerabilities (CVE-2022-30323 vs. CVE-2022-26945), because the exposure, attack vector, impact etc. may differ.

Got it, thanks for explaining! Are you thinking of recording VEX on a per-package basis, such that users can transitively determine from the entire dependency graph if they're actually indirectly affected by a vulnerability?

> Approaching this use case the other way around, if a vendor provided a VEX document stating that their product is not affected by CVE-2022-30323, this should also be applicable to actual aliases like GHSA-28r2-q6m8-9hpx, but not CVE-2022-26945.

> > If this is an issue of semantics and representation, we can certainly ask our data sources to use related instead in the cases where an entry they're exporting consists of multiple other vulnerabilities from a different source, and this is likely more correct.

> That'd be great!

We'll start conversations here with Go, and fix up the Debian ones.

github-actions · 2024-07-27T18:06:22Z

This issue has not had any activity for 60 days and will be automatically closed in two weeks

nscuro · 2024-07-27T19:27:10Z

Commenting to signal that this issue is still relevant.

I am enlightened however to see there is a continuous effort to improve the situation :)

oliverchang · 2024-07-28T22:58:56Z

> Commenting to signal that this issue is still relevant.

> I am enlightened however to see there is a continuous effort to improve the situation :)

Thanks! removed the stale tags.

oliverchang added the "data quality" label (Dec 7, 2022)

another-rex mentioned this issue (Dec 8, 2022): JSON output that groups aliases google/osv-scanner#29 (Closed)

nscuro mentioned this issue (Apr 13, 2023): Allow for vulnerability alias synchronization to be disabled for each source that supports it DependencyTrack/dependency-track#2670 (Merged)

another-rex mentioned this issue (May 16, 2023): Solution for de-duplicating advisories with shared aliases #1293 (Closed)

andrewpollock mentioned this issue (May 14, 2024): Change aliases for debian to related instead #1381 (Merged)

michaelkedar mentioned this issue (Jul 9, 2024): Always return the upstream aliases when no alias groups are generated #2374 (Open)

github-actions bot added the "stale" label (Jul 27, 2024)

oliverchang added the "backlog" label and removed the "stale" label (Jul 28, 2024)
