Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simplify vers ranges #132

Open
nscuro opened this issue Apr 16, 2024 · 8 comments
Open

Simplify vers ranges #132

nscuro opened this issue Apr 16, 2024 · 8 comments

Comments

@nscuro
Copy link

nscuro commented Apr 16, 2024

The cve_index table currently contains many vers ranges with redundant constraints.

image

As per vers specification:

These pairs of contiguous constraints with these comparators are redundant and invalid (ignoring
any != since they can show up anywhere):

  • =, < or <= followed by < or <=: this is the same as < or <=
  • > or >= followed by =, > or >=: this is the same as > or >=

The algorithm to simplify is documented in the spec: https://github.com/package-url/purl-spec/blob/version-range-spec/VERSION-RANGE-SPEC.rst#version-constraints-simplification

I believe the univers library supports simplification as well.

Looking at some of the constraints, it makes me wonder if perhaps the current representation is a result of attempting to split ranges into pairs of 2 (as demonstrated in the spec)?

In that case, care needs to be taken to not break the meaning of range by erroneously omitting the corresponding lower or upper bounds. Sadly, an algorithm for splitting is not part of the spec, but @sahibamittal implemented one in versatile recently (tests for verification here).

@prabhu
Copy link
Contributor

prabhu commented Apr 16, 2024

@nscuro we had the opposite problem with normalization and implementing a version comparison logic that can work with both normalized and non-normalized versions of vers. In the end, we kept things simple to simply duplicate the constraints, so that each vers is guaranteed to be correct.

@nscuro
Copy link
Author

nscuro commented Apr 16, 2024

Thanks for the clarification.

Is the problem that univers doesn't support all required versioning schemes for the package types that the database pulls in?

Or is it that even those supported by univers are breaking due to garbage data / unconventional versions in the vuln feeds?

Trying to understand the source of the problem, because if vulnerability-db has these issues while building the database, consumers of it will have the same problems eventually. A majority of functionality in vers is bound to the requirement that versions can be compared and sorted.

@prabhu
Copy link
Contributor

prabhu commented Apr 16, 2024

@nscuro I have not tried univers yet. The issue is that some of our sources result in a list of tuples where each tuple is min, max, min_excluding, and max_excluding. We have not come up with a logic to convert such a list of tuples to a single normalized vers where even the list could be unsorted. Other than multiple rows what problem do you foresee with this implementation?

@nscuro
Copy link
Author

nscuro commented Apr 16, 2024

We have not come up with a logic to convert such a list of tuples to a single normalized vers where even the list could be unsorted.

Ideally this would be done by univers. In the simplest case, you should be able to simply "dump" all constraints into a single vers range, and then let univers take care of sorting and simplification. We have a few examples of conversions from GHSA, NVD, and OSV notations here.

Do you have some examples of tuples handy where this would not be feasible?

Other than multiple rows what problem do you foresee with this implementation?

Less a problem, more of an idealistic view that the vulnerability database should provide normalized data instead of having each client perform normalization. After all, it's required to evaluate a range. If the database is already unable to perform this task, the chance of consumers failing to do it is high.

@prabhu
Copy link
Contributor

prabhu commented Apr 16, 2024

@nscuro, different sources could be describing the same component, but offering slightly different versions. It will require database UPDATE calls which will be quite slow, since the indexes are added only at the end of the creation process.

@prabhu
Copy link
Contributor

prabhu commented Apr 17, 2024

@nscuro, Here is an example I came across where normalization is quite difficult since the version ranges across the sources do not agree.

vers-normalize

@nscuro
Copy link
Author

nscuro commented Apr 17, 2024

In that case it might make sense to track the source for each range, so we can attribute wrong ranges to where they came from. And then potentially apply manual corrections for those sources.

How do you handle that data right now in depscan? If you were to consider all rows, you'd be getting quite a few FPs at the moment.

@prabhu
Copy link
Contributor

prabhu commented Apr 17, 2024

@nscuro, yup. This is the idea behind an attribute called matching_vers in search.py. But, of course, it is up to the consuming application to decide on the logic.

depscan currently use the old VDB5 database and then filters the results based on CVE IDs alone. Perhaps, the vulnerabilities array could be used to present the data from all the sources?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants