A purl spec for the C/C++ package manager vcpkg #245

Open
wants to merge 18 commits into
base: master
from

Conversation

michaelbprice

@michaelbprice michaelbprice commented Jul 31, 2023

This would address Issue #217. It is also being tracked on the vcpkg side in this discussion. microsoft/vcpkg#32732

@stevespringett stevespringett added the PURL type definition Non-core definitions that describe and standardize PURL types label Jul 31, 2023
…rameter.

- Renaming registry-version parameter to be consistent.
- Adding description of features parameter.
- Adding a distinct examples section.
@michaelbprice
Author

I'm contemplating whether or not we can simplify the spec further and whether it's still missing anything vital.

The principle that the components of the purl string should only be what is necessary to disambiguate different packages from each other runs into a problem of having the same package used in different ways by the same consumer (say, it depends on a static build of the package on one platform and a dynamic build of the package on another platform).

Where I'm headed with this is to distinguish between 2 kinds of data.

  1. Data that identifies a package.
  2. Data that describes how a package is utilized in a particular context.

Data in category 1 should be defined in the purl specification, and the specification should leave room to record data in category 2. Properties for category 2 data could also be documented in a subparagraph, but that documentation would be non-normative with regard to the purl spec.

With that approach, I think it means that the existing abi, triplet, and features properties would fall under category 2. And in order to properly distinguish, I think we'd need an indicator whether or not a package is a port or an artifact. That information could be encoded in the namespace field or as a property; I'm leaning towards a property for now.

I'm going to make those changes soon, and want this comment to serve as a record of the thought process behind the decision.
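
A minimal sketch of that split, assuming qualifiers are held in a plain dictionary. The function name `split_qualifiers` and the constant `CONTEXT_QUALIFIERS` are hypothetical; only the qualifier names `abi`, `triplet`, and `features` come from the discussion above.

```python
# Hypothetical grouping: qualifiers that describe how a package is used,
# as opposed to qualifiers that identify which package it is.
CONTEXT_QUALIFIERS = {"abi", "triplet", "features"}


def split_qualifiers(qualifiers):
    """Return (identifying, contextual) dictionaries for a qualifier mapping."""
    identifying = {k: v for k, v in qualifiers.items() if k not in CONTEXT_QUALIFIERS}
    contextual = {k: v for k, v in qualifiers.items() if k in CONTEXT_QUALIFIERS}
    return identifying, contextual
```

Under this split, two consumers that use the same port with different triplets would still agree on the identifying part.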

…ditional, unspecified qualifiers must be tolerated.
@michaelbprice
Author

After discussing it some with the vcpkg team, I don't think we necessarily need to distinguish ports from artifacts at this time, so I've made the changes mentioned above, but have not added a package_type qualifier yet (and it seems that I likely will not).

@michaelbprice
Author

Alright. I'm moving this out of draft status. I'm looking for reviews from stakeholders in addition to maintainers of the purl-spec repository. Pinging @dan-shaw, @ras0219-msft, @jhutchings1, @adriandiglio.

@michaelbprice michaelbprice marked this pull request as ready for review August 18, 2023 22:37
@BillyONeal

I think the purl spec is somewhat unclear about what guarantees users should expect of PURLs, which makes it hard to judge whether this proposal represents vcpkg effectively. Couple of questions:

  • Is it expected that a PURL could, in principle, be given to a package manager, which would then attempt to produce the same package?
  • If two different PURLs can refer to the same content, is that OK?
  • If the same PURL can refer to totally different content, is that OK?

Examples like conan already here seem to say that different URIs to the same content as well as the same URI potentially identifying different content are OK but that seems to make PURLs almost meaningless.

Given a PURL, what is a user expected to be able to do with it? I look at what SPDX and what GitHub dependencies and similar want, and they want such vastly different things and want different guarantees on what that means.

Potential examples. It isn't clear to me which of these apply. I'm sure I missed some.

  • Uniquely identify the exact content that is executed by an end user at a given time
  • Uniquely identify the exact content that is executed by an end user for all time (e.g. cryptographic SHA)
  • Uniquely identify the source code and/or package recipe that the package manager executes in order to produce something
  • Partially identify the source code and/or package recipe that the package manager executes in order to produce something
  • Identify something enough such that likely software vulnerability information would be applied

@jhutchings1
Contributor

I think the purl spec is somewhat unclear about what guarantees users should expect of PURLs, which makes it hard to judge whether this proposal represents vcpkg effectively.

PURLs should be deterministic. If there are things which could affect what dependency you resolve, they should be available as properties in the PURL schema for a type. The registry is a common example; most types allow you to provide a registry, but have a default option as well.

Reviewing your list, I believe every one of those is a goal. Sometimes purls will have maximum specificity (e.g., a runtime might report that it ran a very specific piece of software), and other times they'll have less (e.g., a CVE may specify something like pkg:npm/[email protected] to cover a large range of affected products, at least if the version range spec in #93 is added).

  • Uniquely identify the exact content that is executed by an end user at a given time
  • Uniquely identify the exact content that is executed by an end user for all time (e.g. cryptographic SHA)
  • Uniquely identify the source code and/or package recipe that the package manager executes in order to produce something
  • Partially identify the source code and/or package recipe that the package manager executes in order to produce something
  • Identify something enough such that likely software vulnerability information would be applied

@BillyONeal

@jhutchings1 How do we reconcile that with many existing examples that fail most of these tests?

Examples:

pkg:conan/[email protected]

  • Uniquely identify the exact content that is executed by an end user at a given time Nope, not built yet
  • Uniquely identify the exact content that is executed by an end user for all time (e.g. cryptographic SHA) Nope, different registries can say what openssl means is totally different
  • Uniquely identify the source code and/or package recipe that the package manager executes in order to produce something Nope, the recipe can change what happens depending on the machine it runs on
  • Partially identify the source code and/or package recipe that the package manager executes in order to produce something
  • Identify something enough such that likely software vulnerability information would be applied Asterisk: No way of identifying backports

pkg:cargo/[email protected]

  • Uniquely identify the exact content that is executed by an end user at a given time Nope, not built yet
  • Uniquely identify the exact content that is executed by an end user for all time (e.g. cryptographic SHA) Nope, not built yet
  • Uniquely identify the source code and/or package recipe that the package manager executes in order to produce something At least, I think ?
  • Partially identify the source code and/or package recipe that the package manager executes in order to produce something
  • Identify something enough such that likely software vulnerability information would be applied

pkg:nuget/[email protected]

  • Uniquely identify the exact content that is executed by an end user at a given time Nope, depends on target configuration
  • Uniquely identify the exact content that is executed by an end user for all time (e.g. cryptographic SHA)
  • Uniquely identify the source code and/or package recipe that the package manager executes in order to produce something
  • Partially identify the source code and/or package recipe that the package manager executes in order to produce something
  • Identify something enough such that likely software vulnerability information would be applied

@BillyONeal

@jhutchings1 (To clarify, I'm trying to make sure vcpkg's support for this is consistent with the spec's design goals. But the front matter seems to be missing these details, and the examples don't seem consistent with the design goals that are listed there, so I don't know how I feel about it.)

@jhutchings1
Contributor

Producers should provide as much information as they have, but most properties should be optional so that producers like CVE issuers can issue CVEs that target a broader set of packages with just one purl. There's not a one size fits all approach here, so design for flexibility.
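
To illustrate how a less specific purl could target a broader set of packages, here is a rough sketch that treats unset fields as "match anything". It assumes purls have already been parsed into dictionaries; `purl_matches` and the dictionary layout are hypothetical, and this thread does not define any matching semantics, so this is one possible reading only.

```python
def purl_matches(pattern, candidate):
    """Return True if every field set in `pattern` has the same value in `candidate`.

    Both arguments are dictionaries with optional keys "type", "namespace",
    "name", "version", and "qualifiers" (itself a dictionary).
    """
    for field in ("type", "namespace", "name", "version"):
        wanted = pattern.get(field)
        # A field missing from the pattern places no constraint on the candidate.
        if wanted is not None and wanted != candidate.get(field):
            return False
    wanted_qualifiers = pattern.get("qualifiers", {})
    have_qualifiers = candidate.get("qualifiers", {})
    return all(have_qualifiers.get(k) == v for k, v in wanted_qualifiers.items())
```

With this reading, an advisory that names only type, name, and version matches every build of that version, while a fully qualified purl from a runtime matches only itself.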

@ras0219-msft

but most properties should be optional so that producers like CVE issuers can issue CVEs that target a broader set of packages with just one purl. There's not a one size fits all approach here, so design for flexibility.

It sounds like the PURL spec should then be viewed as serving two separate purposes:

  1. As a descriptor for a unique, specific "package"
  2. As a query language over those descriptors, specifically for CVE matching

Solving (2) is much more complicated than identifying individual packages; there's a certain policy decision of applicability. For example, if I get [email protected] from a different registry, has a fix for CVE 100000 been backported to that variant? Is [email protected] expected to be the same project when it comes from different registries?

More realistically, does [email protected]?package_revision=1 suffer from all the same CVEs as [email protected] or was the entire point of the packaging update to apply patches to fix said CVEs? Are these expected to be tracked in the same way as 1.0.1 vs 1.0.0: every minor packaging revision is a totally unique source version which (upon initial minting) has no CVEs?

How does this currently work for PURLs into Linux distributions -- especially Debian Stable?

@matt-phylum
Contributor

CVE matching seems like it is always messy. For PURL I think typically [email protected] is expected to always be [email protected], not zlib1g@1:1.2.11.dfsg-1+deb10u2 or 1:1.2.11.dfsg-2+deb11u2 depending on what version of Debian the software is installed on. CVEs usually are matched using CPEs like cpe:2.3:a:zlib:zlib:1.2.11:*:*:*:*:*:*:*, which works for things that aren't on a package registry or aren't even standalone software products, but even then a CVE scanner can't know that Debian Buster's zlib1g 1:1.2.11.dfsg-1+deb10u2 (pkg:deb/debian/zlib1g@1:1.2.11.dfsg-1+deb10u2?distro=buster) isn't vulnerable to CVE-2022-37434 without consulting Debian's vulnerability database to find that it was patched in that version by DLA-3103-1.

For PURLs into Debian, the spec says you should have a PURL like pkg:deb/debian/zlib1g@1:1.2.11.dfsg-1+deb10u2?distro=buster which refers to a specific file¹. From there, it looks like you need to translate pkg:deb/debian/zlib1g@1:1.2.11.dfsg-1+deb10u2?distro=buster into pkg:deb/debian/zlib@1:1.2.11.dfsg-1+deb10u2?arch=source&distro=buster (the source package has a different name), which is built from zlib 1.2.11. Then you can either just look up what Debian has in the Debian security tracker, or you can use the Debian security tracker data to map zlib to cpe:/a:gnu:zlib, which is a deprecated alias of cpe:2.3:a:zlib:zlib, and then search for vulnerabilities matching cpe:2.3:a:zlib:zlib:1.2.11:*:*:*:*:*:*:*. Searching for the CPE will return a list of vulnerabilities containing both vulnerabilities that have been patched and vulnerabilities that aren't even in the Debian security tracker data yet, so then you would need to overwrite the global CVE information about zlib:1.2.11 with the matching Debian CVE information about zlib:1.2.11.dfsg-1+deb10u2 to get the final list. It's complicated, but it's probably unavoidable when Debian is shipping multiple packages based on its own fork of zlib.

At least I'm pretty sure that's how it works for tools like Trivy and debscan.
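
The final overlay step described above can be sketched roughly as follows. This is not how Trivy or debscan are implemented; `effective_vulnerabilities` and the `"fixed"`/`"open"` status strings are hypothetical, and the inputs stand in for the results of a CPE search and a distro security tracker lookup.

```python
def effective_vulnerabilities(upstream_cves, distro_status):
    """Overlay distro-specific status on the CVEs found for the upstream version.

    upstream_cves: CVE ids matched via the upstream CPE.
    distro_status: hypothetical mapping of CVE id -> "fixed" or "open", as a
    distro security tracker might report for the exact package version.
    """
    result = set()
    for cve in upstream_cves:
        # A CVE the tracker does not know about yet is kept, to stay conservative.
        if distro_status.get(cve, "open") != "fixed":
            result.add(cve)
    # CVEs that only the distro tracker lists as open are included too.
    result.update(cve for cve, status in distro_status.items() if status == "open")
    return sorted(result)
```

For the Buster example, CVE-2022-37434 would appear in the upstream list but be dropped because the tracker reports it as fixed in that package version.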

For software library packages being incorporated into a product via bundling or static linking, it's much simpler because the packages are (usually) specific, immutable files in specific repositories, so the question of whether that package is vulnerable or not depends on only the package, of which there is only a single instance, published by the package author (ie CVE-2022-37434 is resolved by upgrading to pkg:cargo/[email protected] which contains zlib 1.2.13, not by making a custom 1.1.11 that uses a patched zlib 1.2.12).

¹ I'm not sure this is useful. Does Debian keep every version of every package forever? I know for Alpine this is not the case, so pkg:apk/ is only going to be useful for describing what you have, not what you want.

@aristotelos

aristotelos commented Aug 28, 2024

This discussion seems to be a bit stale, but I think it is still very relevant, because at the moment vcpkg does not use any CPE or Package URL in the SBOMs it produces. That prevents it from being used easily for automated analysis.

So I would like to revive it and add my 2 cents:

  1. Specificity is good, but should not prevent us from starting with a minimal purl first. Vcpkg has the notion of overlay ports, overlay triplets, and many more build parameters. But there is a high probability that when a vulnerability is in the original vcpkg registry port, it will also be there in the overlay port and in multiple build situations. So I think a purl like pkg:vcpkg/[email protected] is best for security purposes. This can match against most vulnerabilities found in the upstream libraries.
  2. To be one step more specific, the port file revision, which tracks changes in the packaging files but not in the upstream library, should be supplied as well. Note that it is questionable whether the packaging has much influence on security, so this is more to reliably identify the package for other purposes. For examples of port file revisions, see https://vcpkg.link/ports/zlib/versions, in which the port file revision is the last digit in e.g. v1.2.1.2#2. It can't be added with # because that is against the purl standard. It could be done with a different separator like zlib@2:1.0 for port file revision 2 of zlib 1.0, but the best way to ensure multiple port file revisions can easily match vulnerabilities in the upstream library is to add it as a qualifier, i.e. [email protected]?port_revision=2. Adding the registry revision (e.g. 143bc76cc7, which is specified as the subtree revision for https://vcpkg.link/ports/zlib/v/1.2.12/2) seems to be counterproductive because many different revisions will still have the same port file revision of zlib.
  3. Vcpkg has the option to use another registry or other registries than the default https://github.com/microsoft/vcpkg. However, I think this feature is not used much at the moment, except when using local filesystem overlay ports. I agree that adding it as a qualifier e.g. [email protected]?repository_url=file:///home/user/project/port-overlays/zlib makes most sense.
  4. Other parameters such as specifying the triplet and all other build parameters etc. can be done. For example, Conan has a documented example pkg:conan/openssl.org/[email protected]?arch=x86_64&build_type=Debug&compiler=Visual%20Studio&compiler.runtime=MDd&compiler.version=16&os=Windows&shared=True&rrev=93a82349c31917d2d674d22065c7a9ef9f380c8e&prev=b429db8a0e324114c25ec387bfd8281f330d7c5c.

To sum up, I would argue for a simple purl like pkg:vcpkg/[email protected] for now with the qualifiers port_revision and repository_url, only to be added if the repository_url is not the default https://github.com/microsoft/vcpkg and the port_revision is not the default 0.
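
As a rough sketch of that rule, the function below emits `port_revision` and `repository_url` only when they differ from their defaults. The names `build_vcpkg_purl` and `DEFAULT_REPOSITORY_URL` are hypothetical, the `vcpkg` type is still only proposed here, and leaving `/` and `:` unescaped in `repository_url` is an assumption. The zlib 1.2.12 values are taken from the vcpkg.link URL above.

```python
from urllib.parse import quote

DEFAULT_REPOSITORY_URL = "https://github.com/microsoft/vcpkg"


def build_vcpkg_purl(name, version, port_revision=0,
                     repository_url=DEFAULT_REPOSITORY_URL):
    """Build a purl string, omitting qualifiers that hold their default value."""
    qualifiers = {}
    if port_revision != 0:
        qualifiers["port_revision"] = str(port_revision)
    if repository_url != DEFAULT_REPOSITORY_URL:
        qualifiers["repository_url"] = repository_url

    purl = f"pkg:vcpkg/{quote(name, safe='')}@{quote(version, safe='')}"
    if qualifiers:
        # Qualifier keys are emitted in sorted order, as the purl canonical form expects.
        # Assumption: "/" and ":" are left unescaped in qualifier values.
        pairs = [f"{key}={quote(value, safe='/:')}"
                 for key, value in sorted(qualifiers.items())]
        purl += "?" + "&".join(pairs)
    return purl


print(build_vcpkg_purl("zlib", "1.2.12"))
print(build_vcpkg_purl("zlib", "1.2.12", port_revision=2))
```

Keeping defaults out of the string means the common case stays short enough to match against upstream vulnerability data, while overlays and later port revisions remain distinguishable.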

aristotelos added a commit to aristotelos/purl-spec that referenced this pull request Sep 17, 2024
Revive package-url#245 by adding the
following changes:

- `registry_url` -> `repository_url`
- `registry_version` -> `port_revision`
- Remove percent escaping from `repository_url` to be more consistent
  with other uses of `repository_url` in the purl spec
aristotelos and others added 4 commits October 10, 2024 09:18
Revive package-url#245 by adding the
following changes:

- `registry_url` -> `repository_url`
- `registry_version` -> `port_revision`
- Remove percent escaping from `repository_url` to be more consistent
  with other uses of `repository_url` in the purl spec
Co-authored-by: Michael B. Price <[email protected]>
Add optional `repository_revision` so that mistakes in `port_revision` and
`version` can be accounted for. Not relevant for filesystem registries
or overlay ports because that gives no further external traceability.

Along with this, describe the filesystem registries and overlay port
cases.
Extend the port overlay or filesystem registry example with port
revision.

Add an example for additional qualifiers.
@michaelbprice
Author

michaelbprice commented Oct 15, 2024

@pombredanne - What steps still remain in order to merge this spec for a vcpkg PURL type?

@jkowalleck jkowalleck added Proposed new type and removed PURL type definition Non-core definitions that describe and standardize PURL types labels Oct 17, 2024
@johnmhoran johnmhoran added the type: vcpkg Proposed new type label Oct 19, 2024
@johnmhoran johnmhoran added the Ecma specification Work on the core specification label Nov 5, 2024
@aristotelos

@pombredanne @jkowalleck @BillyONeal @stevespringett @jhutchings1 @matt-phylum This PR has been open in an unchanged state for about 2 months already. As not having purls for vcpkg is a security risk, could you please contribute to reviewing and merging this PR?

@ulfllorenz

ulfllorenz commented Jan 9, 2025

Update: Fixed some potentially confusing wording.

I would like to add a few thoughts to the comment by @matt-phylum.

For certain communities like Debian, there is a designated security team and process that ultimately allows you to link a specific PURL to vulnerabilities. Correct me if I am wrong, but I do not see this heavyweight approach for the vcpkg community in the foreseeable future. What you can expect to happen then is that aggregators like the GitHub Advisory Database try to build a somewhat heuristic mapping from a vcpkg PURL to CPEs. This mapping is difficult and messy enough without details such as port revisions. Hence in practice, for vulnerability searches at least, I would expect that only the package name and version are relevant.

Furthermore, similar to Matt's comment, I am not sure how well-suited the PURL concept is for non-binary packages in the first place. With, say Newtonsoft.Json version x.y.z, the situation is simple: There is just one package, it has a specific content, and you do not care how it was built, because you only ever consume the build artifact (neglecting nasty edge cases like build extensions). For a vcpkg port like libcurl or ffmpeg, this concept is questionable; depending on the chosen features, you have completely different software. You can partially overcome this problem by adding (lots of) qualifiers. My only problem here is: What for?

  • The PURL spec itself is not very specific about what a PURL is good for, and in particular about what you would need a unique identifier that includes the detailed build configuration for.
  • Relevant regulation that I have seen always refers to PURLs as identifiers into vulnerability databases. As mentioned above, too much detail is probably a liability, not a feature.
  • It might be interesting to note that a CPE is actually not an identifier, but a set description language. Which makes sense then: A vulnerability may affect a specific piece of software (small set), while for a vulnerability search, you may query all vulnerabilities in a larger set (e.g., for build variant "*"). PURLs are only identifiers, although it should not be hard to build a set query language on top of them.
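
To make the "set description" point concrete, here is a simplified sketch of wildcard matching on CPE 2.3 formatted strings. It is not the official CPE name matching algorithm: `cpe_matches` is a hypothetical helper that only treats `*` as "any value" and ignores escaped colons and other special values.

```python
def cpe_matches(pattern, concrete):
    """Compare two CPE 2.3 formatted strings field by field, treating '*' as 'any'.

    Simplification: fields are split on ':' without handling escaped colons.
    """
    pattern_fields = pattern.split(":")
    concrete_fields = concrete.split(":")
    if len(pattern_fields) != len(concrete_fields):
        return False
    return all(p == "*" or p == c for p, c in zip(pattern_fields, concrete_fields))
```

A pattern with `*` in the version field describes every zlib release, whereas a purl in its current form names exactly one package.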

I think my personal recommendation from these thoughts would be not to spend too much effort on trying to construct a good unique identifier at the current state of affairs. Instead, reserve the ability to extend the PURL specification, and focus on those attributes that are likely to be relevant.
