Replies: 4 comments 1 reply
-
I've also had thoughts recently about unhooking our existing Mostly so we can work towards crafting advisories from whole cloth within the system, without a backing external document, and then mint out the externally-bound document when desired. This could maybe allow for a 1-to-many relationship from advisory to document. But I'm unsure, as most of the other relationships do come from a document. So maybe still 1-to-1, but allow for 1-to-zero, from advisory to document, for the internally-managed case.
-
Referencing external documents would need a way to describe the relationship: for example, a container will 'contain' stuff, a component might depend on another at build time (static linking) or at runtime, or a source rpm vs. a binary rpm ... SPDX v3 has a relationshipType, though this seems like a duplicate (without the enums) of relationship, which enumerates all the kinds of relationships we would probably ever need.
One point of view is that ab/using external documents in SPDX is an exercise in data transclusion, though it also introduces the potential for 'yet another thing' that needs to be kept in sync. Alternatively, we might investigate data-level techniques for achieving transclusion (JSON references/pointers, though I have no experience to offer there), though admittedly that is probably a dead end. Or we might just insist that a single SBOM contains everything it needs to describe all the transitive dependencies (of individual components) and dependencies between things like containers -> rpms/rpm.
Stepping back: is it not the case that PURL is the unifying thread across SBOMs, so that after processing a number of SBOMs we could build up a tree of dependencies keyed off PURL? I would rather try to 'get there' using relationship than introduce more complexity on the ingestion side. Or rather, I prefer shifting complexity to 'building up the tree' over putting that responsibility on SBOM creators.
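To make the 'build up the tree keyed off PURL' idea a bit more concrete, here is a rough sketch. It assumes each ingested SBOM has already been reduced to (parent PURL, child PURL) pairs; the function names and the PURLs in the usage note are made up for illustration and are not part of any existing code.

```python
from collections import defaultdict


def merge_by_purl(sboms):
    """Merge dependency edges from several SBOMs into one graph keyed by PURL.

    Each SBOM is assumed to be an iterable of (parent_purl, child_purl) pairs.
    """
    graph = defaultdict(set)
    for sbom in sboms:
        for parent, child in sbom:
            graph[parent].add(child)
            graph.setdefault(child, set())  # make sure leaf nodes are present too
    return graph


def transitive_dependencies(graph, root):
    """Collect everything reachable from root, across SBOM boundaries."""
    seen = set()
    stack = list(graph.get(root, ()))
    while stack:
        purl = stack.pop()
        if purl not in seen:
            seen.add(purl)
            stack.extend(graph.get(purl, ()))
    return seen
```

For example, if a container SBOM lists an rpm and a separate rpm SBOM lists that rpm's own dependency, the merged graph connects the container to that dependency without either SBOM referencing the other document.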
-
To add: CycloneDX has externalReference.
-
Apologies, I just realised this discussion is about supporting external documents ... because they exist in common SBOM formats ... we should look at where SPDX 3 is going on this ... I think from an ingestion point of view things are as clear as we decide to make them (following the specs where they have 'words' describing behaviour). Most of my initial comment is about the internal model ... which is somewhat different from (yet related to) the spirit of this discussion.
-
This is the start of a discussion around external documents. I want to add this to the SBOM design doc at some point, but I want to open it up to a broader discussion first.
Both SPDX and CycloneDX support referencing nodes in external documents. So far we ignore those, but the topic came up in a recent discussion. We need to tackle that issue (#533) anyway.
Spec
For SPDX, external documents are listed in the header of the document. They are defined with:
This basically provides a mapping table from an internal ID to an external document namespace, plus a safeguard in the form of a digest.
The document namespace should be unique for each document created. Changes to the document must result in a new
namespace.
The ID string has the format DocumentRef-<id>, where <id> is some unique identifier. When used in a relationship, it is combined with a node ID: DocumentRef-<id>:<node-id>.
Implementation
We already have the digest of the target document. We also have the document namespace.
We cannot use the document namespace to locate a document, as it might be a URI, but the spec says:
So there's no guarantee that the URI actually points to the document.
We would need to store the "ID" of the external reference, which is only valid in the context of a single SPDX SBOM.
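As a rough sketch of what storing and using that per-SBOM ID could look like: the table is keyed by the DocumentRef-<id> string and is only meaningful within one SBOM. The class and function names here are hypothetical and only meant to illustrate the lookup, not to propose an actual API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExternalDocumentRef:
    namespace: str  # document namespace of the referenced document
    algorithm: str  # digest algorithm, e.g. "SHA1"
    digest: str     # digest of the referenced document


def split_reference(element_id):
    """Split 'DocumentRef-<id>:<node-id>' into (document ref, node id).

    Returns (None, element_id) for a node ID without a document prefix.
    """
    if element_id.startswith("DocumentRef-") and ":" in element_id:
        doc_ref, node_id = element_id.split(":", 1)
        return doc_ref, node_id
    return None, element_id


def resolve(element_id, external_refs):
    """Look up the external document for a reference.

    external_refs is the mapping table of ONE SBOM; the same DocumentRef ID
    may mean something else in another SBOM.
    """
    doc_ref, node_id = split_reference(element_id)
    if doc_ref is None:
        return None, node_id
    return external_refs[doc_ref], node_id
```

Note that the result of the lookup is only a namespace and a digest. It still does not tell us where to fetch the document from, which matches the point above about the namespace not being a locator.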
Resolving packages
When querying today, we return a single hierarchy by default. That shouldn't be any different once we add external references.
When resolving transitive dependencies, that might be different. One way to deal with this could be to stop resolving when an external reference is encountered, similar to symlinks on a Unix system: traditionally, operations recursing into a directory stop at a symlink but report the symlink itself, unless the user requests to follow symlinks.
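The symlink-like behaviour could be sketched roughly as follows. This assumes relationships are available as a plain mapping from node ID to child IDs and that external references can be recognised by their DocumentRef- prefix; walk, follow_external and load_external are hypothetical names, not existing functions.

```python
def is_external(node_id):
    """External references carry the 'DocumentRef-' prefix."""
    return node_id.startswith("DocumentRef-")


def walk(root, edges, follow_external=False, load_external=None):
    """Depth-first walk that reports external references but, by default,
    does not descend into them (like a recursive walk treats symlinks).

    edges maps a node ID to its child node IDs within one SBOM.
    load_external is an optional callback returning the children of an
    external reference; it is only used when follow_external is True.
    """
    visited = set()
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        if node in visited:
            continue
        visited.add(node)
        order.append(node)  # the reference itself is always reported
        if is_external(node):
            if not (follow_external and load_external):
                continue  # stop here, like at a symlink
            children = load_external(node)
        else:
            children = edges.get(node, [])
        stack.extend(reversed(children))
    return order
```

Making the default "do not follow" keeps today's single-hierarchy behaviour, and following external references becomes an explicit opt-in by the caller.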
Foreign keys
Currently, we ingest the relationships with a foreign key in the node IDs. That won't work for the external references,
as that would require ingesting the referenced document first. Also, it would create issues when deleting such
documents.
One way to deal with this would be to create a second relationship table, one that allows one side of the relationship to be external, without enforcing any foreign key on that side.
That would duplicate things. But it might also be helpful in lookups, or in transitive resolve operations, where we might want to opt out of processing external references.
I don't see an alternative other than giving up foreign keys, which I'd like to avoid if possible.
What would be possible is to create additional "non-foreign key" fields for the reference in the same table. However,
that would basically do the same (two tables) but squeeze them into one, and feels quite messy.
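To illustrate the two-table idea, here is a minimal sketch using an in-memory SQLite database from Python. SQLite is only used to keep the example self-contained, and all table and column names are made up; the real schema would look different.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE node (
    sbom_id TEXT NOT NULL,
    node_id TEXT NOT NULL,
    PRIMARY KEY (sbom_id, node_id)
);

-- Internal relationships: both sides must exist in the same SBOM.
CREATE TABLE relationship (
    sbom_id       TEXT NOT NULL,
    left_node_id  TEXT NOT NULL,
    kind          TEXT NOT NULL,
    right_node_id TEXT NOT NULL,
    FOREIGN KEY (sbom_id, left_node_id)  REFERENCES node (sbom_id, node_id),
    FOREIGN KEY (sbom_id, right_node_id) REFERENCES node (sbom_id, node_id)
);

-- External relationships: only the local side is backed by a foreign key.
-- The other side is just the DocumentRef ID plus the node ID in that document.
CREATE TABLE external_relationship (
    sbom_id          TEXT NOT NULL,
    left_node_id     TEXT NOT NULL,
    kind             TEXT NOT NULL,
    external_doc_ref TEXT NOT NULL,
    external_node_id TEXT NOT NULL,
    FOREIGN KEY (sbom_id, left_node_id) REFERENCES node (sbom_id, node_id)
);
""")
```

With this layout, a row in external_relationship can be inserted without the referenced document being ingested, and deleting the referenced document does not touch it. Queries that want to ignore external references simply never read the second table.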
Conflicting documents
I am 100% sure that we will encounter conflicts around the combination of document namespace and digest, caused by updating a document without updating the document namespace, which is what we do at RH.
I don't think this should become a problem, though: we would find multiple possible targets for the reference, but could then eliminate the others due to the mismatched digest.
Also, we need to be careful to store the original digest of the document, not the one after applying any fixes (like
license expressions).
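A small sketch of that elimination step, assuming we keep the digest of the document exactly as received next to its namespace. SHA-256 and all names below are assumptions for illustration; the reference may of course use a different algorithm.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class StoredDocument:
    doc_id: str
    namespace: str
    original_digest: str  # digest of the bytes as received, before any fixes


def ingest(doc_id, namespace, raw):
    """Record the digest of the original bytes, not of the fixed-up document."""
    return StoredDocument(doc_id, namespace, hashlib.sha256(raw).hexdigest())


def find_targets(documents, namespace, digest):
    """Return the stored documents matching both namespace and digest."""
    wanted = digest.lower()
    return [
        d for d in documents
        if d.namespace == namespace and d.original_digest == wanted
    ]
```

Two revisions published under the same namespace both match on namespace alone, but only the revision whose original digest equals the one in the reference survives the filter.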