-
Thanks for this. First, I'm unclear on what It sounds like SBOMs aggregate/reference a multitude of $things, which may be addressed by various names. Perhaps if we take @JimFuller-RedHat's thought of calling those $things Given that a pURL is a "package URL", it feels like "package" may indeed be "things you can assign a pURL to". Likewise, CPEs are names of $things, but tend to name "products" for want of a better word. Like pURLs, CPEs can be roughly ambiguous, in that they are based around pattern-matching. While every product identifiable by a CPE should have a canonical CPE, an arbitrary CPE (with possible wildcards) may not canonically point to a single product. So, thinking, in human prose (not DB DDL)...
That being said, I think you're 100% right in that our current I also agree that those tables could/should be better named, such as Likewise, we have a Jumping up from human prose to APIs, connecting through the DB DDL... A human wants to find packages and products, and just so happens to need to use a pURL or CPE to communicate their desires. If we stick to the idea that a "package" is "anything addressable using a pURL", then Likewise, a human may want to understand things about a product (RHEL, RHEL8.2, RHEL8.2 on SPARC), and may end up using a CPE to do so. Ergo, still So if we separate the human-facing prose-centric desires from the DB DDL implementation details, I think my proposal is...
Another analogy would be the use-case of "I'd like to call Jim on the telephone". You certainly dial his phone number (the key used to indicate your desires), but you're not talking to the phone number. You're talking to Jim on the other end of the connection.
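To make the name-versus-thing distinction concrete, here is a minimal sketch. It assumes a made-up, three-field `vendor:name:version` pattern in which `*` is a wildcard; this is not real CPE 2.3 syntax, and `Product`, `PRODUCTS`, `matches` and `find_products` are hypothetical names used only for illustration. It shows that a fully specified name resolves to one product, while a wildcard name may resolve to several.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Product:
    """The $thing itself, independent of whatever name is used to look it up."""
    vendor: str
    name: str
    version: str


# Hypothetical catalogue of $things; a real system would query a database.
PRODUCTS = [
    Product("redhat", "enterprise_linux", "8.2"),
    Product("redhat", "enterprise_linux", "9.0"),
    Product("redhat", "ansible", "2.4"),
]


def matches(pattern: str, product: Product) -> bool:
    """Field-by-field comparison of a simplified pattern; '*' matches any value."""
    fields = pattern.split(":")
    values = (product.vendor, product.name, product.version)
    return len(fields) == len(values) and all(
        f in ("*", v) for f, v in zip(fields, values)
    )


def find_products(pattern: str) -> list[Product]:
    """The pattern is only the key; the result is the list of products it names."""
    return [p for p in PRODUCTS if matches(pattern, p)]
```

With this sketch, `find_products("redhat:enterprise_linux:8.2")` yields a single product, whereas `find_products("redhat:enterprise_linux:*")` yields two: the name alone does not canonically identify one thing.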
-
wrt THIS_PACKAGE... Perhaps this is ... another set of tables? Just like a Or possibly 1+ pURLs/CPEs, depending? Keep the distinction between $things and $one_of_possibly_many_names_for_a_thing.
-
A few (parochial and maybe obvious) random comments of a DBA nature ... please don't let any of these comments put us off what we have right now, which I think is right and good ... offering more as inspiration:

In the beginning, the Oracle said there shall be a logical model and a conceptual model ... the conceptual model is for humans (or other machines), the logical model is for the system machine (and for the humans who manage this machine) ... we might consider the REST API the conceptual model ... naming things in the logical model is for developer team consumption ... naming things in the conceptual model is for consumer consumption ... a lot of times developers try to make the logical model 1:1 with the conceptual model (as well as the interchange format), which has productivity gains in the short term; then the passage of time reveals it's hard to keep data (and nomenclature) in sync everywhere.

SBOMs are an interchange format ... if we choose to represent SBOMs in the logical model that is fine ... but the interchange format will change over time, and 'pouring concrete' on any specific notion of interchange might cause churn later on ... one could argue that SBOM packages are just 'packages' regardless of their membership in an SBOM ... it is common for a package to exist in many SBOMs, e.g. that relationship can be resolved by consulting the SBOM and the package is blissfully unaware. Of course the details matter for performance if one wants to enumerate all the SBOMs a particular package exists in.

Maybe corgi got it wrong by not making SBOMs central to the logical model ... but rather we used builds, which were a surrogate SBOM (and also directly referred to advisories). A build coming from a build system we own has some predictability ... SBOMs are distributed and coming from everywhere.

Product is just another container = a set of packages ... though Product also has a hierarchy in terms of release version which has to be catered for (this translates to queries one might want to perform, like 'does curl exist in Ansible ProductVersion, ProductStream, etc.'). It's unclear how that will be represented in trustify ... in corgi we had a product taxonomy (graph) and normal product entity tables (product, version, stream, channel).

PURL is just a unique id ... we really want a PURL to point to a single 'bag of bits' (in spacetime). It simplifies everything. Reality dictates that we probably need to have purl aliases or more purls, etc., and associated logical machinery to manage all the complexity, but a component really should have one identity (and a bunch of additional labels, with some of those labels being purls). In corgi we chose the purl to be used internally with sbom and product container because it meant that whatever changed in the system (uuids for example), the relationship would continue to make sense internally.

A package could be a set of components and/or just a single component - we did not have this convention in corgi ... as we derived via depends child relationship - maybe we should have ... and in fact I believe we had a ticket to do just that at some point.

For me, the challenge is that we need to know what questions are going to be asked of the conceptual model to some level of detail to get a stable logical model in place that has the performance characteristics to answer such questions efficiently, in a reasonable amount of time - otherwise bolting on things after the logical model has calcified can be painful.

For inspiration - here are some questions prodsec wanted to answer with component registry (with OSIDB) ... notice that none of them asked about a specific SBOM:
then the following (with osidb, but I think trustify gets from vex):
-
Part of our ambiguity in pURLs, and our table layout is to support "pointing to more than a single bag of bits". An advisory says Ultimately, to answer the questions you posited above.
-
I wish there were a mind-map mode for discussions, for branching off individual topics :) So I'll start slow and try to separate this: The idea of THIS_PACKAGE was to have (without having a proper name) a counterpart to an "SBOM package", but on a global/universal level. An SBOM package is a resource/thing owned by an SBOM. A global package (THIS_PACKAGE) is a package which exists outside of any SBOM. There is a reference from an SBOM package to that THIS_PACKAGE. Probably more than one SBOM package points to that global THIS_PACKAGE. Then again, that might just be a virtual thing, as the same can be achieved by doing the lookups we do today.
-
I think I mostly agree with your initial comment on this @bobmcwhirter … I am not sure if "component" is a better pick for a name, because it feels like a term that is equally ambiguous and overloaded. Aside from that, SPDX uses "package" and CDX uses "component", but they mostly mean the same thing. What might make sense is to come up with a mapping glossary: Trustify / SPDX, Trustify / CDX, Trustify / Klingon. And as you said, the endpoints still deal with "packages", which is why I think it makes sense to keep that prefix. But part of this interaction is based on PURLs. So it might make sense to have that And it might be that those endpoints go to the same service functions internally, just with different enums or ID types. But I think having a I am not sure how to name things internally. I think what would help a lot is to add code comments explaining what the idea of a function or structure is. Renaming all structs and functions might be overkill. Renaming the database tables is the right thing to do IMO.
-
I am sorry for re-iterating over this. But maybe we can find a better way to name and do things. So I'll try to take a step back, maybe it needs a bit more than just renaming. Maybe not.
Assuming the intention is to ingest all kinds of stuff, and then collect/aggregate that data into a model that grows, the more we ingest.
Right now, we have the following tables:
However, these tables only store fragments of PURLs, in a way that lets us easily reference them inside the
database. Better names would IMO be:
This set of information grows with each SBOM we ingest, because we extract PURLs from SBOMs. We also extract PURLs from other sources. But let's ignore that for now.
A simplified view on the SBOMs looks like this:
So SBOMs contain packages (SBOM packages) and may (or may not) declare an alternative name for their packages.
We can browse through SBOM packages, get them by ID. Get relationships between them. All without ever touching PURLs.
In some cases (for RH data, in most cases), these have PURLs attached, which we can use to cross-reference
other documents that use PURLs.
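As a rough illustration of that view, the sketch below models an SBOM that owns its packages, looks them up by ID, and walks relationships without touching PURLs, with PURLs as optional extra names. All class, field and relationship names (`Sbom`, `SbomPackage`, `node_id`, `depends_on`) and the example PURL are assumptions made for this sketch, not the actual Trustify schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SbomPackage:
    """A package as declared by one SBOM; PURLs are optional alternative names."""
    node_id: str
    name: str
    purls: list[str] = field(default_factory=list)


@dataclass
class Sbom:
    sbom_id: str
    packages: dict[str, SbomPackage] = field(default_factory=dict)
    # Each relationship is (source node ID, kind, target node ID).
    relationships: list[tuple[str, str, str]] = field(default_factory=list)

    def add(self, package: SbomPackage) -> None:
        self.packages[package.node_id] = package

    def relate(self, source_id: str, kind: str, target_id: str) -> None:
        self.relationships.append((source_id, kind, target_id))

    def targets(self, source_id: str, kind: str) -> list[SbomPackage]:
        """Follow relationships purely by node ID; no PURL is involved."""
        return [
            self.packages[t]
            for s, k, t in self.relationships
            if s == source_id and k == kind
        ]

    def by_purl(self, purl: str) -> list[SbomPackage]:
        """Only packages that declare this PURL can be found this way."""
        return [p for p in self.packages.values() if purl in p.purls]


sbom = Sbom("sbom-1")
sbom.add(SbomPackage("app", "my-application"))  # no PURL declared
sbom.add(SbomPackage("curl", "curl", ["pkg:rpm/redhat/curl@7.61.1"]))
sbom.relate("app", "depends_on", "curl")
```

Here `sbom.targets("app", "depends_on")` works even though `my-application` has no PURL, while `sbom.by_purl(...)` is the only place where the PURL acts as a bridge to other documents.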
Going through the conversations again, I think we might actually miss another "package".
THIS_PACKAGE is independent of an SBOM. And collects (grows) with each package that gets ingested into the system.
The question to me is: how do we identify this package?
By name (from the SBOM) won't work. By hash of an artifact? When ingesting a new SBOM package, how would we know
which THIS_PACKAGE it would need to contribute its information to?
And if we move references (like PURLs and CPEs) from the SBOM package to THIS_PACKAGE, how would we know what the
SBOM contributed? How would we know what the SBOM named that package (aside from the SBOM package name)?
On the other side, do we really need to store THIS_PACKAGE in the database? All the information is there via the SBOM packages anyway. Using SBOM packages also doesn't cause the issue of not knowing where it came from or finding an identifier that would be required to aggregate information.
Maybe THIS_PACKAGE is just a virtual construct, returned by some APIs, based on the PURL tables and the SBOM packages?
We could call that "package". Maybe there's a better name for that too?
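One way to picture the "virtual construct" option is the sketch below: nothing extra is stored, and the aggregated view is computed on demand by grouping SBOM packages by the PURLs they declare. The names (`SbomPackage`, `virtual_packages`) and the example data are hypothetical, and grouping by exact PURL string is a simplifying assumption.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SbomPackage:
    sbom_id: str
    node_id: str
    name: str
    purls: tuple[str, ...] = ()


def virtual_packages(
    sbom_packages: list[SbomPackage],
) -> dict[str, list[SbomPackage]]:
    """Group SBOM packages by declared PURL, keeping each contributor intact.

    Because every entry is still the original SBOM package, we retain which
    SBOM contributed it and how that SBOM named it.
    """
    grouped: dict[str, list[SbomPackage]] = defaultdict(list)
    for package in sbom_packages:
        for purl in package.purls:
            grouped[purl].append(package)
    return dict(grouped)


# Illustrative data: two SBOMs name the same PURL differently, and one
# package has no PURL at all.
INGESTED = [
    SbomPackage("sbom-1", "n1", "curl", ("pkg:rpm/redhat/curl@7.61.1",)),
    SbomPackage("sbom-2", "n7", "curl-7.61.1-rpm", ("pkg:rpm/redhat/curl@7.61.1",)),
    SbomPackage("sbom-2", "n8", "internal-tool"),
]

VIEW = virtual_packages(INGESTED)
```

In this sketch a package without any PURL (`internal-tool`) never appears in the aggregated view, which is one of the trade-offs of identifying THIS_PACKAGE purely by PURL.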