-
Notifications
You must be signed in to change notification settings - Fork 84
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
rpm: consult dnf database for repository information #869
base: main
Are you sure you want to change the base?
Conversation
5acce04
to
5a9d76e
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #869 +/- ##
==========================================
- Coverage 55.39% 55.23% -0.17%
==========================================
Files 282 283 +1
Lines 17836 17923 +87
==========================================
+ Hits 9881 9899 +18
- Misses 6923 6990 +67
- Partials 1032 1034 +2 ☔ View full report in Codecov by Sentry. |
rpm/dnf.go
Outdated
if fi, err := fs.Stat(sys, `root/buildinfo/content_manifests`); errors.Is(err, nil) && fi.IsDir() { | ||
// This is a RHEL layer, skip. | ||
return nil, nil | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Update this to check the files for a marker value and continue if found.
Should look for from_dnf_hint
key in the yaml files.
See also: containerbuildsystem/atomic-reactor#2140
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Punting on this unless we determine it's needed.
I created rpm-software-management/rpm#3505 related to this |
10b07ff
to
e59f165
Compare
Signed-off-by: Hank Donnay <[email protected]>
Signed-off-by: Hank Donnay <[email protected]>
Using these should give us a nice "todo" list when the time comes. Signed-off-by: Hank Donnay <[email protected]>
Signed-off-by: Hank Donnay <[email protected]>
Signed-off-by: Hank Donnay <[email protected]>
This adds two helpers: one to discover repoids, and one to annotate a stream of Packages. Signed-off-by: Hank Donnay <[email protected]>
Signed-off-by: Hank Donnay <[email protected]>
Signed-off-by: Hank Donnay <[email protected]>
Signed-off-by: Hank Donnay <[email protected]>
Signed-off-by: Hank Donnay <[email protected]>
Signed-off-by: Hank Donnay <[email protected]>
Signed-off-by: Hank Donnay <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
made it through most of the files, but I'll need to come back to this another time. Would you mind making more commits and then squashing things at the end so it's easier to see the diffs after each round? Thanks
// outside of the Red Hat build system's builder's context. See [CLAIRDEV-45] | ||
// for more information. | ||
// | ||
// [CLAIRDEV-45]: https://issues.redhat.com/browse/CLAIRDEV-45 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This ticket specifically calls out OSBS. What about Konflux? Should we update the ticket to mention Konflux, or is this not true for Konflux?
internal/dnf/dnf.go
Outdated
"github.com/quay/claircore/internal/rpm" | ||
"github.com/quay/zlog" | ||
_ "modernc.org/sqlite" // register the sqlite driver |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just sanity checking here
"github.com/quay/claircore/internal/rpm"
is in the same module, so I expected to see this in the last block
internal/dnf/dnf.go
Outdated
Wrap(context.Context, iter.Seq[claircore.Package]) (iter.Seq[claircore.Package], func() error) | ||
} | ||
|
||
// Identity is an [Annotator] who's [Annotator.Wrap] method does nothing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: whose
var Identity Annotator | ||
|
||
func init() { | ||
Identity = ident{} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious why not just var Identity = ident{}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the idea here is the variable can't be (accidentally or deliberately) reassigned
internal/dnf/dnf.go
Outdated
return nil | ||
} | ||
|
||
// NewAnnotator holds book-keeping for producing multiple independent mapping |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is where the convention of doing something like NewAnnotator
instead of newAnnotator
for the godoc confuses me because now we have two of these
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks good, but I don't know the tables well enough to verify this is definitely the right query
continue | ||
default: | ||
final = fmt.Errorf("internal/dnf: database error: %w", err) | ||
return |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm still getting used to iterators in Go. It's ok to have a path where we don't yield?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the consumer will just see that at the end of the iteration (in this case there's also a consumable error so the reason for it should be known.
switch n { | ||
case `Packages`: | ||
db.kind = kindBDB | ||
if ok, err := testpath(p, bdb.CheckMagic); !ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confirming we want to ignore true, err != nil
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, testpath
could return that if fs.File.Close()
fails, I think that case should be accounted for.
blobs, dbErr := db.All(ctx) | ||
seq, parseErr := parseBlob(ctx, blobs) | ||
defer func() { | ||
final = errors.Join(err, parseErr(), dbErr(), db.Close()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just confirming we want to close the DB here?
// | ||
// [yara]: https://github.com/VirusTotal/yara | ||
pat := []string{ | ||
`^.*/[^/]+\.jar$`, // Jar files |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know this is copied over, but I'm wondering if we should extend this to the other file extensions we support like WAR, EAR, etc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but maybe let's address that in #1473
for _, db := range found { | ||
ctx := zlog.ContextWithValues(ctx, "db", db.String()) | ||
zlog.Debug(ctx).Msg("examining database") | ||
if _, ok := done[db.Path]; ok { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
confirming this is no longer needed because of the slices.CompactFunc
in rpm.FindDBs
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's what I read it as too, the slices.SortFunc
is putting the found
in order (so dbs with the same path are adjacent), then slices.CompactFunc
is compacting them together, so by the time FindDBs returns the dupes should be taken care of
switch { | ||
case a.kind < b.kind: | ||
cmp = -1 | ||
case a.kind > b.kind: | ||
cmp = +1 | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why bother checking this if we only care about the path later? Also not sure if it's possible for paths to be the same but different kind at the moment anyway
|
||
seq, checkPkgs := db.Packages(ctx) | ||
seq, checkDNF := a.Wrap(ctx, seq) | ||
wart.AsPointer(seq)(yield) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was kind of a 🤯 for me, and I had to play with it in Go Playground to understand this: https://go.dev/play/p/j2-vMOOmrGw
/* | ||
for _, pkg := range got { | ||
t.Logf("%s: %#q", pkg.Name, pkg.RepositoryHint) | ||
} | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still desired or to delete?
func (*Scanner) Name() string { return pkgName } | ||
|
||
// Version implements scanner.VersionedScanner. | ||
// Version implements [indexer.VersionedScanner]. | ||
func (*Scanner) Version() string { return pkgVersion } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just confirming: even though the code is very different, we aren't expecting different results, right?
ctx, done := context.WithTimeout(ctx, r.cfg.Timeout) | ||
defer done() | ||
|
||
// !!! This ends up being a weird nonlocal return. !!! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what does this mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's saying that any error returned from the mapContainerAPI
is not evaluated or returned until outside of this function body L#259
} | ||
} | ||
|
||
if r.apiFetcher != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's say this is enabled. we seem to go here, no matter what, right? Before, we only got here if we were unable to read the content-sets files or the prod sec data is faulty.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it seems like if len(repoids) == 0 && r.apiFetcher != nil
would be preferred, I can't see a situation where we'd want to reach out to the container API if we had content-sets either from DNF or the embedded file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These txter test files are not net-new, I think they should be removed from rpm/testdata
for _, cpe := range cpes.CPEs { | ||
s[cpe] = struct{}{} | ||
} | ||
func (m *mappingFile) GetOne(ctx context.Context, repoid string) (cpes []string, ok bool) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This needs a comment
db *sql.DB | ||
|
||
// Concurrent maps for memoizing database lookups. | ||
absent sync.Map |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a specific reason for using an absent
map here and not saving an empty struct in repo
that can be checked to signal if it's absent, readability?
environment.RepositoryIDs[i] = layerArtifacts.Repos[i].ID | ||
v, _ := url.ParseQuery(pkg.RepositoryHint) | ||
if id := v.Get("repoid"); id != "" { | ||
environment.RepositoryIDs = v["repoid"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will leave inconsistencies in how the repo IDs are reported in index report. Packages with no specific repos will report their repo ids as numerical with the ability to be referenced back to repository
objects (like current behaviour):
"repository_ids": [
"1",
"2",
...
But packages that have a specific repoid specified by the DNF DB will have the repo slug that can not be directly referenced back to a repository
object:
"repository_ids": [
"rhel-9-for-x86_64-appstream-rpms"
]
We should think of a way to make these consistent to keep the expectation that "repository_ids"
will refer to keys in the "repository"
section and not a mix of the claircore.Repository.ID
and (what will be persisted as) the claircore.Repository.Name
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could do something like this:
func lookupRepositoryID(repos []*claircore.Repository, slug string) string {
for _, r := range repos {
if r.Name == slug {
return r.ID
}
}
return ""
}
and then
if slug := v.Get("repoid"); slug != "" {
if id := lookupRepositoryID(layerArtifacts.Repos, slug); id != "" {
environment.RepositoryIDs = []string{id}
}
This should always find a repo as repository scanner should be finding the same repos as the package scanner.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This suggestion is incomplete in that we need to append
all repositories who's Name
== repoid, not just the first one as the example code does
Rename to NERVA to NEVRA
These changes make the
rhel
machinery consult both therpm
anddnf
databases when possible.See-also: #809