Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use GCS object update time instead of creation time #2454

Merged
merged 4 commits into from
Sep 12, 2024

Conversation

andrewpollock
Copy link
Contributor

@andrewpollock andrewpollock commented Aug 6, 2024

This should achieve the same outcome and is easier to manipulate (it doesn't require object recreation) and so should make triggering a reimport of specific records a little more efficient (doesn't require downloading and reuploading the object to reset the creation time)

Tested in Staging:

Before:

$ gsutil stat gs://osv-test-cve-osv-conversion/osv-output/CVE-2024-7067.json
gs://osv-test-cve-osv-conversion/osv-output/CVE-2024-7067.json:
    Creation time:          Fri, 26 Jul 2024 16:53:59 GMT
    Update time:            Wed, 31 Jul 2024 03:52:06 GMT
    Storage class:          STANDARD
    Content-Length:         3073
    Content-Type:           application/json
    Metadata:
        goog-reserved-file-mtime:1722394200
    Hash (crc32c):          Lo3Xwg==
    Hash (md5):             rdI8478THJQOVCgblWW9UQ==
    ETag:                   COmK0tyVxYcDED4=
    Generation:             1722012839216489
    Metageneration:         62

After:

$ gsutil stat gs://osv-test-cve-osv-conversion/osv-output/CVE-2024-7067.json
gs://osv-test-cve-osv-conversion/osv-output/CVE-2024-7067.json:
    Creation time:          Fri, 26 Jul 2024 16:53:59 GMT
    Update time:            Tue, 06 Aug 2024 07:07:12 GMT
    Storage class:          STANDARD
    Content-Length:         3073
    Content-Type:           application/json
    Metadata:
        goog-reserved-file-mtime:1722394200
    Hash (crc32c):          Lo3Xwg==
    Hash (md5):             rdI8478THJQOVCgblWW9UQ==
    ETag:                   COmK0tyVxYcDED8=
    Generation:             1722012839216489
    Metageneration:         63

I have a sneaking suspicion this may facilitate the intentional reimporting of CVEs that get updated when one of their constituent parts changes (the fact that the original creation and update times were already divergent surprised me)

This also requires combine-to-osv to switch back to using gsutil rsync instead of gcloud storage rsync due to a subtle difference in the treatment of the GCS object modification time for otherwise unchanged objects. (see #2196 for additional context)

This should achieve the same outcome and is easier to manipulate (it
doesn't require object recreation) and so should make triggering a
reimport of specific records a little more efficient.

Tested in Staging:

Before:
```
$ gsutil stat gs://osv-test-cve-osv-conversion/osv-output/CVE-2024-7067.json
gs://osv-test-cve-osv-conversion/osv-output/CVE-2024-7067.json:
    Creation time:          Fri, 26 Jul 2024 16:53:59 GMT
    Update time:            Wed, 31 Jul 2024 03:52:06 GMT
    Storage class:          STANDARD
    Content-Length:         3073
    Content-Type:           application/json
    Metadata:
        goog-reserved-file-mtime:1722394200
    Hash (crc32c):          Lo3Xwg==
    Hash (md5):             rdI8478THJQOVCgblWW9UQ==
    ETag:                   COmK0tyVxYcDED4=
    Generation:             1722012839216489
    Metageneration:         62
```

After:
```
$ gsutil stat gs://osv-test-cve-osv-conversion/osv-output/CVE-2024-7067.json
gs://osv-test-cve-osv-conversion/osv-output/CVE-2024-7067.json:
    Creation time:          Fri, 26 Jul 2024 16:53:59 GMT
    Update time:            Tue, 06 Aug 2024 07:07:12 GMT
    Storage class:          STANDARD
    Content-Length:         3073
    Content-Type:           application/json
    Metadata:
        goog-reserved-file-mtime:1722394200
    Hash (crc32c):          Lo3Xwg==
    Hash (md5):             rdI8478THJQOVCgblWW9UQ==
    ETag:                   COmK0tyVxYcDED8=
    Generation:             1722012839216489
    Metageneration:         63
```

I have a sneaking suspicion this may facilitate the intentional
reimporting of CVEs that get updated when one of their constituent parts
changes (the fact that the original creation and update times were
divergent surprised me)
@hogo6002
Copy link
Contributor

hogo6002 commented Aug 7, 2024

I recall that we previously encountered an issue where gcloud storage rsync --checksums-only was repeatedly updating the modified time on GCS objects (#2196). We are still using this command to write results to the osv-output folder. So I am wondering that our osv-output entries may always update their modification times even when there are no actual changes to the data. This may cause we re-import more times than we need if we switch checks from creation time to update time?

@andrewpollock
Copy link
Contributor Author

I recall that we previously encountered an issue where gcloud storage rsync --checksums-only was repeatedly updating the modified time on GCS objects (#2196)

Ah, if that's the (unfortunate) culprit (which I dare say it will be) we may need to also migrate that away from gcloud storage rsync in favour of gsutil rsync... That difference in behaviour continues to annoy me :-(

andrewpollock and others added 2 commits August 13, 2024 00:14
There is a subtle and unfortunately annoying difference in behaviour for
unchanged files:
`gcloud storage rsync` still causes a modification time change to the object in
GCS, whereas `gsutil rsync` does not.
Copy link
Contributor

@hogo6002 hogo6002 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks Andrew!

@andrewpollock andrewpollock changed the title Use GCS object update time instead of creation time feat: use GCS object update time instead of creation time Aug 13, 2024
@andrewpollock
Copy link
Contributor Author

@hogo6002 if I followed #2611 correctly, it makes what's changing in this PR more viable?

@hogo6002
Copy link
Contributor

@hogo6002 if I followed #2611 correctly, it makes what's changing in this PR more viable?

Yeah, I think as long as we use gsutil, it should be ok for us to merge this PR.

@andrewpollock andrewpollock merged commit 32b1035 into google:master Sep 12, 2024
11 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants