Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Preempt Swift slots to enforce mutual exclusion updating Swift container #201

Merged
merged 29 commits into from
Jul 16, 2024

Conversation

zhijie-yang
Copy link
Contributor

@zhijie-yang zhijie-yang commented Jun 20, 2024

Ping the @canonical/rocks team.


Description

This PR adds a Prepare upload job to the Image.yaml workflow, which preempts the Swift slot right after retrieving the revision number from the Swift container to prevent a race condition. For the workflow before this PR, such race conditions can happen when a workflow is triggered, while another workflow with the same image name runs for a job between Build and Upload (both included). Since the job Build can take up 20 minutes, it's easy to trigger such a workflow that races with other workflows. The eventual consequence of such a race condition can result in duplicated revisions of images of different tracks, thus letting the workflow release wrong artefacts to some tracks (tags).

Changelog

  • Using {name}_{commit}_{directory} as identifiers for OCI image artefacts;
  • Validation of the uniqueness of such identifiers;
  • Forming a critical section in the job Prepare upload that makes a write-after-read operation to the Swift container;
  • Preempting slots in Swift container for concurrent uploads in the job Upload;
  • Protecting such critical sections by wrapping them with "lock" and "unlock" operations using a lock file (on a per-image basis);
  • Busy waiting when the lock file exists;
  • Unlocking when the workflow is cancelled or the preemption operation fails;
    • Won't unlock if the lock operation fails (usually means waiting for other Prepare upload jobs for the same image to finish)
  • Adding a timestamp to the revision data cache key to enable the re-run of failed Prepare upload jobs that can give different revision numbers.

Related issues

Before merge:


@zhijie-yang zhijie-yang force-pushed the ROCKS-1239-relocate-swift-get-revision branch 3 times, most recently from d0abdd9 to 3f3cf06 Compare June 21, 2024 07:40
@zhijie-yang zhijie-yang force-pushed the ROCKS-1239-relocate-swift-get-revision branch from 3f3cf06 to ed6d2e6 Compare June 21, 2024 08:14
@linostar linostar self-requested a review June 21, 2024 08:50
@zhijie-yang zhijie-yang changed the title ROCKS 1239 Preempt Swift slots to mitigate race conditions ROCKS 1239 Preempt Swift slots to enforce mutual exclusion updating Swift container Jun 21, 2024
@zhijie-yang zhijie-yang changed the title ROCKS 1239 Preempt Swift slots to enforce mutual exclusion updating Swift container Preempt Swift slots to enforce mutual exclusion updating Swift container Jun 27, 2024
@zhijie-yang zhijie-yang force-pushed the ROCKS-1239-relocate-swift-get-revision branch from 855c6af to 3bd79cb Compare June 27, 2024 13:11
@zhijie-yang zhijie-yang force-pushed the ROCKS-1239-relocate-swift-get-revision branch from 3bd79cb to 8ce4c0d Compare June 27, 2024 13:16
@zhijie-yang zhijie-yang force-pushed the ROCKS-1239-relocate-swift-get-revision branch from ef55c3c to 5b4db04 Compare June 27, 2024 14:29
@zhijie-yang zhijie-yang force-pushed the ROCKS-1239-relocate-swift-get-revision branch 2 times, most recently from da4320b to 291e92e Compare June 27, 2024 15:03
@zhijie-yang zhijie-yang force-pushed the ROCKS-1239-relocate-swift-get-revision branch from 7185c1f to b2c8e7b Compare June 28, 2024 14:12
examples/mock-rock/1.2/rockcraft.yaml Outdated Show resolved Hide resolved
src/image/requirements.txt Outdated Show resolved Hide resolved
src/uploads/infer_image_track.py Show resolved Hide resolved
src/uploads/swift_lockfile_lock.sh Outdated Show resolved Hide resolved
src/image/validate_unique_trigger_upload_entry.py Outdated Show resolved Hide resolved
.github/workflows/Image.yaml Outdated Show resolved Hide resolved
.github/workflows/Image.yaml Show resolved Hide resolved
.github/workflows/Image.yaml Show resolved Hide resolved
.github/workflows/Image.yaml Show resolved Hide resolved
.github/workflows/Image.yaml Show resolved Hide resolved
.github/workflows/Image.yaml Outdated Show resolved Hide resolved
.github/workflows/Image.yaml Outdated Show resolved Hide resolved
.github/workflows/Image.yaml Show resolved Hide resolved
src/uploads/swift_lockfile_lock.sh Show resolved Hide resolved
Copy link
Collaborator

@cjdcordeiro cjdcordeiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ty for the changes, can we merge?

@zhijie-yang
Copy link
Contributor Author

Yes, I reverted the changes disabling ppc64el for mock-rock. We can merge now (even if the lpci build in the workflow can fail).

@cjdcordeiro
Copy link
Collaborator

there's a conflict that needs resolution

@zhijie-yang
Copy link
Contributor Author

Done.

Copy link
Collaborator

@cjdcordeiro cjdcordeiro left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why is the prometheus rock being built in the CI?

@zhijie-yang
Copy link
Contributor Author

Because of the merge commit 2acd80e.

@cjdcordeiro cjdcordeiro self-assigned this Jul 16, 2024
@cjdcordeiro cjdcordeiro merged commit 84ab4a9 into main Jul 16, 2024
@cjdcordeiro cjdcordeiro deleted the ROCKS-1239-relocate-swift-get-revision branch July 16, 2024 09:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants