Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add convenience wrapper for disk uploads to SDK #840

Merged
merged 1 commit into from
Sep 23, 2024
Merged

Conversation

wfchandler
Copy link
Contributor

From a user perspective uploading a disk is a single operation, but doing so via the SDK is a very involved process, involving stringing together multiple API calls with any interruption leaving the disk in a state that cannot be directly removed.

Add a new extras module to the SDK that will contain convenience wrappers around actions that functions that are overly complicated. Create a disk_import function in extras that will perform a disk upload as a single step, and attempts to cleanup partially imported disks on failure. We make the disk upload CLI command a thin wrapper around the new SDK function, responsible only for cancellation and progress. We manually recreate the builder interface, as creating a procmacro to do this would require creating a new crate. If/when extras continues to grow we should reconsider this.

disk_import also give callers access to a DiskImportHandle with a watch channel with the current number of bytes uploaded for use in progress tracking, and provides a cancel function that allows users to gracefully stop the disk upload. A disk that has been successfully created will not be removed, only those that are partially imported.

The cancel function uses tokio::select to cancel the Future of the upload task. This should be safe in our in our case no locks are held across await points. No operations need to be safe to restart as we only call tokio::select a single time, they will be dropped immediately and losing a message is not a problem.

We are forced to remove tests test_disk_import_bulk_import_start_fail and test_disk_import_bulk_write_import_fail as we now poll for disks in Created state, which may happen if a user immediately cancels a new request. This state is not eligible to be finalized, so we need to wait for the disk to progress to the next state, but HttpMock does not provide a way to programatically delete or update a mock after sending n responses. This means that the disk will always be returned as non-existing when querying, and cleanup will be skipped. I have opened alexliesenfeld/httpmock#108 to add a delete_after_n method, but this seems unlikely to be merged. We may want to consider moving to wiremock in the long term, which provides this functionality.

@wfchandler

This comment was marked as outdated.

@david-crespo
Copy link
Contributor

That is so great. "Cleaning up disk" is a little vague. I wonder if it's worth throwing a little more info in like the web console does, both before and after the main upload step. Like you could say "Creating disk xxxxxx", putting it in import mode, then the progress bar, and then when you cancel, the cleanup step makes more sense because it's undoing the steps you called out before. I'm a little on the fence because it's borderline too much detail, but I do like educating the user on what is actually taking place because it empowers them to debug and read the docs if something goes wrong.

@wfchandler
Copy link
Contributor Author

That is so great. "Cleaning up disk" is a little vague. I wonder if it's worth throwing a little more info in like the web console does, both before and after the main upload step. Like you could say "Creating disk xxxxxx", putting it in import mode, then the progress bar, and then when you cancel, the cleanup step makes more sense because it's undoing the steps you called out before. I'm a little on the fence because it's borderline too much detail, but I do like educating the user on what is actually taking place because it empowers them to debug and read the docs if something goes wrong.

Listing the name makes sense, I've added that and replaced the debug formatting with quoted names throughout. Updated demo:

Disk.Upload.Cleanup.-.Updated.mov

@wfchandler wfchandler marked this pull request as ready for review September 16, 2024 19:06
Copy link
Contributor

@augustuswm augustuswm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really appreciate this work! Will look through it, just a couple initial notes.

thread_ct: Option<usize>,
) -> Result<
(
impl Future<Output = Result<()>> + 'a,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still need to read through everything, but are we able to returned a typed error instead of an anyhow error? I think it is fine at the CLI level to return them, but anyhow errors are quite difficult to handle when getting them back from an sdk.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good call, I've moved to a custom error type.

Copy link
Collaborator

@ahl ahl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. I think the approach works.

return Err(e.into());
}
pb.finish_and_clear();
pb.println("Cleaning up disk. Press CTRL+C again to exit immediately.");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would suggest we omit the "... exit immediately" text in that basically we don't want people to do that... and in my experience they don't need an invitation to ^C

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ha, very true

) -> Result<(), DiskImportError> {
self.check_for_existing_disk().await?;

let result = tokio::select! {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want biased; here? i.e. I think we want to prioritize cancelation over progress.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

Ok(())
}

async fn import_disk(&self) -> Result<(), DiskImportError> {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
async fn import_disk(&self) -> Result<(), DiskImportError> {
async fn do_disk_import(&self) -> Result<(), DiskImportError> {

What do you think? Hoping for a name that indicates this is the inner loop of disk_import rather than just reversing the two words

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense 👍

Comment on lines 116 to 125

Ok(())
}));
let pb = start_progress_bar(handle.progress(), disk_info.file_size, &self.disk)?;
watch_for_ctrl_c(handle, pb);

senders.push(tx);
}

// Read chunks from the file in the file system and send them to the
// upload threads.
let mut file = File::open(&self.path)?;
let mut i = 0;
let mut offset = 0;

let read_result: Result<()> = loop {
let mut chunk = Vec::with_capacity(CHUNK_SIZE as usize);

let n = match file.by_ref().take(CHUNK_SIZE).read_to_end(&mut chunk) {
Ok(n) => n,
Err(e) => {
eprintln_nopipe!(
"reading from {} failed with {:?}",
self.path.to_string_lossy(),
e,
);
break Err(e.into());
}
};
import_future.await?;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if this is going to be more complex or simpler, but what do you think of doing something like this:

let real_work_handle = tokio::spawn(import_future);

select! {
    // ^C thing
    // progress thing
}

real_work_handle.await?;

The difference I see is inlining the work that's spread between the progress bar and ctrl_c watch functions which don't seem to be particularly self-contained.

Perhaps it's about the same, but I wanted to float the idea.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That'a bit cleaner, I combined the two tasks.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up reverting this as exiting the task when the progress bar completes causes us to stop listening for the cancel prior to the task completing, preventing users from canceling late in the request.

From a user perspective uploading a disk is a single operation, but
doing so via the SDK is a very involved process, involving stringing
together multiple API calls with any interruption leaving the disk in a
state that cannot be directly removed.

Add a new `extras` module to the SDK that will contain convenience
wrappers around actions that functions that are overly complicated.
Create a `disk_import` function in `extras` that will perform a disk
upload as a single step, and attempts to cleanup partially imported
disks on failure. We make the `disk upload` CLI command a thin wrapper
around the new SDK function, responsible only for cancellation and
progress. We manually recreate the builder interface, as creating a
procmacro to do this would require creating a new crate. If/when
`extras` continues to grow we should reconsider this.

`disk_import` also give callers access to a `DiskImportHandle` with a
`watch` channel with the current number of bytes uploaded for use in
progress tracking, and provides a `cancel` function that allows users
to gracefully stop the disk upload. A disk that has been successfully
created will not be removed, only those that are partially imported.

The `cancel` function uses `tokio::select` to cancel the `Future` of
the upload task. This should be safe in our in our case no locks are
held across `await` points. No operations need to be safe to restart as
we only call `tokio::select` a single time, they will be dropped
immediately and losing a message is not a problem.

We are forced to remove tests `test_disk_import_bulk_import_start_fail`
and `test_disk_import_bulk_write_import_fail` as we now poll for disks
in `Created` state, which may happen if a user immediately cancels a new
request. This state is not eligible to be finalized, so we need to wait
for the disk to progress to the next state, but `HttpMock` does not
provide a way to programatically delete or update a mock after sending
`n` responses. This means that the disk will always be returned as
non-existing when querying, and cleanup will be skipped. I have opened
alexliesenfeld/httpmock#108 to add a
`delete_after_n` method, but this seems unlikely to be merged. We may
want to consider moving to `wiremock` in the long term, which provides
this functionality.

Co-authored-by: Adam Leventhal <[email protected]>
@wfchandler wfchandler merged commit a2bbd78 into main Sep 23, 2024
16 checks passed
@wfchandler wfchandler deleted the wc/disk-import-sdk branch September 23, 2024 17:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants