Async File uploads #4572

exalate-issue-sync · 2022-09-13T11:41:49Z

After a client finished the upload of a file (ie. transfered the last byte), it should get the upload success code immediately if all bytes were transfered successfully.

From the clients perspective, upload means transfering bytes through the network to the server. As soon as that has happened, the client should get a positive answer immediately.

Things that technically might take long time (assembling of huge files after upload) should be transparent to the client. It does not have to wait for it and should not have to poll for success.

Operation steps like virus scan which also might take long time are actually a workflow step and should be handled as that.

It needs to be defined how this can be done properly from the technical and the user experience POV.

exalate-issue-sync · 2022-09-13T11:41:50Z

Klaas Freitag commented: # Specification:

The upload workflow in oCIS should - as a first step - work like this:

Client defined UUIDs

Clients are allowed to send a UUID in a header of the upload with the first request. That UUID is used as the file ID in oCIS. That way the client does not have to wait for the server side processing to be completed to get the file ID. The file ID is simply set by the client. oCIS needs to check if there is a file ID header and use that for the file.

Questions:

The server maintains a specific format of a UUID. Is it important that the clients use the same format? Capability?

Async Client Upload

The following Async Upload will make uploading faster from the clients point of view, because it does not have to wait for any postprocessing or updating of the meta data (which can take some time in oC10).

Also, in situations where storage of Metadata and binary data is split (eg. S3) we need this.

The processing is as follows:

Once the data of the file is uploaded completely, which means that the last byte arrived on the server, it immediately returns with an status code 202 (Accepted). That indicates to the uploading client that postprocessing is ongoing.
The client uploads the data to a fast, temporar POSIX based storage temporarily which is for example in case of TUS needed anyway.
The reply to the upload contains an arbitrary, changed ETag as usual. The ETag of the touched file is changed already.
In subsequent listings of the directory (via PROPFIND or GraphAPI) the server will list the file from the moment that the upload completed. If a status is included in the reply, the status 425 (TOO EARLY) will be returned.
If a client tries to download a file while postprocessing is ongoing, the server replies with 425
If a client tries to change or delete a file while postprocessing is ongoing, the server replies 425
After postprocessing has completed, the server changes the ETag of the parent dir of the file.

The most asked question:

If a file is removed, because it contains a virus that makes the virus check (which is running in post-processing) deleting the file, how does the client get to know that?

Answer: Currently, there is no special info for the client. The file just disappears. It is up to the virus app (or any other that does similar things) to inform the user, ie. through the activity app.

Checksums

To do a proper checksum handling (the checksum of the complete file is not validated after the last upload) the TUS extension of partial checksums should be implemented. See Partial Checksum Extension tus/tus-resumable-upload-protocol#172
As long as the extension is not there, the server has to implement a rolling checksum while the file is uploaded and do the check prior the reply on the last chunk upload.

exalate-issue-sync · 2022-09-13T11:41:51Z

Klaas Freitag commented: Since older clients do not handle the 403 forbidden status for a single file correctly, we will return "423 Locked" instead. That is handled as a Softerror (at least in Desktop), which means: The client does not bother the user with that error, but silently tries to get the file later, which is the correct behaviour.

exalate-issue-sync · 2022-09-13T11:41:51Z

Jörn Friedrich Dreyer commented: In effect, this behavior means we queue uploads on the server side. Even if we directly upload all chunks to the final destination, eg as multipart upload for an S3 storage, the final assembly may fail. How do we handle that case? retry indefinitely? ping admin?

I actually think this would be ok, because when we shard the uploads per space, the bytes are still available in the space. We can notify admins so he can fix the problem. Or we give a metric on how many uploads are still in progress, so it can be monitored properly.

How do we deal with the case the where the client sucessfully transfers all bytes, but a server side workflow like anti virus scanning or file classification makes the finalize request fail?

I think we should treat file upload, as in transferring bytes, different than postprocessing.
When the last byte is transferred we return and the client can continue. But then we need to make transparent that a postprocessing workflow is in progress. For S3 that includes file assembly. It might include antivirus scanning or classification. Some workflow steps might cause the new bytes to be rejected. What is going to happen then?

The TUS upload ID can remain accessible until postprocessing is done ... so clients could check the status of the upload ... we could expose workflow steps using headers similar to the Server-Timing header? https://w3c.github.io/server-timing/#the-server-timing-header-field

But how often would they need to do that? how will clients get notified when a necessary postprocessing step fails. And ... should clients then delete the file locally? How can they even retry if all they got was a 202 Accepted and later a new etag for the same file ... AFAICT they will redownload and overwrite local changes ...

exalate-issue-sync · 2022-09-13T11:41:52Z

Klaas Freitag commented: Reply to [~jfd]:

We do not have to queue uploads. While the upload is running, the server has to check if the if-match condition is still correct. If another upload is faster, the if-match will break, which is replied to the client. (Attention: An empty if-match header indicates a new file upload).
If postprocessing is running, other clients will get a reply accordingly and try again later.
From the clients POV, if all parts of a file are uploaded with a verified checksum (!), the file is on the server. If the final assembly fails, that is not the business of any client. Rather call the admin, yes. Nothing the client should bother. Or in other words: The final assembly CAN NEVER fail from the clients POV.
If the file does not pass the anti virus, it is just removed on the server, and as a result of that, on all clients. Nothing the client need to understand.

exalate-issue-sync · 2022-09-13T11:41:53Z

Michael Barz commented: Was discussed in cross-platform meeting:

Outcome regarding the clients: New clients should work seamless with the new protocol, old clients should not explode.
Further usability improvements like showing the postprocessing, process name and the progress should only be visible in newly developed clients

micbar · 2022-09-13T11:43:09Z

This was implemented on the experimental branches of reva and ocis.

stale · 2023-03-18T00:13:00Z

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 10 days if no further activity occurs. Thank you for your contributions.

exalate-issue-sync bot added p3-medium Type:Epic Epic is the parent of user stories labels Sep 13, 2022

micbar removed the KOKO_TESTSTELLUNG label Sep 13, 2022

micbar closed this as completed Sep 13, 2022

exalate-issue-sync bot reopened this Dec 16, 2022

exalate-issue-sync bot added the p3-medium label Dec 16, 2022

stale bot added the Status:Stale label Mar 18, 2023

stale bot closed this as completed Apr 2, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Async File uploads #4572

Async File uploads #4572

exalate-issue-sync bot commented Sep 13, 2022

exalate-issue-sync bot commented Sep 13, 2022 •

edited

Loading

exalate-issue-sync bot commented Sep 13, 2022

exalate-issue-sync bot commented Sep 13, 2022

exalate-issue-sync bot commented Sep 13, 2022

exalate-issue-sync bot commented Sep 13, 2022

micbar commented Sep 13, 2022

stale bot commented Mar 18, 2023

Async File uploads #4572

Async File uploads #4572

Comments

exalate-issue-sync bot commented Sep 13, 2022

exalate-issue-sync bot commented Sep 13, 2022 • edited Loading

Client defined UUIDs

Async Client Upload

Checksums

exalate-issue-sync bot commented Sep 13, 2022

exalate-issue-sync bot commented Sep 13, 2022

exalate-issue-sync bot commented Sep 13, 2022

exalate-issue-sync bot commented Sep 13, 2022

micbar commented Sep 13, 2022

stale bot commented Mar 18, 2023

exalate-issue-sync bot commented Sep 13, 2022 •

edited

Loading