Fix/slow cmu #19
Conversation
7f996f2 to a658d74
LGTM. But I would like @tristanessquare to take a second look and approve.
@tristanessquare: It's OK that this doesn't have the potential to be merged upstream, so please don't treat that as a criterion.
The Windows 1.17.x and Ubuntu 1.16 testing stages are currently failing. We should fix this before merging.
@tristanessquare that's fine by me.
This became more important, so I started to look more into the failing unit tests. I'm immensely frustrated and confused why the tests seem to succeed for the upstream repository but fail on our fork.

```bash
#!/bin/bash
# Run the go-lint workflow locally via act with a reduced matrix
# (Go 1.17.x on ubuntu-latest only), then restore the workflow file.
set -o errexit
set -o nounset
set -o pipefail

sed -i 's/go-version: \[.*$/go-version: \[1.17.x\]/' .github/workflows/go-lint.yml
sed -i 's/os: \[ubuntu-latest, windows-latest\]/os: \[ubuntu-latest\]/' .github/workflows/go-lint.yml

RC=0
act -W .github/workflows/go-lint.yml pull_request || RC=1
git reset --hard
exit $RC
```

I initially wanted to run a `git bisect`, but then fell back to running `~/bisect.sh | tee <branch>.log` on each branch and comparing `<branch>.log` with `fix-slow-cmu.log`. Looking at the diff shows that iternity-rb introduced a few tests that fail in a CI environment (they don't when run separately). The same tests fail between iternity-rb and this pull request, which kind of makes this safe to merge. We should address the failing tests ASAP, but I don't see any reason why this merge should be blocked by them. I'm saying this mostly because the fix gained importance due to a customer being blocked, and it might become necessary to merge this without the unit tests succeeding, @tristanessquare. At least it looks like I didn't break anything else; I only looked at the diff for failed tests.
I've removed some additional code to make clear that what was previously used as a fallback operation is now the default. We have no choice but to merge this even with the failing unit tests.
LGTM
There's no need to continuously stat all object parts on disk if most of them have already been appended.
The backgroundAppend was meant as a performance optimization: uploaded chunks would be aggregated into a complete file in the background while more parts were uploaded at the same time. This might be an optimization on actual filesystem paths, but on Gluster it made the operation a lot slower because many redundant filesystem calls were executed. The CompleteMultipartUpload request, which requires the uploaded chunks to be put together, has to

1. wait for any ongoing backgroundAppend operations to complete,
2. enumerate all chunks and check whether they were put together in the right order (otherwise start from scratch),
3. move the aggregated file.

Removing this piece of code started out as an experiment, because I expected the chunks not to be aggregated at all. It turned out that there is a fallback, which is also necessary in case the final object should have a different part order or not contain some of the uploaded parts, and which ensures that a final aggregate is created. At least on GlusterFS this makes any CMU request run almost twice as fast as with backgroundAppend.
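One way to observe the redundant filesystem calls mentioned above (purely a diagnostic sketch, not part of the PR; the `minio` process name and the use of strace are assumptions) is to count file-related syscalls on the server while a CompleteMultipartUpload request is running:

```bash
# Attach to the running MinIO server (process name assumed to be "minio")
# and summarize file-related syscalls (stat, open, ...) across all threads.
# Stop with Ctrl-C once the CMU request has finished to print the counts.
sudo strace -f -c -e trace=%file -p "$(pidof -s minio)"
```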
The same tests failed as always, so I'd say this is safe to merge.
Description
In combination with a GlusterFS backend we noticed that the CompleteMultipartUpload request becomes very slow: the larger the uploaded file, the longer the request takes.
Motivation and Context
We noticed that for larger file sizes some clients would run into a timeout during CompleteMultipartUpload requests. The default timeout seems to be 1h, which is already generous. Nevertheless, uploading 20 or 50 GiB files caused CMU requests that took far longer to complete.
Compared to AWS it is unintuitive that a CMU request takes this long, but we are willing to accept that, because it is a limitation of the current implementation, which has to assemble all chunks into one big file.
This fixes #13 and #11.
How to test this PR?
Tests can easily be performed manually. `time` is helpful for tracking the total duration on the client side, whereas `mc admin trace -v -a` will log all requests including their duration. If large files are required, `truncate -s 100G file` can quickly create a file containing only zeros; without compression this is just as good as creating such a big file with actual content, but much faster. A minimal end-to-end run is sketched below.
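As a sketch, assuming an `mc` alias named `myminio` and a bucket named `test` (both placeholders), a manual check could look like this:

```bash
# Create a 100 GiB sparse file; without compression it behaves like a file
# with real content, but is created instantly.
truncate -s 100G bigfile

# In a second terminal: log all requests, including their duration.
mc admin trace -v -a myminio

# Upload the file and measure the total client-side duration.
# The CompleteMultipartUpload request is sent at the very end of the upload.
time mc cp bigfile myminio/test/bigfile
```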
Types of changes

Checklist: