
Fix/slow cmu #19

Merged
3 commits merged into iternity-rb from fix/slow-cmu on Jun 1, 2022
Conversation

@rluetzner (Collaborator) commented Feb 25, 2022

Description

In combination with a GlusterFS backend, we realized that CompleteMultipartUpload requests become very slow. The larger the uploaded file, the longer the request takes.

Motivation and Context

We noticed that for larger files some clients would run into a timeout during CompleteMultipartUpload requests. The default timeout seems to be 1 hour, which is already generous; nevertheless, uploading 20 or 50 GiB files caused CMU requests that took far longer than that to complete.

Compared to AWS it is unintuitive that a CMU request takes this long at all, but we are willing to accept that, because it is a limitation of the current implementation, which has to assemble all chunks into one big file.

This fixes #13 and #11.

How to test this PR?

Tests can easily be performed manually. time is helpful for tracking the total duration on the client side, whereas mc admin trace -v -a will log all requests including their duration.
If large files are required, truncate -s 100G file can quickly create a file containing only zeros. As long as compression is not used, this is just as good as creating such a big file with actual content, but way faster.
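
A minimal sketch of such a manual test, assuming an mc alias named myminio and a bucket named test (both names are placeholders):

#!/bin/bash
# Create a sparse 100 GiB file of zeros to use as test data.
truncate -s 100G bigfile

# In a second terminal, log all requests including their duration:
#   mc admin trace -v -a myminio

# Upload the file and measure the total client-side duration.
# mc performs a multipart upload for a file this size, so the final
# CompleteMultipartUpload request is the part this PR is expected to speed up.
time mc cp bigfile myminio/test/bigfile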

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Optimization (provides speedup with no functional changes)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • Fixes a regression (If yes, please add commit-id or PR # here)
  • Documentation updated
  • Unit tests added/updated

@aweisser (Collaborator) left a comment

LGTM. But I would like @tristanessquare to take a second look and approve.
@tristanessquare: It's okay that this doesn't have the potential to be merged upstream, so please don't take that as a criterion.

@tristanexsquare (Collaborator) commented:

The windows 1.17.x and ubuntu 1.16 testing stages are currently failing. We should fix this before merging.

@rluetzner (Collaborator, Author) commented:

@tristanessquare that's fine by me.

@rluetzner (Collaborator, Author) commented May 18, 2022

This has become more important, so I started looking more closely into the failing unit tests. I'm immensely frustrated and confused as to why the tests seem to succeed for the upstream repository but fail on our fork.
I've used https://github.com/nektos/act to run the GitHub workflows locally and limited myself to go-lint.yml. To save some resources I've also limited the execution to ubuntu-latest and Go version 1.17.x. I've used the following script:

#!/bin/bash
set -o errexit
set -o nounset
set -o pipefail

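# Restrict the go-lint workflow matrix to Go 1.17.x on ubuntu-latest only;
# the sed patterns below assume the matrix lines in go-lint.yml match them exactly.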
sed -i 's/go-version: \[.*$/go-version: \[1.17.x\]/' .github/workflows/go-lint.yml
sed -i 's/os: \[ubuntu-latest, windows-latest\]/os: \[ubuntu-latest\]/' .github/workflows/go-lint.yml

RC=0
act -W .github/workflows/go-lint.yml pull_request || RC=1
git reset --hard
exit $RC

I initially wanted to run a git bisect, but I discovered that our tests never worked at all. I've run different official RELEASE tags from upstream, the upstream master, and also early commits from around when go-lint.yml was added. None of them succeeded.

I then fell back to running

~/bisect.sh | tee <branch>.log

and diffing the results. I'll attach the log files here.

fix-slow-cmu.log
iternity-rb.log
RELEASE.2021-10-13T00-23-17Z.log

Looking at the diff shows that iternity-rb introduced a few tests that fail in a CI environment (they don't fail when run separately). The same tests fail between iternity-rb and this pull request, which makes this reasonably safe to merge. We should address the failing tests ASAP, but I don't see any reason why this merge should be blocked by them.

I'm saying this mostly because the fix has gained importance due to a customer being blocked, and it might become necessary to merge this without the unit tests succeeding, @tristanessquare. At least it looks like I didn't break anything else.

I only looked at the diff for failed tests, i.e. diff iternity-rb.log fix-slow-cmu.log | grep 'FAIL:'.
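
For reference, a sketch of the comparison, assuming bisect.sh is the script from above and each branch is checked out in turn:

# Produce one log per branch.
git checkout iternity-rb && ~/bisect.sh | tee iternity-rb.log
git checkout fix/slow-cmu && ~/bisect.sh | tee fix-slow-cmu.log

# Show only the differing lines, filtered to failed tests.
diff iternity-rb.log fix-slow-cmu.log | grep 'FAIL:'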

@rluetzner (Collaborator, Author) commented:

I've removed some additional code to make clear that what was previously used as a fallback operation is now the default.

We have no choice but to merge this even with the failing unit tests.

rluetzner requested a review from aweisser on May 30, 2022 10:44
@aweisser (Collaborator) left a comment

LGTM

Robert Lützner added 2 commits June 1, 2022 08:34
There's no need to continuously stat all object parts on disk if most of
them have already been appended.
The backgroundAppend was meant as a performance optimization where
uploaded chunks would be aggregated to a complete file in the background
while more parts were uploaded at the same time.

This might be an optimization on actual filesystem paths, but on Gluster
it made the operation a lot slower because many redundant filesystem
calls were executed. The CompleteMultipartUpload request which requires
the uploaded chunks to be put together has to

1. wait for any ongoing backgroundAppend operations to complete,
2. enumerate all chunks and check if they were put together in the right
   order (otherwise start from scratch),
3. move the aggregated file.

Removing this piece of code started out as an experiment, because I
expected the chunks not to be aggregated at all. It turned out that
there is a fallback, which is also necessary in case the final object
uses a different part order or does not contain some of the uploaded parts.
This also ensures that a final aggregate is created. At least on
GlusterFS this makes any CMU request run almost twice as fast as when
using backgroundAppend.
@rluetzner (Collaborator, Author) commented:

The same tests failed as always, so I'd say this is safe to merge.

rluetzner merged commit 33a1155 into iternity-rb on Jun 1, 2022
rluetzner deleted the fix/slow-cmu branch on June 1, 2022 08:52
rluetzner mentioned this pull request on Jun 1, 2022
rluetzner restored the fix/slow-cmu branch on June 2, 2022 11:48
rluetzner deleted the fix/slow-cmu branch on June 2, 2022 11:51
rluetzner mentioned this pull request on Jun 2, 2022
Development

Successfully merging this pull request may close these issues.

Multipart Upload: CompleteMultipartUpload takes multiple hours to complete for very big files