
Performance: Optimize speed and disk usage of backgroundAppend() for load-balanced NAS cases #11

Open
rluetzner opened this issue Oct 7, 2021 · 4 comments

@rluetzner
Collaborator

Current Behavior

We've observed that CompleteMultipartUpload requests can take a long time to complete and waste a lot of disk space when the uploads go through a distributed, load-balanced setup.

This is because every MinIO instance involved in the multipart upload (i.e. any instance receiving a PutObjectPart request) starts fs-v1-multipart.go --> backgroundAppend() as a goroutine. Every instance generates a unique temp file for this, so other instances cannot benefit from the optimization. Additionally, only the MinIO instance that receives the CompleteMultipartUpload request moves its aggregated temp file to the final location. All other temp files remain in .minio.sys/tmp until the cleanup process removes them (24 hours later).
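
For illustration only, here is a simplified sketch of the effect described above. It is not MinIO's actual code; the path layout and function name are assumptions. Because each instance derives a temp file name that is unique to itself, the aggregation done by one instance is never visible to the others:

```go
// Simplified sketch (not MinIO's actual implementation): each instance
// aggregates the uploaded parts into a temp file whose name is unique to that
// instance, so the work cannot be reused by whichever instance later handles
// CompleteMultipartUpload.
package multipart

import (
	"fmt"
	"io"
	"os"
	"path/filepath"
)

// backgroundAppendSketch concatenates the parts uploaded so far into a
// per-instance temp file under .minio.sys/tmp and returns its path.
func backgroundAppendSketch(fsPath, uploadID string, partPaths []string) (string, error) {
	host, _ := os.Hostname()
	// Unique per instance: another instance computes a different name and
	// therefore starts its own aggregation from scratch.
	tmpFile := filepath.Join(fsPath, ".minio.sys", "tmp",
		fmt.Sprintf("%s-%s-%d", uploadID, host, os.Getpid()))

	f, err := os.OpenFile(tmpFile, os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return "", err
	}
	defer f.Close()

	for _, p := range partPaths {
		part, err := os.Open(p)
		if err != nil {
			return "", err
		}
		_, copyErr := io.Copy(f, part)
		part.Close()
		if copyErr != nil {
			return "", copyErr
		}
	}
	return tmpFile, nil
}
```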

Expected Behavior

When multiple MinIO instances serve a NAS backend in a load-balanced setup, CompleteMultipartUpload should finish as quickly as possible, or at least take a roughly consistent amount of time regardless of which instance handles it.

Possible Solution

We will need some kind of cluster-aware backgroundAppend() logic that works for the gateway nas and nasxl cases. Perhaps etcd can be used to elect a leader or otherwise coordinate the process.
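
As a rough sketch of what such coordination could look like (endpoints, key prefix, TTL and the work callback are all assumptions, not a settled design), the instances could campaign for a per-upload leadership key in etcd so that only one of them runs the background append at a time:

```go
// Sketch only: per-upload leader election via etcd's concurrency package.
// Endpoints, key prefix and the work callback are assumptions.
package coordination

import (
	"context"
	"log"
	"time"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

// runBackgroundAppendAsLeader campaigns for a per-upload key and runs the
// append work only while this instance holds the leadership.
func runBackgroundAppendAsLeader(ctx context.Context, uploadID, instanceID string, work func(context.Context) error) error {
	cli, err := clientv3.New(clientv3.Config{
		Endpoints:   []string{"localhost:2379"},
		DialTimeout: 5 * time.Second,
	})
	if err != nil {
		return err
	}
	defer cli.Close()

	// The session keeps a lease alive; if this instance dies, the election
	// key expires and another instance can take over.
	session, err := concurrency.NewSession(cli, concurrency.WithTTL(30))
	if err != nil {
		return err
	}
	defer session.Close()

	election := concurrency.NewElection(session, "/minio/background-append/"+uploadID)

	// Campaign blocks until this instance becomes leader. In a real design the
	// non-leaders would instead observe the leader and skip the work entirely.
	if err := election.Campaign(ctx, instanceID); err != nil {
		return err
	}
	defer election.Resign(context.Background())

	log.Printf("instance %s appends parts for upload %s", instanceID, uploadID)
	return work(ctx)
}
```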

Steps to Reproduce (for bugs)

The wasted disk space, at least, can also be observed when multiple MinIO instances serve the same local directory.

  1. Start multiple MinIO instances on different ports in gateway nas mode, e.g. minio gateway nas --address :9000 ./data.
  2. Use a load balancer, e.g. sidekick: ./sidekick --health-path "/minio/health/live" http://localhost:900{0...3}.
  3. Generate a big file to upload, e.g. truncate -s 350M test.txt
  4. Use mc to upload the test file.
  5. Take a look at .minio.sys/tmp.

Context

We're wasting disk space, and multipart uploads often take a long time to complete, depending on which MinIO instance receives the CompleteMultipartUpload request.

Regression

Is this issue a regression? No. This issue is also present in the official MinIO code, but they decided not to fix it: minio#13270

@rluetzner
Collaborator Author

Distributed locking might help us here. Something like this could be implemented using etcd.
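
A minimal sketch of such a lock, using etcd's client v3 concurrency package (the lock key layout and the idea of locking per upload ID are assumptions, not a settled design):

```go
// Sketch: a per-upload distributed lock via etcd, so only one instance runs
// the background append and creates a temp file. Key layout is an assumption.
package coordination

import (
	"context"
	"errors"
	"log"

	clientv3 "go.etcd.io/etcd/client/v3"
	"go.etcd.io/etcd/client/v3/concurrency"
)

// withUploadLock runs fn only if this instance acquires the per-upload lock;
// otherwise it returns immediately and leaves the work to the lock holder.
func withUploadLock(ctx context.Context, cli *clientv3.Client, uploadID string, fn func(context.Context) error) error {
	session, err := concurrency.NewSession(cli, concurrency.WithTTL(30))
	if err != nil {
		return err
	}
	defer session.Close()

	mu := concurrency.NewMutex(session, "/minio/locks/background-append/"+uploadID)

	// TryLock fails fast with ErrLocked when another instance already holds
	// the lock, so we neither duplicate its work nor its temp file.
	if err := mu.TryLock(ctx); err != nil {
		if errors.Is(err, concurrency.ErrLocked) {
			log.Printf("upload %s is already being appended elsewhere, skipping", uploadID)
			return nil
		}
		return err
	}
	defer mu.Unlock(context.Background())

	return fn(ctx)
}
```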

@rluetzner rluetzner linked a pull request Oct 18, 2021 that will close this issue
@rluetzner rluetzner removed a link to a pull request Oct 20, 2021
@rluetzner
Collaborator Author

Will also be solved by #19.

@rluetzner rluetzner mentioned this issue Apr 4, 2022
rluetzner pushed a commit that referenced this issue Apr 20, 2022
- Also simplify helper function to cleanup transitioned objects on expiry or when they are 'overwritten'
- Use expireTransitionedObjects in delete-object, multi-delete-objects and put-object API handlers
@rluetzner
Collaborator Author

#19 should improve the situation.

@rluetzner rluetzner reopened this Jun 1, 2022
@rluetzner
Collaborator Author

We decided to revert this change for now.

@rluetzner rluetzner mentioned this issue Jun 2, 2022