Large upload failed - allow custom options for "zfs send" #29

Open

jmohan007 opened this issue Dec 24, 2018 · 3 comments

@jmohan007

I have been using z3 for over a month successfully. Thank you! I have recently run into a problem and have a potential enhancement suggestion.

I have a large dataset that is stored with compression on. Recently, full uploads to S3 started failing; the error from S3 indicates the upload exceeded the 5T object limit (the stream is 6.3T). Two issues, plus a suggestion:

  1. Firstly, I expected pput to break this into chunks, but it did not seem to do that:
    zfs send 'pool/data@zfs-auto-snap_daily-2018-12-23-0832' | pput --quiet --estimated 6947160277512 --meta size=6947160277512 --meta isfull=true z3-backup/pool/data@zfs-auto-snap_daily-2018-12-23-0832

  2. Secondly, the dataset itself, in its compressed state, is ~2.6T. So I modified the snap.py code to add "-Lce" to "zfs send". This seems better; the upload is still running.
    zfs send -Lce 'pool/data@zfs-auto-snap_daily-2018-12-23-0832' | pput --quiet --estimated 2655377785352 --meta size=2655377785352 --meta isfull=true z3-backup/pool/data@zfs-auto-snap_daily-2018-12-23-0832

  3. It would be good to allow custom options to be passed to zfs send as part of z3 backup, e.g. --zfs-options "Lce" (a rough sketch of what this could look like follows below).
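
For what it's worth, a rough sketch of what that pass-through could look like in snap.py. The option handling and the build_send_command helper below are purely illustrative, not z3's actual code:

```python
import shlex
import subprocess

def build_send_command(snapshot, extra_options=""):
    # Splice user-supplied flags (e.g. "-Lce") into the zfs send invocation.
    return ["zfs", "send"] + shlex.split(extra_options) + [snapshot]

send = subprocess.Popen(
    build_send_command("pool/data@some-snap", "-Lce"),
    stdout=subprocess.PIPE)
# send.stdout would then be piped into pput, as z3 does today.
```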

@rciorba
Contributor

rciorba commented Dec 28, 2018

Just adding some thoughts since I'm no longer actively involved with this project.

> Firstly, I expected pput to break this into chunks, but it did not seem to do that:
> zfs send 'pool/data@zfs-auto-snap_daily-2018-12-23-0832' | pput --quiet --estimated 6947160277512 --meta size=6947160277512 --meta isfull=true z3-backup/pool/data@zfs-auto-snap_daily-2018-12-23-0832

Indeed, z3 doesn't split up files that exceed 5T. It chunks the stream for multipart upload, but stores the output of zfs send as a single object in S3. This is currently a limitation of the tool.
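
For context, here is what that looks like at the S3 API level. This is a minimal stand-alone sketch using boto3 (z3 has its own client code, and the bucket/key names are illustrative); the point is that all the uploaded parts get stitched back together server-side into a single object, which is where the 5T object limit bites:

```python
import sys
import boto3

s3 = boto3.client("s3")
bucket, key = "z3-backup", "pool/data@some-snap"  # illustrative names

upload = s3.create_multipart_upload(Bucket=bucket, Key=key)
parts = []
part_number = 1
while True:
    chunk = sys.stdin.buffer.read(64 * 1024 * 1024)  # 64 MiB per part
    if not chunk:
        break
    resp = s3.upload_part(Bucket=bucket, Key=key, PartNumber=part_number,
                          UploadId=upload["UploadId"], Body=chunk)
    parts.append({"ETag": resp["ETag"], "PartNumber": part_number})
    part_number += 1

# All parts become ONE object here -- at most 10,000 parts and 5 TiB total.
s3.complete_multipart_upload(Bucket=bucket, Key=key,
                             UploadId=upload["UploadId"],
                             MultipartUpload={"Parts": parts})
```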

> Secondly, the dataset itself, in its compressed state, is ~2.6T. So I modified the snap.py code to add "-Lce" to "zfs send". This seems better; the upload is still running.
> zfs send -Lce 'pool/data@zfs-auto-snap_daily-2018-12-23-0832' | pput --quiet --estimated 2655377785352 --meta size=2655377785352 --meta isfull=true z3-backup/pool/data@zfs-auto-snap_daily-2018-12-23-0832

Hmm. Have you tried compressing the stream with pigz? See --compressor.
Also, I'm unfamiliar with zfs send -c; I don't think the option existed when we first implemented z3, but maybe it should become one of the compressors (perhaps the default when it's built in).
The few references I found online suggest it adjusts compression levels based on line speed, so in your particular case you might want something that just gives you constant high compression.
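
Conceptually, --compressor just splices a filter between zfs send and pput. A stand-alone sketch of the same pipeline (snapshot name taken from the commands above):

```python
import subprocess

snapshot = "pool/data@zfs-auto-snap_daily-2018-12-23-0832"
send = subprocess.Popen(["zfs", "send", snapshot], stdout=subprocess.PIPE)
pigz = subprocess.Popen(["pigz", "-1"], stdin=send.stdout,
                        stdout=subprocess.PIPE)
send.stdout.close()  # let zfs send see SIGPIPE if pigz exits early
# pigz.stdout is the compressed stream that would be handed to pput.
```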

> It would be good to allow custom options to be passed to zfs send as part of z3 backup, e.g. --zfs-options "Lce"

This proposal requires some thought. My only concern is that options passed to zfs send may require equivalent options passed to zfs recv. Ideally one should be able to just type z3 restore snapshot-name. If these options require anything on the recv side, we need a way to add them to the snapshot metadata in S3 (as we already do for the compressor), so that the correct zfs recv invocation is used.
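
A rough sketch of that metadata idea; the zfs_send_options key and the flag table below are hypothetical, not anything z3 implements today:

```python
# At backup time: record the send-side flags next to the existing metadata.
send_flags = "-Lce"
upload_metadata = {
    "isfull": "true",
    "zfs_send_options": send_flags,  # hypothetical new key
}

# At restore time: map each send flag to whatever recv needs. For -L/-c/-e
# nothing extra is needed (recv consumes such streams transparently), but an
# explicit table keeps the send/recv contract visible.
RECV_FLAGS = {"L": [], "c": [], "e": []}

def recv_command(dataset, meta):
    flags = []
    for flag in meta.get("zfs_send_options", "").lstrip("-"):
        flags += RECV_FLAGS.get(flag, [])
    return ["zfs", "recv"] + flags + [dataset]
```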

@jmohan007
Author

Hello. Not sure how to comment inline (new to GitHub):

  1. 5T limit: Got it.
  2. I would rather not decompress (lz4) first and recompress (pigz) just to perhaps gain a bit more compression.
  3. "-Lce", and in particular -c, tells zfs send not to decompress. ZFS has many built-in compressors; most people recommend lz4 by default, with no performance penalty and roughly 2x space gain. With respect to z3, the assumption is that the receiving dataset will be similarly compressed. This is a good assumption, since z3 restore puts a snapshot back into the original dataset. Consequently, "zfs recv" does NOT require any additional parameters. Full disclosure: I have not yet verified this via z3; I have done it manually on the command line, and "zfs recv" did not need anything special. Looking at the z3 -dryrun commands, it should work. (A quick way to compare the raw vs. compressed stream estimates is sketched after this list.)

@jmohan007
Author

Hi,
I have implemented the changes required to support arbitrary-size uploads. Files greater than ~4T are broken into smaller parts (.part_nn) by pput.py, and get.py looks for these and assembles them back into one stream. Thanks to your nice code structure, the changes were fairly simple, even though I barely know Python. They seem to work fine. Would it be possible for you to review them? Thanks!
z3_5T.zip
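
For readers of the thread, a minimal sketch of the split/reassemble scheme described above. The .part_NN naming follows the comment; new_uploader and open_part are hypothetical stand-ins for z3's real upload/download plumbing, and the actual changes are in the attached zip:

```python
PART_LIMIT = 4 * 1024 ** 4  # ~4T per part, safely under S3's 5T object cap
CHUNK = 64 * 1024 * 1024    # read the zfs send stream 64 MiB at a time

def split_upload(key, stream, new_uploader):
    """Cut one input stream into key.part_01, key.part_02, ..."""
    part_no, uploader, written = 0, None, 0
    while True:
        chunk = stream.read(CHUNK)
        if not chunk:
            break
        if uploader is None or written + len(chunk) > PART_LIMIT:
            if uploader is not None:
                uploader.close()
            part_no += 1
            uploader = new_uploader("%s.part_%02d" % (key, part_no))
            written = 0
        uploader.write(chunk)
        written += len(chunk)
    if uploader is not None:
        uploader.close()

def reassemble(key, open_part, out):
    """Concatenate key.part_01, key.part_02, ... back into one stream."""
    part_no = 1
    while True:
        src = open_part("%s.part_%02d" % (key, part_no))  # None when done
        if src is None:
            break
        for chunk in iter(lambda: src.read(CHUNK), b""):
            out.write(chunk)
        part_no += 1
```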
