sync changed files to S3 based on checksum, if present #9074

Closed
2 tasks
tgpfeiffer opened this issue Nov 14, 2024 · 2 comments
Assignees: RyanFitzSimmonsAK
Labels: feature-request (A feature should be added or improved.), p2 (This is a standard priority issue), s3

Comments

tgpfeiffer commented Nov 14, 2024

Describe the feature

This is the same as #8377 and #7011: aws s3 sync should be able to detect files that need synchronization based on the checksum stored in S3, not size/timestamp.

Both issues were closed in favor of #6750, but #6750 only goes halfway: it uploads the checksum as metadata, but doesn't take it into account when computing the sync candidates.

Use Case

A prototypical use case looks like this:

  • generate files (e.g., documentation) from a source in CI
  • sync the output to S3.

Because the files are generated in CI, they always have the current timestamp, so all of them will be synced to S3, even though only a few of them may have changed. --size-only is not a viable alternative, as it skips changes that leave the file size the same, such as many typo fixes.
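To make the problem concrete, here is a minimal Python sketch. It is not aws-cli code: FileState and the three changed_by_* functions are hypothetical names, and the size/timestamp rule is a simplified stand-in for the default sync comparison. It shows that a regenerated but identical file looks changed under the timestamp rule, while a same-size typo fix looks unchanged under a size-only rule.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class FileState:
    """Hypothetical snapshot of a file: its bytes and last-modified time."""
    content: bytes
    mtime: float


def changed_by_size_and_time(local: FileState, remote: FileState) -> bool:
    # Simplified assumption: upload when sizes differ or the local copy is newer.
    return len(local.content) != len(remote.content) or local.mtime > remote.mtime


def changed_by_size_only(local: FileState, remote: FileState) -> bool:
    # Mirrors the idea behind --size-only: timestamps are ignored.
    return len(local.content) != len(remote.content)


def changed_by_checksum(local: FileState, remote: FileState) -> bool:
    # Content-based comparison: only the bytes matter.
    local_digest = hashlib.sha256(local.content).digest()
    remote_digest = hashlib.sha256(remote.content).digest()
    return local_digest != remote_digest


remote = FileState(b"teh quick brown fox\n", mtime=1000.0)
regenerated = FileState(b"teh quick brown fox\n", mtime=2000.0)  # same bytes, rebuilt in CI
typo_fixed = FileState(b"the quick brown fox\n", mtime=2000.0)   # same size, different bytes
```

In this sketch, only the checksum comparison gives the intended answer in both cases: it skips the regenerated file and picks up the typo fix.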

Proposed Solution

Add a flag --checksum-only or the like. If that flag is present, rather than retrieving the file timestamp and size from S3 and comparing them to the local values, retrieve the checksum that was stored during the previous upload using --checksum-algorithm and compare it to the locally computed value.
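The comparison step of this proposal could look roughly like the following Python sketch. It is an illustration under stated assumptions, not the aws-cli implementation: local_sha256_b64 and needs_upload are hypothetical names, the stored checksum is assumed to be the base64-encoded SHA-256 digest of the whole object, and fetching that value from S3 is left out. Objects uploaded in multiple parts may carry a composite checksum that does not equal the whole-file digest; the sketch ignores that case.

```python
import base64
import hashlib
import os
import tempfile
from pathlib import Path
from typing import Optional


def local_sha256_b64(path: Path, chunk_size: int = 1024 * 1024) -> str:
    """Return the base64-encoded SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return base64.b64encode(digest.digest()).decode("ascii")


def needs_upload(path: Path, remote_checksum_b64: Optional[str]) -> bool:
    """Decide whether to upload, based on the checksum alone.

    remote_checksum_b64 is the checksum stored with the object, or None
    if the object is missing or has no stored checksum (assumption: in
    that case the file is always uploaded).
    """
    if remote_checksum_b64 is None:
        return True
    return local_sha256_b64(path) != remote_checksum_b64


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "foo.txt"
        target.write_bytes(b"generated docs\n")
        stored = local_sha256_b64(target)      # stands in for the value kept in S3
        os.utime(target, None)                 # like `touch`: new timestamp, same bytes
        print(needs_upload(target, stored))    # the bytes did not change
        target.write_bytes(b"generated dogs\n")
        print(needs_upload(target, stored))    # same size, different bytes
```

Reading the file in chunks keeps memory use flat for large files. The trade-off against the size/timestamp comparison is that every local file has to be read in full on each sync.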

Other Information

Current behavior:

$ aws s3 sync --checksum-algorithm=SHA256 testdata s3://my-bucket
upload: testdata/bar.txt to s3://my-bucket/bar.txt     
upload: testdata/foo.txt to s3://my-bucket/foo.txt      
upload: testdata/hoge/yq_linux_amd64 to s3://my-bucket/hoge/yq_linux_amd64

$ touch testdata/hoge/yq_linux_amd64

$ aws s3 sync --checksum-algorithm=SHA256 testdata s3://my-bucket
upload: testdata/hoge/yq_linux_amd64 to s3://my-bucket/hoge/yq_linux_amd64

Desired behavior:

$ aws s3 sync --checksum-only --checksum-algorithm=SHA256 testdata s3://my-bucket
upload: testdata/bar.txt to s3://my-bucket/bar.txt     
upload: testdata/foo.txt to s3://my-bucket/foo.txt      
upload: testdata/hoge/yq_linux_amd64 to s3://my-bucket/hoge/yq_linux_amd64

$ touch testdata/hoge/yq_linux_amd64

$ aws s3 sync --checksum-only --checksum-algorithm=SHA256 testdata s3://my-bucket
(no output)

Acknowledgements

  • I may be able to implement this feature request
  • This feature might incur a breaking change

CLI version used

aws-cli/2.21.0

Environment details (OS name and version, etc.)

exe/x86_64.opensuse-tumbleweed.20241107

tgpfeiffer added the feature-request and needs-triage labels Nov 14, 2024
tgpfeiffer changed the title from "sync changed files based on checksum, if present" to "sync changed files to S3 based on checksum, if present" Nov 14, 2024
RyanFitzSimmonsAK self-assigned this Nov 22, 2024
RyanFitzSimmonsAK added the p2, s3, and needs-review labels and removed the needs-triage label Nov 22, 2024
RyanFitzSimmonsAK removed the needs-review label Feb 24, 2025
RyanFitzSimmonsAK (Contributor) commented

Hi @tgpfeiffer, thanks for reaching out and for your patience. This feature has been requested many times, and it's definitely one we see value in. #599 is the tracking issue for s3 sync issues, including checksum comparison. I'll be closing this as a duplicate, and we'll be continuing to track this in the other issue. Thank you for this feature request.

This issue is now closed. Comments on closed issues are hard for our team to see.
If you need more assistance, please open a new issue that references this one.

2 participants