[v2] S3 high level checksums #8933

Open
wants to merge 19 commits into
base: v2

Contributor

@aemous aemous commented Sep 20, 2024

Issue #, if available:

Low-level aws s3api commands support checksums other than MD5 for verifying end-to-end data integrity. This pull request ports over similar functionality to high-level aws s3 commands.

Description of changes:

  • Added --checksum-algorithm flag for file uploads using aws s3 cp, aws s3 sync, and aws s3 mv.
    • Supported algorithms are CRC32, SHA256, SHA1, CRC32C.
  • Added --checksum-mode flag for file downloads using aws s3 cp, aws s3 sync, and aws s3 mv.
  • Added unit and functional tests to assert the correctness of these changes.
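For readers unfamiliar with how these checksum values look, here is a minimal sketch that computes a base64-encoded checksum for three of the listed algorithms using only the Python standard library. The helper name `s3_style_checksum` is hypothetical and is not part of this PR; the sketch assumes the CRC32 value is encoded as four big-endian bytes before base64 encoding, and it omits CRC32C because the standard library has no implementation of it.

```python
import base64
import hashlib
import zlib


def s3_style_checksum(data: bytes, algorithm: str) -> str:
    """Return a base64-encoded checksum of `data` (hypothetical helper)."""
    algorithm = algorithm.upper()
    if algorithm == "CRC32":
        # zlib.crc32 returns an unsigned 32-bit int; assume big-endian bytes.
        raw = zlib.crc32(data).to_bytes(4, "big")
    elif algorithm == "SHA1":
        raw = hashlib.sha1(data).digest()
    elif algorithm == "SHA256":
        raw = hashlib.sha256(data).digest()
    else:
        # CRC32C is not in the standard library, so this sketch skips it.
        raise ValueError(f"unsupported in this sketch: {algorithm}")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    for name in ("CRC32", "SHA1", "SHA256"):
        print(name, s3_style_checksum(b"hello world", name))
```

This is only an illustration of the value format; the CLI itself delegates checksum calculation to its transfer and client layers.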

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@aemous aemous changed the base branch from develop to v2 September 20, 2024 15:31
Contributor

@hssyoo hssyoo left a comment

Looks like a solid start.

My main feedback is that --checksum-algorithm should also be supported for copies (i.e. S3 to S3). I can see strong use cases for copying object checksums between different S3 paths, or even for generating different checksums. To support this, we'll need to port this PR to the vendored s3transfer package. Let's make that port a separate PR to reduce noise in this one.

We should also prefer f-strings for string formatting in all new code, i.e. f"var: {my_var}" instead of "var: %s" % my_var.
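A small side-by-side sketch of the two formatting styles mentioned above. The variable name and value are illustrative only and do not come from the PR.

```python
my_var = "CRC32"  # illustrative value

# Older %-style formatting.
old_style = "var: %s" % my_var

# Preferred f-string formatting for new code.
new_style = f"var: {my_var}"

# Both produce the same string; the f-string keeps the value inline.
assert old_style == new_style
print(new_style)
```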

awscli/customizations/s3/utils.py (outdated)
}

CHECKSUM_ALGORITHM = {
'name': 'checksum-algorithm', 'choices': ['CRC32', 'SHA256', 'SHA1', 'CRC32C'],
Contributor

non-blocking: We should consider retrieving the choices from the service model instead of hardcoding the values here.

Contributor Author

Thanks for the feedback; could be worth looking into in a future PR.
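One way the non-blocking suggestion in this thread might look is sketched below. It is an illustration, not the CLI's code: `EXAMPLE_MODEL` is a trimmed, made-up stand-in for a service model document, and `enum_choices` is a hypothetical helper. The sketch assumes the model describes the checksum algorithm as a string shape with an `enum` list.

```python
# Trimmed, illustrative stand-in for a service model document.
# The real model is much larger; this fragment is an assumption.
EXAMPLE_MODEL = {
    "shapes": {
        "ChecksumAlgorithm": {
            "type": "string",
            "enum": ["CRC32", "CRC32C", "SHA1", "SHA256"],
        }
    }
}


def enum_choices(model: dict, shape_name: str) -> list:
    """Return the enum values of a shape, or an empty list if absent."""
    shape = model.get("shapes", {}).get(shape_name, {})
    return list(shape.get("enum", []))


# Build the argument definition from the model instead of a literal list.
CHECKSUM_ALGORITHM = {
    "name": "checksum-algorithm",
    "choices": enum_choices(EXAMPLE_MODEL, "ChecksumAlgorithm"),
}

print(CHECKSUM_ALGORITHM["choices"])
```

Deriving the list this way would let new algorithms appear as choices whenever the model gains them, at the cost of coupling the argument table to model loading.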

@aemous aemous force-pushed the s3-high-level-checksums branch 2 times, most recently from 90dfb1f to fcc9b99 Compare September 24, 2024 16:44
@aemous aemous requested a review from hssyoo September 24, 2024 16:46
@hssyoo hssyoo changed the title S3 high level checksums [v2] S3 high level checksums Sep 27, 2024
Contributor

@hssyoo hssyoo left a comment

Playing around with it some more, I'm not sure --checksum-mode ENABLED provides anything useful without the debug flag:

aws s3 cp s3://hssyoo-brawn-test/README.rst ./ --checksum-mode ENABLED

download: s3://hssyoo-brawn-test/README.rst to ./README.rst

Compare this to the low-level get-object:

aws s3api get-object --bucket hssyoo-brawn-test --key README.rst --checksum-mode ENABLED ./README.rst

{
    ...
    "ChecksumCRC32C": "ac1Isw==",
    ...
}
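To make the value in that response concrete, this short sketch decodes the reported ChecksumCRC32C string. It assumes the value is a base64 encoding of the raw checksum bytes, which is consistent with it decoding to exactly four bytes, and it interprets those bytes as a big-endian integer (also an assumption).

```python
import base64

# ChecksumCRC32C value copied from the get-object output above.
reported = "ac1Isw=="

# Decode the base64 text into raw checksum bytes.
raw = base64.b64decode(reported)

# Interpret the bytes as an unsigned big-endian integer (assumed byte order).
value = int.from_bytes(raw, "big")

print(len(raw), raw.hex(), value)
```

A user who wanted to validate a download by hand would compute the same checksum locally and compare it against this decoded value.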

With the current changes, the --checksum-mode parameter in high-level S3 commands:

  • Doesn't do client-side validation of the object
  • Doesn't output the checksum the user could use to validate the object

We should decide what behavior we want to support.

EDIT: Sorry, botocore already performs this validation automatically. Please ignore the above.
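For context on what client-side validation means here, the following sketch compares a local file against an expected base64 checksum by reading it in chunks. It is not botocore's implementation: the function name `file_matches_crc32` is hypothetical, and it uses CRC32 rather than CRC32C because only CRC32 is available in the Python standard library.

```python
import base64
import zlib


def file_matches_crc32(path, expected_b64, chunk_size=1024 * 1024):
    """Return True if the file's CRC32 matches the expected base64 value."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            # Feed the running value back in so large files stream in pieces.
            crc = zlib.crc32(chunk, crc)
    # Assumed encoding: four big-endian bytes, then base64.
    actual_b64 = base64.b64encode(crc.to_bytes(4, "big")).decode("ascii")
    return actual_b64 == expected_b64
```

Streaming the file keeps memory use flat regardless of object size, which matters for the large transfers the high-level commands are typically used for.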
