Correctly add compression extensions to the generated S3 sink keys #3196

Merged

dlvenable
Member

@dlvenable commented Aug 18, 2023

Description

Adds the compression extension to the object keys generated by the S3 sink. The extension is not added when the codec performs its own internal compression.
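The rule described above can be sketched in Java. This is a minimal, hypothetical sketch: `Codec`, `Compression`, `keyExtension`, the example codec names, and the `gz`/`snappy` extension strings are illustrative assumptions, not the Data Prepper API or its exact behavior.

```java
import java.util.Optional;

/**
 * Hypothetical sketch of choosing the extension for an S3 object key.
 * Codec, Compression, and keyExtension are illustrative names, not Data Prepper's API.
 */
public class KeyExtensionSketch {

    // Assumed compression options and the file extensions they map to.
    enum Compression {
        NONE(null), GZIP("gz"), SNAPPY("snappy");

        private final String extension;

        Compression(String extension) {
            this.extension = extension;
        }

        Optional<String> extension() {
            return Optional.ofNullable(extension);
        }
    }

    // Assumed codec description: its file extension and whether it compresses internally.
    record Codec(String extension, boolean compressionInternal) { }

    static String keyExtension(Codec codec, Compression compression) {
        // Internally compressed formats keep only the codec's own extension.
        if (codec.compressionInternal()) {
            return codec.extension();
        }
        // Otherwise append the compression extension when there is one, e.g. "json" -> "json.gz".
        return compression.extension()
                .map(ext -> codec.extension() + "." + ext)
                .orElse(codec.extension());
    }

    public static void main(String[] args) {
        System.out.println(keyExtension(new Codec("json", false), Compression.GZIP));   // json.gz
        System.out.println(keyExtension(new Codec("avro", false), Compression.NONE));   // avro
        System.out.println(keyExtension(new Codec("parquet", true), Compression.GZIP)); // parquet
    }
}
```

Checking the internal-compression case first means a configured GZip option never turns a Parquet key into `.parquet.gz`.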

Issues Resolved

Resolves #3158

Check List

  • New functionality includes testing.
  • New functionality has a documentation issue. Please link to it in this PR.
    • New functionality has javadoc added
  • Commits are signed with a real name per the DCO

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

…f compression is internal, does not utilize. Resolves opensearch-project#3158.

Signed-off-by: David Venable <[email protected]>
@dlvenable changed the title from "Correctly add compression extensions to the generated S3 sink keys. I…" to "Correctly add compression extensions to the generated S3 sink keys" on Aug 18, 2023
Review thread on the following lines of the change:

    return new StandardExtensionProvider(codecExtension);
    }

    String extension = compressionOption.getExtension()
Collaborator

Will the configured compression extension be ignored if the codec uses internal compression?

Member Author

Yes, it will be ignored.

With internal compression, the file as a whole is not compressed; instead, different parts of it are compressed independently. A user should not try to run such a file through GZip or Snappy decompression.

With Parquet specifically, the compression is stored as metadata inside the file, and compressed files keep the .parquet extension.
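To illustrate why whole-file decompression does not apply, here is a minimal Java sketch of a hypothetical container loosely modeled on Parquet's leading `PAR1` magic bytes: an uncompressed header followed by independently GZip-compressed blocks. The layout and the names `gzip`, `gunzip`, and `wholeFileIsGzip` are assumptions for illustration only; real Parquet compresses individual pages with its configured codec and records that codec in the file's metadata.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class InternalCompressionSketch {

    // Uncompressed magic header at the start of the hypothetical file.
    static final byte[] MAGIC = "PAR1".getBytes(StandardCharsets.US_ASCII);

    // Compresses a single block on its own, as an internally compressed format would.
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        }
        return out.toByteArray();
    }

    // Decompresses bytes that are expected to form a GZip stream.
    static byte[] gunzip(byte[] data) throws IOException {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        }
    }

    // Returns true only if the entire file decodes as GZip.
    static boolean wholeFileIsGzip(byte[] file) {
        try {
            gunzip(file);
            return true;
        } catch (IOException e) { // e.g. ZipException: Not in GZIP format
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] block1 = gzip("block 1".getBytes(StandardCharsets.UTF_8));
        byte[] block2 = gzip("block 2".getBytes(StandardCharsets.UTF_8));

        // Hypothetical layout (not the real Parquet format): header, then compressed blocks.
        ByteArrayOutputStream file = new ByteArrayOutputStream();
        file.write(MAGIC);
        file.write(block1);
        file.write(block2);
        byte[] bytes = file.toByteArray();

        // Running the whole file through GZip decompression fails at the header.
        System.out.println(wholeFileIsGzip(bytes)); // false

        // Each block decompresses once located; real formats find offsets via metadata.
        int start = MAGIC.length;
        byte[] first = Arrays.copyOfRange(bytes, start, start + block1.length);
        System.out.println(new String(gunzip(first), StandardCharsets.UTF_8)); // block 1
    }
}
```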

Member Author

This test verifies that exact behavior: getExtension_returns_extension_of_codec_when_compression_internal.

Collaborator

Thanks for the clarity

@dlvenable merged commit 1368a21 into opensearch-project:main on Aug 21, 2023
24 checks passed
@dlvenable deleted the 3158-compression-extension branch on October 6, 2023 18:10
Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues.
[BUG] S3 Sink Avro Output issues
3 participants