[Enhancement] - Enable Unloading Redshift Tables to S3 in either JSON, PARQUET or CSV format #1052
… json, parquet (for example we cant use ADDQUOTES with csv)
…s-redshift-addition
Thanks so much @NirTatcher! This looks good to me but I'd love it if someone with more Redshift knowledge gave it a look too. Maybe @austinweisgrau or @Jason94?
Sure thing @shaunagm!
@austinweisgrau once you or anyone else gets to it, I will be glad to get feedback on this so we can start letting people unload tables into S3 buckets in formats other than TXT.
This looks good to me, let's wait and see if there's a bit more availability to review right now.
statement += f"EXTENSION '{extension}' \n"
if aws_region:
    statement += f"REGION {aws_region} \n"
statement += "ALLOWOVERWRITE \n" if allow_overwrite else ""
So I don't know the underlying tooling at all, but reading this the only thing that concerns me is that the ordering of the string created by the new version is different. Is order relatively flexible here?
Added a `format` parameter to the function that unloads from Redshift to S3. The format can be CSV, JSON, or PARQUET. PARQUET is highly recommended for large tables: the files are smaller than CSVs and can be read and written much faster. Currently, tables can only be unloaded as TXT files, which is the default.

Restrictions per format:
- CSV cannot be used with `ADDQUOTES`.
- PARQUET cannot be used with the `DELIMITER`, `ADDQUOTES`, `ESCAPE`, `NULL AS`, `HEADER`, or `GZIP` (compression) options.
- JSON cannot be used with the `DELIMITER`, `HEADER`, `ADDQUOTES`, `ESCAPE`, or `NULL AS` options.

I hope this will be useful to others besides me and that it passes all of the live tests. @shaunagm please let me know if there are any changes that need to be made to get this implemented.
Thank you!
More on PARQUET here.
More on UNLOAD here.
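
To illustrate the per-format restrictions in this description, here is a minimal sketch of how conflicting options could be caught before the `FORMAT` clause is appended to the statement. It is not the code from this PR: the names `INCOMPATIBLE_OPTIONS`, `check_unload_options`, and `format_clause`, as well as the keyword argument names, are hypothetical, and the mapping of option lists to formats is my reading of this description against the Redshift UNLOAD documentation.

```python
# Hypothetical helper, not the implementation in this PR.
# Option names are assumed keyword arguments; the conflicts are the ones
# listed in the PR description.
INCOMPATIBLE_OPTIONS = {
    "csv": {"add_quotes"},
    "parquet": {"delimiter", "add_quotes", "escape", "null_as", "header", "gzip"},
    "json": {"delimiter", "header", "add_quotes", "escape", "null_as"},
}


def check_unload_options(file_format, **options):
    """Return the sorted names of enabled options that conflict with file_format."""
    if file_format is None:
        # No explicit format: nothing to check here.
        return []
    key = file_format.lower()
    if key not in INCOMPATIBLE_OPTIONS:
        raise ValueError(f"Unsupported format: {file_format!r}")
    # An option counts as enabled when its value is truthy (True, ",", "NULL", ...).
    enabled = {name for name, value in options.items() if value}
    return sorted(enabled & INCOMPATIBLE_OPTIONS[key])


def format_clause(file_format, **options):
    """Build the FORMAT fragment of the statement, refusing conflicting options."""
    conflicts = check_unload_options(file_format, **options)
    if conflicts:
        raise ValueError(
            f"{file_format.upper()} cannot be combined with: {', '.join(conflicts)}"
        )
    if file_format is None:
        return ""
    return f"FORMAT AS {file_format.upper()} \n"
```

With this sketch, `format_clause("parquet")` yields a fragment that can be concatenated onto the statement in the same way as the other clauses, while `format_clause("csv", add_quotes=True)` raises a `ValueError` instead of sending an invalid statement to Redshift.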