feat: add support for S3 Extra arguments in Seed file upload to s3 #397

Merged 12 commits on Sep 11, 2023
3 changes: 3 additions & 0 deletions README.md
@@ -88,6 +88,7 @@ A dbt profile can be configured to run against AWS Athena using the following co
| aws_profile_name | Profile to use from your AWS shared credentials file. | Optional | `my-profile` |
| work_group | Identifier of Athena workgroup | Optional | `my-custom-workgroup` |
| num_retries | Number of times to retry a failing query | Optional | `3` |
+ | seed_s3_upload_args | Dictionary containing boto3 ExtraArgs when uploading to S3 | Optional | `{"ACL": "bucket-owner-full-control"}` |
| lf_tags_database | Default LF tags for new database if it's created by dbt | Optional | `tag_key: tag_value` |

**Example profiles.yml entry:**
@@ -105,6 +106,8 @@ athena:
database: awsdatacatalog
aws_profile_name: my-profile
work_group: my-workgroup
+ seed_s3_upload_args:
+   ACL: bucket-owner-full-control
```

_Additional information_
2 changes: 2 additions & 0 deletions dbt/adapters/athena/connections.py
@@ -59,6 +59,7 @@ class AthenaCredentials(Credentials):
num_retries: Optional[int] = 5
s3_data_dir: Optional[str] = None
s3_data_naming: Optional[str] = "schema_table_unique"
+ seed_s3_upload_args: Optional[Dict[str, Any]] = None
# Unfortunately we cannot just use dict; it must be Dict, because otherwise we get the following error:
# Credentials in profile "athena", target "athena" invalid: Unable to create schema for 'dict'
lf_tags_database: Optional[Dict[str, str]] = None
@@ -86,6 +87,7 @@ def _connection_keys(self) -> Tuple[str, ...]:
"s3_data_dir",
"s3_data_naming",
"debug_query_state",
+ "seed_s3_upload_args",
"lf_tags_database",
)
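The credentials change above can be illustrated with a minimal sketch. `SketchCredentials` is a hypothetical stand-in for `AthenaCredentials` (it does not inherit from dbt's `Credentials` and keeps only two fields); it shows that the new field defaults to `None` and accepts any dictionary of upload arguments.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class SketchCredentials:
    """Hypothetical stand-in for AthenaCredentials; only the relevant fields."""

    num_retries: Optional[int] = 5
    # None means "no extra arguments"; keys are boto3 ExtraArgs names such as "ACL".
    seed_s3_upload_args: Optional[Dict[str, Any]] = None

    def _connection_keys(self) -> Tuple[str, ...]:
        # The new key is listed alongside the existing ones, as in the diff.
        return ("num_retries", "seed_s3_upload_args")


creds = SketchCredentials(seed_s3_upload_args={"ACL": "bucket-owner-full-control"})
default_creds = SketchCredentials()
```

Profiles that omit `seed_s3_upload_args` keep the `None` default, so existing configurations should be unaffected.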

3 changes: 2 additions & 1 deletion dbt/adapters/athena/impl.py
@@ -314,6 +314,7 @@ def upload_seed_to_s3(
s3_data_dir: Optional[str] = None,
s3_data_naming: Optional[str] = None,
external_location: Optional[str] = None,
+ seed_s3_upload_args: Optional[Dict[str, Any]] = None,
) -> str:
conn = self.connections.get_thread_connection()
client = conn.handle
@@ -332,7 +333,7 @@ def upload_seed_to_s3(
# This ensures cross-platform support, tempfile.NamedTemporaryFile does not
tmpfile = os.path.join(tempfile.gettempdir(), os.urandom(24).hex())
table.to_csv(tmpfile, quoting=csv.QUOTE_NONNUMERIC)
- s3_client.upload_file(tmpfile, bucket, object_name)
+ s3_client.upload_file(tmpfile, bucket, object_name, ExtraArgs=seed_s3_upload_args)
os.remove(tmpfile)

return str(s3_location)
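A rough sketch of how the extra arguments reach the upload call, under stated assumptions: `RecordingS3Client` is a hypothetical stub that only mimics the parameter names of boto3's `upload_file` (`Filename`, `Bucket`, `Key`, `ExtraArgs`), and `upload_rows` is an illustrative helper, not the adapter's `upload_seed_to_s3`. The `try`/`finally` cleanup is an addition of this sketch and is not part of the diff.

```python
import csv
import os
import tempfile
from typing import Any, Dict, List, Optional, Sequence, Tuple


class RecordingS3Client:
    """Hypothetical stub; records calls instead of talking to S3."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, str, Optional[Dict[str, Any]]]] = []

    def upload_file(self, Filename: str, Bucket: str, Key: str,
                    ExtraArgs: Optional[Dict[str, Any]] = None) -> None:
        self.calls.append((Bucket, Key, ExtraArgs))


def upload_rows(
    s3_client: Any,
    rows: Sequence[Sequence[Any]],
    bucket: str,
    object_name: str,
    seed_s3_upload_args: Optional[Dict[str, Any]] = None,
) -> str:
    # Same temp-file naming approach as the diff: a random name in the temp dir.
    tmpfile = os.path.join(tempfile.gettempdir(), os.urandom(24).hex())
    try:
        with open(tmpfile, "w", newline="") as handle:
            csv.writer(handle, quoting=csv.QUOTE_NONNUMERIC).writerows(rows)
        # The extra arguments are forwarded unchanged; None means "no extras".
        s3_client.upload_file(tmpfile, bucket, object_name, ExtraArgs=seed_s3_upload_args)
    finally:
        # Sketch-only addition: remove the temp file even if the upload raises.
        if os.path.exists(tmpfile):
            os.remove(tmpfile)
    return f"s3://{bucket}/{object_name}"


client = RecordingS3Client()
location = upload_rows(
    client,
    [["id", "name"], [1, "a"]],
    "my-bucket",
    "seeds/example.csv",
    {"ACL": "bucket-owner-full-control"},
)
```

Passing `None` keeps the previous behaviour, since `ExtraArgs` already defaults to `None` in boto3's `upload_file`.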
@@ -96,6 +96,7 @@
{%- set s3_data_dir = config.get('s3_data_dir', default=target.s3_data_dir) -%}
{%- set s3_data_naming = config.get('s3_data_naming', target.s3_data_naming) -%}
{%- set external_location = config.get('external_location', default=none) -%}
+ {%- set seed_s3_upload_args = config.get('seed_s3_upload_args', default=target.seed_s3_upload_args) -%}

{%- set tmp_relation = api.Relation.create(
identifier=identifier + "__dbt_tmp",
@@ -110,6 +111,7 @@
s3_data_dir,
s3_data_naming,
external_location,
+ seed_s3_upload_args=seed_s3_upload_args
) -%}

-- create target relation
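The lookup order in the macro (seed-level config first, then the profile target) can be sketched in plain Python. This assumes `config.get(key, default=...)` behaves like `dict.get`, falling back to the default only when the key is not set; `resolve_seed_s3_upload_args` and the dictionaries are hypothetical names used for illustration.

```python
from typing import Any, Dict, Optional


def resolve_seed_s3_upload_args(
    model_config: Dict[str, Any], target: Dict[str, Any]
) -> Optional[Dict[str, Any]]:
    # Seed-level config wins; the profile target supplies the fallback.
    return model_config.get("seed_s3_upload_args", target.get("seed_s3_upload_args"))


target = {"seed_s3_upload_args": {"ACL": "bucket-owner-full-control"}}

from_profile = resolve_seed_s3_upload_args({}, target)
from_seed = resolve_seed_s3_upload_args(
    {"seed_s3_upload_args": {"ServerSideEncryption": "aws:kms"}}, target
)
unset = resolve_seed_s3_upload_args({}, {})
```

Under this assumption the seed-level value replaces the profile value entirely; the two dictionaries are not merged.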
3 changes: 3 additions & 0 deletions dbt/include/athena/profile_template.yml
@@ -17,6 +17,9 @@ prompts:
hint: Specify the database (Data catalog) to build models into (lowercase only)
default: awsdatacatalog

+ seed_s3_upload_args:
+   hint: Specify any extra arguments to use in the S3 Upload, e.g. ACL, SSEKMSKeyId

threads:
hint: '1 or more'
type: 'int'
1 change: 1 addition & 0 deletions dev-requirements.txt
@@ -1,5 +1,6 @@
autoflake~=1.7
black~=23.9
+ boto3-stubs[s3]~=1.28
dbt-tests-adapter~=1.6.2
flake8~=6.1
Flake8-pyproject~=1.2