Support for table sharing when a catalog account is being used #904
Labels
effort: medium
priority: high
status: in-progress
This issue has been picked and is being implemented
Comments
Discussion happening directly in PR #905
dlpzx added the status: in-progress, priority: high, and effort: medium labels on Jan 3, 2024
TejasRGitHub added a commit to TejasRGitHub/aws-dataall that referenced this issue on Jan 30, 2024
TejasRGitHub added a commit to TejasRGitHub/aws-dataall that referenced this issue on Jan 30, 2024
TejasRGitHub pushed a commit to TejasRGitHub/aws-dataall that referenced this issue on Feb 15, 2024
# Conflicts:
#	backend/dataall/modules/dataset_sharing/aws/glue_client.py
#	backend/dataall/modules/dataset_sharing/services/data_sharing_service.py
#	backend/dataall/modules/dataset_sharing/services/share_managers/lf_share_manager.py
#	backend/dataall/modules/dataset_sharing/services/share_processors/lf_process_cross_account_share.py
#	tests/modules/datasets/tasks/test_lf_share_manager.py
TejasRGitHub pushed a commit to TejasRGitHub/aws-dataall that referenced this issue on Feb 21, 2024
# Conflicts:
#	backend/dataall/modules/dataset_sharing/services/share_processors/lakeformation_process_share.py
noah-paige pushed a commit that referenced this issue on Feb 23, 2024
### Feature or Bugfix
- Feature

### Detail
PR containing all the code raised in PR #905 + unit tests + addressing comments raised on that PR.

Copy-pasting details from that PR:
- Detect if the source database is a resource link
- If it is a resource link, check that the catalog account has been onboarded to data.all
- Check for the presence of the owner_account_id tag on the database
- The tag needs to exist and the value has to match the account id of the share approver

Credits - @blitzmohit

## Testing
- Running unit tests - ✅
- Testing on an AWS-deployed data.all instance with the original PR - ✅
- Sanity testing after addressing comments - **[EDIT]** ✅ (testing done)

### Relates
- #904

### Security
Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? No
  - Is the input sanitized?
  - What precautions are you taking before deserializing the data you consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires authorization? No
  - How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features? No
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored?
- Are you introducing any new policies/roles/users? Yes
  - Have you used the least-privilege principle? How? Yes

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

---------

Co-authored-by: trajopadhye <[email protected]>
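The checks listed in the commit message above can be sketched roughly as follows. This is only an illustration, not the code merged in #1021: the function name `resolve_catalog_database`, its parameters, and the returned dictionary are hypothetical. It assumes the `database` argument is the `Database` dictionary returned by the AWS Glue `GetDatabase` API (where a resource link carries a `TargetDatabase` block with `CatalogId` and `DatabaseName`), and that `tags` is a plain key/value dictionary of the tags on that database. Which database actually carries the `owner_account_id` tag is an assumption here.

```python
from typing import Optional

# Tag name taken from the commit message; everything else below is hypothetical.
OWNER_TAG_KEY = "owner_account_id"


def resolve_catalog_database(
    database: dict,
    tags: dict,
    onboarded_account_ids: set,
    approver_account_id: str,
) -> Optional[dict]:
    """Return the catalog account id and database name behind a resource link.

    Returns None when the database is not a resource link, so the caller can
    fall back to the regular sharing path. Raises ValueError when one of the
    checks described in the commit message fails.
    """
    # 1. Detect whether the source database is a resource link.
    target = database.get("TargetDatabase")
    if not target:
        return None

    # 2. The catalog account must have been onboarded to data.all.
    catalog_account_id = target["CatalogId"]
    if catalog_account_id not in onboarded_account_ids:
        raise ValueError(
            f"Catalog account {catalog_account_id} is not onboarded to data.all"
        )

    # 3. The owner_account_id tag must exist on the database...
    owner = tags.get(OWNER_TAG_KEY)
    if owner is None:
        raise ValueError(f"Tag {OWNER_TAG_KEY} is missing on the database")

    # 4. ...and its value must match the share approver's account id.
    if owner != approver_account_id:
        raise ValueError(
            f"Tag {OWNER_TAG_KEY}={owner} does not match approver {approver_account_id}"
        )

    return {"account_id": catalog_account_id, "database_name": target["DatabaseName"]}
```

Keeping the function free of AWS calls means the Glue response and the tags can be fetched elsewhere and the validation itself can be unit-tested with plain dictionaries.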
Closing this issue, as completed in #1021
noah-paige pushed a commit that referenced this issue on Mar 4, 2024
### Feature or Bugfix
- Bugfix

### Detail
When using a worksheet with a share made with a catalog account (by following the steps described in PR #1021), the worksheet drop-down list doesn't display the correct DB name. This is because the DB name is picked from the producer account (where the S3 bucket is present and where the actual DB is not present), which has the resource-linked DB. As a result, the autogenerated query doesn't work. Please refer to the screenshot:

<img width="1482" alt="image" src="https://github.com/data-dot-all/dataall/assets/71188245/fbc28286-0ca7-47de-a6ae-3020b1188dcb">

Also, on the share view, the DB name mentioned in the query (in the "Data Consumption details") is the resource-linked DB name and not the correct DB name.

### Relates
- #904

### Security
Please answer the questions below briefly where applicable, or write `N/A`. Based on [OWASP 10](https://owasp.org/Top10/en/).

- Does this PR introduce or modify any input fields or queries - this includes fetching data from storage outside the application (e.g. a database, an S3 bucket)? No
  - Is the input sanitized?
  - What precautions are you taking before deserializing the data you consume?
  - Is injection prevented by parametrizing queries?
  - Have you ensured no `eval` or similar functions are used?
- Does this PR introduce any functionality or component that requires authorization? No
  - How have you ensured it respects the existing AuthN/AuthZ mechanisms?
  - Are you logging failed auth attempts?
- Are you using or adding any cryptographic features? No
  - Do you use a standard proven implementations?
  - Are the used keys controlled by the customer? Where are they stored? No
- Are you introducing any new policies/roles/users?
  - Have you used the least-privilege principle? How?

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

---------

Co-authored-by: trajopadhye <[email protected]>
Is your feature request related to a problem? Please describe.
In certain data mesh architectures such as the ones described in https://aws.amazon.com/blogs/big-data/design-a-data-mesh-architecture-using-aws-lake-formation-and-aws-glue/ and https://aws.amazon.com/blogs/big-data/how-jpmorgan-chase-built-a-data-mesh-architecture-to-drive-significant-value-to-enhance-their-enterprise-data-platform/
a catalog account owns the Glue Database & Tables instead of the producer.
Currently data.all does not account for or support sharing of tables using a catalog account.
If a dataset is imported using a database that was shared with the producer from a catalog account (i.e. a resource link), the import works fine. However, any attempt to share a table in such a dataset outside the producer account fails, because Lake Formation does not allow resharing of databases or tables.
Describe the solution you'd like
Proposed solution is as follows:
Additional context
In terms of support, the catalog could be an additional high-level object in data.all that could power additional use cases.