Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(ingestion/powerbi): support multiple tables as upstream in native SQL parsing #8592

Conversation

siddiquebagwan-gslab
Copy link
Contributor

No description provided.

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Aug 8, 2023

parsed_result: Optional["SqlParsingResult"] = None
try:
schema_resolver = (
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some of this logic looks like it was copy-pasted from elsewhere

can we instead extract it into a common helper method?

Copy link
Contributor Author

@siddiquebagwan-gslab siddiquebagwan-gslab Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a function create_lineage_sql_parsed_result in sqlglot_lineage.py

@@ -352,7 +460,11 @@ def resolve_to_data_platform_table_list(self) -> List[DataPlatformTable]:
continue

table_full_name_creator: AbstractDataPlatformTableCreator = (
supported_resolver.get_table_full_name_creator()()
supported_resolver.get_table_full_name_creator()(
ctx=ctx,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

eventually, will it be possible to get rid of this AbstractDataPlatformTableCreator / move some of that logic into the sqlglot_lineage parser itself?

Overall I found it pretty tricky to follow this code and understand exactly what it was doing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AbstractDataPlatformTableCreator is an abstraction over M-Query parsing and different classes are extending it to support special construct of M-Query, like RedShift, Snowflake, Postgres, MSSQL and some of them has native query inside M-Query code. In current code SNOWFLAKE and AMAZON_REDSHIFT supports M-Query way to create transformation as well as custom sql to create transformation inside M-Query. Both are using AbstractDataPlatformTableCreator.parse_custom_sql to generate lineage if during M-Query parsing they found custom query inside M-Query

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some comments to this file about how it is structured at a high level and why it was done that way

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -825,3 +825,43 @@ def sqlglot_lineage(
table_error=e,
),
)


def create_lineage_sql_parsed_result(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please also refactor tableau to use this method

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -352,7 +460,11 @@ def resolve_to_data_platform_table_list(self) -> List[DataPlatformTable]:
continue

table_full_name_creator: AbstractDataPlatformTableCreator = (
supported_resolver.get_table_full_name_creator()()
supported_resolver.get_table_full_name_creator()(
ctx=ctx,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please add some comments to this file about how it is structured at a high level and why it was done that way

@hsheth2 hsheth2 added the merge-pending-ci A PR that has passed review and should be merged once CI is green. label Aug 22, 2023
@anshbansal
Copy link
Collaborator

Failure not related to ingestion code so merging

@anshbansal anshbansal merged commit 8ee58af into datahub-project:master Aug 23, 2023
49 of 50 checks passed
asikowitz pushed a commit to asikowitz/datahub that referenced this pull request Aug 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ingestion PR or Issue related to the ingestion of metadata merge-pending-ci A PR that has passed review and should be merged once CI is green.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants