feat(ingest): add output schema inference for sql parser #8989

hsheth2 · 2023-10-11T17:50:06Z

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

asikowitz

I'm not following how some of these methods get called. I just want to make sure there are no performance concerns here, since I know SQL parsing can be kind of slow as is.

asikowitz · 2023-10-11T18:07:56Z

metadata-ingestion/src/datahub/utilities/sqlglot_lineage.py

+    arbitrary_types_allowed=True,
+    json_encoders={
+        SchemaFieldDataTypeClass: lambda v: v.to_obj(),
+    },


I'm gonna need a mini pydantic tutorial at some point

hsheth2: I'm honestly not too happy with this setup, but it's fine for now.
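For readers who want the gist of the config in the hunk above: in pydantic, `arbitrary_types_allowed` lets a model hold fields of types pydantic has no validator for, and `json_encoders` maps a type to a function that turns it into something JSON-serializable. The sketch below shows the same "type -> encoder" idea using only the standard library (the `default` hook of `json.dumps`), not pydantic itself. `FakeDataType`, `ColumnInfo`, `JSON_ENCODERS`, `encode_fallback`, and `to_json` are hypothetical names for illustration and are not part of DataHub.

```python
import dataclasses
import json
from typing import Any, Callable, Dict


class FakeDataType:
    """Hypothetical stand-in for SchemaFieldDataTypeClass (not the real class)."""

    def __init__(self, name: str) -> None:
        self.name = name

    def to_obj(self) -> Dict[str, str]:
        # Return a plain, JSON-friendly representation.
        return {"type": self.name}


@dataclasses.dataclass
class ColumnInfo:
    column: str
    column_type: FakeDataType  # an "arbitrary" type that json cannot encode natively


# Same shape as the json_encoders mapping in the diff: type -> encoder function.
JSON_ENCODERS: Dict[type, Callable[[Any], Any]] = {
    FakeDataType: lambda v: v.to_obj(),
}


def encode_fallback(value: Any) -> Any:
    """Called by json.dumps only for values it cannot serialize on its own."""
    for typ, encoder in JSON_ENCODERS.items():
        if isinstance(value, typ):
            return encoder(value)
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def to_json(info: ColumnInfo) -> str:
    # Build a shallow field -> value dict so the FakeDataType instance
    # reaches the fallback hook unchanged.
    payload = {f.name: getattr(info, f.name) for f in dataclasses.fields(info)}
    return json.dumps(payload, default=encode_fallback, sort_keys=True)
```

Calling `to_json(ColumnInfo("id", FakeDataType("NUMBER")))` serializes the custom type through its `to_obj()` method, which is roughly what the `json_encoders` entry in the diff arranges for the real model.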

asikowitz · 2023-10-12T00:00:28Z

metadata-ingestion/src/datahub/utilities/sqlglot_lineage.py

+    # Try to figure out the types of the output columns.
+    try:
+        statement = sqlglot.optimizer.annotate_types.annotate_types(
+            statement, schema=sqlglot_db_schema
+        )
+    except sqlglot.errors.OptimizeError as e:
+        # This is not a fatal error, so we can continue.
+        logger.debug("sqlglot failed to annotate types: %s", e)


Could this be slow? I think it'd be nice to only do this if a config option is specified.

hsheth2: This step should be pretty fast.
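The two positions in this thread (gate the step behind a config option, or keep it always on because it is fast) can both be checked with a small wrapper. The sketch below is a hypothetical illustration, not code from this PR: `maybe_annotate` and its `enabled` flag are assumed names, the `annotate` callable stands in for `sqlglot.optimizer.annotate_types.annotate_types`, and `ValueError` stands in for `sqlglot.errors.OptimizeError` so the example runs with the standard library alone.

```python
import logging
import time
from typing import Callable, Optional, Tuple, TypeVar

logger = logging.getLogger(__name__)

T = TypeVar("T")


def maybe_annotate(
    statement: T,
    annotate: Callable[[T], T],
    enabled: bool = True,
) -> Tuple[T, Optional[float]]:
    """Run annotate(statement) if enabled.

    Returns (statement, elapsed_seconds), where elapsed_seconds is None
    when the step was skipped.
    """
    if not enabled:
        return statement, None

    start = time.perf_counter()
    try:
        statement = annotate(statement)
    except ValueError as e:  # stand-in for sqlglot.errors.OptimizeError
        # Not fatal: keep the un-annotated statement and continue.
        logger.debug("type annotation failed: %s", e)
    elapsed = time.perf_counter() - start
    return statement, elapsed
```

Logging or aggregating the returned elapsed time over a real ingestion run would give a concrete answer to whether the step is expensive enough to justify a config option.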

asikowitz · 2023-10-12T00:18:31Z

metadata-ingestion/tests/integration/powerbi/test_m_parser.py

-        ),
+    # TODO: None of these columns have upstreams?
+    # That doesn't seem right - we probably need to add fake schemas for the two tables above.
+    cols = [


hsheth2 · 2023-10-12T05:31:15Z

Merging through flaky smoke tests.

hsheth2 added 3 commits October 11, 2023 10:30

sql parser output column type inference

623b512

add native data type in sql parser

e2e79d3

add cast test

dce2087

hsheth2 requested a review from asikowitz October 11, 2023 17:50

github-actions bot added the "ingestion" label (PR or Issue related to the ingestion of metadata) Oct 11, 2023

fx lint

8cde866

vercel bot deployed to Preview October 11, 2023 19:19

fix tests

87732ba

vercel bot deployed to Preview October 11, 2023 22:39

hsheth2 mentioned this pull request Oct 11, 2023

feat(ingest/dbt): dbt column-level lineage #8991

Merged

7 tasks

fix powerbi test

2ff1ede

vercel bot deployed to Preview October 11, 2023 23:49

asikowitz approved these changes Oct 12, 2023


fix lint

1b8e0ad

vercel bot deployed to Preview October 12, 2023 01:25

hsheth2 merged commit 84bba4d into datahub-project:master Oct 12, 2023
52 of 55 checks passed

hsheth2 deleted the sqlparser-schemas branch October 12, 2023 05:31

maggiehays added the "hacktoberfest-accepted" label (Acceptance for hacktoberfest https://hacktoberfest.com/participation/) Oct 31, 2023
