feat(ingest/teradata): Teradata source #8977

treff7es · 2023-10-10T09:06:49Z

Checklist

The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
Links to related issues (if applicable)
Tests for the changes have been added/updated (if applicable)
Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

treff7es · 2023-10-12T08:57:20Z

hsheth2 · 2023-10-12T18:07:06Z

metadata-ingestion/src/datahub/utilities/sqlglot_lineage.py

@@ -482,6 +482,7 @@ def _column_level_lineage(  # noqa: C901
        # Our snowflake source lowercases column identifiers, so we are forced
        # to do fuzzy (case-insensitive) resolution instead of exact resolution.
        "snowflake",
+        "teradata",


let's add a comment around this
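
For context, the fuzzy (case-insensitive) resolution that this list switches on could look roughly like the sketch below. It is a standalone illustration, not the code in sqlglot_lineage.py; the function name `resolve_column` and the sample column names are made up for the example.

```python
from typing import Iterable, Optional


def resolve_column(name: str, schema_columns: Iterable[str]) -> Optional[str]:
    """Return the schema's spelling of `name`, matching case-insensitively.

    Hypothetical helper for illustration only; not part of DataHub.
    """
    columns = list(schema_columns)
    # Prefer an exact match so a correctly cased reference is never remapped.
    if name in columns:
        return name
    # Fall back to a case-insensitive comparison.
    lowered = name.lower()
    matches = [col for col in columns if col.lower() == lowered]
    # Only resolve when the fuzzy match is unambiguous.
    return matches[0] if len(matches) == 1 else None
```

With this, a query that references `ORDER_ID` would resolve against an ingested column named `order_id`, while a name that matches two differently cased columns is left unresolved rather than guessed.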

asikowitz

Can we add an image to the frontend for Teradata? I think that's an important part of the experience (lol). I also think the lineage query needs to be changed, and that we should pass ingested schema metadata into the schema resolver.

asikowitz · 2023-10-12T17:34:44Z

metadata-ingestion/docs/sources/teradata/teradata_pre.md

+
+    If you want to run profiling, you need to grant select permission on all the tables you want to profile.
+
+3. If linege or usage extraction is enabled, please, check if query logging is enabled and it is set to size which


Suggested change

3. If linege or usage extraction is enabled, please, check if query logging is enabled and it is set to size which

3. If lineage or usage extraction is enabled, please, check if query logging is enabled and it is set to size which

asikowitz · 2023-10-12T17:40:36Z

metadata-ingestion/docs/sources/teradata/teradata_recipe.yml

+  type: teradata
+  config:
+    host_port: "myteradatainstance.teradata.com:1025"
+    #platform_instance: "myteradatainstance"


I'd just remove this

asikowitz · 2023-10-12T17:41:22Z

metadata-ingestion/docs/sources/teradata/teradata_recipe.yml

+    password: mypassword
+    #database_pattern:
+    #  allow:
+    #    - "demo_user"


Maybe an example that has less to do with users, like my_database? Idk, just trying to make these docs as easy to follow as possible.

asikowitz · 2023-10-12T21:58:52Z

metadata-ingestion/src/datahub/ingestion/source/sql/teradata.py

+    use_schema_resolver: bool = Field(
+        default=True,
+        description="Read SchemaMetadata aspects from DataHub to aid in SQL parsing. Turn off only for testing.",
+        hidden_from_docs=True,
+    )
+


Let's remove this; I don't think we need to handle the false case. I think this also lets us remove the local self.urns and always use schema_resolver.urns.

asikowitz · 2023-10-12T21:59:52Z

metadata-ingestion/src/datahub/ingestion/source/sql/teradata.py

+    LINEAGE_QUERY: str = """SELECT ProcID, UserName as "user", StartTime AT TIME ZONE 'GMT' as "timestamp", DefaultDatabase as default_database, QueryText as query
+     FROM "DBC".DBQLogTbl
+     where ErrorCode = 0
+     and QueryText like 'create table demo_user.test_lineage%'


Artifact of testing?

asikowitz · 2023-10-12T22:29:21Z

metadata-ingestion/src/datahub/ingestion/source/sql/teradata.py

+        if self.graph:
+            if self.config.use_schema_resolver:
+                self.schema_resolver = (
+                    self.graph.initialize_schema_resolver_from_datahub(
+                        platform=self.platform,
+                        platform_instance=self.config.platform_instance,
+                        env=self.config.env,
+                    )
+                )
+                self.urns = self.schema_resolver.get_urns()
+            else:
+                self.schema_resolver = self.graph._make_schema_resolver(
+                    platform=self.platform,
+                    platform_instance=self.config.platform_instance,
+                    env=self.config.env,
+                )
+                self.urns = None
+        else:
+            self.schema_resolver = SchemaResolver(
+                platform=self.platform,
+                platform_instance=self.config.platform_instance,
+                graph=None,
+                env=self.config.env,
+            )
+            self.urns = None


We shouldn't have to do this sort of logic to start. I think for now we should just always ingest schema metadata, pass that to the schema resolver, and never pass a graph into the schema resolver
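
A rough sketch of what that simplification could look like, assuming the source registers each table's schema with the resolver as it ingests it. The `SchemaResolver` below is a tiny stand-in written for this example, not DataHub's class; `add_schema` and `build_schema_resolver` are hypothetical names, and only `get_urns` mirrors a call that appears in the diff.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class SchemaResolver:
    """Stand-in for illustration only; NOT DataHub's SchemaResolver."""

    platform: str
    platform_instance: Optional[str] = None
    env: str = "PROD"
    # urn -> {column name: column type}; filled in during ingestion.
    _schemas: Dict[str, Dict[str, str]] = field(default_factory=dict)

    def add_schema(self, urn: str, columns: Dict[str, str]) -> None:
        # Hypothetical method: record the schema of a table we just ingested.
        self._schemas[urn] = columns

    def get_urns(self) -> Set[str]:
        # URNs of every table whose schema has been registered so far.
        return set(self._schemas)


def build_schema_resolver(
    platform: str, platform_instance: Optional[str], env: str
) -> SchemaResolver:
    # One code path: no graph lookup and no use_schema_resolver flag.
    return SchemaResolver(
        platform=platform, platform_instance=platform_instance, env=env
    )
```

The point of the single code path is that the set of known URNs always comes from the resolver itself, so there is no separate self.urns to keep in sync.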

treff7es added 2 commits October 10, 2023 11:03

Adding initial support for teradata

96331e2

fixing linter issues

de50545

github-actions bot added the ingestion label (PR or Issue related to the ingestion of metadata) Oct 10, 2023

treff7es added 4 commits October 10, 2023 12:37

Removing unneeded configs

2375f2d

Adding teradata documentation

8003ee0

Adding additional capabilities

7980121

Removing unneeded config

246b28d

treff7es added 2 commits October 11, 2023 16:35

Adding lineage/usage/operation aspect generation

ff4a65e

Adding config example to enable usage/lineage generation

b86c7a0

Adding test for Teradata column lineage and modify sql parser to support Teradata properly

f26349a

treff7es added 2 commits October 12, 2023 11:00

Merge branch 'master' into teradata_source

1ebbdca

Adding column types as well to test

6abf101

hsheth2 reviewed Oct 12, 2023

Adding comment about column case insetivity in teradata

6245a7b

hsheth2 approved these changes Oct 12, 2023

hsheth2 merged commit a8f0080 into datahub-project:master Oct 12, 2023
52 of 54 checks passed

asikowitz requested changes Oct 12, 2023

maggiehays added the hacktoberfest-accepted label (Acceptance for hacktoberfest https://hacktoberfest.com/participation/) Oct 31, 2023
