feat(ingest/teradata): Teradata source #8977
…ort Teradata properly
@@ -482,6 +482,7 @@ def _column_level_lineage( # noqa: C901
# Our snowflake source lowercases column identifiers, so we are forced
# to do fuzzy (case-insensitive) resolution instead of exact resolution.
"snowflake",
"teradata",
let's add a comment around this
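For context on what the hunk above opts a platform into, here is a minimal sketch of case-insensitive column resolution, assuming the ingestion source lowercases column identifiers. The `resolve_column` helper and the sample schema are hypothetical illustrations, not DataHub's actual implementation.

```python
from typing import Dict, Optional


def resolve_column(schema: Dict[str, str], column: str) -> Optional[str]:
    """Return the schema's spelling of `column`.

    Tries an exact match first, then falls back to a case-insensitive match.
    """
    if column in schema:
        return column
    lowered = column.lower()
    matches = [name for name in schema if name.lower() == lowered]
    # Only resolve when the fuzzy match is unambiguous.
    return matches[0] if len(matches) == 1 else None


# Hypothetical schema, as a source that lowercases column names might emit it.
ingested_schema = {"customer_id": "INTEGER", "order_total": "DECIMAL"}

# A query may reference the column with different casing.
resolved = resolve_column(ingested_schema, "Customer_ID")
print(resolved)
```

A code comment next to `"teradata"` could state which of these two behaviors (exact or fuzzy) the platform relies on and why.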
Can we add an image to the frontend for Teradata? I think that's an important part of the experience (lol). I also think the lineage query needs to be changed, and that we should pass ingested schema metadata into the schema resolver.
If you want to run profiling, you need to grant select permission on all the tables you want to profile.

3. If linege or usage extraction is enabled, please, check if query logging is enabled and it is set to size which
- 3. If linege or usage extraction is enabled, please, check if query logging is enabled and it is set to size which
+ 3. If lineage or usage extraction is enabled, please, check if query logging is enabled and it is set to size which
type: teradata
config:
  host_port: "myteradatainstance.teradata.com:1025"
  #platform_instance: "myteradatainstance"
I'd just remove this
  password: mypassword
  #database_pattern:
  #  allow:
  #    - "demo_user"
Maybe an example that has less to do with users, like my_database? Idk, just trying to make these docs as easy to follow as possible.
use_schema_resolver: bool = Field(
    default=True,
    description="Read SchemaMetadata aspects from DataHub to aid in SQL parsing. Turn off only for testing.",
    hidden_from_docs=True,
)
Let's remove this, don't think we need to handle the false case. Think this lets us remove the local self.urns too and always use schema_resolver.urns
LINEAGE_QUERY: str = """SELECT ProcID, UserName as "user", StartTime AT TIME ZONE 'GMT' as "timestamp", DefaultDatabase as default_database, QueryText as query
FROM "DBC".DBQLogTbl
where ErrorCode = 0
and QueryText like 'create table demo_user.test_lineage%'
Artifact of testing?
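If the `QueryText like ...` clause is indeed left over from testing, one possible cleanup is simply to drop it. The sketch below keeps the columns, table, and `ErrorCode` filter exactly as shown in the diff and removes only the hard-coded filter. The rest of the original query is not visible in this hunk, so anything after the visible lines is omitted here rather than guessed.

```python
# Hypothetical cleaned-up version of the query under review: same columns and
# table as in the diff, without the hard-coded test filter.
LINEAGE_QUERY: str = """SELECT ProcID, UserName as "user", StartTime AT TIME ZONE 'GMT' as "timestamp", DefaultDatabase as default_database, QueryText as query
FROM "DBC".DBQLogTbl
where ErrorCode = 0"""

print(LINEAGE_QUERY)
```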
if self.graph:
    if self.config.use_schema_resolver:
        self.schema_resolver = (
            self.graph.initialize_schema_resolver_from_datahub(
                platform=self.platform,
                platform_instance=self.config.platform_instance,
                env=self.config.env,
            )
        )
        self.urns = self.schema_resolver.get_urns()
    else:
        self.schema_resolver = self.graph._make_schema_resolver(
            platform=self.platform,
            platform_instance=self.config.platform_instance,
            env=self.config.env,
        )
        self.urns = None
else:
    self.schema_resolver = SchemaResolver(
        platform=self.platform,
        platform_instance=self.config.platform_instance,
        graph=None,
        env=self.config.env,
    )
    self.urns = None
We shouldn't have to do this sort of logic to start. I think for now we should just always ingest schema metadata, pass that to the schema resolver, and never pass a graph into the schema resolver
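A rough sketch of what that simplification could look like: a resolver that is never given a graph and is populated only from schemas seen during ingestion, so the set of known URNs always comes from the resolver itself. `InMemorySchemaResolver`, `add_schema`, and the sample URN are hypothetical stand-ins for illustration, not DataHub's `SchemaResolver` API.

```python
from typing import Dict, Optional, Set


class InMemorySchemaResolver:
    """Hypothetical stand-in for a schema resolver that never receives a graph."""

    def __init__(self, platform: str, platform_instance: Optional[str], env: str) -> None:
        self.platform = platform
        self.platform_instance = platform_instance
        self.env = env
        # Maps dataset URN -> {column name: column type}.
        self._schemas: Dict[str, Dict[str, str]] = {}

    def add_schema(self, urn: str, columns: Dict[str, str]) -> None:
        """Register schema metadata as it is ingested."""
        self._schemas[urn] = columns

    def get_urns(self) -> Set[str]:
        """Return every URN registered so far; no separate `urns` attribute needed."""
        return set(self._schemas)


# One unconditional construction replaces the three-way branch.
resolver = InMemorySchemaResolver(platform="teradata", platform_instance=None, env="PROD")

# During ingestion, each table's schema is handed to the resolver.
sample_urn = "urn:li:dataset:(urn:li:dataPlatform:teradata,my_database.orders,PROD)"
resolver.add_schema(sample_urn, {"order_id": "INTEGER"})

print(resolver.get_urns())
```

With this shape, the `use_schema_resolver` flag and the locally tracked `self.urns` both become unnecessary, which matches the earlier suggestion to drop them.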