-
Notifications
You must be signed in to change notification settings - Fork 2.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(ingest/athena): use schema if db_name is "none" #9271
fix(ingest/athena): use schema if db_name is "none" #9271
Conversation
@@ -383,7 +383,7 @@ def get_database_container_key(self, db_name: str, schema: str) -> DatabaseKey: | |||
# Based on community feedback, db_name only available if it is explicitly specified in the connection string. | |||
# If it is not available then we should use schema as db_name | |||
|
|||
if not db_name: | |||
if not db_name or db_name == "none": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a pretty safe change, but I'm a bit confused about when it would be none
in lowercase
Is this from str(None).lower()
being "none"
, or something else?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
db_name
is "none"
when the database
parameter in the recipe is not set.
To be honest, I haven't checked where this value comes from and why the original if clause does not work.
But I think your assumption sounds reasonable.
This PR is intended as a quick fix, but I'm open to fixing the underlying cause instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should change this in sql_common to return empty string if the database name is None
Something like this:
def get_db_name(self, inspector: Inspector) -> str:
engine = inspector.engine
if engine and hasattr(engine, "url") and hasattr(engine.url, "database"):
if engine.url.database is None:
return ""
return str(engine.url.database).strip('"').lower()
else:
raise Exception("Unable to get database name from Sqlalchemy inspector")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the file where I think the above should be changed ->
def get_db_name(self, inspector: Inspector) -> str: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds good, I'll adapt accordingly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done
indeed 😅 |
When not passing the
database
parameter for Athena in Datahub0.12.0
, all tables and views were collected in the container callednone
.This PR uses the schema name if there is no database name provided, e.g., when filtering the assets via
schema_pattern
.I kept the previous logical condition, because I'm not sure if this can occur as well.
See error report in slack: https://datahubspace.slack.com/archives/C029A3M079U/p1700121726907619
Checklist