Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(ingest/athena): use schema if db_name is "none" #9271

Closed

Conversation

bossenti
Copy link
Contributor

When not passing the database parameter for Athena in Datahub 0.12.0, all tables and views were collected in the container called none.
This PR uses the schema name if there is no database name provided, e.g., when filtering the assets via schema_pattern.
I kept the previous logical condition, because I'm not sure if this can occur as well.

See error report in slack: https://datahubspace.slack.com/archives/C029A3M079U/p1700121726907619

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

@github-actions github-actions bot added the ingestion PR or Issue related to the ingestion of metadata label Nov 19, 2023
@maggiehays maggiehays added the community-contribution PR or Issue raised by member(s) of DataHub Community label Nov 29, 2023
@@ -383,7 +383,7 @@ def get_database_container_key(self, db_name: str, schema: str) -> DatabaseKey:
# Based on community feedback, db_name only available if it is explicitly specified in the connection string.
# If it is not available then we should use schema as db_name

if not db_name:
if not db_name or db_name == "none":
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a pretty safe change, but I'm a bit confused about when it would be none in lowercase

Is this from str(None).lower() being "none", or something else?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

db_name is "none" when the database parameter in the recipe is not set.
To be honest, I haven't checked where this value comes from and why the original if clause does not work.
But I think your assumption sounds reasonable.

This PR is intended as a quick fix, but I'm open to fixing the underlying cause instead.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should change this in sql_common to return empty string if the database name is None
Something like this:

    def get_db_name(self, inspector: Inspector) -> str:
        engine = inspector.engine

        if engine and hasattr(engine, "url") and hasattr(engine.url, "database"):
            if engine.url.database is None:
                return ""
            return str(engine.url.database).strip('"').lower()
        else:
            raise Exception("Unable to get database name from Sqlalchemy inspector")

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the file where I think the above should be changed ->

def get_db_name(self, inspector: Inspector) -> str:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good, I'll adapt accordingly

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@hsheth2
Copy link
Collaborator

hsheth2 commented Dec 7, 2023

@bossenti It looks like the change set is empty now? I guess @treff7es integrated this into this PR #9354?

@bossenti
Copy link
Contributor Author

bossenti commented Dec 7, 2023

indeed 😅
Then let's just close this PR

@bossenti bossenti closed this Dec 7, 2023
@bossenti bossenti deleted the fix-athena-database-is-none branch December 7, 2023 06:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
community-contribution PR or Issue raised by member(s) of DataHub Community ingestion PR or Issue related to the ingestion of metadata
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants