
fix(ingest/dremio): update dremio sql query to retrieve queried datasets in sql jobs #11801

Merged: 2 commits merged into master on Nov 7, 2024.


acrylJonny (Contributor)

Checklist

  • The PR conforms to DataHub's Contributing Guideline (particularly Commit Message Format)
  • Links to related issues (if applicable)
  • Tests for the changes have been added/updated (if applicable)
  • Docs related to the changes have been added/updated (if applicable). If a new feature has been added a Usage Guide has been added for the same.
  • For any breaking change/potential downtime/deprecation/big changes an entry has been made in Updating DataHub

github-actions bot added the ingestion and community-contribution labels on Nov 6, 2024.
@@ -251,7 +251,7 @@ class DremioSQLQueries:
 SELECT
 *
 FROM
-SYS.PROJECT.HISTORY.JOBS
+sys.project.history.jobs
Collaborator:

Is this case change necessary? Is it also required for the other SQL queries in this file?

Contributor (PR author):

@dremio-brock has been testing and found that this query specifically raised a null pointer exception in Dremio unless the table name was changed to lowercase. This is a workaround for a Dremio Cloud bug.

mayurinehate merged commit 0d57fbd into datahub-project:master on Nov 7, 2024. 72 of 73 checks passed.
@@ -242,7 +242,7 @@ class DremioSQLQueries:
 SYS.JOBS_RECENT
 WHERE
 STATUS = 'COMPLETED'
-AND LENGTH(queried_datasets)>0
+AND ARRAY_SIZE(queried_datasets)>0


On Dremio Software, the data type is VARCHAR, so LENGTH should be used.
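The edition split described in this comment can be sketched as a small helper that picks the non-empty check for queried_datasets. This is a minimal sketch, not DataHub's implementation: the function name is hypothetical, and it assumes, per this review thread, that Dremio Software exposes the column as a VARCHAR while Dremio Cloud exposes an array of strings.

```python
# Hypothetical helper, not part of DataHub's Dremio source.
# Assumption (from this review thread): Software returns queried_datasets as
# a VARCHAR, Cloud returns it as an array of strings.
def queried_datasets_non_empty_filter(is_cloud: bool) -> str:
    """Return a SQL predicate keeping only jobs with at least one queried dataset."""
    if is_cloud:
        # Array-typed column on Cloud: count the elements.
        return "ARRAY_SIZE(queried_datasets) > 0"
    # String-typed column on Software: check the string is non-empty.
    return "LENGTH(queried_datasets) > 0"


# Usage: a simplified version of the Software query from the diff above.
software_query = (
    "SELECT * FROM SYS.JOBS_RECENT\n"
    "WHERE STATUS = 'COMPLETED'\n"
    "AND " + queried_datasets_non_empty_filter(is_cloud=False)
)
```

Selecting the predicate per edition keeps each query using the function that matches the column type it actually receives, instead of one function that fails on the other edition.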


Collaborator:

The Dremio Software docs say queried_datasets is an array of VARCHAR ([varchar]): https://docs.dremio.com/current/reference/sql/system-tables/jobs_recent/#fields

Is the type in the docs incorrect?

Contributor (PR author):

On Dremio Software this is a VARCHAR, so the docs are incorrect: they list the type as an array of strings ([varchar]) rather than varchar. In Software the value comes through as a VARCHAR (a string containing an array representation), whereas on Cloud it correctly comes through as an array of strings. There is a PR open to fix this here.
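Code that reads queried_datasets from both editions might normalize the value into a Python list. This is a minimal sketch under an assumption not confirmed in this thread: that the Software string looks like a bracketed, comma-separated list such as "[a.b, c.d]". The function name is hypothetical, and it does not reproduce how DataHub's Dremio source actually parses the column.

```python
from typing import List, Optional, Sequence, Union


# Hypothetical helper, not part of DataHub.
# Assumption: on Software the column is a string like "[a.b, c.d]";
# on Cloud it is already a sequence of strings.
def normalize_queried_datasets(
    value: Optional[Union[str, Sequence[str]]],
) -> List[str]:
    """Return queried_datasets as a list of dataset names on either edition."""
    if value is None:
        return []
    if isinstance(value, str):
        # Software: strip the surrounding brackets, then split on commas.
        # Dataset paths that themselves contain commas would be split wrongly.
        inner = value.strip()
        if inner.startswith("[") and inner.endswith("]"):
            inner = inner[1:-1]
        items = [item.strip() for item in inner.split(",")]
        return [item for item in items if item]
    # Cloud: the driver already returned an array of strings.
    return [str(item) for item in value]
```

Checking for `str` before treating the value as a sequence matters here, because a Python string is itself a sequence of characters.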

Collaborator:

Thanks for confirming. Could you please add a NOTE comment about this inconsistency in the queried_datasets type in the fix PR?

acrylJonny deleted the dremio-sql-fix branch on November 7, 2024 at 15:04.
Labels: community-contribution (PR or Issue raised by member(s) of DataHub Community), ingestion (PR or Issue related to the ingestion of metadata)
3 participants