fix(ingest/unity): Remove metastore from ingestion and urns; standardize platform instance; add notebook filter #8943
Conversation
…ize platform instance; add notebook filter (commits 6116c18 to d224996)
some notes about messaging, but otherwise lgtm
docs/how/updating-datahub.md (Outdated)
### Breaking Changes

- #8810 - Removed support for SQLAlchemy 1.3.x. Only SQLAlchemy 1.4.x is supported now.
- #8853 - The Airflow plugin no longer supports Airflow 2.0.x or Python 3.7. See the docs for more details.
- #8853 - Introduced the Airflow plugin v2. If you're using Airflow 2.3+, the v2 plugin will be enabled by default, and so you'll need to switch your requirements to include `pip install 'acryl-datahub-airflow-plugin[plugin-v2]'`. To continue using the v1 plugin, set the `DATAHUB_AIRFLOW_PLUGIN_USE_V1_PLUGIN` environment variable to `true`.
- #8943 - All Unity Catalog urns are changed if a new config param, `include_metastore`, is set to `false`. This is set to `true` by default at the moment, but this default will be changed in the future. To handle the change in urns, we recommend soft deleting all databricks data via the DataHub CLI: `datahub delete --platform databricks --soft`,
add a note that if you have stateful ingest enabled, you don't need to do anything
" All databricks urns will change if you re-ingest with this disabled." | ||
" We recommend soft deleting all databricks data and re-ingesting with include_metastore set to False." | ||
) | ||
logger.warning(msg) |
let's also call add_global_warning
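The reviewer's suggestion is that the warning should be surfaced in the ingestion report, not only in the logs. A minimal self-contained sketch of the pattern being asked for is below; the real `SourceReport` lives in `datahub.ingestion.api.source`, and this stand-in class, its fields, and the message text are illustrative assumptions, not DataHub's actual implementation.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


class SourceReport:
    """Stand-in sketch of DataHub's SourceReport, holding run-level warnings."""

    def __init__(self) -> None:
        self.global_warnings: List[str] = []

    def add_global_warning(self, msg: str) -> None:
        # Global warnings appear in the final ingestion summary that the
        # user actually reads, instead of being buried mid-run in the logs.
        self.global_warnings.append(msg)


report = SourceReport()
msg = (
    "All databricks urns will change if you re-ingest with this disabled. "
    "We recommend soft deleting all databricks data and re-ingesting with "
    "include_metastore set to False."
)
logger.warning(msg)             # visible in the log stream during the run
report.add_global_warning(msg)  # visible in the end-of-run report
```

Logging and reporting together means the warning survives even when the log output scrolls past or is discarded.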
What's that do?
@@ -247,10 +251,14 @@ def build_service_principal_map(self) -> None:

def process_notebooks(self) -> Iterable[MetadataWorkUnit]:
    for notebook in self.unity_catalog_api_proxy.workspace_notebooks():
        if not self.config.notebook_pattern.allowed(notebook.path):
was this just missing before?
I added the pattern as part of this PR, forgot it last time
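The `notebook_pattern.allowed(notebook.path)` check in the diff above filters notebooks by path. A simplified, self-contained sketch of how such an allow/deny pattern behaves is below; DataHub's real class is `AllowDenyPattern` in `datahub.configuration.common`, and the example paths and the exact matching semantics here are illustrative assumptions.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class AllowDenyPattern:
    """Sketch of a regex-based allow/deny filter like DataHub's."""

    allow: List[str] = field(default_factory=lambda: [".*"])
    deny: List[str] = field(default_factory=list)

    def allowed(self, value: str) -> bool:
        # Deny rules take precedence over allow rules.
        if any(re.match(p, value) for p in self.deny):
            return False
        return any(re.match(p, value) for p in self.allow)


# Hypothetical config: skip scratch notebooks under user folders.
pattern = AllowDenyPattern(deny=[r"/Users/.*/scratch.*"])
print(pattern.allowed("/Repos/analytics/daily_report"))  # True
print(pattern.allowed("/Users/alice/scratch_nb"))        # False
```

With this shape, `process_notebooks` can skip a notebook with a single `if not ... allowed(...)` guard, keeping the filtering policy in config rather than code.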
docs/how/updating-datahub.md (Outdated)
- #8943 All Unity Catalog urns are changed if a new config param, `include_metastore`, is set to `false`.
The phrasing is a bit confusing, and splitting this across the two sections isn't ideal.

Maybe let's do: "The unity catalog ingestion source has a new option `include_metastore`, which will cause all urns to be changed when disabled. This is currently enabled by default to preserve compatibility, but will be disabled by default and then removed in the future. If stateful ingestion is enabled, simply setting `include_metastore: true` will perform all required cleanup. Otherwise, we recommend soft deleting all databricks data via the DataHub CLI: `datahub delete --platform databricks --soft` and then reingest with `include_metastore: false`."
It's not pretty but it gets the job done...
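The migration path being discussed could look roughly like this in an ingestion recipe. This is a hedged sketch, not official documentation: `include_metastore` comes from this PR, and the `stateful_ingestion` block follows the usual DataHub recipe shape, but the workspace URL and token are placeholders and field details should be checked against the source docs.

```yaml
# Hypothetical recipe fragment illustrating the migration note above.
source:
  type: unity-catalog
  config:
    workspace_url: https://example.cloud.databricks.com  # placeholder
    token: ${DATABRICKS_TOKEN}                           # placeholder env var
    include_metastore: false  # new option from #8943; disabling changes all urns
    stateful_ingestion:
      enabled: true           # if enabled, stale urns are cleaned up automatically
# Without stateful ingestion, first run:
#   datahub delete --platform databricks --soft
# and then re-ingest with include_metastore: false.
```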
Checklist