diff --git a/docs/how/updating-datahub.md b/docs/how/updating-datahub.md
index 512811804024a..5d0ad5eaf8f7e 100644
--- a/docs/how/updating-datahub.md
+++ b/docs/how/updating-datahub.md
@@ -3,19 +3,17 @@
 This file documents any backwards-incompatible changes in DataHub and assists people when migrating to a new version.
 
 ## Next
-- #8943 - There is a new config param, `include_metastore`, that provides the option of not ingesting
-the Unity Catalog metastore associated with your Databricks workspace. We recommend setting this to `false`.
-However, if you have previously ingested from unity catalog, setting this to `false` is a breaking change; see that section for details.
 
 ### Breaking Changes
 
 - #8810 - Removed support for SQLAlchemy 1.3.x. Only SQLAlchemy 1.4.x is supported now.
 - #8853 - The Airflow plugin no longer supports Airflow 2.0.x or Python 3.7. See the docs for more details.
 - #8853 - Introduced the Airflow plugin v2. If you're using Airflow 2.3+, the v2 plugin will be enabled by default, and so you'll need to switch your requirements to include `pip install 'acryl-datahub-airflow-plugin[plugin-v2]'`. To continue using the v1 plugin, set the `DATAHUB_AIRFLOW_PLUGIN_USE_V1_PLUGIN` environment variable to `true`.
+- #8943 - The Unity Catalog ingestion source has a new option, `include_metastore`, which changes the urns of all entities in the workspace when disabled.
+It is currently enabled by default to preserve compatibility, but it will be disabled by default and then removed in the future.
+If stateful ingestion is enabled, simply setting `include_metastore: false` will perform all required cleanup.
+Otherwise, we recommend soft deleting all Databricks data via the DataHub CLI:
+`datahub delete --platform databricks --soft` and then re-ingesting with `include_metastore: false`.
 
 ### Potential Downtime
 
diff --git a/metadata-ingestion/src/datahub/ingestion/source/unity/config.py b/metadata-ingestion/src/datahub/ingestion/source/unity/config.py
index 8491565cceb37..871bdac8f7f18 100644
--- a/metadata-ingestion/src/datahub/ingestion/source/unity/config.py
+++ b/metadata-ingestion/src/datahub/ingestion/source/unity/config.py
@@ -22,6 +22,7 @@
     OperationConfig,
     is_profiling_enabled,
 )
+from datahub.utilities.global_warning_util import add_global_warning
 
 logger = logging.getLogger(__name__)
 
@@ -106,9 +107,10 @@ class UnityCatalogSourceConfig(
             "Whether to ingest the workspace's metastore as a container and include it in all urns."
             " Changing this will affect the urns of all entities in the workspace."
             " This will be disabled by default in the future,"
-            " so it is recommended to set this to False for new ingestions."
-            " If you have an existing unity catalog ingestion, we recommend deleting existing data"
-            " via the cli: `datahub delete --platform databricks` and re-ingesting."
+            " so it is recommended to set this to `False` for new ingestions."
+            " If you have an existing Unity Catalog ingestion, you'll want to avoid duplicates by soft deleting existing data."
+            " If stateful ingestion is enabled, running with `include_metastore: false` should be sufficient."
+            " Otherwise, we recommend soft deleting via the CLI: `datahub delete --platform databricks` and re-ingesting with `include_metastore: false`."
         ),
     )
 
@@ -211,10 +213,11 @@ def workspace_url_should_start_with_http_scheme(cls, workspace_url: str) -> str:
     def include_metastore_warning(cls, v: bool) -> bool:
         if v:
             msg = (
-                "include_metastore is enabled."
+                "`include_metastore` is enabled."
                 " This is not recommended and will be disabled by default in the future, which is a breaking change."
                 " All databricks urns will change if you re-ingest with this disabled."
-                " We recommend soft deleting all databricks data and re-ingesting with include_metastore set to False."
+                " We recommend soft deleting all databricks data and re-ingesting with `include_metastore` set to `False`."
             )
             logger.warning(msg)
+            add_global_warning(msg)
         return v