Commit

updated copy
asikowitz committed Oct 4, 2023
1 parent d224996 commit 3cbce63
Showing 2 changed files with 13 additions and 12 deletions.
12 changes: 5 additions & 7 deletions docs/how/updating-datahub.md
@@ -3,19 +3,17 @@
 This file documents any backwards-incompatible changes in DataHub and assists people when migrating to a new version.

 ## Next
-- #8943 - There is a new config param, `include_metastore`, that provides the option of not ingesting
-the Unity Catalog metastore associated with your Databricks workspace. We recommend setting this to `false`.
-However, if you have previously ingested from unity catalog, setting this to `false` is a breaking change; see that section for details.

 ### Breaking Changes

 - #8810 - Removed support for SQLAlchemy 1.3.x. Only SQLAlchemy 1.4.x is supported now.
 - #8853 - The Airflow plugin no longer supports Airflow 2.0.x or Python 3.7. See the docs for more details.
 - #8853 - Introduced the Airflow plugin v2. If you're using Airflow 2.3+, the v2 plugin will be enabled by default, and so you'll need to switch your requirements to include `pip install 'acryl-datahub-airflow-plugin[plugin-v2]'`. To continue using the v1 plugin, set the `DATAHUB_AIRFLOW_PLUGIN_USE_V1_PLUGIN` environment variable to `true`.
-- #8943 All Unity Catalog urns are changed if a new config param, `include_metastore`, is set to `false`.
-This is set to `true` by default at the moment, but this default will be changed in the future.
-To handle the change in urns, we recommend soft deleting all databricks data via the DataHub CLI: `datahub delete --platform databricks --soft`,
-and then re-ingesting from unity catalog with `include_metastore: false`.
+- #8943 The Unity Catalog ingestion source has a new option `include_metastore`, which will cause all urns to be changed when disabled.
+This is currently enabled by default to preserve compatibility, but will be disabled by default and then removed in the future.
+If stateful ingestion is enabled, simply setting `include_metastore: false` will perform all required cleanup.
+Otherwise, we recommend soft deleting all databricks data via the DataHub CLI:
+`datahub delete --platform databricks --soft` and then reingesting with `include_metastore: false`.

 ### Potential Downtime
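
Review note on the #8943 entry in this hunk: the two migration paths it describes can be sketched as a small decision helper. This is an illustration only, not DataHub code. `migration_steps` is a hypothetical name, and the recipe layout used here (`source.config.include_metastore`, `source.config.stateful_ingestion.enabled`, source type `unity-catalog`) is an assumption about how such a recipe is laid out. Only the `include_metastore` option, its current default, and the soft-delete command come from the changelog text itself.

```python
from typing import Any, Dict, List

# The soft-delete command quoted in the changelog entry.
SOFT_DELETE_CMD = "datahub delete --platform databricks --soft"


def migration_steps(recipe: Dict[str, Any]) -> List[str]:
    """Return the manual steps for moving a recipe to include_metastore: false.

    Hypothetical helper; the recipe key layout is an assumption.
    """
    config = recipe.get("source", {}).get("config", {})
    # The changelog says the option is currently enabled by default.
    if not config.get("include_metastore", True):
        return []  # already disabled, nothing to migrate

    steps: List[str] = []
    stateful = config.get("stateful_ingestion", {}).get("enabled", False)
    if not stateful:
        # Without stateful ingestion, the old urns are soft deleted by hand.
        steps.append(f"run: {SOFT_DELETE_CMD}")
    steps.append("set include_metastore: false")
    steps.append("re-run ingestion")
    return steps


if __name__ == "__main__":
    # Assumed recipe shape, with no explicit include_metastore setting.
    recipe = {"source": {"type": "unity-catalog", "config": {}}}
    for step in migration_steps(recipe):
        print(step)
```

With stateful ingestion enabled in the assumed layout, the helper skips the CLI step, which matches the changelog's statement that setting `include_metastore: false` alone performs the required cleanup.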

13 changes: 8 additions & 5 deletions metadata-ingestion/src/datahub/ingestion/source/unity/config.py
@@ -22,6 +22,7 @@
     OperationConfig,
     is_profiling_enabled,
 )
+from datahub.utilities.global_warning_util import add_global_warning

 logger = logging.getLogger(__name__)

@@ -106,9 +107,10 @@ class UnityCatalogSourceConfig(
             "Whether to ingest the workspace's metastore as a container and include it in all urns."
             " Changing this will affect the urns of all entities in the workspace."
             " This will be disabled by default in the future,"
-            " so it is recommended to set this to False for new ingestions."
-            " If you have an existing unity catalog ingestion, we recommend deleting existing data"
-            " via the cli: `datahub delete --platform databricks` and re-ingesting."
+            " so it is recommended to set this to `False` for new ingestions."
+            " If you have an existing unity catalog ingestion, you'll want to avoid duplicates by soft deleting existing data."
+            " If stateful ingestion is enabled, running with `include_metastore: false` should be sufficient."
+            " Otherwise, we recommend deleting via the cli: `datahub delete --platform databricks` and re-ingesting with `include_metastore: false`."
         ),
     )

@@ -211,10 +213,11 @@ def workspace_url_should_start_with_http_scheme(cls, workspace_url: str) -> str:
     def include_metastore_warning(cls, v: bool) -> bool:
         if v:
             msg = (
-                "include_metastore is enabled."
+                "`include_metastore` is enabled."
                 " This is not recommended and will be disabled by default in the future, which is a breaking change."
                 " All databricks urns will change if you re-ingest with this disabled."
-                " We recommend soft deleting all databricks data and re-ingesting with include_metastore set to False."
+                " We recommend soft deleting all databricks data and re-ingesting with `include_metastore` set to `False`."
             )
             logger.warning(msg)
+            add_global_warning(msg)
         return v
