chore(cli): drop support for python 3.7 (#9731)
hsheth2 authored Jan 29, 2024
1 parent f3cc4e0 commit 1498c36
Showing 22 changed files with 805 additions and 863 deletions.
4 changes: 2 additions & 2 deletions .github/workflows/metadata-ingestion.yml
@@ -31,7 +31,7 @@ jobs:
      # DATAHUB_LOOKML_GIT_TEST_SSH_KEY: ${{ secrets.DATAHUB_LOOKML_GIT_TEST_SSH_KEY }}
    strategy:
      matrix:
-        python-version: ["3.7", "3.10"]
+        python-version: ["3.8", "3.10"]
        command:
          [
            "testQuick",
@@ -40,7 +40,7 @@
            "testIntegrationBatch2",
          ]
        include:
-          - python-version: "3.7"
+          - python-version: "3.8"
          - python-version: "3.10"
      fail-fast: false
    steps:
2 changes: 1 addition & 1 deletion docs/cli.md
@@ -24,7 +24,7 @@ source venv/bin/activate # activate the environment
Once inside the virtual environment, install `datahub` using the following commands

```shell
-# Requires Python 3.7+
+# Requires Python 3.8+
python3 -m pip install --upgrade pip wheel setuptools
python3 -m pip install --upgrade acryl-datahub
# validate that the install was successful
```
11 changes: 8 additions & 3 deletions docs/how/updating-datahub.md
@@ -10,25 +10,30 @@ This file documents any backwards-incompatible changes in DataHub and assists pe
- Neo4j 5.x, may require migration from 4.x
- Build requires JDK17 (Runtime Java 11)
- Build requires Docker Compose > 2.20
+- #9731 - The `acryl-datahub` CLI now requires Python 3.8+
- #9601 - The Unity Catalog (UC) ingestion source config `include_metastore` is now disabled by default. This change will affect the urns of all entities in the workspace.<br/>
Entity Hierarchy with `include_metastore: true` (Old)

```
- UC Metastore
  - Catalog
    - Schema
      - Table
```

Entity Hierarchy with `include_metastore: false` (New)

```
- Catalog
  - Schema
    - Table
```

We recommend using `platform_instance` to differentiate across metastores; see the recipe sketch below.

If stateful ingestion is enabled, running ingestion with the latest CLI version will perform all required cleanup. Otherwise, we recommend soft-deleting all Databricks data via the DataHub CLI: `datahub delete --platform databricks --soft`, and then reingesting with the latest CLI version.
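As an illustration of the `platform_instance` recommendation above, here is a minimal programmatic recipe sketch (an editor's addition, not part of this commit). The `Pipeline` API is DataHub's documented programmatic-ingestion entry point; the Unity Catalog config keys shown (`workspace_url`, `token`) are assumptions based on the source's docs, and the URLs and token are placeholders.

```python
# Hedged sketch: one ingestion run per metastore, differentiated via
# platform_instance now that include_metastore defaults to false.
from datahub.ingestion.run.pipeline import Pipeline

pipeline = Pipeline.create(
    {
        "source": {
            "type": "unity-catalog",
            "config": {
                "workspace_url": "https://my-workspace.cloud.databricks.com",  # placeholder
                "token": "<databricks-token>",  # placeholder
                "include_metastore": False,
                # Keeps urns from different metastores distinct.
                "platform_instance": "metastore_a",
            },
        },
        "sink": {
            "type": "datahub-rest",
            "config": {"server": "http://localhost:8080"},  # placeholder
        },
    }
)
pipeline.run()
pipeline.raise_from_status()
```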

- #9601 - The Unity Catalog (UC) ingestion source config `include_hive_metastore` is now enabled by default. This requires the config `warehouse_id` to be set. You can set `include_hive_metastore` to `False` to avoid ingesting the legacy Hive metastore catalog in Databricks.

### Potential Downtime
2 changes: 1 addition & 1 deletion docs/quickstart.md
@@ -22,7 +22,7 @@ If you're interested in a managed version, [Acryl Data](https://www.acryldata.io
| Linux | [Docker for Linux](https://docs.docker.com/desktop/install/linux-install/) and [Docker Compose](https://docs.docker.com/compose/install/linux/) |

- **Launch the Docker engine** from command line or the desktop app.
-- Ensure you have **Python 3.7+** installed & configured. (Check using `python3 --version`).
+- Ensure you have **Python 3.8+** installed & configured. (Check using `python3 --version`).

:::note Docker Resource Allocation

14 changes: 3 additions & 11 deletions metadata-ingestion-modules/airflow-plugin/setup.py
@@ -18,16 +18,10 @@ def get_long_description():
_self_pin = f"=={_version}" if not _version.endswith("dev0") else ""


-rest_common = {"requests", "requests_file"}

base_requirements = {
-    # Compatibility.
-    "dataclasses>=0.6; python_version < '3.7'",
-    "mypy_extensions>=0.4.3",
-    # Actual dependencies.
-    "pydantic>=1.5.1",
+    f"acryl-datahub[datahub-rest]{_self_pin}",
    "apache-airflow >= 2.0.2",
-    *rest_common,
}

plugins: Dict[str, Set[str]] = {
@@ -42,9 +36,8 @@ def get_long_description():
    },
    "plugin-v1": set(),
    "plugin-v2": {
-        # The v2 plugin requires Python 3.8+.
        f"acryl-datahub[sql-parser]{_self_pin}",
-        "openlineage-airflow==1.2.0; python_version >= '3.8'",
+        "openlineage-airflow==1.2.0",
    },
}
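A side note on why the deleted environment markers are safe to drop (editor's sketch, not part of the commit): once `python_requires=">=3.8"` is in effect, a marker like `python_version >= '3.8'` is always true and `python_version < '3.7'` is always false on any interpreter that can install the package, so both are redundant. The `packaging` library can evaluate such markers directly:

```python
# Sketch: evaluate the environment markers this commit deletes.
from packaging.markers import Marker

old_markers = [
    "python_version < '3.7'",   # guarded the dataclasses backport
    "python_version >= '3.8'",  # guarded openlineage-airflow
]
for text in old_markers:
    # On any interpreter allowed by python_requires=">=3.8", the first
    # evaluates False and the second True.
    print(text, "->", Marker(text).evaluate())
```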

@@ -144,7 +137,6 @@ def get_long_description():
"Programming Language :: Python",
"Programming Language :: Python :: 3",
"Programming Language :: Python :: 3 :: Only",
"Programming Language :: Python :: 3.7",
"Programming Language :: Python :: 3.8",
"Programming Language :: Python :: 3.9",
"Programming Language :: Python :: 3.10",
@@ -161,7 +153,7 @@
    ],
    # Package info.
    zip_safe=False,
-    python_requires=">=3.7",
+    python_requires=">=3.8",
    package_data={
        "datahub_airflow_plugin": ["py.typed"],
    },
204 changes: 100 additions & 104 deletions metadata-ingestion-modules/airflow-plugin/tests/unit/test_airflow.py
@@ -1,7 +1,6 @@
import datetime
import json
import os
-import sys
from contextlib import contextmanager
from typing import Iterator
from unittest import mock
@@ -318,137 +317,134 @@ def test_lineage_backend(mock_emit, inlets, outlets, capture_executions):
    # Check that the right things were emitted.
    assert mock_emitter.emit.call_count == 17 if capture_executions else 9

-    # Running further checks based on python version because args only exists in python 3.8+
-    if sys.version_info > (3, 8):
-        assert mock_emitter.method_calls[0].args[0].aspectName == "dataFlowInfo"
-        assert (
-            mock_emitter.method_calls[0].args[0].entityUrn
-            == "urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod)"
-        )
-
-        assert mock_emitter.method_calls[1].args[0].aspectName == "ownership"
-        assert (
-            mock_emitter.method_calls[1].args[0].entityUrn
-            == "urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod)"
-        )
-
-        assert mock_emitter.method_calls[2].args[0].aspectName == "globalTags"
-        assert (
-            mock_emitter.method_calls[2].args[0].entityUrn
-            == "urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod)"
-        )
-
-        assert mock_emitter.method_calls[3].args[0].aspectName == "dataJobInfo"
-        assert (
-            mock_emitter.method_calls[3].args[0].entityUrn
-            == "urn:li:dataJob:(urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod),task2)"
-        )
-
-        assert (
-            mock_emitter.method_calls[4].args[0].aspectName == "dataJobInputOutput"
-        )
-        assert (
-            mock_emitter.method_calls[4].args[0].entityUrn
-            == "urn:li:dataJob:(urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod),task2)"
-        )
-        assert (
-            mock_emitter.method_calls[4].args[0].aspect.inputDatajobs[0]
-            == "urn:li:dataJob:(urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod),task1_upstream)"
-        )
-        assert (
-            mock_emitter.method_calls[4].args[0].aspect.inputDatajobs[1]
-            == "urn:li:dataJob:(urn:li:dataFlow:(airflow,testDag,PROD),testTask)"
-        )
-        assert (
-            mock_emitter.method_calls[4].args[0].aspect.inputDatasets[0]
-            == "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableConsumed,PROD)"
-        )
-        assert (
-            mock_emitter.method_calls[4].args[0].aspect.outputDatasets[0]
-            == "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableProduced,PROD)"
-        )
-
-        assert mock_emitter.method_calls[5].args[0].aspectName == "status"
-        assert (
-            mock_emitter.method_calls[5].args[0].entityUrn
-            == "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableConsumed,PROD)"
-        )
-
-        assert mock_emitter.method_calls[6].args[0].aspectName == "status"
-        assert (
-            mock_emitter.method_calls[6].args[0].entityUrn
-            == "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableProduced,PROD)"
-        )
-
-        assert mock_emitter.method_calls[7].args[0].aspectName == "ownership"
-        assert (
-            mock_emitter.method_calls[7].args[0].entityUrn
-            == "urn:li:dataJob:(urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod),task2)"
-        )
-
-        assert mock_emitter.method_calls[8].args[0].aspectName == "globalTags"
-        assert (
-            mock_emitter.method_calls[8].args[0].entityUrn
-            == "urn:li:dataJob:(urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod),task2)"
-        )
-
-        if capture_executions:
-            assert (
-                mock_emitter.method_calls[9].args[0].aspectName
-                == "dataProcessInstanceProperties"
-            )
-            assert (
-                mock_emitter.method_calls[9].args[0].entityUrn
-                == "urn:li:dataProcessInstance:5e274228107f44cc2dd7c9782168cc29"
-            )
-
-            assert (
-                mock_emitter.method_calls[10].args[0].aspectName
-                == "dataProcessInstanceRelationships"
-            )
-            assert (
-                mock_emitter.method_calls[10].args[0].entityUrn
-                == "urn:li:dataProcessInstance:5e274228107f44cc2dd7c9782168cc29"
-            )
-            assert (
-                mock_emitter.method_calls[11].args[0].aspectName
-                == "dataProcessInstanceInput"
-            )
-            assert (
-                mock_emitter.method_calls[11].args[0].entityUrn
-                == "urn:li:dataProcessInstance:5e274228107f44cc2dd7c9782168cc29"
-            )
-            assert (
-                mock_emitter.method_calls[12].args[0].aspectName
-                == "dataProcessInstanceOutput"
-            )
-            assert (
-                mock_emitter.method_calls[12].args[0].entityUrn
-                == "urn:li:dataProcessInstance:5e274228107f44cc2dd7c9782168cc29"
-            )
-            assert mock_emitter.method_calls[13].args[0].aspectName == "status"
-            assert (
-                mock_emitter.method_calls[13].args[0].entityUrn
-                == "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableConsumed,PROD)"
-            )
-            assert mock_emitter.method_calls[14].args[0].aspectName == "status"
-            assert (
-                mock_emitter.method_calls[14].args[0].entityUrn
-                == "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableProduced,PROD)"
-            )
-            assert (
-                mock_emitter.method_calls[15].args[0].aspectName
-                == "dataProcessInstanceRunEvent"
-            )
-            assert (
-                mock_emitter.method_calls[15].args[0].entityUrn
-                == "urn:li:dataProcessInstance:5e274228107f44cc2dd7c9782168cc29"
-            )
-            assert (
-                mock_emitter.method_calls[16].args[0].aspectName
-                == "dataProcessInstanceRunEvent"
-            )
-            assert (
-                mock_emitter.method_calls[16].args[0].entityUrn
-                == "urn:li:dataProcessInstance:5e274228107f44cc2dd7c9782168cc29"
-            )
+    # TODO: Replace this with a golden file-based comparison.
+    assert mock_emitter.method_calls[0].args[0].aspectName == "dataFlowInfo"
+    assert (
+        mock_emitter.method_calls[0].args[0].entityUrn
+        == "urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod)"
+    )
+
+    assert mock_emitter.method_calls[1].args[0].aspectName == "ownership"
+    assert (
+        mock_emitter.method_calls[1].args[0].entityUrn
+        == "urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod)"
+    )
+
+    assert mock_emitter.method_calls[2].args[0].aspectName == "globalTags"
+    assert (
+        mock_emitter.method_calls[2].args[0].entityUrn
+        == "urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod)"
+    )
+
+    assert mock_emitter.method_calls[3].args[0].aspectName == "dataJobInfo"
+    assert (
+        mock_emitter.method_calls[3].args[0].entityUrn
+        == "urn:li:dataJob:(urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod),task2)"
+    )
+
+    assert mock_emitter.method_calls[4].args[0].aspectName == "dataJobInputOutput"
+    assert (
+        mock_emitter.method_calls[4].args[0].entityUrn
+        == "urn:li:dataJob:(urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod),task2)"
+    )
+    assert (
+        mock_emitter.method_calls[4].args[0].aspect.inputDatajobs[0]
+        == "urn:li:dataJob:(urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod),task1_upstream)"
+    )
+    assert (
+        mock_emitter.method_calls[4].args[0].aspect.inputDatajobs[1]
+        == "urn:li:dataJob:(urn:li:dataFlow:(airflow,testDag,PROD),testTask)"
+    )
+    assert (
+        mock_emitter.method_calls[4].args[0].aspect.inputDatasets[0]
+        == "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableConsumed,PROD)"
+    )
+    assert (
+        mock_emitter.method_calls[4].args[0].aspect.outputDatasets[0]
+        == "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableProduced,PROD)"
+    )
+
+    assert mock_emitter.method_calls[5].args[0].aspectName == "status"
+    assert (
+        mock_emitter.method_calls[5].args[0].entityUrn
+        == "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableConsumed,PROD)"
+    )
+
+    assert mock_emitter.method_calls[6].args[0].aspectName == "status"
+    assert (
+        mock_emitter.method_calls[6].args[0].entityUrn
+        == "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableProduced,PROD)"
+    )
+
+    assert mock_emitter.method_calls[7].args[0].aspectName == "ownership"
+    assert (
+        mock_emitter.method_calls[7].args[0].entityUrn
+        == "urn:li:dataJob:(urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod),task2)"
+    )
+
+    assert mock_emitter.method_calls[8].args[0].aspectName == "globalTags"
+    assert (
+        mock_emitter.method_calls[8].args[0].entityUrn
+        == "urn:li:dataJob:(urn:li:dataFlow:(airflow,test_lineage_is_sent_to_backend,prod),task2)"
+    )
+
+    if capture_executions:
+        assert (
+            mock_emitter.method_calls[9].args[0].aspectName
+            == "dataProcessInstanceProperties"
+        )
+        assert (
+            mock_emitter.method_calls[9].args[0].entityUrn
+            == "urn:li:dataProcessInstance:5e274228107f44cc2dd7c9782168cc29"
+        )
+
+        assert (
+            mock_emitter.method_calls[10].args[0].aspectName
+            == "dataProcessInstanceRelationships"
+        )
+        assert (
+            mock_emitter.method_calls[10].args[0].entityUrn
+            == "urn:li:dataProcessInstance:5e274228107f44cc2dd7c9782168cc29"
+        )
+        assert (
+            mock_emitter.method_calls[11].args[0].aspectName
+            == "dataProcessInstanceInput"
+        )
+        assert (
+            mock_emitter.method_calls[11].args[0].entityUrn
+            == "urn:li:dataProcessInstance:5e274228107f44cc2dd7c9782168cc29"
+        )
+        assert (
+            mock_emitter.method_calls[12].args[0].aspectName
+            == "dataProcessInstanceOutput"
+        )
+        assert (
+            mock_emitter.method_calls[12].args[0].entityUrn
+            == "urn:li:dataProcessInstance:5e274228107f44cc2dd7c9782168cc29"
+        )
+        assert mock_emitter.method_calls[13].args[0].aspectName == "status"
+        assert (
+            mock_emitter.method_calls[13].args[0].entityUrn
+            == "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableConsumed,PROD)"
+        )
+        assert mock_emitter.method_calls[14].args[0].aspectName == "status"
+        assert (
+            mock_emitter.method_calls[14].args[0].entityUrn
+            == "urn:li:dataset:(urn:li:dataPlatform:snowflake,mydb.schema.tableProduced,PROD)"
+        )
+        assert (
+            mock_emitter.method_calls[15].args[0].aspectName
+            == "dataProcessInstanceRunEvent"
+        )
+        assert (
+            mock_emitter.method_calls[15].args[0].entityUrn
+            == "urn:li:dataProcessInstance:5e274228107f44cc2dd7c9782168cc29"
+        )
+        assert (
+            mock_emitter.method_calls[16].args[0].aspectName
+            == "dataProcessInstanceRunEvent"
+        )
+        assert (
+            mock_emitter.method_calls[16].args[0].entityUrn
+            == "urn:li:dataProcessInstance:5e274228107f44cc2dd7c9782168cc29"
+        )
2 changes: 1 addition & 1 deletion metadata-ingestion/build.gradle
@@ -17,7 +17,7 @@ def get_coverage_arg(test_name) {

task checkPythonVersion(type: Exec) {
    commandLine python_executable, '-c',
-        'import sys; assert (3, 11) > sys.version_info >= (3, 7), f"Python version {sys.version_info[:2]} not allowed"'
+        'import sys; assert (3, 11) > sys.version_info >= (3, 8), f"Python version {sys.version_info[:2]} not allowed"'
}

task environmentSetup(type: Exec, dependsOn: checkPythonVersion) {
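The same bounds are easy to check outside Gradle before invoking the build. This is an editor's sketch mirroring the `checkPythonVersion` assertion above, not a script that ships with the repo; the filename `check_python.py` is hypothetical.

```python
# check_python.py (hypothetical helper): mirror of the checkPythonVersion task.
import sys

# After this commit the build accepts 3.8 <= version < 3.11.
if not ((3, 11) > sys.version_info >= (3, 8)):
    raise SystemExit(f"Python version {sys.version_info[:2]} not allowed")
print("Python", ".".join(map(str, sys.version_info[:3])), "is supported")
```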
