
Commit

docs(airflow): update min version for plugin v2 (#11065)
hsheth2 authored Aug 1, 2024
1 parent 66ecfae commit 2369032
16 changes: 6 additions & 10 deletions docs/lineage/airflow.md
@@ -17,7 +17,7 @@ There are two actively supported implementations of the plugin, with different Air

| Approach | Airflow Version | Notes |
| --------- | --------------- | --------------------------------------------------------------------------- |
| Plugin v2 | 2.3.4+ | Recommended. Requires Python 3.8+ |
| Plugin v1 | 2.1+ | No automatic lineage extraction; may not extract lineage if the task fails. |

If you're using Airflow older than 2.1, it's possible to use the v1 plugin with older versions of `acryl-datahub-airflow-plugin`. See the [compatibility section](#compatibility) for more details.
@@ -66,7 +66,7 @@ enabled = True # default
```

| Name | Default value | Description |
| -------------------------- | -------------------- | ---------------------------------------------------------------------------------------- |
| enabled                    | true                 | Whether the plugin should be enabled.                                                     |
| conn_id                    | datahub_rest_default | The name of the DataHub REST connection.                                                  |
| cluster                    | prod                 | Name of the Airflow cluster; equivalent to the `env` of the instance.                     |
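
The options in the table above map to entries in `airflow.cfg`. A minimal sketch, assuming a `[datahub]` section name for the plugin settings (adjust the values for your deployment):

```ini
[datahub]
# Whether the DataHub plugin is enabled.
enabled = True
# Airflow connection holding the DataHub REST endpoint.
conn_id = datahub_rest_default
# Name of the Airflow cluster; equivalent to the DataHub `env` of the instance.
cluster = prod
```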
@@ -132,7 +132,7 @@ conn_id = datahub_rest_default # or datahub_kafka_default
```

| Name | Default value | Description |
| -------------------------- | -------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| enabled                    | true                 | Whether the plugin should be enabled.                 |
| conn_id                    | datahub_rest_default | The name of the DataHub connection you set in step 1. |
| cluster                    | prod                 | Name of the Airflow cluster.                          |
@@ -240,6 +240,7 @@ See this [example PR](https://github.com/datahub-project/datahub/pull/10452) whi
There might be cases where DAGs are removed from Airflow but the corresponding pipelines and tasks still exist in DataHub; we call these `obsolete pipelines and tasks`.

Follow these steps to clean them up from DataHub:

- Create a DAG named `Datahub_Cleanup`, e.g.

```python
@@ -263,8 +264,8 @@ with DAG(
)

```

- Ingest this DAG; it will remove all the obsolete pipelines and tasks from DataHub based on the `cluster` value set in `airflow.cfg`.

## Get all dataJobs associated with a dataFlow

@@ -274,12 +275,7 @@ If you are looking to find all tasks (aka DataJobs) that belong to a specific pi
query {
dataFlow(urn: "urn:li:dataFlow:(airflow,db_etl,prod)") {
childJobs: relationships(
input: { types: ["IsPartOf"], direction: INCOMING, start: 0, count: 100 }
) {
total
relationships {
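
The same query can also be issued programmatically against DataHub's GraphQL endpoint at `/api/graphql`. A minimal stdlib-only sketch; the host, token handling, and the `entity { urn }` selection inside `relationships` are assumptions to adapt to your deployment:

```python
import json
import urllib.request

# The query from this section, with an assumed entity/urn selection
# inside `relationships` (adjust to the fields you need).
GRAPHQL_QUERY = """
query {
  dataFlow(urn: "urn:li:dataFlow:(airflow,db_etl,prod)") {
    childJobs: relationships(
      input: { types: ["IsPartOf"], direction: INCOMING, start: 0, count: 100 }
    ) {
      total
      relationships {
        entity {
          urn
        }
      }
    }
  }
}
"""


def build_request(host="http://localhost:8080", token=None):
    """Build (but do not send) the POST request for DataHub's /api/graphql."""
    req = urllib.request.Request(
        f"{host}/api/graphql",
        data=json.dumps({"query": GRAPHQL_QUERY}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    if token:  # only needed when metadata-service authentication is enabled
        req.add_header("Authorization", f"Bearer {token}")
    return req


def fetch_child_job_urns(host="http://localhost:8080", token=None):
    """Return the URNs of all DataJobs that belong to the DataFlow."""
    with urllib.request.urlopen(build_request(host, token)) as resp:
        data = json.load(resp)
    rels = data["data"]["dataFlow"]["childJobs"]["relationships"]
    return [rel["entity"]["urn"] for rel in rels]
```

Calling `fetch_child_job_urns()` against a running DataHub instance returns the child task URNs; swap in the URN of your own DataFlow as needed.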
