
Invalid dependency graph for tasks #1499

Closed
singhsatnam opened this issue Mar 28, 2024 · 1 comment
Labels
bug Something isn't working

Comments

@singhsatnam

Describe the bug
Creating a dependency between two tasks built with DatabricksTaskOperator() does not use the task_key specified in task_config; instead it uses the form dagName__groupId__taskKey. This is inconsistent with the tasks created on Databricks, which correctly use the specified task_key.
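To make the mismatch concrete, here is a minimal sketch (not the operator's actual code) of the prefixed key format the report describes, using the DAG, group, and task ids from the repro below:

```python
def prefixed_task_key(dag_id: str, group_id: str, task_key: str) -> str:
    """Hypothetical reconstruction of the key format observed in the bug:
    dagName__groupId__taskKey, instead of the plain task_key."""
    return f"{dag_id}__{group_id}__{task_key}"


# The dependency reference the operator appears to emit:
print(prefixed_task_key("dynamic_template", "projectv2", "print_1"))
# → dynamic_template__projectv2__print_1
# The Databricks job, however, registers the task under plain "print_1",
# so the reference does not resolve.
```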

To Reproduce
Steps to reproduce the behavior:

  1. Run the following code with a valid cluster config, updating the paths to point at two notebooks on Databricks (each can simply print "hello").
from airflow.decorators import dag
from astro_databricks.operators.common import DatabricksTaskOperator
from astro_databricks.operators.workflow import DatabricksWorkflowTaskGroup
from pendulum import datetime

DATABRICKS_JOB_CLUSTER_KEY: str = "Airflow_Shared_job_cluster"
DATABRICKS_CONN_ID: str = "databricks_default"

job_cluster_spec: list[dict] = [
    # A valid cluster config
]

@dag(start_date=datetime(2024, 1, 1), schedule=None, catchup=False)
def dynamic_template():
    task_group = DatabricksWorkflowTaskGroup(
        group_id="projectv2",
        databricks_conn_id=DATABRICKS_CONN_ID,
        job_clusters=job_cluster_spec,
    )
    with task_group:
        print_1 = DatabricksTaskOperator(
            task_id="print_1",
            databricks_conn_id=DATABRICKS_CONN_ID,
            job_cluster_key=DATABRICKS_JOB_CLUSTER_KEY,
            task_config={
                "task_key": "print_1",
                "notebook_task": {
                    "notebook_path": "path_to_notebook/print_test1",
                    "source": "WORKSPACE",
                },
            },
        )

        print_2 = DatabricksTaskOperator(
            task_id="print_2",
            databricks_conn_id=DATABRICKS_CONN_ID,
            job_cluster_key=DATABRICKS_JOB_CLUSTER_KEY,
            task_config={
                "task_key": "print_2",
                "notebook_task": {
                    "notebook_path": "path_to_notebook/print_test2",
                    "source": "WORKSPACE",
                },
            },
        )
        print_2.set_upstream(print_1)
dynamic_template()

Expected behavior
This should create a DAG with two tasks - print_1 and print_2 - and print_2 should be dependent on print_1.
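In terms of the payload sent to the Databricks Jobs API (field names from the Jobs 2.1 API), the expected task definition for print_2 would reference print_1 by its plain task_key; the sketch below shows that expectation, with the broken reference noted as a comment:

```python
# Expected Jobs API task definition for print_2: its depends_on entry should
# use the plain task_key "print_1" as given in task_config.
expected_print_2_task = {
    "task_key": "print_2",
    "depends_on": [{"task_key": "print_1"}],
}

# Per the report, the operator instead emits a depends_on entry like
# {"task_key": "dynamic_template__projectv2__print_1"}, which matches no
# task_key in the job and breaks the dependency graph.
print(expected_print_2_task["depends_on"][0]["task_key"])
# → print_1
```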

Screenshots
(two screenshot attachments; not recoverable in text)

Desktop (please complete the following information):

  • OS: macOS Ventura 13.6.1
  • Browser: Firefox
  • Version: 123.0.1
@singhsatnam singhsatnam added the bug Something isn't working label Mar 28, 2024
@singhsatnam
Author

Created bug in the correct repo: astronomer/astro-provider-databricks#71
