

Post hook on workflow_job model #925

Open
talperetz1 opened this issue Jan 31, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@talperetz1

Describe the bug

When running a Python model with the `workflow_job` submission method, the post hook does not run.

Steps To Reproduce

Run a Python model using the `workflow_job` submission method with a post hook configured:

version: 2

models:
  - name: model_name
    config:
      +post_hook: "OPTIMIZE {{ this }}"
      job_cluster_config:
        policy_id: XXX
        data_security_mode: SINGLE_USER
        single_user_name: XXXX
        runtime_engine: STANDARD
        spark_version: 16.1.x-scala2.12
        node_type_id: ....
        driver_node_type_id: .....
        cluster_log_conf:
        num_workers: 4
        # ...
def model(dbt, session):

    dbt.config(submission_method='workflow_job')
    dbt.config(materialized='incremental')
    dbt.config(file_format='delta')
    dbt.config(unique_key=['id'])
    dbt.config(liquid_clustered_by=['id'])
    dbt.config(incremental_strategy='merge')
    dbt.config(on_schema_change='append_new_columns')
    dbt.config(location_root='s3://......')

    # Use the `session` argument (the SparkSession dbt passes in) rather than
    # a global `spark`, which is not guaranteed to be defined in this scope.
    df = session.table("catalog.schema.table_name").where("date = '2025-01-22'")

    return df

Expected behavior

After running the model and writing the data to the target table, I expected the post_hook (in this case the OPTIMIZE command) to run.

Screenshots and log output

(Screenshot: Delta table history of the target table)

You can see in the table history that no OPTIMIZE command (the post_hook command) was run.
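For reference, one way to confirm whether OPTIMIZE ever ran is to scan the Delta table history (`DESCRIBE HISTORY` is standard Delta Lake SQL; the helper function and table name below are purely illustrative, not part of this issue's repro):

```python
# Illustrative helper: given rows from `DESCRIBE HISTORY <table>`, report
# whether any recorded operation was an OPTIMIZE.
def optimize_ran(history_rows):
    # history_rows: list of dicts, one per history entry, each with an
    # "operation" key (e.g. "WRITE", "MERGE", "OPTIMIZE").
    return any(row["operation"] == "OPTIMIZE" for row in history_rows)

# On a cluster one could feed it real history, e.g.:
# rows = [r.asDict() for r in
#         session.sql("DESCRIBE HISTORY catalog.schema.model_name").collect()]
# optimize_ran(rows)
```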

System information

The output of dbt --version:

Core:
  - installed: 1.9.1
  - latest:    1.9.1

Plugins:
  - databricks: 1.9.1 
  - spark:      1.9.0 

The operating system you're using:
Linux

The output of python --version:
3.11.2

Additional context

This is what I observe when running the post_hook (in this case OPTIMIZE) in a Python model with the `workflow_job` submission method.

To create the workflow, you need to run your model using dbt run --select .... In my case, this is executed with a SQL warehouse. This triggers the creation of a workflow, which then starts a job run. The job cluster is created and begins executing the model logic, essentially running the Python code and writing the results to the target table. Once the job run is complete, it finishes with a "success" status.

However, I notice that the dbt run process has not yet finished at that point, because there are still ongoing operations in the SQL warehouse. In fact, I sometimes see the post_hook running within the SQL warehouse, which suggests that the post_hook is not actually part of the workflow job but is instead executed separately on the SQL warehouse. Additionally, the post_hook rarely runs at all.
So it looks inconsistent: you can't rely on the post_hook actually running, given that the vast majority of the time it doesn't.
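Until the hook placement is fixed, a possible workaround (a sketch under assumptions, not an official fix; the `optimize_statement` helper and table name are hypothetical) is to issue OPTIMIZE from inside the model body, so it executes on the workflow job cluster instead of depending on the post_hook:

```python
# Hypothetical workaround sketch: run OPTIMIZE from inside the Python model
# so it executes on the job cluster. Caveat: code in the model body runs
# before dbt writes the returned DataFrame, so for incremental models this
# would optimize the table as it exists from previous runs.

def optimize_statement(table_name):
    # Build the same command the post_hook was supposed to run.
    return f"OPTIMIZE {table_name}"

def model(dbt, session):
    dbt.config(submission_method='workflow_job')
    df = session.table("catalog.schema.table_name").where("date = '2025-01-22'")
    # If the target table already exists (incremental runs), one could call:
    # session.sql(optimize_statement("catalog.schema.model_name"))
    return df
```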

@talperetz1 talperetz1 added the bug Something isn't working label Jan 31, 2025