

Post hook on workflow_job model #925

Open
talperetz1 opened this issue Jan 31, 2025 · 0 comments
Labels
bug Something isn't working

Comments

@talperetz1

Describe the bug

When running a Python model with the `workflow_job` submission method, the post hook does not run.

Steps To Reproduce

Run a Python model using the `workflow_job` submission method with a post hook configured:

version: 2

models:
  - name: model_name
    config:
      +post_hook: "OPTIMIZE {{ this }}"
      job_cluster_config:
        policy_id: XXX
        data_security_mode: SINGLE_USER
        single_user_name: XXXX
        runtime_engine: STANDARD
        spark_version: 16.1.x-scala2.12
        node_type_id: ....
        driver_node_type_id: .....
        cluster_log_conf:
        num_workers: 4
        # ...
def model(dbt, session):

    dbt.config(submission_method='workflow_job')
    dbt.config(materialized='incremental')
    dbt.config(file_format='delta')
    dbt.config(unique_key=['id'])
    dbt.config(liquid_clustered_by=['id'])
    dbt.config(incremental_strategy='merge')
    dbt.config(on_schema_change='append_new_columns')
    dbt.config(location_root='s3://......')

    # Use the `session` argument (the SparkSession dbt passes in) rather than
    # a global `spark`, which is not guaranteed to be defined in this scope.
    df = session.table("catalog.schema.table_name").where("date = '2025-01-22'")

    return df

Expected behavior

After running the model and writing the data to the target table, I expected the post_hook (in this case the OPTIMIZE command) to run.

Screenshots and log output

(Screenshot: Delta table history of the target table)

You can see in the table history that no OPTIMIZE command (the post_hook command) was run.
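For reference, one way to confirm whether OPTIMIZE ever ran is to scan the Delta table history (`DESCRIBE HISTORY` is standard Delta Lake SQL; the helper function and table name below are purely illustrative, not part of this issue's repro):

```python
# Illustrative helper: given rows from `DESCRIBE HISTORY <table>`, report
# whether any recorded operation was an OPTIMIZE.
def optimize_ran(history_rows):
    # history_rows: list of dicts, one per history entry, each with an
    # "operation" key (e.g. "WRITE", "MERGE", "OPTIMIZE").
    return any(row["operation"] == "OPTIMIZE" for row in history_rows)

# On a cluster one could feed it real history, e.g.:
# rows = [r.asDict() for r in
#         session.sql("DESCRIBE HISTORY catalog.schema.model_name").collect()]
# optimize_ran(rows)
```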

System information

The output of dbt --version:

Core:
  - installed: 1.9.1
  - latest:    1.9.1

Plugins:
  - databricks: 1.9.1 
  - spark:      1.9.0 

The operating system you're using:
Linux

The output of python --version:
3.11.2

Additional context

This is what I observe when running the post_hook (in this case OPTIMIZE) in a Python model with the `workflow_job` submission method.

To create the workflow, you need to run your model using dbt run --select .... In my case, this is executed with a SQL warehouse. This triggers the creation of a workflow, which then starts a job run. The job cluster is created and begins executing the model logic, essentially running the Python code and writing the results to the target table. Once the job run is complete, it finishes with a "success" status.

However, I notice that the dbt run process has not yet finished at that point, because there are still ongoing operations in the SQL warehouse. In fact, I sometimes see the post_hook running within the SQL warehouse, which suggests that the post_hook is not actually part of the workflow job but is instead executed separately on the SQL warehouse. Additionally, the post_hook rarely runs at all.
So it looks inconsistent: you can't rely on the post_hook actually running, given that the vast majority of the time it doesn't.
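Until the hook placement is fixed, a possible workaround (a sketch under assumptions, not an official fix; the `optimize_statement` helper and table name are hypothetical) is to issue OPTIMIZE from inside the model body, so it executes on the workflow job cluster instead of depending on the post_hook:

```python
# Hypothetical workaround sketch: run OPTIMIZE from inside the Python model
# so it executes on the job cluster. Caveat: code in the model body runs
# before dbt writes the returned DataFrame, so for incremental models this
# would optimize the table as it exists from previous runs.

def optimize_statement(table_name):
    # Build the same command the post_hook was supposed to run.
    return f"OPTIMIZE {table_name}"

def model(dbt, session):
    dbt.config(submission_method='workflow_job')
    df = session.table("catalog.schema.table_name").where("date = '2025-01-22'")
    # If the target table already exists (incremental runs), one could call:
    # session.sql(optimize_statement("catalog.schema.model_name"))
    return df
```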

@talperetz1 talperetz1 added the bug Something isn't working label Jan 31, 2025