Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug] Delete+insert incremental strategy doesn't support incremental predicates when a list of unique keys is used #230

Open
2 tasks done
rlh1994 opened this issue May 24, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@rlh1994
Copy link

rlh1994 commented May 24, 2024

Is this a new bug?

  • I believe this is a new bug
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

This is probably halfway between a bug and a feature, I can't find documented anywhere that this isn't supported, and the docs suggest it should be possible, but it's also not something that I guess has ever worked.

The delete+insert strategy when a list of unique keys is provided uses the delete from X using Y syntax and then attaches the predicates further down in the condition. However, because the predicate may reference a column that exists in both tables (which in most cases it would), this fails due to an ambiguous column reference (at least on redshift+postgres, i haven't yet tested other warehouses).

https://github.com/dbt-labs/dbt-adapters/blob/a2292c8ec76e4c9f03869ad95817a2ad82dfb34b/dbt/include/global_project/macros/materializations/models/incremental/merge.sql#L65:L77

The docs suggest that you can use DBT_INTERNAL_DEST and DBT_INTERNAL_SOURCE values for the merge strategy, but there is no equivalent for the delete+insert. This means it is not possible to use incremental predicates with the delete+insert strategy with a list-based unique key.

I do acknowledge that this is largely due to the fact that aliasing tables in a delete statement is in many cases not actually supported, however I believe there are potential solutions to this (none of which are perfect) discussed below.

Expected Behavior

I would expect the delete+insert strategy to support some way to use incremental predicates while using a list-based unique key field.

One way this could be achieved would be to still support the use of DBT_INTERNAL_DEST and DBT_INTERNAL_SOURCE in the incremental predicates, but to replace these with the absolute target and source table names at compile time in the macro before returning the SQL. This isn't ideal as you're actively editing the predicates that were provided, but would ensure a consistent behaviour between strategies.

Steps To Reproduce

  1. New project, add a seed file called my_seed.csv with the following data
    run,id,id2,start_tstamp
    1,1,1,2021-01-01 00:00:00
    1,2,2,2021-03-01 00:00:00
    1,2,2,2021-03-03 00:00:00
    1,3,3,2021-03-03 00:00:00
    1,4,4,2021-03-04 00:00:00
    2,1,1,2021-03-05 00:00:00
    2,2,2,2021-03-02 00:00:00
    2,3,3,2021-03-07 00:00:00
    2,5,5,2021-03-08 00:00:00
    
  2. Create a model called my_modelsql` with the following content
    {{
      config(
        materialized='incremental',
        incremental_strategy='delete+insert',
        unique_key=['id', 'id2'],
        upsert_date_key='start_tstamp',
        incremental_predicates= ["start_tstamp > date '2021-01-01'"]
      )
    }}
    
    with data as (
      select * from {{ ref('my_seed') }}
    )
    
    {% if is_incremental() %}
    
      select
        id,
        id2,
        start_tstamp
    
      from data
      where run = 2
    
    {% else %}
    
      select
        id,
        id2,
        start_tstamp
    
      from data
      where run = 1
    
    {% endif %}
  3. dbt seed, dbt run, dbt run again. On the second run you should get the following error
    column reference "start_tstamp" is ambiguous
      LINE 17:                         and start_tstamp > date '2021-01-01'
                                           ^
      compiled Code at target/run/dbt_demo/models/my_models.sql
    

Relevant log output

No response

Environment

- OS: MacOSx
- Python: 3.9.14
- dbt-adapter: postgres 1.7.11 (although this issue likely applied in other warehouses)

Additional Context

No response

@rlh1994 rlh1994 added bug Something isn't working triage labels May 24, 2024
@rlh1994 rlh1994 changed the title [Bug] Delete+insert incremental strategy doesn't support incremental predicated when a list of unique keys is used [Bug] Delete+insert incremental strategy doesn't support incremental predicates when a list of unique keys is used May 28, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants