This repository has been archived by the owner on May 17, 2024. It is now read-only.

Incorrect prod_database being used for --dbt diffs in v0.7.8 and beyond #642

Closed
mmoyer-pax opened this issue Jul 17, 2023 · 4 comments
Labels: awaiting_response, bug (Something isn't working), stale (Issues/PRs that have gone stale)

Comments

@mmoyer-pax

Describe the bug
I'm using the --dbt flag with Snowflake. Versions 0.7.8 and later diff against the wrong database instead of the prod_database configured in dbt_project.yml. This works correctly in v0.7.7 and earlier.

Useful info:

  • The command or code I used: data-diff --dbt
  • dbt_project.yml configuration:
vars:
  data_diff:
    prod_database: prod_pdp # ***NOTE THIS VALUE***
    prod_schema: analytics
    prod_custom_schema: ANALYTICS_<custom_schema>
  • The run output I'm getting:
Running with data-diff=0.7.8 (Update 0.7.14 is available!)

TEST_PDP.ANALYTICS_BASE.PAXOS_TOKENS_MKT_CAP_DAILY <> TEST_PDP.ANALYTICS_BASE.PAXOS_TOKENS_MKT_CAP_DAILY 
No row differences
- Notice above that it is using `TEST_PDP` instead of the correct prod database of `PROD_PDP`

Likely cause:
In the v0.7.7 --> v0.7.8 comparison, I see a change in how prod_database is set (lines 141-147), which I suspect is the cause: v0.7.7...v0.7.8#diff-8a5550cee0de299f3906d3b40b82bec419ce23c065b04923c3638906b21bda70R141-R147

@mmoyer-pax mmoyer-pax added the bug Something isn't working label Jul 17, 2023
@mimoyer21

I'm on dbt v1.5.2.

@dlawin
Contributor

dlawin commented Jul 19, 2023

Heya @mimoyer21, this was changed in order to support the use of dbt's custom database configs:

{% macro generate_database_name(custom_database_name=none, node=none) -%}

    {%- set default_database = target.database -%}
    {%- if custom_database_name is none -%}

        {{ default_database }}

    {%- else -%}

        {{ custom_database_name | trim }}

    {%- endif -%}

{%- endmacro %}

data-diff's config-based setup follows logic similar to dbt's macro above: `prod_database` represents the default database for the prod target, and it is overridden whenever a custom database is set in a model config or in dbt_project.yml.
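
To make the override rule concrete, here is a minimal Python sketch of the same logic as the macro above. It only illustrates the behaviour described in this comment and is not data-diff's actual code; `resolve_database` and its parameter names are hypothetical, and the database values are examples.

```python
from typing import Optional


def resolve_database(default_database: str, custom_database: Optional[str] = None) -> str:
    """Hypothetical helper mirroring generate_database_name:
    a custom database, when present, wins over the default."""
    if custom_database is None:
        return default_database
    # Jinja's `trim` filter strips surrounding whitespace; str.strip() does the same.
    return custom_database.strip()


# No custom database on the model: the prod side falls back to prod_database.
print(resolve_database("prod_pdp"))
# A custom database from a model config or dbt_project.yml overrides prod_database.
print(resolve_database("prod_pdp", " test_pdp "))
```

Under this rule, a `+database` value that renders to the current (non-prod) target's database would replace `prod_database` on the prod side as well, which would be consistent with both sides of the diff showing the same database.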

I'm a little hesitant to change this, as we'll never be able to cover all of the edge cases.

Continuing our Slack convo:

models:
  +database: |
      {%- if target.name == "prod" -%} prod_db
      {%- elif target.name == "test" -%} test_db
      {%- else -%} dev_prod_db
      {%- endif -%}

Is it possible in your setup to set this in the profile targets instead? I think that would be more in line with dbt's intent.

For example:

snowflake-profile:
  target: dev
  outputs:
    dev:
      type: snowflake
      database: dev_prod_db
      ... etc
    test:
      type: snowflake
      database: test_db
      ... etc
    prod:
      type: snowflake
      database: prod_db
      ... etc

@dlawin dlawin self-assigned this Jul 19, 2023
@github-actions
Contributor

This issue has been marked as stale because it has been open for 60 days with no activity. If you would like the issue to remain open, please comment on the issue and it will be added to the triage queue. Otherwise, it will be closed in 7 days.

@github-actions github-actions bot added the stale Issues/PRs that have gone stale label Sep 18, 2023
@github-actions
Contributor

Although we are closing this issue as stale, it's not gone forever. Issues can be reopened if there is renewed community interest. Just add a comment and it will be reopened for triage.

@github-actions github-actions bot closed this as not planned Won't fix, can't repro, duplicate, stale Sep 25, 2023

3 participants