Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CT-2819] default__alter_relation_add_remove_columns macro does not use quoting with case sensitive Snowflake relation #256

Draft
wants to merge 26 commits into
base: main
Choose a base branch
from

Conversation

McKnight-42
Copy link
Contributor

resolves #250
docs dbt-labs/docs.getdbt.com/#

Problem

Try to fix issue around case sensitive column quoting in macro default__alter_relation_add_remove_columns

Solution

update the dbt-adapters default implementation test against postgres and other adapted as needed to maintain compaitiablity, update dbt-snowflake implementation of macro

create test in dbt-tests-adapters and inherit in adapters as needed.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have run this code in development, and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • This PR has no interface changes (e.g. macros, cli, logs, json artifacts, config files, adapter interface, etc.) or this PR has already received feedback and approval from Product or DX

@McKnight-42 McKnight-42 self-assigned this Jul 1, 2024
@cla-bot cla-bot bot added the cla:yes label Jul 1, 2024
Copy link

github-actions bot commented Jul 1, 2024

Thank you for your pull request! We could not find a changelog entry for this change. For details on how to document a change, see the contributing guide.

@McKnight-42
Copy link
Contributor Author

McKnight-42 commented Jul 2, 2024

NOTES: usage across adapters

Adapters that use own implementation of alter_relation_add_remove_columns (defined in dot-adapters)
- Dbt-snowflake
- Dbt-redshift
- Dbt-spark

Adapters that would use default if at all (not referenced in either adapter directly)
- Dbt-postgres
- Dbt-bigquery

@@ -123,11 +123,11 @@
alter {{ relation.type }} {{ relation.render() }}

{% for column in add_columns %}
add column {{ column.name }} {{ column.data_type }}{{ ',' if not loop.last }}
add column {{ adapter.quote(column.name) }} {{ column.data_type }}{{ ',' if not loop.last }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we always want to quote? Or is there a config that we need to condition on, similar to the quote policy for relation names?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think adapter.quote leverages the same quote policy as relation names

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mikealfare are you thinking something like this (for relations) or this (for columns)?

In this particular case, I think we need to match the quoting from here (which would mean always quoting via adapter.quote()). I forget the exact details -- and I could be wrong! -- but there's some notes under the "Root Cause" toggle in #250 (comment).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: here are the primary resources I've been using when researching quoting or case-sensitive issues:

Copy link
Contributor Author

@McKnight-42 McKnight-42 Jul 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dbeatty10 oh these are really cool.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colin-rogers-dbt adapter.quote() will always quote.

@dbeatty10 If the objective is to always quote the column in this case, that's fine. I just know we've run into issues before where we quote something that doesn't get quoted, or vice versa, and then it fails a condition in a filter where it's supposed to pass. I haven't seen it with columns, so maybe it doesn't apply here.

assert "Job" in logs


class TestIncrementalCaseSenstivityOnSchemaChange(BaseIncrementalCaseSenstivityOnSchemaChange):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need this class. The reason TestIncrementalOnSchemaChange is there is to preserve backwards compatibility. These tests used to run when testing dbt-postgres. Now that that's in its own repo and explicitly implements tests from dbt-tests-adapter, we no longer need to create these classes here.


-- add a non-zero version to the end of the command to get a different version:
-- --vars "{'version': 1}"
select {{ dbt.current_timestamp() }} as inserted_at, 'john' as Name, 'engineer' as "Job"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if using adapter.quote and/or string_literal would enable this test to work across more adapters without modification?

Suggested change
select {{ dbt.current_timestamp() }} as inserted_at, 'john' as Name, 'engineer' as "Job"
select {{ dbt.current_timestamp() }} as inserted_at, {{ dbt.string_literal("john") }} as Name, {{ dbt.string_literal("engineer") }} as {{ adapter.quote("Job") }}

@@ -303,3 +303,34 @@

from source_data
"""

_MODELS__SRC_ARTISTS = """
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: The content of the model looks like "jobs" now instead of "artists". So ideally they'd match instead of being different.

Comment on lines 97 to 110
def test_run_incremental_check_quoting_on_new_columns(self, project):
select = "src_artists dim_artists"
run_dbt(["run", "--models", select, "--full-refresh"])
run_dbt(["show", "--inline", "select * from {{ ref('dim_artists') }}"])
res, logs = run_dbt_and_capture(
["show", "--inline", "select * from {{ ref('dim_artists') }}"]
)
assert "Job" not in logs
run_dbt(["run", "--vars", "{'version': 1}"])
run_dbt(["show", "--inline", "select * from {{ ref('dim_artists') }}"])
res, logs = run_dbt_and_capture(
["show", "--inline", "select * from {{ ref('dim_artists') }}"]
)
assert "Job" in logs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The run_dbt(["show" ... portions probably aren't necessary. They were just so people could see the state of the data within this reprex: #250 (comment)

Instead of the assertions here, you might be able to just rely upon expect_pass=True as the default for run_dbt and run_dbt_and_capture.

From the original bug report in #250, my understanding was that it actually produced an error.

@McKnight-42 McKnight-42 removed their assignment Aug 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
4 participants