Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Override list_relations_without_caching #428

Merged

Conversation

jiezhen-chen
Copy link
Contributor

@jiezhen-chen jiezhen-chen commented May 3, 2023

resolves #179

Description

Currently, redshift__list_relations_without_caching leverages postgres__list_relation_without_caching. Since redshift_connector has its own metadata call(get_tables) that does the same thing as postgres__list_relation_without_caching, we are migrating to use the native redshift driver.

Changes were made to override the method list_relations_without_caching directly, instead of changing the underlying macro sql query. Unit tests were also created under test_redshift_adapter, functional tests created in test_adapter_methods.

Because redshift_connector get_tables api call does not support MV, we are switching back to writing hard coded queries to fetch this metadata. Instead of pg_views and pg_tables the new queries use information_schema.views and information_schema.tables.

Checklist

  • I have read the contributing guide and understand what's expected of me
  • I have signed the CLA
  • I have run this code in development and it appears to resolve the stated issue
  • This PR includes tests, or tests are not required/relevant for this PR
  • Docs changes are not required/relevant for this PR
  • I have run changie new to create a changelog entry

@cla-bot cla-bot bot added the cla:yes label May 3, 2023
@jiezhen-chen jiezhen-chen changed the title Override list relations without caching Override list_relations_without_caching May 4, 2023
@jiezhen-chen jiezhen-chen changed the title Override list_relations_without_caching Override list_relations_without_caching to use redshift driver May 4, 2023
@jiezhen-chen jiezhen-chen marked this pull request as ready for review May 4, 2023 19:33
@jiezhen-chen jiezhen-chen requested a review from a team as a code owner May 4, 2023 19:33
@dbeatty10
Copy link
Contributor

What are your thoughts on the default value for database_metadata_current_db_only?

Is there a way to dynamically infer whether the user needs multi-db querying and/or datashare support? It would be awesome if we could have both of these without the user needing to add extra configuration themselves:

  • support for multi-db querying and datashares
  • best performance and lowest cost possible

What would be the performance and/or cost implications to always allowing cross-db querying? e.g., How much extra latency would it add? And how much monetary cost?

@jiezhen-chen jiezhen-chen changed the title Override list_relations_without_caching to use redshift driver Override list_relations_without_caching to use information_schema Jul 12, 2023
@jiezhen-chen jiezhen-chen changed the title Override list_relations_without_caching to use information_schema Override list_relations_without_caching Jul 12, 2023
@colin-rogers-dbt
Copy link
Contributor

colin-rogers-dbt commented Jul 12, 2023

Is there a way to dynamically infer whether the user needs multi-db querying and/or datashare support?

Since dbt is only creating tables in the target database the only time we would need this set would be if the user wanted to refer to a source that was outside the database. We could have this set based on the source config

In any case I'd suggest we split this out into it's own PR as I don't think it's necessary to the rest of the PR right?

@mikealfare mikealfare merged commit 92dbeb6 into dbt-labs:main Jul 13, 2023
34 checks passed
@rarup1
Copy link

rarup1 commented Jul 13, 2023

Great that this has been merged into main! @mikealfare @jiezhen-chen which dbt-redshift release will this be available in? We are operating on dbt 1.3. Do you know if I will need to upgrade that to 1.5 before taking advantage of this upgrade?

@dbeatty10
Copy link
Contributor

@rarup1 currently, this is slated to require dbt 1.6, which will have its final release in about two weeks.

@dbeatty10
Copy link
Contributor

@jiezhen-chen the changelog entry indicated that this issue resolves #17, so I added resolves dbt-labs/dbt-core#17 to formally link this PR as closing that issue. And also added this comment: #17 (comment).

But maybe I shouldn't have done either!

Now that I'm looking at this implementation more closely, I'm not seeing where external tables are covered. I wonder if support for external tables did exist in this PR when get_tables was included, but that support was dropped when switching over to information_schema views in ced9cd1?

If this PR does not resolve dbt-labs/dbt-core#17, then we'll want to do a couple follow-up items:

  • Remove resolves dbt-labs/dbt-core#17 from this PR description
  • Re-open #17
  • Within a new PR, update the original changelog entry for this PR so it doesn't include #17

Bonus follow-up items:

  • add a note to #17 (and/or #273) indicating the recommended approach to add support for external tables
  • sketch out ideas how to add test cases to cover external tables

@jiezhen-chen
Copy link
Contributor Author

@dbeatty10 You're correct. Because we've pivoted our solution to NOT use get_tables, this change won't solve issue 17 and 217. I'm sorry I didn't update the PR with our latest decision prior to the merge. I'll open a new PR to update the changelog to reflect this. Thanks!

@dbeatty10
Copy link
Contributor

No prob @jiezhen-chen. Thank you for updating that changelog entry to reflect the relevant issue in dbt-labs/dbt-core#535, and thanks to @dataders for re-opening dbt-labs/dbt-core#17 and dbt-labs/dbt-core#217 👍

abbywh pushed a commit to abbywh/dbt-redshift that referenced this pull request Oct 11, 2023
* add support for datashared objects and external tables for redshift list_relation_without_caching

* changie

* refactor inner joins to cte

* shorten list_relations_without_caching query

* add database filter to get_columns_in_relation to only query inbound datashared objects of a connected database

* leverage redshift_conn method for list_schemas

* add list_schemas method

* migrate from macros to redshift driver

* add database_metadata_current_db_only to unit tests

* override get_columns_in_relation

* cast relation types to lower case

* add functional test to ensure list_relations_without_chacing works

* revert changes in macros adapter.sql

* revert changes in macros adapter.sql

* revert changes in macros adapter.sql

* update test_adapter_methods with project test

* remove snowflake comment in test_adapter_methods

Co-authored-by: Doug Beatty <[email protected]>

* unpack driver metadata results with dictionary

* refactor parse_relation_results to its own method

* replace dict[] with Dict[]

Co-authored-by: colin-rogers-dbt <[email protected]>

* correct unit tests and functional tests

* remove irrelevant comment

* set current_db_only to default true

* remove unused import

* remove overriding list_relations_without_caching and rewrite the sql query

* update unit tests

* remove overriding list_relations_without_caching and rewrite the sql query

* remove unused tests

* add schema filter

---------

Co-authored-by: Anders <[email protected]>
Co-authored-by: Doug Beatty <[email protected]>
Co-authored-by: colin-rogers-dbt <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[CT-1122] [CT-1116] [Bug] adapter.get_relation for relations in other databases [RA3]
9 participants