
[ADAP-979] [Bug] Database Error: Only user perm tables supported in producer views #656

Open
2 tasks done
Tracked by #742
alison985 opened this issue Nov 2, 2023 · 13 comments
Labels
bug (Something isn't working) · ra3_node (issues relating to ra3 node support)

Comments

@alison985

alison985 commented Nov 2, 2023

Is this a new bug in dbt-redshift?

  • I believe this is a new bug in dbt-redshift
  • I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

When the dbt Cloud connection is set to database "development" and your dbt_project.yml file states that all model nodes belong in the database "sources", then dbt compile and dbt build will not work.

Expected Behavior

The database specified in dbt_project.yml for a model is used as written. At the very least, when using RA3 nodes and a database user that has the necessary permissions, dbt should be able to run dbt compile successfully regardless of the database specified in the Connection.

Steps To Reproduce

  1. Have a Redshift cluster with RA3 nodes. Have a database connection to database A. Have a Redshift db user with, at least, database permissions to database A and B. Have a dbt_project.yml file that specifies +database: B.
  2. Run dbt compile.
  3. Receive error message.

Relevant log output

22:19:23  Began running node model.redshift_us_east_1.clean_table_coconut
22:19:23  Re-using an available connection from the pool (formerly list_sources_dbt_developer_contracts, now model.redshift_us_east_1.clean_table_coconut)
22:19:23  Began compiling node model.redshift_us_east_1.clean_table_coconut
22:19:23  Using redshift connection "model.redshift_us_east_1.clean_table_coconut"
22:19:23  On model.redshift_us_east_1.clean_table_coconut: BEGIN
22:19:23  Opening a new connection, currently in state closed
22:19:23  Redshift adapter: Establishing connection using ssl with `sslmode` set to 'prefer'.To connect without ssl, set `sslmode` to 'disable'.
22:19:23  Redshift adapter: Connecting to redshift with username/password based auth...
22:19:23  SQL status: SUCCESS in 0.0 seconds
22:19:23  Using redshift connection "model.redshift_us_east_1.clean_table_coconut"
22:19:23  On model.redshift_us_east_1.clean_table_coconut: /* {"app": "dbt", "dbt_version": "1.5.9", "profile_name": "user", "target_name": "default", "node_id": "model.redshift_us_east_1.clean_table_coconut"} */
SELECT *
    FROM sources.information_schema.tables 
    WHERE table_name = 'clean_table_coconut' and table_schema = 'dbt_developer_transforms'
22:19:23  Redshift adapter: Redshift error: Only user perm tables supported in producer views
22:19:23  On model.redshift_us_east_1.clean_table_coconut: ROLLBACK
22:19:23  Timing info for model.redshift_us_east_1.clean_table_coconut (compile): 22:19:23.485015 => 22:19:23.666072
22:19:23  On model.redshift_us_east_1.clean_table_coconut: Close
22:19:23  Database Error in model clean_table_coconut (models/upstream_prepared/sqlserverdbname_dbo/clean_table_coconut.sql)
  Only user perm tables supported in producer views
22:19:23  Finished running node model.redshift_us_east_1.clean_table_coconut

Environment

- OS: dbt Cloud
- Python: dbt Cloud
- dbt-core: 1.5
- dbt-redshift: dbt Cloud

Additional Context

Why can't I just change the database name in the dbt Cloud Connection to be correct? Because that will generate a new SSH key set, which means the public key has to get PRed by DevOps into the right place, which will take an unspecified amount of time. (So yes, I'm going to go through that process, but it would also be great for this to be fixed so someone else doesn't have to deal with it. Also, supporting RA3 nodes in general.)

Why? Well it's not any of these:
A) User permissions. The "Only user perm tables supported in producer views" in the error message implies it's a permission issue. So I explicitly permissioned the user account for allllll the things - usage, select, alter default privileges on tables and schemas - and added it to groups with at least SELECT permissions on the same (a sketch of these grants follows this list). No go. This database user also created and owns the database it's referencing and all the objects it's trying to read information about. In addition, the database user in my dbt Cloud profile settings is a superuser. (Technically, it doesn't own this particular schema.)
B) It occurred to me that the dbt-redshift adapter should be doing something like USE DATABASE sources, but a) a web search suggests Redshift doesn't have a USE keyword, and b) dbt is correctly writing the database name sources in the query it runs against the information_schema.
C) Maybe it was that Redshift doesn't support cross-database work - unless you use RA3 nodes, in which case it supports SELECTs. These are RA3 nodes, so it should be able to run a SELECT against the sources database's information_schema.
D) The database connection just doesn't work. Incorrect. i) dbt seed runs fine and creates 3 tables, albeit in the wrong place - database apple. This is expected - RA3 nodes don't support cross-database writes. (Specifying database B for seed writes also results, as expected, in an error.) ii) There's a SUCCESS message for the connection in the log.
E) A one-off fluke. No, this happened multiple times.
F) Something about whether an object exists or not. I received the error whether the object I was trying to refresh already existed or didn't (deleted manually outside of dbt).
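To make (A) concrete, here is a minimal sketch of the kind of grants described above; the user and schema names are placeholders, not the actual objects from this report:

-- Hypothetical Redshift grants along the lines described in (A);
-- "dbt_user" and "dbt_developer_transforms" are placeholder names.
grant usage on schema dbt_developer_transforms to dbt_user;
grant select on all tables in schema dbt_developer_transforms to dbt_user;
alter default privileges in schema dbt_developer_transforms
    grant select on tables to dbt_user;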

You can possibly argue what the bug is/bugs are, but something doesn't work correctly. Possible interpretations of this bug are:

  1. Redshift itself has a problem.
  2. dbt-core is missing some code/syntax around supporting RA3 nodes and/or cross-database reads.
  3. dbt-core could return a better error message.
  4. dbt Cloud Connection page could have a warning about what database name you put in.
  5. User error. I'm completely missing something.
  6. Something else.
@alison985 alison985 added bug Something isn't working triage labels Nov 2, 2023
@github-actions github-actions bot changed the title [Bug] Using a different database than the dbt Cloud Connection [ADAP-979] [Bug] Using a different database than the dbt Cloud Connection Nov 2, 2023
@dbeatty10
Contributor

Thanks for the detailed write-up @alison985!

I'm wondering if we can reproduce the essence of your experience outside of dbt Cloud. Are the following files a reasonable representation of the setup and experience you are describing?

Files

profiles.yml

sandcastle-redshift:
  target: connection_a
  outputs:

    connection_a:
      type: redshift
      dbname: A
      ra3_node: true
      # ...

    connection_b:
      type: redshift
      dbname: B
      ra3_node: true
      # ...

dbt_project.yml

name: "my_project"
version: "1.0.0"
config-version: 2
profile: "sandcastle-redshift"

models:
  my_project:
    +database: B

models/my_model.sql

select 1 as id

Commands

This works?

dbt run -s my_model --target connection_b

But this doesn't?

dbt run -s my_model --target connection_a

@alison985
Author

Hi @dbeatty10 ! That file setup is correct for what I had, except for the ra3_node: true, and except that I switched to testing with dbt compile since that theoretically has the least amount of database dependency.

I have been using dbt-core locally with this Redshift cluster for weeks, so I can do the outside dbt Cloud test.

Yes, it works fine with connection_b. (relevant database B in profiles.yml)

Yes, it doesn't work with connection_a. (non-relevant database A in profiles.yml) I get this error message locally:

04:16:43  Encountered an error:
ERROR: Cross-db references allowed only in RA3.* node. (sources vs data_platform_development)

When I add ra3_node: true with non-relevant database A in profiles.yml I get this error message:

Runtime Error
  Database Error in model clean_table_coconut (models/upstream_prepared/sqlserverdbname_dbo/clean_table_coconut.sql)
    Only user perm tables supported in producer views

When I keep ra3_node: true and switch back to relevant database B, it compiles successfully.

FWIW, locally I'm on dbt-core v1.4.8 and dbt-redshift v1.4.0. The dbt_project.yml file says require-dbt-version: [">=1.4.6", "<1.6.0"], which allows for local use of a SQL Server connection. Because the dbt Cloud project is configured to 1.5, the same code runs on dbt v1.5 in the cloud.

@dbeatty10
Contributor

Nice research @alison985

I haven't seen this error before and didn't have much luck with Google either, so we might need to ask some folks from AWS what it means:

    Only user perm tables supported in producer views

In the meantime, could you try installing dbt-redshift 1.7.0 and setting up a brand new standalone project like this to see if you get that same error?

@alison985
Author

@dbeatty10 , I can't upgrade the code base to 1.7 because then I would lose the ability to function against SQL Server.

@dbeatty10
Contributor

Could you help me understand how SQL Server and Redshift are related to each other in this case?

Is there some way by which you are connecting to a SQL Server database from within a single dbt project that uses the dbt-redshift adapter?

Or do you have both the dbt-redshift adapter and the dbt-sqlserver adapter installed at the same time with a shared version of dbt-core? And you have totally different dbt projects that use each of those adapters separately?

Virtual environment for dbt-redshift 1.7

Assuming that you have a separate dbt project that only uses Redshift (and doesn't use SQL Server at all), could you try to install 1.7 within a virtual environment like this? The details will vary based on your operating system and shell. If you let me know your operating system, I can try coming up with custom instructions for you.

These are the instructions for macOS using zsh or bash:

python3 -m venv redshift_1.7
source redshift_1.7/bin/activate
python3 -m pip install --upgrade pip
python3 -m pip install --upgrade --pre dbt-redshift~=1.7.0.dev0 dbt-postgres~=1.7.0.dev0 dbt-core~=1.7.0.dev0
source redshift_1.7/bin/activate
dbt --version

👉 Then could you try setting up a brand new standalone project like #656 (comment) and see if you get that same error?

When you are all done using this virtual environment, you can get out of it by running:

deactivate

Determining your python command

You can run the following to find your python command that points to Python 3 (rather than Python 2):

python3 --version
python --version

If both of them say something like Python 3.x.x, then you can use either interchangeably.

@dbeatty10 dbeatty10 self-assigned this Nov 7, 2023
@alison985
Author

dbt-sqlserver shouldn't have anything to do with it. I was trying to say I can't upgrade my dbt-core version until dbt-sqlserver supports dbt-core versions > 1.4.8. Or at least, I'm not going to attempt adding it to the complexity of the project.

I have 1 dbt code repo (and dbt_project.yml). Locally, within 1 Docker container of dbt-core, I have the dbt-sqlserver and dbt-redshift adapters. There are 2 different profiles in profiles.yml, one for each dialect. I can switch back and forth through environment variables. I've done enough abstraction, macros, variables, and changes that the dbt code will run on either dialect.
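As a rough illustration of that kind of abstraction (not the actual macros from this project), a cross-dialect helper might use adapter.dispatch so the same model SQL compiles on both Redshift and SQL Server; the macro name and the concatenation example are hypothetical:

{% macro concat_two(a, b) %}
  {{ return(adapter.dispatch('concat_two')(a, b)) }}
{% endmacro %}

{# default implementation, picked up by Redshift #}
{% macro default__concat_two(a, b) %}
  {{ a }} || {{ b }}
{% endmacro %}

{# SQL Server concatenates strings with + #}
{% macro sqlserver__concat_two(a, b) %}
  {{ a }} + {{ b }}
{% endmacro %}

A model can then call {{ concat_two('first_name', 'last_name') }} and compile on either adapter.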

Whether using the Redshift adapter and Redshift development endpoint on a) dbt-core v1.4.8 locally and/or b) dbt Cloud v1.5 with 1 dbt Cloud project, dbt cannot handle reading across databases on RA3 nodes for at least some operations (e.g. compile).

Assuming that you have a separate dbt project that only uses Redshift (and doesn't use SQL Server at all),

I don't. I was already set up to do the outside-dbt-Cloud test you requested, and I was happy to do it. Unfortunately, I won't have time to create the separate project test you're describing.

I can report that dbt Cloud works now that the database name in the connection string is changed to the database name I'll be working in and the dbt Cloud generated public key is installed on our side.

@jiezhen-chen
Contributor

Hi @alison985 , it looks like you're trying to run a dbt project with models on one database while the connection is made to another database. Is that correct? You're right that RA3 nodes support SELECTs on another database, but CREATEs are not supported. I think the issue here is that the models being run in your project will attempt to run CREATE TABLE or CREATE VIEW on database B while connected to database A. Your project will be able to do SELECTs in database B using the database A connection, but cannot do write operations. This is also why dbt seed created tables in database A. If you only want to read from database B using connection_a, consider using other dbt objects, such as a macro, that won't attempt to create anything in the unconnected database.
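If the goal is read-only access to database B over the connection to database A, a minimal sketch of such a macro might look like the following (hypothetical names; it assumes an existing table in database B). run_query only issues a SELECT, so nothing is created in the unconnected database:

{% macro peek_at_database_b() %}
  {% if execute %}
    {# read-only, cross-database SELECT; nothing is created in database_b #}
    {% set results = run_query("select count(*) from database_b.some_schema.some_table") %}
    {{ log("rows in database_b.some_schema.some_table: " ~ results.columns[0].values()[0], info=true) }}
  {% endif %}
{% endmacro %}

This could be invoked with dbt run-operation peek_at_database_b.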

@dbeatty10 dbeatty10 changed the title [ADAP-979] [Bug] Using a different database than the dbt Cloud Connection [ADAP-979] [Bug] Database Error: Only user perm tables supported in producer views Nov 7, 2023
@dbeatty10
Contributor

@jiezhen-chen thanks for jumping in with your expertise here! 🧠

In this case, it sounds like @alison985 is only doing a dbt compile with a project similar to this and still getting this error:

  Only user perm tables supported in producer views

From the logs that @alison985 shared, it looks like it may be this query that is triggering the error:

SELECT *
    FROM sources.information_schema.tables 
    WHERE table_name = 'clean_table_coconut' and table_schema = 'dbt_developer_transforms'

Do you know what this error message means and what might be causing it?

@jiezhen-chen
Contributor

jiezhen-chen commented Nov 10, 2023

Ah, thanks for the clarification @dbeatty10. I did an experiment querying on an RA3 cluster using the Redshift query editor. I ran the following snippet while connected to cluster_b and got the same "perm tables" error.

SELECT *
   FROM cluster_a.information_schema.tables 

This may be a Redshift limitation with cross-database querying of information_schema; my guess is that it is not accessible from another database. An alternative would be to use svv_redshift_tables. Try the following:

SELECT *
    FROM svv_redshift_tables
    WHERE database_name = 'cluster_a'

The query above should return a list of all tables in cluster_a.

@dbeatty10
Contributor

Thanks for that insight @jiezhen-chen !

Potential fix

Searching through the source code, it looks like the fix might be updating here (and maybe here too) to use svv_redshift_tables like you mentioned.

I'm not sure one way or the other if there are other queries against information_schema.tables, etc. that dbt-redshift has directly or inherits indirectly from dbt-postgres.
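For illustration, the failing information_schema query from the logs above might be rewritten against svv_redshift_tables along these lines (a sketch only; I haven't verified this against the adapter code):

SELECT database_name, schema_name, table_name, table_type
    FROM svv_redshift_tables
    WHERE database_name = 'sources'
      AND schema_name = 'dbt_developer_transforms'
      AND table_name = 'clean_table_coconut'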

Acceptance criteria

  • The following test case breaks without the fix, but works after the fix
Test case details

Files

profiles.yml

sandcastle-redshift:
  target: connection_a
  outputs:

    connection_a:
      type: redshift
      dbname: database_a
      ra3_node: true
      # ...

    connection_b:
      type: redshift
      dbname: database_b
      ra3_node: true
      # ...

dbt_project.yml

name: "my_project"
version: "1.0.0"
config-version: 2
profile: "sandcastle-redshift"

models:
  my_project:
    +database: database_b

models/my_table.sql

{{ config(materialized="table") }}

select 1 as id

models/my_view.sql

{{ config(materialized="view") }}

select 1 as id

models/my_mv.sql

{{ config(materialized="materialized_view") }}

select 1 as id

models/my_ephemeral.sql

{{ config(materialized="ephemeral") }}

select 1 as id

Commands

This is expected to break until a fix is in place:

dbt run --target connection_a

This is expected to work just fine both with and without a fix:

dbt run --target connection_b

Note

I didn't actually try reproducing this error report myself.

@dbeatty10 dbeatty10 removed the triage label Nov 10, 2023
@dbeatty10 dbeatty10 removed their assignment Nov 10, 2023
@nathaniel-may
Contributor

Will likely be addressed as a part of #555

@nathaniel-may nathaniel-may removed the bug Something isn't working label Jan 10, 2024
@mikealfare mikealfare added the refinement Product or leadership input needed label Jan 17, 2024
@mikealfare mikealfare added support and removed refinement Product or leadership input needed labels Jan 30, 2024
@dataders dataders added the ra3_node issues relating to ra3 node support label Mar 26, 2024
@jayceslesar

jayceslesar commented Apr 30, 2024

I'm lacking some context as I am not a dbt user, but from what I can tell Redshift does not support cross-DB queries on some system-level tables. This might be able to be bypassed with some really, really broad permission scopes, and maybe that's okay depending on your user, but AWS support was unable to help me on this one.

@mikealfare mikealfare added bug Something isn't working and removed support labels Jul 17, 2024
@laflet

laflet commented Oct 8, 2024

Please note, in Redshift we can't use three-part notation or external schemas to query cross-database metadata tables or views under information_schema and pg_catalog. Noted here: https://docs.aws.amazon.com/redshift/latest/dg/cross-database_usage.html

To get metadata across databases, we can only use SVV_ALL* and SVV_REDSHIFT* metadata views like SVV_ALL_TABLES.
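For example, a cross-database metadata lookup equivalent to the failing query earlier in this thread could go through SVV_ALL_TABLES (the database, schema, and table names below are taken from that log output):

SELECT database_name, schema_name, table_name, table_type
    FROM svv_all_tables
    WHERE database_name = 'sources'
      AND schema_name = 'dbt_developer_transforms'
      AND table_name = 'clean_table_coconut'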
