[ADAP-634] [Bug] Redshift fails to run several threads #501

sphinks · 2023-06-20T15:45:35Z

Is this a new bug in dbt-redshift?

I believe this is a new bug in dbt-redshift
I have searched the existing issues, and I could not find an existing issue for this bug

Current Behavior

I have found several issues about performance degradation while using RA3 node. However, I'm expecting failure of dbt run if I switch from 1 thread to several.

My DBT profile:

    dev:
      type: redshift
      host: redshift-host

      user: "{{ env_var('DBT_USER') }}"
      password: "{{ env_var('DBT_PASSWORD') }}"
      port: 5439
      dbname: dev
      schema:  dev
      threads: 4
      ra3_node: true

The error I get:

18:08:46    could not open relation with OID 615797

That happens not at the same model, sometimes the first model could fail. Other time it could be model at the end of list.
What helps a little bit - setup connect_timeout: 999999. I have found that solution at some bug. It still fails, but happens not at the very begining.

What I'm using:
Redshift RA3 cluster
DBT - 1.3.2
dbt-redshift - 1.3.0

I try to switch to the latest version of dbt (1.5.1) and dbt-redshift (1.5.5), not luck, but the error message became ' The read operation timed out'.

Expected Behavior

DBT runs in parallel with out any issues.

Steps To Reproduce

Setup dbt project.
Start with specified DBT profile and 1 thread.
Switch to several threads and rerun project.

Relevant log output

No response

Environment

- OS:Windows Server 2016 (the same issue happens with remote dbt running on Linux OS)
- Python:Python 3.9.13
- dbt-core: 1.3.2
- dbt-redshift: 1.3.0

Additional Context

No response

The text was updated successfully, but these errors were encountered:

sphinks · 2023-06-21T18:24:12Z

Another finding here:
Parallel run works great with any model except models with reference to source. If I run parallel run with staging layer (simple sql to bring the table into DBT layout with {{ source('some_landing', 'table') }} -I get that issue. If i run everything except those: works great!
Not sure if it give any clue, but hope it will narrow the scope a little bit.

I also find the issue: #427 that I suppose could be the reason why my task run with 1.5.1 did not work. So I try to rollback to 1.4.6 version and still no luck.

dataders · 2023-06-22T00:14:26Z

@sphinks, you're right -- this is some strange behavior, you've certainly piqued my interest!

clarification questions:

you're seeing this issue on versions 1.3, 1.4, and 1.5 of the adapter?
does this timeout/latency issue persist after the first attempt to parallelize? Do the second and third time you try running with multi-threads fail in the same way as the first time?
could you perhaps fork https://github.com/dbt-labs/jaffle_shop and provide the exact commands to reproduce the error? specifically I'd love to see commands with --select and --threads so it's exactly reproducible

dataders · 2023-06-22T17:04:01Z

in the dbt Community Slack's #db-redshift channel, someone suggested that you may be encountering an issue with autoWLM.

Is AutoWLM enabled or disabled? if it is enabled, disabled it, configure your queues manually, and try it again, and see if the problem goes away.

sphinks · 2023-06-23T15:44:32Z

@dataders thanks for you attention to the issue.

I could see the issue on all versions: 1.3, 1.4, 1.5. The only difference - message that returned during the error. 1.3 - could not open relation with OID and after 1.4 version it returns The read operation timed out. I suppose it is just better error handling in new versions of dbt-redshfit.
Issue happens every time I try to run in parallel. It does not fail on the same model and does not take the same amount of time. It fails during the first iteration almost every time. I run in 4 threads, so during first iteration every thread take the first model to execute some of thread will fail with first model. Looks like it happens randomly. What is notice if I reduce number of threads and increase to max value (connect_timeout: 999999) it could run longer (sometimes much longer). With 2 threads you could get a fails on 3 iteration. It gives an idea that threads are not allowed to access some shared resource at the same time: less thread we have less chances to hit that limit.
I try to use https://github.com/dbt-labs/jaffle_shop to reproduce the errors, but there are significant difference: it does not use source macro in staging and it has only 3 staging that are pretty small. It does not reproduce an error for some reason (models too small to create concurrent environment or because they do not use source macro - at least that difference I could see).
About autoWLM: that could be the case, but something consufing me here. I believe a lot of DBT users able to run DBT with autoWLM and have no issue. Is it some case of bad luck of autoWLM if that helps? :) I will to switch it, but it is not so easy as we have pretty heavy used project and that change could cause some unexpected issues.

sphinks · 2023-06-26T17:45:54Z

Here is another detail that I have figured out: the issues happens with models that involves 2 databases. I have landing layer in one database and staging layer in another database. Could it be the root cause of issue with paralell run over dbt?

dataders · 2023-06-27T22:40:04Z

@sphinks the multi-db is an interesting scenario. I talked with some engineers at Redshift, they recommend that you open an AWS support ticket for this issue. once you do, they can grab the ticket if you share the case id here.

sphinks · 2023-06-29T16:37:36Z

@dataders I believe, I have found the issue: it is combination of cross-database setup and grant post hook. In my project I has a post-hook that is doing "grant select on all tables in schema...". It means that operation is done after each model is built. I suppose having several threads trying to do that operation on cross-database level - lead to issue. I do not know exactly what is wrong here, but switch to "grant select on table ..." helped.

Most interesting aspect that the same post hook is working in parallel run inside 1 database. That is confusing.

sphinks · 2023-06-30T15:39:04Z

Closing the issue.

sphinks added bug Something isn't working triage labels Jun 20, 2023

github-actions bot changed the title ~~[Bug] Redshift fails to run several threads~~ [ADAP-634] [Bug] Redshift fails to run several threads Jun 20, 2023

dataders added the incremental label Jun 23, 2023

dataders removed the triage label Jun 27, 2023

sphinks closed this as completed Jun 30, 2023

dataders removed the incremental label Jul 4, 2023

rlh1994 mentioned this issue Nov 13, 2023

[ADAP-1030] [Regression] Back-to-back hooks with a transaction using CTAS + DML fails in 1.5+ #663

Open

2 tasks

dataders added the ra3_node issues relating to ra3 node support label Mar 27, 2024

dataders mentioned this issue Mar 27, 2024

[Feature] Datasharing: cross-database support #742

Open

victor-frank-signet mentioned this issue Jul 26, 2024

[SPIKE] Investigate state of transactions #773

Open

3 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[ADAP-634] [Bug] Redshift fails to run several threads #501

[ADAP-634] [Bug] Redshift fails to run several threads #501

sphinks commented Jun 20, 2023 •

edited

Loading

sphinks commented Jun 21, 2023 •

edited

Loading

dataders commented Jun 22, 2023

dataders commented Jun 22, 2023

sphinks commented Jun 23, 2023

sphinks commented Jun 26, 2023

dataders commented Jun 27, 2023

sphinks commented Jun 29, 2023 •

edited by dbeatty10

Loading

sphinks commented Jun 30, 2023

[ADAP-634] [Bug] Redshift fails to run several threads #501

[ADAP-634] [Bug] Redshift fails to run several threads #501

Comments

sphinks commented Jun 20, 2023 • edited Loading

Is this a new bug in dbt-redshift?

Current Behavior

Expected Behavior

Steps To Reproduce

Relevant log output

Environment

Additional Context

sphinks commented Jun 21, 2023 • edited Loading

dataders commented Jun 22, 2023

dataders commented Jun 22, 2023

sphinks commented Jun 23, 2023

sphinks commented Jun 26, 2023

dataders commented Jun 27, 2023

sphinks commented Jun 29, 2023 • edited by dbeatty10 Loading

sphinks commented Jun 30, 2023

sphinks commented Jun 20, 2023 •

edited

Loading

sphinks commented Jun 21, 2023 •

edited

Loading

sphinks commented Jun 29, 2023 •

edited by dbeatty10

Loading