[Bug] Large Time Gaps Between dbt Statements (Suspected Python GIL Issue) #786
Labels
feature:transactions
Issues related to managing database transactions
pkg:dbt-redshift
Issue affects dbt-redshift
type:bug
Something isn't working as documented
Is this a new bug in dbt-redshift?
Current Behavior
Description:
We're observing time gaps of up to 2 minutes between dbt statements (COMMIT and BEGIN/ROLLBACK) in our Redshift data pipeline, which increases model processing times. The gaps affect multiple models and occur at various stages of the dbt run. Initial investigation points to a possible bottleneck related to the Python Global Interpreter Lock (GIL).
Steps to Reproduce:
Example:
https://getdbt.slack.com/archives/CJARVS0RY/p1738171778055219
Observed Behavior:
Suspected Root Cause:
The observed behavior suggests a bottleneck related to the Python GIL. Because the GIL lets only one thread execute Python bytecode at a time, dbt's worker threads likely cannot run their Python-side work in parallel when multiple threads are used, so operations queue up and delays appear between statements.
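To illustrate the kind of contention suspected here, below is a minimal, generic sketch (not dbt code) that measures how much a lightweight "heartbeat" loop oversleeps while CPU-bound threads compete for the GIL. The names `cpu_bound` and `max_wakeup_delay` are made up for this example, and the delays it shows are typically milliseconds, so it only demonstrates the mechanism and does not confirm it as the cause of the multi-minute gaps reported above.

```python
import threading
import time


def cpu_bound(n: int) -> int:
    """Pure-Python loop; the running thread holds the GIL while it computes."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def max_wakeup_delay(busy_threads: int, work: int = 1_000_000, tick: float = 0.005) -> float:
    """Return the worst oversleep (in seconds) of a short sleep loop
    while `busy_threads` CPU-bound threads run alongside it."""
    workers = [
        threading.Thread(target=cpu_bound, args=(work,))
        for _ in range(busy_threads)
    ]
    for w in workers:
        w.start()

    worst = 0.0
    # Take at least one sample, then keep sampling until all workers finish.
    while True:
        start = time.perf_counter()
        time.sleep(tick)
        worst = max(worst, time.perf_counter() - start - tick)
        if not any(w.is_alive() for w in workers):
            break

    for w in workers:
        w.join()
    return worst


if __name__ == "__main__":
    for n in (0, 1, 4):
        delay_ms = max_wakeup_delay(n) * 1000
        print(f"{n} busy threads: worst extra delay {delay_ms:.1f} ms")
```

On a standard (GIL-enabled) CPython build, the extra delay usually grows with the number of busy threads, because the sleeping thread has to reacquire the GIL before it can continue.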
Workaround:
Running dbt with a single thread eliminates the time gaps, but significantly increases the overall processing time.
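For anyone trying to reproduce this, a small sketch of how the thread count could be varied between runs to see at which setting the gaps appear. It only builds the command line using dbt's `--threads` and `--select` flags; the helper name `dbt_run_command` and the model name `my_model` are hypothetical.

```python
import shlex
from typing import List, Optional


def dbt_run_command(threads: int, select: Optional[str] = None) -> List[str]:
    """Build a `dbt run` command that overrides the profile's thread count."""
    cmd = ["dbt", "run", "--threads", str(threads)]
    if select is not None:
        # `my_model`-style selectors are placeholders; use your own model names.
        cmd += ["--select", select]
    return cmd


if __name__ == "__main__":
    # Print the commands to run one after another while comparing log timestamps.
    for threads in (1, 2, 4, 8):
        print(shlex.join(dbt_run_command(threads, select="my_model")))
```

Comparing the statement timestamps in the logs from each run would show whether the gaps scale with the number of threads.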
Impact:
Increased dbt run times, impacting data freshness and downstream processes.
Attachments:
Questions:
(https://gtm-roche.slack.com/archives/D03G68BDEDS/p1738743200437589)
https://medium.com/@mitesh.singh.jat/gil-becomes-optional-in-python-3-13-a-game-changer-for-multithreading-4c5d28856803
Expected Behavior
dbt statements within a single model should execute consecutively with minimal delay.
Steps To Reproduce
Relevant log output
Environment
Additional Context
No response