
[Bug] Oracle run_result.json has rows_affected = 0 always when data are inserted or updated into target table #134

Open · cindyzh-ibm opened this issue Feb 22, 2024 · 14 comments
Labels: bug (Something isn't working)


Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

run_results.json always has rows_affected = 0, even when rows are inserted into or updated in the target table.

(Screenshot attached: 2024-02-22 11:11 AM)

{
  "metadata": {
    "dbt_schema_version": "https://schemas.getdbt.com/dbt/run-results/v4.json",
    "dbt_version": "1.4.99",
    "generated_at": "2024-02-21T17:54:33.596062Z",
    "invocation_id": "5ca55e12-7cc4-43be-8ef2-754c4fb1fc0c",
    "env": {
      
    }
  },
  "results": [
    {
      "status": "success",
      "timing": [
        {
          "name": "compile",
          "started_at": "2024-02-21T17:54:31.748147Z",
          "completed_at": "2024-02-21T17:54:31.765072Z"
        },
        {
          "name": "execute",
          "started_at": "2024-02-21T17:54:31.801492Z",
          "completed_at": "2024-02-21T17:54:33.039715Z"
        }
      ],
      "thread_id": "Thread-2 (worker)",
      "execution_time": 1.3536255359649658,
      "adapter_response": {
        "_message": "OK",
        "rows_affected": 0
      },
      "message": "OK",
      "failures": null,
      "unique_id": "model.DBT_RUN_MATRIX_TEST_2_0.RUN_MATRIX_TGT_3"
    },
    {
      "status": "success",
      "timing": [
        {
          "name": "compile",
          "started_at": "2024-02-21T17:54:31.742269Z",
          "completed_at": "2024-02-21T17:54:31.763914Z"
        },
        {
          "name": "execute",
          "started_at": "2024-02-21T17:54:31.771594Z",
          "completed_at": "2024-02-21T17:54:33.061006Z"
        }
      ],
      "thread_id": "Thread-1 (worker)",
      "execution_time": 1.3784840106964111,
      "adapter_response": {
        "_message": "OK",
        "rows_affected": 0
      },
      "message": "OK",
      "failures": null,
      "unique_id": "model.DBT_RUN_MATRIX_TEST_2_0.RUN_MATRIX_TGT_1"
    }
  ],
  "elapsed_time": 4.044718503952026,
  "args": {
    
  }
}

Expected Behavior

Expected run_results.json to have:

  • a code field to indicate insert, update, or delete
  • a corresponding rows_affected that reflects the number of rows deleted, inserted, or updated

Steps To Reproduce

Create a flow with Oracle as the source and target.
Run the flow and inspect the generated run_results.json.

Relevant log output using --debug flag enabled

No response

Environment

- OS:
- Python:
- dbt:1.4.9

What Oracle database version are you using dbt with?

No response

Additional Context

No response

cindyzh-ibm added the bug label Feb 22, 2024
aosingh self-assigned this Feb 22, 2024
@aosingh (Member) commented Feb 22, 2024

@cindyzh-ibm

Which Python driver are you using? Is it cx_Oracle or the new python-oracledb driver?

dbt-oracle uses the Python driver's cursor.rowcount attribute to populate rows_affected in run_results.json.

Please switch to the new driver if you have not done so already. To switch to the thin driver, set the following environment variable before running dbt:

export ORA_PYTHON_DRIVER_TYPE=thin

This should populate rows_affected correctly.
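
For reference, here is a minimal sketch of the driver behaviour dbt-oracle relies on. The connection details and table are placeholders, and it only illustrates cursor.rowcount after a plain DML statement, not the adapter's internal code:

import oracledb

# Placeholder credentials/DSN for illustration only.
conn = oracledb.connect(user="test", password="example_password", dsn="localhost/orclpdb1")
cur = conn.cursor()

# some_table is hypothetical; after a plain insert/update/delete the driver
# reports the affected row count, which dbt-oracle copies into rows_affected.
cur.execute("insert into some_table (id, name) values (1, 'A')")
print(cur.rowcount)   # 1
conn.rollback()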

@cindyzh-ibm (Author) commented:

Thanks, @aosingh. Let me check and get back to you.

@cindyzh-ibm (Author) commented Feb 23, 2024

Hi @aosingh,
we are using the oracledb Python driver.
The default is cx_Oracle, which is being deprecated, so we use oracledb, and we do use thin mode. The following is the relevant piece from our log.

15 2/21/2024 09:59:32 INFO IIS-DSEE-TODT-00002 <dbt,0> 17:59:26  oracle adapter: Running in thin mode
16 2/21/2024 09:59:32 INFO IIS-DSEE-TODT-00002 <dbt,0> 17:59:26  Running with dbt=1.4.99 

This is our command

dbt
-command 'ORA_PYTHON_DRIVER_TYPE="thin" dbt run --vars \'{....}\' --profiles-dir "..."
;

@aosingh (Member) commented Feb 23, 2024

Thank you @cindyzh-ibm.

What is the version of the oracledb driver?

python3 -c "import oracledb; print(oracledb.__version__)"

Also, please share the output of the command

dbt --version

I have been trying to reproduce the problem with dbt-oracle==1.4.4, which is the latest version published in the 1.4.x release line. Are you using incremental models?

I tried a few simple cases, which seem to work fine.

  1. dbt seed - Insertion works as expected
"results": [
        {
            "status": "success",
            "timing": [
                {
                    "name": "compile",
                    "started_at": "2024-02-23T01:26:39.822399Z",
                    "completed_at": "2024-02-23T01:26:39.822408Z"
                },
                {
                    "name": "execute",
                    "started_at": "2024-02-23T01:26:39.824222Z",
                    "completed_at": "2024-02-23T01:26:40.628122Z"
                }
            ],
            "thread_id": "Thread-1",
            "execution_time": 0.8198621273040771,
            "adapter_response": {
                "_message": "INSERT 5",
                "code": "INSERT",
                "rows_affected": 5
            },
            "message": "INSERT 5",
            "failures": null,
            "unique_id": "seed.dbt_adbs_test_project.seed"
        }
    ],
  2. Table materialization - Creation of a table also shows the rows affected
    "results": [
        {
            "status": "success",
            "timing": [
                {
                    "name": "compile",
                    "started_at": "2024-02-23T01:28:30.549139Z",
                    "completed_at": "2024-02-23T01:28:30.555564Z"
                },
                {
                    "name": "execute",
                    "started_at": "2024-02-23T01:28:30.556883Z",
                    "completed_at": "2024-02-23T01:28:31.258269Z"
                }
            ],
            "thread_id": "Thread-1",
            "execution_time": 0.730828046798706,
            "adapter_response": {
                "_message": "OK",
                "rows_affected": 5
            },
            "message": "OK",
            "failures": null,
            "unique_id": "model.dbt_adbs_test_project.people"
        }
    ],

@aosingh (Member) commented Feb 23, 2024

cc: @anthony-tuininga

@cindyzh-ibm (Author) commented:

@aosingh

sh-4.4$ python3 -c "import oracledb; print(oracledb.__version__)"
1.4.1
sh-4.4$ dbt --version
03:06:30  oracle adapter: Running in cx mode
03:06:30  oracle adapter: [WARNING]: cx_oracle is no longer maintained, use python-oracledb

To switch to python-oracledb set the environment variable ORA_PYTHON_DRIVER_TYPE=thin 

Read the guideline here: https://docs.getdbt.com/reference/warehouse-setups/oracle-setup#configure-the-python-driver-mode

Documentation for python-oracledb can be found here: https://oracle.github.io/python-oracledb/
Core:
  - installed: 1.4.99
  - latest:    1.7.8  - Update available!

  Your version of dbt-core is out of date!
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

Plugins:
  - bigquery:  1.4.1  - Update available!
  - oracle:    1.4.8  - Update available!
  - postgres:  1.4.99 - Update available!
  - redshift:  1.4.1  - Update available!
  - snowflake: 1.4.5  - Update available!
  - teradata:  1.4.99 - Update available!

  At least one plugin is out of date or incompatible with dbt-core.
  You can find instructions for upgrading here:
  https://docs.getdbt.com/docs/installation

@cindyzh-ibm (Author) commented Feb 23, 2024

For your question "Are you using incremental models?": No.

{{
  config(
    materialized='datastage'
    , table_action='append'
    ....
  )
}}

@aosingh (Member) commented Feb 23, 2024

@cindyzh-ibm is materialized='datastage' a custom materialization which you have implemented? dbt doesn't implement a "datastage" materialization.

Could you verify that the "target" relation is returned at the end?

-- Return the relations created in this materialization
  {{ return({'relations': [target_relation]}) }}

@cindyzh-ibm (Author) commented Feb 23, 2024

@aosingh Yes, we use a custom materialization.

No, we don't return the "target" relation at the end.
We prepare the statement and run the run_query macro, which delegates execution to dbt core via the specific adapter.

{{
  config(
    materialized='datastage'
    
    , write_mode='update'
    , key_columns=['ID']
    
    , force_commit='false'
    
    , use_adapter_drop_statement='false'
  )
}}
{{
  config(
    materialized='datastage'
    
    , table_action='append'
    , write_mode='insert'
    
    , force_commit='false'
    
    , use_adapter_drop_statement='false'
  )
}}

@aosingh (Member) commented Feb 23, 2024

@cindyzh-ibm
What is this statement? Is it an anonymous PL/SQL block or a CTAS? Details will help me reproduce the problem.

@cindyzh-ibm (Author) commented:

@aosingh the custom materialization prepares and submits SQL statements.

@aosingh (Member) commented Feb 26, 2024

@cindyzh-ibm

I understand these are SQL statements. What is the last statement executed in your custom materialization?

dbt-oracle uses cursor.rowcount to get the number of rows affected.

The documentation states that cursor.rowcount is a read-only attribute which specifies the number of rows that have currently been fetched from the cursor (for select statements) or that have been affected by the operation (for insert, update, delete and merge statements). For all other statements the value is always zero.

Immediately after the cursor.execute() call has been made, if no rows have been fetched, the result will be 0. If you then call cursor.fetchone(), the result will be 1, and if you call cursor.fetchmany(5), the result will be 6, and so forth.
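
A small sketch of that behaviour with python-oracledb (placeholder connection and a hypothetical table, purely illustrative):

import oracledb

# Placeholder credentials/DSN for illustration only.
conn = oracledb.connect(user="test", password="example_password", dsn="localhost/orclpdb1")
cur = conn.cursor()

cur.execute("select id from some_table")   # some_table is hypothetical
print(cur.rowcount)   # 0 -- nothing has been fetched yet
cur.fetchone()
print(cur.rowcount)   # 1
cur.fetchmany(5)
print(cur.rowcount)   # 6, assuming the table has at least 6 rows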

Did you try returning the target relation at the end?

-- Return the relations created in this materialization
  {{ return({'relations': [target_relation]}) }}

@cindyzh-ibm (Author) commented Feb 26, 2024

@aosingh ok.
This is the insert statement:

BEGIN
  EXECUTE IMMEDIATE '
 insert into "test"."RUN_MATRIX_TGT_1" ("ID", "NAME", "HIRE_DATE", "SALARY")
    select "ID", "NAME", "HIRE_DATE", "SALARY"
    from  "test"."test__dbt_tmp"
';
END;

The update statement:

BEGIN
  EXECUTE IMMEDIATE '
  update "test"."RUN_MATRIX_TGT_3"
  set
  ("ID", "NAME", "HIRE_DATE", "SALARY")
  = ( select "ID", "NAME", "HIRE_DATE", "SALARY"
    from "test"."test__dbt_tmp"
    where 
                    "test"."test__dbt_tmp"."ID" = "test"."test"."ID"
                )
';
END;

I haven't tried the following yet. So far we do not return the "target" relation at the end.

-- Return the relations created in this materialization
  {{ return({'relations': [target_relation]}) }}

@aosingh (Member) commented Feb 26, 2024

@cindyzh-ibm

Thanks, you are executing a PL/SQL (BEGIN..END) block in the custom materialization.

The value of cursor.rowcount will be 0 after executing a PL/SQL block. Let me check whether we can use the SQL%ROWCOUNT expression immediately afterwards to get the rows affected and pass that to dbt.
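
A rough sketch of that idea with python-oracledb. The DML and connection details are placeholders, and this only illustrates returning SQL%ROWCOUNT through an OUT bind variable, not the eventual adapter change:

import oracledb

# Placeholder credentials/DSN for illustration only.
conn = oracledb.connect(user="test", password="example_password", dsn="localhost/orclpdb1")
cur = conn.cursor()

rows_affected = cur.var(int)   # OUT bind to carry SQL%ROWCOUNT back to Python
cur.execute("""
BEGIN
  EXECUTE IMMEDIATE 'update some_table set name = name';  -- hypothetical DML
  :rc := SQL%ROWCOUNT;  -- rowcount of the dynamically executed statement
END;
""", rc=rows_affected)

# cursor.rowcount stays 0 for the PL/SQL block, but the bind variable
# carries the number of rows touched by the inner DML.
print(cur.rowcount, rows_affected.getvalue())
conn.rollback()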
