wr.athena.to_iceberg - Insert query has mismatched column types #2678

Mroq93 · 2024-02-16T19:35:36Z

Describe the bug

I am trying to save several DataFrames to an Iceberg table using wr.athena.to_iceberg.
The first few incremental writes succeed, but after several iterations I get this error:
TYPE_MISMATCH: Insert query has mismatched column types: Table: [varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, timestamp(6), varchar, varchar, varchar, varchar, varchar, varchar, varchar], Query: [varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, varchar, timestamp(3), varchar, varchar, varchar, varchar, varchar, varchar]. If a data manifest file was generated at 's3://bucket-temp/athena/results/c0c18807-2773-4afc-b95c-580034d960ed-manifest.csv', you may need to manually clean the data from locations specified in the manifest. Athena will not delete data in your account.

I see two differences between the schemas:

- The table has timestamp(6), but the next write to Iceberg recognizes the column as timestamp(3).
- The number of column types differs by one.
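Differences like these can be checked mechanically by pulling the two type lists out of the error text and comparing them. A minimal sketch, assuming the message keeps the `Table: [...], Query: [...]` layout shown above; the helper names `split_types`, `parse_type_lists` and `diff_types` are hypothetical, not part of awswrangler:

```python
from __future__ import annotations

import re


def split_types(s: str) -> list[str]:
    """Split a comma-separated type list, ignoring commas inside parentheses (e.g. decimal(10,2))."""
    parts: list[str] = []
    depth = 0
    current: list[str] = []
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    if current:
        parts.append("".join(current).strip())
    return parts


def parse_type_lists(message: str) -> tuple[list[str], list[str]]:
    """Extract the Table and Query type lists from an Athena TYPE_MISMATCH message."""
    # Assumes the "Table: [ ... ], Query: [ ... ]" layout of the error quoted above.
    match = re.search(r"Table: \[(.*?)\], Query: \[(.*?)\]", message)
    if match is None:
        raise ValueError("no Table/Query type lists found in message")
    return split_types(match.group(1)), split_types(match.group(2))


def diff_types(table: list[str], query: list[str]) -> list[str]:
    """Report a column-count difference and any position where the types disagree."""
    report: list[str] = []
    if len(table) != len(query):
        report.append(f"column count differs: table has {len(table)}, query has {len(query)}")
    # Positional comparison: after a missing column, later positions are shifted.
    for i, (t, q) in enumerate(zip(table, query)):
        if t != q:
            report.append(f"position {i}: table {t} vs query {q}")
    return report
```

Because the comparison is positional, a single missing column shifts every later position, so the count difference is usually the first thing to investigate.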

Before saving, I convert the timestamp columns of every DataFrame with the same call, to make sure all timestamps are aligned:

        dataframe[name] = pd.to_datetime(
            dataframe[name], format='ISO8601'
        )

When I print the timestamp column of each DataFrame, the format is the same in every one.

In the dtype argument of wr.athena.to_iceberg, I set the timestamp column's type to timestamp; I cannot specify a precision because that is not supported.

I am not sure whether the different number of columns should be an issue. I believe it was resolved in #2616.

PS.
Does the order of columns matter?
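On column order: in an `INSERT INTO ... SELECT *` statement without a target column list, SQL matches the selected columns to the table's columns by position, not by name, so both order and count matter. A minimal sketch, using a hypothetical helper `align_columns` that works on plain lists of column names, shows how to put a DataFrame's columns into the table's order and spot missing or extra columns before writing:

```python
from __future__ import annotations


def align_columns(
    df_columns: list[str], table_columns: list[str]
) -> tuple[list[str], list[str], list[str]]:
    """Compare a DataFrame's column names with a table's column names (hypothetical helper).

    Returns (ordered, missing, extra):
      ordered: DataFrame columns that exist in the table, in the table's order
      missing: table columns absent from the DataFrame
      extra:   DataFrame columns absent from the table, in DataFrame order
    """
    df_set = set(df_columns)
    table_set = set(table_columns)
    ordered = [c for c in table_columns if c in df_set]
    missing = [c for c in table_columns if c not in df_set]
    extra = [c for c in df_columns if c not in table_set]
    return ordered, missing, extra
```

With pandas, `df[ordered + extra]` would reorder a DataFrame so the shared columns follow the table's order. The table's column names could be taken, for example, from the keys of `wr.catalog.get_table_types(database=..., table=...)`.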

How to Reproduce

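    import awswrangler as wr

    # df, database, tbl_name, s3_target_path, ini_time, work_group_name and
    # columns_for_iceberg are defined earlier in my script.
    wr.athena.to_iceberg(
        df=df,
        database=database,
        table=tbl_name,
        table_location=s3_target_path,
        temp_path=f"{s3_target_path}temp/{ini_time}",
        partition_cols=["bookmark_date_str"],
        workgroup=work_group_name,
        schema_evolution=True,
        dtype=columns_for_iceberg,
    )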

Expected behavior

- No errors caused by timestamp precision.
- No errors when saving DataFrames with different schemas (missing or additional columns).

Your project

No response

Screenshots

No response

OS

AWS

Python version

3.9

AWS SDK for pandas version

3.5.2

Additional context

No response


kukushking · 2024-03-04T13:18:24Z

Hi @Mroq93 thanks for opening this - looking into it.

GalVishi · 2024-03-10T10:31:24Z

Hello @kukushking, do you have any news or updates regarding the bug that we discussed earlier?

GalVishi · 2024-03-10T11:14:56Z

Just need to change this line in aws-sdk-pandas/awswrangler/athena/_write_iceberg.py (line 489 at commit da4ba40):

    sql_statement = f'INSERT INTO "{database}"."{table}" SELECT * FROM "{database}"."{temp_table}"'

to:

    # Build the quoted column list first: reusing the same quote type inside an
    # f-string expression is a SyntaxError before Python 3.12.
    columns = ", ".join(f'"{x}"' for x in df.columns)
    sql_statement = f'INSERT INTO "{database}"."{table}" SELECT {columns} FROM "{database}"."{temp_table}"'
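For illustration, the proposed change can be generalized into a standalone helper. This is a sketch under assumptions, not the code merged in #2715: the helper names are hypothetical, and this variant also lists the target columns (`INSERT INTO t (cols) SELECT cols`). In Trino-style SQL, listing the same names on both sides pairs each selected column with the same-named table column, and table columns left out of the list receive NULL. Identifiers are quoted with double quotes, doubling any embedded quote, as in standard SQL.

```python
from __future__ import annotations


def quote_identifier(name: str) -> str:
    # Standard SQL identifier quoting: wrap in double quotes, double any embedded quote.
    return '"' + name.replace('"', '""') + '"'


def build_insert_statement(database: str, table: str, temp_table: str, columns: list[str]) -> str:
    """Build an INSERT ... SELECT that names columns explicitly instead of using SELECT *."""
    if not columns:
        raise ValueError("at least one column is required")
    column_list = ", ".join(quote_identifier(c) for c in columns)
    target = f"{quote_identifier(database)}.{quote_identifier(table)}"
    source = f"{quote_identifier(database)}.{quote_identifier(temp_table)}"
    # Same names, same order on both sides: each SELECT column lands in the
    # same-named table column; unlisted table columns are filled with NULL.
    return f"INSERT INTO {target} ({column_list}) SELECT {column_list} FROM {source}"
```

Inside `_write_iceberg.py` the columns argument would presumably be `list(df.columns)`; whether the target column list is needed depends on whether the DataFrame can omit table columns.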


Mroq93 · 2024-03-14T13:48:08Z

Hi @jaidisido @GalvFionic,
I tried awswrangler version 3.7.1, but I am still facing the issue.
Is the fix available in the latest version?

Mroq93 added the bug label Feb 16, 2024

github-actions bot added the needs-triage label Feb 21, 2024

jaidisido removed the needs-triage label Mar 4, 2024

GalVishi added a commit to GalVishi/aws-sdk-pandas that referenced this issue Mar 10, 2024

wr.athena.to_iceberg - Insert query has mismatched column types aws#2678

12f9ec4

GalVishi mentioned this issue Mar 10, 2024

fix: wr.athena.to_iceberg - Insert query has mismatched column types #2678 #2715

Merged

LeonLuttenberger pushed a commit that referenced this issue Mar 11, 2024

fix: wr.athena.to_iceberg - Insert query has mismatched column types #2678 (#2715)

e2c960b

jaidisido linked a pull request Mar 12, 2024 that will close this issue

fix: wr.athena.to_iceberg - Insert query has mismatched column types #2678 #2715

Merged

jaidisido closed this as completed Mar 12, 2024
