Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Performance Issue]: fetchall is 3x slower with SQLAlchemy + duckdb_engine than DuckDB API #1179

Open
1 task done
bskubi opened this issue Dec 18, 2024 · 0 comments
Open
1 task done
Assignees
Labels
bug Something isn't working

Comments

@bskubi
Copy link

bskubi commented Dec 18, 2024

What happened?

Switching from writing queries in raw SQL with DuckDB's API for execution to SQLAlchemy + duckdb_engine, I have found performance slowdowns of 3-50x.

For example:

This query using DuckDB runs at about 0.026s/query on average using random values for the parameterization.

    WITH
    contigs_filtered AS (
        SELECT id FROM contigs WHERE name = ? LIMIT 1
    ),
    region_blocks AS (
        SELECT ublocks.id, ublocks.ge
        FROM ublocks
        INNER JOIN contigs_filtered 
        ON ublocks.contig_id = contigs_filtered.id
        WHERE ublocks.ge >= ? AND ublocks.ge <= ?
    )

    SELECT 
        counts2d.count AS count, 
        b1.ge AS ge1, 
        b2.ge AS ge2
    FROM 
        counts2d
    JOIN region_blocks b1 
        ON counts2d.block1_id = b1.id
    JOIN region_blocks b2 
        ON counts2d.block2_id = b2.id;

I tried converting this to SQLAlchemy in the following manner:

        contigs_filtered = (
            al.select(contigs.c.id)
            .where(contigs.c.name == al.bindparam("chromname"))
            .limit(1)
            .cte("contigs_filtered")
        )
        blocks1 = (
            al.select(ublocks.c.id, ublocks.c.ge)
            .join_from(ublocks, contigs_filtered, ublocks.c.contig_id == contigs_filtered.c.id)
            .where(
                ublocks.c.ge >= al.bindparam("ge"),
                ublocks.c.ge <= al.bindparam("le")
            )
            .cte("region_blocks")
        )
        blocks2 = al.alias(blocks1)
        stmt = (
            al.select(
                counts2d.c.count.label("count"),
                blocks1.c.ge.label("ge1"),
                blocks2.c.ge.label("ge2")
            )
            .join_from(counts2d, blocks1, counts2d.c.block1_id == blocks1.c.id)
            .join_from(counts2d, blocks2, counts2d.c.block2_id == blocks2.c.id)
        )

print(stmt) gives:

WITH contigs_filtered AS 
(SELECT contigs.id AS id 
FROM contigs 
WHERE contigs.name = :chromname
 LIMIT :param_1), 
region_blocks AS 
(SELECT ublocks.id AS id, ublocks.ge AS ge 
FROM ublocks JOIN contigs_filtered ON ublocks.contig_id = contigs_filtered.id 
WHERE ublocks.ge >= :ge AND ublocks.ge <= :le)
 SELECT counts2d.count AS count, region_blocks.ge AS ge1, anon_1.ge AS ge2 
FROM counts2d JOIN region_blocks ON counts2d.block1_id = region_blocks.id JOIN region_blocks AS anon_1 ON counts2d.block2_id = anon_1.id

Which appears to be functionally the same, but runs about 50x slower.

Even a simple select statement runs 3x slower compared with DuckDB's raw SQL (pasting the SQL output from printing the SQLAlchemy statement directly into conn.execute() for DuckDB). I timed it, and almost all the relative performance slowdown for simple selection appears to be on the fetch step.

Compiling the statement produced by SQLAlchemy into raw SQL and then executing and retrieving the result with DuckDB's conn.execute(sql) works quickly.

Please let me know if you think there's something I am missing here that's resulting in the performance hit. Despite this issue, duckdb_engine is great software and much appreciated - worst case scenario, compiling to raw SQL and running with DuckDB's native API for execution and fetching may work for my purposes.

Thank you!

DuckDB Engine Version

0.14.0

DuckDB Version

1.1.3

SQLAlchemy Version

2.0.36

Relevant log output

No response

Code of Conduct

  • I agree to follow this project's Code of Conduct
@bskubi bskubi added the bug Something isn't working label Dec 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants