Benchmark Parquet indices for duckdb-wasm performance #5
Have you looked at the monolithic EPA CEMS hourly Parquet file? I think it might be the only one with logical row groups, since we write it in year-state chunks. But anyway, if we can add a few meaningful indices to the Parquet files and make them faster and easier to query, that sounds great!
I'm curious to see whether the indices improve performance without our also having to create meaningful partitions (which I think is what you meant by the CEMS logical row-groups thing? Maybe I misunderstood.)
But no, I haven't looked at CEMS since its metadata wasn't in the
Oh, indices were also one potential explanation @bendnorman mentioned for why
Overview
DuckDB sends a lot of requests to download data: ~10-20 for most tables, but up to several hundred for the VCE RARE data. It looks like our Parquet files do not have any column indices; let's see whether adding some helps.
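For intuition on why indices alone might not be enough (the question raised in the comments), here is a toy model of min/max pruning. Parquet row-group statistics and page-level column indices both record min/max values, which a reader can use to skip chunks that cannot match a filter. This is a hypothetical sketch: `RowGroupStats`, `build_stats`, and `groups_to_fetch` are illustrative names, not DuckDB or Parquet APIs, and whether duckdb-wasm actually consults page-level column indices is exactly what this issue would benchmark.

```python
"""Toy model of min/max ("zone map") pruning over Parquet-style row groups.

Hypothetical sketch only, not DuckDB's or Parquet's actual code. It counts how
many row groups a reader would still need to fetch for an equality filter,
given nothing but per-row-group min/max statistics.
"""
from dataclasses import dataclass
import random


@dataclass(frozen=True)
class RowGroupStats:
    """Min/max statistics for one column in one row group (hypothetical)."""
    min_value: int
    max_value: int


def build_stats(values: list[int], rows_per_group: int) -> list[RowGroupStats]:
    """Split `values` into fixed-size row groups and record each group's min/max."""
    stats = []
    for start in range(0, len(values), rows_per_group):
        chunk = values[start:start + rows_per_group]
        stats.append(RowGroupStats(min(chunk), max(chunk)))
    return stats


def groups_to_fetch(stats: list[RowGroupStats], target: int) -> list[int]:
    """Return indices of row groups whose [min, max] range could contain `target`.

    A group whose range excludes `target` can be skipped without reading it.
    """
    return [i for i, s in enumerate(stats) if s.min_value <= target <= s.max_value]


if __name__ == "__main__":
    # Hypothetical data: 20 "years" x 1,000 rows each, written in year order.
    years = [2000 + y for y in range(20) for _ in range(1_000)]
    # The same rows in random order, i.e. no meaningful clustering.
    shuffled = years[:]
    random.Random(0).shuffle(shuffled)

    sorted_stats = build_stats(years, rows_per_group=1_000)
    shuffled_stats = build_stats(shuffled, rows_per_group=1_000)

    print("sorted:", len(groups_to_fetch(sorted_stats, 2010)), "of", len(sorted_stats))
    print("shuffled:", len(groups_to_fetch(shuffled_stats, 2010)), "of", len(shuffled_stats))
```

With year-ordered data each row group holds a single year, so a filter on 2010 needs only 1 of 20 groups; with shuffled rows every group's range almost certainly spans 2000-2019, so nothing can be skipped. Under these assumptions, min/max indices pay off mainly when rows are clustered on the filtered column, as with the CEMS year-state chunks.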
Success Criteria
How will we know that we're done?
Next steps