can hyper python API use multi-core? #99
Comments
Thanks for bringing this up! Yes, Hyper can use multiple cores. In this particular case, though, the input set is so tiny that Hyper will not benefit much from multi-threading. Hyper's full performance will only be unleashed on data sets much bigger than 17 megabytes. I would recommend testing Hyper with data sizes of at least a couple of gigabytes. However, I guess your actual question is not about multi-core anyway. I guess you are rather wondering: "Why is Hyper slower than DuckDB on those queries?". Let's take a closer look at this 🙂 The trick is to use CREATE TEMPORARY EXTERNAL TABLE. The difference is:
Here is an updated benchmark:
Note how I first declared an external table, and then used it in the following queries. This gives me the following numbers:
Note how the first run of the first query is rather slow. This is because Hyper computes some statistics on the external table the first time you access it. Those statistics are important to Hyper's optimizer so that it picks a good query plan. For the simple queries we are benchmarking here, those statistics won't make much of a difference, but for more complex join queries, they are vital. The updated performance numbers of Hyper are already much closer to DuckDB. They are still slightly slower. We could tune Hyper further, but I am not sure this would make sense. Your benchmark data is pretty small and Hyper is tuned more towards larger data sets. I would be interested in what performance your benchmark yields on larger data sets.
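For readers who want to try the pattern described above (declare the external table once, then run every query against it), here is a minimal sketch. It is not the benchmark from this thread: the table name `bench`, the helper names, and the hand-rolled quoting are assumptions made for illustration, and it assumes a self-describing file format such as Parquet so that no column list is needed.

```python
import time


def external_table_sql(table_name: str, file_path: str) -> str:
    """Build the statement that declares a file once as a temporary external table."""
    # Simplified quoting (an assumption for this sketch): double any embedded quotes.
    quoted_name = '"' + table_name.replace('"', '""') + '"'
    quoted_path = "'" + file_path.replace("'", "''") + "'"
    return f"CREATE TEMPORARY EXTERNAL TABLE {quoted_name} FOR {quoted_path}"


def time_queries(file_path: str, queries: list[str], repeats: int = 3) -> dict[str, list[float]]:
    """Declare the external table once, then time each query `repeats` times."""
    # Imported lazily so the SQL helper above can be used without Hyper installed.
    from tableauhyperapi import Connection, HyperProcess, Telemetry

    timings: dict[str, list[float]] = {}
    with HyperProcess(telemetry=Telemetry.DO_NOT_SEND_USAGE_DATA_TO_TABLEAU) as hyper:
        with Connection(endpoint=hyper.endpoint) as connection:
            # Hypothetical table name "bench"; the queries must reference it.
            connection.execute_command(external_table_sql("bench", file_path))
            for query in queries:
                runs = []
                for _ in range(repeats):
                    start = time.perf_counter()
                    connection.execute_list_query(query)
                    runs.append(time.perf_counter() - start)
                timings[query] = runs
    return timings
```

Calling something like `time_queries("data.parquet", ['SELECT COUNT(*) FROM "bench"'])` returns the wall-clock time of each run per query. Going by the explanation above, the first run of the first query should include the one-time statistics pass, so it is worth looking at it separately from the later runs.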
@vogelsgesang
Test of the Python module duckdb:
A query on a view created with CREATE VIEW in DuckDB is faster than a query on a CREATE TEMPORARY EXTERNAL TABLE in Hyper
returns
while the DuckDB CLI on the same machine, querying the same file, returns