Better SQLite connection handling #45
The JobQueue is something interesting that probably wasn't available two years ago (or we didn't notice it). A few questions come to mind:
My thoughts: It is true that with 1) a very large number of partitions, 2) a large pool size, 3) a tablespace whose partitions are all used equally, and 4) not many queries per second, connections are unlikely to be reused often right now. But is that really the prototypical scenario? Even with 1,200 partitions, 64 threads and 10 queries per second, all connections would be cached after two hours, assuming the whole tablespace is iterated through 64 times. It would also be interesting to measure the performance penalty of opening a new connection for a query, relative to the query time itself. My opinion is that this penalty only becomes relevant at several queries per second, in which case connections would likely be reused anyway. The inconvenience I do see is that the system's ulimit must be raised accordingly. So if we keep what we have, we should document this properly in the user guide or the FAQ.
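The penalty of opening a connection per query can be measured directly. Below is a minimal benchmark sketch; the project itself uses Java and sqlite4java, but Python's stdlib sqlite3 is used here purely for illustration, and the table, row count and iteration count are arbitrary choices, not taken from the codebase:

```python
import os
import sqlite3
import tempfile
import time

# A throwaway "partition" file to query against.
fd, db_path = tempfile.mkstemp(suffix=".db")
os.close(fd)

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, v TEXT)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [("x",)] * 1000)
conn.commit()

N = 200

# Strategy A: open a fresh connection for every query.
start = time.perf_counter()
for _ in range(N):
    c = sqlite3.connect(db_path)
    c.execute("SELECT COUNT(*) FROM t").fetchone()
    c.close()
open_per_query = time.perf_counter() - start

# Strategy B: reuse a single cached connection.
start = time.perf_counter()
for _ in range(N):
    conn.execute("SELECT COUNT(*) FROM t").fetchone()
reused = time.perf_counter() - start

conn.close()
os.unlink(db_path)
print(f"open-per-query: {open_per_query:.4f}s  reused: {reused:.4f}s")
```

Running something like this against a realistically sized partition file would tell us whether the open-connection overhead matters at all next to the query time.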
One thread per partition: I'm not thinking of using just a single thread per partition; that would obviously be unacceptable. I'm thinking of a pool of connections per partition that is lazily initialized. After running for some time, the size of the pool should be as big as the number of concurrent queries executed against that partition at a given moment. In other words, if the load is not very high, it is very likely that caching just one connection per partition will be enough to serve queries. But if at some point two or three queries run in parallel against the same partition, the pool should grow. The maximum size of the pool should be configurable, as it is now. If the pool is full, the remaining queries for that partition are queued up.

Asynchronous execution: Nothing is going to change. Right now we are using an executor to control the execution of queries, and that stays exactly the same. We can use the blocking-read approach.

Performance penalty: Very interesting to know. We should run some tests to measure the real overhead of opening connections versus reusing them.

What we have now: From my point of view, it is completely useless. Opening a new connection for every query would perform about the same on average, and the code would be less complex. If a system can wait two hours for queries to be cached, then it can also run without them cached. In other words, my conclusion is that I would only consider following one of the following options:
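The per-partition pool described above (lazy creation, growth under concurrent load, a configurable cap, and queueing once full) could be sketched roughly as follows. This is a minimal illustration in Python, not the project's code; the class name `PartitionPool` and the `max_size` parameter are made up for the example:

```python
import queue
import sqlite3
import threading

class PartitionPool:
    """A lazily grown, bounded pool of SQLite connections for one partition.

    A connection is created only when a query arrives and no idle
    connection is available. Once max_size connections exist, further
    queries block (queue up) until a connection is returned.
    """

    def __init__(self, db_path, max_size=4):
        self._db_path = db_path
        self._max_size = max_size
        self._idle = queue.Queue()
        self._created = 0
        self._lock = threading.Lock()

    def _acquire(self):
        try:
            return self._idle.get_nowait()  # reuse a cached connection
        except queue.Empty:
            pass
        with self._lock:
            if self._created < self._max_size:
                self._created += 1
                # check_same_thread=False: the pool hands a connection to
                # whichever worker thread asks, one user at a time.
                return sqlite3.connect(self._db_path, check_same_thread=False)
        return self._idle.get()  # pool is full: wait for a free connection

    def query(self, sql, params=()):
        conn = self._acquire()
        try:
            return conn.execute(sql, params).fetchall()
        finally:
            self._idle.put(conn)  # give the connection back to the pool

    @property
    def size(self):
        return self._created
```

Under light load this keeps exactly one cached connection per partition; only genuinely concurrent queries against the same partition grow the pool, up to the cap.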
Currently we have a pool of threads resolving queries at the DNode, and each thread keeps a thread-local cached connection per partition. So we can potentially have pool_size * num_partitions SQLite connections open, e.g. 64 * 1200 = 76,800. That is too many, and it is also ineffective, as any given cached connection is very unlikely to be reused.
Change the current system to use SQLite job queues to solve this problem. See https://code.google.com/p/sqlite4java/wiki/JobQueue
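The core of the JobQueue idea is that a single worker thread owns each connection and runs submitted jobs serially, so the connection is never shared across threads. A minimal sketch of that pattern in Python (the class and method names here are illustrative; sqlite4java's actual API differs):

```python
import queue
import sqlite3
import threading
from concurrent.futures import Future

class SQLiteJobQueue:
    """One worker thread owns the connection and executes queued jobs
    serially; callers get a Future back instead of touching the
    connection themselves."""

    _STOP = object()

    def __init__(self, db_path):
        self._jobs = queue.Queue()
        self._thread = threading.Thread(
            target=self._run, args=(db_path,), daemon=True)
        self._thread.start()

    def _run(self, db_path):
        # The connection is created and used by this thread only.
        conn = sqlite3.connect(db_path)
        try:
            while True:
                item = self._jobs.get()
                if item is self._STOP:
                    break
                job, future = item
                try:
                    future.set_result(job(conn))
                except Exception as exc:
                    future.set_exception(exc)
        finally:
            conn.close()

    def execute(self, job):
        """Queue a job (a callable taking the connection); returns a Future."""
        future = Future()
        self._jobs.put((job, future))
        return future

    def stop(self):
        self._jobs.put(self._STOP)
        self._thread.join()
```

With one such queue per partition, the number of open connections is bounded by the number of queues rather than pool_size * num_partitions.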
The idea is to minimize the number of cached connections by having a flexible pool of connections per partition that adapts to the load.