
Better SQLite connection handling #45

Open
ivanprado opened this issue Dec 12, 2014 · 2 comments


@ivanprado
Contributor

Currently we have a pool of threads resolving queries at the DNode, and each thread keeps a thread-local cached connection per partition. So we can potentially end up with pool_size * num_partitions open SQLite connections. That is too many, and it is also ineffective, because any given cached connection is very unlikely to be reused.

Change the current system to use SQLite job queues to solve this problem. See https://code.google.com/p/sqlite4java/wiki/JobQueue

The idea is to minimize the number of cached connections by having a flexible pool of connections per partition that can adapt to the load.
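A minimal plain-Java sketch of the job-queue idea (the real sqlite4java `SQLiteQueue` would take the place of the single-threaded executor, and the `Db` handle here is a hypothetical stand-in for an SQLite connection): each partition gets one queue that serializes jobs against one dedicated connection.

```java
import java.util.Map;
import java.util.concurrent.*;
import java.util.function.Function;

// Hypothetical stand-in for an SQLite connection handle.
class Db {
    final int partition;
    Db(int partition) { this.partition = partition; }
    int query(int x) { return x * 2; }  // placeholder for a real SQL query
}

// One single-threaded queue per partition: jobs for a partition are
// serialized on one dedicated connection, as sqlite4java's JobQueue does.
class PartitionJobQueues {
    private final Map<Integer, ExecutorService> queues = new ConcurrentHashMap<>();
    private final Map<Integer, Db> conns = new ConcurrentHashMap<>();

    <T> Future<T> submit(int partition, Function<Db, T> job) {
        ExecutorService q = queues.computeIfAbsent(partition,
                p -> Executors.newSingleThreadExecutor());
        return q.submit(() -> job.apply(
                conns.computeIfAbsent(partition, Db::new)));
    }

    void shutdown() { queues.values().forEach(ExecutorService::shutdown); }
}
```

With one queue per partition this caps connections at num_partitions instead of pool_size * num_partitions, at the price of the per-partition serialization discussed below.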

@pereferrera
Contributor

The JobQueue looks interesting; it probably wasn't available two years ago (or we didn't notice it).

A few questions that come to my mind:

  • Wouldn't using a single thread per partition with a JobQueue cause contention on highly demanded partitions (hot spots)?
  • It looks like we would be using an asynchronous API for resolving queries. How would that affect the current architecture / process for serving a query? Or would we use the blocking-read approach?

My thoughts:

It is true that with 1) a very large number of partitions, 2) a large pool size, 3) a tablespace where all partitions are used equally, and 4) not many queries per second, connections aren't likely to be reused often right now... But is that really the prototypical scenario? Even with 1200 partitions, 64 threads and 10 queries per second, all the connections would be cached after about two hours, once the whole tablespace had been iterated through 64 times.
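The arithmetic behind that estimate can be checked directly, using the numbers from this comment:

```java
public class CacheWarmup {
    public static void main(String[] args) {
        int partitions = 1200, threads = 64;
        double qps = 10.0;
        // Worst case: every (thread, partition) pair caches its own connection.
        int connections = partitions * threads;   // 76,800 connections
        // At 10 queries/second, touching every cache slot once takes:
        double seconds = connections / qps;       // 7,680 s
        double hours = seconds / 3600.0;          // ~2.1 h
        System.out.printf("%d connections, ~%.1f hours to warm up%n",
                connections, hours);
    }
}
```

76,800 connections is also the figure that makes the ulimit concern below concrete.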

It would also be interesting to measure the performance penalty of opening a new connection per query, relative to the query time itself. My feeling is that this penalty only becomes relevant at several queries per second, in which case connections would likely be reused anyway.

The inconvenience I do see is that the system's ulimit (open file descriptor limit) must be raised accordingly. So if we keep what we have, maybe we should document this properly in the user guide or the FAQ.

@ivanprado
Contributor Author

One thread per partition:

I'm not thinking of using just a single thread per partition; that would obviously not be acceptable. I'm thinking of having a pool of connections per partition that is lazily initialized. After running for a while, the size of each pool should match the peak number of concurrent queries executed against that partition.

In other words, if the load is not very high, it is very likely that just one cached connection per partition will be enough to serve queries. But if at some point two or three queries run in parallel against the same partition, the pool should grow. The maximum pool size should be configurable, as it is now. If the pool is full, the remaining queries for that partition are queued up.
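A sketch of that per-partition pool in plain Java (the connection type is a hypothetical placeholder; the real one would be sqlite4java's connection class): connections are opened lazily, cached on release, capped at a maximum, and callers block (queue up) when the cap is reached.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Lazily-growing, bounded connection pool for ONE partition.
class PartitionPool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Semaphore permits;       // caps total open connections
    private final Supplier<C> opener;      // opens a real connection
    private int opened = 0;

    PartitionPool(int maxSize, Supplier<C> opener) {
        this.permits = new Semaphore(maxSize);
        this.opener = opener;
    }

    // Blocks when maxSize connections are already checked out,
    // so excess queries for this partition queue up here.
    C acquire() throws InterruptedException {
        permits.acquire();
        synchronized (this) {
            if (!idle.isEmpty()) return idle.poll();
            opened++;                      // grow lazily under load
            return opener.get();
        }
    }

    void release(C conn) {
        synchronized (this) { idle.push(conn); }
        permits.release();
    }

    synchronized int openedConnections() { return opened; }
}
```

Under steady low load the pool stabilizes at one connection; a burst of concurrent queries grows it up to `maxSize`, after which further callers wait on the semaphore.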

Asynchronous execution

Nothing is going to change. Right now we use an executor to control the execution of queries, and that stays exactly the same. We can use the blocking-read approach.

Performance penalty

Very interesting to know. We should run some tests to measure the real overhead of opening connections versus reusing them.
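A minimal harness for that test; the `openConnection` body here is a simulated placeholder, and a real run would open an actual SQLite connection there instead:

```java
import java.util.function.Supplier;

public class OpenCostBench {
    // Placeholder: simulate a connection open costing a few milliseconds.
    // A real benchmark would open an actual SQLite connection here.
    static Object openConnection() {
        try { Thread.sleep(5); } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new Object();
    }

    // Average wall-clock milliseconds per call of the given action.
    static double avgMillis(int iterations, Supplier<Object> action) {
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) action.get();
        return (System.nanoTime() - start) / 1e6 / iterations;
    }

    public static void main(String[] args) {
        Object cached = openConnection();
        double freshMs = avgMillis(20, OpenCostBench::openConnection);
        double reuseMs = avgMillis(20, () -> cached);  // reuse: no open at all
        System.out.printf("fresh: %.2f ms/query, reused: %.4f ms/query%n",
                freshMs, reuseMs);
    }
}
```

Comparing the per-query difference against typical query latency would tell us whether caching is worth its complexity at a given query rate.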

What we have now

From my point of view, it is completely useless. Opening a new connection for every query would perform about the same on average, and the code would be less complex. If a system can wait two hours for connections to become cached, then it can also run without caching them at all.

In other words, my conclusion is that I would only consider one of the following options:

  1. Simplify the code and open one connection per query, i.e. remove the current caching system.
  2. Implement the job-queue system described in this ticket.
