
Support QPS/TPS reporting #400

Open
mpetri opened this issue Jun 1, 2020 · 4 comments
Labels
enhancement New feature or request

Comments


mpetri commented Jun 1, 2020

Describe the solution you'd like
The BitFunnel paper contained some additions to the partitioned_elias_fano codebase which allowed measuring queries per second with multiple threads; see here: https://github.com/BitFunnel/partitioned_elias_fano/blob/master/Runner/QueryLogRunner.cpp#L56

In addition to reporting per-query metrics, it would be nice if PISA could provide such a multi-threaded QPS benchmark tool.

Additional context

In practice, one would be interested in metrics such as "if my p95 latency requirement is 100 ms, how much QPS can I achieve with this index and this hardware configuration?"

mpetri added the enhancement (New feature or request) label on Jun 1, 2020
JMMackenzie (Member) commented:

@amallia @elshize @mpetri

I am thinking about this at the moment. Do we want to use TBB to facilitate the multi-threading, or should we follow the example from the BitFunnel experiments? (They seem to roll their own thread synchronization so that they avoid timing the overhead of spooling up threads prior to processing.) I suspect we can make a simple one-function addition to the queries binary to allow threads.

I know that evaluate_queries does parallel processing, but it uses the tbb::parallel_for approach, which might add some overhead to the timings? Not entirely sure...

elshize (Member) commented Jun 10, 2020

@JMMackenzie When you're talking about overhead, do you mean the overhead of setting up a thread pool and such, or also the overhead of starting each thread?

If I understand correctly, the goal is this: given a set of queries and a number of threads, run the queries in the specified order and report the query throughput. Is this correct?

If this is the case, we could spin up a number of workers that would wait on a concurrent queue for queries in a loop. This way, we could even measure in different modes:

  • begin to end,
  • each execution separately,
  • report data for each query: time of arrival, time finished (so also latency)

I personally think that measuring just the query processing without any synchronization is a bit unfair, because concurrent execution will always have some overhead over sequential execution. But as I said, we can support different ways of measuring if we really want.
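The worker-pool design above could be sketched roughly as follows. This is a minimal illustration only, using a mutex-guarded standard-library queue in place of a TBB concurrent queue; `run_benchmark`, `QueryTiming`, and the other names are hypothetical, not part of PISA.

```cpp
// Sketch of the proposed worker-pool benchmark (hypothetical names; a
// std::mutex-guarded queue stands in for a TBB concurrent queue).
#include <chrono>
#include <cstddef>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

using Clock = std::chrono::steady_clock;

struct QueryTiming {
    Clock::time_point started;   // when a worker picked up the query
    Clock::time_point finished;  // when the worker completed it
};

// Runs `process` over every query on `num_threads` workers and returns
// per-query timings, from which begin-to-end throughput, per-execution
// latency, and arrival/finish times can all be derived.
template <typename ProcessFn>
std::vector<QueryTiming> run_benchmark(std::vector<std::string> queries,
                                       unsigned num_threads,
                                       ProcessFn process) {
    std::mutex mtx;
    std::queue<std::size_t> pending;
    for (std::size_t i = 0; i < queries.size(); ++i) pending.push(i);

    std::vector<QueryTiming> timings(queries.size());
    std::vector<std::thread> workers;
    for (unsigned t = 0; t < num_threads; ++t) {
        workers.emplace_back([&] {
            for (;;) {
                std::size_t idx;
                {
                    std::lock_guard<std::mutex> lock(mtx);
                    if (pending.empty()) return;  // no more queries
                    idx = pending.front();
                    pending.pop();
                }
                timings[idx].started = Clock::now();
                process(queries[idx]);
                timings[idx].finished = Clock::now();
            }
        });
    }
    for (auto& w : workers) w.join();
    return timings;
}
```

Begin-to-end QPS would then be `queries.size()` divided by the elapsed time between the earliest `started` and the latest `finished` timestamp, while `finished - started` gives each query's individual latency.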

JMMackenzie (Member) commented:

> @JMMackenzie When you're talking about overhead, do you mean the overhead of setting up a thread pool and such, or also the overhead of starting each thread?

Right, I mean starting each thread.

> If I understand correctly, the goal is this: given a set of queries and a number of threads, run the queries in the specified order and report the query throughput. Is this correct?

I agree, this is the goal.

> If this is the case, we could spin up a number of workers that would wait on a concurrent queue for queries in a loop. This way, we could even measure in different modes:
>
> * begin to end,
> * each execution separately,
> * report data for each query: time of arrival, time finished (so also latency)

> I personally think that measuring just the query processing without any synchronization is a bit unfair, because concurrent execution will always have some overhead over sequential execution. But as I said, we can support different ways of measuring if we really want.

Okay, I agree, and I am in favor of your suggested approach. I think we have the capacity to do this via TBB, right?

elshize (Member) commented Jun 11, 2020

Yes, I believe TBB should do just fine. TBB provides two concurrent queues: one unbounded (tbb::concurrent_queue) and one bounded (tbb::concurrent_bounded_queue). Since currently all queries are loaded into memory at the beginning anyway, we don't need to bound the queue.
