[Date Histogram] Investigate the safe number of buckets for which filter rewrite optimization can be applied #13549

bowenlan-amzn · 2024-05-06T02:27:43Z

Follow-up task for #13317

The idea of the filter rewrite optimization is to use the index structure, instead of iterating over documents, to compute the bucket results. We know how many buckets there will be before the actual aggregation execution logic begins.

As the bucket count increases, or as the number of documents to aggregate decreases, the iterative method may become faster and the filter rewrite method may become slower.
Currently we have a cluster setting that defines the supported bucket count, but it may not always work. For example, if the dataset has only 3k distinct values and the aggregation query asks for 1024 buckets, that is too many and the optimization wouldn't be better than just iterating over the documents; on the other hand, if the dataset has 100k distinct values, we can probably support more than 1024 buckets.
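
To make the two examples above concrete, here is a small sketch of the arithmetic. The helper name `avg_values_per_bucket` is hypothetical, and the ratio is only one possible signal for illustration, not a rule this issue has settled on.

```python
def avg_values_per_bucket(distinct_values: int, bucket_count: int) -> float:
    """Average number of distinct values that fall into each bucket.

    Hypothetical helper for illustration only: a low ratio suggests each
    bucket's range query does little useful work per bucket.
    """
    if bucket_count <= 0:
        raise ValueError("bucket_count must be positive")
    return distinct_values / bucket_count


# The two datasets from the example, both queried with 1024 buckets.
small = avg_values_per_bucket(3_000, 1024)
large = avg_values_per_bucket(100_000, 1024)
```

With 3k distinct values each bucket covers roughly 3 values on average, while with 100k distinct values each bucket covers close to 100, which is why a single fixed bucket limit fits neither case well.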

This task is to investigate rules for deciding dynamically, depending on the dataset or the index, whether the optimization should be used.

The biggest part of the overhead is normally reading the values from documents. The BKD index structure stores all the documents in leaf nodes, and a leaf node only needs to be traversed when it intersects with the query.
One idea here is to do a dummy traversal of the BKD tree to find out how many leaf nodes will be intersected and how many intermediate nodes will be skipped. Based on these two numbers, we can get a relatively accurate idea of the cost of a given range query.
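
A minimal sketch of what such a dummy traversal could look like, assuming a simplified one-dimensional tree built in memory. The `Node`, `build`, `Cost`, and `estimate` names are hypothetical and this is not the Lucene API; in Lucene the same three-way decision is exposed through the `PointValues.Relation` values (`CELL_OUTSIDE_QUERY`, `CELL_INSIDE_QUERY`, `CELL_CROSSES_QUERY`). Note that `skipped_nodes` here counts any node, leaf or intermediate, that is resolved from its metadata alone.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    """A node of a simplified 1D BKD-like tree (illustrative, not Lucene's)."""
    min_value: int
    max_value: int
    doc_count: int
    left: Optional["Node"] = None
    right: Optional["Node"] = None

    @property
    def is_leaf(self) -> bool:
        return self.left is None


def build(sorted_values: List[int], leaf_size: int) -> Node:
    """Build the tree from a non-empty, sorted list of values."""
    if len(sorted_values) <= leaf_size:
        return Node(sorted_values[0], sorted_values[-1], len(sorted_values))
    mid = len(sorted_values) // 2
    left = build(sorted_values[:mid], leaf_size)
    right = build(sorted_values[mid:], leaf_size)
    return Node(left.min_value, right.max_value,
                left.doc_count + right.doc_count, left, right)


@dataclass
class Cost:
    crossed_leaves: int = 0  # leaves whose values would have to be read
    skipped_nodes: int = 0   # nodes resolved from min/max/doc_count alone


def estimate(node: Node, lo: int, hi: int, cost: Cost) -> None:
    """Walk the tree for the range [lo, hi] without reading any values."""
    if node.max_value < lo or node.min_value > hi:
        cost.skipped_nodes += 1  # entirely outside the range
        return
    if lo <= node.min_value and node.max_value <= hi:
        cost.skipped_nodes += 1  # entirely inside: doc_count is enough
        return
    if node.is_leaf:
        cost.crossed_leaves += 1  # partial overlap: values must be read
        return
    estimate(node.left, lo, hi, cost)
    estimate(node.right, lo, hi, cost)


# 16 values in leaves of 4, then a range that cuts through two leaves.
root = build(list(range(16)), leaf_size=4)
cost = Cost()
estimate(root, 2, 9, cost)
```

In this sketch the range [2, 9] crosses two leaves and skips two others, so the estimated cost is dominated by the two leaves that need their values read. Summing such estimates over all bucket ranges would give one possible input for deciding between the two code paths.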

andrross · 2024-05-08T15:36:47Z

[Triage - attendees 1 2 3 4]
@bowenlan-amzn Thanks for filing.

bowenlan-amzn · 2024-06-18T19:26:53Z

Closing because the new issue #14438 will cover this.

jainankitk added this to Performance Roadmap Nov 3, 2023

bowenlan-amzn self-assigned this May 6, 2024

bowenlan-amzn converted this from a draft issue May 6, 2024

github-actions bot added the untriaged label May 6, 2024

andrross added the Search:Performance label May 8, 2024

github-project-automation bot added this to Search Project Board May 8, 2024

github-project-automation bot moved this to 🆕 New in Search Project Board May 8, 2024

andrross added enhancement Enhancement or improvement to existing feature or request and removed untriaged labels May 8, 2024

bowenlan-amzn mentioned this issue Jun 18, 2024

[Profiling deep dive] Default aggregation vs. optimization code path #14438

Open

bowenlan-amzn closed this as completed Jun 18, 2024

github-project-automation bot moved this from Todo to Done in Performance Roadmap Jun 18, 2024

github-project-automation bot moved this from 🆕 New to ✅ Done in Search Project Board Jun 18, 2024
