feat(mapping-optimizer): Support in operator for mapping optimizer #5685

Zylphrex · 2024-03-25T17:33:17Z

This was a TODO item. On the spans dataset, one situation that is easy to run into is a condition like

sentry_tags[key] IN (value1, value2)

This results in SQL like

in((arrayElement(sentry_tags.value, indexOf(sentry_tags.key, 'key')) AS `_snuba_sentry_tags[key]`), ['value1', 'value2'])

which scans the entire sentry_tags.key and sentry_tags.value columns. The optimization here is to use the tags hash map, which gives us a condition like

hasAny(_sentry_tags_hash_map, array(cityHash64('key=value1'), cityHash64('key=value2')))
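To make the rewrite concrete, here is a minimal sketch of how the hash map condition could be assembled from a key and a list of values. It is not the code from this PR: `escape_literal` and `in_condition_to_hash_map` are hypothetical helpers that build a plain string, whereas the real optimizer works on Snuba's query AST. Any escaping that the hash map applies to special characters inside the key is also left out.

```python
from typing import Sequence


def escape_literal(value: str) -> str:
    """Escape backslashes and single quotes for a ClickHouse string literal.

    Hypothetical helper for this sketch only.
    """
    return value.replace("\\", "\\\\").replace("'", "\\'")


def in_condition_to_hash_map(
    hash_map_column: str, key: str, values: Sequence[str]
) -> str:
    """Build a hasAny(...) condition equivalent to `tags[key] IN (values)`.

    Each candidate value becomes one cityHash64('key=value') entry, so the
    hash map column is checked instead of scanning the key/value columns.
    """
    hashes = ", ".join(
        f"cityHash64('{escape_literal(key)}={escape_literal(value)}')"
        for value in values
    )
    return f"hasAny({hash_map_column}, array({hashes}))"


if __name__ == "__main__":
    print(in_condition_to_hash_map("_sentry_tags_hash_map", "key", ["value1", "value2"]))
```

Calling it with `key` and the values `value1` and `value2` yields one hash per value, which is the shape of the optimized condition described above.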

codecov · 2024-03-25T17:59:12Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.93%. Comparing base (f5f9208) to head (cfe6365).
Report is 1 commits behind head on master.

✅ All tests successful. No failed tests found ☺️

Additional details and impacted files

@@           Coverage Diff           @@
##           master    #5685   +/-   ##
=======================================
  Coverage   89.92%   89.93%           
=======================================
  Files         898      898           
  Lines       43453    43474   +21     
  Branches      299      299           
=======================================
+ Hits        39077    39098   +21     
  Misses       4334     4334           
  Partials       42       42

Zylphrex · 2024-03-25T18:59:22Z

snuba/query/processors/physical/mapping_optimizer.py

@@ -265,7 +351,7 @@ def _get_condition_without_redundant_checks(
                    if tag_exist_match:
                        matched_tag_exists_conditions[condition_id] = tag_exist_match
                if not tag_exist_match:
-                    eq_match = self.__optimizable_pattern.match(cond)
+                    eq_match = self.__equals_condition_pattern.match(cond)


@volokluev would you happen to know whether I also need to implement the removal of redundant checks for IN conditions?

You could, but it's not strictly necessary. I'm not sure how often we get those cases with IN conditions. It's definitely something that can be added later.

I think this is less common on the older datasets but more likely to happen with the spans dataset as we have sentry_tags which contains some more commonly used columns.

The example I ran into was with environment. For a 24h period, the query read >48GiB of data; after applying this optimization, that dropped to <24GiB. On 7-day periods, the query was already timing out, so this optimization should already be helpful.

oh yes absolutely. Merge the PR. I was saying that the redundant clause optimization is probably not going to be as applicable for IN clauses

getsentry-bot · 2024-03-26T16:51:26Z

PR reverted: 8c6329d

…mizer (#5685)" This reverts commit cf89313. Co-authored-by: Zylphrex <[email protected]>

Zylphrex requested a review from a team as a code owner March 25, 2024 17:33

evanh approved these changes Mar 25, 2024

Zylphrex commented Mar 25, 2024

Zylphrex merged commit cf89313 into master Mar 26, 2024
32 checks passed

Zylphrex deleted the txiao/feat/support-in-operator-for-mapping-optimizers branch March 26, 2024 14:14

Zylphrex added the Trigger: Revert label Mar 26, 2024

getsentry-bot added a commit that referenced this pull request Mar 26, 2024

Revert "feat(mapping-optimizer): Support in operator for mapping opti…

8c6329d

…mizer (#5685)" This reverts commit cf89313. Co-authored-by: Zylphrex <[email protected]>

Zylphrex mentioned this pull request Mar 26, 2024

feat(mapping-optimizer): Support in operator for mapping optimizer #5691

Merged

Zylphrex mentioned this pull request Mar 26, 2024

feat(mapping-optimizer): Support in operator for mapping optimizer #5692

Merged
