feat: Add more settings and tunability to snapshot table and dictionary #28062

tkaemming · 2025-01-29T22:23:20Z

Problem

If max_memory_usage is set on the cluster client settings, the INSERT INTO statement in the populate step can fail due to the large aggregation needed to ensure we have the latest value for each override in the overrides table.

Changes

optimize_aggregation_in_order reduces memory usage (which we don't have enough of) at the cost of some time (which we have plenty of).
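As a rough illustration of the change, the populate step can attach the setting to the INSERT INTO ... SELECT statement through a SETTINGS clause. This is a minimal sketch, not the code in dags/person_overrides.py: the helper names, table names, and column names are hypothetical, and the argMax aggregation is only an assumption about how "latest value for each override" is computed.

```python
def render_settings(settings: dict[str, int]) -> str:
    """Render a ClickHouse SETTINGS clause; empty string when no settings are given."""
    if not settings:
        return ""
    pairs = ", ".join(f"{name} = {value}" for name, value in sorted(settings.items()))
    return f"SETTINGS {pairs}"


def build_populate_query(snapshot_table: str, overrides_table: str, settings: dict[str, int]) -> str:
    """Build the populate statement. Table and column names are hypothetical placeholders."""
    lines = [
        f"INSERT INTO {snapshot_table} (team_id, distinct_id, person_id, version)",
        # Assumed shape of the aggregation: keep the row with the highest version per key.
        "SELECT team_id, distinct_id, argMax(person_id, version), max(version)",
        f"FROM {overrides_table}",
        "GROUP BY team_id, distinct_id",
    ]
    clause = render_settings(settings)
    if clause:
        lines.append(clause)
    return "\n".join(lines)


populate_query = build_populate_query(
    "overrides_snapshot",  # hypothetical snapshot table
    "overrides",           # hypothetical overrides table
    {"optimize_aggregation_in_order": 1},
)
```

The setting generally only helps when the GROUP BY keys follow the sorting key of the source table, so whether it applies depends on how the overrides table is ordered.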

This also allows providing a max_memory_usage parameter to the dictionary as an additional safeguard against excessive memory usage on the cluster during job execution.
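One way to picture the optional dictionary parameter is a DDL builder that appends a SETTINGS(...) clause only when a limit is supplied. This is a sketch under stated assumptions: the function name, dictionary columns, layout, and lifetime are hypothetical and are not taken from the PR's code.

```python
from typing import Optional


def build_dictionary_ddl(
    name: str,
    source_table: str,
    max_memory_usage: Optional[int] = None,
) -> str:
    """Build a CREATE DICTIONARY statement. Columns and layout are hypothetical placeholders."""
    lines = [
        f"CREATE DICTIONARY {name} (",
        "    team_id Int64,",
        "    distinct_id String,",
        "    person_id UUID",
        ")",
        "PRIMARY KEY team_id, distinct_id",
        f"SOURCE(CLICKHOUSE(TABLE '{source_table}'))",
        "LAYOUT(COMPLEX_KEY_HASHED())",
        "LIFETIME(0)",
    ]
    # Only emit the clause when a limit is provided, so the default behavior is unchanged.
    if max_memory_usage is not None:
        lines.append(f"SETTINGS(max_memory_usage = {max_memory_usage})")
    return "\n".join(lines)


# Example: cap dictionary loading at roughly 8 GiB (an arbitrary illustrative value).
dictionary_ddl = build_dictionary_ddl("overrides_dict", "overrides_snapshot", 8 * 1024**3)
```

Keeping the parameter optional means existing callers that do not pass a limit produce the same DDL as before.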

Does this work well for both Cloud and self-hosted?

Yes

How did you test this code?

Already covered by an existing test. I also confirmed that optimize_aggregation_in_order improved the peak memory usage of the relevant query in production.

greptile-apps

PR Summary

This PR adds memory optimization settings to the person overrides squashing process to prevent out-of-memory errors during large data operations.

Added optimize_aggregation_in_order=1 in dags/person_overrides.py to reduce memory usage during snapshot table population
Added max_memory_usage parameter to dictionary creation for controlling memory consumption
Added max_execution_time parameter to dictionary creation for timeout control
Improved memory efficiency by trading execution speed for lower memory usage during aggregation steps

1 file(s) reviewed, no comment(s)

tkaemming added 2 commits January 29, 2025 14:10

use optimize_aggregation_in_order when populating

b664a5e

add max_memory_usage setting

426aba9

tkaemming marked this pull request as ready for review January 29, 2025 22:29

tkaemming requested a review from a team January 29, 2025 22:29

greptile-apps bot reviewed Jan 29, 2025

fuziontech approved these changes Jan 30, 2025

tkaemming merged commit fdee4c8 into master Jan 30, 2025
95 checks passed

tkaemming deleted the more-squash-settings branch January 30, 2025 00:02

danielbachhuber pushed a commit that referenced this pull request Jan 30, 2025

feat: Add more settings and tunability to snapshot table and dictiona…

699232a

…ry (#28062)
