feat: deletes fail if there are too many ongoing mutations on any replica in the cluster #6351
snuba/web/delete_query.py
Outdated
@with_span()
def delete_from_storage(
    storage: WritableTableStorage,
    columns: Dict[str, list[Any]],
    attribution_info: Mapping[str, Any],
    max_ongoing_mutations: int = 5,
This number should be a runtime config option.
It can still use 5 as the default for the config, though.
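A minimal sketch of what the reviewer is suggesting, assuming a simple key-value runtime config store. `RUNTIME_CONFIG`, `get_int_config`, and the key `max_ongoing_mutations_for_delete` are hypothetical stand-ins for illustration, not Snuba's actual config API:

```python
from typing import Dict

# Hypothetical in-memory stand-in for a runtime config store.
RUNTIME_CONFIG: Dict[str, str] = {}

# The default stays at 5, as suggested in the review.
DEFAULT_MAX_ONGOING_MUTATIONS = 5


def get_int_config(key: str, default: int) -> int:
    """Return the value stored under `key` as an int, or `default` if unset or invalid."""
    raw = RUNTIME_CONFIG.get(key)
    if raw is None:
        return default
    try:
        return int(raw)
    except ValueError:
        return default


def get_max_ongoing_mutations() -> int:
    # Read at call time so the limit can change without a code change.
    return get_int_config(
        "max_ongoing_mutations_for_delete", DEFAULT_MAX_ONGOING_MUTATIONS
    )
```

With this shape, the hard-coded `max_ongoing_mutations: int = 5` parameter could be dropped in favor of a lookup inside the function body.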
snuba/web/delete_query.py
Outdated
    # fail if too many mutations ongoing
    cluster = storage.get_cluster()
    if cluster.is_single_node():
        query = f"""
        SELECT count() as cnt
        FROM system.mutations
        WHERE table IN ({", ".join(map(repr, delete_settings.tables))}) AND is_done=0
        """
    else:
        query = f"""
        SELECT max(cnt)
        FROM (
            SELECT hostname() as host, count() as cnt
            FROM clusterAllReplicas('{cluster.get_clickhouse_cluster_name()}', 'system', mutations)
            WHERE table IN ({", ".join(map(repr, delete_settings.tables))}) AND is_done=0
            GROUP BY host
        )
        """
Make the piece that determines the number of ongoing mutations its own function, and mock it in a test. Then you can test this functionality.
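One way this could look, as a sketch rather than the code the PR ended up with. The helper name `get_num_ongoing_mutations`, the `count_ongoing` parameter, and the table name `search_issues_local` are assumptions made for illustration. Passing the counter in as an argument lets a test substitute a `unittest.mock.Mock` without a ClickHouse connection:

```python
from typing import Callable, Sequence
from unittest import mock


class TooManyOngoingMutationsError(Exception):
    pass


def get_num_ongoing_mutations(tables: Sequence[str]) -> int:
    """Hypothetical helper that would run the system.mutations query."""
    raise NotImplementedError("requires a ClickHouse connection")


def check_ongoing_mutations(
    tables: Sequence[str],
    max_ongoing_mutations: int = 5,
    count_ongoing: Callable[[Sequence[str]], int] = get_num_ongoing_mutations,
) -> None:
    """Raise if the reported number of ongoing mutations exceeds the limit."""
    if count_ongoing(tables) > max_ongoing_mutations:
        raise TooManyOngoingMutationsError()


def passes_with_fake_count(count: int) -> bool:
    """Return True if the check passes when the mocked helper reports `count`."""
    fake = mock.Mock(return_value=count)
    try:
        check_ongoing_mutations(["search_issues_local"], count_ongoing=fake)
    except TooManyOngoingMutationsError:
        return False
    return True
```

If the helper is called as a module-level function instead of being injected, `mock.patch` on its import path would achieve the same thing in a test.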
snuba/web/delete_query.py
Outdated
        .results[0][0]
    )
    if num_ongoing_mutations > max_ongoing_mutations:
        raise TooManyOngoingMutationsError()
Should this exception be raised with a message saying how many mutations are ongoing? Possibly log the exception with relevant details like how many concurrent mutations are allowed etc.?
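A sketch of how the exception could carry those details, assuming the exception takes the observed count and the limit as constructor arguments. The constructor signature and the `ensure_mutation_capacity` helper are hypothetical, not taken from the PR:

```python
import logging

logger = logging.getLogger(__name__)


class TooManyOngoingMutationsError(Exception):
    """Hypothetical variant that records the observed count and the limit."""

    def __init__(self, num_ongoing: int, max_allowed: int) -> None:
        self.num_ongoing = num_ongoing
        self.max_allowed = max_allowed
        super().__init__(
            f"{num_ongoing} mutations are ongoing, "
            f"but at most {max_allowed} are allowed"
        )


def ensure_mutation_capacity(num_ongoing: int, max_allowed: int) -> None:
    """Log and raise when the number of ongoing mutations exceeds the limit."""
    if num_ongoing > max_allowed:
        err = TooManyOngoingMutationsError(num_ongoing, max_allowed)
        logger.warning("Rejecting delete request: %s", err)
        raise err
```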
snuba/web/delete_query.py
Outdated
        FROM (
            SELECT hostname() as host, count() as cnt
            FROM clusterAllReplicas('{cluster.get_clickhouse_cluster_name()}', 'system', mutations)
            WHERE table IN ({", ".join(map(repr, delete_settings.tables))}) AND is_done=0
This will combine the counts for the tables, right? Can we group by table and check the max of the results? That would be the max of the per-table maxes for the cluster query, and the max of the per-table counts for the single-node query.
I think what you're asking for is what it's currently doing: taking the max ongoing mutations, for this specific table, across all nodes.
The inner query will give something like
node-1-1, 4
node-1-2, 3
node-1-3, 0
then the outer query will give 4
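The same reduction in plain Python, using the example rows above. The function name is made up for illustration:

```python
from typing import Dict


def max_ongoing_across_replicas(counts_by_host: Dict[str, int]) -> int:
    """Mimic the outer query: take the largest per-replica count (0 if no rows)."""
    return max(counts_by_host.values(), default=0)


# Rows as the inner query might return them: host -> ongoing mutation count.
inner_result = {"node-1-1": 4, "node-1-2": 3, "node-1-3": 0}
worst_replica_count = max_ongoing_across_replicas(inner_result)
```

Here `worst_replica_count` is 4, which is within the limit of 5, so the delete would proceed.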
But if there are two tables for a node, it will combine them:
SELECT count(), table
FROM system.mutations
WHERE table in ('outcomes_raw_local', 'outcomes_hourly_local')
GROUP BY table
If you don't group by table, the total count is 19
This is for one node. This won't be a problem for the search issues table, but I'm just saying if we ever need to delete from multiple tables it will combine the counts
Oh I see what you mean now, fixed.
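For reference, a sketch of what grouping by table could look like. This is not necessarily the query the fix ended up using, and the SQL has not been run against ClickHouse here. The function name `build_ongoing_mutations_query` and the cluster name in the usage lines are hypothetical:

```python
from typing import Optional, Sequence


def build_ongoing_mutations_query(
    tables: Sequence[str], cluster_name: Optional[str] = None
) -> str:
    """Build a query returning the largest per-table (and per-replica) count."""
    table_list = ", ".join(repr(t) for t in tables)
    where = f"WHERE table IN ({table_list}) AND is_done = 0"
    if cluster_name is None:
        # Single node: one row per table.
        inner = (
            "SELECT table, count() AS cnt\n"
            "    FROM system.mutations\n"
            f"    {where}\n"
            "    GROUP BY table"
        )
    else:
        # Cluster: one row per (replica, table) pair.
        inner = (
            "SELECT hostname() AS host, table, count() AS cnt\n"
            f"    FROM clusterAllReplicas('{cluster_name}', 'system', mutations)\n"
            f"    {where}\n"
            "    GROUP BY host, table"
        )
    # The outer max picks the worst table on the worst replica.
    return f"SELECT max(cnt)\nFROM (\n    {inner}\n)"


tables = ["outcomes_raw_local", "outcomes_hourly_local"]
single_node_query = build_ongoing_mutations_query(tables)
cluster_query = build_ongoing_mutations_query(tables, cluster_name="my_cluster")
```

Because the counts are no longer summed across tables, two tables with a few mutations each cannot jointly trip the limit.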
❌ 1 Tests Failed:
ae45579 to 52b072b
This is for Jira ticket sns-2876.
This PR makes a delete request fail if there are too many ongoing mutations on the table we are trying to delete from. "Too many ongoing mutations" means that at least one replica has more than 5 ongoing mutations. The number 5 can be changed.
The delete raises TooManyOngoingMutationsError if any replica has over 5 mutations ongoing for the given table.
I wasn't sure how to write tests for this; please let me know if you have an idea how to test it.
I verified by hand on our clickhouse sandbox that the queries look good and work.