-
-
Notifications
You must be signed in to change notification settings - Fork 57
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(metrics): Add a meta table to counters #5669
Conversation
This creates a table and a materialized view to populate it for storing meta information about metrics. This table is meant to satisfy queries that are trying to find metric_ids, tag keys and tag values, but are not interested in the values associated with the metrics. In theory this will eventually be done for all the metric types, but for now this is being used just for counters to test how well this solution actually solves the problems.
Codecov ReportAll modified and coverable lines are covered by tests ✅
✅ All tests successful. No failed tests found Additional details and impacted files@@ Coverage Diff @@
## master #5669 +/- ##
==========================================
+ Coverage 89.91% 89.92% +0.01%
==========================================
Files 894 897 +3
Lines 43393 43441 +48
Branches 299 299
==========================================
+ Hits 39017 39065 +48
Misses 4334 4334
Partials 42 42 ☔ View full report in Codecov by Sentry. |
This PR has a migration; here is the generated SQL -- start migrations
-- forward migration generic_metrics : 0030_add_record_meta_column
Local op: ALTER TABLE generic_metric_counters_raw_local ADD COLUMN IF NOT EXISTS record_meta UInt8 DEFAULT 0 AFTER materialization_version;
-- end forward migration generic_metrics : 0030_add_record_meta_column
-- backward migration generic_metrics : 0030_add_record_meta_column
Local op: ALTER TABLE generic_metric_counters_raw_local DROP COLUMN IF EXISTS record_meta;
-- end backward migration generic_metrics : 0030_add_record_meta_column
-- forward migration generic_metrics : 0031_counters_meta_table
Local op: CREATE TABLE IF NOT EXISTS generic_metric_counters_meta_aggregated_local (org_id UInt64, project_id UInt64, use_case_id LowCardinality(String), metric_id UInt64, tag_key String, timestamp DateTime CODEC (DoubleDelta), retention_days UInt16, tag_values AggregateFunction(groupUniqArray, String), count AggregateFunction(sum, Float64)) ENGINE ReplicatedAggregatingMergeTree('/clickhouse/tables/generic_metrics_counters/{shard}/default/generic_metric_counters_meta_aggregated_local', '{replica}') PRIMARY KEY (org_id, project_id, use_case_id, metric_id, tag_key, timestamp) ORDER BY (org_id, project_id, use_case_id, metric_id, tag_key, timestamp) PARTITION BY (retention_days, toMonday(timestamp)) TTL timestamp + toIntervalDay(retention_days) SETTINGS index_granularity=2048;
Distributed op: CREATE TABLE IF NOT EXISTS generic_metric_counters_meta_aggregated_dist (org_id UInt64, project_id UInt64, use_case_id LowCardinality(String), metric_id UInt64, tag_key String, timestamp DateTime CODEC (DoubleDelta), retention_days UInt16, tag_values AggregateFunction(groupUniqArray, String), count AggregateFunction(sum, Float64)) ENGINE Distributed(`cluster_one_sh`, default, generic_metric_counters_meta_aggregated_local);
-- end forward migration generic_metrics : 0031_counters_meta_table
-- backward migration generic_metrics : 0031_counters_meta_table
Distributed op: DROP TABLE IF EXISTS generic_metric_counters_meta_aggregated_dist;
Local op: DROP TABLE IF EXISTS generic_metric_counters_meta_aggregated_local;
-- end backward migration generic_metrics : 0031_counters_meta_table
-- forward migration generic_metrics : 0032_counters_meta_table_mv
Local op: CREATE MATERIALIZED VIEW IF NOT EXISTS generic_metric_counters_meta_aggregation_mv TO generic_metric_counters_meta_aggregated_local (org_id UInt64, project_id UInt64, use_case_id LowCardinality(String), metric_id UInt64, tag_key String, timestamp DateTime CODEC (DoubleDelta), retention_days UInt16, tag_values AggregateFunction(groupUniqArray, String), value AggregateFunction(sum, Float64)) AS
SELECT
org_id,
project_id,
use_case_id,
metric_id,
tag_key,
toStartOfWeek(timestamp) as timestamp,
retention_days,
groupUniqArrayState(tag_value) as `tag_values`,
sumState(count_value) as count
FROM generic_metric_counters_raw_local
ARRAY JOIN
tags.key AS tag_key, tags.raw_value AS tag_value
WHERE record_meta = 1
GROUP BY
org_id,
project_id,
use_case_id,
metric_id,
tag_key,
timestamp,
retention_days
;
-- end forward migration generic_metrics : 0032_counters_meta_table_mv
-- backward migration generic_metrics : 0032_counters_meta_table_mv
Local op: DROP TABLE IF EXISTS generic_metric_counters_meta_aggregation_mv;
-- end backward migration generic_metrics : 0032_counters_meta_table_mv |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added some thoughts, but not blocking
snuba/snuba_migrations/generic_metrics/0031_counters_meta_table_mv.py
Outdated
Show resolved
Hide resolved
snuba/snuba_migrations/generic_metrics/0031_counters_meta_table_mv.py
Outdated
Show resolved
Hide resolved
Did some local testing just to make sure this worked: First, used tests to insert some data to Then, ran these queries:
Then I ran this query:
And then I ran these approximations on how the autocomplete could be run in prod:
|
Some more numbers: I pulled two days of data to see how many rows this would actually create.
It's nice to see that excluding escalating issues would eliminate about half of the total potential rows. |
PR reverted: 1eb33c5 |
This reverts commit 8d41ed1. Co-authored-by: evanh <[email protected]>
Problem
Queries that fetch all the metric IDs for a given org/project, and another query to fetch the tag values for a given tag key, take a long time because they are done on the main aggregate table. These queries don't actually care about the values, granularities etc., but are coming up against those problems.
Solution
This creates a table and a materialized view to populate it for storing meta information about metrics. This table is meant to satisfy queries that are trying to find metric_ids, tag keys and tag values, but are not interested in the values associated with the metrics. Since this table is discarding the values and has a very coarse granularity (weekly), it should store much less data than the main table, and make queries faster.
In theory this will eventually be done for all the metric types, but for now this is being used just for counters to test how well this solution actually solves the problems.
Potential Impact
The major point of risk is that writing to two materialized views (the existing aggregate table, and now this meta table) will take longer on insert and cause slowdowns/backlogs when inserting data. In that case the reverse migration can be run to remove the view and that will stop.
This will also start storing more data on the node, so that should be watched. In theory this should store much less data than the main table, so that impact should be small, and will certainly be incremental. If this does become a problem, the migrations can be reversed.