Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(metrics): Add a meta table to counters #5669

Merged
merged 8 commits into from
Mar 22, 2024

Conversation

evanh
Copy link
Member

@evanh evanh commented Mar 19, 2024

Problem

Queries that fetch all the metric IDs for a given org/project, and another query to fetch the tag values for a given tag key, take a long time because they are done on the main aggregate table. These queries don't actually care about the values, granularities etc., but are coming up against those problems.

Solution

This creates a table and a materialized view to populate it for storing meta information about metrics. This table is meant to satisfy queries that are trying to find metric_ids, tag keys and tag values, but are not interested in the values associated with the metrics. Since this table is discarding the values and has a very coarse granularity (weekly), it should store much less data than the main table, and make queries faster.

In theory this will eventually be done for all the metric types, but for now this is being used just for counters to test how well this solution actually solves the problems.

Potential Impact

The major point of risk is that writing to two materialized views (the existing aggregate table, and now this meta table) will take longer on insert and cause slowdowns/backlogs when inserting data. In that case the reverse migration can be run to remove the view and that will stop.

This will also start storing more data on the node, so that should be watched. In theory this should store much less data than the main table, so that impact should be small, and will certainly be incremental. If this does become a problem, the migrations can be reversed.

This creates a table and a materialized view to populate it for storing meta
information about metrics. This table is meant to satisfy queries that are
trying to find metric_ids, tag keys and tag values, but are not interested in
the values associated with the metrics.

In theory this will eventually be done for all the metric types, but for now
this is being used just for counters to test how well this solution actually
solves the problems.
@evanh evanh requested a review from a team as a code owner March 19, 2024 15:39
Copy link

codecov bot commented Mar 19, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 89.92%. Comparing base (f66a9fa) to head (f019f71).

✅ All tests successful. No failed tests found ☺️

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #5669      +/-   ##
==========================================
+ Coverage   89.91%   89.92%   +0.01%     
==========================================
  Files         894      897       +3     
  Lines       43393    43441      +48     
  Branches      299      299              
==========================================
+ Hits        39017    39065      +48     
  Misses       4334     4334              
  Partials       42       42              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

github-actions bot commented Mar 19, 2024

This PR has a migration; here is the generated SQL

-- start migrations

-- forward migration generic_metrics : 0030_add_record_meta_column
Local op: ALTER TABLE generic_metric_counters_raw_local ADD COLUMN IF NOT EXISTS record_meta UInt8 DEFAULT 0 AFTER materialization_version;
-- end forward migration generic_metrics : 0030_add_record_meta_column




-- backward migration generic_metrics : 0030_add_record_meta_column
Local op: ALTER TABLE generic_metric_counters_raw_local DROP COLUMN IF EXISTS record_meta;
-- end backward migration generic_metrics : 0030_add_record_meta_column
-- forward migration generic_metrics : 0031_counters_meta_table
Local op: CREATE TABLE IF NOT EXISTS generic_metric_counters_meta_aggregated_local (org_id UInt64, project_id UInt64, use_case_id LowCardinality(String), metric_id UInt64, tag_key String, timestamp DateTime CODEC (DoubleDelta), retention_days UInt16, tag_values AggregateFunction(groupUniqArray, String), count AggregateFunction(sum, Float64)) ENGINE ReplicatedAggregatingMergeTree('/clickhouse/tables/generic_metrics_counters/{shard}/default/generic_metric_counters_meta_aggregated_local', '{replica}') PRIMARY KEY (org_id, project_id, use_case_id, metric_id, tag_key, timestamp) ORDER BY (org_id, project_id, use_case_id, metric_id, tag_key, timestamp) PARTITION BY (retention_days, toMonday(timestamp)) TTL timestamp + toIntervalDay(retention_days) SETTINGS index_granularity=2048;
Distributed op: CREATE TABLE IF NOT EXISTS generic_metric_counters_meta_aggregated_dist (org_id UInt64, project_id UInt64, use_case_id LowCardinality(String), metric_id UInt64, tag_key String, timestamp DateTime CODEC (DoubleDelta), retention_days UInt16, tag_values AggregateFunction(groupUniqArray, String), count AggregateFunction(sum, Float64)) ENGINE Distributed(`cluster_one_sh`, default, generic_metric_counters_meta_aggregated_local);
-- end forward migration generic_metrics : 0031_counters_meta_table




-- backward migration generic_metrics : 0031_counters_meta_table
Distributed op: DROP TABLE IF EXISTS generic_metric_counters_meta_aggregated_dist;
Local op: DROP TABLE IF EXISTS generic_metric_counters_meta_aggregated_local;
-- end backward migration generic_metrics : 0031_counters_meta_table
-- forward migration generic_metrics : 0032_counters_meta_table_mv
Local op: CREATE MATERIALIZED VIEW IF NOT EXISTS generic_metric_counters_meta_aggregation_mv TO generic_metric_counters_meta_aggregated_local (org_id UInt64, project_id UInt64, use_case_id LowCardinality(String), metric_id UInt64, tag_key String, timestamp DateTime CODEC (DoubleDelta), retention_days UInt16, tag_values AggregateFunction(groupUniqArray, String), value AggregateFunction(sum, Float64)) AS 
                SELECT
                    org_id,
                    project_id,
                    use_case_id,
                    metric_id,
                    tag_key,
                    toStartOfWeek(timestamp) as timestamp,
                    retention_days,
                    groupUniqArrayState(tag_value) as `tag_values`,
                    sumState(count_value) as count
                FROM generic_metric_counters_raw_local
                ARRAY JOIN
                    tags.key AS tag_key, tags.raw_value AS tag_value
                WHERE record_meta = 1
                GROUP BY
                    org_id,
                    project_id,
                    use_case_id,
                    metric_id,
                    tag_key,
                    timestamp,
                    retention_days
                ;
-- end forward migration generic_metrics : 0032_counters_meta_table_mv




-- backward migration generic_metrics : 0032_counters_meta_table_mv
Local op: DROP TABLE IF EXISTS generic_metric_counters_meta_aggregation_mv;
-- end backward migration generic_metrics : 0032_counters_meta_table_mv

Copy link
Contributor

@onewland onewland left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added some thoughts, but not blocking

@evanh
Copy link
Member Author

evanh commented Mar 21, 2024

@nikhars @onewland I updated this to use groupUniqArray and also to use a record_meta column in the raw table.

@evanh
Copy link
Member Author

evanh commented Mar 21, 2024

Did some local testing just to make sure this worked:

First, used tests to insert some data to generic_metric_counters_raw_local.

Then, ran these queries:

CREATE TABLE generic_metric_counters_meta_aggregated_local
(
    `org_id` UInt64,
    `project_id` UInt64,
    `use_case_id` LowCardinality(String),
    `metric_id` UInt64,
    `tag_key` String,
    `timestamp` DateTime CODEC(DoubleDelta),
    `retention_days` UInt16,
    `tag_values` AggregateFunction(groupUniqArray, String),
    `count` AggregateFunction(sum, Float64)
)
ENGINE = AggregatingMergeTree
PARTITION BY (retention_days, toMonday(timestamp))
PRIMARY KEY (org_id, project_id, use_case_id, metric_id, tag_key, timestamp)
ORDER BY (org_id, project_id, use_case_id, metric_id, tag_key, timestamp)
TTL timestamp + toIntervalDay(retention_days)
SETTINGS index_granularity = 2048

Then I ran this query:

INSERT INTO generic_metric_counters_meta_aggregated_local (org_id, project_id, use_case_id, metric_id, tag_key, timestamp, retention_days, tag_values, count) SELECT
    org_id,
    project_id,
    use_case_id,
    metric_id,
    tag_key,
    toStartOfWeek(timestamp) AS timestamp,
    retention_days,
    groupUniqArrayState(tag_value) AS tag_values,
    sumState(count_value) AS count
FROM generic_metric_counters_raw_local
ARRAY JOIN
    tags.key AS tag_key,
    tags.raw_value AS tag_value
GROUP BY
    org_id,
    project_id,
    use_case_id,
    metric_id,
    tag_key,
    timestamp,
    retention_days

And then I ran these approximations on how the autocomplete could be run in prod:

SELECT
    org_id,
    project_id,
    metric_id,
    sumMerge(count)
FROM generic_metric_counters_meta_aggregated_local
WHERE (org_id = 4553686409543680) AND (project_id = 4553686409674753) AND (use_case_id = 'transactions')
GROUP BY
    org_id,
    project_id,
    metric_id

Query id: 6a24c6eb-e48e-425b-a561-30c59a8d71cb

┌───────────org_id─┬───────project_id─┬───────────metric_id─┬─sumMerge(count)─┐
│ 4553686409543680 │ 4553686409674753 │ 9223372036854775933 │             135 │
└──────────────────┴──────────────────┴─────────────────────┴─────────────────┘
SELECT
    org_id,
    project_id,
    metric_id,
    tag_key,
    groupUniqArrayMerge(tag_values),
    sumMerge(count)
FROM generic_metric_counters_meta_aggregated_local
WHERE (org_id = 4553686409543680) AND (project_id = 4553686409674753) AND (use_case_id = 'transactions') AND (metric_id = 9223372036854775933) AND (tag_key = '10000')
GROUP BY
    org_id,
    project_id,
    metric_id,
    tag_key

Query id: c25358ac-26ee-4a63-a9a5-97d5271590e8

┌───────────org_id─┬───────project_id─┬───────────metric_id─┬─tag_key─┬─groupUniqArrayMerge(tag_values)─┬─sumMerge(count)─┐
│ 4553686409543680 │ 4553686409674753 │ 9223372036854775933 │ 10000   │ ['200','500']                   │              45 │
└──────────────────┴──────────────────┴─────────────────────┴─────────┴─────────────────────────────────┴─────────────────┘

@evanh
Copy link
Member Author

evanh commented Mar 21, 2024

Some more numbers:

I pulled two days of data to see how many rows this would actually create.

use_case_id,count()
escalating_issues,355870
transactions,270810
spans,283241
metric_stats,459
custom,7288

It's nice to see that excluding escalating issues would eliminate about half of the total potential rows.

@evanh evanh merged commit 8d41ed1 into master Mar 22, 2024
32 checks passed
@evanh evanh deleted the evanh/feat/meta-materialized-views branch March 22, 2024 16:50
@getsentry-bot
Copy link
Contributor

PR reverted: 1eb33c5

getsentry-bot added a commit that referenced this pull request Mar 22, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants