ref(eap): Add configuration for attributes meta #6162

Closed. evanh wants to merge 12 commits.

evanh commented Jul 30, 2024

Add the storage for the spans attributes meta table. Also add a basic test file for the eap dataset, plus tests for the spans storage and the meta storage.

Depends on #6152

Create a table that stores only the attribute keys and values that an org has seen. It can be used for autocomplete or any other validation feature.

The table has no secondary indexes at the moment, only the primary key shown in the generated SQL. It also stores a count, which allows simple ranking if that becomes useful in the future.
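
As a rough illustration of the autocomplete use case, the sketch below builds a ClickHouse query that ranks attribute keys for one org by how often they were seen. It is not how Snuba issues queries. The function name `build_autocomplete_query`, the prefix filter, and the pyformat-style placeholders (the style used by clients such as clickhouse-driver) are assumptions made for this example. Only the table and column names come from the migration in this PR. Because `count` is stored as `AggregateFunction(sum, UInt64)`, it has to be read back with `sumMerge`.

```python
from typing import Any, Dict, Tuple


def build_autocomplete_query(
    organization_id: int, key_prefix: str, limit: int = 10
) -> Tuple[str, Dict[str, Any]]:
    """Return (sql, params) ranking attribute keys by how often they were seen.

    Hypothetical helper for illustration only; not part of Snuba.
    """
    sql = (
        "SELECT attribute_key, sumMerge(count) AS seen "
        "FROM spans_attributes_meta_dist "
        "WHERE organization_id = %(organization_id)s "
        "AND startsWith(attribute_key, %(prefix)s) "
        "GROUP BY attribute_key "
        "ORDER BY seen DESC "
        "LIMIT %(limit)s"
    )
    # Values are passed separately so the client can escape them.
    params = {
        "organization_id": organization_id,
        "prefix": key_prefix,
        "limit": limit,
    }
    return sql, params
```

Filtering on `organization_id` and `attribute_key` lines up with the table's primary key `(organization_id, attribute_key)`, so a prefix lookup like this should only need to read a narrow range of the table.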
evanh requested a review from a team as a code owner July 30, 2024 21:13

This PR has a migration; here is the generated SQL

-- start migrations

-- forward migration events_analytics_platform : 0002_spans_attributes_mv
Local op: CREATE TABLE IF NOT EXISTS spans_attributes_meta_local (organization_id UInt64, attribute_type String, attribute_key String, attribute_value String, timestamp DateTime CODEC (DoubleDelta), retention_days UInt16, count AggregateFunction(sum, UInt64)) ENGINE ReplicatedAggregatingMergeTree('/clickhouse/tables/events_analytics_platform/{shard}/default/spans_attributes_meta_local', '{replica}') PRIMARY KEY (organization_id, attribute_key) ORDER BY (organization_id, attribute_key, attribute_value, timestamp) PARTITION BY toMonday(timestamp) TTL timestamp + toIntervalDay(retention_days) SETTINGS index_granularity=8192, ttl_only_drop_parts=0;
Distributed op: CREATE TABLE IF NOT EXISTS spans_attributes_meta_dist (organization_id UInt64, attribute_type String, attribute_key String, attribute_value String, timestamp DateTime CODEC (DoubleDelta), retention_days UInt16, count AggregateFunction(sum, UInt64)) ENGINE Distributed(`cluster_one_sh`, default, spans_attributes_meta_local);
Local op: CREATE MATERIALIZED VIEW IF NOT EXISTS spans_attributes_str_meta_mv TO spans_attributes_meta_local (organization_id UInt64, attribute_type String, attribute_key String, attribute_value String, timestamp DateTime CODEC (DoubleDelta), retention_days UInt16, count AggregateFunction(sum, UInt64)) AS 
SELECT
    organization_id,
    attribute_key,
    attr_value AS attribute_value,
    toMonday(start_timestamp) AS timestamp,
    retention_days,
    sumState(cast(1, 'UInt64')) AS count
FROM eap_spans_local
LEFT ARRAY JOIN
    arrayConcat(mapKeys(attr_str_0),mapKeys(attr_str_1),mapKeys(attr_str_2),mapKeys(attr_str_3),mapKeys(attr_str_4),mapKeys(attr_str_5),mapKeys(attr_str_6),mapKeys(attr_str_7),mapKeys(attr_str_8),mapKeys(attr_str_9),mapKeys(attr_str_10),mapKeys(attr_str_11),mapKeys(attr_str_12),mapKeys(attr_str_13),mapKeys(attr_str_14),mapKeys(attr_str_15),mapKeys(attr_str_16),mapKeys(attr_str_17),mapKeys(attr_str_18),mapKeys(attr_str_19)) AS attribute_key,
    arrayConcat(mapValues(attr_str_0),mapValues(attr_str_1),mapValues(attr_str_2),mapValues(attr_str_3),mapValues(attr_str_4),mapValues(attr_str_5),mapValues(attr_str_6),mapValues(attr_str_7),mapValues(attr_str_8),mapValues(attr_str_9),mapValues(attr_str_10),mapValues(attr_str_11),mapValues(attr_str_12),mapValues(attr_str_13),mapValues(attr_str_14),mapValues(attr_str_15),mapValues(attr_str_16),mapValues(attr_str_17),mapValues(attr_str_18),mapValues(attr_str_19)) AS attr_value
GROUP BY
    organization_id,
    attribute_key,
    attribute_value,
    timestamp,
    retention_days
;
Local op: CREATE MATERIALIZED VIEW IF NOT EXISTS spans_attributes_num_meta_mv TO spans_attributes_meta_local (organization_id UInt64, attribute_type String, attribute_key String, attribute_value String, timestamp DateTime CODEC (DoubleDelta), retention_days UInt16, count AggregateFunction(sum, UInt64)) AS 
SELECT
    organization_id,
    attribute_key,
    '' AS attribute_value,
    toMonday(start_timestamp) AS timestamp,
    retention_days,
    sumState(cast(1, 'UInt64')) AS count
FROM eap_spans_local
LEFT ARRAY JOIN
    arrayConcat(mapKeys(attr_num_0),mapKeys(attr_num_1),mapKeys(attr_num_2),mapKeys(attr_num_3),mapKeys(attr_num_4),mapKeys(attr_num_5),mapKeys(attr_num_6),mapKeys(attr_num_7),mapKeys(attr_num_8),mapKeys(attr_num_9),mapKeys(attr_num_10),mapKeys(attr_num_11),mapKeys(attr_num_12),mapKeys(attr_num_13),mapKeys(attr_num_14),mapKeys(attr_num_15),mapKeys(attr_num_16),mapKeys(attr_num_17),mapKeys(attr_num_18),mapKeys(attr_num_19)) AS attribute_key,
    arrayConcat(mapValues(attr_num_0),mapValues(attr_num_1),mapValues(attr_num_2),mapValues(attr_num_3),mapValues(attr_num_4),mapValues(attr_num_5),mapValues(attr_num_6),mapValues(attr_num_7),mapValues(attr_num_8),mapValues(attr_num_9),mapValues(attr_num_10),mapValues(attr_num_11),mapValues(attr_num_12),mapValues(attr_num_13),mapValues(attr_num_14),mapValues(attr_num_15),mapValues(attr_num_16),mapValues(attr_num_17),mapValues(attr_num_18),mapValues(attr_num_19)) AS attr_value
GROUP BY
    organization_id,
    attribute_key,
    attribute_value,
    timestamp,
    retention_days
;
-- end forward migration events_analytics_platform : 0002_spans_attributes_mv




-- backward migration events_analytics_platform : 0002_spans_attributes_mv
Local op: DROP TABLE IF EXISTS spans_attributes_str_meta_mv;
Local op: DROP TABLE IF EXISTS spans_attributes_num_meta_mv;
Local op: DROP TABLE IF EXISTS spans_attributes_meta_local;
Distributed op: DROP TABLE IF EXISTS spans_attributes_meta_dist;
-- end backward migration events_analytics_platform : 0002_spans_attributes_mv
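
To make the string materialized view easier to follow, here is a small pure-Python model of what its `SELECT` computes. It is a sketch under simplifying assumptions, not the ClickHouse implementation: spans are plain dicts, the helper names `to_monday`, `meta_rows`, and `aggregate` are invented for this example, and the partial `sumState` values that ClickHouse merges later are collapsed into a single counter.

```python
from collections import Counter
from datetime import date, datetime, timedelta
from typing import Iterable, List, Tuple

# (organization_id, attribute_key, attribute_value, timestamp, retention_days)
Row = Tuple[int, str, str, date, int]


def to_monday(ts: datetime) -> date:
    """Mimic ClickHouse toMonday: round down to the Monday of that week."""
    d = ts.date()
    return d - timedelta(days=d.weekday())


def meta_rows(span: dict, buckets: int = 20) -> List[Row]:
    """Flatten attr_str_0..attr_str_19 into one row per (key, value) pair."""
    pairs = []
    for i in range(buckets):
        # Equivalent to arrayConcat(mapKeys(...)) zipped with arrayConcat(mapValues(...)).
        pairs.extend(span.get(f"attr_str_{i}", {}).items())
    if not pairs:
        # LEFT ARRAY JOIN keeps spans without attributes as one row of defaults.
        pairs = [("", "")]
    week = to_monday(span["start_timestamp"])
    return [
        (span["organization_id"], key, value, week, span["retention_days"])
        for key, value in pairs
    ]


def aggregate(spans: Iterable[dict]) -> Counter:
    """Group by the five columns and count rows, like sumState(1) after merging."""
    counts: Counter = Counter()
    for span in spans:
        counts.update(meta_rows(span))
    return counts
```

For example, two spans from the same org with `http.method = GET`, one starting on Tuesday 2024-07-30 and one on Thursday 2024-08-01, both land in the row keyed by Monday 2024-07-29 and yield a count of 2. The numeric view follows the same shape over `attr_num_0` to `attr_num_19`, except that it writes an empty string as `attribute_value`.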


codecov bot commented Jul 30, 2024

❌ 1 Tests Failed:

Tests completed | Failed | Passed | Skipped
220             | 1      | 219    | 0
Failed test: tests.test_eap_spans_api.TestEAPSpansAPI test_simple_query
Stack trace (0.226s run time):
Traceback (most recent call last):
  File ".../snuba/tests/test_eap_spans_api.py", line 96, in test_simple_query
    assert response.status_code == 200, data
AssertionError: {'error': {'message': "Missing >= condition with a datetime literal on column timestamp for entity eap_spans. Example: timestamp >= toDateTime('2023-05-16 00:00')", 'type': 'invalid_query'}}
assert 400 == 200
 +  where 400 = <WrapperTestResponse 190 bytes [400 BAD REQUEST]>.status_code
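
The 400 response says the query lacks a lower bound on `timestamp`. The contents of the failing test are not shown here, so the following is only a sketch of one way to produce such a condition. The helper name `time_window_conditions` is hypothetical, and the datetime literal format simply copies the example in the error message.

```python
from datetime import datetime


def time_window_conditions(
    start: datetime, end: datetime, column: str = "timestamp"
) -> str:
    """Build a half-open [start, end) time filter for a query's WHERE clause.

    Hypothetical helper; the literal format follows the error message's example.
    """
    if end <= start:
        raise ValueError("end must be after start")
    fmt = "%Y-%m-%d %H:%M"
    return (
        f"{column} >= toDateTime('{start.strftime(fmt)}') "
        f"AND {column} < toDateTime('{end.strftime(fmt)}')"
    )
```

Calling `time_window_conditions(datetime(2023, 5, 16), datetime(2023, 5, 17))` returns `timestamp >= toDateTime('2023-05-16 00:00') AND timestamp < toDateTime('2023-05-17 00:00')`, which includes the `>=` condition the validator asks for.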


Base automatically changed from evanh/feat/attributes-materialized-views to master August 7, 2024 14:25
evanh requested a review from a team as a code owner August 23, 2024 12:55
@volokluev

Didn't we remove this because the load on the cluster was too high?


evanh commented Oct 2, 2024

This was added separately.

evanh closed this Oct 2, 2024