feat(eap): Timeseries V1 RPC #6475

Merged: 18 commits merged into master from volo/eap/timeseries_v1 on Oct 31, 2024

Conversation

volokluev
Member

This PR implements the basics of the timeseries API. (A rough illustrative request sketch follows the lists below.)

Supported:

  • Aggregations on all attributes
  • Filters
  • Group by
  • Zerofilling nonexistent data

To come in future PRs:

  • Sample count
  • Extrapolation
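
For orientation, here is a rough sketch of the shape of a request, pieced together from the field names that appear in the review comments below (meta.start_timestamp, granularity_secs, aggregation labels, meta.request_id). It is not the actual TimeSeriesRequest proto definition; end_timestamp, group_by, and filters are assumed field names.

```python
# Hypothetical request shape, for illustration only; field names not visible
# in this PR are assumptions, not the real proto schema.
example_request = {
    "meta": {
        "start_timestamp": {"seconds": 1_730_246_400},
        "end_timestamp": {"seconds": 1_730_250_000},  # assumed, by symmetry with start_timestamp
        "request_id": "",  # filled in server-side when unset (see _execute below)
    },
    "granularity_secs": 300,
    "aggregations": [{"label": "avg(duration)"}],  # aggregate over any attribute
    "group_by": ["release"],                       # assumed name for the group-by columns
    "filters": [],                                 # assumed name; filters are supported
}
# Buckets with no data in the response are zerofilled rather than omitted.
```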

@volokluev volokluev requested review from a team as code owners October 29, 2024 23:53

codecov bot commented Oct 30, 2024

❌ 1 Tests Failed:

Tests completed | Failed | Passed | Skipped
2667            | 1      | 2666   | 5
View the top 1 failed tests by shortest run time
tests.web.rpc.v1.test_endpoint_time_series.TestTimeSeriesApi test_basic
Stack Traces | 0.259s run time
Traceback (most recent call last):
  File ".../rpc/v1/test_endpoint_time_series.py", line 161, in test_basic
    assert response.status_code == 200
AssertionError: assert 500 == 200
 +  where 500 = <WrapperTestResponse streamed [500 INTERNAL SERVER ERROR]>.status_code

To view individual test run time comparison to the main branch, go to the Test Analytics Dashboard

)
time_buckets = [
    Timestamp(seconds=(request.meta.start_timestamp.seconds) + secs)
    for secs in range(0, query_duration, request.granularity_secs)
Member

What happens if the granularity doesn't line up with the duration? E.g. a granularity of 61 seconds with a 10 minute window?

Do we need extra validation to ensure that the result timestamps will line up with these generated buckets?

Member Author

Good point, I added some logic to handle this and tests for it as well
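
For context, a minimal sketch of one way to keep result rows aligned with the generated buckets when granularity_secs does not divide the window evenly; this is illustrative and not necessarily the exact logic added in the PR.

```python
from datetime import datetime, timezone


def build_buckets(start_secs: int, end_secs: int, granularity_secs: int) -> list[int]:
    # One bucket per granularity step; a trailing partial bucket is kept rather than dropped.
    return [start_secs + offset for offset in range(0, end_secs - start_secs, granularity_secs)]


def bucket_for(row_time: str, start_secs: int, granularity_secs: int) -> int:
    # Snap a result row's timestamp down to the start of the bucket it falls into,
    # assuming the database returns naive UTC timestamps.
    ts = int(datetime.fromisoformat(row_time).replace(tzinfo=timezone.utc).timestamp())
    return start_secs + ((ts - start_secs) // granularity_secs) * granularity_secs


# The reviewer's example: a 61 s granularity over a 10 minute window gives
# 10 buckets, the last covering only 51 seconds.
assert len(build_buckets(0, 600, 61)) == 10
```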

        if col_name in group_by_labels:
            group_by_map[col_name] = col_value

    group_by_key = "|".join([f"{k},{v}" for k, v in group_by_map.items()])
Member

This is a performance nit, but I might make this a tuple instead of a string.

Member Author

I don't know how much faster that will actually make it, as the tuple will still need to be hashed eventually.
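
To make the trade-off concrete, a small illustrative comparison of the two keying strategies (not code from the PR): a tuple key skips building the intermediate string and cannot collide when values themselves contain the separator characters.

```python
group_by_map = {"project_id": "42", "release": "1.0|beta"}

# String key, as in the PR: cheap to read, but the separators can clash with values.
string_key = "|".join(f"{k},{v}" for k, v in group_by_map.items())

# Tuple key alternative: hashable as-is, no intermediate string, no separator ambiguity.
tuple_key = tuple(sorted(group_by_map.items()))

result_timeseries: dict[tuple, str] = {}
result_timeseries[(tuple_key, "avg(duration)")] = "same lookup pattern either way"
```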

res = Query(
    from_clause=entity,
    selected_columns=[
        SelectedExpression(name="time", expression=column("time", alias="time")),
Member

I would rather not have things depend on the TimeSeriesProcessor, but that is a personal opinion. I've been trying to deprecate that processor for a while.

) -> Iterable[TimeSeries]:
    # to convert the results, need to know which were the groupby columns and which ones
    # were aggregations
    aggregation_labels = set([agg.label for agg in request.aggregations])
Member

Is it valid to have a request with duplicate labels?

Member Author

Good point, let me enforce that that does not happen.

Member Author

This is done now
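
A minimal sketch of the kind of check being added (the exception name here is a stand-in, and the real validation in the PR may differ): reject a request that reuses a label, since labels become result column names.

```python
from collections import Counter


class BadTimeSeriesRequestError(Exception):
    """Stand-in for whatever request-validation error the endpoint raises."""


def validate_unique_labels(labels: list[str]) -> None:
    duplicates = sorted(label for label, count in Counter(labels).items() if count > 1)
    if duplicates:
        raise BadTimeSeriesRequestError(f"duplicate labels in request: {duplicates}")


# e.g. validate_unique_labels([agg.label for agg in request.aggregations])
```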

Member

@kylemumma left a comment

I got a bit confused trying to understand _convert_result_timeseries, but other than that it looks good.


def _execute(self, in_msg: TimeSeriesRequest) -> TimeSeriesResponse:
    # TODO: Move this to base
    in_msg.meta.request_id = getattr(in_msg.meta, "request_id", None) or str(
Member

It seems that all fields of the protobuf can be empty / unset.
Is the only way we enforce required fields right now just the implicit internal error when they aren't provided?

Member Author

yeah pretty much
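
To illustrate the point, a sketch written against plain proto3 semantics (unset scalars read as their zero values, message fields answer HasField); the explicit checks are what stricter enforcement could look like, not what the PR does, and end_timestamp is an assumed field name.

```python
import uuid


def prepare_request(in_msg) -> None:
    # proto3 strings default to "", never None, so a simple truthiness check is enough.
    if not in_msg.meta.request_id:
        in_msg.meta.request_id = str(uuid.uuid4())

    # Explicit validation instead of letting unset fields surface later as a 500.
    if not in_msg.meta.HasField("start_timestamp") or not in_msg.meta.HasField("end_timestamp"):
        raise ValueError("start_timestamp and end_timestamp are required")
    if in_msg.granularity_secs <= 0:
        raise ValueError("granularity_secs must be a positive number of seconds")
```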

Comment on lines +64 to +150
for row in data:
    group_by_map = {}

    for col_name, col_value in row.items():
        if col_name in group_by_labels:
            group_by_map[col_name] = col_value

    group_by_key = "|".join([f"{k},{v}" for k, v in group_by_map.items()])
    for col_name in aggregation_labels:
        if not result_timeseries.get((group_by_key, col_name), None):
            result_timeseries[(group_by_key, col_name)] = TimeSeries(
                group_by_attributes=group_by_map,
                label=col_name,
                buckets=time_buckets,
            )
        result_timeseries_timestamp_to_row[(group_by_key, col_name)][
            int(datetime.fromisoformat(row["time"]).timestamp())
        ] = row
Member

What are group_by_map and group_by_key? I'm not sure what you're doing here.

Member Author

I added an explanation, lemme know if it makes sense now
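
For readers of this thread, a self-contained toy version of the keying scheme (made-up rows, not the PR's code): each distinct combination of group-by values, paired with each aggregation label, becomes its own output series, and group_by_key is just a hashable identity for that combination.

```python
rows = [
    {"time": "2024-10-30T00:00:00", "release": "a", "avg(duration)": 10.0},
    {"time": "2024-10-30T00:01:00", "release": "a", "avg(duration)": 12.0},
    {"time": "2024-10-30T00:00:00", "release": "b", "avg(duration)": 7.0},
]
group_by_labels = {"release"}
aggregation_labels = {"avg(duration)"}

series: dict[tuple[str, str], list[dict]] = {}
for row in rows:
    # group_by_map: this row's group-by column values, e.g. {"release": "a"}
    group_by_map = {k: v for k, v in row.items() if k in group_by_labels}
    # group_by_key: a hashable stand-in for that combination, used to route rows to a series
    group_by_key = "|".join(f"{k},{v}" for k, v in group_by_map.items())
    for agg_label in aggregation_labels:
        series.setdefault((group_by_key, agg_label), []).append(row)

# Two output series: one per release value, each carrying its own rows.
assert len(series) == 2
```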

Member

@evanh left a comment

LGTM.

@volokluev volokluev merged commit 1b0a1f6 into master Oct 31, 2024
30 checks passed
@volokluev volokluev deleted the volo/eap/timeseries_v1 branch October 31, 2024 22:18