feat(llm-observability): query runner for LLM traces #27509

Merged: 16 commits into master from feat/llm-observability-query-runner, Jan 15, 2025

Conversation

@skoob13 (Contributor) commented Jan 14, 2025

Problem

We need a query runner for LLM traces retrieval.

Changes

This PR introduces the TracesQueryRunner, which supports pagination and filtering by a specific trace ID. It groups all $ai_generation events by their $ai_trace_id property, so generations are presented as an array sorted by event timestamp.
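In plain terms, the grouping the runner performs can be sketched in Python (hypothetical event dicts, not the actual runner code, which does this in SQL):

```python
from collections import defaultdict


def group_generations_by_trace(events: list[dict]) -> dict[str, list[dict]]:
    # Group $ai_generation events by their $ai_trace_id property and sort
    # each trace's generations by event timestamp, mirroring the SQL's
    # GROUP BY + arraySort.
    traces: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        traces[event["$ai_trace_id"]].append(event)
    return {
        trace_id: sorted(generations, key=lambda e: e["timestamp"])
        for trace_id, generations in traces.items()
    }
```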

SQL Query
/* user_id:1 celery:posthog.tasks.tasks.process_query_task */ SELECT
    replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_trace_id'), ''), 'null'), '^"|"$', '') AS id,
    min(toTimeZone(events.timestamp, 'UTC')) AS trace_timestamp,
    tuple(max(events__person.id), max(events.distinct_id), max(events__person.created_at), max(events__person.properties)) AS person,
    sum(accurateCastOrNull(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_latency'), ''), 'null'), '^"|"$', ''), 'Float64')) AS total_latency,
    sum(accurateCastOrNull(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_input_tokens'), ''), 'null'), '^"|"$', ''), 'Float64')) AS input_tokens,
    sum(accurateCastOrNull(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_output_tokens'), ''), 'null'), '^"|"$', ''), 'Float64')) AS output_tokens,
    sum(accurateCastOrNull(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_input_cost_usd'), ''), 'null'), '^"|"$', ''), 'Float64')) AS input_cost,
    sum(accurateCastOrNull(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_output_cost_usd'), ''), 'null'), '^"|"$', ''), 'Float64')) AS output_cost,
    sum(accurateCastOrNull(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_total_cost_usd'), ''), 'null'), '^"|"$', ''), 'Float64')) AS total_cost,
    arraySort(x -> tupleElement(x, 2), groupArray(tuple(events.uuid, toTimeZone(events.timestamp, 'UTC'), events.properties))) AS events
FROM
    events
    LEFT OUTER JOIN (SELECT
        argMax(person_distinct_id_overrides.person_id, person_distinct_id_overrides.version) AS person_id,
        person_distinct_id_overrides.distinct_id AS distinct_id
    FROM
        person_distinct_id_overrides
    WHERE
        equals(person_distinct_id_overrides.team_id, 1)
    GROUP BY
        person_distinct_id_overrides.distinct_id
    HAVING
        ifNull(equals(argMax(person_distinct_id_overrides.is_deleted, person_distinct_id_overrides.version), 0), 0)
    SETTINGS optimize_aggregation_in_order=1) AS events__override ON equals(events.distinct_id, events__override.distinct_id)
    LEFT JOIN (SELECT
        person.id AS id,
        toTimeZone(person.created_at, 'UTC') AS created_at,
        person.properties AS properties
    FROM
        person
    WHERE
        and(equals(person.team_id, 1), ifNull(in(tuple(person.id, person.version), (SELECT
                    person.id AS id,
                    max(person.version) AS version
                FROM
                    person
                WHERE
                    equals(person.team_id, 1)
                GROUP BY
                    person.id
                HAVING
                    and(ifNull(equals(argMax(person.is_deleted, person.version), 0), 0), ifNull(less(argMax(toTimeZone(person.created_at, 'UTC'), person.version), plus(now64(6, 'UTC'), toIntervalDay(1))), 0)))), 0))
    SETTINGS optimize_aggregation_in_order=1) AS events__person ON equals(if(not(empty(events__override.distinct_id)), events__override.person_id, events.person_id), events__person.id)
WHERE
    and(equals(events.team_id, 1), equals(events.event, '$ai_generation'))
GROUP BY
    id
ORDER BY
    trace_timestamp DESC
LIMIT 101
OFFSET 0 SETTINGS readonly=2, max_execution_time=600, allow_experimental_object_type=1, format_csv_allow_double_quotes=0, max_ast_elements=4000000, max_expanded_ast_elements=4000000, max_bytes_before_external_group_by=0

Does this work well for both Cloud and self-hosted?

Yes

How did you test this code?

Unit tests

@skoob13 skoob13 requested review from timgl, Twixes and k11kirky January 14, 2025 16:25
@skoob13 skoob13 marked this pull request as ready for review January 14, 2025 16:25
columns?: string[]
}

export interface TracesQuery extends DataNode<TracesQueryResponse> {
@skoob13 (author):
What casing convention do we use for the query schemas?

Member:
As we discussed in the sync, it's a mix… but new additions should be camelCase basically

github-actions bot commented Jan 14, 2025

Size Change: -150 B (-0.01%)
Total Size: 1.13 MB

frontend/dist/toolbar.js: 1.13 MB (-150 B, -0.01%)


@@ -313,6 +313,11 @@ export const QUERY_TYPES_METADATA: Record<NodeKind, InsightTypeMetadata> = {
icon: IconHogQL,
inMenu: false,
},
[NodeKind.TracesQuery]: {
name: 'LLM Observability Traces',
icon: IconHogQL,
Member:
Let's do IconAI, like in the navbar

Comment on lines 25 to 44
"""
select
properties.$ai_trace_id as trace_id,
min(timestamp) as trace_timestamp,
max(person.properties) as person,
sum(properties.$ai_latency) as total_latency,
sum(properties.$ai_input_tokens) as input_tokens,
sum(properties.$ai_output_tokens) as output_tokens,
sum(properties.$ai_input_cost_usd) as input_cost,
sum(properties.$ai_output_cost_usd) as output_cost,
sum(properties.$ai_total_cost_usd) as total_cost,
arraySort(x -> x.1, groupArray(tuple(timestamp, properties))) as events
from events
where
event = '$ai_generation'
group by
trace_id
order by
trace_timestamp desc
"""
Member:
Looks like dead code!

Comment on lines 195 to 218
ast.Alias(
expr=ast.Call(
name="arraySort",
args=[
ast.Lambda(
args=["x"],
expr=ast.Call(name="tupleElement", args=[ast.Field(chain=["x"]), ast.Constant(value=2)]),
),
ast.Call(
name="groupArray",
args=[
ast.Tuple(
exprs=[
ast.Field(chain=["uuid"]),
ast.Field(chain=["timestamp"]),
ast.Field(chain=["properties"]),
]
)
],
),
],
),
alias="events",
),
Member:
The rest of those columns is pretty readable in direct AST form, but in this one's case I propose parse_expr – it's a chonker
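The parse_expr suggestion might look roughly like this (a sketch only; it assumes posthog.hogql.parser.parse_expr, which parses a HogQL expression string into the equivalent AST, and relies on HogQL's x.2 tuple-element syntax seen elsewhere in this PR; it is not runnable outside the PostHog codebase):

```
from posthog.hogql import ast
from posthog.hogql.parser import parse_expr

# Hypothetical replacement for the hand-built AST above.
events_column = ast.Alias(
    alias="events",
    expr=parse_expr("arraySort(x -> x.2, groupArray(tuple(uuid, timestamp, properties)))"),
)
```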

Comment on lines 234 to 251
def _get_where_clause(self):
event_expr = ast.CompareOperation(
left=ast.Field(chain=["event"]),
op=ast.CompareOperationOp.Eq,
right=ast.Constant(value="$ai_generation"),
)
if self.query.traceId is not None:
return ast.And(
exprs=[
event_expr,
ast.CompareOperation(
left=ast.Field(chain=["id"]),
op=ast.CompareOperationOp.Eq,
right=ast.Constant(value=self.query.traceId),
),
]
)
return event_expr
@Twixes (Member) commented Jan 14, 2025:
This MUST have a time range, otherwise we're going to be scanning ClickHouse parts furiously and slowly. A tricky-ish thing here, because we also must ensure traces are selected in full, without missing any generations at the beginning or end.

With traces we can make some reasonable assumptions to make this straightforward.

Let's assume a trace can never be longer than 10 minutes (won't always be true, but almost always true is enough). With this assumption in mind, we'll filter in the SQL with timestamp > {date_from} - INTERVAL 10 MINUTE AND timestamp < {date_to} + INTERVAL 10 MINUTE. Then, in _map_results, we'll cut off traces that started before date_from. This way we skip traces that would be selected only partially at the beginning of the range, while selecting full traces at the end of the range. The result: consistent results and pagination as simple as always.

@skoob13 (author):

Thanks! I missed this part, but you're totally right. One addition I've made is filtering out traces that first appeared after date_to, as they are outside the date range.
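The padded scan window plus post-filtering discussed above can be sketched in Python (hypothetical helper names; the real logic lives in TracesQueryRunner):

```python
from datetime import datetime, timedelta

# Assumed maximum trace duration, per the review discussion.
MAX_TRACE_DURATION = timedelta(minutes=10)


def padded_scan_window(date_from: datetime, date_to: datetime) -> tuple[datetime, datetime]:
    # Widen the SQL scan window so traces straddling either boundary
    # are selected with all of their generations.
    return date_from - MAX_TRACE_DURATION, date_to + MAX_TRACE_DURATION


def trim_traces(traces: list[dict], date_from: datetime, date_to: datetime) -> list[dict]:
    # Keep only traces whose first event falls inside the requested range:
    # traces starting before date_from were selected only partially,
    # and traces first appearing after date_to are outside the range.
    return [t for t in traces if date_from <= t["trace_timestamp"] <= date_to]
```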

@skoob13 skoob13 requested a review from Twixes January 14, 2025 20:49
@skoob13 commented Jan 15, 2025

Some flaky tests.

@skoob13 skoob13 force-pushed the feat/llm-observability-query-runner branch from 027d646 to 42ea56d Compare January 15, 2025 09:01
@PostHog PostHog deleted a comment from posthog-bot Jan 15, 2025
@skoob13 commented Jan 15, 2025

should be ok now

@Twixes (Member) left a review:

Looks great

@skoob13 skoob13 merged commit b46f351 into master Jan 15, 2025
99 checks passed
@skoob13 skoob13 deleted the feat/llm-observability-query-runner branch January 15, 2025 13:50
sentry-io bot commented Jan 16, 2025

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ CHQueryErrorIllegalTypeOfArgument: DB::Exception: Illegal type String of argument for aggregate function sum. Stack trace: posthog.tasks.tasks.process_query_task

