feat(llm-observability): query runner for LLM traces #27509

Merged: 16 commits into master from feat/llm-observability-query-runner, Jan 15, 2025

Conversation

@skoob13 (Contributor) commented Jan 14, 2025

Problem

We need a query runner for LLM traces retrieval.

Changes

This PR introduces the TracesQueryRunner, which supports pagination and filtering by a specific trace ID. It groups all $ai_generation events by their $ai_trace_id property, so generations are presented as an array sorted by event timestamp.
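In plain terms, the grouping the runner performs can be sketched in Python (hypothetical event dicts, not the actual runner code, which does this in SQL):

```python
from collections import defaultdict


def group_generations_by_trace(events: list[dict]) -> dict[str, list[dict]]:
    # Group $ai_generation events by their $ai_trace_id property and sort
    # each trace's generations by event timestamp, mirroring the SQL's
    # GROUP BY + arraySort.
    traces: dict[str, list[dict]] = defaultdict(list)
    for event in events:
        traces[event["$ai_trace_id"]].append(event)
    return {
        trace_id: sorted(generations, key=lambda e: e["timestamp"])
        for trace_id, generations in traces.items()
    }
```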

SQL Query
/* user_id:1 celery:posthog.tasks.tasks.process_query_task */ SELECT
    replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_trace_id'), ''), 'null'), '^"|"$', '') AS id,
    min(toTimeZone(events.timestamp, 'UTC')) AS trace_timestamp,
    tuple(max(events__person.id), max(events.distinct_id), max(events__person.created_at), max(events__person.properties)) AS person,
    sum(accurateCastOrNull(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_latency'), ''), 'null'), '^"|"$', ''), 'Float64')) AS total_latency,
    sum(accurateCastOrNull(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_input_tokens'), ''), 'null'), '^"|"$', ''), 'Float64')) AS input_tokens,
    sum(accurateCastOrNull(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_output_tokens'), ''), 'null'), '^"|"$', ''), 'Float64')) AS output_tokens,
    sum(accurateCastOrNull(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_input_cost_usd'), ''), 'null'), '^"|"$', ''), 'Float64')) AS input_cost,
    sum(accurateCastOrNull(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_output_cost_usd'), ''), 'null'), '^"|"$', ''), 'Float64')) AS output_cost,
    sum(accurateCastOrNull(replaceRegexpAll(nullIf(nullIf(JSONExtractRaw(events.properties, '$ai_total_cost_usd'), ''), 'null'), '^"|"$', ''), 'Float64')) AS total_cost,
    arraySort(x -> tupleElement(x, 2), groupArray(tuple(events.uuid, toTimeZone(events.timestamp, 'UTC'), events.properties))) AS events
FROM
    events
    LEFT OUTER JOIN (SELECT
        argMax(person_distinct_id_overrides.person_id, person_distinct_id_overrides.version) AS person_id,
        person_distinct_id_overrides.distinct_id AS distinct_id
    FROM
        person_distinct_id_overrides
    WHERE
        equals(person_distinct_id_overrides.team_id, 1)
    GROUP BY
        person_distinct_id_overrides.distinct_id
    HAVING
        ifNull(equals(argMax(person_distinct_id_overrides.is_deleted, person_distinct_id_overrides.version), 0), 0)
    SETTINGS optimize_aggregation_in_order=1) AS events__override ON equals(events.distinct_id, events__override.distinct_id)
    LEFT JOIN (SELECT
        person.id AS id,
        toTimeZone(person.created_at, 'UTC') AS created_at,
        person.properties AS properties
    FROM
        person
    WHERE
        and(equals(person.team_id, 1), ifNull(in(tuple(person.id, person.version), (SELECT
                    person.id AS id,
                    max(person.version) AS version
                FROM
                    person
                WHERE
                    equals(person.team_id, 1)
                GROUP BY
                    person.id
                HAVING
                    and(ifNull(equals(argMax(person.is_deleted, person.version), 0), 0), ifNull(less(argMax(toTimeZone(person.created_at, 'UTC'), person.version), plus(now64(6, 'UTC'), toIntervalDay(1))), 0)))), 0))
    SETTINGS optimize_aggregation_in_order=1) AS events__person ON equals(if(not(empty(events__override.distinct_id)), events__override.person_id, events.person_id), events__person.id)
WHERE
    and(equals(events.team_id, 1), equals(events.event, '$ai_generation'))
GROUP BY
    id
ORDER BY
    trace_timestamp DESC
LIMIT 101
OFFSET 0 SETTINGS readonly=2, max_execution_time=600, allow_experimental_object_type=1, format_csv_allow_double_quotes=0, max_ast_elements=4000000, max_expanded_ast_elements=4000000, max_bytes_before_external_group_by=0

Does this work well for both Cloud and self-hosted?

Yes

How did you test this code?

Unit tests

@skoob13 skoob13 requested review from timgl, Twixes and k11kirky January 14, 2025 16:25
@skoob13 skoob13 marked this pull request as ready for review January 14, 2025 16:25
columns?: string[]
}

export interface TracesQuery extends DataNode<TracesQueryResponse> {
@skoob13 (author):
What casing convention do we use for the query schemas?

Member:
As we discussed in the sync, it's a mix… but new additions should be camelCase basically

github-actions bot commented Jan 14, 2025

Size Change: -150 B (-0.01%)
Total Size: 1.13 MB

frontend/dist/toolbar.js: 1.13 MB (-150 B, -0.01%)


@@ -313,6 +313,11 @@ export const QUERY_TYPES_METADATA: Record<NodeKind, InsightTypeMetadata> = {
icon: IconHogQL,
inMenu: false,
},
[NodeKind.TracesQuery]: {
name: 'LLM Observability Traces',
icon: IconHogQL,
Member:
Let's do IconAI, like in the navbar

Comment on lines 25 to 44
"""
select
properties.$ai_trace_id as trace_id,
min(timestamp) as trace_timestamp,
max(person.properties) as person,
sum(properties.$ai_latency) as total_latency,
sum(properties.$ai_input_tokens) as input_tokens,
sum(properties.$ai_output_tokens) as output_tokens,
sum(properties.$ai_input_cost_usd) as input_cost,
sum(properties.$ai_output_cost_usd) as output_cost,
sum(properties.$ai_total_cost_usd) as total_cost,
arraySort(x -> x.1, groupArray(tuple(timestamp, properties))) as events
from events
where
event = '$ai_generation'
group by
trace_id
order by
trace_timestamp desc
"""
Member:
Looks like dead code!

Comment on lines 195 to 218
ast.Alias(
expr=ast.Call(
name="arraySort",
args=[
ast.Lambda(
args=["x"],
expr=ast.Call(name="tupleElement", args=[ast.Field(chain=["x"]), ast.Constant(value=2)]),
),
ast.Call(
name="groupArray",
args=[
ast.Tuple(
exprs=[
ast.Field(chain=["uuid"]),
ast.Field(chain=["timestamp"]),
ast.Field(chain=["properties"]),
]
)
],
),
],
),
alias="events",
),
Member:
The rest of those columns is pretty readable in direct AST form, but in this one's case I propose parse_expr – it's a chonker
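The parse_expr suggestion might look roughly like this (a sketch only; it assumes posthog.hogql.parser.parse_expr, which parses a HogQL expression string into the equivalent AST, and relies on HogQL's x.2 tuple-element syntax seen elsewhere in this PR; it is not runnable outside the PostHog codebase):

```
from posthog.hogql import ast
from posthog.hogql.parser import parse_expr

# Hypothetical replacement for the hand-built AST above.
events_column = ast.Alias(
    alias="events",
    expr=parse_expr("arraySort(x -> x.2, groupArray(tuple(uuid, timestamp, properties)))"),
)
```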

Comment on lines 234 to 251
def _get_where_clause(self):
event_expr = ast.CompareOperation(
left=ast.Field(chain=["event"]),
op=ast.CompareOperationOp.Eq,
right=ast.Constant(value="$ai_generation"),
)
if self.query.traceId is not None:
return ast.And(
exprs=[
event_expr,
ast.CompareOperation(
left=ast.Field(chain=["id"]),
op=ast.CompareOperationOp.Eq,
right=ast.Constant(value=self.query.traceId),
),
]
)
return event_expr
@Twixes (Member) commented Jan 14, 2025:
This MUST have a time range, otherwise we're going to be scanning ClickHouse parts furiously and slowly. A tricky-ish thing here, because we also must ensure traces are selected in full, without missing any generations at the beginning or end.

With traces we can make some reasonable assumptions to make this straightforward.

Let's assume a trace can never be longer than 10 minutes (won't always be true, but almost always true is enough). With this assumption in mind, we'll filter in the SQL with timestamp > {date_from} - INTERVAL 10 MINUTE AND timestamp < {date_to} + INTERVAL 10 MINUTE. Then, in _map_results, we'll cut off traces that started before date_from. This way we skip traces that would be selected only partially at the beginning of the range, while selecting full traces at the end of the range. The result: consistent results and pagination as simple as always.

@skoob13 (author):

Thanks! I missed this part, but you're totally right. One addition I've made is filtering out traces that first appeared after date_to, as they are outside the date range.
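The padded scan window plus post-filtering discussed above can be sketched in Python (hypothetical helper names; the real logic lives in TracesQueryRunner):

```python
from datetime import datetime, timedelta

# Assumed maximum trace duration, per the review discussion.
MAX_TRACE_DURATION = timedelta(minutes=10)


def padded_scan_window(date_from: datetime, date_to: datetime) -> tuple[datetime, datetime]:
    # Widen the SQL scan window so traces straddling either boundary
    # are selected with all of their generations.
    return date_from - MAX_TRACE_DURATION, date_to + MAX_TRACE_DURATION


def trim_traces(traces: list[dict], date_from: datetime, date_to: datetime) -> list[dict]:
    # Keep only traces whose first event falls inside the requested range:
    # traces starting before date_from were selected only partially,
    # and traces first appearing after date_to are outside the range.
    return [t for t in traces if date_from <= t["trace_timestamp"] <= date_to]
```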

@skoob13 skoob13 requested a review from Twixes January 14, 2025 20:49
@skoob13 commented Jan 15, 2025

Some flaky tests.

@skoob13 skoob13 force-pushed the feat/llm-observability-query-runner branch from 027d646 to 42ea56d Compare January 15, 2025 09:01
@PostHog PostHog deleted a comment from posthog-bot Jan 15, 2025
@skoob13 commented Jan 15, 2025

should be ok now

@Twixes (Member) left a review:

Looks great

@skoob13 skoob13 merged commit b46f351 into master Jan 15, 2025
99 checks passed
@skoob13 skoob13 deleted the feat/llm-observability-query-runner branch January 15, 2025 13:50
sentry-io bot commented Jan 16, 2025

Suspect Issues

This pull request was deployed and Sentry observed the following issues:

  • ‼️ CHQueryErrorIllegalTypeOfArgument: DB::Exception: Illegal type String of argument for aggregate function sum. Stack trace: posthog.tasks.tasks.process_query_task

