Optimise clustering of event store #642

cortadocodes · 2024-04-11T10:57:48Z

Feature request

Use Case

We need to decide which fields to cluster on in the BigQuery event store and whether to pull the event kind out as a column.

Current state

The event kind is stored in the event JSON field. It is queryable there, but it cannot be ordered by (I don't think we need to order by it), and because it is not a top-level column it cannot be used as a clustering field either. We're currently clustering on ["sender", "question_uuid"], in that order. Clustering is order-dependent: to benefit from clustering on a given field, a query must also filter on every clustering field of higher priority (to its left).
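The prefix rule above can be sketched as a small Python model. The function name usable_clustering_prefix is hypothetical and is not part of BigQuery or its client library. It is a simplification that ignores how BigQuery actually stores block metadata and only captures the left-to-right dependency.

```python
def usable_clustering_prefix(clustering_fields: list[str], filtered_fields: set[str]) -> list[str]:
    """Return the clustering fields a query can use for pruning (simplified model).

    Walk the clustering fields left to right and stop at the first one the
    query does not filter on; only the fields before that point help.
    """
    usable = []
    for field in clustering_fields:
        if field not in filtered_fields:
            break
        usable.append(field)
    return usable


CURRENT_CLUSTERING = ["sender", "question_uuid"]

# Filtering on both fields can use the whole clustering order.
print(usable_clustering_prefix(CURRENT_CLUSTERING, {"sender", "question_uuid"}))  # → ['sender', 'question_uuid']
# Filtering only on question_uuid skips "sender", so clustering gives no benefit.
print(usable_clustering_prefix(CURRENT_CLUSTERING, {"question_uuid"}))  # → []
```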

@thclark says: "We’d need to cluster on event_kind otherwise you’d have to process (for example) all the log rows every time you want to query for input or output values (remember it’s column-based storage so the filters aren’t like conventional SQL, it’ll process all rows in order to apply a filter). Also, regardless of clustering I think (??) it may be more efficient to filter directly on a column than on a JSONField."
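The cost argument about log rows can be illustrated with a toy Python model of block pruning. Rows are sorted by the clustering fields and packed into fixed-size blocks, and a block is skipped only if its range of usable clustering-prefix values excludes the filter value. This is a rough intuition, not BigQuery's real storage or pruning logic. The event kinds "log_record" and "result", the block size, and the function name blocks_scanned are all hypothetical. The model also does not cover the separate point about filtering on a column versus a JSON field.

```python
def blocks_scanned(
    rows: list[dict[str, str]],
    clustering_fields: list[str],
    filters: dict[str, str],
    block_size: int,
) -> int:
    """Count storage blocks read by an equality-filter query (toy model)."""
    # Clustering keeps rows sorted by the clustering fields, packed into blocks.
    ordered = sorted(rows, key=lambda row: tuple(row[f] for f in clustering_fields))
    blocks = [ordered[i:i + block_size] for i in range(0, len(ordered), block_size)]

    # Only the leftmost clustering fields that the query filters on can prune.
    prefix = []
    for field in clustering_fields:
        if field not in filters:
            break
        prefix.append(field)
    if not prefix:
        return len(blocks)  # No usable clustering: every block is read.

    target = tuple(filters[f] for f in prefix)
    scanned = 0
    for block in blocks:
        # Blocks hold contiguous runs of the sort order, so the first and last
        # rows bound the prefix values inside the block.
        low = tuple(block[0][f] for f in prefix)
        high = tuple(block[-1][f] for f in prefix)
        if low <= target <= high:
            scanned += 1
    return scanned


# Toy event store (hypothetical data): two questions, each with many log records and one result.
events = []
for question in ("q1", "q2"):
    for _ in range(8):
        events.append({"sender": "service-a", "question_uuid": question, "event_kind": "log_record"})
    events.append({"sender": "service-a", "question_uuid": question, "event_kind": "result"})

query = {"event_kind": "result"}
print(blocks_scanned(events, ["sender", "question_uuid"], query, block_size=3))  # → 6
print(blocks_scanned(events, ["event_kind", "sender", "question_uuid"], query, block_size=3))  # → 1
```

With the current clustering, every block (log rows included) is read to find the result rows. With event_kind as the leading clustering field, only the block holding result rows is read.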

Proposed Solution

Discuss and choose:

Whether to pull the event kind out as a field
The fields to cluster on and in what order
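Both decisions can be sketched together in Python. promote_event_kind copies the kind out of the event JSON into its own event_kind column. It assumes the event arrives as a JSON string with a "kind" key, which is an assumption about the schema. score_clustering then compares candidate clustering orders against a list of expected query patterns using the prefix rule described earlier. The function names, query patterns and candidate orders are hypothetical and are only there to illustrate the trade-off.

```python
import json


def promote_event_kind(row: dict) -> dict:
    """Return a copy of the row with the event kind pulled out as a column.

    Assumes (hypothetically) that "event" holds a JSON string with a "kind" key.
    """
    event = json.loads(row["event"])
    return {**row, "event_kind": event["kind"]}


def usable_prefix_length(clustering_fields: list[str], filtered_fields: set[str]) -> int:
    """Number of leading clustering fields that a query filters on."""
    length = 0
    for field in clustering_fields:
        if field not in filtered_fields:
            break
        length += 1
    return length


def score_clustering(clustering_fields: list[str], query_patterns: list[set[str]]) -> int:
    """Total usable clustering prefix summed over the expected query patterns."""
    return sum(usable_prefix_length(clustering_fields, p) for p in query_patterns)


# Hypothetical query patterns: the fields each common query filters on.
QUERY_PATTERNS = [
    {"sender", "question_uuid", "event_kind"},  # e.g. one question's result events
    {"sender", "question_uuid"},                # e.g. all events for one question
    {"event_kind"},                             # e.g. all result events across questions
]

CANDIDATES = [
    ["sender", "question_uuid"],                # current clustering
    ["sender", "question_uuid", "event_kind"],
    ["event_kind", "sender", "question_uuid"],
]

for candidate in CANDIDATES:
    print(candidate, score_clustering(candidate, QUERY_PATTERNS))
```

Summing prefix lengths is deliberately crude. A real comparison would weight each pattern by how often it runs and how much data it would otherwise scan.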


cortadocodes added the decision needed and tech-debt labels Apr 11, 2024

cortadocodes assigned cortadocodes and thclark Apr 11, 2024
