Optimise clustering of event store #642
Labels: decision needed, tech-debt
Feature request
Use Case
We need to decide which fields to cluster on in the BigQuery event store and whether to pull the event kind out as a column.
Current state
The event kind is stored in the event JSON field. It is queryable but cannot be ordered by (I don't think we need to order by it). We're currently clustering on ["sender", "question_uuid"], in that order. Clustering is order-dependent: a query only takes advantage of a clustered field if it also filters on every higher-priority field (those to its left in the list).

@thclark says: "We’d need to cluster on event_kind otherwise you’d have to process (for example) all the log rows every time you want to query for input or output values (remember it’s column based storage so the filters aren’t like conventional SQL, it’ll process all rows in order to apply a filter). Also, regardless of clustering I think (??) it may be more efficient to filter directly on a column than on a JSONField."
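To make the ordering rule concrete for the discussion, here is a rough sketch in plain Python. It is a simplified model of the "leading fields" rule described above, not BigQuery's actual pruning logic. The function name and the proposed ordering with event_kind first are assumptions for illustration only, not a decision.

```python
def usable_clustering_prefix(clustering_fields, filtered_fields):
    """Return the leading clustering fields that a query filtering on
    `filtered_fields` could exploit, under the simplified rule that a
    clustered field only helps if every field to its left is also filtered."""
    filtered = set(filtered_fields)
    prefix = []
    for field in clustering_fields:
        if field not in filtered:
            break  # a gap in the prefix stops any further benefit
        prefix.append(field)
    return prefix


# Current clustering, as described in this issue.
current = ["sender", "question_uuid"]

# Hypothetical ordering with the event kind pulled out as a column (an assumption).
proposed = ["event_kind", "sender", "question_uuid"]

print(usable_clustering_prefix(current, ["question_uuid"]))            # []
print(usable_clustering_prefix(current, ["sender", "question_uuid"]))  # ['sender', 'question_uuid']
print(usable_clustering_prefix(proposed, ["event_kind"]))              # ['event_kind']
print(usable_clustering_prefix(proposed, ["sender", "question_uuid"])) # []
```

The last line shows the trade-off to weigh: under this model, putting event_kind first would mean queries that filter only on sender and question_uuid no longer benefit from clustering, so the choice depends on which filters our queries use most.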
Proposed Solution
Discuss and choose: