These are my proposals, all of which we are using successfully at The Globe and Mail:
Event Latency Filter: Sometimes the tracker sends events late, or a bug leaves an event with a date such as 1970. Storing such an event doesn't make sense, so we can simply ignore it.
Field Value Filter: For example, filtering events by `app_id`. We have many stakeholders with different interests, and this way we send each of them only the data from the sources they need.
Deduplication Filter: We can select a field, e.g. `event_fingerprint`, and keep a cache of recent event fingerprints. If we have seen a fingerprint recently, we discard the event. This helped us send 5% fewer duplicated events to our Elasticsearch/Postgres/Kinesis emitters.
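To make the three proposals concrete, here is a minimal sketch of how they could look. Everything in it is an assumption for illustration, not the loader's or KCL's actual API: `RecordFilter` is a hypothetical stand-in for a "keep this record?" interface, `Event` is a hypothetical three-field event, and the choice of `collectorTstamp` as the timestamp to check, the age threshold, and the cache capacity are placeholders.

```scala
import java.time.{Duration, Instant}
import scala.collection.mutable

// Hypothetical minimal event; a real enriched event has many more fields.
final case class Event(appId: String, collectorTstamp: Instant, eventFingerprint: String)

// Hypothetical filter interface: returning true means "keep the record".
trait RecordFilter[T] {
  def keepRecord(record: T): Boolean
}

// Event Latency Filter: drop events older than maxAge (e.g. dates from 1970).
// The clock is injectable so the filter can be tested deterministically.
class LatencyFilter(maxAge: Duration, now: () => Instant = () => Instant.now())
    extends RecordFilter[Event] {
  def keepRecord(e: Event): Boolean =
    !e.collectorTstamp.isBefore(now().minus(maxAge))
}

// Field Value Filter: keep only events whose app_id is in the allowed set.
class AppIdFilter(allowed: Set[String]) extends RecordFilter[Event] {
  def keepRecord(e: Event): Boolean = allowed.contains(e.appId)
}

// Deduplication Filter: remember the most recent `capacity` fingerprints and
// drop any event whose fingerprint is still in the cache.
// Not thread-safe; assumes a single thread calls keepRecord.
class DedupFilter(capacity: Int) extends RecordFilter[Event] {
  private val seen = mutable.LinkedHashSet.empty[String]

  def keepRecord(e: Event): Boolean =
    if (seen.contains(e.eventFingerprint)) false
    else {
      seen += e.eventFingerprint
      // Evict the oldest fingerprint once the cache exceeds its capacity.
      if (seen.size > capacity) seen -= seen.head
      true
    }
}

// Keep a record only if every filter keeps it; evaluation stops at the first rejection.
class AllOf[T](filters: RecordFilter[T]*) extends RecordFilter[T] {
  def keepRecord(record: T): Boolean = filters.forall(_.keepRecord(record))
}
```

Chained as `new AllOf(new LatencyFilter(Duration.ofDays(7)), new AppIdFilter(Set("web")), new DedupFilter(100000))`, the deduplication cache only records fingerprints of events that already passed the cheaper checks, because `forall` short-circuits.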
KCL has a good feature that we are missing: filters. We are currently using `AllPassFilter`: snowplow-elasticsearch-loader/core/src/main/scala/com.snowplowanalytics.stream/loader/KinesisPipeline.scala, line 77 in 003f24b.