
Cross event transformation #331

Closed

Conversation


@colmsnowplow (Collaborator) commented Jun 17, 2024

This isn't ready for develop, but the refactoring part of things is basically done, so it feels like a good time to get a review and split up the rest of the work to separate it from the complex bits.

There are still a lot of TODO notes etc. in the code, and the work to migrate the targets to the new structure, and to fix and add tests, is yet to happen.

But at this point, this PR is in service of getting a review of the overall structure. We can treat the feature branch as a WIP and get it ready for a PR to develop bit by bit.

@colmsnowplow changed the base branch from develop to feat/batch-templating on June 17, 2024 at 16:45
@colmsnowplow force-pushed the cross-event-transformation branch from e2f119a to 3c8e391 on June 17, 2024 at 16:58
@@ -127,6 +127,7 @@ func AdaptKinesisTargetFunc(f func(c *KinesisTargetConfig) (*KinesisTarget, erro
func (kt *KinesisTarget) Write(messages []*models.Message) (*models.TargetWriteResult, error) {
	kt.log.Debugf("Writing %d messages to stream ...", len(messages))

	// TODO: Replace with new batch transformation
Member

By "replace" you mean change from passing in messages []*models.Message to passing in batches []models.MessageBatch? I imagine the chunking logic here would need to stay still but you are just working across batches instead of messages right?

Collaborator Author

I mean replacing this bit, where we call the Chunk function directly, with the new function we have in this PR, passed in as a batch transformation.

The chunking logic necessarily remains, yes - the change here is just to run it as part of the new structure.
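For illustration, a minimal self-contained sketch of what passing the chunking step in as a batch transformation could look like. The names here (Message, MessageBatch, BatchTransformationFunction, NewChunkBatchTransformation) are simplified assumptions for this sketch, not the actual snowbridge types or signatures, and the chunking is count-based only (the real GetChunkedMessages also enforces byte limits and handles oversized messages):

package main

import "fmt"

// Simplified stand-ins for the snowbridge models - assumptions for illustration only.
type Message struct {
    Data []byte
}

type MessageBatch struct {
    OriginalMessages []*Message
}

// BatchTransformationFunction is an assumed shape for the new batch transformation: batches in, batches out.
type BatchTransformationFunction func(batches []*MessageBatch) []*MessageBatch

// NewChunkBatchTransformation wraps chunking so it can be passed to the target
// as a batch transformation rather than called directly inside Write.
func NewChunkBatchTransformation(chunkSize int) BatchTransformationFunction {
    return func(batches []*MessageBatch) []*MessageBatch {
        out := make([]*MessageBatch, 0)
        for _, batch := range batches {
            msgs := batch.OriginalMessages
            for start := 0; start < len(msgs); start += chunkSize {
                end := start + chunkSize
                if end > len(msgs) {
                    end = len(msgs)
                }
                out = append(out, &MessageBatch{OriginalMessages: msgs[start:end]})
            }
        }
        return out
    }
}

func main() {
    msgs := make([]*Message, 5)
    for i := range msgs {
        msgs[i] = &Message{Data: []byte(fmt.Sprintf("event-%d", i))}
    }
    chunk := NewChunkBatchTransformation(2)
    for i, b := range chunk([]*MessageBatch{{OriginalMessages: msgs}}) {
        fmt.Printf("batch %d: %d messages\n", i, len(b.OriginalMessages))
    }
}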

type MessageBatch struct {
	OriginalMessages []*Message        // Most targets will use the data from here, but where we have an HTTP templating transformation, we would use this to ack batches of messages
	BatchData        []byte            // Where we template HTTP requests, we use this to define the body of the request
	HTTPHeaders      map[string]string // For the dynamic headers feature
}
Member

I am not a huge fan of having a target-specific implementation mixed in with a generic model here. Is there another way to carry this data through without it being specifically mapped to HTTP?

Would something like Parameters map[string]interface{}, where in HTTP you can then access an HTTPHeaders key within the Parameters, work to abstract this away?
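A rough sketch of that suggestion, using assumed names and a stripped-down model rather than the actual snowbridge code - the trade-off discussed in the reply below is visible in the type assertion:

package models

// Assumed, simplified model: the HTTP-specific field is replaced by a generic
// Parameters map, and the HTTP target recovers its headers by key.
type Message struct {
    Data []byte
}

type MessageBatch struct {
    OriginalMessages []*Message
    BatchData        []byte
    Parameters       map[string]interface{} // target-specific data, keyed by convention
}

// httpHeadersFor shows how the HTTP target would read the headers back out.
// The target depends on the "HTTPHeaders" key and its concrete type, while
// the API itself only promises "anything".
func httpHeadersFor(batch *MessageBatch) map[string]string {
    if h, ok := batch.Parameters["HTTPHeaders"].(map[string]string); ok {
        return h
    }
    return nil
}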

Collaborator Author

I don't love it either. This actually comes from having implemented the dynamic header transformation, which has the same problem for the message model.

I considered that way of doing things, but it's a trade-off - the downside of that approach is that you have a more obscure API: the target's logic depends on that specific key, but the API defines it as being anything.

I experimented with other things we could do but didn't find an elegant solution (yet), and it didn't feel like it would serve the project well to labour on it for too long.

Right now we only have one thing that needs to do this, so my thinking was that this will do for the moment, but when we need to design for further similar things, we should revisit the API design.

I'm not massively opposed to doing what you suggest either - I just haven't given up on finding something better.

Member

Fair point! As long as the seed of "maybe we should change this" is planted, I am fine with it staying where it is.

Collaborator Author

Not even that much of a maybe to be honest! Just needs to percolate a bit. Perhaps we will even see the answer when the rest of this refactor falls into place.

@@ -14,6 +14,7 @@ package main
import (
"github.com/snowplow/snowbridge/cmd/cli"
"github.com/snowplow/snowbridge/config"
"github.com/snowplow/snowbridge/pkg/batchtransform/batchtransformconfig"
Contributor

We already have github.com/snowplow/snowbridge/pkg/transform, so what do you think about making batch part of it, like github.com/snowplow/snowbridge/pkg/transform/batch/...?

Collaborator Author

Hmm, good point - batch transformations are a different class of thing to my mind, hence why I did it this way to begin with.

But from a project organisation point of view it might make more sense to organise things so that we have transform/batch and transform/{the existing ones}. The major downside is that we then need to come up with a name for that :D lmk if you have ideas

Contributor

single....? batchbutwithsize1...? I give up, let's leave it :p

Collaborator Author

notbatch it is

@colmsnowplow (Collaborator Author)

@jbeemster should be pointers now and @pondzix redid the dir structure :)

@adatzer (Contributor) left a comment

Happy to approve at this point. No strong objections. My comments are only points I'd like to understand better or discuss alternatives for.

pkg/batchtransform/template.go (outdated; thread resolved)
@@ -211,7 +228,7 @@ func sourceWriteFunc(t targetiface.Target, ft failureiface.Failure, tr transform
messagesToSend := transformed.Result

res, err := retry.ExponentialWithInterface(5, time.Second, "target.Write", func() (interface{}, error) {
-	res, err := t.Write(messagesToSend)
+	res, err := t.Write(messagesToSend, btr)
Contributor

I wonder whether there is an alternative so that the target does not need to know about batch transformations. Are there other reasons for this besides the chunking and the possible group-by?

Collaborator Author

I started with a design that segmented them, but it left things very messy because the target needs to be aware of the transformation in order to decide how to send the data.

Similarly, the dynamic headers feature leaves us with a challenge here: it must necessarily group data by headers before a request template is created.

From a configuration perspective, if this logic is upstream of the target, it seems very easy to break the target by configuring a separate feature.

I don't know if it's the best design, but it's what I came up with as an attempt to reconcile this with the concept of solving for batch transformations more generically.
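For illustration, a minimal sketch of a Write signature that accepts batch transformations, so the target decides where in its own pipeline to apply them. All names, types, and signatures here are assumptions for the sketch (the real Write returns a TargetWriteResult, for one), not the actual snowbridge interfaces:

package main

import "fmt"

// Simplified stand-ins - the real snowbridge types and signatures differ.
type Message struct{ Data []byte }

type MessageBatch struct{ OriginalMessages []*Message }

type BatchTransformationFunction func([]*MessageBatch) []*MessageBatch

// The target receives the batch transformations and decides where to apply
// them - e.g. an HTTP target could run its header grouping first, then the
// configured transformations, then build requests.
type Target interface {
    Write(messages []*Message, batchTransforms []BatchTransformationFunction) error
}

type stdoutTarget struct{}

func (t *stdoutTarget) Write(messages []*Message, batchTransforms []BatchTransformationFunction) error {
    batches := []*MessageBatch{{OriginalMessages: messages}}
    for _, transform := range batchTransforms {
        batches = transform(batches)
    }
    for i, b := range batches {
        fmt.Printf("sending batch %d with %d messages\n", i, len(b.OriginalMessages))
    }
    return nil
}

func main() {
    var t Target = &stdoutTarget{}
    noop := func(in []*MessageBatch) []*MessageBatch { return in }
    _ = t.Write([]*Message{{Data: []byte("event")}}, []BatchTransformationFunction{noop})
}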

chunkedBatches := make([]*models.MessageBatch, 0)

for _, batch := range batchesIn {
	chunks, oversized := models.GetChunkedMessages(batch.OriginalMessages, chunkSize, maxMessageByteSize, maxChunkByteSize)
Contributor

GetChunkedMessages now feels more like it belongs to the target interface. Do you think this (it actually becoming a target method) could help move the batch transformation logic upstream of target.Write?

Collaborator Author

I don't think so - the problem is that where we have templating, this logic must occur before the templater.

func (ht *HTTPTarget) Write(messages []*models.Message) (*models.TargetWriteResult, error) {
// When we have dynamic headers, batching by header must necessarily run first. This is an HTTP-specific function,
// so defining it here and fixing it into the Write function avoids complexity in configuration.
func (ht *HTTPTarget) groupByDynamicHeaders(batches []*models.MessageBatch) ([]*models.MessageBatch, []*models.Message, []*models.Message) {
Contributor

I think I have an idea, but I'm not sure exactly what you mean by configuration complexity. Is it about the feature switch? So far I'd still prefer batch transformations decoupled, so I'm commenting to understand whether this could be a counter-argument.

Collaborator Author

We must group the data by dynamic headers before we create a request template. If we create the template before that, then we cannot batch only events with the same headers together afterwards.

If we decouple this from the target, then the user has to understand this nuance of how the code works in order to configure the app correctly. Or, to put it another way, they could have a valid configuration that fits the API we have provided, but which breaks a feature.

Does that make sense?
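For illustration, a minimal self-contained sketch of the ordering being described (group by dynamic headers, then chunk, then render the request template), using assumed local types rather than the actual snowbridge code:

package main

import "fmt"

// event.data is assumed to be pre-serialised JSON; headers is a flattened
// form of the dynamic headers, used as the grouping key.
type event struct {
    data    string
    headers string
}

func groupByHeaders(events []event) map[string][]event {
    groups := map[string][]event{}
    for _, e := range events {
        groups[e.headers] = append(groups[e.headers], e)
    }
    return groups
}

func chunk(events []event, size int) [][]event {
    var out [][]event
    for start := 0; start < len(events); start += size {
        end := start + size
        if end > len(events) {
            end = len(events)
        }
        out = append(out, events[start:end])
    }
    return out
}

func renderTemplate(events []event) string {
    body := ""
    for i, e := range events {
        if i > 0 {
            body += ","
        }
        body += e.data
    }
    return `{"events":[` + body + `]}`
}

func main() {
    events := []event{{`{"id":1}`, "h1"}, {`{"id":2}`, "h1"}, {`{"id":3}`, "h2"}}
    // Group-by > Chunk > Template: templating last means each request body
    // only ever contains events that share the same headers.
    for headers, group := range groupByHeaders(events) {
        for _, c := range chunk(group, 2) {
            fmt.Printf("headers=%s body=%s\n", headers, renderTemplate(c))
        }
    }
}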

Contributor

Thanks for explaining, @colmsnowplow! It does make more sense.
To confirm, the order before HTTP target writes is:
Group-by > Chunks > Templater
Is this correct?
A side question: for the chunk size limits, we won't take the template size into account. Is this OK?

@colmsnowplow (Collaborator Author) commented Jun 19, 2024

I don't see a reason we can't take the size of the template into account. But that's more of an implementation detail than a matter of the design.
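For illustration, a tiny sketch of that implementation detail, with assumed names and an assumed request cap: subtract the rendered template's fixed overhead from the request byte cap when sizing chunks.

package sizing

// maxRequestBytes is an assumed 1 MB request cap, for illustration only.
const maxRequestBytes = 1048576

// effectiveChunkByteLimit budgets for the template wrapper so that the
// rendered request stays under the cap.
func effectiveChunkByteLimit(templateOverheadBytes int) int {
    limit := maxRequestBytes - templateOverheadBytes
    if limit < 0 {
        return 0
    }
    return limit
}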

@colmsnowplow (Collaborator Author)

Closing in favour of #333, which allows us to avoid the more complex aspects, at least until we reach the need for the features/functionality that it would facilitate.

Thanks for the feedback @adatzer and @pondzix - we wouldn't have landed on the better path without that input.

@colmsnowplow deleted the cross-event-transformation branch on July 10, 2024 at 10:34