-
Notifications
You must be signed in to change notification settings - Fork 9
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cross event transformation #331
Changes from all commits
3c8e391
3c31b74
869441b
f46f398
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
/** | ||
* Copyright (c) 2020-present Snowplow Analytics Ltd. | ||
* All rights reserved. | ||
* | ||
* This software is made available by Snowplow Analytics, Ltd., | ||
* under the terms of the Snowplow Limited Use License Agreement, Version 1.0 | ||
* located at https://docs.snowplow.io/limited-use-license-1.0 | ||
* BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION | ||
* OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. | ||
*/ | ||
|
||
package models | ||
|
||
import "time" | ||
|
||
// MessageBatch houses batches of messages, for batch transformations to operate across | ||
type MessageBatch struct { | ||
OriginalMessages []*Message // Most targets will use the data from here, but where we have a http templating transformation, we would use this to ack batches of messages | ||
jbeemster marked this conversation as resolved.
Show resolved
Hide resolved
|
||
BatchData []byte // Where we template http requests, we use this to define the body of the request | ||
HTTPHeaders map[string]string // For dynamic headers feature | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I am not a huge fan of having a Would something like There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't love it either. This actually comes from having implemented the dynamic header transformation - which has the same problem for the message model. I considered that way of doing things but it's a trade-off - the downside of that approach is that you have a more obscure API and the target's logic depends on that specific key, but the api defines it as being anything. I experimented with other things we could do but didn't find an elegant solution (yet), and it didn't feel like it serves the project well to labour on it for too long. Right now we only have one thing that needs to do this, so my thinking was that this will do for the moment but when we need to design for further similar things we should revisit the api design. I'm not massively opposed to doing what you suggest either - I just haven't given up on finding something better There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Fair point! As long as the seed of "maybe we should change this" is planted I am fine with it staying where it is. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Not even that much of a maybe to be honest! Just needs to percolate a bit. Perhaps we will even see the answer when the rest of this refactor falls into place. |
||
TimeRequestStarted time.Time | ||
TimeRequestFinished time.Time | ||
} | ||
|
||
// BatchTransformationResult houses the result of a batch transformation | ||
type BatchTransformationResult struct { | ||
Success []*MessageBatch | ||
Invalid []*Message | ||
Oversized []*Message | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,47 @@ | ||
/** | ||
* Copyright (c) 2020-present Snowplow Analytics Ltd. | ||
* All rights reserved. | ||
* | ||
* This software is made available by Snowplow Analytics, Ltd., | ||
* under the terms of the Snowplow Limited Use License Agreement, Version 1.0 | ||
* located at https://docs.snowplow.io/limited-use-license-1.0 | ||
* BY INSTALLING, DOWNLOADING, ACCESSING, USING OR DISTRIBUTING ANY PORTION | ||
* OF THE SOFTWARE, YOU AGREE TO THE TERMS OF SUCH LICENSE AGREEMENT. | ||
*/ | ||
|
||
package target | ||
|
||
import ( | ||
"github.com/snowplow/snowbridge/pkg/models" | ||
batchtransform "github.com/snowplow/snowbridge/pkg/transform/batch" | ||
) | ||
|
||
// chunkBatcherWithConfig returns a batch transformation which incorporates models.GetChunkedMessages() into the batch transformation model. | ||
// It is done this way in order to pass GetChunkedMessages its config within the confines of the BatchTransfomration design | ||
func chunkBatcherWithConfig(chunkSize int, maxMessageByteSize int, maxChunkByteSize int) batchtransform.BatchTransformationFunction { | ||
|
||
// chunkBatcher is a batch transformation which incorporates models.GetChunkedMessages() into the batch transformation model, | ||
// preserving the original logic and ownership of the function. | ||
chunkBatcher := func(batchesIn []*models.MessageBatch) ([]*models.MessageBatch, []*models.Message, []*models.Message) { | ||
oversizedOut := make([]*models.Message, 0) | ||
chunkedBatches := make([]*models.MessageBatch, 0) | ||
|
||
for _, batch := range batchesIn { | ||
chunks, oversized := models.GetChunkedMessages(batch.OriginalMessages, chunkSize, maxMessageByteSize, maxChunkByteSize) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I don't think so - the problem is that where we have templating, this logic must occur before the templater. |
||
|
||
oversizedOut = append(oversizedOut, oversized...) | ||
|
||
for _, chunk := range chunks { | ||
asBatch := &models.MessageBatch{ | ||
OriginalMessages: chunk, | ||
} | ||
|
||
chunkedBatches = append(chunkedBatches, asBatch) | ||
} | ||
|
||
} | ||
return chunkedBatches, nil, oversizedOut | ||
} | ||
|
||
return chunkBatcher | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i wonder whether there is an alternative so that the target does not need to know about batch transformations. Are there other reasons for this besides the chunking and possible group-by?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I started with a design that segmented them, but it left things very messy because the target needs to be aware of the transformation in order to decide how to send the data.
Similarly, the dynamic headers feature leaves us with a challenge here. Necessarily it must group data by headers, before a request template is created.
From a configuration perspective, if this logic is upstream of the target, it seems very easy to break the target by configuring a separate feature.
I don't know if it's the best design, but it's what I came up with as an attempt to reconcile this with the concept of solving for batch transformations more generically.