Add collector payload to the bad stream data #129

Open
szareiangm opened this issue Aug 16, 2019 · 4 comments

Comments

@szareiangm
Contributor

When we have a new integration, it usually has some events going to the bad stream. Finding them is not straightforward because the bad stream data is not searchable in Elasticsearch due to its encoding (Base64).

Our proposal is to parse the collector payload as JSON with its Snowplow library and add it beside the bad row data.
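The idea above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the ES Loader's code: the field names `line` and `payload`, the helper `add_decoded_payload`, and the parser `json_stand_in_parser` are all hypothetical. A real collector payload would need the Snowplow library mentioned in the proposal to be parsed; the stand-in parser here simply assumes the decoded bytes are JSON so that the example runs.

```python
import base64
import json
from typing import Any, Callable, Dict


def add_decoded_payload(
    bad_row: Dict[str, Any],
    parse_payload: Callable[[bytes], Dict[str, Any]],
    source_field: str = "line",      # assumed name of the Base64 field
    target_field: str = "payload",   # assumed name for the decoded copy
) -> Dict[str, Any]:
    """Return a copy of bad_row with the decoded payload added beside it."""
    enriched = dict(bad_row)  # leave the original bad row untouched
    try:
        raw = base64.b64decode(bad_row[source_field], validate=True)
        enriched[target_field] = parse_payload(raw)
    except Exception as exc:
        # Keep the bad row even when decoding or parsing fails.
        enriched[target_field + "_error"] = str(exc)
    return enriched


def json_stand_in_parser(raw: bytes) -> Dict[str, Any]:
    # Hypothetical stand-in for the real Snowplow payload parser.
    return json.loads(raw.decode("utf-8"))


# Build a made-up bad row whose Base64 field wraps a small JSON document.
encoded = base64.b64encode(
    json.dumps({"path": "/i", "querystring": "e=pv"}).encode("utf-8")
).decode("ascii")
sample = {"line": encoded, "errors": ["example error"]}

result = add_decoded_payload(sample, json_stand_in_parser)
```

Keeping the original Base64 field next to the decoded one means nothing is lost if the parser misreads a payload, while the decoded fields become searchable once indexed.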

@szareiangm szareiangm changed the title Add collector payload to bad stream data Add collector payload to the bad stream data Aug 16, 2019
@chuwy
Contributor

chuwy commented Aug 16, 2019

Hi @szareiangm, that's exactly what we are planning to do in the new bad row format: https://discourse.snowplowanalytics.com/t/a-new-bad-row-format/2558

@szareiangm
Contributor Author

Hi @chuwy !

That's great. That stream of work will significantly improve our workflow.

However, for this small piece, the collector payload is already present here, just Base64-encoded, so no deep overhaul is needed. We have added the decoded value of the payload to test its usefulness, and we believe we could share this with you, too. It is a small code change with a large benefit.

@chuwy
Contributor

chuwy commented Aug 19, 2019

Ah, @szareiangm, you mean that you'd like to implement it as a short-term solution before the new bad rows arrive? I definitely don't oppose that and think this is a great idea (one we should have implemented a long time ago), but I doubt that we'll manage to do another ES Loader release before the new bad rows arrive.

Just in case: in the new bad rows, this will not be a "collector payload" but a "partially enriched event payload": snowplow/iglu-central@378f485#diff-e552d9f744e44db7e869d0771da0b645R277-R414, which preserves all data from the collector payload, but in a more structured form.

@szareiangm
Contributor Author

Sure.
Then it means that I should not go forward with this. 👍
