PSC-STM-B1: Add Storage for PSC Kinesis pointers per shard #249
Comments
Clarification from the PDF about the meaning of the PSC-STM-B section: Part B: Continuous Transformation
The existing Transformer PSC streaming code turned out to already handle storage of stream pointers, including for multiple shards. Some experiments with this code suggest that it handles the Kinesis stream itself as it should. Note that there isn't a single thread per shard, contrary to Kinesis recommendations. Since we currently use a single shard per stream, and given the potential for race conditions within BODS statement publishing (e.g. Transformer PSC bulk import), this seems fine for now.
Various logging was added, since there was previously no visibility into which part of the stream was being played, and no record-level logging of the kind already added to Ingester PSC. It should now be far easier to see whether things are working as they should and what is currently happening, even when there is no data to consume.
Note that there is a 1s delay between each fetch from the stream. Although this makes processing slower than it could otherwise be, it is in keeping with Kinesis recommendations. Even so, a 24h stream with few-to-no events can be caught up within a few minutes, so this should be okay. The current lag of each shard was added to the logging to make this more apparent.
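As a rough illustration of the fetch loop described above (not the Transformer PSC code itself), here is a minimal Python sketch that pauses between fetches and logs the lag per shard. It assumes a boto3-style Kinesis client passed in as `client`; the function name `poll_shard`, the logger name, and the `max_fetches` and `sleep` parameters are hypothetical and exist only to keep the sketch bounded and testable. The lag value is taken from the `MillisBehindLatest` field of the `get_records` response.

```python
import logging
import time

logger = logging.getLogger("psc_stream")  # hypothetical logger name


def poll_shard(client, stream_name, shard_id, last_sequence_number,
               handle_record, max_fetches, delay_seconds=1.0, sleep=time.sleep):
    """Fetch up to max_fetches batches from one shard, pausing between fetches.

    Returns the sequence number of the last record handled, i.e. the new
    stream pointer for this shard (unchanged if no records arrived).
    """
    if last_sequence_number is None:
        # No stored pointer yet: start from the oldest available record.
        iterator_args = {"ShardIteratorType": "TRIM_HORIZON"}
    else:
        # Resume just after the last record that was processed.
        iterator_args = {
            "ShardIteratorType": "AFTER_SEQUENCE_NUMBER",
            "StartingSequenceNumber": last_sequence_number,
        }
    iterator = client.get_shard_iterator(
        StreamName=stream_name, ShardId=shard_id, **iterator_args
    )["ShardIterator"]

    for _ in range(max_fetches):
        if iterator is None:
            # A closed shard returns no further iterator.
            break
        response = client.get_records(ShardIterator=iterator)
        for record in response["Records"]:
            handle_record(record)
            last_sequence_number = record["SequenceNumber"]
        # Log on every fetch, even an empty one, so progress stays visible.
        logger.info(
            "shard=%s records=%d lag_ms=%s pointer=%s",
            shard_id,
            len(response["Records"]),
            response.get("MillisBehindLatest"),
            last_sequence_number,
        )
        iterator = response.get("NextShardIterator")
        sleep(delay_seconds)  # the 1s delay between fetches
    return last_sequence_number
```

Because a log line is written on every fetch, an idle stream still shows the shard's current lag and pointer, which is the visibility the added logging is meant to give.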
This is similar to Task T3: a stream pointer needs to be stored and retrieved when processing from the Kinesis stream.
A slight complexity is that stream pointers exist per shard rather than for the whole stream, so each shard consumer needs to store and retrieve its own. In practice we currently use only a single shard, so for an initial release it would suffice to assume this.
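To make the per-shard requirement concrete, here is a minimal Python sketch of a pointer store keyed by shard ID. It is an illustration only: the class name `ShardPointerStore` and the choice of a local JSON file as the backing store are assumptions, not what the project uses. With a single shard the file simply holds one entry, so the initial-release assumption needs no special case.

```python
import json
from pathlib import Path


class ShardPointerStore:
    """Keeps the last processed sequence number for each shard (hypothetical helper)."""

    def __init__(self, path):
        self.path = Path(path)

    def _load(self):
        # A missing file means no shard has been processed yet.
        if not self.path.exists():
            return {}
        return json.loads(self.path.read_text())

    def get(self, shard_id):
        """Return the stored sequence number for shard_id, or None if there is none."""
        return self._load().get(shard_id)

    def set(self, shard_id, sequence_number):
        """Store sequence_number for shard_id, leaving other shards' pointers untouched."""
        pointers = self._load()
        pointers[shard_id] = sequence_number
        # Write to a temporary file and rename it, so a crash mid-write
        # cannot leave a truncated pointer file behind.
        tmp = self.path.with_name(self.path.name + ".tmp")
        tmp.write_text(json.dumps(pointers))
        tmp.replace(self.path)
```

A consumer would call `get(shard_id)` before requesting a shard iterator and `set(shard_id, sequence_number)` after each batch has been processed, so that a restart resumes from the last stored position for that shard.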
Estimate: 4 hours