PSC-STM-B3: Finish PSC Kinesis Stream Transformer #251
The existing code works for consuming from multiple Kinesis shards. However, the manner by which it does that isn't optimal:
There are a number of considerations with this approach:
Extending this to support additional threads or processes likely wouldn't be much work. However, multi-threading hasn't always been smooth sailing with the existing bulk data (i.e. non-stream) transformations, and I'm concerned it could lead to more conflicts when writing to Elasticsearch, resulting in program crashes.
Despite these limitations, the existing approach is likely good enough for us at present: we're using only a single shard per stream, and even a single shard can sustain a far higher throughput than we can process, given how long each statement takes to process. In addition, using multiple shards affects event order, which would have to be considered carefully for our use case, since statements are generally order-sensitive.
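For illustration, single-threaded consumption of this kind can be sketched as below. This is a minimal sketch in Python, assuming a boto3-style Kinesis client is passed in; it is not this project's actual code, and `open_iterators`, `poll_once`, and `handle` are hypothetical names. Only the `list_shards`, `get_shard_iterator`, and `get_records` calls are real Kinesis API operations.

```python
from typing import Any, Callable, Dict


def open_iterators(client: Any, stream_name: str) -> Dict[str, str]:
    """Return one shard iterator per shard, starting from the oldest record."""
    iterators: Dict[str, str] = {}
    for shard in client.list_shards(StreamName=stream_name)["Shards"]:
        resp = client.get_shard_iterator(
            StreamName=stream_name,
            ShardId=shard["ShardId"],
            ShardIteratorType="TRIM_HORIZON",
        )
        iterators[shard["ShardId"]] = resp["ShardIterator"]
    return iterators


def poll_once(client: Any, iterators: Dict[str, str],
              handle: Callable[[dict], None], limit: int = 100) -> int:
    """Poll each shard once, in sequence, and return the number of records handled.

    Records are handled in the order Kinesis returns them, so order is
    preserved within a shard. Nothing here orders records across shards.
    """
    handled = 0
    for shard_id in list(iterators):
        resp = client.get_records(ShardIterator=iterators[shard_id], Limit=limit)
        for record in resp["Records"]:
            handle(record)
            handled += 1
        next_iterator = resp.get("NextShardIterator")
        if next_iterator is None:
            del iterators[shard_id]  # the shard has been closed
        else:
            iterators[shard_id] = next_iterator
    return handled
```

With a real client this would be driven by something like `client = boto3.client("kinesis")`, calling `poll_once` in a loop with a short sleep between rounds. With a single shard, the loop reduces to reading that shard in order, which is why event order is not a concern in the current setup.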
Kinesis quotas and limits (data stream throughput, provisioned mode):
https://docs.aws.amazon.com/streams/latest/dev/service-sizes-and-limits.html
This is approximately three orders of magnitude higher write throughput than we're currently using.
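The headroom estimate can be checked with a quick calculation. The sketch below assumes the provisioned-mode write limit of 1,000 records per second per shard from the AWS page linked above; the current write rate of roughly one record per second is a hypothetical figure chosen to match the "three orders of magnitude" estimate, not a measured value.

```python
import math

# Provisioned-mode write limit per shard, in records per second (AWS quota).
SHARD_WRITE_RECORDS_PER_SEC = 1_000


def headroom_orders_of_magnitude(current_records_per_sec: float) -> float:
    """Return log10 of the ratio between the shard write limit and the current rate."""
    return math.log10(SHARD_WRITE_RECORDS_PER_SEC / current_records_per_sec)


# Hypothetical current rate of about one record per second.
print(round(headroom_orders_of_magnitude(1.0), 2))
```

The shard also has a separate 1 MB per second write limit, so the record-count figure is the relevant one only when individual records are small.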
In keeping with recent work on other parts of the program, I'm not writing extra formal tests for this. I am not convinced of the benefit, especially as following the previous pattern would mean stubbing the calls to Kinesis and other external services (i.e. not actually executing them live) anyway. There are some existing tests that check some of the overall calls, but extending them would be significant work, and I'm unpersuaded of the merit given other details of the project, codebase, and roadmap.
Recent work (and the monthly import) has mainly involved running the bulk transformer, which transforms the S3 files produced by buffering the Kinesis stream.
This means the app which consumes from Kinesis directly hasn’t been run or updated recently.
Estimate: 4 hours