We've had a good run with Sqewer, with 5 years of operation and billions of jobs being processed. There were good times and bad times, but it seems like we have a chance of meaningfully upgrading Sqewer into the next phase. I believe it is necessary so the various services that we run can benefit from more SQS features than previously, and also provide some cost savings.
I propose replacing the .submit! API we have at the moment and the configuration API.
Configuration API
The proposed configuration API looks like this:
Sqewer.configure do |config|
  config.sqs_endpoint_url = some_url # defaults to SQS_QUEUE_URL environment variable value
  config.serializer = MyCustomSerializer.new # defaults to Sqewer::Serializer.default
  config.logger = SomeLogger.new # defaults to Sqewer::Logger.default
  config.middleware_handlers = [some_middleware, another_middleware, yet_another_middleware]
  config.worker = MyCustomizedWorker.new # defaults to Sqewer::Worker.default
end
Reasoning: currently most of our services define a custom Sqewer worker class. The customizations that we apply in that custom class are pretty much in-scope for Sqewer itself though. I think that moving these to stable configuration options and allowing them to be configured from one place is appropriate.
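For illustration, here is a minimal sketch of how such a configure block could be wired up. The Configuration class, its accessors, and the memoization are my assumptions about the plumbing, not existing Sqewer internals:

module Sqewer
  # Hypothetical holder for the settings shown above; attribute names
  # mirror the proposed config keys.
  class Configuration
    attr_accessor :sqs_endpoint_url, :serializer, :logger, :middleware_handlers, :worker

    def initialize
      @sqs_endpoint_url = ENV['SQS_QUEUE_URL'] # the documented default
      @serializer = Sqewer::Serializer.default
      @logger = Sqewer::Logger.default
      @middleware_handlers = []
      @worker = Sqewer::Worker.default
    end
  end

  # Memoized global configuration, yielded to the configure block.
  def self.config
    @config ||= Configuration.new
  end

  def self.configure
    yield config
  end
end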
Submission (message sending) API
Currently we have two ways of sending messages using Sqewer. One is Sqewer.submit!, which guarantees that your message will be delivered to the queue immediately; the call will fail if that doesn't succeed. We also have the Executor object passed to every job's run() method, which also responds to .submit! with the same arguments. When an Executor is used, we apply batching to the messages generated during a single run of a job. This was done so that if a job forward-spools descendant jobs, they would be delivered to the queue at once - saving money due to batching. When using bare Sqewer.submit!, batching was not used for fear of "dropping jobs" if a process that has buffered some jobs terminates before the messages have been sent.
I propose changing the submission API in a way that prioritizes batching over guaranteed delivery to the queue, by removing the Executor objects entirely. Instead of providing submit! on an Executor and submit! on Sqewer itself with different semantics, we will instead provide submit! and submit_immediately! on Sqewer itself.
Sqewer.submit_immediately! would ensure that the jobs passed to it get serialized together (and if any job fails to serialize none get delivered to SQS) and then deliver them to SQS in batches before returning.
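A rough sketch of those semantics, assuming the configured serializer exposes a serialize method and with deliver_batch_to_sqs standing in for the actual SQS call:

# All-or-nothing: serialize every job up front, so a failure raises
# before anything has been sent, then deliver in batches of up to 10
# (the SQS batch size limit).
def Sqewer.submit_immediately!(*jobs)
  messages = jobs.map { |job| Sqewer.config.serializer.serialize(job) }
  messages.each_slice(10) do |batch|
    deliver_batch_to_sqs(batch) # placeholder for the real batch send
  end
end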
Sqewer.submit! would first serialize all jobs; if any of the jobs fails to serialize, it would raise an exception before buffering them for sending. Once the jobs are converted into messages ready for sending, they will be added to an in-memory Queue. This queue would be emptied by a Thread running in the background at regular intervals - for instance once every 2 seconds. All messages picked up after this fixed interval would be delivered to SQS in batches.
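And a sketch of the buffered variant - the buffer, the flusher thread, and deliver_batch_to_sqs are all illustrative names, not a committed design:

SEND_INTERVAL_SECONDS = 2
BUFFER = Queue.new # Ruby's built-in thread-safe Queue

def Sqewer.submit!(*jobs)
  # Serialization failures raise here, before anything is buffered.
  messages = jobs.map { |job| Sqewer.config.serializer.serialize(job) }
  messages.each { |message| BUFFER.push(message) }
end

# Background flusher: wakes up on a fixed interval, drains whatever has
# accumulated, and sends it off in SQS-sized batches.
Thread.new do
  loop do
    sleep SEND_INTERVAL_SECONDS
    pending = []
    pending << BUFFER.pop until BUFFER.empty? # safe: this is the sole consumer
    pending.each_slice(10) { |batch| deliver_batch_to_sqs(batch) }
  end
end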
The reasoning is as follows:
Even though we prioritised message delivery over batching, we still had a bug where this delivery and/or serialization could fail silently.
Our ActiveJob adapter, which currently powers the main site, does not do batching at all, because ActiveJob does not provide a "transaction" semantic for buffering the jobs during the execution of another job - or during the execution of another Rails unit of work, for that matter. Using buffering for submit! would immediately enable batching for ActiveJob, allowing for substantial savings (see the adapter sketch below).
Delivery can be wrapped in a separate error-tracking transaction, in all cases.
The choice is clearly placed on the developer/consumer: less efficient batching with a delivery guarantee via submit_immediately!, as opposed to sane default batching and a deferred failure with plain submit!.
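To make the ActiveJob point concrete, here is a hedged sketch of what the adapter's enqueue could look like once submit! buffers. SqewerAdapter follows the standard ActiveJob queue adapter interface, but the wrapper job class is an assumption:

module ActiveJob
  module QueueAdapters
    class SqewerAdapter
      # ActiveJob hands each job here one at a time; with a buffering
      # submit!, consecutive enqueues within the flush interval get
      # batched into shared SQS calls without any adapter-side logic.
      def enqueue(active_job)
        Sqewer.submit!(Sqewer::ActiveJobWrapper.new(active_job.serialize))
      end
    end
  end
end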
Very interested in your thoughts about this!
I highly support the config block; that would make things a lot smoother and make working with Sqewer a lot nicer.
Regarding the second point, I like what I am reading. However, would that time-based buffer really help? We could also flush it earlier if there are 10 jobs (SQS' limit) in there.
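If it helps to picture it, the size trigger could live right in submit!, on top of the time-based flush from the sketch above - flush_now! is a hypothetical name for draining the buffer immediately:

SQS_MAX_BATCH_SIZE = 10

def Sqewer.submit!(*jobs)
  messages = jobs.map { |job| Sqewer.config.serializer.serialize(job) }
  messages.each { |message| BUFFER.push(message) }
  # Flush as soon as a full SQS batch has accumulated instead of
  # waiting out the interval.
  flush_now! if BUFFER.size >= SQS_MAX_BATCH_SIZE
end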
On a different note, I would base Sqewer on only one of the two concepts. I think libraries like these should be somewhat opinionated, and by choosing one concept, it would 1) make it easier for developers to reason about (one way of doing things) and 2) make it easier to write and maintain Sqewer (fewer things that can go wrong). Given SQS' guarantees, I would go for your first concept, that is, the one without the buffer. When submitting jobs I value the delivery guarantee more than anything else. It would also be the "expected" behaviour for such a library.
Of course, I would add support to submit multiple jobs at once (which is pretty "ugly" to set up at the moment). That would also batch jobs before sending, as you mentioned.
Regarding the ActiveJob adapter, I won't expect it to support everything a job processing library may offer. It should work for the basic scenarios, but for the more complex ones, one should use the library directly.
As a side note, you've asked me whether I'd want to continue working on connection pooling. I think that's still valuable, however I would prioritise the work you've mentioned here. If we can also cache the SQS client between connections (so that it fetches credentials less often), I think we might just get away without connection pooling.
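For the client caching idea, a minimal sketch: memoizing a single Aws::SQS::Client (which is thread-safe) means credentials are resolved once rather than per send. The method name is illustrative:

require 'aws-sdk-sqs'

module Sqewer
  # Reuse one client process-wide; the aws-sdk-sqs client is safe to
  # share across threads, so this also plays well with a background flusher.
  def self.sqs_client
    @sqs_client ||= Aws::SQS::Client.new
  end
end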