
Proposal for an API redesign #57

Open
julik opened this issue May 30, 2020 · 2 comments

julik commented May 30, 2020

We've had a good run with Sqewer: five years of operation and billions of jobs processed. There were good times and bad times, but it seems we now have a chance to meaningfully upgrade Sqewer into its next phase. I believe this is necessary so that the various services we run can benefit from more SQS features than before, and so we can realize some cost savings.

I propose replacing both the .submit! API we have at the moment and the configuration API.

Configuration API

The proposed configuration API looks like this:

Sqewer.configure do |config|
  config.sqs_endpoint_url = some_url # defaults to SQS_QUEUE_URL environment variable value
  config.serializer = MyCustomSerializer.new # defaults to Sqewer::Serializer.default
  config.logger = SomeLogger.new # defaults to Sqewer::Logger.default
  config.middleware_handlers = [some_middleware, another_middleware, yet_another_middleware]
  config.worker = MyCustomizedWorker.new # defaults to Sqewer::Worker.default
end

Reasoning: currently most of our services define a custom Sqewer worker class. The customizations we apply in those classes are, however, largely in scope for Sqewer itself. Moving them to stable configuration options, configurable from one place, therefore seems appropriate.

Submission (message sending) API

Currently we have two ways of sending messages with Sqewer. One is Sqewer.submit!, which guarantees that your message is delivered to the queue immediately; the call fails if delivery does not succeed. We also have the Executor object passed to every job's run() method, which responds to .submit! with the same arguments. When an Executor is used, we batch the messages generated during a single run of a job. This was done so that if a job forward-spools descendant jobs, they are delivered to the queue at once, saving money through batching. Bare Sqewer.submit! did not batch, for fear of "dropping jobs" if a process terminated with jobs buffered but not yet sent.

I propose changing the submission API to prioritize batching over guaranteed delivery to the queue, by removing the Executor objects entirely. Instead of providing submit! on an Executor and a submit! on Sqewer itself with different semantics, we would provide submit! and submit_immediately! on Sqewer itself.

Sqewer.submit_immediately! would ensure that the jobs passed to it get serialized together (if any job fails to serialize, none get delivered to SQS) and then deliver them to SQS in batches before returning.
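As a sketch of these semantics (not an implementation — the config.serializer accessor follows the configuration block above, and deliver_batch_to_sqs is a hypothetical helper):

module Sqewer
  def self.submit_immediately!(*jobs)
    # Serialize everything first: if any job raises here, nothing is sent.
    messages = jobs.map { |job| config.serializer.serialize(job) }
    # SQS accepts at most 10 messages per batch; block until all are accepted.
    messages.each_slice(10) do |batch|
      deliver_batch_to_sqs(batch) # hypothetical delivery helper
    end
  end
end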

Sqewer.submit! would first serialize all jobs; if any job fails to serialize, it would raise an exception before anything is buffered for sending. Once the jobs are converted into messages ready for sending, they are added to an in-memory Queue. This queue is emptied by a Thread running in the background at regular intervals, for instance once every 2 seconds. All messages picked up after this fixed interval are delivered to SQS in batches.
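A minimal sketch of the buffered variant, reusing the assumed serializer and the hypothetical deliver_batch_to_sqs helper from above (the buffer name is illustrative):

module Sqewer
  BUFFER = Queue.new # thread-safe in-memory queue from Ruby's stdlib

  def self.submit!(*jobs)
    # Serialization failures surface here, before anything gets buffered.
    messages = jobs.map { |job| config.serializer.serialize(job) }
    messages.each { |message| BUFFER << message }
  end

  # A single background thread drains the buffer at a fixed interval.
  Thread.new do
    loop do
      sleep 2 # the fixed interval proposed above
      pending = []
      pending << BUFFER.pop(true) until BUFFER.empty? # non-blocking pop
      pending.each_slice(10) { |batch| deliver_batch_to_sqs(batch) }
    end
  end
end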

The reasoning is as follows:

  • Even though we prioritised message delivery over batching, we still had a bug where this delivery and/or serialization could fail silently
  • Our ActiveJob adapter, which currently powers the main site, does not do batching at all because ActiveJob does not provide "transaction" semantics for buffering jobs during the execution of another job - or during any other Rails unit of work, for that matter. Using buffering for submit! would immediately enable batching for ActiveJob (see the sketch after this list), allowing for substantial savings
  • Delivery can be wrapped in a separate error-tracking transaction, in all cases
  • The choice is placed explicitly on the developer: less efficient batching with a delivery guarantee via submit_immediately!, versus sane default batching with deferred failure via plain submit!.
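To illustrate the ActiveJob point: with buffered submission, the adapter's enqueue can simply delegate to Sqewer.submit!, and batching happens transparently in the background flusher. A hedged sketch - the adapter class shown here is an assumption, not the current adapter code:

module ActiveJob
  module QueueAdapters
    class SqewerAdapter
      def enqueue(job)
        # Buffered: returns once the job is serialized and queued in memory,
        # so jobs enqueued close together end up in the same SQS batch.
        Sqewer.submit!(job)
      end
    end
  end
end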

Very interested in your thoughts about this!


julik commented May 30, 2020

Closes #21


linkyndy commented Jun 3, 2020

Thanks @julik for starting this! 🙌

I highly support the config block; that would make things a lot smoother and make working with Sqewer a lot nicer.

Regarding the second point, I like what I am reading. However, would that time-based buffer really help? We could also flush it earlier if there are 10 jobs (SQS' limit) in there.
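For illustration, the early flush could sit right next to the timer (reusing the assumed BUFFER and serializer from the sketches above; flush_pending! is hypothetical):

module Sqewer
  MAX_BATCH = 10 # SQS send_message_batch accepts at most 10 entries

  def self.submit!(*jobs)
    messages = jobs.map { |job| config.serializer.serialize(job) }
    messages.each { |message| BUFFER << message }
    # Flush a full batch right away instead of waiting out the timer.
    flush_pending! if BUFFER.size >= MAX_BATCH
  end
end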

On a different note, I would base Sqewer on just one of the two concepts. I think libraries like these should be somewhat opinionated, and choosing one concept would 1) make it easier for developers to reason about (one way of doing things) and 2) make Sqewer easier to write and maintain (fewer things that can go wrong). Given SQS' guarantees, I would go for your first concept, that is, the one without the buffer. When submitting jobs I value the delivery guarantee more than anything else. It would also be the "expected" behaviour for such a library.

Of course, I would add support for submitting multiple jobs at once (which is pretty "ugly" to set up at the moment). That would also batch jobs before sending, as you mentioned.

Regarding the ActiveJob adapter, I wouldn't expect it to support everything a job-processing library may offer. It should work for the basic scenarios; for the more complex ones, one should use the library directly.

As a side note, you've asked me whether I'd want to continue working on connection pooling. I think that's still valuable; however, I would prioritise the work you've mentioned here. If we can also cache the SQS client between connections (so that it fetches credentials less often), I think we might just get away without connection pooling.
