Rate Limiting #441
Have you read https://github.com/LearnBoost/kue#processing-concurrency for throttling jobs?
In a distributed work-pulling setup with done acknowledgement, there's no easy way to implement "n jobs done per t seconds", since you don't know when jobs will be done.
The current method for processing concurrency only limits the number of jobs running at the same time; it doesn't really rate-limit them. What I was looking for is rate limiting the start of jobs. I could use some external module, but there doesn't seem to be anything decent out there (I can't see where in q it can be done), and it would be much easier if Kue itself were able to provide this functionality.
Pardon, my mistake!
The problem with things like node-rate-limiter is that it's not distributed. If I have multiple instances of my code base running, the rate limiter is on a per-instance basis. I have yet to see one that is backed by something like Redis in order to preserve the limit across all instances. I would write my own, but having it directly in Kue makes it much easier, as there isn't a separate layer to configure/maintain.
Hi all, I just want to restart this conversation about rate limiting with Kue. I am currently looking to do something quite complicated using Kue and @jesucarr's token bucket implementation, and I'm wondering if anyone else has an idea of what exactly they are looking for, so that we might actually collaborate and maybe get a PR together. Is this the best place to have an architectural discussion about this?
You can share what you are thinking here, @mansona.
I'll just give a quick overview of the use case for now and maybe go into more detail if it is required. There is a whole structure as to why I want to do things this way, so if you feel it would be useful for me to give more context just let me know. (It might be a very long post, so be warned 😉)

I am currently working with the Twitter API and I want to push as much throughput through a single job processor as possible. You can see in Twitter's rate-limit documentation that you are limited to 180 requests per 15 minutes. The way that my jobs are going to be laid out is that I am going to be requesting between 12 and 20 profiles at a time (maximum), which means that I'm only going to be able to batch 12 profiles together (worst case) if I limit my "batching" to 1 batch per job. This reduces my potential throughput to 2160, which is a meagre 12% of the theoretical maximum.

The idea that I'm coming up with is that my original job, let's call it getTwelveProfiles, doesn't actually do the profile request directly. Instead it in turn batches the jobs in some way, and this is where the TokenBucket algorithm comes in. For the example below to work I am assuming that each getTwelveProfiles call actually creates 12 new lookupProfile jobs, each with the profileId that it needs to look up.

My original thought would be for us to integrate @jesucarr's token bucket repo directly into Kue. I know this would be quite a difficult thing to do, but I think it would provide such an interesting API that it might just be worth it. It might even offer a different solution to #493 as a side effect. The idea would be to provide an API something like this for the worker:
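(The API snippet originally posted in this comment isn't preserved in this copy of the thread; the following is only a rough sketch of what such a token-bucket-aware worker API could look like. The `tokenBucket` option, its fields, and the `lookupProfile` helper are hypothetical, not part of Kue's actual API.)

```js
var kue = require('kue');
var queue = kue.createQueue();

// Hypothetical API: the worker would only pull a lookupProfile job
// once the bucket can supply the tokens the job asks for.
queue.process('lookupProfile', {
  tokenBucket: {
    size: 180,                    // bucket capacity
    tokensToAddPerInterval: 180,  // refill rate
    interval: '15m',              // Twitter's rate-limit window
    tokensPerJob: 1               // each lookupProfile costs one request
  }
}, function (job, done) {
  lookupProfile(job.data.profileId, done); // assumed helper that calls the Twitter API
});
```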
Internally, this implementation would also allow you to start multiple processors with different token buckets; in my use case that makes sense because Twitter limits requests on a per-user basis. So if I had more users and access tokens, I could get more throughput through the system just by adding another worker with a new access token. I hope that explains the use case; please let me know if you have any questions.

I need to implement something along these lines in the coming days, so if it is something that you are interested in adding to the project I could work on a fork with the hope of creating a PR to add the functionality. I would probably need some guidance, a) to know if this is just a stupid idea or not 😖, and b) if I were to implement this, what the best strategy might be.
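(Purely illustrative, continuing the hypothetical API from the sketch above: one processor per set of Twitter credentials, each with its own bucket.)

```js
var kue = require('kue');
var queue = kue.createQueue();

// Each set of credentials gets its own processor and its own bucket,
// so adding another access token adds another 180-requests-per-15-minutes slice.
['accessTokenUserA', 'accessTokenUserB'].forEach(function (accessToken) {
  queue.process('lookupProfile', {
    tokenBucket: { size: 180, tokensToAddPerInterval: 180, interval: '15m' }
  }, function (job, done) {
    lookupProfile(job.data.profileId, accessToken, done); // assumed helper
  });
});
```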
At very first, I think the optimistic method is: after the worker's BLPOP returns, and before it fetches the new job, check the token bucket; if no tokens are available, the worker should return the swallowed item to the helper list and refrain from fetching the job... Another method is to always check whether tokens are available first, then wait on BLPOP and fetch the job, ignoring a double check on the token bucket. After the current job is finished, check for tokens again...
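(A rough sketch of the two orderings described above, written as a hypothetical worker loop; `blpopNextJobId`, `tryTakeToken`, `waitForToken`, `returnToHelperList` and `processJob` stand in for Kue's internals and are not real Kue functions.)

```js
// Optimistic: pop first, then check the bucket, and put the item back if there is no token.
function optimisticLoop() {
  blpopNextJobId(function (err, jobId) {
    if (err) return optimisticLoop();
    tryTakeToken(function (gotToken) {
      if (!gotToken) {
        returnToHelperList(jobId);               // give the swallowed item back
        return setTimeout(optimisticLoop, 1000); // back off before trying again
      }
      processJob(jobId, optimisticLoop);
    });
  });
}

// Pessimistic: wait for a token first, then block on BLPOP with no double check.
function pessimisticLoop() {
  waitForToken(function () {
    blpopNextJobId(function (err, jobId) {
      if (err) return pessimisticLoop();
      processJob(jobId, pessimisticLoop);
    });
  });
}
```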
I have a similar problem. Limiting API requests per second.
Call this in nodejs with:
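(The Redis script and the calling code from this comment aren't preserved in this copy of the thread; below is a sketch of the kind of atomic, Lua-based per-second limiter being described, written against node_redis. The key name and limit are illustrative.)

```js
var redis = require('redis');
var client = redis.createClient();

// Atomic "N requests per second" check: INCR a per-second counter, set its TTL
// on first use, and report whether the caller is still under the limit.
var rateLimitScript = [
  "local current = redis.call('INCR', KEYS[1])",
  "if current == 1 then",
  "  redis.call('EXPIRE', KEYS[1], 1)",
  "end",
  "if current > tonumber(ARGV[1]) then return 0 end",
  "return 1"
].join('\n');

function tryAcquire(limitPerSecond, callback) {
  var key = 'ratelimit:' + Math.floor(Date.now() / 1000); // one counter per second
  client.eval(rateLimitScript, 1, key, limitPerSecond, function (err, allowed) {
    if (err) return callback(err);
    callback(null, allowed === 1);
  });
}

// e.g. just before a worker pulls a job:
tryAcquire(10, function (err, allowed) {
  if (err) throw err;
  if (allowed) {
    // under the limit for this second: safe to pull and process the next job
  } else {
    // over the limit for this second: back off and retry
  }
});
```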
This works in distributed environments, because the Redis script is guaranteed to be executed atomically. I am not familiar with the Kue implementation, but you should be able to call this just before a worker pulls a job.
So has anyone found a way to limit x jobs until they are completed and then run the next set of jobs? For example, you have 6 orders go through and nobody can process any more orders until one of the 6 is done.
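(For that particular case, the processing-concurrency argument linked at the top of this thread already behaves this way: with a concurrency of 6, a seventh job is only picked up once one of the active six calls done. A minimal sketch, where `processOrder` is an assumed application helper:)

```js
var kue = require('kue');
var queue = kue.createQueue();

// At most 6 'order' jobs are active at once; the next job starts
// only when one of the running jobs calls done().
queue.process('order', 6, function (job, done) {
  processOrder(job.data, done); // assumed application helper
});
```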
Hi folks 👋 I just wanted to give you an indication of how I got this working in the end. This is not ideal, but it gives a good indication of what I'm trying to achieve.
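(The snippet originally posted here isn't preserved in this copy. Based on the description in the next comment, it was along these lines: wait on a token bucket inside the process handler, and only call Twitter once enough tokens are available. The `createTokenBucket` factory, the `removeTokens`-style bucket API and the `lookupProfiles` helper are assumptions.)

```js
var kue = require('kue');
var queue = kue.createQueue();
var bucket = createTokenBucket({ size: 180, interval: '15m' }); // assumed token bucket factory

queue.process('getTwelveProfiles', function (job, done) {
  // The job is already "active" at this point; it just waits here
  // until the bucket can supply one token per profile lookup.
  bucket.removeTokens(job.data.profileIds.length, function (err) {
    if (err) return done(err);
    lookupProfiles(job.data.profileIds, done); // assumed helper that hits the Twitter API
  });
});
```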
So how this works is that it doesn't request anything from Twitter until the required tokens are in the bucket; the problem with this is that the job becomes active before it has the number of tokens it is requesting. The ideal solution, integrating these two methods, would see the job not becoming active until there are enough tokens in the bucket. Does that make sense, or do you need me to explain it any better?
Some thoughts on an old thread:
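(The bullet points and pseudo-code from this comment aren't preserved in this copy. Judging from the follow-ups, the idea is a layered setup: every job first lands in a backlog queue, and a single backlog worker drains it into the real processing queue at the allowed rate, so the processing workers themselves never exceed it. A rough sketch under that assumption, with illustrative numbers and an assumed `lookupProfile` helper:)

```js
var kue = require('kue');
var queue = kue.createQueue();

var RATE = 180;                 // allowed jobs per window (illustrative)
var WINDOW_MS = 15 * 60 * 1000; // e.g. a 15-minute window

// Producers only ever create 'backlog' jobs.
queue.create('backlog', { profileId: 123 }).save();

// A single backlog worker releases jobs into the real queue at the allowed rate.
var releasedThisWindow = 0;
setInterval(function () { releasedThisWindow = 0; }, WINDOW_MS);

queue.process('backlog', function (job, done) {
  if (releasedThisWindow >= RATE) {
    // Window exhausted: re-queue a delayed copy and finish this one.
    // (Older Kue versions also need queue.promote() to activate delayed jobs.)
    queue.create('backlog', job.data).delay(60 * 1000).save();
    return done();
  }
  releasedThisWindow++;
  queue.create('lookupProfile', job.data).save(done);
});

// Any number of processing workers can consume the released jobs.
queue.process('lookupProfile', 10, function (job, done) {
  lookupProfile(job.data.profileId, done); // assumed application helper
});
```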
This solution will work in large distributed setups and can be scaled to any number of processing workers (but there should only be one backlog worker, or the shared rate must be carefully calculated among several of them). Just my 2 cents. / Michael

EDIT: Updated the pseudo-code to be a little less pseudo.
Nice trick @tarraq, layers always do the job ;)
Thanks @behrad. I haven't tried it out in practice, it's just a mind trick as it were :) On that note, I've been wondering: what's the maximum queue length in Kue? 2^32 - 1, the Redis list limit?
Actually the Redis ZSET limit, and in reality the memory before that :)
That should be alright then ;)
Hmm... the problem that I have with this approach is that it doesn't really deal with the fact that things like Twitter allow 180 requests per 15 minutes per set of credentials. The way that works with the token bucket example above is that you would create instances of processors that have unique credentials and their own token bucket. That way you are not limiting the throughput of an "interim queue"; you are more closely limiting how many jobs a processor can actually process per minute. I hope this makes sense. I can probably explain it better with a graph if need be?
Looking forward to a rate-limit feature in Kue. I really like all the examples and discussion going on here. Just hoping something will be integrated into Kue for a consistent solution. Any updates on this?
I believe the best place to solve API/request rate limiting is an API gateway. When we say rate limiting, it means overflow requests are denied/rejected. So any (distributed) rate limiter can be placed in front of Kue.
That's correct, but I'm only looking for a time-based option anyway - like Celery's rate limit. I am moving from a django-celery combination to a node-kue one.
I'm also interested in this feature. My use case is sending emails to the Amazon SES API from dozens of Node.js processes across several servers. I would like to pass all the email sending through a single job queue, which then farms out the sending to workers and makes sure that we don't exceed the prescribed number of messages per second, while still sending as fast as possible.
+1. GitHub APIs have rate limiting (5000/hr) plus abuse rate limits on top of this. Having a rate_limit option would be helpful.
+1

+1
I added https://www.npmjs.com/package/limiter and it works great
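(A sketch of how the limiter package can be combined with a Kue processor, assuming the callback-style API of limiter 1.x; the job type, rate and `callExternalApi` helper are illustrative. Note this only limits the rate within a single process.)

```js
var kue = require('kue');
var RateLimiter = require('limiter').RateLimiter;

var queue = kue.createQueue();
var limiter = new RateLimiter(180, 15 * 60 * 1000); // 180 tokens per 15 minutes

queue.process('apiCall', function (job, done) {
  // Waits (asynchronously) until a token is available, then does the work.
  limiter.removeTokens(1, function (err) {
    if (err) return done(err);
    callExternalApi(job.data, done); // assumed application helper
  });
});
```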
For an example implementation you could have a look at Josiah Carlson's post about rate limiting with Redis. He describes a very robust implementation that also supports multiple buckets.
I've recently been having a problem with limiting the job rate globally, depending on memory usage. It was an urgent issue, so I've implemented a really simple patch that does the trick, and we're using it now on our production server: max-loginov@112b094

I went through this discussion and realized that my solution can probably fit here as well. It allows rate limiting of jobs either globally or per job type, using any imaginable condition. What do you think, guys?
Created a pull request for this feature. #1103
Are there any plans to introduce rate limiting of jobs?