
ThrottingException - Rate limit exceeded #69

Open
brunogsa opened this issue Jul 26, 2017 · 8 comments

Comments
@brunogsa
Contributor

brunogsa commented Jul 26, 2017

Hey!
How are you?

Sometimes I get this error when the module tries to send a lot of logs to CloudWatch.
It comes as an "Unhandled Rejection", as we saw in #32, but let's not focus on that for now.

CloudWatch has limits on how much data we can send to it, as described here:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/cloudwatch_limits_cwl.html

I would like to know whether this module already works as well as it can, batching operations and respecting the restriction of 5 requests per second per log stream.

My concern is losing logs in this process.
If that happens, I just can't trust CloudWatch when debugging an issue.

Do you have any suggestions?
It seems that temporarily decreasing the log level doesn't ease the load on CloudWatch.

Thanks in advance!

@timdp
Owner

timdp commented Jul 26, 2017

Hi!

The Relay class that the code uses internally has an option for the interval between submissions. Currently, it's set to 2 seconds. There's no way to change it yet, but I can add that. However, it wouldn't change the behavior tremendously, since 2 seconds is already quite long. I think it's more about the log volumes you're sending, as you've already suggested. I've never hit the limits personally.

There should really be code that handles these limit errors, though. It could be as simple as allowing the error to happen here, which should trigger a retry. Of course, without logging a warning, it could just keep retrying indefinitely, meaning you'd still lose the logs.
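
Something along these lines, roughly (just a sketch of the retry-with-warning idea, not the library's actual code; submitWithRetry, submitBatch and the backoff numbers are made up here):

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function submitWithRetry (submitBatch, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; ++attempt) {
    try {
      return await submitBatch()
    } catch (err) {
      // Only retry on CloudWatch throttling; rethrow anything else
      if (err.code !== 'ThrottlingException' || attempt === maxRetries) throw err
      console.warn('CloudWatch throttled the request, retrying', {attempt})
      // Back off exponentially so we don't keep hammering the API
      await delay(1000 * Math.pow(2, attempt))
    }
  }
}

// e.g. submitWithRetry(() => client.putLogEvents(params).promise())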

@brunogsa brunogsa reopened this Jul 26, 2017
@brunogsa
Contributor Author

brunogsa commented Jul 26, 2017

Sorry, mis-clicked.

@brunogsa
Contributor Author

brunogsa commented Jul 26, 2017

Not sure what's wrong...

I tried changing batchSize and submissionInterval directly in my node_modules, but I'm still getting this:

Unhandled rejection ThrottlingException: Rate exceeded
    at Request.extractError (/my-project/node_modules/aws-sdk/lib/protocol/json.js:48:27)
    at Request.callListeners (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/request.js:678:14)
    at Request.transition (/my-project/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/my-project/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /my-project/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:680:12)
    at Request.callListeners (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:115:18)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/request.js:678:14)
    at Request.transition (/my-project/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/my-project/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /my-project/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:680:12)
    at Request.callListeners (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:115:18)
    at callNextListener (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:95:12)
    at IncomingMessage.onEnd (/my-project/node_modules/aws-sdk/lib/event_listeners.js:256:13)
    at emitNone (events.js:91:20)
    at IncomingMessage.emit (events.js:185:7)
From previous event:
    at Bottleneck.schedulePriority (/my-project/node_modules/bottleneck/lib/Bottleneck.js:248:14)
    at Object.<anonymous> (/my-project/node_modules/bottleneck/lib/Bottleneck.js:4:59)
    at Bottleneck.schedule (/my-project/node_modules/bottleneck/lib/Bottleneck.js:229:36)
    at Relay._scheduleSubmission (/my-project/node_modules/winston-aws-cloudwatch/lib/relay.js:98:21)
    at Relay.submit (/my-project/node_modules/winston-aws-cloudwatch/lib/relay.js:90:12)
    at CloudWatchTransport.log (/my-project/node_modules/winston-aws-cloudwatch/lib/index.js:63:19)
    at transportLog (/my-project/node_modules/winston/lib/winston/logger.js:234:15)
    at /my-project/node_modules/winston/node_modules/async/lib/async.js:157:13
    at _each (/my-project/node_modules/winston/node_modules/async/lib/async.js:57:9)
    at Object.async.each (/my-project/node_modules/winston/node_modules/async/lib/async.js:156:9)
    at Logger.log (/my-project/node_modules/winston/lib/winston/logger.js:246:9)
    at target.(anonymous function) [as info] (/my-project/node_modules/winston/lib/winston/common.js:54:18)
    at Server.<anonymous> (/my-project/lib/api.js:46:14)
    at Server.g (events.js:292:16)
    at emitNone (events.js:86:13)
    at Server.emit (events.js:185:7)
    at emitListeningNT (net.js:1284:10)
    at _combinedTickCallback (internal/process/next_tick.js:77:11)
    at process._tickDomainCallback (internal/process/next_tick.js:128:9)

Unhandled rejection ThrottlingException: Rate exceeded
    at Request.extractError (/my-project/node_modules/aws-sdk/lib/protocol/json.js:48:27)
    at Request.callListeners (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/request.js:678:14)
    at Request.transition (/my-project/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/my-project/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /my-project/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:680:12)
    at Request.callListeners (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:115:18)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/request.js:678:14)
    at Request.transition (/my-project/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/my-project/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /my-project/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:680:12)
    at Request.callListeners (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:115:18)
    at callNextListener (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:95:12)
    at IncomingMessage.onEnd (/my-project/node_modules/aws-sdk/lib/event_listeners.js:256:13)
    at emitNone (events.js:91:20)
    at IncomingMessage.emit (events.js:185:7)
From previous event:
    at Bottleneck.schedulePriority (/my-project/node_modules/bottleneck/lib/Bottleneck.js:248:14)
    at Object.<anonymous> (/my-project/node_modules/bottleneck/lib/Bottleneck.js:4:59)
    at Bottleneck.schedule (/my-project/node_modules/bottleneck/lib/Bottleneck.js:229:36)
    at Relay._scheduleSubmission (/my-project/node_modules/winston-aws-cloudwatch/lib/relay.js:98:21)
    at /my-project/node_modules/winston-aws-cloudwatch/lib/relay.js:119:23
    at process._tickDomainCallback (internal/process/next_tick.js:135:7)

If I got it right, I shouldn't have this problem with:

batchSize=1
submissionInterval=5000

since that would send one log every 5 seconds to CloudWatch.
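
For illustration, here's roughly how I'd expect to pass those options once they're configurable (just a sketch: the group/stream names are placeholders and the option names follow this thread rather than the published docs):

import winston from 'winston'
import CloudWatchTransport from 'winston-aws-cloudwatch'

const logger = new winston.Logger({
  transports: [
    new CloudWatchTransport({
      logGroupName: 'my-log-group',   // placeholder names
      logStreamName: 'my-log-stream',
      createLogGroup: true,
      createLogStream: true,
      batchSize: 1,                   // one event per putLogEvents call
      submissionInterval: 5000        // wait 5 seconds between submissions
    })
  ]
})

logger.info('testing the throttled CloudWatch transport')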

The only problem would be a giant queue holding the messages before they're sent, which would consume a lot of RAM, but it would still work.

Do you have any ideas or suggestions?

@brunogsa
Contributor Author

brunogsa commented Jul 27, 2017

Well, I fixed the problem.

I can open 2 PRs, if that's OK with you:

  • One to make batchSize and submissionInterval configurable
  • One to fix this issue

Just let me know if you're interested in that.

TL;DR:

Just pass this option to the CloudWatchTransport constructor to work around the issue:

maxSequenceTokenAge: 999999

Here's what I found:

Internally, you're calling the describeLogStreams method from the AWS SDK.
The AWS SDK (for JavaScript) docs say that this method has a limit of 5 transactions per second.

You can check it here:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/CloudWatchLogs.html#describeLogStreams-property

Since my solution has more than 6 servers and all of them use this module, together they were exceeding that limit. This explains the ThrottlingException.

In your code, you call describeLogStreams to retrieve the sequenceToken, which is required by the putLogEvents method that sends data to CloudWatch.

I saw that you're properly caching the sequenceToken (see the sketch after this list):

  • Initially, you call describeLogStreams and store the value
  • After that, you update this value using the response from putLogEvents
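
For context, the underlying AWS SDK flow looks roughly like this (a sketch with placeholder group/stream names, not the module's actual code):

import AWS from 'aws-sdk'

const cloudWatchLogs = new AWS.CloudWatchLogs()
const logGroupName = 'my-log-group'     // placeholder names
const logStreamName = 'my-log-stream'

async function sendOneEvent () {
  // 1. Fetch the current sequence token (describeLogStreams is limited to 5 TPS)
  const described = await cloudWatchLogs.describeLogStreams({
    logGroupName,
    logStreamNamePrefix: logStreamName
  }).promise()
  const stream = described.logStreams.find((s) => s.logStreamName === logStreamName)

  // 2. Send the events, passing the token
  const result = await cloudWatchLogs.putLogEvents({
    logGroupName,
    logStreamName,
    sequenceToken: stream.uploadSequenceToken,
    logEvents: [{timestamp: Date.now(), message: 'hello'}]
  }).promise()

  // 3. Cache result.nextSequenceToken and reuse it on the next putLogEvents call,
  //    so describeLogStreams doesn't have to be called every time
  return result.nextSequenceToken
}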

The problem was in the method you use to check whether the sequenceToken can be retrieved from the cache or whether it needs to be fetched again.

It seems like the implementation of that method always decides to fetch again, no matter what.
Here's a snippet of the code; read the XXX comments for details:

export default class CloudWatchClient {
  constructor (logGroupName, logStreamName, options) {
    debug('constructor', {logGroupName, logStreamName, options})
    this._logGroupName = logGroupName
    this._logStreamName = logStreamName
    this._options = defaults(options, {
      awsConfig: null,

      // XXX: This should be a positive value, I think. I used it as a work-around to fix my problem.
      maxSequenceTokenAge: -1,

      formatLog: null,
      formatLogItem: null,
      createLogGroup: false,
      createLogStream: false
    })
    this._formatter = new CloudWatchEventFormatter(this._options)
    this._sequenceTokenInfo = null
    this._client = new AWS.CloudWatchLogs(this._options.awsConfig)
    this._initializing = null
  }

  // ...

  // XXX: I think you could just check whether a sequenceToken is cached here
  _getSequenceToken () {
    const now = +new Date()

    // XXX: This is apparently always true (with the default maxSequenceTokenAge of -1),
    // so the cached token is never used
    const isStale = (!this._sequenceTokenInfo ||
      this._sequenceTokenInfo.date + this._options.maxSequenceTokenAge < now)

    return isStale ? this._fetchAndStoreSequenceToken()
      : Promise.resolve(this._sequenceTokenInfo.sequenceToken)
  }

  // ...
}

That's why passing maxSequenceTokenAge to CloudWatchTransport fixes the problem (as a work-around).
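
One way the staleness check could be adjusted (just a sketch of the idea, not necessarily what the PR below does): treat a non-positive maxSequenceTokenAge as "never stale", so the token is only fetched when nothing is cached yet.

_getSequenceToken () {
  const now = +new Date()
  const maxAge = this._options.maxSequenceTokenAge

  // Only refetch if nothing is cached yet, or if a positive max age has actually expired
  const isStale = !this._sequenceTokenInfo ||
    (maxAge > 0 && this._sequenceTokenInfo.date + maxAge < now)

  return isStale ? this._fetchAndStoreSequenceToken()
    : Promise.resolve(this._sequenceTokenInfo.sequenceToken)
}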

@brunogsa
Contributor Author

brunogsa commented Jul 28, 2017

PR to make batchSize and submissionInterval configurable: #70

@brunogsa
Contributor Author

I've also opened a PR that tries to fix this issue.
Could you take a look later?

#71

@timdp
Owner

timdp commented Jul 29, 2017

Thanks for the hard work! I haven't had time to look at the PRs yet but they're at the top of my list.

@deyhle

deyhle commented Apr 12, 2021

So is this issue fixed in the current version (3.0.0) or why is it still open?

I'm still getting frequent "Rate exceeded for logStreamName " errors. I have rather large logGroups with about 50+ logStreams each. Normally, 2-8 processes share a single logStream.
I'm setting submissionRetryCount to 3 instead of the default 1; the other settings are left at their defaults.
Does the lib handle the "Rate exceeded" errors and just resubmit (meaning I can safely ignore them), or should I set other options for my use case? Are there any best practices for submissionInterval, submissionRetryCount and batchSize? Some documentation on those options would be great.
