
ThrottingException - Rate limit exceeded #69

Open
brunogsa opened this issue Jul 26, 2017 · 8 comments

Comments
@brunogsa
Contributor

brunogsa commented Jul 26, 2017

Hey!
How are you?

Sometimes I get this error when the module tries to send a lot of logs to CloudWatch.
It comes as an "Unhandled Rejection", as we saw in #32, but let's not focus on that for now.

CloudWatch has limits on how much data we can send to it, as described here:
http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/cloudwatch_limits_cwl.html

I would like to know whether this module already works as well as it can, batching operations and respecting the restriction of 5 requests per second per log stream.

My concern is losing logs in this process.
If that happens, I just can't trust CloudWatch when debugging an issue.

Do you have any suggestions?
It seems that temporarily decreasing the log level doesn't ease the load on CloudWatch.

Thanks in advance!

@timdp
Owner

timdp commented Jul 26, 2017

Hi!

The Relay class that the code uses internally has an option for the interval between submissions. Currently, it's set to 2 seconds. There's no way to change it yet, but I can add that. However, it wouldn't change the behavior tremendously, since 2 seconds is already quite long. I think it's more about the log volumes you're sending, as you've already suggested. I've never hit the limits personally.

There should really be code that handles these limit errors, though. It could be as simple as allowing the error to happen here, which should trigger a retry. Of course, without logging a warning, it could just keep retrying indefinitely, meaning you'd still lose the logs.
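
Something along these lines, roughly (just a sketch of the retry-with-warning idea, not the library's actual code; submitWithRetry, submitBatch and the backoff numbers are made up here):

const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function submitWithRetry (submitBatch, maxRetries = 3) {
  for (let attempt = 0; attempt <= maxRetries; ++attempt) {
    try {
      return await submitBatch()
    } catch (err) {
      // Only retry on CloudWatch throttling; rethrow anything else
      if (err.code !== 'ThrottlingException' || attempt === maxRetries) throw err
      console.warn('CloudWatch throttled the request, retrying', {attempt})
      // Back off exponentially so we don't keep hammering the API
      await delay(1000 * Math.pow(2, attempt))
    }
  }
}

// e.g. submitWithRetry(() => client.putLogEvents(params).promise())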

@brunogsa brunogsa reopened this Jul 26, 2017
@brunogsa
Contributor Author

brunogsa commented Jul 26, 2017

Sorry, mis-clicked.

@brunogsa
Contributor Author

brunogsa commented Jul 26, 2017

Not sure what's wrong...

I tried changing batchSize and submissionInterval directly in my node_modules, but I'm still getting this:

Unhandled rejection ThrottlingException: Rate exceeded
    at Request.extractError (/my-project/node_modules/aws-sdk/lib/protocol/json.js:48:27)
    at Request.callListeners (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/request.js:678:14)
    at Request.transition (/my-project/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/my-project/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /my-project/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:680:12)
    at Request.callListeners (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:115:18)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/request.js:678:14)
    at Request.transition (/my-project/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/my-project/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /my-project/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:680:12)
    at Request.callListeners (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:115:18)
    at callNextListener (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:95:12)
    at IncomingMessage.onEnd (/my-project/node_modules/aws-sdk/lib/event_listeners.js:256:13)
    at emitNone (events.js:91:20)
    at IncomingMessage.emit (events.js:185:7)
From previous event:
    at Bottleneck.schedulePriority (/my-project/node_modules/bottleneck/lib/Bottleneck.js:248:14)
    at Object.<anonymous> (/my-project/node_modules/bottleneck/lib/Bottleneck.js:4:59)
    at Bottleneck.schedule (/my-project/node_modules/bottleneck/lib/Bottleneck.js:229:36)
    at Relay._scheduleSubmission (/my-project/node_modules/winston-aws-cloudwatch/lib/relay.js:98:21)
    at Relay.submit (/my-project/node_modules/winston-aws-cloudwatch/lib/relay.js:90:12)
    at CloudWatchTransport.log (/my-project/node_modules/winston-aws-cloudwatch/lib/index.js:63:19)
    at transportLog (/my-project/node_modules/winston/lib/winston/logger.js:234:15)
    at /my-project/node_modules/winston/node_modules/async/lib/async.js:157:13
    at _each (/my-project/node_modules/winston/node_modules/async/lib/async.js:57:9)
    at Object.async.each (/my-project/node_modules/winston/node_modules/async/lib/async.js:156:9)
    at Logger.log (/my-project/node_modules/winston/lib/winston/logger.js:246:9)
    at target.(anonymous function) [as info] (/my-project/node_modules/winston/lib/winston/common.js:54:18)
    at Server.<anonymous> (/my-project/lib/api.js:46:14)
    at Server.g (events.js:292:16)
    at emitNone (events.js:86:13)
    at Server.emit (events.js:185:7)
    at emitListeningNT (net.js:1284:10)
    at _combinedTickCallback (internal/process/next_tick.js:77:11)
    at process._tickDomainCallback (internal/process/next_tick.js:128:9)

Unhandled rejection ThrottlingException: Rate exceeded
    at Request.extractError (/my-project/node_modules/aws-sdk/lib/protocol/json.js:48:27)
    at Request.callListeners (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:105:20)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/request.js:678:14)
    at Request.transition (/my-project/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/my-project/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /my-project/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:680:12)
    at Request.callListeners (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:115:18)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:77:10)
    at Request.emit (/my-project/node_modules/aws-sdk/lib/request.js:678:14)
    at Request.transition (/my-project/node_modules/aws-sdk/lib/request.js:22:10)
    at AcceptorStateMachine.runTo (/my-project/node_modules/aws-sdk/lib/state_machine.js:14:12)
    at /my-project/node_modules/aws-sdk/lib/state_machine.js:26:10
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:38:9)
    at Request.<anonymous> (/my-project/node_modules/aws-sdk/lib/request.js:680:12)
    at Request.callListeners (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:115:18)
    at callNextListener (/my-project/node_modules/aws-sdk/lib/sequential_executor.js:95:12)
    at IncomingMessage.onEnd (/my-project/node_modules/aws-sdk/lib/event_listeners.js:256:13)
    at emitNone (events.js:91:20)
    at IncomingMessage.emit (events.js:185:7)
From previous event:
    at Bottleneck.schedulePriority (/my-project/node_modules/bottleneck/lib/Bottleneck.js:248:14)
    at Object.<anonymous> (/my-project/node_modules/bottleneck/lib/Bottleneck.js:4:59)
    at Bottleneck.schedule (/my-project/node_modules/bottleneck/lib/Bottleneck.js:229:36)
    at Relay._scheduleSubmission (/my-project/node_modules/winston-aws-cloudwatch/lib/relay.js:98:21)
    at /my-project/node_modules/winston-aws-cloudwatch/lib/relay.js:119:23
    at process._tickDomainCallback (internal/process/next_tick.js:135:7)

If I got it right, I shouldn't have this problem with:

batchSize=1
submissionInterval=5000

since that would send one log every 5 seconds to CloudWatch.
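
For illustration, here's roughly how I'd expect to pass those options once they're configurable (just a sketch: the group/stream names are placeholders and the option names follow this thread rather than the published docs):

import winston from 'winston'
import CloudWatchTransport from 'winston-aws-cloudwatch'

const logger = new winston.Logger({
  transports: [
    new CloudWatchTransport({
      logGroupName: 'my-log-group',   // placeholder names
      logStreamName: 'my-log-stream',
      createLogGroup: true,
      createLogStream: true,
      batchSize: 1,                   // one event per putLogEvents call
      submissionInterval: 5000        // wait 5 seconds between submissions
    })
  ]
})

logger.info('testing the throttled CloudWatch transport')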

The only problem would be a giant queue holding the messages before they're sent, which would consume a lot of RAM, but it would still work.

Do you have any ideas or suggestions?

@brunogsa
Contributor Author

brunogsa commented Jul 27, 2017

Well, I fixed the problem.

I can open 2 PRs, if that's OK with you:

  • One to make batchSize and submissionInterval configurable
  • One to fix this issue

Just let me know if you're interested in that.

TL;DR:

Just pass this option to the CloudWatchTransport constructor to work around the issue:

maxSequenceTokenAge: 999999

Here's what I found:

Internally, you're calling the describeLogStreams method from the AWS SDK.
The AWS SDK (for JavaScript) docs say that this method has a limit of 5 transactions per second.

You can check it here:
http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/CloudWatchLogs.html#describeLogStreams-property

Since my solution has more than 6 servers and all of them use this module, together they were exceeding that limit. This explains the ThrottlingException.

In your code, you call describeLogStreams to retrieve the sequenceToken, which is required by the putLogEvents method that sends data to CloudWatch.

I saw that you're properly caching the sequenceToken (see the sketch after this list):

  • Initially, you call describeLogStreams and store the value
  • After that, you update this value using the response from putLogEvents
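
For context, the underlying AWS SDK flow looks roughly like this (a sketch with placeholder group/stream names, not the module's actual code):

import AWS from 'aws-sdk'

const cloudWatchLogs = new AWS.CloudWatchLogs()
const logGroupName = 'my-log-group'     // placeholder names
const logStreamName = 'my-log-stream'

async function sendOneEvent () {
  // 1. Fetch the current sequence token (describeLogStreams is limited to 5 TPS)
  const described = await cloudWatchLogs.describeLogStreams({
    logGroupName,
    logStreamNamePrefix: logStreamName
  }).promise()
  const stream = described.logStreams.find((s) => s.logStreamName === logStreamName)

  // 2. Send the events, passing the token
  const result = await cloudWatchLogs.putLogEvents({
    logGroupName,
    logStreamName,
    sequenceToken: stream.uploadSequenceToken,
    logEvents: [{timestamp: Date.now(), message: 'hello'}]
  }).promise()

  // 3. Cache result.nextSequenceToken and reuse it on the next putLogEvents call,
  //    so describeLogStreams doesn't have to be called every time
  return result.nextSequenceToken
}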

The problem was in the method you use to check whether the sequenceToken can be retrieved from the cache or whether it needs to be fetched again.

It seems like the implementation of that method always decides to fetch again, no matter what.
Here's a snippet of the code; read the XXX comments for details:

export default class CloudWatchClient {
  constructor (logGroupName, logStreamName, options) {
    debug('constructor', {logGroupName, logStreamName, options})
    this._logGroupName = logGroupName
    this._logStreamName = logStreamName
    this._options = defaults(options, {
      awsConfig: null,

      // XXX: This should be a positive value, I think. I used it as a work-around to fix my problem.
      maxSequenceTokenAge: -1,

      formatLog: null,
      formatLogItem: null,
      createLogGroup: false,
      createLogStream: false
    })
    this._formatter = new CloudWatchEventFormatter(this._options)
    this._sequenceTokenInfo = null
    this._client = new AWS.CloudWatchLogs(this._options.awsConfig)
    this._initializing = null
  }

  // ...

  // XXX: I think you could just check whether a sequenceToken is cached here
  _getSequenceToken () {
    const now = +new Date()

    // XXX: This is apparently always true (with the default maxSequenceTokenAge of -1),
    // so the cached token is never used
    const isStale = (!this._sequenceTokenInfo ||
      this._sequenceTokenInfo.date + this._options.maxSequenceTokenAge < now)

    return isStale ? this._fetchAndStoreSequenceToken()
      : Promise.resolve(this._sequenceTokenInfo.sequenceToken)
  }

  // ...
}

That's why passing maxSequenceTokenAge to CloudWatchTransport fixes the problem (as a work-around).
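
One way the staleness check could be adjusted (just a sketch of the idea, not necessarily what the PR below does): treat a non-positive maxSequenceTokenAge as "never stale", so the token is only fetched when nothing is cached yet.

_getSequenceToken () {
  const now = +new Date()
  const maxAge = this._options.maxSequenceTokenAge

  // Only refetch if nothing is cached yet, or if a positive max age has actually expired
  const isStale = !this._sequenceTokenInfo ||
    (maxAge > 0 && this._sequenceTokenInfo.date + maxAge < now)

  return isStale ? this._fetchAndStoreSequenceToken()
    : Promise.resolve(this._sequenceTokenInfo.sequenceToken)
}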

@brunogsa
Contributor Author

brunogsa commented Jul 28, 2017

PR to make batchSize and submissionInterval configurable: #70

@brunogsa
Contributor Author

I've also opened a PR that tries to fix this issue.
Could you take a look later?

#71

@timdp
Owner

timdp commented Jul 29, 2017

Thanks for the hard work! I haven't had time to look at the PRs yet but they're at the top of my list.

@deyhle

deyhle commented Apr 12, 2021

So is this issue fixed in the current version (3.0.0) or why is it still open?

I'm still getting frequent "Rate exceeded for logStreamName " errors. I have rather large logGroups with about 50+ logStreams each. Normally, 2-8 processes share a single logStream.
I'm setting submissionRetryCount to 3 instead of the default 1; the other settings are left at their defaults.
Does the lib handle the "Rate exceeded" errors and just resubmit (meaning I can safely ignore them), or should I set other options for my use case? Are there any best practices for submissionInterval, submissionRetryCount and batchSize? Some documentation on those options would be great.
