Enhancements: statistical tracking of success #19

Open
jamiemccarthy opened this issue May 29, 2022 · 1 comment

@jamiemccarthy

Background

Hello! I and two of my (now-former) colleagues at Vox Media, @stephenmckinney and @thomsbg, have added some functionality to breakers which we would like to contribute back to this project. Before submitting a PR, I wanted to check with you whether this work would be considered valuable.

Because Vox Media wanted to use breakers on a high-traffic connection between two of our applications, we wanted to avoid excess traffic to redis, and to avoid declaring an outage for minor glitches. This allowed us to optimize the 99.9% case where both applications are functioning correctly.

We have been running our custom version for about four years now, and it runs so reliably that we are barely even aware of it anymore!

Current vs. desired behavior

The current behavior is:

  • Each successful Faraday request sends an INCR to redis, to track the exact number of successes.
  • The middleware sends a ZRANGE to redis prior to every request, to check whether there is an outage.
  • If traffic is low, a single failure can trigger an outage.
  • When a plugin is notified about an error due to an exception, that exception is not passed along.

The desired behavior (which we've implemented) is to add three configurable parameters p, s, and e:

  • An INCR is sent for only a fraction p of successful requests, but each one increments the stored value by 1/p. So if p == 0.05, the client sends an increment of 20, but it sends it at random for 1 out of every 20 successful requests on average, cutting write traffic by a factor of 20 while keeping the expected count the same.
  • A ZRANGE check is only made once every s seconds, in any given ruby process. This is most useful when a client can make multiple Faraday requests in rapid succession.
  • A minimum number of errors e must be observed before an outage is reported.
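
To make the sampling arithmetic concrete, here is a minimal sketch of the idea rather than our actual patch. FakeCounterStore and SampledSuccessCounter are hypothetical names that do not exist in breakers, and the in-memory store only stands in for redis (where the command that takes an amount is INCRBY).

```ruby
# Hypothetical in-memory stand-in for the redis counter; not part of breakers.
class FakeCounterStore
  attr_reader :value, :writes

  def initialize
    @value = 0
    @writes = 0
  end

  # Same shape as redis INCRBY: add `amount` to the stored value.
  def incrby(amount)
    @writes += 1
    @value += amount
  end
end

# Hypothetical helper: writes with probability `sample_rate` and scales each
# write by 1 / sample_rate, so the expected stored total is unchanged.
class SampledSuccessCounter
  def initialize(store, sample_rate:, rng: Random.new)
    unless sample_rate > 0 && sample_rate <= 1
      raise ArgumentError, "sample_rate must be in (0, 1]"
    end

    @store = store
    @sample_rate = sample_rate
    @rng = rng
  end

  def record_success
    # rand returns a float in [0, 1), so a sample_rate of 1.0 always writes.
    return unless @rng.rand < @sample_rate

    @store.incrby((1 / @sample_rate).round)
  end
end

store = FakeCounterStore.new
counter = SampledSuccessCounter.new(store, sample_rate: 0.05, rng: Random.new(42))
100_000.times { counter.record_success }
puts "writes: #{store.writes}, estimated successes: #{store.value}"
```

The stored value is an estimate rather than an exact count, which is the trade-off being proposed: roughly 1/p fewer writes in exchange for some sampling noise.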

Leaving these options at the default values of 1.0, 0, and 1 respectively preserves the existing behavior. (Vox Media happens to be running them at 0.1, 10, and 100.)
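
As a rough illustration of how s and e behave, including why the defaults of 0 and 1 leave the current behavior untouched, here is a sketch under stated assumptions. ThrottledOutageCheck, outage_warranted?, and the error_rate_threshold parameter are hypothetical names for this example only, not breakers' API, and the block passed to the check stands in for the ZRANGE lookup.

```ruby
# Hypothetical per-process cache around the outage lookup; not breakers' API.
class ThrottledOutageCheck
  def initialize(seconds_between_checks:,
                 clock: -> { Process.clock_gettime(Process::CLOCK_MONOTONIC) },
                 &lookup)
    @interval = seconds_between_checks
    @clock = clock
    @lookup = lookup # stands in for the ZRANGE call to redis
    @last_checked_at = nil
    @cached = nil
  end

  # Re-runs the lookup only when at least `seconds_between_checks` have
  # elapsed; an interval of 0 therefore looks up on every call.
  def outage?
    now = @clock.call
    if @last_checked_at.nil? || now - @last_checked_at >= @interval
      @cached = @lookup.call
      @last_checked_at = now
    end
    @cached
  end
end

# Hypothetical decision rule: require `minimum_errors` before the error rate
# is even considered. The 0.5 rate threshold is an assumption for the example.
def outage_warranted?(errors:, successes:, minimum_errors:, error_rate_threshold: 0.5)
  return false if errors < minimum_errors

  total = errors + successes
  return false if total.zero?

  errors.fdiv(total) >= error_rate_threshold
end

fake_time = 0.0
lookups = 0
check = ThrottledOutageCheck.new(seconds_between_checks: 10, clock: -> { fake_time }) do
  lookups += 1
  false
end
3.times { check.outage? } # only the first call performs a lookup
fake_time = 10.0
check.outage?             # the interval has elapsed, so this looks up again
puts "lookups: #{lookups}"
```

With minimum_errors at 1, a single failure on an otherwise idle connection still satisfies the rule, which matches the existing low-traffic behavior; raising it to 100 suppresses that case.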

Additionally,

  • The exception that triggered an error, if any, is passed to plugin.on_error.

This has helped us understand the nature of the connection difficulties between our applications.
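
For a sense of what a plugin could do with the extra argument, here is a small sketch. ErrorClassTally is a hypothetical plugin, and the on_error signature shown, with a trailing optional exception, is an assumption based on the proposal in this issue rather than the released breakers API.

```ruby
require "timeout"

# Hypothetical plugin that tallies errors by exception class.
# The optional fourth argument is the proposed addition, not existing API.
class ErrorClassTally
  attr_reader :counts

  def initialize
    @counts = Hash.new(0)
  end

  def on_error(_service, _request_env, _response_env, exception = nil)
    # Errors reported without an exception (for example, a bad HTTP status)
    # are grouped under a single placeholder key.
    key = exception ? exception.class.name : "http_status"
    @counts[key] += 1
  end
end

plugin = ErrorClassTally.new

begin
  raise Timeout::Error, "execution expired"
rescue Timeout::Error => e
  plugin.on_error("other-app", {}, nil, e)
end

plugin.on_error("other-app", {}, { status: 503 })
p plugin.counts
```

Defaulting the new argument to nil would let plugins written against the three-argument form keep working for errors that carry no exception.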

Usefulness

I believe that these configuration options would be useful to most users of breakers, since they allow the middleware to scale more efficiently as a project's network traffic increases, and passing the exception along makes failures quicker to debug.

Authorship and rights

I've obtained approval from the legal department at Vox Media to contribute this work into the public domain, per this project's license and the required legal notice. I have also gotten approval from my co-contributors @stephenmckinney and @thomsbg.

Let us know!

Please reply to this issue to let us know whether this is a contribution that would be considered helpful, and any thoughts you may have. In particular, if you would find only 1 or 2 of these features appropriate, we don't have to submit all four, or we can separate them into multiple PRs.

Thank you!

@jamiemccarthy
Author

Good morning! It's been about a month since I submitted this issue, and since I haven't heard any feedback, I'd like to go ahead and submit our work. It may be easier to just see the code. I plan to submit one PR for the first three behaviors described above, because they are related, and a second, smaller one for the fourth behavior. Feedback is still welcome, but if I haven't heard back, I'll submit them next week.

cc @stephenmckinney @thomsbg
