This repository has been archived by the owner on Mar 27, 2021. It is now read-only.
This is dependent on #703. Please see that issue for the motivation and context.
Ideally, we'd reproduce one of Heroic's metrics lag episodes with varying batch sizes, in a binary search fashion. We'd measure how good each setting is by seeing how much it helped Heroic during its time of need.
However, back in the Real World ™️:
- We cannot replicate the lag episode.
- We have already deployed changes that target mitigating the metric lag episodes (ILM was reverted), so we would not be comparing apples to apples.
- We do not have a staging environment that is a realistic copy of production.
Hence my proposal is to simulate random BigTable write (aka Mutation) time-outs at varying frequencies, e.g. 0.01, 0.1, 1.0, 3.0, 10.0, and 25.0 %. We could use the WireMock stubbing library (http://wiremock.org/) to stub out the BigTable API calls cleanly. Note: we should try to get from Google the actual percentage of Mutation API requests that failed during an episode. @malish8632 - how do I do that, any ideas?
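WireMock would do this at the HTTP layer; just to make the idea concrete, here is a minimal Python sketch of the same fault-injection behaviour. Everything here (`FlakyBigtableStub`, `MutationTimeout`, the seed) is a hypothetical stand-in, not Heroic or WireMock code — it only shows "fail a configurable fraction of Mutation calls":

```python
import random


class MutationTimeout(Exception):
    """Simulated BigTable Mutation time-out (hypothetical stand-in)."""


class FlakyBigtableStub:
    """Stand-in for the BigTable Mutation API that times out at a fixed rate.

    `failure_rate` is the fraction of calls that fail (0.0-1.0), mirroring
    the proposed 0.01 %-25 % sweep. A seeded RNG keeps runs reproducible.
    """

    def __init__(self, failure_rate: float, seed: int = 42):
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)
        self.calls = 0
        self.failures = 0

    def mutate_rows(self, batch):
        """Simulate one Mutation request; raise on a simulated time-out."""
        self.calls += 1
        if self.rng.random() < self.failure_rate:
            self.failures += 1
            raise MutationTimeout(f"batch of {len(batch)} rows timed out")
        return len(batch)
```

With WireMock itself, the equivalent would be a stub that returns a fault or a long fixed delay for a percentage of matched requests.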
Then we see which batch size performed best overall.
Finally, we set `DEFAULT_BATCH_SIZE` to the winning value and deploy to production.
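The steps above can be sketched as a toy sweep. This is a back-of-envelope cost model under loud assumptions — the per-request overhead, the time-out penalty, and whole-batch retries are all illustrative numbers, not measured Heroic figures — but it shows the shape of the experiment: larger batches mean fewer requests, yet each time-out wastes a bigger retry.

```python
import random


def simulate(total_points, batch_size, failure_rate, timeout_cost=10.0, rng=None):
    """Return the total cost of writing `total_points` in batches.

    Cost model (illustrative assumptions): each request costs 1 unit of
    overhead; a timed-out request additionally costs `timeout_cost` units
    and the whole batch must be retried.
    """
    rng = rng or random.Random(0)
    cost = 0.0
    remaining = total_points
    while remaining > 0:
        n = min(batch_size, remaining)
        cost += 1.0  # fixed per-request overhead
        if rng.random() < failure_rate:
            cost += timeout_cost  # simulated Mutation time-out; retry batch
        else:
            remaining -= n  # batch landed successfully
    return cost


def best_batch_size(candidates, failure_rate, total_points=100_000):
    """Sweep candidate batch sizes and return (winner, cost-per-candidate)."""
    costs = {b: simulate(total_points, b, failure_rate) for b in candidates}
    return min(costs, key=costs.get), costs
```

In the real experiment the "cost" would be a measured signal (write latency, consumer lag) from the WireMock-backed run, and the sweep over failure rates would tell us whether one `DEFAULT_BATCH_SIZE` wins across all of them or the optimum shifts as time-outs get more frequent.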
Hello @malish8632 , @hexedpackets , @lmuhlha , what do you think of the proposal above? Crap? Genius? Meh?
Is there a cleaner/easier way of determining the best batch write size?
Is there something significant I've not considered?
Cheers!