This repository has been archived by the owner on Mar 27, 2021. It is now read-only.
This is dependent on #703. Please see that issue for the motivation and context.
Ideally, we'd reproduce one of Heroic's metrics lag episodes with varying batch sizes, in a binary search fashion. We'd measure how good each setting is by seeing how much it helped Heroic during its time of need.
However, back in the Real World ™️:
- We cannot replicate the lag episode.
- We have already deployed changes that target mitigating the metric lag episodes (ILM was reverted), so we would not be comparing apples to apples.
- We do not have a staging environment that is a realistic copy of production.
Hence my proposal is to simulate random BigTable write (aka Mutation) time-outs at varying frequencies, e.g. 0.01, 0.1, 1.0, 3.0, 10.0, and 25.0 %. We could use the WireMock stubbing library (http://wiremock.org/) to stub out the BigTable API calls cleanly. Note: we should try to get from Google the actual percentage of Mutation API requests that failed during an episode. @malish8632 - how do I do that, any ideas?
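WireMock would do this at the HTTP layer; just to make the idea concrete, here is a minimal Python sketch of the same fault-injection behaviour. Everything here (`FlakyBigtableStub`, `MutationTimeout`, the seed) is a hypothetical stand-in, not Heroic or WireMock code — it only shows "fail a configurable fraction of Mutation calls":

```python
import random


class MutationTimeout(Exception):
    """Simulated BigTable Mutation time-out (hypothetical stand-in)."""


class FlakyBigtableStub:
    """Stand-in for the BigTable Mutation API that times out at a fixed rate.

    `failure_rate` is the fraction of calls that fail (0.0-1.0), mirroring
    the proposed 0.01 %-25 % sweep. A seeded RNG keeps runs reproducible.
    """

    def __init__(self, failure_rate: float, seed: int = 42):
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)
        self.calls = 0
        self.failures = 0

    def mutate_rows(self, batch):
        """Simulate one Mutation request; raise on a simulated time-out."""
        self.calls += 1
        if self.rng.random() < self.failure_rate:
            self.failures += 1
            raise MutationTimeout(f"batch of {len(batch)} rows timed out")
        return len(batch)
```

With WireMock itself, the equivalent would be a stub that returns a fault or a long fixed delay for a percentage of matched requests.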
Then we see which batch size performed best overall.
Finally, we set `DEFAULT_BATCH_SIZE` to the winning value and deploy to production.
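The steps above can be sketched as a toy sweep. This is a back-of-envelope cost model under loud assumptions — the per-request overhead, the time-out penalty, and whole-batch retries are all illustrative numbers, not measured Heroic figures — but it shows the shape of the experiment: larger batches mean fewer requests, yet each time-out wastes a bigger retry.

```python
import random


def simulate(total_points, batch_size, failure_rate, timeout_cost=10.0, rng=None):
    """Return the total cost of writing `total_points` in batches.

    Cost model (illustrative assumptions): each request costs 1 unit of
    overhead; a timed-out request additionally costs `timeout_cost` units
    and the whole batch must be retried.
    """
    rng = rng or random.Random(0)
    cost = 0.0
    remaining = total_points
    while remaining > 0:
        n = min(batch_size, remaining)
        cost += 1.0  # fixed per-request overhead
        if rng.random() < failure_rate:
            cost += timeout_cost  # simulated Mutation time-out; retry batch
        else:
            remaining -= n  # batch landed successfully
    return cost


def best_batch_size(candidates, failure_rate, total_points=100_000):
    """Sweep candidate batch sizes and return (winner, cost-per-candidate)."""
    costs = {b: simulate(total_points, b, failure_rate) for b in candidates}
    return min(costs, key=costs.get), costs
```

In the real experiment the "cost" would be a measured signal (write latency, consumer lag) from the WireMock-backed run, and the sweep over failure rates would tell us whether one `DEFAULT_BATCH_SIZE` wins across all of them or the optimum shifts as time-outs get more frequent.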
Hello @malish8632 , @hexedpackets , @lmuhlha , what do you think of the proposal above? Crap? Genius? Meh?
Is there a cleaner/easier way of determining the best batch write size?
Is there something significant I've not considered?
Cheers!