When the source experiences high load, Ghostferry's binlog streaming can sometimes fail to keep up and lag behind. This is unacceptable, as binlog streamer lag must be zero during cutover.
Thinking carefully about how Ghostferry works shows why lag may occur. Ghostferry streams the binlog via the BinlogStreamer component. Once a binlog event is read, it is put into a channel with a capacity of BinlogEventBatchSize (set in the config). This channel is read by the BinlogWriter, a separate goroutine, which follows this pseudocode:
1. An event is read from the channel and appended to a buffer.
2. If there are no more events to read, or if the buffer reaches BinlogEventBatchSize, the buffer is written to the target.
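To make the description concrete, here is a minimal Go sketch of such a consume loop. This is not the actual Ghostferry code; the names (binlogEvent, writerLoop, writeBatch) are illustrative only.

```go
package main

import "fmt"

// binlogEvent stands in for Ghostferry's replication event type; the real
// type and field names differ.
type binlogEvent struct{ pos int }

// writerLoop sketches the BinlogWriter loop described above (not the actual
// Ghostferry implementation). eventChan has capacity BinlogEventBatchSize
// and is filled by the BinlogStreamer; writeBatch stands in for the write
// to the target MySQL.
func writerLoop(eventChan <-chan *binlogEvent, batchSize int, writeBatch func([]*binlogEvent)) {
	batch := make([]*binlogEvent, 0, batchSize)
	for ev := range eventChan {
		// Step 1: read an event from the channel and append it to the buffer.
		batch = append(batch, ev)

		// Step 2: write to the target when the buffer reaches the batch size
		// or when there are no more events waiting in the channel.
		if len(batch) >= batchSize || len(eventChan) == 0 {
			writeBatch(batch) // while this runs, the channel is not being drained
			batch = batch[:0]
		}
	}
	if len(batch) > 0 {
		writeBatch(batch)
	}
}

func main() {
	const batchSize = 4
	// The channel's capacity is also BinlogEventBatchSize.
	events := make(chan *binlogEvent, batchSize)
	go func() {
		for i := 0; i < 10; i++ {
			events <- &binlogEvent{pos: i} // blocks here once the channel is full
		}
		close(events)
	}()
	writerLoop(events, batchSize, func(b []*binlogEvent) { fmt.Println("writing", len(b), "events") })
}
```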
Note that during step 2 (writing to MySQL), the BinlogWriter is not draining the channel, while the BinlogStreamer continues to fill it. If writing to MySQL takes a long time, the channel can fill up, and once it is full the BinlogStreamer blocks on its send, since the channel is effectively the only buffer between the two. In a recent move where we had lag, I took a pprof profile for 5 seconds and graphed the synchronization blocking profile, which showed that the BinlogStreamer was blocked for 4.84s.
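As an aside, this kind of blocking data can be collected from any Go process that enables block profiling and exposes net/http/pprof. The snippet below is a generic sketch, not Ghostferry's actual setup; the address and sampling rate are arbitrary choices.

```go
package main

import (
	"log"
	"net/http"
	_ "net/http/pprof" // registers the /debug/pprof/* handlers
	"runtime"
)

func main() {
	// Sample every blocking event; a production process would likely use a
	// coarser rate than 1.
	runtime.SetBlockProfileRate(1)

	// Expose the pprof endpoints; the address is arbitrary for this sketch.
	log.Fatal(http.ListenAndServe("localhost:6060", nil))
}
```

The blocking data can then be pulled and graphed with `go tool pprof http://localhost:6060/debug/pprof/block`.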
Math
There is a possible solution to this: if the channel buffer is larger, the write in step 2 may finish before the channel fills up, so the BinlogStreamer never blocks. We can analyze this condition mathematically:
1. Let the time taken by a write be denoted T.
2. Let the BinlogEventBatchSize be denoted B.
3. Let the rate at which binlog events are generated on the source be R.
4. The "optimal" batch size is then B = T * R (or B = TR).
5. The write time T is a function of B, since larger batches take longer to write.
6. We can assume a simple linear model, T = mB + o, where m is the (marginal) time needed to write a single binlog event on the target and o is the overhead of a single write operation (think network round trip and other overhead).
Combining equations 4 and 6:
B = TR
T = mB + o
B = (mB + o)R
B = mBR + oR
B - mRB = oR
(1 - mR)B = oR
B = oR / (1 - mR)
So the optimal batch size is B = oR / (1 - mR). Unfortunately, we don't have values for any of these terms. Perhaps in the future Ghostferry can measure them.
However, if MySQL is too slow, Ghostferry cannot keep up at any batch size. The equations show this as well: if mR >= 1, then B would be negative (or undefined), which is non-physical. This is clearer with an example. Recall that m is the time required to write one event; suppose m = 0.01 seconds/event, so the target can write at most 1/m = 100 events/second. Recall that R is the number of events generated per second on the source; suppose R = 200 events/second. This obviously cannot work, because events are generated twice as fast as the target can write them. Here mR = 2, which means the optimal batch size doesn't exist.
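To make the arithmetic concrete, here is a small helper (not part of Ghostferry; the o values in the examples are made up) that evaluates the formula and flags the case where no batch size can work:

```go
package main

import (
	"errors"
	"fmt"
)

// optimalBatchSize evaluates B = oR / (1 - mR) from the derivation above.
// m is the seconds needed to write one event on the target, o is the
// per-write overhead in seconds, and r is the source's event generation
// rate in events/second. It reports an error when mR >= 1, i.e. when the
// target cannot keep up at any batch size.
func optimalBatchSize(m, o, r float64) (float64, error) {
	if m*r >= 1 {
		return 0, errors.New("target too slow: m*R >= 1, no batch size can keep up")
	}
	return o * r / (1 - m*r), nil
}

func main() {
	// The worked example from the text: m = 0.01 s/event, R = 200 events/s.
	// The overhead o = 0.05 s is a made-up value; it doesn't matter here.
	if _, err := optimalBatchSize(0.01, 0.05, 200); err != nil {
		fmt.Println(err)
	}

	// A feasible case: m = 0.001 s/event, o = 0.05 s/write, R = 200 events/s.
	b, _ := optimalBatchSize(0.001, 0.05, 200)
	fmt.Printf("optimal batch size = %.1f events\n", b) // 0.05*200/(1-0.2) = 12.5
}
```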
Recommendations
For cases where you see binlog streamer lag, try increasing the batch size. If that doesn't help, the writes to the target MySQL need to be made faster, since no batch size can compensate once mR >= 1.
In the future, Ghostferry should measure m, R, and o to automatically determine the "optimal" batch size, and possibly detect cases where it is not possible to keep up.
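As a sketch of how that could work, assuming Ghostferry recorded (batch size, write duration) samples and an observed event rate, an ordinary least-squares fit of T = mB + o recovers m and o, after which the formula above either gives B or reports that keeping up is impossible. Everything below (sample values, rate, names) is hypothetical.

```go
package main

import "fmt"

// fitLinear does an ordinary least-squares fit of t = m*b + o over observed
// (batch size, write duration) samples. This is only a sketch of how
// Ghostferry could estimate m and o; collecting the samples is not shown.
func fitLinear(batchSizes, durations []float64) (m, o float64) {
	n := float64(len(batchSizes))
	var sumB, sumT, sumBT, sumBB float64
	for i := range batchSizes {
		sumB += batchSizes[i]
		sumT += durations[i]
		sumBT += batchSizes[i] * durations[i]
		sumBB += batchSizes[i] * batchSizes[i]
	}
	m = (n*sumBT - sumB*sumT) / (n*sumBB - sumB*sumB)
	o = (sumT - m*sumB) / n
	return m, o
}

func main() {
	// Hypothetical samples, generated from T = 0.001*B + 0.05 for illustration.
	b := []float64{10, 50, 100, 200, 400}
	t := []float64{0.06, 0.10, 0.15, 0.25, 0.45}
	m, o := fitLinear(b, t)

	r := 200.0 // events/second observed on the source (also hypothetical)
	if m*r < 1 {
		fmt.Printf("m=%.4f o=%.3f => optimal batch size = %.1f\n", m, o, o*r/(1-m*r))
	} else {
		fmt.Println("cannot keep up: m*R >= 1")
	}
}
```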