
Support dynamic repl applier #282

Merged: 3 commits from mtocker-implement-dynamic-applier-sizing into main on Apr 2, 2024

Conversation

@morgo (Collaborator) commented on Apr 2, 2024

A Pull Request should be associated with an Issue.

Fixes #281

I disabled the key-above-watermark optimization to test it:

go run main.go --database finch --table customers --target-chunk-time=100ms
INFO[0000] Starting spirit migration: concurrency=4 target-chunk-size=100ms table=finch.customers alter="engine=innodb"
INFO[0000] unable to use INPLACE: ALTER either does not support INPLACE or when performed as INPLACE could take considerable time. Use --force-inplace to override this safety check
INFO[0000] create BinlogSyncer with config {ServerID:1643 Flavor:mysql Host:127.0.0.1 Port:3306 User:msandbox Password: Localhost: Charset:utf8 SemiSyncEnabled:false RawModeEnabled:false TLSConfig:<nil> ParseTime:false TimestampStringLocation:UTC UseDecimal:false RecvBufferSize:0 HeartbeatPeriod:0s ReadTimeout:0s MaxReconnectAttempts:0 DisableRetrySync:false VerifyChecksum:false DumpCommandFlag:0 Option:<nil> Logger:0x14000312990 Dialer:0x1025ca960 RowsEventDecodeFunc:0x1027d7010 TableMapOptionalMetaDecodeFunc:<nil> DiscardGTIDSet:false EventCacheCount:10240}
WARN[0000] resuming from checkpoint. low-watermark: {"Key":["id"],"ChunkSize":21568,"LowerBound":{"Value": ["56844555"],"Inclusive":true},"UpperBound":{"Value": ["56866123"],"Inclusive":false}} log-file: binlog.000221 log-pos: 31151408 copy-rows: 56866069
INFO[0000] skip dump, use last binlog replication pos (binlog.000221, 31151408) or GTID set <nil>
INFO[0000] begin to sync binlog from position (binlog.000221, 31151408)
INFO[0000] start sync binlog at binlog file (binlog.000221, 31151408)
INFO[0030] migration status: state=copyRows copy-progress=66589446/80656478 82.56% binlog-deltas=173886 total-time=30s copier-time=30s copier-remaining-time=TBD copier-is-throttled=false conns-in-use=4
INFO[0033] finished periodic flush of binary log: total-duration=3.600518417s batch-size=684
INFO[0050] checkpoint: low-watermark={"Key":["id"],"ChunkSize":2994,"LowerBound":{"Value": ["70114443"],"Inclusive":true},"UpperBound":{"Value": ["70117437"],"Inclusive":false}} log-file=binlog.000223 log-pos=278165976 rows-copied=66823574 rows-copied-logical=70140004
INFO[0060] migration status: state=copyRows copy-progress=71713054/80656478 88.91% binlog-deltas=163991 total-time=1m0s copier-time=1m0s copier-remaining-time=56s copier-is-throttled=false conns-in-use=4
INFO[0063] finished periodic flush of binary log: total-duration=3.443187041s batch-size=2209
INFO[0090] migration status: state=copyRows copy-progress=76778090/80656478 95.19% binlog-deltas=162733 total-time=1m30s copier-time=1m30s copier-remaining-time=19s copier-is-throttled=false conns-in-use=4
INFO[0092] finished periodic flush of binary log: total-duration=2.290715542s batch-size=2739
INFO[0100] checkpoint: low-watermark={"Key":["id"],"ChunkSize":2362,"LowerBound":{"Value": ["78527913"],"Inclusive":true},"UpperBound":{"Value": ["78530275"],"Inclusive":false}} log-file=binlog.000225 log-pos=337667856 rows-copied=75236396 rows-copied-logical=78552842
INFO[0106] approaching the end of the table, synchronously updating statistics
INFO[0106] copy rows complete
INFO[0106] starting to flush changeset
INFO[0109] blocking until we have read all binary logs: current-pos=(binlog.000225, 1029254347) target-pos=(binlog.000225, 1029202410)
INFO[0110] blocking until we have read all binary logs: current-pos=(binlog.000225, 1034846849) target-pos=(binlog.000225, 1034831416)
INFO[0110] Running ANALYZE TABLE
INFO[0110] starting to flush changeset
INFO[0110] blocking until we have read all binary logs: current-pos=(binlog.000225, 1035296997) target-pos=(binlog.000225, 1035248102)
INFO[0110] starting checksum operation, this will require a table lock
WARN[0110] trying to acquire table lock, timeout: 30
WARN[0110] table lock acquired
WARN[0110] table lock released
INFO[0110] table unlocked, starting checksum
INFO[0120] migration status: state=checksum checksum-progress=10028571/81163794 binlog-deltas=66304 total-time=2m0s checksum-time=10s conns-in-use=4
INFO[0120] finished periodic flush of binary log: total-duration=586.939083ms batch-size=3584
INFO[0141] finished periodic flush of binary log: total-duration=1.059294292s batch-size=3116
INFO[0150] checkpoint: low-watermark={"Key":["id"],"ChunkSize":32965,"LowerBound":{"Value": ["81135109"],"Inclusive":true},"UpperBound":{"Value": ["81168074"],"Inclusive":false}} log-file=binlog.000226 log-pos=78060617 rows-copied=76354568 rows-copied-logical=81223606
INFO[0150] migration status: state=checksum checksum-progress=43509467/81163794 binlog-deltas=79482 total-time=2m30s checksum-time=40s conns-in-use=4
INFO[0150] finished periodic flush of binary log: total-duration=685.09925ms batch-size=3457
INFO[0171] finished periodic flush of binary log: total-duration=1.299912542s batch-size=3171
INFO[0180] migration status: state=checksum checksum-progress=80563765/81163794 binlog-deltas=70261 total-time=3m0s checksum-time=1m10s conns-in-use=4
INFO[0180] approaching the end of the table, synchronously updating statistics
INFO[0180] finished periodic flush of binary log: total-duration=663.692625ms batch-size=3089
INFO[0180] checksum passed
INFO[0180] starting to flush changeset
INFO[0180] blocking until we have read all binary logs: current-pos=(binlog.000226, 324443372) target-pos=(binlog.000226, 324406138)
INFO[0180] starting to flush changeset
INFO[0180] blocking until we have read all binary logs: current-pos=(binlog.000226, 324558825) target-pos=(binlog.000226, 324548969)
WARN[0180] Attempting final cut over operation (attempt 1/5)
WARN[0180] trying to acquire table lock, timeout: 30
WARN[0180] table lock acquired
WARN[0180] table lock released
WARN[0180] final cut over operation complete
INFO[0189] successfully dropped old table
INFO[0189] apply complete: instant-ddl=false inplace-ddl=false total-chunks=5172 copy-rows-time=1m47s checksum-time=1m11s total-time=3m9s conns-in-use=0
INFO[0189] closing canal
INFO[0189] syncer is closing...
INFO[0189] kill last connection id 512
INFO[0189] syncer is closed

If the target-chunk-time is changed to 2s:

go run main.go --database finch --table customers --target-chunk-time=2s
INFO[0000] Starting spirit migration: concurrency=4 target-chunk-size=2s table=finch.customers alter="engine=innodb"
INFO[0000] unable to use INPLACE: ALTER either does not support INPLACE or when performed as INPLACE could take considerable time. Use --force-inplace to override this safety check
INFO[0000] could not resume from checkpoint: reason=could not read from table '_customers_new'
INFO[0000] create BinlogSyncer with config {ServerID:1057 Flavor:mysql Host:127.0.0.1 Port:3306 User:msandbox Password: Localhost: Charset:utf8 SemiSyncEnabled:false RawModeEnabled:false TLSConfig:<nil> ParseTime:false TimestampStringLocation:UTC UseDecimal:false RecvBufferSize:0 HeartbeatPeriod:0s ReadTimeout:0s MaxReconnectAttempts:0 DisableRetrySync:false VerifyChecksum:false DumpCommandFlag:0 Option:<nil> Logger:0x14000316720 Dialer:0x100972960 RowsEventDecodeFunc:0x100b7f010 TableMapOptionalMetaDecodeFunc:<nil> DiscardGTIDSet:false EventCacheCount:10240}
INFO[0000] skip dump, use last binlog replication pos (binlog.000227, 225862138) or GTID set <nil>
INFO[0000] begin to sync binlog from position (binlog.000227, 225862138)
INFO[0000] start sync binlog at binlog file (binlog.000227, 225862138)
INFO[0030] migration status: state=copyRows copy-progress=7304258/83636461 8.73% binlog-deltas=132706 total-time=30s copier-time=30s copier-remaining-time=TBD copier-is-throttled=false conns-in-use=4
INFO[0033] finished periodic flush of binary log: total-duration=3.88942775s batch-size=16986
INFO[0050] checkpoint: low-watermark={"Key":["id"],"ChunkSize":100000,"LowerBound":{"Value": ["11852255"],"Inclusive":true},"UpperBound":{"Value": ["11952255"],"Inclusive":false}} log-file=binlog.000228 log-pos=627919167 rows-copied=11952242 rows-copied-logical=11953254
INFO[0060] migration status: state=copyRows copy-progress=14253254/83636461 17.04% binlog-deltas=137417 total-time=1m0s copier-time=1m0s copier-remaining-time=5m1s copier-is-throttled=false conns-in-use=4
INFO[0062] finished periodic flush of binary log: total-duration=2.647773334s batch-size=18560
INFO[0090] migration status: state=copyRows copy-progress=21304292/83636461 25.47% binlog-deltas=131884 total-time=1m30s copier-time=1m30s copier-remaining-time=3m59s copier-is-throttled=false conns-in-use=4
INFO[0092] finished periodic flush of binary log: total-duration=2.568579167s batch-size=44482
INFO[0100] checkpoint: low-watermark={"Key":["id"],"ChunkSize":93516,"LowerBound":{"Value": ["23519001"],"Inclusive":true},"UpperBound":{"Value": ["23612517"],"Inclusive":false}} log-file=binlog.000231 log-pos=284660106 rows-copied=23612504 rows-copied-logical=23613516
INFO[0120] migration status: state=copyRows copy-progress=28513516/83636461 34.09% binlog-deltas=140048 total-time=2m0s copier-time=2m0s copier-remaining-time=3m49s copier-is-throttled=false conns-in-use=4
INFO[0122] finished periodic flush of binary log: total-duration=2.666008459s batch-size=44482
INFO[0150] checkpoint: low-watermark={"Key":["id"],"ChunkSize":96778,"LowerBound":{"Value": ["35619111"],"Inclusive":true},"UpperBound":{"Value": ["35715889"],"Inclusive":false}} log-file=binlog.000232 log-pos=693201522 rows-copied=35715876 rows-copied-logical=35716888
INFO[0150] migration status: state=copyRows copy-progress=35716888/83636461 42.70% binlog-deltas=143859 total-time=2m30s copier-time=2m30s copier-remaining-time=3m15s copier-is-throttled=false conns-in-use=4
INFO[0153] finished periodic flush of binary log: total-duration=3.055304291s batch-size=44482
INFO[0180] migration status: state=copyRows copy-progress=43101162/83636461 51.53% binlog-deltas=144307 total-time=3m0s copier-time=3m0s copier-remaining-time=2m42s copier-is-throttled=false conns-in-use=4
INFO[0182] finished periodic flush of binary log: total-duration=2.120207417s batch-size=35792
INFO[0200] checkpoint: low-watermark={"Key":["id"],"ChunkSize":100000,"LowerBound":{"Value": ["47700163"],"Inclusive":true},"UpperBound":{"Value": ["47800163"],"Inclusive":false}} log-file=binlog.000235 log-pos=468460007 rows-copied=47800123 rows-copied-logical=47801162
INFO[0210] migration status: state=copyRows copy-progress=50301162/83636461 60.14% binlog-deltas=145562 total-time=3m30s copier-time=3m30s copier-remaining-time=2m13s copier-is-throttled=false conns-in-use=4
INFO[0212] finished periodic flush of binary log: total-duration=2.474144s batch-size=35792
INFO[0240] migration status: state=copyRows copy-progress=57293344/83636461 68.50% binlog-deltas=139396 total-time=4m0s copier-time=4m0s copier-remaining-time=1m49s copier-is-throttled=false conns-in-use=4
INFO[0242] finished periodic flush of binary log: total-duration=2.389713709s batch-size=52375
INFO[0250] checkpoint: low-watermark={"Key":["id"],"ChunkSize":92458,"LowerBound":{"Value": ["59616925"],"Inclusive":true},"UpperBound":{"Value": ["59709383"],"Inclusive":false}} log-file=binlog.000238 log-pos=164185710 rows-copied=59709329 rows-copied-logical=59710382

So you can see from these log lines that the batch size changes with the target:

INFO[0150] finished periodic flush of binary log: total-duration=685.09925ms batch-size=3457   -- first run (target-chunk-time=100ms)
INFO[0033] finished periodic flush of binary log: total-duration=3.88942775s batch-size=16986  -- second run (target-chunk-time=2s)

It currently only resizes dynamically in the cases where we are applying the changes in parallel (i.e. not during the cutover flush under lock, and not when we have to use the queue instead of the map). These other cases could be fixed in the future, but that would require more code reorganization.
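For readers unfamiliar with the approach, the core idea is a feedback loop: time each binlog flush batch and scale the next batch size so the flush duration converges on target-chunk-time. The sketch below only illustrates that loop under assumed names (adjustBatchSize, minBatchSize, maxBatchSize are hypothetical); it is not the actual spirit implementation:

```go
package repl

import "time"

// Illustrative bounds only; the real limits in spirit may differ.
const (
	minBatchSize = 5
	maxBatchSize = 100000
)

// adjustBatchSize sketches a proportional feedback loop: given how long the
// last flush of batchSize statements took, scale the batch size so that the
// next flush should take roughly targetBatchTime.
func adjustBatchSize(batchSize int64, lastFlushTime, targetBatchTime time.Duration) int64 {
	if lastFlushTime <= 0 {
		return batchSize // no signal yet; keep the current size
	}
	newSize := int64(float64(batchSize) * targetBatchTime.Seconds() / lastFlushTime.Seconds())
	if newSize < minBatchSize {
		newSize = minBatchSize
	}
	if newSize > maxBatchSize {
		newSize = maxBatchSize
	}
	return newSize
}
```

This is consistent with the two runs above: with a 100ms target the flusher settles on batches in the low thousands (batch-size=3457), while a 2s target lets it grow into the tens of thousands (batch-size=16986 and later 44482).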

pkg/repl/client.go: two earlier review comments (outdated, resolved).
A further review comment was left on this hunk:

    BatchSize:       repl.DefaultBatchSize,
    Logger:          r.logger,
    Concurrency:     r.migration.Threads,
    TargetBatchTime: r.migration.TargetChunkTime,
Collaborator: Nice, using the same targetChunkTime config for both copier and flusher.
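The point of the comment above is that a single knob drives both phases. A toy sketch of that wiring, with struct and field names assumed for illustration rather than taken from the actual spirit code:

```go
package main

import (
	"fmt"
	"time"
)

// Hypothetical config types; only TargetChunkTime/TargetBatchTime mirror the
// field names visible in the reviewed hunk above.
type copierConfig struct {
	TargetChunkTime time.Duration // bounds how long each row-copy chunk should take
}

type replClientConfig struct {
	Concurrency     int
	TargetBatchTime time.Duration // bounds how long each binlog-apply batch should take
}

func main() {
	// One --target-chunk-time value (e.g. 100ms or 2s) feeds both components.
	target := 2 * time.Second
	copier := copierConfig{TargetChunkTime: target}
	flusher := replClientConfig{Concurrency: 4, TargetBatchTime: target}
	fmt.Printf("copier=%+v flusher=%+v\n", copier, flusher)
}
```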

@morgo enabled auto-merge April 2, 2024 22:35
@morgo merged commit 22bfae7 into main Apr 2, 2024 (9 checks passed)
@morgo deleted the mtocker-implement-dynamic-applier-sizing branch April 3, 2024 13:37
Linked issue (closed by this PR): Support dynamic batch resizing in applier