Overview of the Issue
This was on an AWS RDS instance, but unsure if that's a prerequisite.
Problem: when a max_execution_time is configured globally on the source MySQL, Vitess seems to miss the error "Error 3024: Query execution was interrupted, maximum statement execution time exceeded when dumping table". This resulted in the (atomic) copier getting confused: it thought it was done when it wasn't, then restarted itself from the beginning and ran into duplicate key errors. I reported that separately: #17864
Perhaps there are also problems in non-atomic mode. Maybe copyNext in vreplicator.go will also be confused about where it was and resume incorrectly. Regardless, missing reported errors can cause many problems, especially in the copy phase of moving tables.
When the query timed out, the source tablet did not register anything in the logs about it.
The source tablet is started with --vreplication_copy_phase_duration 0h5m0s. Not sure if that's related. There is additional unexpected behavior related to that flag in MoveTables, for which I'll create another bug report.
Reproduction Steps
(Note that these examples use a custom (patch) flag --vitess-olap to mysqldump. Newer mysqldump versions have an init-str command, or you can achieve something similar with mysql -e 'select * from x'.)
On our instance, max_execution_time was set at 600s (600000ms). When I dump a table, it starts fine, but then it just hangs there after 10 minutes:
The second line is output from pv. It shows '0 B/s' after 600s. In this example, it has been stalled for 3:30.
The query was visible in show processlist on the source, but disappeared right on the 600s mark.
If, however, I run this on the backing DB directly, it does error out:
mysqldump -h rds_db --no-tablespaces --set-gtid-purged=OFF --lock-tables=off --set-gtid-purged=OFF backing_db_name eventLog | pv --rate-limit 5m | gzip > /tmp/dummy.sql.gz
mysqldump: Error 3024: Query execution was interrupted, maximum statement execution time exceeded when dumping table `eventLog` at row: 10785737 ]
2,94GiB 0:10:01 [5,00MiB/s] [ <=>
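For a scripted reproduction, it can help to flag this error automatically in captured stderr. The Go sketch below parses the error line format quoted above. parseDumpError and dumpError are hypothetical helper names, and the pattern is derived only from the single line shown in this report, so other mysqldump versions may format the message differently.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// dumpErrRe matches the mysqldump error line format quoted in this report.
var dumpErrRe = regexp.MustCompile("Error (\\d+): (.*) when dumping table `([^`]+)` at row: (\\d+)")

// dumpError holds the fields extracted from a mysqldump error line.
type dumpError struct {
	Code    int
	Message string
	Table   string
	Row     int64
}

// parseDumpError extracts the error code, message, table and row from a
// line of mysqldump stderr. It returns false if the line does not match.
func parseDumpError(line string) (dumpError, bool) {
	m := dumpErrRe.FindStringSubmatch(line)
	if m == nil {
		return dumpError{}, false
	}
	code, err := strconv.Atoi(m[1])
	if err != nil {
		return dumpError{}, false
	}
	row, err := strconv.ParseInt(m[4], 10, 64)
	if err != nil {
		return dumpError{}, false
	}
	return dumpError{Code: code, Message: m[2], Table: m[3], Row: row}, true
}

func main() {
	line := "mysqldump: Error 3024: Query execution was interrupted, maximum statement execution time exceeded when dumping table `eventLog` at row: 10785737"
	if e, ok := parseDumpError(line); ok && e.Code == 3024 {
		fmt.Printf("timeout on table %s at row %d\n", e.Table, e.Row)
	}
}
```

Running the same dump through vtgate and finding no such line in stderr would confirm that the error is being dropped somewhere along the way.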
Binary Version
vttablet version Version: 21.0.1 (Git revision 3d4f41db2fbc32611c7d2ea2af3dc68b9d962415 branch 'HEAD') built on Tue Dec 3 05:39:35 UTC 2024 by runner@fv-az2029-313 using go1.23.3 linux/amd64
Operating System and Environment details
DISTRIB_ID=Ubuntu DISTRIB_RELEASE=24.04 DISTRIB_CODENAME=noble DISTRIB_DESCRIPTION="Ubuntu 24.04.2 LTS"
Log Fragments
Weirdly none. The tablet didn't log anything at the 10 minute mark.
Slack discussion
https://vitess.slack.com/archives/C0PQY0PTK/p1740042281348869