Bug Report: Vitess misses (RDS) MySQL error 3024 (maximum statement execution time exceeded) #17862

Open
wiebeytec opened this issue Feb 25, 2025 · 0 comments
Labels: Needs Triage, Type: Bug

wiebeytec commented Feb 25, 2025

Overview of the Issue

This was on an AWS RDS instance, but I'm unsure whether that's a prerequisite.

Problem: when max_execution_time is configured globally on the source MySQL, Vitess seems to miss Error 3024 ("Query execution was interrupted, maximum statement execution time exceeded") while a table is being dumped. This resulted in the (atomic) copier getting confused: it thought it was done when it wasn't, then restarted itself from the beginning and ran into duplicate key errors. I reported that separately: #17864

Perhaps there are problems in non-atomic mode as well: copyNext in vreplicator.go might also lose track of where it was and resume incorrectly. Regardless, silently missed errors can cause many problems, especially in the copy phase of moving tables.
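The failure mode described above can be illustrated with a toy model. This is a minimal sketch under stated assumptions, not the logic in vreplicator.go: streamFunc, copyState, copyTable and newSource are hypothetical, and integer primary keys stand in for the copier's real progress tracking. The point is that an interrupted stream must not be treated as completion, and that resuming from the last copied key avoids the duplicate key errors a restart from scratch runs into.

```go
package main

import (
	"errors"
	"fmt"
)

// streamFunc is a hypothetical source: it emits rows with pk > afterPK in
// ascending order and returns a non-nil error if the stream is cut short.
type streamFunc func(afterPK int, emit func(pk int) error) error

// copyState is a hypothetical stand-in for persisted copy progress.
type copyState struct {
	lastPK int
	done   bool
}

// copyTable writes rows into target, advancing lastPK after each row, and
// marks the table done only if the stream ended without an error.
func copyTable(st *copyState, src streamFunc, target map[int]bool) error {
	err := src(st.lastPK, func(pk int) error {
		if target[pk] {
			return fmt.Errorf("duplicate key %d", pk)
		}
		target[pk] = true
		st.lastPK = pk
		return nil
	})
	if err != nil {
		// An interrupted stream (e.g. MySQL error 3024) is not completion:
		// keep lastPK so the next attempt resumes instead of restarting.
		return err
	}
	st.done = true
	return nil
}

// newSource returns a hypothetical source of rows 1..n that fails once after
// failAfter rows, mimicking a statement killed by max_execution_time.
func newSource(n, failAfter int) streamFunc {
	failed := false
	return func(afterPK int, emit func(pk int) error) error {
		sent := 0
		for pk := afterPK + 1; pk <= n; pk++ {
			if !failed && sent == failAfter {
				failed = true
				return errors.New("Error 3024: maximum statement execution time exceeded")
			}
			if err := emit(pk); err != nil {
				return err
			}
			sent++
		}
		return nil
	}
}

func main() {
	st := &copyState{}
	target := map[int]bool{}
	src := newSource(10, 4)

	err := copyTable(st, src, target)
	fmt.Println(err != nil, st.done, st.lastPK) // true false 4

	err = copyTable(st, src, target)
	fmt.Println(err, st.done, st.lastPK, len(target)) // <nil> true 10 10
}
```

Calling copyTable again with a fresh copyState against the same target fails immediately with "duplicate key 1", which is the symptom reported in #17864.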

When the query timed out, the source tablet logged nothing about it.

The source tablet is started with --vreplication_copy_phase_duration 0h5m0s. I'm not sure whether that's related. That flag also causes other unexpected behavior in MoveTables, which I'll report in a separate bug.

Reproduction Steps

(Note that these examples use a custom, patched flag --vitess-olap for mysqldump. Newer mysqldump versions have an init-str command, or you can achieve something similar with mysql -e 'select * from x'.)

On our instance, max_execution_time was set to 600 s (600000 ms). When I dump a table through vtgate, it starts fine, but then it just hangs after 10 minutes:

# mysqldump -h vitessgate --vitess-olap --no-tablespaces --set-gtid-purged=OFF --lock-tables=off  --set-gtid-purged=OFF legacy_keyspace eventLog | pv --rate-limit 10m | gzip > /tmp/dummy.sql.gz
5,88GiB 0:13:30 [0,00  B/s] [            <=>

The second line is output from pv. It has shown 0 B/s since the 600 s mark; at the time of this snapshot it had been stalled for 3:30.
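A quick consistency check on these numbers, assuming pv held its 10 MiB/s rate limit until data stopped arriving (pv's k/m/g suffixes are binary units):

$$
\begin{aligned}
600000\ \text{ms} &= 600\ \text{s} = 10{:}00 \\
13{:}30 - 10{:}00 &= 3{:}30 \quad \text{(time spent at } 0\ \text{B/s)} \\
\frac{5.88\ \text{GiB}}{10\ \text{MiB/s}} = \frac{5.88 \times 1024\ \text{MiB}}{10\ \text{MiB/s}} &\approx 602\ \text{s}
\end{aligned}
$$

So data stopped flowing at roughly the 10-minute mark, when max_execution_time would have fired.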

The query was visible in show processlist on the source, but disappeared right on the 600s mark.

If, however, I run this on the backing DB directly, it does error out:

mysqldump -h rds_db --no-tablespaces --set-gtid-purged=OFF --lock-tables=off --set-gtid-purged=OFF backing_db_name eventLog | pv --rate-limit 5m | gzip > /tmp/dummy.sql.gz
mysqldump: Error 3024: Query execution was interrupted, maximum statement execution time exceeded when dumping table `eventLog` at row: 10785737
2,94GiB 0:10:01 [5,00MiB/s] [        <=>
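The same check on the direct run, again assuming pv ran at its 5 MiB/s limit throughout:

$$
\frac{2.94\ \text{GiB}}{5\ \text{MiB/s}} = \frac{2.94 \times 1024\ \text{MiB}}{5\ \text{MiB/s}} \approx 602\ \text{s} \approx 10{:}01
$$

Both runs therefore stopped at about the 600 s limit; only the direct connection reported Error 3024.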

Binary Version

vttablet version Version: 21.0.1 (Git revision 3d4f41db2fbc32611c7d2ea2af3dc68b9d962415 branch 'HEAD') built on Tue Dec  3 05:39:35 UTC 2024 by runner@fv-az2029-313 using go1.23.3 linux/amd64

Operating System and Environment details

DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=24.04
DISTRIB_CODENAME=noble
DISTRIB_DESCRIPTION="Ubuntu 24.04.2 LTS"

Log Fragments

None, oddly: the tablet didn't log anything at the 10-minute mark.

Slack discussion

https://vitess.slack.com/archives/C0PQY0PTK/p1740042281348869
