Introduce timeout safeguard for migration sql queries (#3633)
The MySQL statement timeout only works for SELECTs. Additionally,
MySQL does not support transactional rollbacks of DDL statements.
Thus there is a risk of exiting in an unclean state, which might
cause CC runtime errors; moreover, a timeout that applies only to
SELECTs is of little functional value.

Add max_migration_statement_runtime_in_seconds into default config

For consistency, since other optional parameters are also
defined there explicitly.
FloThinksPi authored Apr 9, 2024
1 parent fe1cf1c commit 18a2fda
Showing 6 changed files with 34 additions and 9 deletions.
1 change: 1 addition & 0 deletions config/cloud_controller.yml
@@ -322,6 +322,7 @@ newrelic_enabled: false
max_annotations_per_resource: 200
max_labels_per_resource: 50
max_migration_duration_in_minutes: 45
max_migration_statement_runtime_in_seconds: 30
db:
log_level: 'debug'
ssl_verify_hostname: false
1 change: 1 addition & 0 deletions lib/cloud_controller/config_schemas/base/api_schema.rb
@@ -86,6 +86,7 @@ class ApiSchema < VCAP::Config
newrelic_enabled: bool,

optional(:max_migration_duration_in_minutes) => Integer,
optional(:max_migration_statement_runtime_in_seconds) => Integer,
db: {
optional(:database) => Hash, # db connection hash for sequel
max_connections: Integer, # max connections in the connection pool
1 change: 1 addition & 0 deletions lib/cloud_controller/config_schemas/base/migrate_schema.rb
@@ -7,6 +7,7 @@ class MigrateSchema < VCAP::Config
define_schema do
{
optional(:max_migration_duration_in_minutes) => Integer,
optional(:max_migration_statement_runtime_in_seconds) => Integer,

db: {
optional(:database) => Hash, # db connection hash for sequel
14 changes: 10 additions & 4 deletions lib/cloud_controller/db_migrator.rb
@@ -7,12 +7,20 @@ class DBMigrator
def self.from_config(config, db_logger)
VCAP::CloudController::Encryptor.db_encryption_key = config.get(:db_encryption_key)
db = VCAP::CloudController::DB.connect(config.get(:db), db_logger)
new(db, config.get(:max_migration_duration_in_minutes))
new(db, config.get(:max_migration_duration_in_minutes), config.get(:max_migration_statement_runtime_in_seconds))
end

def initialize(db, max_migration_duration_in_minutes=nil)
def initialize(db, max_migration_duration_in_minutes=nil, max_migration_statement_runtime_in_seconds=nil)
@db = db
@timeout_in_minutes = default_two_weeks(max_migration_duration_in_minutes)

@max_statement_runtime_in_milliseconds = if max_migration_statement_runtime_in_seconds.nil? || max_migration_statement_runtime_in_seconds <= 0
30_000
else
max_migration_statement_runtime_in_seconds * 1000
end

@db.run("SET statement_timeout TO #{@max_statement_runtime_in_milliseconds}") if @db.database_type == :postgres
end

def apply_migrations(opts={})
@@ -32,12 +40,10 @@ def wait_for_migrations!
logger = Steno.logger('cc.db.wait_until_current')

logger.info('waiting indefinitely for database schema to be current') unless db_is_current_or_newer_than_local_migrations?

timeout_message = 'ccdb.max_migration_duration_in_minutes exceeded'
Timeout.timeout(@timeout_in_minutes * 60, message: timeout_message) do
sleep(1) until db_is_current_or_newer_than_local_migrations?
end

logger.info('database schema is as new or newer than locally available migrations')
end

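The fallback logic in `DBMigrator#initialize` above can be sketched as a standalone method (a sketch only; in the actual class the result is stored in `@max_statement_runtime_in_milliseconds`):

```ruby
# Sketch of the conversion performed in DBMigrator#initialize:
# nil or non-positive config values fall back to the 30-second default,
# everything else is converted from seconds to milliseconds.
def statement_runtime_in_milliseconds(seconds)
  return 30_000 if seconds.nil? || seconds <= 0

  seconds * 1000
end
```

With the default of 30 seconds from `cloud_controller.yml`, this yields the `30000` that is passed to Postgres via `SET statement_timeout`.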
9 changes: 5 additions & 4 deletions spec/migrations/Readme.md
@@ -7,7 +7,7 @@ Sequel migrations modify the database schema. To write high-quality migrations,
At present, the Cloud Controller (CC) supports both Postgres and MySQL in various versions. Since these databases behave differently and offer different feature sets, it's necessary to reflect these differences in the migrations. Failing to do so may result in differing schemas for MySQL and Postgres after the same operation.

1. Postgres supports transactional DDL (Data Definition Language) statements but MySQL does not. This means when a failure occurs and a transaction is rolled back, only Postgres will correctly roll back the altered changes. MySQL commits every DDL statement immediately, which makes it impossible to roll back any schema change.
1. While Postgres automatically resets table locks on rollback or commit of a transaction, MySQL does not. If a migration that locks tables doesn't unlock them as part of error handling, MySQL can permanently block a table until it is manually unlocked again. Moreover, Sequel doesn't recognize how to lock tables, necessitating use of raw SQL queries, with differing syntaxes for both DBs.
1. While Postgres automatically resets table locks on rollback or commit of a transaction, MySQL does not. If a migration that locks tables doesn't unlock them as part of error handling, MySQL can permanently block a table until it is manually unlocked again. Moreover, Sequel doesn't recognize how to lock tables, necessitating use of raw SQL queries, with differing syntax for both DBs.
1. If a MySQL table is locked and sub-queries are used, those sub-queries must be named (aliased) and locked as well, even if they don't exist at the time of locking.
1. When dropping columns, Postgres automatically drops all related indices and constraints entirely while MySQL only removes the affected columns from the index/constraint.
1. When renaming columns, both MySQL and Postgres auto-rename columns in indices. However, while Postgres cascades the rename into views, MySQL doesn't and the view still references the old column name, thus breaking it. On MySQL, views must therefore generally be dropped and recreated after renaming a column.
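The need for raw, backend-specific locking SQL noted above can be illustrated with a hypothetical helper (the helper name and table are illustrative, not part of the CC codebase):

```ruby
# Hypothetical helper returning a backend-specific table-lock statement.
# Sequel has no built-in table-lock API, so migrations must issue raw SQL
# whose syntax differs between Postgres and MySQL.
def lock_table_sql(database_type, table)
  case database_type
  when :postgres
    "LOCK TABLE #{table} IN ACCESS EXCLUSIVE MODE"
  when :mysql
    "LOCK TABLES #{table} WRITE"
  else
    raise ArgumentError, "unsupported database type: #{database_type}"
  end
end
```

In a migration this would be invoked roughly as `run(lock_table_sql(database_type, 'apps'))`. Note that the MySQL form also requires a matching `UNLOCK TABLES`, while Postgres releases the lock on commit or rollback.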
@@ -24,8 +24,8 @@ To create resilient and reliable migrations, follow these guidelines:

1. Do not use any Cloud Controller code in a migration. Changes in the model might modify old migrations and change their behavior and the behavior of their tests. Use raw Sequel operations instead, as you're aware of the exact database schema at that migration time. For instance, if you use a model in a migration and a later migration changes that model's table name, the initial migration would fail because the table does not yet exist when the migration runs and uses the model from the CC's codebase.
1. Aim to write separate [up and down](https://sequel.jeremyevans.net/rdoc/files/doc/migration_rdoc.html#top) Sequel migrations. The `change do` command automatically generates the down part but occasionally behaves unpredictably and isn't always able to compute the opposite of what the migration accomplished. By defining `up do` and `down do` separately, you'll have full control over the down part.
1. Opt for a larger number of smaller transactions in migrations rather than larger transactions. If a single change fails, the entire migration must be manually rolled back. Atomic migrations are easier to roll back. Each migration runs in a database synchronisation call.
1. Write idempotent migrations. Even though a row in the database prevents a migration from running twice, you can't be sure the database remains unchanged after an error in the migration since schema changes can't always be transactionally rolled back e.g. on MySQL. Make your migrations rerunnable. This often isn't the case when using Sequel functions with default values. For instance, [drop_index](https://www.rubydoc.info/github/jeremyevans/sequel/Sequel%2FSchema%2FAlterTableGenerator:drop_index) by default throws an error if an index you want to drop has already been dropped. In a migration, you'll want to pass the optional parameter `if_exists: true` so the statement is idempotent and if a migration fails in the middle, it can be rerun. Sometimes you will also find different functions that offer idempotency. Like `create_table` (just create table), `create_table!` (drop in any case and create table) and `create_table?` (create table if not already exists) so also look out for such options. In case the needed Sequel function does not offer a parameter/function for idempotency, one must test against the schema and just run statements conditionally. E.g. check if an index already exists with `... if self.indexes(:my_table)[:my_index].nil?` before creating it.
1. Opt for a larger number of smaller transactions in migrations rather than larger transactions. If a single change fails, the entire migration must be manually rolled back. Atomic migrations are easier to roll back. Each migration runs in a database synchronization call.
1. Write idempotent migrations. Even though a row in the database prevents a migration from running twice, you can't be sure the database remains unchanged after an error in the migration since schema changes can't always be transactionally rolled back, e.g. on MySQL. Make your migrations rerunnable. This often isn't the case when using Sequel functions with default values. For instance, [drop_index](https://www.rubydoc.info/github/jeremyevans/sequel/Sequel%2FSchema%2FAlterTableGenerator:drop_index) by default throws an error if an index you want to drop has already been dropped. In a migration, you'll want to pass the optional parameter `if_exists: true` so the statement is idempotent and if a migration fails in the middle, it can be rerun. Sometimes you will also find variant functions that offer idempotency, like `create_table` (just create the table), `create_table!` (drop in any case and create the table) and `create_table?` (create the table only if it does not already exist), so look out for such options. In case the needed Sequel function does not offer a parameter/function for idempotency, one must test against the schema and run statements conditionally, e.g. check whether an index already exists with `... if self.indexes(:my_table)[:my_index].nil?` before creating it.
1. During migrations that execute DML (Data Manipulation Language) statements, consider locking the table while altering data. For instance, if you're removing duplicates and setting a unique constraint, you must prevent new inserts while the migration runs. Otherwise, setting the unique constraint later could fail. Note that Sequel doesn't offer built-in functions for table locks, so you'll need to use raw SQL queries that vary depending on the Database backend they run on.
1. Separate data-changing migrations from schema-changing DDL statements to leverage at least the database rollback function for DML Statements. Explicitly wrap data changes in a transaction that doesn't include DDL statements. Avoid including DDL in that transaction otherwise MySQL will autocommit data changes and a rollback is impossible.
1. Try to handle DML (Data Manipulation Language) statements entirely within the DB, avoiding any looping over returned data in the migration process. This helps to reduce the time a table is locked. For example, instead of finding all duplicates with a select and then iterating over the results in Ruby and deleting the rows, use a subquery to let the DB perform the heavy lifting. Since these operations should be performed with a table lock to prevent new duplicates, it is advantageous to minimize runtime as much as possible. Involving Ruby and dealing with large tables could increase runtimes to several minutes, leading to locked API requests and potential unavailability.
@@ -67,7 +67,8 @@ To create resilient and reliable migrations, follow these guidelines:
end
```
1. If you're writing a uniqueness constraint where some of the values can be null, remember that `null != null`. For instance, the values `[1, 1, null]` and `[1, 1, null]` are considered unique. Uniqueness constraints only work on columns that do not allow `NULL` as a value. If this is the case, change the column to disallow `NULL` and set the default to an empty string instead.
1. If you need to execute different operations for MySQL and Postgres, you can check the database type as follows: `... if database_type == :postgres` or `... if database_type == :mysql`.
1. If you need to execute different operations for MySQL and Postgres, you can check the database type as follows: `... if database_type == :postgres` or `... if database_type == :mysql`. If the differences are too big, consider writing separate migrations for each database type.
1. Be sure that, with real-world table sizes and load, each SQL query in your migration finishes in a reasonable time. **There is a hard limit of 30s (by default) in place to protect against outages caused by long-running migrations**; if a single statement exceeds it, the migration is aborted. If you need to run a long-running migration, consider breaking it up into smaller parts. If you make use of table locking, be sure any query finishes in under 2 seconds with real-world table sizes, to avoid issues due to table locks and waiting queries. An operator can override the 30s default by setting the `max_migration_statement_runtime_in_seconds` config property.
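The conditional-creation pattern from the idempotency guideline above can be sketched as follows (a sketch only; `indexes` stands for Sequel's `Database#indexes`, and the table and index names are illustrative):

```ruby
# An index still needs to be created only if the schema does not already
# contain it; Database#indexes returns a hash keyed by index name.
def index_missing?(indexes_for_table, index_name)
  indexes_for_table[index_name].nil?
end

# In a migration, roughly:
#   add_index(:my_table, :my_column, name: :my_index) if index_missing?(indexes(:my_table), :my_index)
```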

# Sequel Migration Tests

17 changes: 16 additions & 1 deletion spec/unit/lib/cloud_controller/db_migrator_spec.rb
@@ -1,9 +1,10 @@
require 'spec_helper'

RSpec.describe DBMigrator do
let(:db) { Sequel::Model.db }

describe '#wait_for_migrations!' do
let(:migrator) { DBMigrator.new(db, 1) }
let(:db) { double(:db) }

it 'blocks until migrations are current or newer' do
expect(Sequel::Migrator).to receive(:is_current?).with(
@@ -47,4 +48,18 @@
end.to raise_error(UncaughtThrowError)
end
end

describe 'postgresql' do
it 'sets a default statement timeout' do
skip if db.database_type != :postgres
expect(db).to receive(:run).with('SET statement_timeout TO 30000')
DBMigrator.new(db)
end

it 'sets a config provided statement timeout' do
skip if db.database_type != :postgres
expect(db).to receive(:run).with('SET statement_timeout TO 60000')
DBMigrator.new(db, nil, 60)
end
end
end
