Fix "more than two versions is not allowed" (#75)
This error was reported from "make bench" and is caused by concurrent
transactions attempting to modify the same record. This is disallowed
because it would break the serialization rules of a transaction.

Normally (in a server environment) a database might block the second
transaction until the in-flight row becomes frozen (returns to a single
version) and then let it proceed (only in some isolation modes). However,
this is not as trivial in a share-nothing architecture (at least not
right now). So, a proper (new) SQLSTATE 40001 "serialization failure" is
returned instead. Clients that receive this must restart the transaction.
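
A client-side restart loop could look roughly like the sketch below. It is
written in Python purely for illustration and is not part of vsql: the
SQLStateError class, the run_transaction helper and the FlakyConnection
stand-in are all hypothetical names, and the only assumption is that a
failed statement surfaces its SQLSTATE to the caller.

```python
class SQLStateError(Exception):
    """Hypothetical client-side error that carries a SQLSTATE code."""

    def __init__(self, sqlstate, message):
        super().__init__(f"{sqlstate}: {message}")
        self.sqlstate = sqlstate


def run_transaction(conn, statements, max_attempts=5):
    """Run statements as one transaction, restarting it on SQLSTATE 40001.

    Returns the number of attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            conn.query("START TRANSACTION")
            for sql in statements:
                conn.query(sql)
            conn.query("COMMIT")
            return attempt
        except SQLStateError as err:
            # Close the failed transaction before deciding what to do next.
            conn.query("ROLLBACK")
            # Only a serialization failure is worth retrying.
            if err.sqlstate != "40001" or attempt == max_attempts:
                raise
    raise ValueError("max_attempts must be at least 1")


class FlakyConnection:
    """Stand-in connection: the first `conflicts` UPDATEs fail with 40001."""

    def __init__(self, conflicts):
        self.conflicts = conflicts
        self.log = []

    def query(self, sql):
        self.log.append(sql)
        if sql.startswith("UPDATE") and self.conflicts > 0:
            self.conflicts -= 1
            raise SQLStateError("40001", "serialization failure")


conn = FlakyConnection(conflicts=2)
attempts = run_transaction(conn, ["UPDATE foo SET bar = 456"])
print(attempts)  # 3
```

The whole transaction is replayed from START TRANSACTION rather than only
the failed statement, since the earlier statements were rolled back too.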

Better error reporting is not the real fix, though. The actual bug is why
the error was raised in the first place, since "make bench" runs its
transactions sequentially, so the conflict should never happen. This was
caused by UPDATE (which is a DELETE/INSERT underneath) not being able to
handle the case where the record to be updated is the in-flight version
and not the frozen version. So, in a nutshell, UPDATE was in some cases
always trying to create a new version.
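
The rule can be pictured with a toy model, sketched here in Python. This is
not how vsql stores rows (the real change is in vsql/btree.v); the Row class
and its methods are hypothetical and only assume what is described above: a
row has one frozen version and at most one in-flight version, owned by a
single transaction ID.

```python
class SerializationFailure(Exception):
    """Toy equivalent of SQLSTATE 40001."""

    sqlstate = "40001"


class Row:
    """One logical row: a frozen version plus at most one in-flight version."""

    def __init__(self, value):
        self.frozen = value
        self.in_flight = None  # (tid, value) while a transaction holds a change

    def update(self, tid, value):
        if self.in_flight is not None and self.in_flight[0] != tid:
            # A different transaction already holds the in-flight version.
            raise SerializationFailure("avoiding concurrent write on individual row")
        # Create the in-flight version, or replace our own one in place, so
        # the row never ends up with more than two versions.
        self.in_flight = (tid, value)

    def read(self, tid):
        # The owner sees its own change; everyone else sees the frozen version.
        if self.in_flight is not None and self.in_flight[0] == tid:
            return self.in_flight[1]
        return self.frozen

    def commit(self, tid):
        # Freezing returns the row to a single version.
        if self.in_flight is not None and self.in_flight[0] == tid:
            self.frozen = self.in_flight[1]
            self.in_flight = None


row = Row(123)
for value in (456, 789, 234):
    row.update(1, value)  # repeated UPDATEs inside transaction 1
print(row.read(1), row.read(2))  # 234 123
```

The bug described above corresponds to update() always adding a version
instead of replacing the one the same transaction already holds.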

This also fixes a related bug where implicit transactions were not
revisiting pages to perform the necessary cleanup (like that which
happens on a COMMIT/ROLLBACK), potentially leaving expired versions
behind.

Furthermore, SQLSTATE 25P02 "in failed sql transaction" is now returned
for all statements after an error occurs during a transaction, until a
COMMIT or ROLLBACK is issued, which also triggers the correct cleanup.
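
As a rough illustration of that behaviour, here is a toy state machine in
Python. It is a sketch of the rule as described, not vsql's connection code:
ToySession, execute and the fails flag are hypothetical, and a failing
statement is simulated rather than parsed.

```python
class ToySession:
    """Tracks whether an explicit transaction is active or aborted."""

    def __init__(self):
        self.in_transaction = False
        self.aborted = False

    def execute(self, sql, fails=False):
        stmt = sql.strip().rstrip(";").upper()

        if stmt in ("COMMIT", "ROLLBACK"):
            # An aborted transaction can only be rolled back, so COMMIT is
            # treated as ROLLBACK in that state.
            outcome = "ROLLBACK" if self.aborted else stmt
            self.in_transaction = False
            self.aborted = False
            return outcome

        if self.aborted:
            # Every other statement is refused until the transaction ends.
            return "25P02"

        if stmt == "START TRANSACTION":
            self.in_transaction = True
            return "START TRANSACTION"

        if fails:
            # `fails` simulates a statement that raised its own error.
            if self.in_transaction:
                self.aborted = True
            return "error"

        return "ok"


session = ToySession()
results = [
    session.execute("START TRANSACTION"),
    session.execute("INSERT INTO foo (b) VALUES (123, 456)", fails=True),
    session.execute("SELECT * FROM foo"),
    session.execute("COMMIT"),
    session.execute("SELECT * FROM foo"),
]
print(results)
# ['START TRANSACTION', 'error', '25P02', 'ROLLBACK', 'ok']
```

An error outside an explicit transaction does not poison the session in this
model, which matches the first new test in tests/transaction.sql.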

Fixes #74
elliotchance authored Nov 1, 2021
1 parent 27b0e2d commit 51d1eea
Showing 15 changed files with 447 additions and 87 deletions.
34 changes: 19 additions & 15 deletions docs/benchmark.rst
@@ -29,13 +29,15 @@ And executes the following:
4. Create a ``HISTORY`` table that is empty.
5. Will run as many transactions as it can within 60 seconds.

-A transaction consists of 5 statements (in order):
+A transaction consists of 7 statements (in order):

-1. ``UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid``
-2. ``SELECT abalance FROM accounts WHERE aid = :aid``
-3. ``UPDATE tellers SET tbalance = tbalance + $delta WHERE tid = :tid``
-4. ``UPDATE branches SET bbalance = bbalance + $delta WHERE bid = :bid``
-5. ``INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP)``
+1. ``START TRANSACTION``
+2. ``UPDATE accounts SET abalance = abalance + :delta WHERE aid = :aid``
+3. ``SELECT abalance FROM accounts WHERE aid = :aid``
+4. ``UPDATE tellers SET tbalance = tbalance + $delta WHERE tid = :tid``
+5. ``UPDATE branches SET bbalance = bbalance + $delta WHERE bid = :bid``
+6. ``INSERT INTO history (tid, bid, aid, delta, mtime) VALUES (:tid, :bid, :aid, :delta, CURRENT_TIMESTAMP)``
+7. ``COMMIT``

Where ``:aid``, ``:tid`` and ``:bid`` are random integers that are within
their respective ranges and ``:delta`` is a random value between -5000 and 5000.
@@ -57,19 +59,14 @@ Where:
Notes
-----

-1. It may look initially suspicious that the ``INSERT`` and ``SELECT`` speeds
-seem reasonable but ``TCP-B (sort of)`` is extremely slow. This is because vsql
-does not have any indexes yet, so each transaction is effectively reading all
-rows 4 times. This will be substantially improved in the future.
-
-2. ``CURRENT_TIMESTAMP`` is not yet supported, so a V generated timestamp as
+1. ``CURRENT_TIMESTAMP`` is not yet supported, so a V generated timestamp as
``VARCHAR`` is used instead.

-3. ``INSERT`` only inserts one row per statement (rather than bulk inserts with
+2. ``INSERT`` only inserts one row per statement (rather than bulk inserts with
a single statement). Bulk inserts are not yet supported, although when they are
this mechanic will probably not change.

-4. The ``bench`` command will create a file called ``bench.vsql`` for the test.
+3. The ``bench`` command will create a file called ``bench.vsql`` for the test.
If the file already exists the command will fail as it tries to create tables
that already exist. You must delete ``bench.vsql`` between test runs.

@@ -82,13 +79,16 @@ These were run on:
- 2.3 GHz Quad-Core Intel Core i7
- 16 GB 1600 MHz DDR3

-**INSERT** and **SELECT** are in rows per second and **TCP-B** is in transactions per second.
+**INSERT** and **SELECT** are in rows per second and **TCP-B** is in
+transactions per second.

+------------+---------+-------------------------+-------------------------+-------+
| | | On-disk | In-memory | |
| Date | Version +--------+--------+-------+--------+--------+-------+ Notes |
| | | INSERT | SELECT | TCP-B | INSERT | SELECT | TCP-B | |
+============+=========+========+========+=======+========+========+=======+=======+
| 2021-10-31 | v0.16.1 | 891 | 68263 | 109 | 878 | 68368 | 106 | [6]_ |
+------------+---------+--------+--------+-------+--------+--------+-------+-------+
| 2021-10-04 | v0.14.2 | 974 | 65775 | 97 | 939 | 64267 | 97 | [5]_ |
+------------+---------+--------+--------+-------+--------+--------+-------+-------+
| 2021-09-19 | v0.14.0 | 995 | 61782 | 94 | 992 | 62253 | 91 | [4]_ |
@@ -100,6 +100,10 @@ These were run on:
| 2021-09-04 | v0.11.0 | 5107 | 129252 | 0.378 | | | | [1]_ |
+------------+---------+--------+--------+-------+--------+--------+-------+-------+

.. [6] v0.15.0 introduced transactions, but this wasn't stabilised until
v0.16.1. The benchmark now runs true transactions (which is technically two
more SQL statements than before).
.. [5] The recent two patches focused on fixes to do with concurrent read/write
to the same file. This is critical for general reliability and for transactions
(coming soon). Fortunately, the added locking mechanics did not have any
28 changes: 28 additions & 0 deletions docs/sqlstate.rst
@@ -95,6 +95,24 @@ already active transaction.
START TRANSACTION;
-- error 25001: invalid transaction state: active sql transaction
``25P02`` in failed sql transaction
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``25P02`` will be returned for all commands within a transaction after a
previous SQL statement has failed. You must then issue a ``COMMIT`` or
``ROLLBACK``; note that a ``COMMIT`` will be treated as a ``ROLLBACK``.

**Examples**

.. code-block:: sql
CREATE TABLE foo (b BOOLEAN);
INSERT INTO foo (b) VALUES (123, 456);
SELECT * FROM foo;
-- msg: CREATE TABLE 1
-- error 42601: syntax error: INSERT has more values than columns
-- error 25P02: transaction is aborted, commands ignored until end of transaction block
``2D000`` invalid transaction termination
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -110,6 +128,16 @@ in an active transaction.
COMMIT;
-- error 2D000: invalid transaction termination
``40001`` serialization failure
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

``40001`` occurs if concurrent transactions attempt to update the same row. If
allowed, this would lead to an inconsistency. It's possible that this also might
be a deadlock in some situations. However, the deadlock is always avoided
because the current transaction that receives this error will be rolled back.

A client that receives this error should retry the transaction.

``42601`` syntax error
^^^^^^^^^^^^^^^^^^^^^^

13 changes: 13 additions & 0 deletions docs/update.rst
@@ -23,3 +23,16 @@ EXPLAIN

The query planner will decide the best strategy to execute the ``UPDATE``. You
can see this plan by using the ``EXPLAIN`` prefix. See :doc:`explain`.

Errors
------

Only one transaction is allowed to hold a modified version of a record at one
time. This allows all other transactions to see the previous (frozen) version
and the in-flight transaction to see its own version.

It's not possible for multiple in-flight transactions to hold different versions
of the same semantic row. This would break the serialization rules of the
transaction since this would cause a conflict as to whose version is "correct".
This situation will return a SQLSTATE 40001 "serialization failure" error.
Clients that receive this error should start the entire transaction again.
48 changes: 48 additions & 0 deletions tests/isolation-read-committed.sql
@@ -148,3 +148,51 @@ SELECT * FROM foo;
-- 1: msg: INSERT 1
-- 2: msg: UPDATE 0
-- 1: BAR: 123

/* connection 1 */
CREATE TABLE foo (bar INT);
INSERT INTO foo (bar) VALUES (123);
START TRANSACTION;
UPDATE foo SET bar = 456;
SELECT * FROM foo;
/* connection 2 */
START TRANSACTION;
UPDATE foo SET bar = 789;
SELECT * FROM foo;
/* connection 1 */
SELECT * FROM foo;
-- 1: msg: CREATE TABLE 1
-- 1: msg: INSERT 1
-- 1: msg: START TRANSACTION
-- 1: msg: UPDATE 1
-- 1: BAR: 456
-- 2: msg: START TRANSACTION
-- 2: error 40001: serialization failure: avoiding concurrent write on individual row
-- 2: error 25P02: transaction is aborted, commands ignored until end of transaction block
-- 1: BAR: 456

/* connection 1 */
CREATE TABLE foo (bar INT);
INSERT INTO foo (bar) VALUES (123);
START TRANSACTION;
UPDATE foo SET bar = 456;
UPDATE foo SET bar = 789;
UPDATE foo SET bar = 234;
SELECT * FROM foo;
/* connection 2 */
START TRANSACTION;
UPDATE foo SET bar = 345;
SELECT * FROM foo;
/* connection 1 */
SELECT * FROM foo;
-- 1: msg: CREATE TABLE 1
-- 1: msg: INSERT 1
-- 1: msg: START TRANSACTION
-- 1: msg: UPDATE 1
-- 1: msg: UPDATE 1
-- 1: msg: UPDATE 1
-- 1: BAR: 234
-- 2: msg: START TRANSACTION
-- 2: error 40001: serialization failure: avoiding concurrent write on individual row
-- 2: error 25P02: transaction is aborted, commands ignored until end of transaction block
-- 1: BAR: 234
15 changes: 15 additions & 0 deletions tests/transaction.sql
@@ -87,3 +87,18 @@ SELECT * FROM foo;
-- 2: msg: START TRANSACTION
-- 2: msg: DROP TABLE 1
-- 2: error 42P01: no such table: FOO

CREATE TABLE foo (b BOOLEAN);
INSERT INTO foo (b) VALUES (123, 456);
SELECT * FROM foo;
-- msg: CREATE TABLE 1
-- error 42601: syntax error: INSERT has more values than columns

START TRANSACTION;
CREATE TABLE foo (b BOOLEAN);
INSERT INTO foo (b) VALUES (123, 456);
SELECT * FROM foo;
-- msg: START TRANSACTION
-- msg: CREATE TABLE 1
-- error 42601: syntax error: INSERT has more values than columns
-- error 25P02: transaction is aborted, commands ignored until end of transaction block
2 changes: 1 addition & 1 deletion tests/update.sql
@@ -26,8 +26,8 @@ SELECT * FROM foo;
-- msg: INSERT 1
-- msg: UPDATE 1
-- msg: UPDATE 0
--- BAZ: 78
-- BAZ: 100
+-- BAZ: 78

CREATE TABLE foo (baz FLOAT);
UPDATE foo SET baz = true;
4 changes: 4 additions & 0 deletions vsql/bench.v
@@ -84,13 +84,17 @@ fn (mut b Benchmark) run_transaction() ? {
	tid := b.random(1, b.teller_rows)
	delta := b.random(-5000, 5000)

	b.conn.query('START TRANSACTION') ?

	b.conn.query('UPDATE accounts SET abalance = abalance + $delta WHERE aid = $aid') ?
	b.conn.query('SELECT abalance FROM accounts WHERE aid = $aid') ?
	b.conn.query('UPDATE tellers SET tbalance = tbalance + $delta WHERE tid = $tid') ?
	b.conn.query('UPDATE branches SET bbalance = bbalance + $delta WHERE bid = $bid') ?

	// TODO(elliotchance): Should use CURRENT_TIMESTAMP once supported.
	b.conn.query('INSERT INTO history (tid, bid, aid, delta, mtime) VALUES ($tid, $bid, $aid, $delta, \'$time.now()\')') ?

	b.conn.query('COMMIT') ?
}

fn (b Benchmark) random(min int, max int) int {
72 changes: 66 additions & 6 deletions vsql/btree.v
@@ -99,6 +99,58 @@ fn (mut p Btree) search_page(key []byte) ?([]int, []int) {
return path, depth_iterator
}

// The old and new objects must be provided separately because the key may
// change if the PRIMARY KEY value has changed.
fn (mut p Btree) update(old PageObject, new PageObject, tid int) ?[]int {
	if p.pager.total_pages() == 0 {
		// The object does not exist to update.
		return []int{}
	}

	// If the keys are the same and we're updating an in-flight row we need to
	// actually delete the existing row.
	if compare_bytes(old.key, new.key) == 0 {
		page_number := p.update_single(new, tid) ?

		return [page_number]
	}

	// Otherwise we have to deal with two pages, but we're dealing with them
	// differently because the operations are not symmetrical.
	old_page_number := p.expire(old.key, old.tid, tid) ?
	new_page_number := p.add(new) ?

	mut page_numbers := [new_page_number]
	if old_page_number >= 0 && old_page_number != new_page_number {
		page_numbers << old_page_number
	}

	return page_numbers
}

fn (mut p Btree) update_single(obj PageObject, tid int) ?int {
	if p.pager.total_pages() == 0 {
		// The object does not exist to update.
		return 0
	}

	// Find the page that would contain our old object (if it exists).
	mut path, _ := p.search_page(obj.key) ?
	page_number := path[path.len - 1]
	mut page := p.pager.fetch_page(page_number) ?
	previous_page_head := page.head().key.clone()
	previous_root_page := p.pager.root_page()

	page.update(obj, obj, obj.tid) ?
	p.pager.store_page(page_number, page) ?

	p.add(obj) ?

	p.fix_parent_pages(previous_page_head, previous_root_page, obj, path) ?

	return page_number
}

// add returns the page number that the object was added to.
fn (mut p Btree) add(obj PageObject) ?int {
// First page is a special condition.
@@ -129,6 +181,12 @@ fn (mut p Btree) add(obj PageObject) ?int {
p.split_page(path, &page, obj, kind_leaf) ?
}

p.fix_parent_pages(previous_page_head, previous_root_page, obj, path) ?

return left_page_number
}

fn (mut p Btree) fix_parent_pages(previous_page_head []byte, previous_root_page int, obj PageObject, path []int) ? {
// Make sure we correct the minimum bound up the tree, if needed.
if compare_bytes(obj.key, previous_page_head) < 0 {
for path_index in 0 .. path.len - 1 {
@@ -156,8 +214,6 @@ fn (mut p Btree) add(obj PageObject) ?int {
p.pager.store_page(p.pager.root_page(), new_root_page) ?
}
}

-return left_page_number
}

fn (mut p Btree) split_page(path []int, page &Page, obj PageObject, kind byte) ? {
@@ -219,8 +275,8 @@ fn (mut p Btree) split_page(path []int, page &Page, obj PageObject, kind byte) ?
if path.len > 1 {
mut page3 := p.pager.fetch_page(path[path.len - 2]) ?

-// 30 is the length of p1 + p2.
-if page3.used >= p.page_size - 30 {
+// 36 is the length of p1 + p2.
+if page3.used >= p.page_size - 36 {
mut new_path := path[..path.len - 1]
p.split_page(new_path, &page3, p2, kind_not_leaf) ?
} else {
@@ -348,8 +404,9 @@ fn (mut p Btree) remove(key []byte, tid int) ? {
}

// expire will set the deleted transaction ID for the key, effectively making
-// the object invisible to the current transaction.
-fn (mut p Btree) expire(key []byte, tid int, xid int) ? {
+// the object invisible to the current transaction. The page modified will be
+// returned, or -1 if the object does not exist.
+fn (mut p Btree) expire(key []byte, tid int, xid int) ?int {
// Find the page that will contain the key, if it exists.
mut path, _ := p.search_page(key) ?
page_number := path[path.len - 1]
@@ -358,5 +415,8 @@ fn (mut p Btree) expire(key []byte, tid int, xid int) ? {

if page.expire(key, tid, xid) {
p.pager.store_page(page_number, page) ?
return page_number
}

return -1
}
19 changes: 16 additions & 3 deletions vsql/connection.v
@@ -133,16 +133,29 @@ fn (mut c Connection) release_read_connection() {

pub fn (mut c Connection) prepare(sql string) ?PreparedStmt {
	t := start_timer()
-	stmt, params, explain := c.query_cache.parse(sql) ?
+	stmt, params, explain := c.query_cache.parse(sql) or {
+		c.storage.transaction_aborted()
+		return err
+	}
	elapsed_parse := t.elapsed()

	return PreparedStmt{stmt, params, explain, &c, elapsed_parse}
}

pub fn (mut c Connection) query(sql string) ?Result {
-	mut prepared := c.prepare(sql) ?
+	if c.storage.transaction_state == .aborted {
+		return sqlstate_25p02()
+	}

-	return prepared.query(map[string]Value{})
+	mut prepared := c.prepare(sql) or {
+		c.storage.transaction_aborted()
+		return err
+	}
+
+	return prepared.query(map[string]Value{}) or {
+		c.storage.transaction_aborted()
+		return err
+	}
}

pub fn (mut c Connection) register_func(func Func) ? {