batch bulk insert gaps #263

Open

Gabilabrebi opened this issue Nov 30, 2023 · 13 comments

@Gabilabrebi

Gabilabrebi commented Nov 30, 2023

When using batch to bulk insert, gaps appear in the AUTO_INCREMENT IDs once the batch reaches 50+ inserts, even with innodb_autoinc_lock_mode set to 1 (consecutive).

The MySQL documentation says:

With innodb_autoinc_lock_mode set to 0 (“traditional”) or 1 (“consecutive”), the auto-increment values generated by any given statement are consecutive, without gaps, because the table-level AUTO-INC lock is held until the end of the statement, and only one such statement can execute at a time.

So I'm not sure that's intended. Since insertId only returns one ID, I think it's very important not to have gaps in the IDs so we can deduce the inserted ones.
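For context, a minimal sketch of what deducing the inserted IDs from the single insertId would look like, assuming gap-free AUTO_INCREMENT values and a BigInt insertId; the table and connection settings are made up for illustration:

    const mariadb = require('mariadb');

    async function demo() {
      const conn = await mariadb.createConnection({ host: 'localhost', user: 'app', database: 'test' });
      // hypothetical table: items(id INT AUTO_INCREMENT PRIMARY KEY, val VARCHAR(20))
      const rows = [['a'], ['b'], ['c']];
      const res = await conn.batch('INSERT INTO items(val) VALUE (?)', rows);

      // res.insertId reports only one id (the first one generated).
      // If the AUTO_INCREMENT values were guaranteed to be consecutive,
      // the remaining ids could be deduced from it:
      const ids = rows.map((_, i) => res.insertId + BigInt(i));

      await conn.end();
      return ids; // gaps created by the bulk insert break this assumption
    }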

@rusher
Collaborator

rusher commented Dec 5, 2023

Actually, I'm not sure reconstructing the insert IDs is the way to go.
If the IDs are needed, it is better to disable bulk by setting the option bulk to false. This still allows all the IDs to be retrieved, even if it is slower (it remains faster than one normal INSERT per set of parameters, because of pipelining).
In the near future, this will also be possible with bulk, see https://jira.mariadb.org/browse/MDEV-30366
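A minimal sketch of what that might look like with the connector's connection-level bulk option (host, credentials, and table names are illustrative; per the comment above, disabling bulk makes batch() fall back to pipelined individual inserts so all the IDs can be retrieved):

    const mariadb = require('mariadb');

    async function insertWithoutBulk(rows) {
      // bulk: false disables the bulk protocol for batch(); the connector then
      // pipelines one INSERT per parameter set, slower but with ids reported per row.
      const conn = await mariadb.createConnection({
        host: 'localhost',
        user: 'app',
        database: 'test',
        bulk: false
      });
      try {
        // hypothetical table: orders_contents(id AUTO_INCREMENT, val VARCHAR(20))
        return await conn.batch('INSERT INTO orders_contents(val) VALUE (?)', rows);
      } finally {
        await conn.end();
      }
    }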

@crushton-qg

@rusher Hi, I am currently investigating an issue in our DB where we have gaps in the inserted auto-increment IDs when using the batch function. We have managed to recreate the bug twice; the attached image shows that we ran two separate batch queries with the same 5 rows of data, and on both occasions the same ID gaps occurred at the same point.

I just want to check whether this is a bug in the batch function itself, or whether I need to investigate this further inside our DB / code config.

(Image attached: Screenshot 2024-04-17 at 16 00 31)

@crushton-qg

@rusher Forgot to add: in the table shown in the image above we currently have about 400k records, but the highest ID is over 500k, meaning roughly a 20% difference between the row count and the IDs.

@crushton-qg

@Gabilabrebi did you manage to find a fix / resolution for this?

@Gabilabrebi
Author

Gabilabrebi commented Apr 18, 2024

@crushton-qg Yeah, I just stopped using batch.

const contentInserts = [], contentParameters = [];
for (let i = 0; i < order.content.length; i++) {
  const x = order.content[i];
  contentInserts.push("(DEFAULT, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 0)");
  contentParameters.push(place, orderId, i, x.item, x.version, x.comment, x.quantity, x.price, x.a, x.b, x.c);
}

db.query(`INSERT INTO orders_contents VALUES ${contentInserts.join(",")}`, contentParameters);

Something like this basically achieves the same thing for me, but without any ID issues. It's still one query, and performance is pretty much the same as far as I tested.

@crushton-qg

crushton-qg commented Apr 18, 2024

@Gabilabrebi Sorry to be a pain, can you provide a full code sample please? I have tried to replicate what you have but I cannot get it to work. I am struggling to follow your sample where you are providing multiple (DEFAULT, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, 0) in your query string.

I have an array of arrays, each inner array containing the row data. I have changed the table name and columns to generic names/data.

I have tried adapting my query to:

            var contentInserts = [], contentParameters = [];
            for (const row of row_array)
            {
                contentInserts.push("(?,?,?,?,?,?,?,?,?)");
                contentParameters.push(row);
            }
            var dlo_rule_log_insert = await conn.query(`INSERT INTO table VALUES ${contentInserts.join(",")}`, contentParameters);

but I am getting an error every time:
{
"name": "SqlError",
"sqlMessage": "Parameter at position 3619 is not set",
"sql": "INSERT INTO table VALUES (?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?),(?,?,?,?,?,?,?,?,?),(...",
"fatal": false,
"errno": 45016,
"sqlState": "HY000",
"code": "ER_MISSING_PARAMETER"
}

@Gabilabrebi
Author

@crushton-qg I guess you just need to change

contentParameters.push(row);
to
contentParameters.push(...row);
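To spell out why: the parameters must end up as a single flat array whose length matches the number of ? placeholders, and spreading each row appends its values individually. A sketch using the generic names from the earlier snippet:

    const contentInserts = [];
    const contentParameters = [];
    for (const row of row_array) {
      contentInserts.push('(?,?,?,?,?,?,?,?,?)');
      // spread the row so its 9 values are pushed one by one,
      // keeping contentParameters flat and aligned with the placeholders
      contentParameters.push(...row);
    }
    await conn.query(`INSERT INTO table VALUES ${contentInserts.join(',')}`, contentParameters);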

@crushton-qg

@Gabilabrebi that was it, brilliant thanks for the help!

@Gabilabrebi
Author

@rusher Does using bulk = false today ensure there are no gaps between IDs?

Is there any documentation on this option and what exactly it does?

I'm trying to use .batch again, but I still need contiguous IDs.

Regards

@rusher
Collaborator

rusher commented May 29, 2024

This is a little more complex than that. I wouldn't rely on IDs without gaps: servers ensure uniqueness, not strict "sequencing".

Servers have an auto_increment_increment setting, defaulting to 1, that indicates the step between consecutive IDs.
For a single server, after each INSERT command, IDs will increment by one.
This is not always the case: for Galera servers, for example (multi-master), each server has an increment offset, so the servers don't have to synchronize the current ID value on each INSERT command. For example, with 3 Galera servers (increment of 3),
2 INSERTs on server A (offset 0), 3 on server B (offset +1) and 1 on server C (offset +2):
IDs created on server A will be 1 and 4.
IDs created on server B will be 2, 5 and 8.
IDs created on server C will be 3.
So you'll have the resulting IDs 1, 2, 3, 4, 5, 8. Once synchronized, the next ID on server A will be 10.

The bulk option corresponds to a real server-side batch command. I'm not completely sure how this is implemented for bulk; the server folks would be better placed to help with the details, but what I suppose is that, in order to deal with multiple connections inserting into the same table, the server can reserve a block of IDs when using bulk, and that might result in gaps in the IDs.
For example, the server might reserve a block of 128 IDs while the bulk finally results in only 100 inserts, and during that time other INSERTs might have been done. This is always for the sake of not blocking other connections when there is no need.

So, yes, if using only one server, no Galera, and without bulk, you will normally not have any gaps, but I would clearly not rely on that.
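If in doubt about what a particular server will do, the relevant increment settings can be inspected from the same connection (a minimal sketch; these are the standard MariaDB system variables):

    // auto_increment_increment: step between generated ids (3 in the Galera example above)
    // auto_increment_offset: per-server starting offset
    // innodb_autoinc_lock_mode: the lock mode mentioned at the top of this issue
    const vars = await conn.query(
      'SELECT @@auto_increment_increment AS inc, @@auto_increment_offset AS off, @@innodb_autoinc_lock_mode AS lock_mode'
    );
    console.log(vars[0]); // e.g. { inc: 1, off: 1, lock_mode: 1 } on a typical standalone server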

@Gabilabrebi
Author

@rusher Thank you for your answer.

In my case I have tables A, B, and C.
I insert many rows in A, many rows in B for each A with a reference to A.id, and many rows in C for each B with a reference to B.id.

It's currently 3 queries: one for A, one for B, and one for C, but yes, I rely on contiguous IDs.

If I shouldn't rely on contiguous IDs, do I have to make tons of single queries in loops to ensure the proper relations? Or is there a better alternative?

@rusher
Collaborator

rusher commented May 30, 2024

There are various solutions, but all of them rely on knowing the real insert IDs.
For example, you can use a sequence to handle the IDs yourself; another solution is simply to take the auto-increment IDs into account.

For batch, the fullResult option permits retrieving all the IDs.
Here is an example (it could be simplified, I just wrote it quickly):

    await shareConn.query('DROP TABLE IF EXISTS B');
    await shareConn.query('DROP TABLE IF EXISTS A');

    const jsonVal = {
      Alan: ['Al'],
      Archibald: ['Archie', 'Baldo'],
      Benjamin: ['Ben', 'Bennie', 'Benjy', 'Jamie']
    };

    await shareConn.query('CREATE TABLE A(id int not null primary key auto_increment, val varchar(20))');
    await shareConn.query(
      'create table B (id int not null primary key auto_increment, nickname varchar(20), id_refA int, foreign key (id_refA) references A(id))'
    );

    const keys = Object.keys(jsonVal).map((x) => [x]);
    //keys : [ [ 'Alan' ], [ 'Archibald' ], [ 'Benjamin' ] ]

    const res = await shareConn.batch({ sql: 'INSERT INTO A(val) value (?)', fullResult: true }, keys);
    // res= [
    //   OkPacket { affectedRows: 1, insertId: 1n, warningStatus: 0 },
    //   OkPacket { affectedRows: 1, insertId: 2n, warningStatus: 0 },
    //   OkPacket { affectedRows: 1, insertId: 3n, warningStatus: 0 }
    // ]
    const aliases = keys
      .map((x, idx) => {
        const key = x[0];
        return jsonVal[key].map((alias) => [alias, res[idx].insertId]);
      })
      .flat();
    console.log(aliases);
    // aliases = [
    //   [ 'Al', 1n ],
    //   [ 'Archie', 2n ],
    //   [ 'Baldo', 2n ],
    //   [ 'Ben', 3n ],
    //   [ 'Bennie', 3n ],
    //   [ 'Benjy', 3n ],
    //   [ 'Jamie', 3n ]
    // ]

    const res2 = await shareConn.batch(
      { sql: 'INSERT INTO B(nickname, id_refA) value (?, ?)', fullResult: true },
      aliases
    );

@Gabilabrebi
Author

Oh, I didn't know about this fullResult: true option.

This is exactly what I was looking for. With this, there is no more need for contiguous IDs. I just add the option when I want the IDs, and can still benefit from the full speed of batch when I don't.

Thanks a ton!
