
insertBatch to generate multiple INSERT statements #888

Open
samdark opened this issue Oct 17, 2024 · 8 comments
Labels
status:ready for adoption Feel free to implement this issue. type:feature New feature

Comments

@samdark (Member) commented Oct 17, 2024

Right now, if the batch is too big, we run into DBMS limits like the following:

[screenshot of the DBMS error message]

The screenshot is from PostgreSQL.

I suggest doing three things (see the sketch after this list):

  1. Introduce a configurable parameter for the maximum number of query parameters.
  2. Default it per DBMS. For PostgreSQL it could be 65535.
  3. If the max is reached, move the rest of the data to the next INSERT statement.
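
A minimal sketch of point 3, assuming a standalone helper (splitByParameterLimit is a hypothetical name, not an existing yiisoft/db function): derive how many rows fit into one INSERT from the parameter budget, then chunk the data.

```php
<?php

declare(strict_types=1);

// Hypothetical helper: split rows so that no single INSERT statement
// uses more placeholders than the driver allows (65535 for PostgreSQL).
function splitByParameterLimit(array $rows, int $maxParameters = 65535): array
{
    if ($rows === []) {
        return [];
    }

    // Each row consumes one placeholder per column.
    $paramsPerRow = count(reset($rows));

    // Whole rows that fit into one statement, but never fewer than one.
    $rowsPerQuery = max(1, intdiv($maxParameters, $paramsPerRow));

    return array_chunk($rows, $rowsPerQuery);
}
```

Each returned chunk would then be rendered as its own INSERT statement.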
@samdark samdark added status:ready for adoption Feel free to implement this issue. type:feature New feature labels Oct 17, 2024
@rob006 (Contributor) commented Oct 18, 2024

That will change the characteristics of this operation: from atomic to non-atomic.

@samdark (Member, Author) commented Oct 18, 2024

You can't make it atomic anyway (except with transactions) once the max number of parameters is exceeded.
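
Where atomicity matters, the transaction route could look like this sketch (the closure-based transaction() call follows Yii's DB API; $chunks and the table name are illustrative):

```php
// Sketch: restore atomicity across the split by running all chunks
// inside one explicit transaction. $chunks and '{{%log}}' are assumptions.
$db->transaction(function ($db) use ($chunks) {
    foreach ($chunks as $chunk) {
        $db->createCommand()->insertBatch('{{%log}}', $chunk)->execute();
    }
});
```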

@rob006 (Contributor) commented Oct 18, 2024

Yes, but implicitly changing an atomic operation to a non-atomic one is just a massive footgun. This should be explicitly enabled by the programmer who makes the call, so the developer knows that this batch insert may result in multiple queries.

Also, splitting one massive insert into multiple smaller ones is useful for more than handling the parameter limit. I generally avoid really big inserts, since they may block the table for other queries, so the system may be more responsive if you do 10 small inserts instead of one big one. A method that did this for you would be useful even when my queries don't exceed the parameter limit.
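
A sketch of that pattern, assuming the chunk size (1000) and the table name; each chunk becomes its own INSERT, so other queries can interleave between statements:

```php
// Non-atomic by design: several small INSERTs instead of one huge one.
foreach (array_chunk($rows, 1000) as $chunk) {
    $db->createCommand()->insertBatch('{{%user}}', $chunk)->execute();
}
```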

@samdark (Member, Author) commented Oct 18, 2024

Makes sense. The default value of such an option must then be null, to disable splitting.
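
A sketch of that option shape (the chunkSize parameter is hypothetical; insertBatch() has no such argument today):

```php
$command->insertBatch('{{%user}}', $rows);                   // default: one INSERT, atomic
$command->insertBatch('{{%user}}', $rows, chunkSize: 1000);  // opt-in: several INSERTs
```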

@Tigrov (Member) commented Oct 19, 2024

Good point, but:

  1. It is better if it works by default. Currently it does not work.
  2. The documentation should mention this.

By default it is better to specify a driver-specific max value and allow the developer to change it.
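
Commonly cited placeholder limits per driver look roughly like this (the numbers are my assumption based on the respective DBMS documentation, not values shipped by yiisoft/db; verify them for your versions):

```php
// Assumed per-driver placeholder limits; verify before relying on them.
const MAX_PARAMETERS = [
    'pgsql'  => 65535, // PostgreSQL wire protocol uses a 16-bit parameter count
    'mysql'  => 65535, // prepared statements have a 16-bit placeholder count
    'sqlite' => 32766, // SQLITE_MAX_VARIABLE_NUMBER default since SQLite 3.32
    'sqlsrv' => 2100,  // SQL Server limit on parameters per request
];
```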

@samdark (Member, Author) commented Oct 19, 2024

Either way is fine with me, as long as we put a warning in the docs about whether or not it is atomic.

@rob006 (Contributor) commented Oct 19, 2024

> By default it is better to specify a driver-specific max value and allow the developer to change it.

To be clear: I was talking about a limit based on the number of inserted rows, not the number of params used by the query. If I want to insert 50k records in 50 queries (1k records per query), then the limit I need to pass to the function should be 1000.
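
In code (the row counts are rob006's example numbers):

```php
$chunks = array_chunk($rows, 1000); // 50 000 rows → 50 chunks
// count($chunks) === 50, i.e. 50 INSERT statements of 1 000 rows each
```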

@samdark (Member, Author) commented Oct 20, 2024

Yes. That sounds useful as well.
