
Add batch bulk insert, refactor insert #14

Draft · wants to merge 4 commits into base: dev-0.5.0
Conversation

Sylvyrfysh
Contributor

This begins adding batch and bulk insert operations which, on a data size of 250 items, increase performance by 5% and 5000%, respectively. Since bulk does so much better, I considered dropping batch, but it may be necessary in some situations I do not yet know about.
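For readers unfamiliar with the distinction, a minimal sketch of the two SQL shapes involved (the helper names `buildBatchSql`/`buildBulkSql` and the table/column names are hypothetical, not part of this PR): a batch insert re-executes one single-row statement via the JDBC batch API, while a bulk insert packs all rows into a single multi-row `VALUES` clause, which is why it scales so much better over the wire.

```kotlin
// Hypothetical helpers contrasting the SQL generated for batch vs. bulk inserts.

// Batch: one single-row statement, bound and queued repeatedly with
// PreparedStatement.addBatch() and sent once with executeBatch().
fun buildBatchSql(table: String, columns: List<String>): String {
    val placeholders = columns.joinToString(", ") { "?" }
    return "INSERT INTO $table (${columns.joinToString(", ")}) VALUES ($placeholders)"
}

// Bulk: a single statement carrying every row in one multi-row VALUES clause,
// so only one round trip and one statement execution is needed.
fun buildBulkSql(table: String, columns: List<String>, rowCount: Int): String {
    val row = "(" + columns.joinToString(", ") { "?" } + ")"
    val rows = List(rowCount) { row }.joinToString(", ")
    return "INSERT INTO $table (${columns.joinToString(", ")}) VALUES $rows"
}

fun main() {
    println(buildBatchSql("users", listOf("id", "name")))
    // INSERT INTO users (id, name) VALUES (?, ?)
    println(buildBulkSql("users", listOf("id", "name"), 3))
    // INSERT INTO users (id, name) VALUES (?, ?), (?, ?), (?, ?)
}
```

The batch form keeps one prepared statement cached and lets the driver pipeline the executions; the bulk form trades statement reuse for a single round trip, which matches the performance numbers above.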

I have marked this as not ready to pull for two reasons:

  1. Tests are not complete
  2. These are probably not where they should be yet. I kept them alongside the other inserts, but they're really not on one model.

Let me know if you'd like anything changed, or feel free to make them yourself.

@Sylvyrfysh
Contributor Author

This is built on my other request branch, so it needs to be pulled after that or refactored.

@Sylvyrfysh
Contributor Author

In my testing, if using HikariCP, batch is by far the fastest.

@vsch
Owner

vsch commented Oct 12, 2019

@Sylvyrfysh, this is awesome. Give me until tomorrow to merge it in. I was doing some fire-hose cleanup in 0.5.0, thinking I had a day or two before releasing a new version.

I left some of the original code from KotliQuery and focused on adding much-needed features. Going through the code now, I decided it was time to clean up its limitations and unnecessary overheads, take care of kludges I had added, and add some syntactic sugar (or so I thought), which turned out to be an improved parameter-type preservation mechanism.

I also found that listQuery generated wrong SQL. The code looked like an unfinished cut/paste, with a few places failing to pass the quote or alias to downstream implementations. I really need to add tests for all methods and parameter variations, and will focus more on tests.

Here is the extract from version history for the version committed to dev-0.5.0:

0.5.0-beta-8

  • Fix: SqlCall out params are now passed by index instead of name, to make them independent of the actual SQL procedure parameter names.

  • Fix: param() to work for non-null values when the calling context's capture type is *; the previous implementation would always result in Any type, while the fixed implementation uses the value's class as the type's class.

  • Fix: param() no longer constructs a new instance if the passed value is already a Parameter<*>.

  • Fix: SQL generation for model listQuery with where clause and/or alias

  • Add: InOut type to Parameter(); defaults to InOut.IN, so a parameter carries its direction as well as its type.

  • Add: inTo, outTo and inOutTo infix functions to use instead of to for param creation. These create parameters with in/inOut/out direction, eliminating the need for multiple params() functions to carry directional information. The default direction is in if not provided.

    They also create an instance of Parameter when the captured type is known. Functions using param() from a collection iteration have * for the type capture and lose the actual type, resorting to Any (i.e. Object).

    These also have Collection<T> versions which create a Parameter() with a collection value whose type is T::class.java. When generating parameters for a prepared statement, this type is used to determine the SQL type of the collection's elements instead of the generic Any used before, again because of the * captured type from a collection.

  • Fix: extract SqlQueryBase<T> for SqlQuery and SqlCall to eliminate the need to override functions just to cast them to the right class. SqlCall no longer extends SqlQuery.

  • Fix: SqlQueryBase now stores each parameter in Parameter() data-class format. This eliminates creating a new instance with param(), since if the value is already a parameter the same instance is returned.

  • Fix: deprecate old methods in favor of directional parameter declarations.

  • Fix: change Session methods to use SqlQueryBase<*> when either SqlQuery or SqlCall can be used. For some, like updates with get keys, SqlCall never returns any keys even when the stored procedure executes DML that does, so these are now SqlQuery only. Similarly, those only applicable to SqlCall are now SqlCall typed and will not take SqlQuery.

  • Fix: deprecate Session.forEach(SqlCall, (stmt: CallableStatement) -> Unit, (rs: ResultSet, index: Int) -> Unit)

  • Add: Session.executeCall(SqlCall, (results: SqlCallResults) -> Unit) to replace Session.forEach(SqlCall, (stmt: CallableStatement) -> Unit, (rs: ResultSet, index: Int) -> Unit). Use SqlCallResults.forEach to process result sets returned by the procedure call. To get values of out params use SqlCallResults.get{Type}(paramName) to get non-null values or SqlCallResults.get{Type}OrNull(paramName) to get nullable values.

    For example, when using old forEach the code was:

    val sqlQuery = sqlCall("""CALL processInstances(:clientId, :types)""")
      .inParams("clientId" to 35)
      .inOutParams("types" to "")
    
    val jsonResult = MutableJsObject()
    var types: List<String> = listOf()
    
    session.forEach(sqlQuery, { stmt ->
        types = stmt.getString("types").split(',')
    }) { rs, index ->
        val key = types[index]
        jsonResult[key] = MutableJsArray(Rows(rs).map(toJsonObject).toList())
    }

    Using the new executeCall and directional param declarations, the code changes to:

    val sqlQuery = sqlCall("""CALL processInstances(:clientId, :types)""",
              mapOf("clientId" inTo 35, "types" inOutTo ""))
    
    val jsonResult = MutableJsObject()
    
    session.executeCall(sqlQuery) { results: SqlCallResults ->
        val types = results.getString("types").split(',')
        results.forEach { rs, index ->
            val key = types[index]
            jsonResult[key] = MutableJsArray(Rows(rs).map(toJsonObject).toList())
        }
    }

I will continue working on it tomorrow and merge your changes into the code.

@Sylvyrfysh
Contributor Author

Sounds good! I am no SQL expert myself, and will trust you to update things as they come up. I'm mostly adding things that are of use in my Database Systems class for this one project, and will continue contributing as long as I can. I've found a couple of issues around HikariCP and default schemas that I might contribute fixes for if I can both find the cause and fix it; otherwise I'll open an issue. Thank you for creating as much as you have, though; this is much nicer than a standard Java API!

@vsch
Owner

vsch commented Oct 13, 2019

> I'm mostly adding things that are of use in my Database Systems class for this one project

This is how this project got started and evolved. I did it originally trying to convince a client to ditch Scala Slick, because it is to SQL what pig-latin is to English. Unfortunately, they did not go for it, thinking it would delay the project. The reality is the reverse: in the last two years I wasted a lot of (billable) time trying to figure out how to do a simple SQL query in Scala Slick. In the end it always comes out as undecipherable gibberish that only a Scala disciple can unravel.

I appreciate all input and contributions. My time these days is scarce, and I wish I could spend more of it on this. Kotlin has enough flexibility to make a very intuitive SQL API.

@vsch
Owner

vsch commented Dec 17, 2019

@Sylvyrfysh, I will add my comments and questions. I like the idea, because bulk and batch inserts work so much faster. This is especially noticeable over remote DB connections.

However, I am not sure if the current implementation addresses the general use cases.

@Sylvyrfysh
Contributor Author

I don't know that it does, but with 0.5.0 and releasing the companion, I'm not sure of a better way.

@vsch
Owner

vsch commented Dec 19, 2019

@Sylvyrfysh, I will think about it, but one way I can see is to group the model instances by matched insert column list. Then all the models in a group can be bulk inserted, and the code will work for all model definitions.

I am not sure about the batch insert, since I have never used it. Is it just a driver optimization to send multiple inserts as a single request, or does it too have one insert column list followed by multiple sets of insert values?
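The grouping idea above can be sketched as follows (a hypothetical illustration, not code from this PR; models are represented here as plain column-to-value maps rather than the library's model classes): grouping instances by their set of populated columns yields groups that each share one insert column list, so each group can be bulk inserted with a single multi-row statement.

```kotlin
// Hypothetical sketch: group model instances by their insert column list.
// Each resulting group shares identical columns, so a single multi-row
// INSERT statement can cover the whole group.
fun <V> groupByInsertColumns(models: List<Map<String, V>>): Map<Set<String>, List<Map<String, V>>> =
    models.groupBy { it.keys }

fun main() {
    val models = listOf(
        mapOf("id" to 1, "name" to "a"),
        mapOf("id" to 2),                // no "name": different column list
        mapOf("id" to 3, "name" to "c"),
    )
    val groups = groupByInsertColumns(models)
    println(groups.size) // 2 groups: {id, name} and {id}
}
```

This would let the bulk path work for any mix of model definitions: models with fully matching column lists get the fast multi-row insert, and odd ones out simply form their own (possibly single-row) groups.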
